There’s perhaps no hotter buzzword in our industry (and therefore no hotter hashtag) than big data. And it isn’t just big data: there’s open data versus proprietary data, and then there’s Linked Data, the approach behind the Semantic Web (or Web 3.0) that structures and interlinks it all for meaning. That’s a good thing because, particularly with the Internet of Things, the scope of big data is as wide and seemingly unending as the universe itself.
And big data means value. It gives businesses, cities and even countries a competitive edge over one another, offering insights into real users and citizens and how they relate to each other to create a more accurate “big picture” than we could ever have imagined before. While it’s more fun to talk about the gadgets, robots and drones of IoT, it’s truly the big data opportunity, and how companies leverage it (see the video at the end of this article), that will drive this interconnected trend into our factories, our homes, our cars, our schools and our hospitals. In the next few years, it won’t be important that everyone has 50 devices, and it won’t even be so important that those devices produce 50 times more data. What will matter is how you make sense of that data.
There’s no doubt that from a developer’s standpoint, the breadth of big data can seem both exciting and overwhelming. And there’s a dizzying number of big data analytics tools to help us interpret and connect it all into sensible Linked Data. In fact, with no single right tool, it may seem easier to just build your own. We don’t recommend that. Instead of starting from scratch, today we offer you some tips on how to take advantage of existing Linked Data networks to make sense of this world of data and the people living in it.
Dimitri van Hees, founder of Apiwise and a 15-year veteran of the API space, said that, beyond these buzzwords, big data really comes down to three things: Quality, Quantity and Price. He went on to mention that Privacy is an obvious fourth, but that it fell outside the scope of his APIcon talk.
How do you judge the Quality of Data?
- Is it machine-readable?
- Is it accessible via web services?
- Is it up-to-date?
- Is it verifiable?
How do you judge the Quantity of Data?
- Can we store the data?
- Can we process it? Do we have the compute power to handle it all?
- Is it a representative cross-section, large enough to mean anything?
How do you judge the Price of Data?
- What does it cost?
- Do we need an API key? (If so, it isn’t really open data.)
- Are we allowed to use it at all?
- Can we use it for commercial use? (Even more important after the Supreme Court didn’t take up Google v. Oracle.)
Van Hees finds that folks are often overwhelmed by the idea of big data — “I can’t do that. I don’t have access to that. I’m not Google.” His work at Freshheads involved helping clients realize their online ambitions, including taking advantage of big data. And he’s found that just about everything is possible, and new big data sets are being released every day.
Big Data: There’s an App for That
He offers the following tools for accessing large data sets like Wikipedia for free or almost free. Used alone or combined with your own accrued data, they let you reach the level of Linked Data you need to finally make sense of it all:
DBPedia: the crowd-sourced, Linked Data version of Wikipedia, an effort to extract structured information from the world’s encyclopedia:
- Not very up-to-date; synchronized with Wikipedia only about twice a year
- But available as a free RDF download
- SPARQL endpoint for direct query access to the database [dbpedia-owl:wikiPageID]
- DBPedia lookup API
- “In my opinion, Wikipedia is not very stable, but it’s data out there,” van Hees said.
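To make the SPARQL endpoint concrete, here is a minimal sketch of querying DBPedia’s public endpoint (http://dbpedia.org/sparql) for the dbpedia-owl:wikiPageID property mentioned above. The choice of “Amsterdam” as the example resource and the helper function name are ours, not from the talk; the sketch just builds the GET URL you would fetch to receive JSON results.

```python
from urllib.parse import urlencode

# DBPedia's public SPARQL endpoint.
DBPEDIA_SPARQL = "http://dbpedia.org/sparql"

# Ask for the Wikipedia page ID of the Amsterdam resource
# (resource and query are illustrative).
QUERY = """
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
SELECT ?id WHERE {
  <http://dbpedia.org/resource/Amsterdam> dbpedia-owl:wikiPageID ?id .
}
"""

def sparql_url(query: str) -> str:
    """Build a GET URL that asks the endpoint for JSON results."""
    params = urlencode({
        "query": query,
        "format": "application/sparql-results+json",
    })
    return f"{DBPEDIA_SPARQL}?{params}"

print(sparql_url(QUERY))
```

Fetching that URL with any HTTP client returns the bindings as JSON, which you can then join against your own data.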
Dandelion API: Semantic text analytics as a service:
- Semantic APIs like this let you run “like” queries that match on similar semantic context
- 1,000 free requests per day
- Provides links through DBPedia.
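As a sketch of how Dandelion ties free text back to DBPedia, the snippet below builds a request URL for its datatxt entity-extraction endpoint. The token placeholder is hypothetical (you get a real one with the free 1,000-requests-per-day tier), and we only construct the URL rather than calling the service:

```python
from urllib.parse import urlencode

# Dandelion's entity-extraction endpoint (datatxt "nex").
DANDELION_NEX = "https://api.dandelion.eu/datatxt/nex/v1"

def annotate_url(text: str, token: str = "YOUR_TOKEN") -> str:
    """Build a GET URL asking Dandelion to link entities in `text`
    to DBPedia resources. `token` is a placeholder for your API key."""
    params = urlencode({"text": text, "token": token})
    return f"{DANDELION_NEX}?{params}"

print(annotate_url("The Mona Lisa hangs in the Louvre in Paris."))
```

The JSON response annotates spans like “Louvre” and “Paris” with DBPedia URIs, which is exactly the link-through behavior the bullet above describes.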
Gruff: a free, downloadable graph-based triple-store RDF browser:
- another semantic tool
- free, and you can connect to SPARQL endpoints without knowing anything about the SPARQL query language
- visualizes relationships between data, including how items are connected and their similarities, based on Wikipedia information.
Three awesome ways to collect, refine and organize data better
Make refining big data a group activity: Band together with your fellow data collectors in a public effort to publish more open data, or at least to share it in non-competing, mutually beneficial situations. Agree on the same Linked Data tool so everyone has more refined access. And hold data-mining parties and hackathons that work to clean up open data.
Linked Data only works if it’s consistent: LOD2 often points out that, for Linked Data to work, it needs standard semantic formats like RDF [Resource Description Framework] and reusable vocabularies, as well as visualization and mashup tools like those above. Only then can it both make sense and be machine-readable.
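The consistency point can be made concrete with a toy example. The sketch below (names and example.org URIs are ours, purely illustrative) serializes two records to N-Triples, one of the standard RDF formats, reusing the well-known FOAF `name` property instead of inventing ad-hoc field names, so a consumer only has to understand one predicate:

```python
# Reusable vocabulary: the widely shared FOAF "name" property.
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

# Two records from different "sources" that agree on the predicate.
triples = [
    ("http://example.org/person/1", FOAF_NAME, "Ada Lovelace"),
    ("http://example.org/person/2", FOAF_NAME, "Tim Berners-Lee"),
]

def to_ntriples(triples):
    """Serialize (subject, predicate, literal) tuples as N-Triples lines."""
    return "\n".join(f'<{s}> <{p}> "{o}" .' for s, p, o in triples)

print(to_ntriples(triples))
```

Because both records use the same predicate URI, any RDF tool (or a Gruff-style browser) can merge and query them without per-source glue code, which is the whole payoff of shared vocabularies.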
Or you can do it all yourself! Van Hees advises being inventive without reinventing the wheel: if you lack some sort of data, collect it yourself by putting out sensors or beacons, or by running a social campaign requesting information. The idea is to get as creative as you can while taking advantage of all the Linked Data already available to you.
What ways have you taken advantage of all the data that’s rushing our way? Comment below or tweet your tricks to us @ProgrammableWeb!