Founder of Apiwise and 15-year veteran of the API space, Dimitri van Hees began his APIcon talk by putting this up on the screen: 0641011744. Is it a number? A registration number? A New Zealand phone number? Maybe a flight number? Perhaps the new coordinates to get back to the island of Lost? Last ten digits of a bank account? Dutch people would immediately recognize it as a phone number. But what about the other 7.019 billion people in the world?
Digits, like letters and the words they make up, need context and meaning for both humans and machines to make sense of them. Linked Data, the foundation of the Semantic Web (or Web 3.0), aims to be the common way to define what data means, so that it means the same thing everywhere. It does this by providing links to external resources, such as apps, and it even adds the possibility of querying multiple datasets at once.
This piece aims to help you define what Linked Data is and what it means for your business, and to give you a case study to see it in action.
What is Linked Data?
Manu Sporny points out how the Internet has provided us with a seemingly infinite amount of data, from spreadsheets to videos to images to the websites that often bring them all together, linking from one thing to another and allowing us to constantly discover more information. However, this way of connecting everything, which is somewhat human-friendly, isn’t machine-readable. It has become the job of the developer to identify these things (such as name, location, and mood) in a way computers can understand. But then you run into the problem of the many different formats used to express data on the web, and you need a way to tie all this disparate information together. The answer is to express the data as relationships, so that each property (like name) is tied to a value (like Bob).
The next step is to link all of this information together, allowing the computer to recognize relationships between data sets. As with Google’s Knowledge Graph and Facebook’s Open Graph Protocol, you can create a graph of information where each node is something you are talking about, with more information being shared and potential connections made between these nodes and across websites.
As Sporny puts it, “So Linked Data is data expressed on a website that can traverse via links to other websites. And that means that the Internet or the Web becomes this global information repository.” You can then ask the Internet questions like “Who is Bob’s parent?” and it will be able to start on one website, follow the link to another website, and find the answer for you.
But how would the computer know which Bob is Bob? The URL acts as a universal ID mechanism, where Bob’s URL is connected via a “parents” URL to Larry, Bob’s parent. Google’s and Facebook’s graphs work similarly, using information found on a page to explain to the computer what that page is about.
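To make the idea of URLs as identifiers concrete, here is a minimal sketch of the “Who is Bob’s parent?” question. It is only an illustration under stated assumptions: the two “websites” are in-memory dictionaries rather than real HTTP servers, and every URL and property name in it is hypothetical.

```python
# Two hypothetical "websites", each publishing data keyed by URL.
# All URLs and property names here are made up for illustration.
PARENT = "https://vocab.example.org/parent"
NAME = "https://vocab.example.org/name"

SITE_A = {
    "https://site-a.example.org/people/bob": {
        NAME: "Bob",
        PARENT: "https://site-b.example.com/people/larry",  # link to another site
    },
}
SITE_B = {
    "https://site-b.example.com/people/larry": {NAME: "Larry"},
}


def fetch(url):
    """Stand-in for an HTTP GET: look the URL up on whichever site hosts it."""
    for site in (SITE_A, SITE_B):
        if url in site:
            return site[url]
    raise KeyError(f"no data published at {url}")


def parent_name(person_url):
    """Answer "who is this person's parent?" by following the parent link."""
    person = fetch(person_url)
    parent_url = person[PARENT]
    return fetch(parent_url)[NAME]


print(parent_name("https://site-a.example.org/people/bob"))  # Larry
```

Because both the person and the relationship are identified by URLs, there is no ambiguity about which Bob is meant, and the lookup can cross from one site to the other without any shared database.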
And now that we know what it is and what it can be, let’s jump right into a case study to see Linked Data in use!
Linking Linked Data and Berners-Lee’s Five-Star Model for Open Data
Van Hees referenced Tim Berners-Lee, inventor of the Web and a Linked Data initiator, and his Five-Star Model for Open Data:
- Make your stuff available on the web, in whatever format, under an open license.
- Make your data available in a structured, machine-readable form. (Excel instead of a faxed copy.)
- Use non-proprietary formats.
- Use URIs to denote things, following Resource Description Framework (RDF) standards, so that people can point at your data.
- Link your RDF, and all your data, to other data to provide context.
Van Hees analyzed the foundation for modern open data by considering the open data needs and expectations of the target audiences:
- Who is part of the Linked Data landscape? The open data community, the Linked Open Data community, the API community, data publishers and data consumers.
- What do the data consumers demand? Provide us with the developer-friendliest way to access your data—like via API—and we might just use it.
- What do data publishers want to know? How should we publish our data, and what are the costs and benefits of doing it this way?
- What matters to the open data community? It doesn’t matter how it’s published. We are satisfied with it just being open.
Van Hees was part of the Dutch Linked Open Data Program, which brought together universities, governments, public-private partnerships, and Freshheads, where he was working as a data specialist at the time and which was one of the very few commercial parties involved. Beyond this, many data communities joined the cause: the Open Data Community, the Linked Open Data Community, the API community, and data publishers and consumers.
Van Hees says that while governments like the PR of open data, it demands a high investment in time and IT infrastructure, and the benefits and return on investment often aren’t understood. Add to this the challenge that the vast majority of government data is exposed as CSV files. “As a result, the quality of most data sets are at best three stars,” he said.
“I don’t believe in the five stars of Tim Berners-Lee. As a developer, it’s not very useful. But you can add a sixth star,” which he places after the third star of the model. “Provide online access via web services so developers can use your stuff the way they are used to—via a RESTful JSON API. If you go from a dot-CSV to a simple JSON API, it’s quite easy—especially with the tools out there—to transform a CSV to a web service. So in that way, you can already see if people actually do want to use your data or not.”
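As a rough illustration of how small the step from a CSV file to JSON can be, the sketch below parses CSV text and produces the response bodies a simple JSON API could return. It is not the tooling van Hees refers to: the data, column names, and the `/shops` endpoints mentioned in the comments are all hypothetical, and a real service would put a web framework in front of these functions.

```python
import csv
import io
import json

# Hypothetical CSV export; the columns and rows are invented for illustration.
CSV_TEXT = """id,name,telephone
1,Example Shop,0641011744
2,Another Shop,0201234567
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


def get_collection(records):
    """Body that a hypothetical GET /shops endpoint could return."""
    return json.dumps(records, indent=2)


def get_item(records, item_id):
    """Body that a hypothetical GET /shops/<id> endpoint could return, or None."""
    for record in records:
        if record["id"] == item_id:
            return json.dumps(record)
    return None


records = csv_to_records(CSV_TEXT)
print(get_item(records, "1"))
```

Note that `csv.DictReader` keeps every value as a string, so the leading zero in the phone number survives. The JSON still says nothing about what the `telephone` field means, which is the gap that added context is meant to close.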
He says that then, if folks are interested, you can take on Berners-Lee’s fifth (now sixth) star by adding context to your data so it can be linked to other data sets. That way, developers know what the data means, and Linked Open Data advocates can still convert it to triples, store it in triplestores, and query it with SPARQL (the standard query language for the triplestores behind the Semantic Web) if they want to. JSON-LD is the Linked Data specification that makes this possible.
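The sketch below shows, in a deliberately simplified form, how adding a JSON-LD `@context` lets ordinary JSON be read as triples. The `@context` and `@id` keywords are part of JSON-LD and the two schema.org terms are real, but the record, its URL, and the `to_triples` helper are assumptions made for illustration. The helper only handles flat documents and is not a substitute for a real JSON-LD processor.

```python
# A plain JSON record plus a JSON-LD "@context" mapping keys to schema.org terms.
# The record itself and its "@id" URL are hypothetical.
DOCUMENT = {
    "@context": {
        "name": "http://schema.org/name",
        "telephone": "http://schema.org/telephone",
    },
    "@id": "https://data.example.org/shops/1",
    "name": "Example Shop",
    "telephone": "0641011744",
}


def to_triples(doc):
    """Expand a flat JSON-LD-style document into (subject, predicate, object) triples.

    Simplified: handles only flat documents whose context maps each key
    directly to an IRI string. A real JSON-LD processor does far more.
    """
    context = doc["@context"]
    subject = doc["@id"]
    triples = []
    for key, value in doc.items():
        if key.startswith("@"):
            continue  # skip JSON-LD keywords such as @context and @id
        if key in context:
            triples.append((subject, context[key], value))
    return triples


for triple in to_triples(DOCUMENT):
    print(triple)
```

A developer who ignores the `@context` still sees a normal JSON object, while a Linked Data consumer can now tell that 0641011744 is a telephone number and not a flight number or part of a bank account.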
As shown in the diagram below, triples are one of the fundamental building blocks of Linked Data and the Semantic Web. They consist of a subject, a predicate, and an object. The predicate adds context by describing the relationship between the subject and the object. Alone, triples are of limited value. But they can be chained together (as shown below) so that the object of one triple becomes the subject of another, linking two or more triples. It isn’t hard to imagine how private triples (internal to an organization) can be chained to publicly available triples (on the Web) to form significant, highly crawlable graphs of data. For example, crawling the graph of linked private and public data below, it’s relatively easy to discover that Kyle lives in the state of Massachusetts, which is governed by Charlie Baker, a member of the Republican (GOP) Party.
The majority of the information that an organization can gather from a graph like this resides on the Web instead of within the organization’s systems. At the bare minimum, Linked Data eases the burden on companies of collecting and maintaining huge sets of data within their own databases.
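The chaining described here can be sketched in a few lines. The facts are the ones from the example in the text, but the predicate labels (`livesIn`, `governedBy`, `memberOf`) and the split into private and public lists are assumptions made for illustration, and plain strings stand in for the URIs a real graph would use.

```python
# Triples as (subject, predicate, object). Predicate names are hypothetical
# labels; plain strings stand in for the URIs a real graph would use.
PRIVATE_TRIPLES = [
    ("Kyle", "livesIn", "Massachusetts"),
]
PUBLIC_TRIPLES = [
    ("Massachusetts", "governedBy", "Charlie Baker"),
    ("Charlie Baker", "memberOf", "Republican Party"),
]


def follow(triples, start, predicates):
    """Walk from `start` along the given predicates, returning each node visited."""
    path = [start]
    current = start
    for predicate in predicates:
        matches = [o for s, p, o in triples if s == current and p == predicate]
        if not matches:
            raise LookupError(f"no '{predicate}' triple for {current!r}")
        current = matches[0]  # take the first match; real graphs may have several
        path.append(current)
    return path


graph = PRIVATE_TRIPLES + PUBLIC_TRIPLES
print(" -> ".join(follow(graph, "Kyle", ["livesIn", "governedBy", "memberOf"])))
# Kyle -> Massachusetts -> Charlie Baker -> Republican Party
```

Only the first triple has to live inside the organization. The other two hops are answered by data that someone else publishes and maintains.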
“The Semantic Web does exist and goes hand in hand with APIs. Using six stars instead of five, APIs are actually part of the deployment scheme while making life easier for data publishers and data consumers. Let’s bridge the gap, bring the best of both worlds, and let’s change the Web together,” van Hees said.