How Linked Data Solved A Digital Age Marketing Problem

Founder of Apiwise and 15-year veteran of the API space, Dimitri van Hees began his APIcon talk by putting this up on the screen: 0641011744. Is it a number? A registration number? A New Zealand phone number? Maybe a flight number? Perhaps the new coordinates to get back to the island of Lost? Last ten digits of a bank account? Dutch people would immediately recognize it as a phone number. But what about the other 7.019 billion people in the world?

Digits, like letters and the words they make up, need context and meaning before humans or machines can make sense of them. Linked Data, the foundation of the Semantic Web (sometimes called Web 3.0), aims to be the common way to define what data means and to describe the same thing the same way everywhere. It also provides links to external resources like apps and makes it possible to query multiple datasets at once.

This piece aims to help you define what Linked Data is and what it means for your business, and offers a case study to see it in action.

What is Linked Data?

Manu Sporny points out how the Internet has provided us with a seemingly infinite amount of data, from spreadsheets to videos to images to the websites that often bring them all together, linking from one thing to another and allowing us to constantly discover more information. However, this way of connecting everything, while somewhat human-friendly, isn’t machine-readable. It has become the job of the developer to identify these things (such as name, location, and mood) in a way computers can understand. But then you run into the problem of the many different formats used to express data on the web, and you need a way to link all this disparate information together. This is why data is expressed as relationships, so that each property (like name) is tied to a value (like Bob).

The next step is to link all of this information together, allowing the computer to recognize relationships between data sets. Like with Google’s Knowledge Graph and Facebook’s Open Graph Protocol, you can create a graph of information where each node is something you are talking about, with more information being shared and potential connections made between these nodes and across websites.

Sporny explains, “So Linked Data is data expressed on a website that can traverse via links to other websites. And that means that the Internet or the Web becomes this global information repository,” where you can ask the Internet a question like “Who is Bob’s parent?” and it will start on one website, follow the link to another website, and find the answer for you.

However, how would the computer know which Bob is Bob? The URL acts as a universal ID mechanism, where Bob’s URL is connected via a “parents” URL to Larry, Bob’s parent. Similarly, with Google and Facebook’s graphs, it goes through information found on a page to explain to the computer what a page is about.
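
As a minimal sketch of this idea, the Python snippet below simulates two websites as an in-memory dictionary keyed by URL. The example.org and example.com URLs, the `parents` property name, and the `fetch` helper are all hypothetical stand-ins; a real client would make HTTP requests and parse the documents that come back.

```python
# Simulated web: each URL identifies one thing and returns a small document.
# All URLs and property names here are made up for illustration.
WEB = {
    "https://example.org/people/bob": {
        "name": "Bob",
        "parents": ["https://example.com/people/larry"],  # link to another site
    },
    "https://example.com/people/larry": {
        "name": "Larry",
        "parents": [],
    },
}


def fetch(url):
    """Stand-in for an HTTP GET that returns the document behind a URL."""
    return WEB[url]


def parent_names(person_url):
    """Answer "who is this person's parent?" by following the links."""
    person = fetch(person_url)
    return [fetch(parent_url)["name"] for parent_url in person["parents"]]


print(parent_names("https://example.org/people/bob"))
```

Because the URL is the identifier, there is no ambiguity about which Bob is meant: a different Bob would simply live at a different URL.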

To learn more about Linked Data and how it is used and created on the web, Sporny recommends checking out the information presented on JSON-LD and RDFa.
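
For a rough sense of what this looks like in JSON-LD, here is a small sketch built as a Python dictionary. It assumes the schema.org vocabulary for the `Person` type and the `name` and `parent` properties; the example.org and example.com URLs are hypothetical.

```python
import json

# A JSON-LD description of Bob. "@context" says which vocabulary the short
# keys come from, "@id" gives Bob a global identifier, and "parent" points
# at Larry's URL. The example.org / example.com URLs are hypothetical.
bob = {
    "@context": "https://schema.org",
    "@id": "https://example.org/people/bob",
    "@type": "Person",
    "name": "Bob",
    "parent": {"@id": "https://example.com/people/larry"},
}

print(json.dumps(bob, indent=2))
```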

And now that we know what it is and what it can be, let’s jump right into a case study to see Linked Data in use!

Linking Linked Data and Berners-Lee’s Five-Star Model for Open Data

Van Hees referenced Tim Berners-Lee, inventor of the Web and a Linked Data initiator, and his Five-Star Model for Open Data:

  1. Make your stuff on the web available under whatever format under an open license.
  2. Make your data available in a structured way and machine-readable. (Excel instead of a faxed copy.)
  3. Use non-proprietary formats.
  4. Use URIs to denote things, following open standards such as RDF (Resource Description Framework), so that people can point at your stuff.
  5. Link your RDF and all data to other data to provide context.

Van Hees analyzed the foundation for modern open data by considering the open data needs and expectations of the target audiences:

  • Who is part of the Linked Data landscape? The open data community, the Linked Open Data community, the API community, data publishers and data consumers.
  • What do the data consumers demand? Provide us with the developer-friendliest way to access your data—like via API—and we might just use it.
  • What do data publishers want to know? How should we publish our data, and what are the costs and benefits of doing it this way?
  • What matters to the open data community? It doesn’t matter how it’s published. We are satisfied with it just being open.

Van Hees was part of the Dutch Linked Open Data Program, which brought together universities, governments, public-private partnerships, and Freshheads, where he was working as a data specialist at the time and which was one of the very few commercial parties involved. Beyond this, many data communities joined the cause: the Open Data Community, the Linked Open Data Community, the API community, and data publishers and consumers.

Van Hees says that while governments like the PR of open data, it requires a high investment in time and IT infrastructure, and the benefits and return on investment often aren’t understood. Add to this the challenge that the vast majority of government data is exposed as CSV files. “As a result, the quality of most data sets are at best three stars,” he said.

“I don’t believe in the five stars of Tim Berners-Lee. As a developer, it’s not very useful. But you can add a sixth star,” which he places after the third star of the model. “Provide online access via web services so developers can use your stuff the way they are used to—via a RESTful JSON API. If you go from a dot-CSV to a simple JSON API, it’s quite easy—especially with the tools out there—to transform a CSV to a web service. So in that way, you can already see if people actually do want to use your data or not.”
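
To illustrate how small that first step can be, here is a minimal sketch using only the Python standard library. The CSV rows, the `get_resource` helper, and the `/resources/<id>` route it imitates are all made up for illustration; a real service would sit behind a web framework of your choice.

```python
import csv
import io
import json

# Hypothetical CSV export of a public data set (made-up rows).
CSV_TEXT = """id,name,city
1,Town Hall,Utrecht
2,Central Library,Breda
"""


def csv_to_records(text):
    """Parse CSV text into a list of dicts, one per row, keyed by the header."""
    return list(csv.DictReader(io.StringIO(text)))


def get_resource(records, resource_id):
    """Return the JSON body a GET /resources/<id> endpoint might serve."""
    for record in records:
        if record["id"] == resource_id:
            return json.dumps(record)
    return None  # a real service would answer 404 here


records = csv_to_records(CSV_TEXT)
print(get_resource(records, "2"))
```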

He says that then, if folks are interested, you can take on Berners-Lee’s fifth (now sixth) star by adding context to your data so it can be linked to other data sets. That way developers know what the data means, and Linked Open Data advocates can convert it to triples, store it in triplestores, and query it with SPARQL (the standard query language for the triplestores behind the Semantic Web) if they still want to. JSON-LD, a W3C specification for expressing Linked Data in JSON, makes this possible.
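
The sketch below shows one way that step could look: a plain API record gets a context and a URL-based identifier, and is then flattened into triples. The record, the `CONTEXT` mapping, the `BASE` URL, and both helper functions are assumptions made for illustration, and the expansion is deliberately naive rather than a conforming JSON-LD processor.

```python
# A plain record as a simple JSON API might return it (made-up data).
record = {"id": "2", "name": "Central Library", "city": "Breda"}

# Hypothetical context: maps each key to a URL that says what the key means.
CONTEXT = {
    "name": "https://schema.org/name",
    "city": "https://schema.org/addressLocality",
}
BASE = "https://data.example.org/buildings/"  # assumed URL scheme for IDs


def to_json_ld(rec):
    """Wrap a plain record with an @context and a URL-based @id."""
    doc = {"@context": CONTEXT, "@id": BASE + rec["id"]}
    doc.update({key: rec[key] for key in CONTEXT})
    return doc


def to_triples(doc):
    """Naive expansion into (subject, predicate, object) triples.

    Handles only flat documents with an inline context; it is not a
    conforming JSON-LD processor.
    """
    context = doc["@context"]
    return [
        (doc["@id"], context[key], value)
        for key, value in doc.items()
        if not key.startswith("@")
    ]


for triple in to_triples(to_json_ld(record)):
    print(triple)
```

Developers who only want JSON can ignore the `@context` key entirely, which is why this step can be layered on top of an existing API.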

As shown in the diagram below, triples are one of the fundamental building blocks of Linked Data and the Semantic Web. Each is composed of a subject, a predicate, and an object. The predicate adds context by describing the relationship between the subject and the object. Alone, triples are of limited value. But they can be chained together (as shown below) so that the object of one triple becomes the subject of another. It isn’t hard to imagine how private triples (internal to an organization) can then be chained to publicly available triples (on the Web) to form significant, highly crawlable graphs of data. For example, crawling the graph of linked private and public data below, it’s relatively easy to discover that Kyle lives in the state of Massachusetts, which is governed by Charlie Baker, a member of the Republican (GOP) Party.
[Diagram: Linked Data and Semantic Web triples]
The majority of the information that an organization can gather from a graph like this resides on the Web instead of within the organization’s systems. At the bare minimum, Linked Data eases the burden on companies to collect and maintain huge sets of data within their own databases.
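
The chaining described above can be sketched in a few lines of Python. The people and places come from the example in the text; the predicate labels, the split into private and public lists, and the `follow` helper are assumptions made for illustration.

```python
# Triples as (subject, predicate, object). The names come from the example
# in the text; the predicate labels and the private/public split are assumed.
PRIVATE_TRIPLES = [
    ("Kyle", "lives in", "Massachusetts"),
]
PUBLIC_TRIPLES = [
    ("Massachusetts", "governed by", "Charlie Baker"),
    ("Charlie Baker", "member of", "Republican Party"),
]


def follow(triples, start, predicates):
    """Walk from `start` along the given predicates, one hop per predicate.

    Returns the node reached, or None if any hop has no matching triple.
    """
    node = start
    for predicate in predicates:
        matches = [o for s, p, o in triples if s == node and p == predicate]
        if not matches:
            return None
        node = matches[0]
    return node


graph = PRIVATE_TRIPLES + PUBLIC_TRIPLES
path = ["lives in", "governed by", "member of"]
print(follow(graph, "Kyle", path))
```

Only the first triple has to be stored by the organization itself; the other two hops are answered by data that already lives on the Web.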

“The Semantic Web does exist and goes hand in hand with APIs. Using six stars instead of five, APIs are actually part of the deployment scheme while making life easier for data publishers and data consumers. Let’s bridge the gap, bring the best of both worlds, and let’s change the Web together,” van Hees said.

Jennifer Riggins Writer, marketer and luddite in a technical world. Obsessed with helping tech and startups sell their value to us laypeople, improve efficiency, management practices, and message. Learning something new and laughing every single day.

Comments


pperera

Hi Jennifer, Thank you for sharing these insights. I would only comment that as this thinking spreads we risk overlooking an important fundamental design. Instead of having "Kyle", "John" and "Charlie" directly linked to each other or other parties and linked to, say, a location, we first reify the context in which parties are related to each other, to locations and for that matter to any other data.

In other words, Kyle is only connected to John in a specific context and maybe not any other context. Basically, we need to abstract context so that a party, like, "Kyle" can have multiple contextual identities, where each contextual identity has its own set of connections and data relationships.

We need a party node and separately connected role (context) nodes. The combination of a party node and a role (context) node defines contextual identity and a single party (person, organization or group) can have multiple contextual identities.

If we don't do this now, at the pioneering stage we are currently at with modeling these relationships, we will suffer the same data, context and connection integration challenges enterprise systems currently suffer, because most enterprise systems represent a party in a single role as one entity. We need to separate them.

Here's a simple example of what I am saying.

A common analogy is to model actors and their movies. PAUL NEWMAN ---> BUTCH CASSIDY AND SUNDANCE KID and ROBERT REDFORD --->BUTCH CASSIDY AND SUNDANCE KID.

Here's the problem. Both parties are more than just actors. They are also producers, directors and writers. We need to model the links so that we have one PAUL NEWMAN and one ROBERT REDFORD. Then we create three role nodes: ACTOR, DIRECTOR, WRITER and/or whatever and link the movie to the party in a role. You will find that in practice, their location (address), for example, may actually vary with their role.

This is fundamentally important as all party and data relationships correspond to a single party in a single role, and not necessarily other roles. We should not make the roles predicates as we now do: "produces" vs "producer"; "directs" versus "director" and "writes" versus "writer".

If we do not start thinking along these lines, we really miss out on even more revealing insights buried in network connections. If interested, check out a recent article that speaks to this. ow.ly/TGixE

- Peter Perera  www. Perera-Group. com

dvh

Hi Peter,

I don't really understand your concern. The issue Linked Data solves is exactly this. With Linked Data, there is only one Robert Redford. This 'entity' has got multiple relationships with other 'entities'. Besides being an actor and producer, Robert Redford might also be a husband, a football fan, a father, a neighbour, etc. So in this situation, Robert Redford "has professions" actor, director and writer, while the same Robert Redford "acts in" multiple movies. Keith Richards, on the other hand, "acts in" Pirates of the Caribbean while he "has profession" musician. As for the addresses, it also depends on the relationship. Our company has a visiting address and a postal address. In our case these relationships link to the same address entity, but they don't have to.

I hope this clarifies things a bit more :-)

Kind regards, Dimitri van Hees