Why Linked Data Is A Major Theme At APIcon In London

This past May, the subject of linked data made a showing at ProgrammableWeb's API conference in San Francisco, and with this next iteration of APIcon in London from Sept. 24-26, 2014, we're giving the topic even more coverage. The sessions and workshops are not to be missed, and if you have not already registered, be sure to visit the APIconUK Web site.

Oh, the WWW? I can do better.

Let's face it. Inventing the World Wide Web is a pretty tough act to follow, and ever since Sir Tim Berners-Lee invented it, the rest of the industry has been very busy extending it and stretching the core technology to its very limits. But as much as the technology industry took Berners-Lee's baby, ran with it, and continues to run with it, he has long believed that the Semantic Web is an even bigger idea. Though I haven't spoken to him recently, every discussion that I've been lucky enough to have with him has come back to the idea of the Semantic Web.

So, what is the Semantic Web? The Wikipedia's entry does a pretty good job of summarizing the Semantic Web's foundational concept:

Humans are capable of using the Web to carry out tasks such as finding the German translation for "eight days", reserving a library book, and searching for the lowest price for a DVD. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.

In English, what this means is that we humans are pretty good at sussing context out of the Web. We know how to find pages based on their content, often starting with search engines, and we can make pretty accurate assumptions from the data and links that appear on most Web pages. For example, just two paragraphs ago, I linked to an entry in the Wikipedia. You can assume, based on the context of the link and what you know about the Wikipedia, that you're going to find a detailed description of the Semantic Web when you click on that link. If there's data on the page, like the data we keep for the thousands of APIs in our API directory, you can use the labels on the page to understand what that data represents (like the name of an API). Unfortunately, even though most Web pages contain data and links, most computers (machines) are incapable of making the same assumptions. The result is that we humans end up doing work that the computer could be doing for us.

Even worse, where our API directory has some information about an API (like any articles that have been written about it) and the API provider's developer portal for that same API has other information, the machines involved in storing and presenting both have no way of knowing that the two sets of information actually go together and belong to the same API.  

A typical ProgrammableWeb API profile includes a bunch of data connections (aka links). As the chief engineer of ProgrammableWeb's current data model, I can tell you that those connections were either provided by a human, or, as in the case of an API's categories, automated with logic that was written, debugged, and maintained by a human. My guess is that 99.99 percent of the world's Web sites are built this way. 

Imagine if those links didn't require hand entry. Imagine if the logic never had to be written, debugged, or maintained. Imagine if the words, numbers, and data could, without significant programming, find their way to other related words, numbers, and data in such a way that enables an entirely different user experience; one that brings that related data to the user instead of the user having to go to the data. Imagine the time, resources, and ultimately money saved.

An example of this user experience can be seen on Google today. Use Google to search for chicken cacciatore and the search results appear to be enhanced by all sorts of information that you'd normally have to click to get (see screenshot below). This includes ratings, the number of reviews, and the preparation time for many of the listed recipes. Off to the right are images and text, pulled in from Wikipedia, that define "cacciatore." Here on ProgrammableWeb, if we supported this idea of "linked data," we could create a user experience where just mousing over the phrase "chicken cacciatore" (where it appears in the first sentence of this paragraph) would pop up a small window that contains the same (or other) information found on Google's search results page, all driven by data found on other sites scattered across the Web.
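For the curious, the machinery behind those enhanced results is conceptually simple: recipe pages describe themselves using a shared vocabulary (schema.org is the one the major search engines rallied around) so that a machine can pick out the rating, review count, and preparation time without having to guess. Below is a rough sketch of what such a machine-readable description might look like, shown in the JSON-LD format discussed later in this article (at the time, search engines mostly consumed the same vocabulary via microdata or RDFa markup instead). The recipe, numbers, and URL are invented for illustration.

    {
      "@context": "http://schema.org",
      "@type": "Recipe",
      "name": "Chicken Cacciatore",
      "aggregateRating": {
        "@type": "AggregateRating",
        "ratingValue": "4.5",
        "reviewCount": "213"
      },
      "totalTime": "PT45M",
      "url": "http://www.example-recipes.com/chicken-cacciatore"
    }

Because the description uses terms the whole Web has agreed on, any crawler (not just Google's) can lift the rating and preparation time straight off the page.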

Berners-Lee envisioned a Web where all sites inherently included this capability. A Web that involved far less effort, time, and expense for both Web site producers and users. A Semantic Web. On social networks, this idea of starting with one piece of data (e.g., a user of Facebook) and finding your way to other data (that user's friends, and then their friends, and who they work for and where they live) is often referred to as a social graph. A social graph is an example of a data graph, and the foundational element of a data graph is something called a triple. "David is a friend of Wendell" is a triple. It involves two entities (David and Wendell) and a statement of the relationship between them. In true Semantic Web vernacular, "David" is the subject, "is a friend of" is the predicate, and "Wendell" is the object. When linked together (David knows Wendell, who knows Kevin, and so on...), triples form the basis of graphs.
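To make that concrete, here is the rough shape of a triple written down as data. This isn't any formal serialization (RDF defines several), just a sketch of the idea, and the people and URLs are invented. The important detail is that the subject and object are identified by URLs, so machines on different sites can agree that they are talking about the same David and the same Wendell, and the predicate is drawn from a shared vocabulary (here, FOAF, the "friend of a friend" vocabulary).

    {
      "subject":   "http://example.com/people/david",
      "predicate": "http://xmlns.com/foaf/0.1/knows",
      "object":    "http://example.com/people/wendell"
    }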

Berners-Lee's Semantic Web and the set of technologies that drove it -- things like RDF, the SPARQL query language, and triplestores (where triples are stored and retrieved) -- solved the key problem of how to fashion and query a data graph. One that, unlike today's social networks, inherently works across sites. For example, I shouldn't have to tell Facebook where I work. Facebook should inherently know where to find my work profile on LinkedIn (without me explicitly telling Facebook where to find that profile), and should be able to present that information to someone viewing my Facebook profile. Both sites could list all of my friends, whether they are connected to me on Facebook, LinkedIn, Twitter, or Google Plus.
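Sketched out as triples (again, the URLs are invented and this isn't a formal serialization), that cross-site scenario might boil down to something like this: my Facebook profile asserts that it identifies the same person as a LinkedIn profile, and the LinkedIn side of the graph already says where that person works. A crawler that understands triples can follow the first link to discover the second fact, and nobody has to re-enter the employer on Facebook.

    [
      {
        "subject":   "https://www.facebook.com/profiles/david",
        "predicate": "http://www.w3.org/2002/07/owl#sameAs",
        "object":    "https://www.linkedin.com/in/david"
      },
      {
        "subject":   "https://www.linkedin.com/in/david",
        "predicate": "http://schema.org/worksFor",
        "object":    "http://www.programmableweb.com"
      }
    ]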

Despite Berners-Lee's constant stumping, it has been tough for the Semantic Web to rise above the noise of the first Web, especially when marketers with no relation to him or the World Wide Web Consortium (W3) start tossing terms like Web 2.0 and Web 3.0 into the mix (propagandists who are far better at marketing than the W3 will ever be). Open platform-level technologies also depend on a strong mix of tools to ease the pain of all stakeholders, especially developers. With the lion's share of the opportunity perceived to be in extending the existing Web, the toolmakers have focused the majority of their resources on keeping pace (much to the detriment of the Semantic Web). As the pace of innovation drove toward increasingly lighter-weight ways of producing richer user experiences on existing Web technology, it would be an understatement to say that the collection of technologies represented by the Semantic Web has gotten, and continues to get, a lukewarm reception.

That sentiment is reflected in a post by Digital Bazaar CEO Manu Sporny who, after having survived a cycle of standards setting at the W3 that involved RDF and the W3's RDF Working Group, wrote:

...to work with RDF you typically needed a quad store [which stores the name of the graph, in addition to the typical data kept in a triple store], a SPARQL engine, and some hefty libraries. Your standard Web developer has no interest in that toolchain because it adds more complexity to the solution than is necessary.

All is not lost

Whether you're a site operator, a developer, or a Web user, if the benefits of machine readable graph-based data structures appeal to you (and they should), the good news is that they're no longer tied to the adoption of RDF and SPARQL. In my mind (and maybe I'm being naive), there's the Semantic Web, and the semantic web. Whereas the Semantic Web is about data graphing with RDF, triplestores, and SPARQL, the generic semantic web is technology-agnostic. It shouldn't matter how triples are implemented or what technology is used to crawl graphs of linked data. In the end, what matters are things like great, efficient user experiences and dramatically lowering the cost of achieving them -- the sort of vision embodied by Berners-Lee's Semantic Web. 

Fortunately, as somewhat exemplified by the chicken cacciatore example, the core principles of the Semantic Web and machine readable/crawlable graphs of linked data are not just alive and well, but are on course to meet the needs of today's modern Web and mobile app developers [Sidebar to all you Web site operators with SEO sensitivities: the various search engines appear on track to give linked data increasingly more weight in their rankings].

According to Sporny, with over two decades of research and development having gone into the semantic web (the generic one), there have been other attempts to deal with the Semantic Web's overhead. Microformats was one of them. But it wasn't until the W3's recent work on a Web payment standard that the need for more developer-friendly graphs of linked data became an imperative. 

Via telephone, Sporny told me "We needed a way for building payments and identity into the core of the Web. Something that could handle items for sale on the Web, the ability to transact, digital receipts, and the identities of buyers and sellers." It was a problem almost tailor-made for machine readable graphs of linked data to solve. But with developer adoption being key, and developers having resisted the traditional Semantic Web toolset, the W3's Web Payments Community Group needed something that spoke the native tongue of today's developers. Whereas the traditional Semantic Web toolset requires learning an entirely new domain of Web development and changing today's common development flows, Sporny says the group was after something "that was based on how the Web is being built today." Something where, according to Sporny, developers could "realize online identities and the ability to transact with one another in a peer to peer fashion."

In terms of representing and serializing data, there was really only one answer: JSON. JSON, as many Web developers know (especially ones whose applications consume APIs), has evolved into the lingua franca of Web-based inter-process/machine communication of data (queries, results, etc.). JSON (JavaScript Object Notation) is the native data format for JavaScript, arguably the world's most popular Web development language (between browser-side JavaScript and server-side Node.js), and pretty much every non-JavaScript Web development language that matters now enjoys JSON support (either natively or through third-party and open source frameworks).
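To picture what that means in practice, here is the kind of plain JSON payload an API might return today -- say, a directory API describing an API profile. The field names and values are invented for illustration. Notice that nothing in it tells a machine what "name" or "category" actually mean, or whether this record and a record on some other site describe the same API.

    {
      "name": "Acme Translation API",
      "category": "Translation",
      "provider": "Acme Corp",
      "url": "https://developer.example.com/translation/docs"
    }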

But because plain-vanilla JSON on its own is ill-equipped to handle linked data, the W3 needed to come up with a standard adaptation that supported semantic web principles. The result, ratified as a W3 "recommendation" earlier this year, is JSON-LD 1.0 ("LD" stands for Linked Data). Of JSON-LD, Sporny, who co-authored and co-edited the specification, told me "By design -- it was no accident -- JSON-LD is designed specifically to not scare other people away from the semantic web. We started thinking of Web developers and their existing reliance on JSON and how to work linked data into their standard web development practices."
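In practice, turning the hypothetical plain-JSON profile shown earlier into JSON-LD is mostly a matter of adding two things: a "@context" that maps each field name to a term in a shared vocabulary (the schema.org URLs below are one plausible choice), and an "@id" that gives the record itself a global identifier other sites can link to (the ProgrammableWeb URL here is hypothetical). Everything else stays ordinary JSON that existing clients can keep parsing as before. A sketch:

    {
      "@context": {
        "name": "http://schema.org/name",
        "category": "http://schema.org/applicationCategory",
        "provider": "http://schema.org/provider",
        "url": "http://schema.org/url"
      },
      "@id": "http://www.programmableweb.com/api/acme-translation",
      "name": "Acme Translation API",
      "category": "Translation",
      "provider": "Acme Corp",
      "url": "https://developer.example.com/translation/docs"
    }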

For believers in the semantic web (and the Semantic Web), the timing of JSON-LD could not have been better. JSON has the wind of the API economy at its back. As more API economy stakeholders begin to realize how the benefits of linked data (including SEO benefits) can be had without significant disruption to their existing Web development practices, the astronomic growth of the API economy could end up accelerating widespread adoption of semantic web principles.

This is why linked data is one of the main themes of APIcon in London. No matter where you sit in the API economy, it's time to get smart about linked data. If you're an API provider, it's time to get smart about what linked data means to you, how you can better prepare your organization for a linked data future, and how to rally your partners, your industry, and adjacent industries to the cause. If you're a developer, it's time to get smart about using APIs to consume linked data, to understand the sorts of experiences you can build with it, and why you should push back on your API providers to rethink their offerings in terms of semantic web principles.

To meet that need, we have four great sessions at APIconUK. Two of them are led by one of Sporny's JSON-LD 1.0 Specification co-authors/editors, Markus Lanthaler, and the other two are led by Freshheads Technical Architect Dimitri Van Hees. Lanthaler will first offer a session that explains the hows and whys of semantic web and linked data principles. Then, he'll be giving a workshop on how to use Hydra -- a solution he invented for creating and consuming JSON-LD-based APIs. Meanwhile, Van Hees is offering a conference session that reflects on the realities of the JSON-LD deployments he has been involved in and a developer workshop on how to make the most of existing data sources on the Web (especially ones like dbpedia.org that participate in the linked data economy). Lanthaler is also one of my handpicked ProgrammableWeb Innovation Showcase (Sept 26) presenters.

It doesn't matter who you are: a developer, a Web site operator, an API provider, or a user of the Web. Lanthaler and Van Hees are helping to lead a revolution that will eventually overtake us all. If you are in or around London at the time of APIconUK (Sept 24-26, 2014), their sessions are not to be missed. If you are already registered for APIconUK, I look forward to seeing you there. If you are not registered, what are you waiting for? Go to the APIconUK Web site and sign up. Can't afford the modest price of entry? Let me know at the email address below and we'll work something out!

See you there.

David Berlind is the editor-in-chief of ProgrammableWeb.com. You can reach him at david.berlind@programmableweb.com. Connect to David on Twitter at @dberlind or on LinkedIn, put him in a Google+ circle, or friend him on Facebook.




Great post, David!

I'm a huge supporter of the semantic web concept, so this is all goodness in my mind, but I'd love to hear your thoughts on processing semantic data at some point. There are two sides to the semantic web, right? Encoding/providing data and decoding/using it. It's great that we're on track to adopt more semantic principles, and that linked-data is becoming something folks are thinking harder about, but until we have a strong approach for clients understanding and processing that linked (and even harder, true semantic) data, I don't see us getting really broad adoption.

Today, unless I've missed a big development, we don't have a good (or general) approach to understanding and working with it, or even great recommendations on how to handle the processing of semantic information within our traditional approaches to application development. What follows is that clients aren't truly leveraging the opportunity provided, and so the incentive is much, much lower for API and website providers to go all-in.

Some big companies with a lot of really smart people are getting closer, but until we have a framework, or at least great resources, on how to approach this I think it'll be a long and painful road.


I don't want to advertise here, but I think we've found a way to both provide and consume Linked Data in a developer-friendly way and I'd like to share that during my talk ;-)

Slides (and I believe video) will be up after the conference if you can't make it.


Excellent. Thanks for replying, Dimitri! @Comptly's comment does speak to the deeply prescriptive and 360-degree path that we have to write about on ProgrammableWeb in order for our readers to really understand the ins and outs of a fully operational Linked Data stack (including the APIs) and the applications that consume them.

Also, to @Comptly's point, there's definitely a chicken-and-egg problem. The Google example (chicken cacciatore) doesn't adequately convey the full potential of linked data. This creates a challenge because there are a limited number of proofs of concept to provoke the Aha! moment for API providers who must be talked into offering LD. As @dvh will tell you, the uncertainty means that API providers will want to simultaneously support non-LD (um, legacy?) formats.

IMHO, we need more API providers (including ProgrammableWeb) to take the LD leap of faith, and we also need to see more activity at the ecosystem level in hopes of developing very robust graphs. For example, were ProgrammableWeb to go the LD route, we would still need others in the API ecosystem to get on board, or "our" graph could dead-end prematurely (admittedly, I'm getting out of my depth here and this could thankfully be a misconception).

Then, we need incredibly innovative developers (we KNOW you're out there) to, well, innovate; blow our minds with LD-powered apps that prove the benefits of LD. But just as a reminder, the benefits don't just surface in the applications. They also surface at the API provider level, because the burden of creating and storing as much data as originally anticipated is lifted when that data already exists elsewhere on the graph.

Today, that problem is solved to some extent through aggregation of APIs... if all I need is the data. Where LD (and the Semantic Web) come in is the context (as in the "is a friend of" in "David is a friend of Wendell"). That context doesn't naturally exist when aggregating.



And the real challenge is in marketing. Or data science. Most data analysts are good at doing their job on structured data, writing SQL queries to get insights into their data. Once we are able to convert our data to linked data, there are no technical limitations left on creating selections, running relational queries, or using external (open) data in the selection. Kind of horizontally scaled big data. Just like big data, it will be a full-time job to know where to start and how to get benefits out of it.



Looking forward to seeing the slides/video, as I won't be able to make the London event, unfortunately. Enjoy it though -- SF APIcon was a blast of an event!

@David -- would *love* to see a summary/roundup article on the Linked Data talks at APIcon, if you've got someone to cover it/round up all the videos/slideshares


Hi Comptly...

Most of the content from APIconUK will be made available via video (and other coverage) after the event (not immediately after, but some time after... we have some post-production work to do on it).



@Mcrider, your post on Medium is indeed consistent with the thinking of what I wrote. It also reminded me of a post that I saw on the Web but that I cannot find... it was a thought-provoking article about why EVERY URL on the Web should be an API.