This past May, the subject of linked data made a showing at ProgrammableWeb's API conference in San Francisco, and with the next iteration of APIcon in London from Sept 24-26, 2014, we're giving the topic even more coverage. The sessions and workshops are not to be missed; if you have not already registered, be sure to visit the APIconUK Web site.
Oh, the WWW? I can do better.
Let's face it: inventing the World Wide Web is a pretty tough act to follow, and ever since Sir Tim Berners-Lee did it, the rest of the industry has been busy extending the core technology and stretching it to its very limits. But as much as the technology industry took Berners-Lee's baby, ran with it, and continues to run with it, he has long believed that the Semantic Web is an even bigger idea. Though I haven't spoken to him recently, every discussion I've been lucky enough to have with him has come back to the idea of the Semantic Web.
So, what is the Semantic Web? Wikipedia's entry does a pretty good job of summarizing the Semantic Web's foundational concept:
Humans are capable of using the Web to carry out tasks such as finding the German translation for "eight days", reserving a library book, and searching for the lowest price for a DVD. However, machines cannot accomplish all of these tasks without human direction, because web pages are designed to be read by people, not machines. The Semantic Web is a vision of information that can be readily interpreted by machines, so machines can perform more of the tedious work involved in finding, combining, and acting upon information on the web.
In English, what this means is that we humans are pretty good at sussing context out of the Web. We know how to find pages based on their content, often starting with search engines, and we can make pretty accurate assumptions from the data and links that appear on most Web pages. For example, just two paragraphs ago, I linked to an entry in Wikipedia. You can assume, based on the context of the link and what you know about Wikipedia, that you're going to find a detailed description of the Semantic Web when you click on that link. If there's data on the page, like the data we keep for the thousands of APIs in our API directory, you can use the labels on the page to understand what that data represents (like the name of an API). Unfortunately, to the extent that most Web pages contain data and links, most computers (machines) are incapable of making the same assumptions. The result is that we humans end up doing work that computers could be doing for us.
Even worse, where our API directory has some information about an API (like any articles that have been written about it) and the API provider's developer portal for that same API has other information, the machines involved in storing and presenting both have no way of knowing that the two sets of information actually go together and belong to the same API.
A typical ProgrammableWeb API profile includes a bunch of data connections (aka links). As the chief engineer of ProgrammableWeb's current data model, I can tell you that those connections were either provided by a human, or, as in the case of an API's categories, automated with logic that was written, debugged, and maintained by a human. My guess is that 99.99 percent of the world's Web sites are built this way.
Imagine if those links didn't require hand entry. Imagine if the logic never had to be written, debugged, or maintained. Imagine if words, numbers, and data could, without significant programming, find their way to other related words, numbers, and data in a way that enables an entirely different user experience: one that brings the related data to the user instead of making the user go to the data. Imagine the time, resources, and, ultimately, money saved.
An example of this user experience can be seen on Google today. Use Google to search for chicken cacciatore and the search results appear to be enhanced by all sorts of information that you'd normally have to click to get (see screenshot below). This includes ratings, the number of reviews, and the preparation time for many of the listed recipes. Off to the right are images and text, pulled in from Wikipedia, that define "cacciatore." Here on ProgrammableWeb, if we supported this idea of "linked data," we could create a user experience where just mousing over the phrase "chicken cacciatore" (where it appears in the first sentence of this paragraph) would pop up a small window that contains the same (or other) information found on Google's search results page, all driven by data found on other sites scattered across the Web.
Berners-Lee envisioned a Web where all sites inherently included this capability. A Web that involved far less effort, time, and expense for both Web site producers and users. A Semantic Web. On social networks, this idea of starting with one piece of data (e.g., a user of Facebook) and finding your way to other data (that user's friends, then their friends, who they work for, and where they live) is often referred to as a social graph. A social graph is an example of a data graph, and the foundational element of a data graph is something called a triple. "David is a friend of Wendell" is a triple. It involves two objects (David and Wendell) and an explanation of the relationship between them. In true Semantic Web vernacular, "David" is the subject, "is a friend of" is the predicate, and "Wendell" is the object. When linked together (David knows Wendell, who knows Kevin, and so on...), triples form the basis of graphs.
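The triple-and-graph idea can be sketched in a few lines of plain Python. This is an illustration only, not any real Semantic Web library; the names David, Wendell, and Kevin come from the article, and the "works for" fact is invented:

```python
# Each triple is a (subject, predicate, object) tuple, per the article's
# "David is a friend of Wendell" example. The last triple is an invented
# fact, added only to show that predicates other than friendship exist.
triples = [
    ("David", "is a friend of", "Wendell"),
    ("Wendell", "is a friend of", "Kevin"),
    ("Kevin", "works for", "ExampleCorp"),  # hypothetical
]

def friends_of(subject, graph):
    """Follow 'is a friend of' predicates outward from a subject."""
    found, frontier = set(), {subject}
    while frontier:
        person = frontier.pop()
        for s, p, o in graph:
            if s == person and p == "is a friend of" and o not in found:
                found.add(o)
                frontier.add(o)
    return found

print(friends_of("David", triples))
```

Chaining triples this way (David to Wendell to Kevin) is exactly the graph traversal the article describes; a real social network does the same thing at vastly larger scale.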
Berners-Lee's Semantic Web and the set of technologies that drove it -- things like RDF, the SPARQL query language, and triplestores (where triples are stored and retrieved) -- solved the key problem of how to fashion and query a data graph. One that, unlike today's social networks, inherently works across sites. For example, I shouldn't have to tell Facebook where I work. Facebook should inherently know where to find my work profile on LinkedIn (without me explicitly telling Facebook where to find that profile), and should be able to present that information to someone viewing my Facebook profile. Both sites could list all of my friends, whether they are connected to me on Facebook, LinkedIn, Twitter, or Google Plus.
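To make the triplestore-and-query idea concrete, here is a toy in-memory store with a SPARQL-like pattern match, sketched in Python. Real triplestores and SPARQL engines do far more (indexes, joins, federation across sites), and the data below is invented for illustration:

```python
# A toy "triplestore": just a list of (subject, predicate, object) tuples.
store = [
    ("David", "worksFor", "ProgrammableWeb"),
    ("David", "hasProfileOn", "LinkedIn"),
    ("David", "hasProfileOn", "Facebook"),
]

def match(pattern, graph):
    """Return triples matching a pattern; None plays the role of a
    SPARQL variable, matching anything in that position."""
    return [t for t in graph
            if all(p is None or p == v for p, v in zip(pattern, t))]

# Roughly the spirit of: SELECT ?site WHERE { :David :hasProfileOn ?site }
print(match(("David", "hasProfileOn", None), store))
```

The point of SPARQL is that this kind of pattern query works against any graph of triples, no matter which site published them.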
Despite Berners-Lee's constant stumping, it has been tough for the Semantic Web to rise above the noise of the first Web, especially when marketers with no relation to him or the World Wide Web Consortium (W3C) toss terms like Web 2.0 and Web 3.0 into the mix (propagandists who are far better at marketing than the W3C will ever be). Open platform-level technologies also depend on a strong mix of tools to ease the pain of all stakeholders, especially developers. With the lion's share of the opportunity perceived to be in extending the existing Web, the toolmakers have focused the majority of their resources on keeping pace (much to the detriment of the Semantic Web). As the pace of innovation drove toward increasingly lighter-weight ways of producing richer user experiences on existing Web technology, it would be an understatement to say that the collection of technologies represented by the Semantic Web has gotten, and continues to get, a lukewarm reception.
...to work with RDF you typically needed a quad store [which stores the name of the graph, in addition to the typical data kept in a triple store], a SPARQL engine, and some hefty libraries. Your standard Web developer has no interest in that toolchain because it adds more complexity to the solution than is necessary.
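The quad store the quote mentions differs from a triplestore only in that each entry also carries the name of the graph it belongs to, letting one store hold many graphs at once. A minimal Python illustration (the graph names are made up):

```python
# A quad = a triple plus the name of its graph. Storing the graph name
# lets one store keep, say, a social graph and a work graph side by side.
quads = [
    ("David", "is a friend of", "Wendell", "social-graph"),   # hypothetical
    ("David", "worksFor", "ProgrammableWeb", "work-graph"),   # hypothetical
]

def triples_in(graph_name, quads):
    """Project out the plain triples belonging to one named graph."""
    return [(s, p, o) for s, p, o, g in quads if g == graph_name]

print(triples_in("work-graph", quads))
```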
All is not lost
Whether you're a site operator, a developer, or a Web user, if the benefits of machine-readable, graph-based data structures appeal to you (and they should), the good news is that they're no longer tied to the adoption of RDF and SPARQL. In my mind (and maybe I'm being naive), there's the Semantic Web, and there's the semantic web. Whereas the Semantic Web is about data graphing with RDF, triplestores, and SPARQL, the generic semantic web is technology-agnostic. It shouldn't matter how triples are implemented or what technology is used to crawl graphs of linked data. In the end, what matters are things like great, efficient user experiences and dramatically lowering the cost of achieving them -- the sort of vision embodied by Berners-Lee's Semantic Web.
Fortunately, as the chicken cacciatore example suggests, the core principles of the Semantic Web and machine-readable, crawlable graphs of linked data are not just alive and well, but are on course to meet the needs of today's Web and mobile app developers. [Sidebar to all you Web site operators with SEO sensitivities: the various search engines appear on track to give linked data increasingly more weight in their rankings.]
According to Manu Sporny, with over two decades of research and development having gone into the semantic web (the generic one), there have been other attempts to deal with the Semantic Web's overhead; microformats were one of them. But it wasn't until the W3C's recent work on a Web payments standard that the need for more developer-friendly graphs of linked data became an imperative.
Via telephone, Sporny told me, "We needed a way for building payments and identity into the core of the Web. Something that could handle items for sale on the Web, the ability to transact, digital receipts, and the identities of buyers and sellers." It was almost a problem tailor-made for machine-readable graphs of linked data to solve. But with developer adoption being key, and developers having resisted the traditional Semantic Web toolset, the W3C's Web Payments Community Group needed something that spoke the native tongue of today's developers. Whereas the traditional Semantic Web toolset requires learning an entirely new domain of Web development and changing today's common development flows, Sporny says the group was after something "that was based on how the Web is being built today." Something where, according to Sporny, developers could "realize online identities and the ability to transact with one another in a peer-to-peer fashion."
But given that plain vanilla JSON is ill-equipped to handle linked data, the W3C needed to come up with a standard adaptation that supported semantic web principles. The result, ratified as a W3C "recommendation" earlier this year, is JSON-LD 1.0 ("LD" stands for Linked Data). Of JSON-LD, Sporny, who co-authored and co-edited the specification, told me, "By design -- it was no accident -- JSON-LD is designed specifically to not scare other people away from the semantic web. We started thinking of Web developers and their existing reliance on JSON and how to work linked data into their standard web development practices."
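To see why JSON-LD doesn't scare JSON developers away, here is a minimal JSON-LD document sketched as a Python dict. The schema.org terms are real vocabulary; the example.com identifiers are made up for illustration. The "@context" maps ordinary JSON keys to globally unique IRIs, which is what tells a machine that "name" here means the same thing as "name" on every other site using that vocabulary:

```python
import json

doc = {
    "@context": {
        # Map the plain key "name" to a globally unique identifier.
        "name": "http://schema.org/name",
        # "@type": "@id" tells a JSON-LD processor the value is a link
        # to another node in the graph, not just a string.
        "knows": {"@id": "http://schema.org/knows", "@type": "@id"},
    },
    "@id": "http://example.com/people/david",       # hypothetical URL
    "name": "David",
    "knows": "http://example.com/people/wendell",   # hypothetical URL
}

# To a plain JSON consumer this is just JSON; a JSON-LD processor can
# expand it, resolving each key to its IRI via the context.
expanded_name_iri = doc["@context"]["name"]
print(json.dumps(doc, indent=2))
```

Notice that stripping the "@context" and "@id" keys leaves exactly the JSON a developer would have written anyway; the linked-data semantics are layered on top rather than replacing the format.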
For believers in the semantic web (and the Semantic Web), the timing of JSON-LD could not have been better. JSON has the wind of the API economy at its back. As more API economy stakeholders begin to realize how the benefits of linked data (including SEO benefits) can be had without significant disruption to their existing Web development practices, the astronomic growth of the API economy could end up accelerating widespread adoption of semantic web principles.
This is why linked data is one of the main themes of APIcon in London. No matter where you sit in the API economy, it's time to get smart about linked data. If you're an API provider, it's time to learn what linked data means to you, how you can better prepare your organization for a linked data future, and how to rally your partners, your industry, and adjacent industries to the cause. If you're a developer, it's time to get smart about using APIs to consume linked data, to understand the sorts of experiences you can build with it, and to push back on your API providers to rethink their offerings in terms of semantic web principles.
To meet that need, we have four great sessions at APIconUK. Two of them are led by Markus Lanthaler, one of Sporny's co-authors/editors on the JSON-LD 1.0 specification, and the other two are led by Freshheads technical architect Dimitri Van Hees. Lanthaler will first offer a session that explains the hows and whys of semantic web and linked data principles. Then, he'll give a workshop on how to use Hydra -- a solution he invented for creating and consuming JSON-LD-based APIs. Meanwhile, Van Hees is offering a conference session that reflects on the realities of the JSON-LD deployments he has been involved in, and a developer workshop on how to make the most of existing data sources on the Web (especially ones, like dbpedia.org, that participate in the linked data economy). Lanthaler is also one of my handpicked ProgrammableWeb Innovation Showcase (Sept 26) presenters.
It doesn't matter who you are: a developer, a Web site operator, an API provider, or a user of the Web. Lanthaler and Van Hees are helping to lead a revolution that will eventually overtake us all. If you are in or around London at the time of APIconUK (Sept 24-26, 2014), their sessions are not to be missed. If you are already registered for APIconUK, I look forward to seeing you there. If you are not registered, what are you waiting for? Go to the APIconUK Web site and sign up. Can't afford the modest price of entry? Let me know at the email address below and we'll work something out!
See you there.