How GraphQL Delivers on the Original Promise of the Semantic Web. Or Not.

This is Part 4 of the ProgrammableWeb API University Guide to GraphQL: Understanding, Building and Using GraphQL APIs.

In previous installments in this series we looked at the history that led up to the advent of GraphQL. We also looked at the GraphQL specification and a real-world GraphQL API that we created using Apollo Server 2.0, a Node.js-based implementation of the GraphQL specification. Now we're going to take a step back and look at GraphQL in terms of the Semantic Web. The Semantic Web is an extension of the World Wide Web that's intended to enable machines to search and understand the information on the Internet in a meaningful way. The Semantic Web has been quietly influencing the evolution of the World Wide Web since the early days of web page publishing.

In this installment we're going to look at how the Semantic Web came to be. We'll look at the historical evolution of the technologies that emerged to meet its requirements. Then, once we understand the historical underpinnings, we'll look at how some of the promise of the Semantic Web is realized by GraphQL, now and going forward. To be clear though, living up to the promise of the Semantic Web was never a stated objective of GraphQL's inventors. It's just that the similarities are too striking to ignore. If you're familiar with the Semantic Web, you'll be pleasantly surprised by some of GraphQL's advancements. If you're not familiar with the Semantic Web, it's worth learning what it attempted to achieve and how GraphQL delivers on some of that potential.

Understanding the Semantic Web

In order to understand the Semantic Web, you need to understand the problem it attempts to solve — how to create a way to describe entities and relationships that are exposed on the Internet in a standard format that is machine-understandable.

The Semantic Web is an idea that has been around since the introduction of the Internet to mainstream computing. An article published by Tim Berners-Lee, James Hendler, and Ora Lassila in a 2001 issue of Scientific American brought the Semantic Web into the mainstream. According to Berners-Lee et al.,

"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation."

The Scientific American article built on ideas previously described in the Resource Description Framework (RDF), which was adopted by the W3C in 1999. (The concepts described in RDF have become central to the Semantic Web, as we'll examine soon.)

In order to realize the vision of the Semantic Web, three requirements need to be satisfied. These requirements are:

  1. There needs to be a way to represent data entities in a standardized, self-describing format that is universally machine-understandable.
  2. There needs to be a standardized way to describe any one of the potentially infinite number of relationships that can exist between any two entities.
  3. There needs to be a standardized way to determine the meaning of metadata applied to a given data entity. For example, there needs to be a way to determine whether the attribute title describes the name of a book or a prefix applied to the name of a person, as in Moby Dick vs. Doctor.

Let's take a look at the way web technology has evolved to satisfy these requirements.

Implementing a Self-Describing Data Format

The first requirement, describing data entities in a standard, self-describing format, has been satisfied for a while. Today the most common formats for publishing data on the web are XML (as shown in Listing 1) and JSON (Listing 2).

<persons>
    <person id="101" firstName="David" lastName="Bowie" dob="1947-01-08" />
    <person id="102" firstName="Nicholas" lastName="Roeg" dob="1928-08-15" />
    <person id="103" firstName="Rip" lastName="Torn" dob="1931-02-06" />
    <person id="104" firstName="Candy" lastName="Clark" dob="1947-06-20" />
    <person id="105" firstName="Mick" lastName="Jagger" dob="1943-07-23" />
    <person id="106" firstName="Buck" lastName="Henry" dob="1930-12-09" />
</persons>

Listing 1: XML is a standard way to format data for publication on the internet

{
    "persons": [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn", "dob": "1931-02-06"},
        {"id": 104, "firstName": "Candy", "lastName": "Clark", "dob": "1947-06-20"},
        {"id": 105, "firstName": "James", "lastName": "Dean", "dob": "1931-02-08"},
        {"id": 106, "firstName": "Buck", "lastName": "Henry", "dob": "1930-12-09"}
    ]
}

Listing 2: JSON is also a widely supported way to format data for publication on the internet

Both XML and JSON are considered self-describing standards. In other words, the fields of the data structure are apparent. When you look at Listings 1 and 2 above, you have no problem determining that each person has the attributes id, firstName, lastName, and dob. The attribute names are embedded in the data structure. You don't need an external reference to figure things out.
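To make the idea of self-description concrete, here is a minimal Python sketch that recovers the field names directly from the data, with no external schema. It uses two records from Listings 1 and 2 and only the standard library. The helper names xml_field_names and json_field_names are hypothetical and exist only for this illustration.

```python
import json
import xml.etree.ElementTree as ET

# Two records taken from Listings 1 and 2.
XML_DOC = """
<persons>
    <person id="101" firstName="David" lastName="Bowie" dob="1947-01-08" />
    <person id="102" firstName="Nicholas" lastName="Roeg" dob="1928-08-15" />
</persons>
"""

JSON_DOC = """
{
    "persons": [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"}
    ]
}
"""


def xml_field_names(doc: str) -> list[str]:
    """Collect the attribute names used by the <person> elements (hypothetical helper)."""
    root = ET.fromstring(doc)
    names = set()
    for person in root.iter("person"):
        names.update(person.attrib.keys())
    return sorted(names)


def json_field_names(doc: str) -> list[str]:
    """Collect the keys used by the objects in the "persons" array (hypothetical helper)."""
    names = set()
    for person in json.loads(doc)["persons"]:
        names.update(person.keys())
    return sorted(names)


print(xml_field_names(XML_DOC))    # ['dob', 'firstName', 'id', 'lastName']
print(json_field_names(JSON_DOC))  # ['dob', 'firstName', 'id', 'lastName']
```

Both formats yield the same field names because the names travel with the data. What neither format says is what those names mean, which is the gap the rest of this article explores.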

Defining Relationships Between Entities

While XML and JSON are self-describing in terms of attributes, the formats offer little in the way of describing the relationships in and between entities beyond the simple "has a" relationship. A "has a" relationship is implicit in any standard entity definition. For example, a person has a first name; a person has a last name; a person has a date of birth. There are also "has some" relationships. For example, "a movie has some actors." The "has some" relationship implies that one entity is related to a number of other entities in some sort of organizational hierarchy, as shown in JSON format in Listing 3, where the movie The Man Who Fell to Earth has some actors.

{
    "id": 4001,
    "title": "The Man Who Fell to Earth",
    "releaseDate": "1976-04-18",
    "director":{"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"},
    "actors" : [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn", "dob": "1931-02-06"},
        {"id": 104, "firstName": "Candy", "lastName": "Clark", "dob": "1947-06-20"},
        {"id": 106, "firstName": "Buck", "lastName": "Henry", "dob": "1930-12-09"}
    ]
}

Listing 3: A "has some" relationship is implicit when an entity has an attribute that is an array of other entities

Defining a "has some" relationship requires nothing more than assigning an array to an attribute of the entity. Thus, it's implicit.
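As a rough illustration of how implicit these two relationships are, the Python sketch below walks the movie from Listing 3 and labels each attribute purely by the shape of its value: an array means "has some," anything else means "has a." The function name classify is a hypothetical helper, not part of any library.

```python
import json

# The movie entity from Listing 3.
MOVIE = json.loads("""
{
    "id": 4001,
    "title": "The Man Who Fell to Earth",
    "releaseDate": "1976-04-18",
    "director": {"id": 102, "firstName": "Nicholas", "lastName": "Roeg", "dob": "1928-08-15"},
    "actors": [
        {"id": 101, "firstName": "David", "lastName": "Bowie", "dob": "1947-01-08"},
        {"id": 103, "firstName": "Rip", "lastName": "Torn", "dob": "1931-02-06"},
        {"id": 104, "firstName": "Candy", "lastName": "Clark", "dob": "1947-06-20"},
        {"id": 106, "firstName": "Buck", "lastName": "Henry", "dob": "1930-12-09"}
    ]
}
""")


def classify(entity: dict) -> dict[str, str]:
    """Label each attribute as "has a" or "has some" based only on its value type."""
    kinds = {}
    for name, value in entity.items():
        kinds[name] = "has some" if isinstance(value, list) else "has a"
    return kinds


for name, kind in classify(MOVIE).items():
    print(f"movie {kind} {name}")
```

The structure alone is enough to tell that a movie has a director and has some actors. It cannot tell us anything richer, such as whether the director likes or merely knows a given actor.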

The Problem of Complex Relationships

Where self-description gets tricky is when it comes to describing relationships that are not implicit or when more than one relationship exists between two entities. For example, Nicholas Roeg knows David Bowie and Nicholas Roeg likes David Bowie.

The HTML specification attempts to address the problem of describing complex relationships by way of the rel attribute and, up until HTML5, the rev attribute. The rel attribute can be used in <a>, <area> and <link> tags to describe the relationship between the current document and the linked document or resource.

A convention has evolved for using the rel attribute. For example, it's commonly used within the <link> tag to bind a document to a stylesheet, like so:

<link rel="stylesheet" href="main.css" type="text/css" media="screen"/>

Also, the attribute can be used to support an HTML microformats keyword such as home. This example shows the attribute used to describe the destination target of a link as a homepage on the site.

<a href="http://example.com" rel="home">Home</a>

Multiple, space-delimited values can be assigned to the rel attribute. Thus, the attribute can define multiple relationships, as shown in the following example.

<link rel="alternate stylesheet" title="Better Styling" href="better.css" type="text/css"/>

The rel attribute makes it theoretically possible to use web pages to publish data to the Semantic Web. Listing 4 below shows an HTML page that lists connections to the profile of the entity, Nicholas Roeg. Notice that the rel attribute is used to define the variety of relationships that the Nicholas Roeg entity has to the other entities in the unordered list.

<html>
<head>
<title>Profile</title>
</head>
<body>
  <div>Nicholas Roeg</div>
  <div>1928-08-15</div>
  <div>
    <div>Connections</div>
    <div>
      <ul>
        <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/David_Bowie">David Bowie</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Rip_Torn">Rip Torn</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Candy_Clark">Candy Clark</a></li>
        <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/Buck_Henry">Buck Henry</a></li>
        <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Mick_Jagger">Mick Jagger</a></li>
        <li><a rel="knows marriedTo" href="https://en.wikipedia.org/wiki/Susan_Stephen">Susan Stephen</a></li>
        <li><a rel="knows workedWith marriedTo" href="https://en.wikipedia.org/wiki/Theresa_Russell">Theresa Russell</a></li>
      </ul>
    </div>
  </div>
</body>
</html>

Listing 4: The HTML rel attribute can be used to describe relationships between a parent document and a linked document.

While it is possible to use HTML to publish semantic data to the web, it's a limited approach in the real world. In terms of machine consumption, the parsing alone is daunting. In addition, when a relationship among any of the entities changes, the entire HTML document needs to be updated. Finally, getting a clear picture of the semantics on a page requires inferring a lot about the information that's marked up. In the example web page shown above, there are a lot of implications in play. For example, there's the implication that Nicholas Roeg is the subject of the web page on which the unordered list is hosted. Also, the ordering of the strings in each item in the unordered list implies first name and last name values. And the association of the name in the list item with the hyperlink is implied too. A simple change in content, say mistakenly changing the string Mick Jagger to Merle Jagger, a well-known Western-Rock band in California, will corrupt a significant part of the page's semantics.
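To get a feel for how much a consumer has to infer, here is a minimal Python sketch that pulls rel values out of two of the list items from Listing 4 using the standard library's html.parser module. The class name RelExtractor is hypothetical. Note the biggest assumption of all: nothing in the markup names the subject of the relationships, so the code has to hard-code "Nicholas Roeg."

```python
from html.parser import HTMLParser

# Two list items taken from Listing 4.
SNIPPET = """
<ul>
  <li><a rel="knows workedWith likes" href="https://en.wikipedia.org/wiki/David_Bowie">David Bowie</a></li>
  <li><a rel="knows workedWith" href="https://en.wikipedia.org/wiki/Rip_Torn">Rip Torn</a></li>
</ul>
"""

# ASSUMPTION: the page is about Nicholas Roeg. The markup itself never says so.
ASSUMED_SUBJECT = "Nicholas Roeg"


class RelExtractor(HTMLParser):
    """Collect the link text, href and rel tokens of every <a> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            self._current = {
                "href": attr.get("href"),
                # rel holds space-delimited tokens, so split() yields one per relationship
                "rels": (attr.get("rel") or "").split(),
                "text": "",
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


parser = RelExtractor()
parser.feed(SNIPPET)
parser.close()

for link in parser.links:
    for rel in link["rels"]:
        print(f'{ASSUMED_SUBJECT} {rel} {link["text"]}')
```

Even in this small sketch, the subject is guessed, the object's name is taken from whatever text happens to sit inside the link, and the meaning of tokens such as knows is left entirely to the reader.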

As you can see, using the HTML rel attribute in this elementary manner is an obscure and brittle way to publish semantic data to the web. Clearly, a more precise way is needed. Fortunately, the Resource Description Framework (RDF) provided the mechanisms that were needed to move forward.

Applying RDF to Create Well Defined Data

The Resource Description Framework (RDF) is a set of specifications defined by the World Wide Web Consortium (W3C) that are intended as a model for describing data on the internet. RDF addresses the second and third requirements for publishing data to the Semantic Web described at the beginning of this article. RDF provides the standardization necessary to describe any one of the potentially infinite number of relationships that can exist between any two data entities. It also provides a standardized way to determine the meaning of metadata applied to a given data entity.

We can use concepts described in RDF not only to make the HTML on a web page more precise in terms of semantic description but, more importantly, to make any data we want semantically robust.

The place to start is with the RDF concept of a triple.

Using Triples to Define Relationships

A triple describes the semantic relationship between two things. A triple, as the name implies, is made up of three parts: the subject, the predicate, and the object. The concept is found in all human language. Take the example, "Bob likes fish." The subject is Bob, the predicate is likes, and the object is fish. Figure 1, below, shows a graphical description of the sentence as a triple.

Figure 1: A triple describes a relation in three parts — subject, predicate, and object

Notice that the triple diagram in Figure 1 illustrates two circles connected by a line. In graph mathematics, the circles are called vertices and the line is called an edge. Vertices (aka nodes) and edges are important concepts that we'll use when we discuss working with data in GraphQL.
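The mapping from a triple to a graph can be sketched in a few lines of Python. The Triple type and the vertices and edges helpers below are hypothetical names chosen for this illustration; they are not drawn from RDF or any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One statement: subject, predicate, object (hypothetical type for illustration)."""
    subject: str
    predicate: str
    obj: str


TRIPLES = [Triple("Bob", "likes", "fish")]


def vertices(triples: list[Triple]) -> set[str]:
    """Every subject and every object becomes a vertex (node)."""
    found = set()
    for t in triples:
        found.add(t.subject)
        found.add(t.obj)
    return found


def edges(triples: list[Triple]) -> list[tuple[str, str, str]]:
    """Every predicate becomes a labeled, directed edge: (from, to, label)."""
    return [(t.subject, t.obj, t.predicate) for t in triples]


print(sorted(vertices(TRIPLES)))  # ['Bob', 'fish']
print(edges(TRIPLES))             # [('Bob', 'fish', 'likes')]
```

The subject and object supply the two circles, and the predicate supplies the line between them along with its label.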

Triples are useful for describing a number of relationships between two entities. Figure 2, below, shows the entity David Bowie and the other entities to which he is related. Notice that each of the relationships between entities is clearly described. Also notice that David Bowie has two types of relationships with the song Heroes. One relationship is that he sings the song. The other relationship is that he composed the song.

As you can see, a triple provides descriptive capabilities that go well beyond the simple "has a" and "has some" relationships found in standard databases.

Figure 2: Using triples captures both entities and relationships
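The following Python sketch shows how a list of triples accommodates more than one relationship between the same pair of entities. It uses only relationships mentioned in the text of this article (it is not a transcription of Figure 2), and the helper name relationships is hypothetical.

```python
from collections import defaultdict

# Relationships mentioned in the text of this article.
TRIPLES = [
    ("David Bowie", "sings", "Heroes"),
    ("David Bowie", "composed", "Heroes"),
    ("Nicholas Roeg", "knows", "David Bowie"),
    ("Nicholas Roeg", "likes", "David Bowie"),
]


def relationships(triples):
    """Group predicates by (subject, object) pair (hypothetical helper)."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[(subject, obj)].append(predicate)
    return dict(grouped)


for (subject, obj), predicates in relationships(TRIPLES).items():
    print(f"{subject} -> {obj}: {', '.join(predicates)}")
```

Because each relationship is its own triple, adding a third relationship between Bowie and the song is just one more entry in the list. Nothing about the existing entries has to change.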

Using triples to capture and describe relationships between entities in a standardized manner provides a way to unify all the data in a meaningful way regardless of the Internet domain. However, more pieces are needed. Triples are well suited to identifying entities and the relationships between them, but they do not describe the meaning of those relationships.

For example, consider the word, know. Does using the word "know," as in Nicholas Roeg knows David Bowie, mean that the extent of Roeg's knowledge of Bowie is what he's read in the newspaper? Or does the word "know" mean that Roeg has known David Bowie from first-hand experience? "Know" can have two different meanings. Clearly, we need a mechanism that describes exactly what the word means when applied to the relationship. This mechanism is called a vocabulary.

Defining Meaning Using Vocabularies

A vocabulary, also known as an ontology, is a construct used in the Semantic Web to describe entities and relationships within a particular domain. You can think of a vocabulary as a dictionary that describes the usage of a word or term. For example, as we saw above, a vocabulary can specify whether the word "know" describes a passing acquaintance or a long-term associate.

In terms of the Internet in general and HTML and XML documents in particular, one or more dictionaries are defined using an XML namespace that references a vocabulary such as an XML schema, an RDF schema (RDFS) or a Web Ontology Language (OWL) ontology.

Listing 5 below shows the Friend of a Friend (foaf) ontology applied to the HTML document we presented earlier.

<html>
<head>
<title>Profile</title>
</head>
<body xmlns:foaf= "http://xmlns.com/foaf/0.1/">
  <div><span about="#Nicholas Roeg" instanceof="foaf:Person" property="foaf:name">Nicholas Roeg</span></div>
  <div><span about="#Nicholas Roeg" property="foaf:birthday">1928-08-15</span></div>
  <div>
    <div>Connections</div>
    <div>
      <ul>
      <li><a href="https://en.wikipedia.org/wiki/David_Bowie">
          <span about="#David Bowie" instanceof="foaf:Person" property="foaf:name">David Bowie</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#David Bowie"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Rip_Torn">
          <span about="#Rip Torn" instanceof="foaf:Person" property="foaf:name">Rip Torn</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Rip Torn"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Candy_Clark">
          <span about="#Candy Clark" instanceof="foaf:Person" property="foaf:name">Candy Clark</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Candy Clark"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Buck_Henry">
          <span about="#Buck Henry" instanceof="foaf:Person" property="foaf:name">Buck Henry</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Buck Henry"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Mick_Jagger">
          <span about="#Mick Jagger" instanceof="foaf:Person" property="foaf:name">Mick Jagger</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Mick Jagger"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Susan_Stephen">
          <span about="#Susan Stephen" instanceof="foaf:Person" property="foaf:name">Susan Stephen</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Susan Stephen"></span>
      </li>
      <li><a href="https://en.wikipedia.org/wiki/Theresa_Russell">
          <span about="#Theresa Russell" instanceof="foaf:Person" property="foaf:name">Theresa Russell</span>
        </a>
        <span about="#Nicholas Roeg" rel="foaf:knows" resource="#Theresa Russell"></span>
      </li>
    </ul>
    </div>
  </div>
</body>
</html>

Listing 5: Applying a Semantic Web vocabulary to a web page

Notice that the foaf markup applied to the HTML does not affect the rendering of the document in the browser, nor should it. (See Figure 3). The purpose of using the Friend of a Friend ontology is to provide a way for machines to understand the semantics represented in the HTML.

Figure 3: Applying an XML based ontology to an HTML document does not affect visual rendering
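While the browser ignores the extra attributes, a program can read them. The Python sketch below scans a trimmed-down fragment of Listing 5 and turns the about, property, rel and resource attributes into subject-predicate-object triples. This is only an illustration under simplifying assumptions: TripleExtractor is a hypothetical class, it handles just the two attribute patterns that appear in the fragment, and it is in no way a conforming parser for annotated HTML.

```python
from html.parser import HTMLParser

# A trimmed-down fragment of Listing 5.
SNIPPET = """
<div><span about="#Nicholas Roeg" property="foaf:birthday">1928-08-15</span></div>
<ul>
  <li>
    <span about="#David Bowie" property="foaf:name">David Bowie</span>
    <span about="#Nicholas Roeg" rel="foaf:knows" resource="#David Bowie"></span>
  </li>
</ul>
"""


class TripleExtractor(HTMLParser):
    """Build (subject, predicate, object) tuples from annotated elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.triples = []
        self._pending = None  # (subject, property) waiting for its literal text

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        subject = attr.get("about")
        if subject is None:
            return
        if "rel" in attr and "resource" in attr:
            # Relationship between two entities: the object is another resource.
            self.triples.append((subject, attr["rel"], attr["resource"]))
        elif "property" in attr:
            # Literal value: the object is the text inside the element.
            self._pending = (subject, attr["property"])

    def handle_data(self, data):
        if self._pending is not None and data.strip():
            subject, prop = self._pending
            self.triples.append((subject, prop, data.strip()))
            self._pending = None

    def handle_endtag(self, tag):
        self._pending = None


extractor = TripleExtractor()
extractor.feed(SNIPPET)
extractor.close()

for triple in extractor.triples:
    print(triple)
```

Unlike a bare rel attribute on a link, this markup names the subject explicitly through the about attribute, so the program no longer has to guess who the statements are about.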

How does a machine interpret the semantics? Take a look at Listing 6, below.

Listing 6: A snippet of HTML that implements the Friend of a Friend (foaf) vocabulary as an XML namespace

Notice how the XML namespace prefix foaf declared on line 1 of Listing 6 is bound to an ontology on the Internet at http://xmlns.com/foaf/0.1. Notice also the namespace property name, which is used at line 3, and the namespace property birthday, which is used at line 6. Behind the scenes, the name and birthday properties, which are part of the foaf namespace, are used to describe data on the web page: in this case the name "Nicholas Roeg" and the birthday "1928-08-15." What's interesting in terms of the Semantic Web is that because the namespace is defined on the Internet (see Figure 4, below), any machine publishing the HTML and any machine consuming it have a common reference point, the ontology at http://xmlns.com/foaf/0.1, by which the semantics on the web page can be understood.

Figure 4: Putting an ontology on the internet as a namespaced resource provides a semantic definition that is common to both publishers and consumers of information
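One way to picture the binding between a prefix and its ontology is as simple string expansion: a prefixed term such as foaf:name becomes a full identifier rooted at the namespace address. The Python sketch below assumes the namespace declared in Listing 5; the PREFIXES table and the expand function are hypothetical names used only for illustration.

```python
# Prefix-to-namespace bindings, as declared by xmlns:foaf in Listing 5.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(term: str, prefixes: dict[str, str] = PREFIXES) -> str:
    """Expand a prefixed term such as "foaf:name" into a full identifier (hypothetical helper)."""
    prefix, separator, local_name = term.partition(":")
    if not separator or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {term!r}")
    return prefixes[prefix] + local_name


print(expand("foaf:name"))      # http://xmlns.com/foaf/0.1/name
print(expand("foaf:birthday"))  # http://xmlns.com/foaf/0.1/birthday
```

Two publishers who both write foaf:name are therefore pointing at the same definition, even if they have never heard of each other. A publisher who writes an unprefixed name offers no such guarantee.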

Once the semantics are applied, it's quite possible to program a search algorithm that has the instructions to "go inspect all resources on the Internet that support the ontology defined at http://xmlns.com/foaf/0.1 and return all entities in which name="Nicholas Roeg" and birthday="1928-08-15"."
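At a very small scale, such a search might look like the Python sketch below. It assumes the data has already been gathered into subject-predicate-object tuples whose predicates are full foaf identifiers; the TRIPLES sample and the find_entities function are hypothetical, and a real crawler would of course need far more than this.

```python
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical sample of already-collected statements.
TRIPLES = [
    ("#Nicholas Roeg", FOAF + "name", "Nicholas Roeg"),
    ("#Nicholas Roeg", FOAF + "birthday", "1928-08-15"),
    ("#David Bowie", FOAF + "name", "David Bowie"),
    ("#David Bowie", FOAF + "birthday", "1947-01-08"),
]


def find_entities(triples, **wanted):
    """Return subjects whose foaf properties match every keyword argument (hypothetical helper)."""
    properties = {}
    for subject, predicate, obj in triples:
        properties.setdefault(subject, {})[predicate] = obj
    return sorted(
        subject
        for subject, values in properties.items()
        if all(values.get(FOAF + key) == value for key, value in wanted.items())
    )


print(find_entities(TRIPLES, name="Nicholas Roeg", birthday="1928-08-15"))
# ['#Nicholas Roeg']
```

The match is made on the full foaf identifier rather than on the bare word name, so a title or a band name that happens to use a field called name under a different vocabulary would not be confused with a person's name.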

But, this is only the tip of the iceberg. Remember, we need a way to describe not only simple name-value pairs found in profile data but also the one or many relationships between entities. Again, ontologies solve this problem.
