How GraphQL Delivers on the Original Promise of the Semantic Web. Or Not.


XML Schema, RDF Schema, and OWL support inheritance. Inheritance makes it possible to create an ontology that builds on work done previously in other ontologies. For example, the ontology published by the Semantically-Interlinked Online Communities project (SIOC, pronounced "shock") extends the Friend of a Friend (FOAF) ontology with properties that describe relations relevant to social media platforms such as Facebook. One such property is "likes".

Listing 8 below is a snippet of the HTML displayed earlier. Only this time, the SIOC ontology is used instead of FOAF. (See line 1.) As mentioned earlier, the SIOC ontology supports the property "likes." Thus, we can use the "likes" property in conjunction with the "knows" property to describe that Nicholas Roeg has two relationships with David Bowie. One relationship is that Roeg likes Bowie (line 11) and the other is that Roeg knows Bowie (line 12).

Listing 8: Applying the SIOC ontology to describe a "likes" relationship

Applying the SIOC ontology to the HTML allows the web page to describe, in machine-readable form, a multitude of relationships between the entities on the page. Listing 9 below shows the result of a semantic analysis of the web page displayed earlier once the SIOC ontology has been applied to its HTML.

Listing 9: The semantics of the data in the web page according to the SIOC ontology

Listing 10 shows the result of subjecting the web page to a conversion algorithm that expresses the semantics on the page as pure RDF/XML (again, RDF being the foundation of the Semantic Web).

<?xml version="1.0" encoding="utf-8" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:sioc="http://rdfs.org/sioc/ns#">
  <rdf:Description rdf:about="http://localhost#Nicholas_Roeg">
    <sioc:name>Nicholas Roeg</sioc:name>
    <sioc:birthday>1928-08-15</sioc:birthday>
    <sioc:likes rdf:resource="http://localhost#David_Bowie"/>
    <sioc:likes rdf:resource="http://localhost#Theresa_Russell"/>
    <sioc:knows rdf:resource="http://localhost#David_Bowie"/>
    <sioc:knows>
      <rdf:Description rdf:about="http://localhost#Rip_Torn">
        <sioc:name>Rip Torn</sioc:name>
      </rdf:Description>
    </sioc:knows>

    <sioc:knows>
      <rdf:Description rdf:about="http://localhost#Candy_Clark">
        <sioc:name>Candy Clark</sioc:name>
      </rdf:Description>
    </sioc:knows>

    <sioc:knows>
      <rdf:Description rdf:about="http://localhost#Buck_Henry">
        <sioc:name>Buck Henry</sioc:name>
      </rdf:Description>
    </sioc:knows>

    <sioc:knows>
      <rdf:Description rdf:about="http://localhost#Mick_Jagger">
        <sioc:name>Mick Jagger</sioc:name>
      </rdf:Description>
    </sioc:knows>

    <sioc:knows>
      <rdf:Description rdf:about="http://localhost#Susan_Stephen">
        <sioc:name>Susan Stephen</sioc:name>
      </rdf:Description>
    </sioc:knows>

    <sioc:knows rdf:resource="http://localhost#Theresa_Russell"/>
  </rdf:Description>

  <rdf:Description rdf:about="http://localhost#David_Bowie">
    <sioc:name>David Bowie</sioc:name>
  </rdf:Description>

  <rdf:Description rdf:about="http://localhost#Theresa_Russell">
    <sioc:name>Theresa Russell</sioc:name>
  </rdf:Description>
</rdf:RDF>

Listing 10: The web page semantics translated into pure RDF/XML

And finally, Figure 5 shows a visual graph of the triples implicit in the web page shown previously in Figure 3, after the SIOC ontology has been applied. The illustration was created using the RDF validation tool published by the W3C.

Figure 5: Data defined in an RDF dataset expressed in a semantics graph

As you can see, the vision of the Semantic Web and its implementation by way of RDF does indeed offer a way to unify all data published on the Internet as meaningful information. But you need to know a lot to make it work. In addition to understanding the underlying concepts, you need a good degree of mastery in implementing ontologies. It's a daunting task that has, for the most part, kept Semantic Web development outside the commercial mainstream. But just because companies are slow to adopt techniques compatible with the formal application of RDF does not mean that publishing information in the spirit of the Semantic Web is not taking place. There are commercial frameworks designed to support the vision of the Semantic Web. GraphQL is one of the more prominent.

Simplifying the Semantic Web Using GraphQL

The Internet and open standards have changed the fundamental way that applications access data. In the past, a software application was typically dedicated to a single database. Not only was the database proprietary (for example, Oracle, SQL Server, or IBM DB2), but the connection protocol used to access the database was proprietary as well. Today, it's not unusual for a single application to work with data that resides in a variety of different types of databases. And, more often than not, the application will connect to that data using a variety of open protocols such as HTTP, SSH, RTMP, and XMPP. In order to promote easy reuse, the trend is to put a generic data access layer between the application and the various data storage technologies. This generic layer is what we have come to know as the application programming interface (API).

The Rise of the RESTful API

Using APIs is becoming the standard way for desktop and mobile applications to access data on the web. At the time this article was published, the most familiar API architectural style was REST. The premise behind REST is that data exists on the Internet as resources. A REST API publishes data as a resource according to a URI, such as the following example:

https://openlibrary.org/api/books?bibkeys=ISBN:0451526538

WHERE

https://openlibrary.org/api/ is the root URI of the API, otherwise known as the API's "endpoint"

https://openlibrary.org/api/books is the location of the resource, in this case books that are known to openlibrary.org

? is the character that separates the resource URL from the query parameters

bibkeys=ISBN:0451526538 is a query parameter (the parameter bibkeys assigned the value ISBN:0451526538) that identifies a particular book in the library.
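To make this anatomy concrete, here is a minimal sketch of calling the URI from JavaScript, assuming Node.js 18 or later for the built-in fetch. The format=json parameter asks Open Library to return plain JSON.

// A minimal sketch of calling the REST URI described above.
// Assumes Node.js 18+ for the built-in fetch; format=json asks
// Open Library to return plain JSON.
const url =
  'https://openlibrary.org/api/books?bibkeys=ISBN:0451526538&format=json';

async function main() {
  const response = await fetch(url);
  const result = await response.json();
  console.log(result['ISBN:0451526538']);
}

main();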

The data returned by that call to the URI is shown below, formatted as JSON (vs. XML, RDF, etc.) in Listing 11.

{
    "ISBN:0451526538": {
        "bib_key": "ISBN:0451526538",
        "preview": "noview",
        "thumbnail_url": "https://covers.openlibrary.org/b/id/295577-S.jpg",
        "preview_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer",
        "info_url": "https://openlibrary.org/books/OL1017798M/The_adventures_of_Tom_Sawyer"
    }
}

Listing 11: A RESTful API publishes data in open formats such as JSON or XML.

REST provides a generic way to access data that's published in open formats like JSON and XML. Thus, it scales well. But there are two drawbacks: REST increases the network traffic between the application and the data source, and the overall semantics of the published data remain obscure in terms of machine readability.

As you can see in Listing 11 above, the JSON returned by a call to the RESTful API has fields that contain string data. The fields bib_key and preview contain simple string values. However, the data in the fields thumbnail_url, preview_url, and info_url represent URLs that must be called subsequently, on another trip back to the network. The implication is that in order to get all the information relevant to a particular book, a human or machine must make multiple trips over the network; thus the increased traffic. Again, this is the first problem, and the sketch below makes it explicit.
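Again assuming Node.js 18 or later, the first response carries only a URL for the thumbnail; retrieving the actual image is a second request.

// Illustrating the extra round trips a RESTful response can force.
// Assumes Node.js 18+ for the built-in fetch.
async function getBookAndThumbnail(isbn) {
  // Trip #1: fetch the listing; it contains URLs, not the data itself.
  const res = await fetch(
    `https://openlibrary.org/api/books?bibkeys=ISBN:${isbn}&format=json`
  );
  const book = (await res.json())[`ISBN:${isbn}`];

  // Trip #2: thumbnail_url is just a pointer; getting the image
  // means going back over the network.
  const thumbnail = await fetch(book.thumbnail_url);
  return { book, imageBytes: await thumbnail.arrayBuffer() };
}

getBookAndThumbnail('0451526538').then(({ book }) => console.log(book.info_url));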

The second problem is that we still have no well-defined understanding of what the fields mean. For example, does the field preview contain preview data, or does it indicate that a preview is available? When a human looks at the key-value pair "preview": "noview", he or she can infer that preview is effectively a boolean indicating whether a preview exists, and that the assigned value, "noview", implies false. While a human should intuit this pretty quickly, a machine will be completely baffled. Without an ontological reference, the meanings of both the field and the value assigned to it are unknown. REST APIs do indeed solve a great many problems related to publishing data on the Internet. Yet they fall short of fulfilling the promise of the Semantic Web. Clearly something better is needed. This is where GraphQL comes in.

Using GraphQL to Realize the Semantic Web

Whereas in the past publishing data to the Semantic Web was an arduous undertaking, one of the goals of GraphQL is to simplify the process. Much the same way triples (return to Figure 1) are the foundation of the Semantic Web, they also serve as the foundation of GraphQL. As you might recall, a triple describes a semantic in three parts: the subject, the predicate, and the object. For example, as mentioned previously, in the statement Bob likes fish, Bob is the subject, likes is the predicate, and fish is the object.

While the GraphQL specification is very exact in the way it describes how the objects of a graph (aka nodes) are to be implemented, it provides no mechanism for describing an edge (aka a predicate). In fact, all that's really implied is the basic parent-child, "has-a" relationship (e.g., a movie has a director), as the sketch below shows.
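Here is a minimal sketch in GraphQL's Schema Definition Language, written as the kind of typeDefs string an Apollo Server (the implementation used in this series) accepts. The types are hypothetical; the point is that the field declares only that a Movie "has a" director, and nothing in the type system names the predicate.

// A sketch of the bare "has-a" relationship the GraphQL type system
// expresses natively (hypothetical types for illustration only).
const typeDefs = `
  type Person {
    firstName: String
    lastName: String
  }

  type Movie {
    title: String
    # A movie "has a" director. The relationship exists, but the
    # predicate itself has no name the type system can expose.
    director: Person
  }
`;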

The GraphQL specification presently does not provide a standard way to define predicates. However, over the years the GraphQL developer community has developed a convention that calls an edge (aka a predicate) a connection. A connection is defined by appending the term Connection as a suffix to a descriptor (e.g., likesConnection, knowsConnection, etc.), as shown in Figure 6 below.

Figure 6: Using the Connection suffix is a convention that has emerged in the GraphQL community for identifying edges (aka predicates)

Where it gets tricky is understanding that while the notion of a connection implies a predicate, at the implementation level a connection is an array of one-to-many nodes that satisfy the implied predicate. For example, a likesConnection is an array of person entities in which each person is liked by the entity that "owns" the likesConnection array. (See Figure 7.)

Figure 7: The "Connection" naming convention implies the relationship between an entity and an array of entities

Actually, coding to the Connections convention in GraphQL's Schema Definition Language varies among developers. Remember, the only thing the convention really requires is naming the array of entities with the suffix Connection. While the naming part is easy, things can get complex in the implementation; the sketch below shows one common shape.
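The type and field names in this sketch are assumptions for illustration, not the IMBOB API's actual schema; the edges/node/cursor shape is the one described in the discussion of Listing 12 below.

// A sketch of the Connections convention in SDL (hypothetical types).
// The convention only mandates the Connection suffix; the
// edges/node/cursor shape is a common way to implement it.
const typeDefs = `
  type Person {
    firstName: String
    lastName: String
    likesConnection(first: Int, after: String): PersonConnection
  }

  type PersonConnection {
    edges: [PersonEdge]   # the one-to-many array behind the predicate
    pageInfo: PageInfo
  }

  type PersonEdge {
    node: Person          # the entity on the far side of the edge
    cursor: String        # this edge's position in the overall list
  }

  type PageInfo {
    endCursor: String
    hasNextPage: Boolean
  }
`;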

Take a look at Listing 12 below, which is an example of a GraphQL query named searchPerson. The query searches for a person according to firstName and lastName parameters. The information returned is a paginated collection of entities that have the first and last names specified in the query.

Listing 12: A GraphQL query that declares a likesConnection

Pagination in GraphQL is a complex topic that we discussed previously in Part 3 of this series, so we won't go into a lot of detail about it now. The important thing to understand about Listing 12 above is that in addition to displaying the firstName and lastName of each person returned (lines 7 and 8), the query is configured at line 9 to display a likesConnection for that person too.

As mentioned previously, by convention a Connection is an array of objects that have a particular relationship to the owner. However, things can get confusing when you look at the structure of an object in the likesConnection array. It's not a simple person object. Person information is in there, but it's nested down in the node object at Listing 12, line 14. The reason for the nesting is so that the likesConnection can support pagination.

In the real world, on Facebook for example, it's possible for one person to like hundreds, maybe thousands, of items. Thus, returning the entire list of items at once is impractical. Instead, information is returned in chunks. In the case of the search query in Listing 12, likesConnection information is returned as an array of edges (Listing 12, line 13) in which each edge has a node (Listing 12, line 14) and a cursor (Listing 12, line 18). A cursor is the positional identifier of a particular edge within the overall list of edges contained within the API. The node contains the information specific to the person: firstName and lastName, for example. A query of this shape is sketched below.
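Because Listing 12 appears as an image, here is a hypothetical query of the same general shape, POSTed to a GraphQL endpoint the way GraphQL Playground does under the hood. The field names and endpoint URL are assumptions, not the actual IMBOB schema; Node.js 18 or later is assumed for fetch.

// A hypothetical searchPerson query following the edges/node/cursor
// shape described above (an approximation, not the actual Listing 12).
const query = `
  query {
    searchPerson(firstName: "Nicholas", lastName: "Roeg") {
      firstName
      lastName
      likesConnection(first: 10) {
        edges {
          node { firstName lastName }  # the liked person
          cursor                       # this edge's place in the full list
        }
      }
    }
  }
`;

async function run() {
  // GraphQL over HTTP is just a POST with a JSON body.
  const res = await fetch('http://localhost:4000/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query }),
  });
  console.log(JSON.stringify(await res.json(), null, 2));
}

run();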

Figure 8 below shows an excerpt from the result of the searchPerson query when executed in GraphQL Playground against the IMBOB demonstration API that accompanies this series.

Figure 8: A likesConnection returns an edges collection

The reason for this level of complexity around a Connection is to keep the notion of nodes and edges at the forefront of a GraphQL schema design. Remember, the way the Semantic Web defines relationships is by way of the triple. A triple is made up of an edge between two nodes. Thus, a Connection is associated with a single entity, Nicholas Roeg, for example, and that entity can have a number of similar edges, with each edge associated with only one other entity. Admittedly, the technique is a bit awkward, but it does support the underlying principle of a triple: subject, predicate, object. In terms of the Connection convention, it's subject, predicate(s), and associated objects. It takes some getting used to, but it does work. Will a better convention emerge? Time will tell, which leads us to some frank analysis of how well GraphQL supports the promise of the Semantic Web.

GraphQL and the Semantic Web: Good But Not Complete

In terms of speed, ease of use, and flexibility, GraphQL offers significant advantages over RESTful APIs. GraphQL supports declarative resultset definition. This means that, unlike RESTful APIs, in which the structure of a resultset is predefined and immutable, when you create a GraphQL query you define the exact structure of the resultset you want returned. You don't incur the overhead of parsing through unwanted response data to get the information you need. GraphQL makes it possible to get only what you need, when you need it, as the brief sketch below shows.
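As a quick sketch (hypothetical field names, following the searchPerson example above), the query below asks for firstName and nothing else, and that is exactly what comes back.

// The client declares the resultset: only firstName is requested,
// so only firstName is returned (hypothetical field names).
const leanQuery = `
  query {
    searchPerson(firstName: "Nicholas", lastName: "Roeg") {
      firstName
    }
  }
`;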

GraphQL also supports recursive querying, thus cutting down extensively on network traffic. And the schema of a given API is discoverable to both humans and machines at runtime via GraphQL introspection, demonstrated in the sketch below. These are significant benefits.
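A minimal sketch of introspection at work follows, assuming a local GraphQL endpoint and Node.js 18 or later: the __schema meta-field, which is part of the GraphQL specification, returns every type the API publishes with no out-of-band documentation required.

// Discovering a GraphQL schema at runtime via the specification's
// standard introspection meta-field (endpoint URL is a placeholder).
const introspectionQuery = `
  query {
    __schema {
      types { name kind }
    }
  }
`;

async function introspect() {
  const res = await fetch('http://localhost:4000/graphql', {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({ query: introspectionQuery }),
  });
  const { data } = await res.json();
  console.log(data.__schema.types.map((t) => t.name));
}

introspect();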

Yet, in terms of supporting the Semantic Web, there's still work to be done. While the emerging Connections convention is a useful way to define and determine the edges and nodes within an API in order to create semantic representations between entities, there is an inherent drawback: the Connections convention is simply that, a convention. It's not a standard. Developers can take it or leave it with marginal consequence. To put it another way, the Java community might prefer to see code written according to the camel-casing convention, but no compiler is going to barf if it's written otherwise. However, violating the Java syntax standard by putting a curly bracket in the wrong place will bring the compiler to its knees until the mistake is corrected.

Industries tend to support standards. Thus, in order for Connections to become the standard by which semantics are defined within a GraphQL API, and hence become machine discoverable, the convention must become part of the GraphQL specification. Otherwise, it is simply an arbitrary nice-to-have that's popular among developers.

Finally, there is the larger problem of support for authoritative vocabularies. As of now there is no single authority like FOAF or SIOC in the GraphQL ecosystem that verifies and publishes semantic vocabularies in a standard manner, on par with the way the W3C's RDF specification is supported under XML. Without an ontological standard, there is no definitive way to distinguish the meanings behind the way different GraphQL APIs use the same term: consider, for instance, the "knows" example we described above. And, more importantly, without a standard way to publish ontologies, the underlying semantics supported by an API are unknowable to a machine, especially one in another domain that may conform to different conventions. A human can "figure it out"; a machine can't, although this might change as machine intelligence matures. In order to fulfill the promise of the Semantic Web, the information in a GraphQL API must be discoverable and meaningful to human and machine alike. Until such time, GraphQL is proving to be an effective framework for developers, but it is not yet a full-fledged participant in the Semantic Web. The technology still has a way to go.

Next Steps: Moving From Theory to Practice

So far in this series we've covered the history that led to the emergence of GraphQL. We took an in-depth look at the specification and presented a demonstration API that shows you how to implement a GraphQL API under Node.js using Apollo Server. In this installment, we looked at GraphQL in terms of the Semantic Web.

In the next and final installment, we'll look at how companies adopted GraphQL in the real world. We'll look at the challenges they encountered and how they addressed them. Also, we'll examine how they benefited from using GraphQL and the lessons they learned.

Next: Part 5

Be sure to read the next GraphQL article: How Companies are Making GraphQL Work in the Real World
