XML vs. JSON - A Primer

As more Web and mobile applications rely on APIs to drive their front ends, performance becomes a growing concern. XML, long used for exchanging data, is giving way to JSON, now widely considered the default choice. But is JSON always the right choice over XML? As an API designer, it’s important to understand the foundations of JSON and XML, as well as the differences between them.

XML: ‘Data stuffed’

XML, or Extensible Markup Language, is the functional cousin to HTML. Where HTML is responsible for displaying data in a human-readable format (in a Web browser, for example), XML is responsible for representing the structure of that data before it is transported from one system to another. XML is well-defined, widely supported and clearly structured. These attributes originally helped to make XML a fundamental part of the Web when SOAP (Simple Object Access Protocol), the precursor to RESTful Web Services, was the preferred way for one system to request data from another over the Web. (Email was another supported transport.)

SOAP requests and responses alike were structured as XML. The requests were often hand-crafted, stuffed with data and transported to their destination for disassembly. Nor was there any promise that the stakeholders on the receiving side of the transaction were parsing the document and displaying it in a neat HTML front-end application. In fact, I’ve seen people simply read such documents with Microsoft Notepad.

With all of that said, XML can hold any data type imaginable, in an easy-to-read format. Consider this snippet of XML:

<?xml version="1.0" encoding="ISO-8859-1"?>

It doesn’t take much to deduce that this piece of code represents a medical document of some sort.
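The snippet above shows only the XML declaration. As a rough illustration of how a receiving system might read this kind of document, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The patient record, its element names and its values are all hypothetical, invented for this example; they are not the original document.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record: every element name and value here is made up
# for illustration and is not taken from the original snippet.
PATIENT_XML = b"""<?xml version="1.0" encoding="ISO-8859-1"?>
<patient id="12345">
  <name>Jane Doe</name>
  <visit date="2014-03-01">
    <diagnosis>Seasonal allergies</diagnosis>
  </visit>
</patient>"""


def summarize_patient(xml_bytes):
    """Parse the record and pull a few fields into a plain dict."""
    root = ET.fromstring(xml_bytes)
    visit = root.find("visit")
    return {
        "id": root.get("id"),
        "name": root.findtext("name"),
        "visit_date": visit.get("date"),
        "diagnosis": visit.findtext("diagnosis"),
    }


summary = summarize_patient(PATIENT_XML)
print(summary)
```

Even without a parser, the tag names alone tell a human reader what each value means, which is the readability argument in XML's favor.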

JSON: A model of efficiency

XML has worked, and worked well, in many different situations, but in most cases JavaScript Object Notation (JSON) is now the preferred means of data marshalling. The reasons are many, but they include native JSON support in modern browsers, preference among the most popular JavaScript frameworks, and the number of off-the-shelf utilities for working with JSON-formatted data in non-JavaScript languages such as Ruby and .NET.

However, the biggest reason that JSON is now being used over XML is that JSON is generally more efficient. JSON's syntax is modeled on JavaScript object literals (and maps cleanly onto the dictionaries, lists and primitives of most other languages), so fewer bytes are passed across the wire and less machine time is typically required to process data on either end.
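To make the "fewer bytes across the wire" point concrete, here is a small Python sketch that serializes the same data both ways and compares payload sizes. The sample data and the element-per-field XML layout are assumptions made for illustration; a different XML design (attributes instead of child elements, for instance) would give different numbers.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical sample data, used only to compare payload sizes.
books = [
    {"title": "My First Book", "author": "Fake Author"},
    {"title": "Test book", "author": "Real Author"},
]


def to_json_bytes(items):
    """Serialize the list as compact JSON (no extra whitespace)."""
    return json.dumps({"books": items}, separators=(",", ":")).encode("utf-8")


def to_xml_bytes(items):
    """Serialize the same list as element-only XML, one tag per field."""
    root = ET.Element("books")
    for item in items:
        book = ET.SubElement(root, "book")
        for key, value in item.items():
            ET.SubElement(book, key).text = value
    # encoding="unicode" returns a str without an XML declaration.
    return ET.tostring(root, encoding="unicode").encode("utf-8")


json_payload = to_json_bytes(books)
xml_payload = to_xml_bytes(books)
print(len(json_payload), "bytes as JSON")
print(len(xml_payload), "bytes as XML")
```

For this data the JSON payload comes out smaller because every XML field name is written twice, once in the opening tag and once in the closing tag. HTTP compression narrows the gap in practice, so it is worth measuring with real payloads.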

While JSON’s grammar is deliberately minimal (it has no built-in schema language, namespaces or comments) and a dense payload isn’t always as self-describing as tagged XML, its lack of overhead gives it an advantage when performance is an issue. And when isn’t performance an issue, especially given how unaccepting users are of latency in their Web and mobile applications? Think of the modern Web: AJAX requests are meant to enable rich user experiences without unnecessary page reloads or waiting. And, as businesses work to capture and engage existing and potential customers, responsiveness becomes a paramount concern for developers.

When human readability and document complexity are no longer goals, the agreeable structure and opening/closing tags of XML just take up valuable space and processing time in an HTTP request and response. Instead, we want our machines to read data and transform it into something meaningful that business logic can interact with. This is where JSON shines: It is typically very fast to serialize and deserialize. For example, a complex C# domain object can be serialized into a JSON string in a single line of code and then transported. That same string can be deserialized into an object in client-side JavaScript and acted upon accordingly. Check out this sample JSON string:

    {
      "books": [
        { "title": "My First Book", "Author": "Fake Author" },
        { "title": "Test book", "Author": "Real Author" }
      ]
    }

The secret of JSON isn’t much of a secret at all: it’s just a string representation of key/value pairs. The values can be arrays (as above), strings, numbers, booleans, null or even nested objects. The key names are meaningful, and the simple grammar enables JSON parsers to fly through the records and convert them into objects developers can interact with in code. JSON is also lightweight, which means existing objects and collections can be serialized and deserialized quickly and easily.
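The round trip described above can be sketched in a few lines of Python with the standard json module. The Book class is a hypothetical domain object invented for this example, and its key names are lowercased for consistency rather than copied from the sample.

```python
import json
from dataclasses import dataclass, asdict


@dataclass
class Book:
    # Hypothetical domain object with two simple fields.
    title: str
    author: str


library = [
    Book("My First Book", "Fake Author"),
    Book("Test book", "Real Author"),
]

# Serialize: one call turns the objects into a JSON string.
payload = json.dumps({"books": [asdict(b) for b in library]})

# Deserialize: one call turns the string back into dicts and lists,
# which are then mapped onto domain objects again.
decoded = json.loads(payload)
restored = [Book(**entry) for entry in decoded["books"]]

print(payload)
print(restored == library)
```

The parser hands back plain dictionaries and lists, so mapping them onto typed objects is a short, explicit step rather than something the format does for you.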

Choose wisely

There are good reasons for using JSON, and there are still good reasons for using XML. The format you choose really depends on what you are working to accomplish, the audience and the data that will be shared. XML’s strengths are extensibility and the avoidance of naming clashes through namespaces. It can represent virtually any kind of data and can be used to transport full documents with formatting information included.
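As a small illustration of how namespaces avoid naming clashes, the following Python sketch reads a document in which two different vocabularies both define an element called "title". The namespace URIs, prefixes and element names are hypothetical, chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs and element names, chosen for illustration.
DOC = """
<record xmlns:bk="urn:example:book" xmlns:per="urn:example:person">
  <bk:title>My First Book</bk:title>
  <per:title>Dr.</per:title>
</record>
"""

NAMESPACES = {"bk": "urn:example:book", "per": "urn:example:person"}

root = ET.fromstring(DOC)
# Both elements are called "title", but the namespace tells them apart.
book_title = root.findtext("bk:title", namespaces=NAMESPACES)
honorific = root.findtext("per:title", namespaces=NAMESPACES)

print(book_title)
print(honorific)
```

JSON has no equivalent mechanism; two sources that both use a "title" key have to be kept apart by convention, for example by nesting each under its own parent key.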

XML is best used when transporting something like a patient chart or text document with markup included. JSON is purposefully limited and therefore much lighter than XML. I suspect that, most of the time, data can be modeled with hashes and lists comprising simple data types, making JSON the preferred route.
