How Ably.io Uses gRPC APIs to Streamline Its Messaging Service

In previous installments of this series, we looked at the historical events that led to the creation of gRPC. We also examined the details of programming with gRPC and discussed the key concepts of the gRPC specification. We took a look at the application we created especially for this series to demonstrate those concepts. We examined how to use protoc, the auto-generation tool provided by gRPC, to create boilerplate code in a variety of programming languages and speed up gRPC development. We also talked about how to bind to protobuf files statically and dynamically when programming under gRPC. In addition, we created a number of lessons on Katacoda's interactive learning environment that illustrate the concepts and practices we covered in the introductory articles.

Having presented the basics required to understand what gRPC is and how it works, we're now going to do a few installments about how gRPC is used in the real world. In this installment of ProgrammableWeb's series on gRPC, we're going to look at how the Messaging as a Service platform Ably.io uses gRPC to optimize its service's streaming capabilities. We'll provide a brief overview of the Ably.io tech stack and then we'll look at how gRPC is used to optimize communication in the service's control plane.

Messaging as a Service Under Ably.io

As event-driven architectures continue to grow in prominence on the IT landscape, effective message systems play an increasingly critical role. Event-driven architectures offer a high degree of flexibility for creating applications and services that need fast, accurate data in real-time.

A good example of an event-driven application is Uber or Lyft. The code that hails a driver for a rider is essentially waiting around, doing nothing, until a software event happens. That software event is when a rider pulls out their smartphone, opens the Uber application, and requests a ride. This software event in turn triggers a series of message exchanges that ultimately result in the execution of the ride-hailing code that was waiting to be awoken.

However, challenges arise when message delivery becomes delayed or corrupted. Missed messages can result in application inaccuracies or, at worst, system failures such as a rider not getting their ride. Thus, a distributed application that uses an event-driven architecture is only as good as the messaging system that supports it.

Ably.io understood this need when it created its messaging service in 2016. According to CTO Paddy Byers, Ably.io was built from the ground up to "create a service that would comfortably encompass not just the demands of the most popular consumer apps, but that would lead the way in enabling the massive growth in instantaneous and high-value data exchanges between global businesses."

In short, Ably.io is a PubSub messaging service that distributes discrete messages at high volume and low network latency for any app that requires asynchronous events to be delivered. Ably.io supports direct and fan-out distribution patterns. (See Figure 1, below.)

Figure 1: The Direct and Fan Out message queue patterns

The direct pattern is one in which a message is delivered to a specific message queue. In a fan-out, a message is delivered to multiple queues simultaneously. The direct pattern is good for a message queue that supports a specific concern while a fan-out pattern is well suited to applications that support a variety of interested parties such as sports events and large-scale broadcasting.
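
The difference between the two patterns can be sketched with a toy in-memory broker. This is only an illustration: the Broker class, its method names, and the queue names are hypothetical and are not part of Ably.io's API.

```python
from collections import defaultdict, deque


class Broker:
    """Toy in-memory broker; names are illustrative, not Ably.io's API."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def publish_direct(self, queue_name, message):
        # Direct: the message lands in exactly one named queue.
        self.queues[queue_name].append(message)

    def publish_fanout(self, queue_names, message):
        # Fan-out: a copy of the message lands in every listed queue.
        for name in queue_names:
            self.queues[name].append(message)


broker = Broker()
broker.publish_direct("billing", "invoice-42")
broker.publish_fanout(["fan-a", "fan-b", "fan-c"], "goal scored")
```

After these calls, the "billing" queue holds one message that no other queue sees, while each of the three fan queues holds its own copy of "goal scored".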

Ably.io allows companies to enjoy the benefit of industrial-strength messaging and streaming without having to make the massive investment required to support such an infrastructure. With Ably.io companies pay only for what they use.

gRPC is key to the Ably.io infrastructure.

Understanding the Connection Challenge

Ably.io had a fundamental problem to solve in order to make its messaging service support the breadth of scale required for its intended corporate customers. The problem is centered on how the Linux operating system handles network connections. The Linux kernel supports a limited number of file descriptors per machine. This is an issue because every network connection on a machine has a corresponding file descriptor. Thus, the number of network connections available to a system is limited.

For the average computer user or service, this is not a problem. But when you have a messaging system that might have millions of users connected to it, exhausting the file descriptor limit is a real possibility. Paddy Byers, CTO of Ably.io, described the problem in a recent interview with ProgrammableWeb. According to Byers, "When you scale a system that is made up of a cluster that has an arbitrary scale, you need to be able to make everything work without scaling the number of individual connections you have because the number of connections... is limited. The Linux kernel has a fixed number of file descriptors."
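
To make the constraint concrete, here is a minimal sketch that reads one of the limits Linux enforces, the per-process cap on open file descriptors, and shows that a socket occupies a descriptor. It assumes a Unix-like system, since Python's resource module is not available elsewhere. It says nothing about how Ably.io configures its own machines.

```python
import resource
import socket

# Per-process limit on open file descriptors: (soft limit, hard limit).
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft_limit}, hard limit: {hard_limit}")

# Every socket is backed by a file descriptor, so each open connection
# counts against that limit until it is closed.
left, right = socket.socketpair()
descriptors = (left.fileno(), right.fileno())
print(f"a connected socket pair uses descriptors {descriptors}")
left.close()
right.close()
```

The numbers printed depend on the machine. The point is only that the budget is finite and that each connection spends from it.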

In order to get Ably.io's messaging service to work, the company needed a workaround. CTO Byers came across gRPC in early 2015. One of the things Byers found attractive about gRPC is that the technology uses HTTP/2 as its underlying protocol.

HTTP/2 differs from HTTP/1.1 in a significant way. HTTP/1.1 cannot interleave requests on a connection, so a client that wants to make requests concurrently has to open a separate network connection for each one. For example, it's not unusual for a commercial website to incur a hundred connections or more to load a typical web page, as shown in Figure 2, below.

Figure 2: Web pages loaded over HTTP/1.1 will incur a new network connection for each request

HTTP/2 makes it so multiple requests from the same originating source can be made over a single connection. This is an important distinction from HTTP/1.1. Allowing multiple requests over a single connection reduces file descriptor utilization and also improves application performance. In addition, HTTP/2 supports two-way streaming. All an application needs to do is establish a single network connection over HTTP/2. Then continuous streams of data can traverse the connection in both directions, from client to server and server to client.
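
The idea of several requests sharing one connection can be illustrated with a toy simulation. It is not real HTTP/2 framing: the multiplex and demultiplex functions and the request contents are made up for the example, and the "connection" is just a Python list. What it does borrow from HTTP/2 is the principle that every frame carries a stream ID, so the receiver can reassemble interleaved streams.

```python
from itertools import zip_longest


def multiplex(streams):
    """Interleave chunks from several streams into one sequence of frames.

    Each frame is tagged with its stream ID so the receiver can sort it out.
    """
    frames = []
    tagged = [[(stream_id, chunk) for chunk in chunks]
              for stream_id, chunks in streams.items()]
    for round_of_frames in zip_longest(*tagged):
        frames.extend(f for f in round_of_frames if f is not None)
    return frames


def demultiplex(frames):
    """Rebuild each stream from the frames that arrived on the one connection."""
    streams = {}
    for stream_id, chunk in frames:
        streams.setdefault(stream_id, []).append(chunk)
    return streams


# Stream IDs are odd here because HTTP/2 clients open odd-numbered streams.
requests = {1: ["GET /a", "body-a"], 3: ["GET /b"], 5: ["GET /c", "body-c1", "body-c2"]}
wire = multiplex(requests)      # everything travels over a single "connection"
received = demultiplex(wire)
```

Three logical requests cross the wire as six interleaved frames, and the receiver ends up with the same three streams while only one descriptor's worth of connection is in use.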

gRPC and HTTP/2 were the technologies that Ably.io needed to accomplish its mission. As Byers recalls, "What you need is an RPC service that multiplexes multiple streams and multiple operations over a single connection. And that protocol didn't really exist until HTTP/2 came along."

By the end of 2015, Ably.io had integrated gRPC into its stack. Byers reports, "We were quite an early adopter. I integrated over a weekend. It was backed by Google so it looked like technically it was heading in the right direction. It was a very credible solution at that time. So we decided to adopt it."

How Ably.io Uses gRPC

Ably.io uses gRPC in a very particular way. First of all, its gRPC implementation is not client-facing. Ably.io's public interface exposes its service via standard messaging protocols such as MQTT, AMQP, STOMP, and WebSockets as well as HTTP/1.1 using its REST API. Ably.io's gRPC activities take place behind the scenes on the server-side (similar to the way many of Google's public-facing APIs are built).

Ably.io's essential value proposition is that companies get messaging capabilities without incurring the cost of an enterprise-grade messaging infrastructure. It's the difference between buying electricity from a power company and buying a boatload of generators to produce it yourself. Some companies will benefit from owning the generators; most won't.

However, no matter who owns the infrastructure, it still needs to exist and needs to be managed. As mentioned above, Ably.io takes on the work of creating the infrastructure and managing it. Customers pay for what they need.

Ably.io also takes things a bit further in that it optimizes message activity according to geographic location. Instead of all messages going to and coming from a common location, for example, a data center in Chicago, Ably.io moves the emission and consumption of messages as close as possible to the source and target locations. If you're in Perth, Australia, you transmit to the location closest to Perth. If you're in Hong Kong, you get your messages from a target location closest to Hong Kong. Providing proximate delivery improves performance. Latency decreases as you physically move closer to the source of messaging activity. It's the difference between delivering a package across the street and delivering it across town.

Doing all this (message management, queue and fan-out management, collecting messages, moving them to optimal emission locations, and so on) is a herculean task that Ably.io accomplishes internally within its infrastructure. This is where gRPC plays a critical role.

As mentioned above, Ably.io's public-facing interface supports the messaging protocols that are typical in an event-driven architecture. Internally, though, things are more streamlined. Ably.io condenses all the messages coming in from external network connections into a smaller number of HTTP/2 connections. Ably.io also separates activity into two planes: data and control. (See Figure 3, below.)

Figure 3: The Ably.io architecture relies upon gRPC to support its internal Data and Control planes

The data plane holds user data. The control plane contains data relevant to managing the Ably.io Messaging as a Service platform. Once inside the Ably.io infrastructure, data is encoded into the Protocol Buffer binary format according to a schema defined by Ably.io. The logical processing is accomplished via gRPC method calls.
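
To give a feel for why the Protocol Buffer binary format is compact, here is a minimal sketch of how a single integer field is laid out on the wire: a tag that combines the field number with the wire type, followed by the value as a base-128 varint. The field number and value are arbitrary, the function names are hypothetical, and Ably.io's actual schema is not shown in this article. Real applications would use the code generated by protoc rather than hand-rolled encoding.

```python
def encode_varint(value):
    """Encode a non-negative integer as a Protocol Buffers base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative integers")
    out = bytearray()
    while True:
        low_bits = value & 0x7F
        value >>= 7
        if value:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)
            return bytes(out)


def encode_varint_field(field_number, value):
    """Encode one integer field: a tag (field number + wire type 0), then the value."""
    tag = (field_number << 3) | 0  # wire type 0 = varint
    return encode_varint(tag) + encode_varint(value)


encoded = encode_varint_field(1, 150)
print(encoded.hex())  # 089601
```

Field 1 carrying the value 150 takes three bytes, with no field name or delimiters in the payload. The schema, shared by both sides, supplies the meaning.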

Using gRPC makes data exchange fast and efficient. It has served Ably.io well over the years. However, this is not to say that using gRPC at the outset was an easy undertaking. There were problems.

Growing Pains

Things were not easy for Ably.io when starting out with gRPC. There were a lot of bugs. CTO Byers told ProgrammableWeb, "Early on, especially with the Node.js implementation, we literally had crashes. We would have processes just exiting as gRPC crashes. And then as I say, you would get these anomalies where requests would stop working in one direction. So you get messages being backed up or you would drop events or those kinds of things. And, now, I would say it's [the various gRPC implementations] improved a lot."

In addition to crashes, there were times when requests would drop without notification of failure. To address the issue, the company implemented heartbeats and liveness checks to monitor the state of the various gRPC components within its infrastructure. The company also paid close attention to making sure that it always had the latest version of gRPC installed. As Byers reports, "You have to keep moving forward with the updates."
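
The article does not describe how Ably.io built its heartbeats, so the following is only a generic sketch of the technique: each component reports in periodically, and a monitor flags any component that has been silent for longer than a timeout. The HeartbeatMonitor class and the component names are hypothetical. A fake clock is injected to keep the example deterministic.

```python
import time


class HeartbeatMonitor:
    """Tracks the last heartbeat per component and flags the silent ones."""

    def __init__(self, timeout_seconds, clock=time.monotonic):
        self.timeout_seconds = timeout_seconds
        self.clock = clock
        self.last_seen = {}

    def beat(self, component):
        # Called whenever a heartbeat arrives from a component.
        self.last_seen[component] = self.clock()

    def silent_components(self):
        # Components whose last heartbeat is older than the timeout.
        now = self.clock()
        return sorted(name for name, seen in self.last_seen.items()
                      if now - seen > self.timeout_seconds)


# A fake clock keeps the example deterministic.
fake_now = [0.0]
monitor = HeartbeatMonitor(timeout_seconds=5.0, clock=lambda: fake_now[0])
monitor.beat("router")
monitor.beat("presence")
fake_now[0] = 4.0
monitor.beat("router")        # router keeps reporting in
fake_now[0] = 8.0
stale = monitor.silent_components()
```

At the final check, only "presence" is reported as silent, which is the kind of signal that lets an operator catch a request path that has stopped working without raising an error.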

Ben Gamble, Head of Developer Relations, pointed out during the interview that another problem Ably.io experienced with gRPC was that its Protocol Buffer schemas became hard to manage over time. According to Gamble, "... as you keep incrementing systems, you end up with this massive maintenance problem with how your protobufs are actually even defined. You end up with the fact that you can't easily remove things from the definition itself. … [eventually] you end up with this massive overhead in every single part of the Protocol Buffer."

Gamble continues, "If you actually chart your protobuf by size over time, it will creep up unless you've done a hard reset at some point. That's great if you can maintain the fact that all your systems are going to be fully up to date. But if anything is outside of your control, you end up with these constant increments [in] size, which means the overhead just grows."

Another pain point for Gamble was that Protocol Buffers version 2 supports the required label on data fields. Marking a field as required causes the validation mechanisms in the version 2 libraries to throw errors when that field has no data. This made backward compatibility difficult to ensure when changing the schema. Gamble said there were many times that he wanted to make a change to the schema but was unable to because of a required field in the legacy schema. Both Byers and Gamble acknowledge that things have gotten a lot easier for the company's development efforts with the introduction of Protocol Buffers version 3. Version 3 has done away with the required label. Instead, validation checks now need to take place within the business logic of the gRPC implementation.
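
As a rough sketch of what validation in business logic can look like, the example below checks a message in application code instead of relying on the schema. PublishRequest and its fields are hypothetical stand-ins for a generated message class and are not taken from Ably.io's schema. The sketch relies on the proto3 behavior that an unset scalar field reads back as its default value, such as an empty string.

```python
from dataclasses import dataclass


@dataclass
class PublishRequest:
    """Hypothetical message; proto3-style fields fall back to default values."""
    channel: str = ""
    payload: bytes = b""


def validate_publish_request(request):
    """Return a list of problems; an empty list means the request is acceptable."""
    problems = []
    if not request.channel:
        problems.append("channel is required")
    if not request.payload:
        problems.append("payload is required")
    return problems


ok = validate_publish_request(PublishRequest(channel="scores", payload=b"1-0"))
missing = validate_publish_request(PublishRequest(payload=b"1-0"))
```

Because the rule lives in code, it can be relaxed or tightened per service version without touching the schema. That avoids the legacy-schema lock-in described above.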

Putting It All Together

Ably.io is still committed to gRPC. The benefits it provides have yet to be matched by other technologies. The implementation has come a long way since Paddy Byers did his first implementation of gRPC over a weekend in 2015. Today gRPC is a mainstay in the company's technology stack. Byers states unambiguously that from his point of view, "gRPC is the de facto choice for any internal interaction between components."

Interoperability was a key attraction for the company when it first adopted gRPC. Byers liked the idea that he wouldn't be constrained to a particular programming language to keep moving forward with Ably.io's gRPC development. Having access to a broad variety of developers from which to hire is a plus.

Ably.io's current implementation of gRPC is in Node.js, but the company is doing more implementations in Go moving forward. Ably.io plans to keep using gRPC as an internal technology. Byers acknowledges that the company is not getting a lot of interest in a public-facing gRPC API. However, one area where he does see a potential for using gRPC on the front-end is in the Internet of Things space.

Today Ably.io handles millions of messages per second for companies that operate in a variety of business sectors. Its use of gRPC is an ongoing testament to the power of technology. But, with great power comes great complexity. It took Ably.io a few years to make gRPC work reliably for the business. They have no regrets, but they did have growing pains. Companies that decide to adopt gRPC will enjoy a somewhat easier path now that the technology is more mature and has wider adoption. But, there will still be challenges. Fortunately, the road to effective adoption has been made easier because companies such as Ably.io have led the way.