GraphQL is an alternative to REST APIs developed by Facebook. Developers using Coursera APIs love the flexibility, type safety and documentation that GraphQL brings, and Coursera’s own developers are no less enthusiastic. But that’s not to say it was always plain sailing. Coursera engineer and GraphQL enthusiast Bryan Kane describes the bumpy journey from REST to GraphQL over at the Apollo dev blog.
Coursera APIs started life as resource-based REST APIs (e.g. an API for courses, others for instructors and course grades) that were easy to build and test. While there was a nice separation of concerns, performance and documentation suffered as APIs proliferated. In many cases, four or five round trips to the server were necessary just to render a page. Since GraphQL was designed to avoid these round trips and to provide structured docs automatically, the team was convinced it would be the answer to its problems.
Coursera wasn’t in a position, however, to simply jump ship. There were over one thousand API endpoints, both public ones and internal ones used for communication between backend services. Replacing each one individually would have taken forever. To top it off, these APIs served three clients: the iOS app, the Android app and the web app. The team wanted the option to roll out GraphQL gradually across services.
To achieve this, the team decided to add a GraphQL proxy layer on top of the REST APIs, effectively a wrapper around them: a GraphQL server would receive queries and send requests to the REST API endpoints downstream. The approach was demoed on a pilot page, which proved a success in internal testing.
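To make the proxy idea concrete, here is a minimal Python sketch of a resolver that answers a GraphQL field by forwarding to a REST endpoint. It is an illustration under stated assumptions, not Coursera's implementation: the `/api/courses/{id}` path, the `course` field and the `rest_get` transport are all hypothetical, and the REST service is faked with a dictionary so the example runs on its own.

```python
from typing import Any, Callable, Dict

# Hypothetical transport: takes a REST path and returns the parsed JSON body.
RestGet = Callable[[str], Dict[str, Any]]


def make_course_resolver(rest_get: RestGet) -> Callable[[str], Dict[str, Any]]:
    """Build a resolver for a hypothetical `course(id)` GraphQL field."""

    def resolve_course(course_id: str) -> Dict[str, Any]:
        # The proxy holds no data of its own; it forwards to the REST endpoint.
        return rest_get(f"/api/courses/{course_id}")

    return resolve_course


# Stand-in for the downstream REST service, for demonstration only.
FAKE_REST = {"/api/courses/ml-101": {"id": "ml-101", "name": "Machine Learning"}}


def fake_rest_get(path: str) -> Dict[str, Any]:
    return FAKE_REST[path]


resolve_course = make_course_resolver(fake_rest_get)
course = resolve_course("ml-101")
```

Passing the transport in as a parameter keeps the resolver independent of any particular HTTP client, which also makes it easy to test against a fake service.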
Success was a little too easy and proved deceptive. As soon as developers set up a demo to show the team, the GraphQL queries started to fail. A little investigation showed that the GraphQL server had gotten out of sync with the downstream course catalog service. The schema was updated to fix the demo, but the team quickly realized that once GraphQL was rolled out to 1,000 different resources and 50 different services, keeping the schema up to date for all of them would prove a nightmare.
What was needed was a single source of truth. The REST APIs were that source of truth, so to keep GraphQL in sync with the backend services, the team had to build the GraphQL layer automatically and deterministically from them, so that it would always reflect the underlying architecture. Thankfully, the team’s REST framework gave Coursera everything it needed to do this. Each service could dynamically provide a list of all its REST resources, and each of those resources could list its own endpoints and arguments.
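As a rough sketch of how such self-describing services could feed a single registry, the Python below merges per-service listings into one lookup table. The listing shape, the service names and the resource names (`courses.v1`, `instructors.v1`) are assumptions made for illustration; the article does not describe the actual format the REST framework returns.

```python
from typing import Callable, Dict, List

# Hypothetical shape of what one service reports:
# resource name -> endpoint name -> list of argument names.
ServiceListing = Dict[str, Dict[str, List[str]]]


def build_registry(
    services: Dict[str, Callable[[], ServiceListing]]
) -> Dict[str, dict]:
    """Ask every service to describe itself and merge the answers."""
    registry: Dict[str, dict] = {}
    for service_name, describe in services.items():
        for resource, endpoints in describe().items():
            registry[resource] = {"service": service_name, "endpoints": endpoints}
    return registry


# Fake services standing in for real backends, for demonstration only.
def describe_catalog() -> ServiceListing:
    return {"courses.v1": {"get": ["id"], "search": ["query", "limit"]}}


def describe_people() -> ServiceListing:
    return {"instructors.v1": {"get": ["id"], "multiGet": ["ids"]}}


registry = build_registry({"catalog": describe_catalog, "people": describe_people})
```

Because the registry is rebuilt entirely from what the services report, nothing in it has to be maintained by hand.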
The team consequently set up a task on the GraphQL server to ping every downstream service every five minutes and request all of the above information. A conversion layer was then written between the Pegasus schemas used by the REST APIs and the GraphQL types. Finally, the team defined a translation between GraphQL queries and REST requests. Thanks to the regular pings, the resulting GraphQL server would never be more than five minutes out of sync with the services behind it.
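The conversion layer can be pictured with a small Python sketch that turns a record schema into a GraphQL type definition. This is a simplified illustration, not the team's converter: the input is an Avro-style JSON record loosely modelled on Pegasus schemas, only a few primitive types and arrays are handled, and the `Course` record with its fields is invented for the example.

```python
from typing import Any, Dict

# Assumed mapping from a few primitive schema types to GraphQL scalars.
PRIMITIVES = {"string": "String", "int": "Int", "double": "Float", "boolean": "Boolean"}


def to_graphql_type(field_type: Any) -> str:
    """Translate a primitive or array type into a GraphQL type reference."""
    if isinstance(field_type, dict) and field_type.get("type") == "array":
        # Array items are treated as non-null in this sketch.
        return f"[{to_graphql_type(field_type['items'])}!]"
    return PRIMITIVES[field_type]


def record_to_sdl(record: Dict[str, Any]) -> str:
    """Render one record schema as a GraphQL type definition."""
    lines = [f"type {record['name']} {{"]
    for field in record["fields"]:
        gql = to_graphql_type(field["type"])
        if not field.get("optional", False):
            gql += "!"  # required fields become non-null
        lines.append(f"  {field['name']}: {gql}")
    lines.append("}")
    return "\n".join(lines)


# Hypothetical record schema for a course.
course_schema = {
    "type": "record",
    "name": "Course",
    "fields": [
        {"name": "id", "type": "string"},
        {"name": "name", "type": "string"},
        {"name": "instructorIds", "type": {"type": "array", "items": "string"}},
        {"name": "description", "type": "string", "optional": True},
    ],
}

sdl = record_to_sdl(course_schema)
print(sdl)
```

Running a conversion like this on every refresh, rather than editing GraphQL types by hand, is what keeps the generated schema deterministic.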
But this wasn’t the end by a long shot. One of the main reasons for using GraphQL was to save on round trips, and the setup as it stood would still require them. To fix this, the team had to link the GraphQL resources and models so that they would know about each other (something that hadn’t been necessary with the REST models). This required developers to specify relations between resources manually. For example, a course resource would need an instructors field, and those instructors would have to be looked up via the ids stored in the relevant field of the course.
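A hand-written relation of this kind might look like the following Python sketch, in which a relation table tells the server which id field on a course points at which target resource. The table layout, the `instructorIds` field, the `instructors.v1` resource name and the `multi_get` helper are all hypothetical, and the instructor data is faked so the example is self-contained.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical, manually declared relations:
# (parent type, GraphQL field) -> (id field on the parent, target resource).
RELATIONS: Dict[Tuple[str, str], Tuple[str, str]] = {
    ("Course", "instructors"): ("instructorIds", "instructors.v1"),
}

MultiGet = Callable[[str, List[str]], List[Dict[str, Any]]]


def resolve_relation(
    parent_type: str, field_name: str, parent: Dict[str, Any], multi_get: MultiGet
) -> List[Dict[str, Any]]:
    """Follow a declared relation by looking up the ids stored on the parent."""
    id_field, target_resource = RELATIONS[(parent_type, field_name)]
    return multi_get(target_resource, parent[id_field])


# Fake instructor store standing in for the downstream REST resource.
INSTRUCTORS = {
    "i1": {"id": "i1", "name": "Ada"},
    "i2": {"id": "i2", "name": "Grace"},
}


def fake_multi_get(resource: str, ids: List[str]) -> List[Dict[str, Any]]:
    # One batched lookup on the server side instead of one client round trip per id.
    return [INSTRUCTORS[i] for i in ids]


course = {"id": "ml-101", "instructorIds": ["i2", "i1"]}
instructors = resolve_relation("Course", "instructors", course, fake_multi_get)
```

With the relation resolved on the server, a client can ask for a course and its instructors in a single GraphQL query.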
Once these issues were ironed out, the team was able to introduce the GraphQL server to the production environment, where it has been running for over six months. Both client and in-house developers now have an easier time writing queries, and pages using the APIs load much faster.