Facebook software engineer Kewei Qu presented 5 GraphQL migration principles that the company follows during her talk "How to Move a Mountain? GraphQL Migration Best Practices," delivered recently at the GraphQL Conference 2019 in Berlin, Germany.
Qu pointed out that Facebook’s Relay and GraphQL server teams engaged in a massive migration project that affected the entire GraphQL codebase, client and server. The migration took place over 7 years, from 2012 to 2019. According to Qu, “we’re talking about hundreds of thousands of Relay modules, objects, interfaces, unions, mutations, etc. It’s everything as big as the entire News Feed to things as small as the visibility icon.”
Qu reported that as the development teams reflected on the migration experience, they noticed recurring patterns, which they distilled into 5 principles that other development teams can use when undertaking large-scale GraphQL migrations.
The 5 principles are:
- Always ask, do I want a migration?
- Plan for incremental changes
- Build adapters
- Automation is key
- Documentation and Evangelism
The following are the details of these principles.
Always Ask, Do I Want a Migration?
Qu pointed out that a migration is always risky. Once a company starts, there is always a chance that the migration will be stopped before completion, particularly if the company is not clear about the time and expense required to reach the goal. Qu says, “to be honest, migration is a complicated and risky task. It requires a lot of human effort and a lot of time. A migration may never complete and then you'll be stuck with supporting two systems forever, which might be a situation that is even worse than what you begin with. So, therefore, it is very important for us to weigh the benefits and costs of migration.”
To determine the benefit, Qu recommends benchmarking before, during, and after the migration. For example, Facebook discovered during its efforts that migrating its Marketplace feature cut startup time by 900 milliseconds on the GraphQL server side. This metric provided concrete evidence of the benefit and justified moving forward with the migration.
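A before-and-after benchmark of this kind can be sketched in a few lines of Python. This is only an illustration of the idea, not Facebook's tooling: `start_legacy` and `start_migrated` are hypothetical stand-ins that simply sleep, and the helper name `median_startup_ms` is an assumption made for this example.

```python
import statistics
import time
from typing import Callable


def median_startup_ms(start: Callable[[], None], runs: int = 5) -> float:
    """Return the median wall-clock time, in milliseconds, of calling start()."""
    samples = []
    for _ in range(runs):
        began = time.perf_counter()
        start()
        samples.append((time.perf_counter() - began) * 1000.0)
    return statistics.median(samples)


def start_legacy() -> None:
    # Hypothetical stand-in for starting the pre-migration code path.
    time.sleep(0.03)


def start_migrated() -> None:
    # Hypothetical stand-in for starting the migrated code path.
    time.sleep(0.005)


if __name__ == "__main__":
    before = median_startup_ms(start_legacy)
    after = median_startup_ms(start_migrated)
    print(f"before: {before:.1f} ms, after: {after:.1f} ms, saved: {before - after:.1f} ms")
```

Taking the median over several runs keeps a single slow start from skewing the comparison. Running the same measurement during the migration as well as before and after gives the trend that Qu recommends tracking.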
Plan for Incremental Changes
Qu admitted that in a perfect world it would be easier to do the migration work all at once and release it with a single commit of code. But doing so is not practical, so the company decided to take an incremental approach. As such, Facebook accepted the additional work required to support two systems: the legacy system and the new system under development. Qu points out that the decision had two essential benefits. The first is that incremental change accommodates system rollback should bad code be introduced into the system. The second benefit is that incremental change allows you to invert the 80-20 rule. Typically in a non-incremental release, 80% of the time is spent addressing the 20% of the released codebase that is causing most of the problems. Incremental release inverts the ratio. As Qu says, “incremental changes allow 80% of your code base to take advantage of the modern system already. And you can spend the [rest of the] time and energy to just focus on those edge cases.”
Build Adapters
As mentioned above, in order for Facebook to accomplish its migration, it had to support two systems simultaneously. One was the legacy system, called Classic, and the other was the new system, called Modern. The company conducted its migration by putting an adapter in place that served as an intermediary between the two systems.
At first, all user activity was directed to the Classic system. But over time user activity was redirected to the Modern system as its features became operational. Eventually, when the Modern system became fully functional all user activity was directed there. Then, the adapter was removed. Users interacted directly with the Modern system only.
The benefit of using an adapter is that it allowed the migration to avoid the risk of a Big Bang release. Changes were implemented one feature at a time, and thus rollback could be conducted at a very fine grain.
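The routing role of such an adapter can be sketched as follows. This is a minimal illustration under stated assumptions, not Facebook's implementation: the class `MigrationAdapter`, its method names, and the feature names `feed` and `icon` are all hypothetical, and each system is modeled as a plain dictionary of handler functions.

```python
from typing import Callable, Dict, Set

Handler = Callable[[str], str]


class MigrationAdapter:
    """Routes each feature's requests to the Classic or Modern implementation."""

    def __init__(self, classic: Dict[str, Handler], modern: Dict[str, Handler]) -> None:
        self._classic = classic
        self._modern = modern
        self._migrated: Set[str] = set()  # features currently served by Modern

    def migrate(self, feature: str) -> None:
        # Only switch a feature over once Modern actually implements it.
        if feature not in self._modern:
            raise KeyError(f"no Modern implementation for {feature!r}")
        self._migrated.add(feature)

    def rollback(self, feature: str) -> None:
        # Fine-grained rollback: send just this feature back to Classic.
        self._migrated.discard(feature)

    def handle(self, feature: str, request: str) -> str:
        system = self._modern if feature in self._migrated else self._classic
        return system[feature](request)

    def fully_migrated(self) -> bool:
        # Once every Classic feature is served by Modern, the adapter can be removed.
        return set(self._classic) <= self._migrated


if __name__ == "__main__":
    adapter = MigrationAdapter(
        classic={"feed": lambda r: f"classic feed: {r}", "icon": lambda r: f"classic icon: {r}"},
        modern={"icon": lambda r: f"modern icon: {r}"},
    )
    adapter.migrate("icon")
    print(adapter.handle("icon", "render"))
    print(adapter.handle("feed", "render"))
    adapter.rollback("icon")
    print(adapter.handle("icon", "render"))
```

Because callers only ever talk to the adapter, switching a single feature to Modern or rolling it back is a one-line change that leaves every other feature untouched.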
Automation is Key
According to Qu, “automation is really the key to the success of a migration.” Automation offers two advantages. First, it can prevent human error. However, Qu was quick to point out that automation can create problems when a script is not written correctly. But she went on to add that if a script was in error, the problems it caused were fixed immediately because it was just a matter of running the corrected script again. Fixing problems manually would take too much time.
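To make the point about rerunning a corrected script concrete, here is a minimal sketch of a scripted rewrite. It is an assumption-laden illustration, not the tooling Facebook used: the call names `createClassicContainer` and `createModernContainer` are hypothetical placeholders, and a production codemod would more likely operate on a parsed syntax tree than on regular expressions.

```python
import re
from typing import Tuple

# Hypothetical API names, used only for illustration.
CLASSIC_CALL = re.compile(r"\bcreateClassicContainer\(")
MODERN_CALL = "createModernContainer("


def migrate_source(source: str) -> Tuple[str, int]:
    """Rewrite Classic calls to Modern ones; return the new text and the edit count."""
    return CLASSIC_CALL.subn(MODERN_CALL, source)


if __name__ == "__main__":
    original = "export default createClassicContainer(Icon, spec);\n"
    migrated, edits = migrate_source(original)
    print(migrated, end="")
    print(f"{edits} call(s) rewritten")
    # Running the script again on already-migrated code changes nothing.
    print(migrate_source(migrated)[1])
```

The rewrite is idempotent, so it can be run across the whole codebase repeatedly. If the pattern turns out to be wrong, correcting it and rerunning the script repairs every affected file in one pass, which is the property Qu highlights.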
The second benefit that automation provided was that it allowed Facebook to introduce changes in a controlled and verifiable manner. During the presentation, Qu displayed the slide shown below in Figure 1.
At the beginning of the migration, as shown on the left side of the slide, the rate of new modules released into the Classic system was greater than the rate of new modules released into the Modern system. (Remember, even though the Classic system was in the process of being replaced, maintenance work still needed to be conducted.) However, after automation was introduced into the Modern system’s development process, the rate of new module release into the Modern system began to outpace that of the Classic system. Eventually, the rate of module release into the Modern system exceeded that of the Classic system to the point where the Classic system could be safely retired. Automation was the driving force behind this dynamic.
Documentation and Evangelism
The migration team at Facebook understood from the beginning that developer support was critical for the successful adoption of the Modern system. The company knew that it could not force the adoption of the Modern system by edict. Developers needed to want to use the new code. Thus, success was a matter of education and evangelism. The Modern system developers wrote documentation describing the benefits of the new system. They provided comprehensive instructions on all aspects of using the various new components that were part of the migration. The developers attended round tables and internal hackathons. In short, they treated the Modern system like any new product that needed to be sold to and accepted by its user base.
Today the Modern system is completely operational at Facebook. Qu attributes the success to following the 5 principles the development team uncovered in its efforts. As Qu says, “our codebase finally enjoys the beauty, the magnificence, and the power of modern GraphQL. Through it, we were able to deliver meaningful social interactions to billions and billions of end users.”