Why GitHub's Scientist 1.0 Could Be Great for API Versioning

Let's face it. When it comes to managing an API's lifecycle, one of the biggest pains in the ass occurs when it comes time to version it. For the sake of this article, I'm going to loosely define "versioning" as changes that impact either an API's contract itself (the exact parameters for calling it and the payload you get in return) or the infrastructure (or any part thereof) that powers it. That latter part is critical because, after all, one of the primary benefits of disassembling your entire IT infrastructure into a bunch of API-driven services is that you can more easily make substitutions to whatever technology lives behind those services.

For example, after such disassembly, if the technology behind your inventory lookup API turns out to be slow and expensive, you can more easily replace that technology with something faster and cheaper. As long as the API contract doesn't change (the same way an electrical socket delivers 120 volts regardless of whether it's powered by coal or wind), the applications that consume that API shouldn't care (the same way your hair dryer doesn't care where its power is ultimately coming from).

Versioning is tricky business though. If you as the API provider make one tiny mistake -- if the contract changes ever so slightly in a way that doesn't meet the expectations of consuming applications -- there could be a lot of breakage, angry developers, and disappointed end-users. This is why just about any standard tool that helps to ensure contract continuity during API versioning is a welcome addition to the API provider's bag of tricks. And although its application to APIs wasn't necessarily discussed as a part of the recent open source release of Scientist 1.0, the tool's developer (GitHub) agreed that it is very well-suited to certain API versioning exercises.

So what is Scientist 1.0? According to a blog post by GitHub principal engineer Jesse Toth, Scientist is designed "to help you rewrite critical code with confidence." But it could have just as easily said that Scientist can help you version your API with confidence. Like other open source projects that have scratched an itch for their creators (e.g., Facebook's React), Scientist addressed a critical need as Toth looked to rewrite the entire codebase behind GitHub's permissions scheme. Her blog post goes on to say the following:

"There is a fairly common architectural pattern for making large-scale changes known as Branch by Abstraction. It works by inserting an abstraction layer around the code you plan to change. The abstraction simply delegates to the existing code to begin with. Once you have the new code in place, you can flip a switch in the abstraction to begin substituting the new code for the old.

Using abstractions ...doesn't really ensure that the behavior of the new system will match the old system.... [in GitHub's case], we needed to ensure not only that the new system would be used in all places that the old system was, but also that its behavior would be correct and match what the old system did."
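To make the Branch by Abstraction pattern from the quote a little more concrete, here is a minimal Python sketch. Everything in it is hypothetical (the `PermissionsGateway` class, the two `can_read` functions, and the `use_new` switch are illustrative names, not anything from GitHub's codebase). It only shows the shape of the idea: callers talk to the abstraction, and a switch decides which implementation does the work.

```python
from typing import Callable

# Hypothetical stand-ins; none of these names come from GitHub's code.
def legacy_can_read(user: str, repo: str) -> bool:
    """Pretend this is the existing permissions check."""
    return user == "owner"

def new_can_read(user: str, repo: str) -> bool:
    """Pretend this is the rewritten permissions check."""
    return user in {"owner"}

class PermissionsGateway:
    """Abstraction layer that callers use instead of either implementation."""

    def __init__(self, use_new: bool = False) -> None:
        # The "switch" from the quote: False delegates to the existing code.
        self.use_new = use_new

    def can_read(self, user: str, repo: str) -> bool:
        impl: Callable[[str, str], bool] = (
            new_can_read if self.use_new else legacy_can_read
        )
        return impl(user, repo)

gateway = PermissionsGateway()               # starts on the old code
before = gateway.can_read("owner", "docs")
gateway.use_new = True                       # flip the switch
after = gateway.can_read("owner", "docs")
```

Note what the sketch does not do: nothing here checks that the two implementations agree, which is exactly the gap Toth describes.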

For GitHub, adding such behavioral monitoring amounted to comparing the outcomes of "competing" function calls (the old vs. the new) in the source code. While still relying on the old code for production outcomes, Scientist compares a new function's performance and output to that of the old one it's designed to replace. GitHub likes to save its outcomes to a Redis (redis.io) data structure, but nothing says you can't modify the open source code to save your results to your favorite storage platform. Once the comparative outcomes are stored, it's easy to spot undesirable ones, such as mismatched results (where they should be equal) or exceedingly slow performance from the replacement.

Scientist processes inbound data with both the new and legacy code and stores the results of both for subsequent comparison. Meanwhile, it still returns the legacy code's results to ensure continuity of the "working" system.
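The flow described above can be sketched in a few lines of Python. To be clear, this is a hand-rolled illustration of the idea and not the Scientist library's API: `run_experiment`, `Observation`, and the in-memory `results` list are all names I made up, and the list stands in for whatever results store you would actually use.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List

@dataclass
class Observation:
    name: str
    control_value: Any
    candidate_value: Any      # stays None if the candidate raised
    candidate_error: str      # empty string if the candidate succeeded
    control_seconds: float
    candidate_seconds: float

    @property
    def matched(self) -> bool:
        return not self.candidate_error and self.control_value == self.candidate_value

results: List[Observation] = []   # hypothetical in-memory results store

def run_experiment(name: str, control: Callable, candidate: Callable, *args: Any) -> Any:
    """Run old and new code on the same input, record both, return the old result."""
    start = time.perf_counter()
    control_value = control(*args)
    control_seconds = time.perf_counter() - start

    candidate_value, candidate_error = None, ""
    start = time.perf_counter()
    try:
        candidate_value = candidate(*args)
    except Exception as exc:      # a faulty candidate must not break production
        candidate_error = repr(exc)
    candidate_seconds = time.perf_counter() - start

    results.append(Observation(name, control_value, candidate_value,
                               candidate_error, control_seconds, candidate_seconds))
    return control_value          # callers only ever see the legacy answer

# Toy functions with a deliberate disagreement on duplicate items.
def legacy_count(items):
    return len(items)

def new_count(items):
    return len(set(items))

value = run_experiment("count", legacy_count, new_count, ["a", "b", "a"])
```

In this toy run the caller still gets the legacy answer, while the stored observation records that the replacement disagreed, along with how long each side took.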

The rigor of Scientist versus standard testing approaches is not to be underestimated. Whereas standard testing approaches typically deal with a mock or static data set, Scientist can deal with the production data that's coursing through your system's veins every day.

As I read about Scientist, I couldn't help but wonder if it could be applied to APIs just as easily as it could be applied to functions. After all, isn't an API call just a glorified function call? In a telephone interview, Toth agreed, saying there's no reason it wouldn't work. In fact, we batted around a couple of different implementation scenarios.

One scenario involved replacing the code behind an API. Here, the API endpoint collects input from the calling application and forwards that input as parameters to functions called by the API's underlying code, and Scientist is used to compare the responses of the old and replacement functions. Everything behind the API would more than likely run on the same platform. For example, in GitHub's case, the Scientist 1.0 library that it open sourced is Ruby-based, so it's ideal for testing replacement Ruby functions or gems. Although GitHub is focused on the Ruby version, Toth says other developers are coming forward with versions for their favorite platforms including Node.js, Python, C#, and Erlang.

But what about the scenario where you're taking advantage of an API's ability to separate concerns? You know, the one where you can rip out and replace everything that powers the endpoint in the same way that your local power utility can rip and replace coal with wind or solar? Whereas the inbound request is deserialized and forwarded to functions in the first scenario, this scenario involves branching from a virtual or proxy endpoint to the old and new (replacement) endpoints (which in turn implies that you should be thinking about virtualizing your endpoints from the get-go). Yes, if your API endpoints are already virtualized, you are architecturally in a much better position to leverage Scientist in a way that is agnostic to platform. Scientist can easily compare the responses from two separate (old and new) endpoints regardless of what platform they're running on. 
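Here is a rough Python sketch of that second scenario, under some loud assumptions: the backends are modeled as plain functions that take a request path and return a status code and a JSON body (in real life they would be HTTP calls to the old and new endpoints), and `proxy`, `mismatches`, and the inventory functions are hypothetical names of my own. Comparing parsed JSON instead of raw text is a design choice, so that differences in key order or whitespace between platforms don't count as mismatches.

```python
import json
from typing import Callable, Dict, List, Tuple

Response = Tuple[int, str]              # (HTTP status, JSON body as text)
Backend = Callable[[str], Response]     # takes a request path

mismatches: List[Dict[str, object]] = []   # hypothetical in-memory log

def proxy(path: str, old_backend: Backend, new_backend: Backend) -> Response:
    """Virtual endpoint: call both backends, log differences, return the old response."""
    old_status, old_body = old_backend(path)
    try:
        new_status, new_body = new_backend(path)
        # Assumes both bodies are JSON; a parse failure is logged as an error.
        same = (old_status == new_status
                and json.loads(old_body) == json.loads(new_body))
        if not same:
            mismatches.append({"path": path,
                               "old": (old_status, old_body),
                               "new": (new_status, new_body)})
    except Exception as exc:            # the new backend must not affect callers
        mismatches.append({"path": path, "error": repr(exc)})
    return old_status, old_body

# Stand-in backends; each could be running on a completely different platform.
def old_inventory(path: str) -> Response:
    return 200, '{"sku": "A1", "in_stock": 4}'

def new_inventory(path: str) -> Response:
    return 200, '{"in_stock": 4, "sku": "A1"}'      # same data, different key order

def broken_inventory(path: str) -> Response:
    return 200, '{"sku": "A1", "in_stock": "4"}'    # number became a string

ok = proxy("/inventory/A1", old_inventory, new_inventory)
bad = proxy("/inventory/A1", old_inventory, broken_inventory)
```

The first call logs nothing because the two payloads are equivalent. The second call logs a mismatch because the replacement quietly changed a number into a string, which is the kind of subtle contract drift that consuming applications tend to trip over.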

There are also API versioning exercises to which Scientist is not well-suited. For example, if you know you need to break your contract (and break it badly), then you've already eliminated the expectation that two different forms of your API will return the same thing, in which case Scientist will be of little or no use (although you may want to virtualize the new endpoint to set yourself up for Scientist on the next go around!). But if you're adding new resources to your API's existing resource set, changing the underlying infrastructure, or maybe even changing something that's less consequential to the API's response (for example, the method of authentication), Scientist might be worth checking out.

David Berlind is the editor-in-chief of ProgrammableWeb.com. You can reach him at david.berlind@programmableweb.com. Connect to David on Twitter at @dberlind or on LinkedIn, put him in a Google+ circle, or friend him on Facebook.
