Over the years, ProgrammableWeb has followed the arc of APImetrics, a Seattle-based company that wakes up every morning thinking about one thing: the performance of all those APIs out there in the API economy. In this day and age, when so much is hosted in one of the public clouds, API performance is a complicated topic. For example, certain APIs are simultaneously hosted out of multiple global data centers in such a way that an API's performance might vary wildly depending on which of a public cloud's global data centers is hosting it. This could impact developers who are looking to incorporate reliable, well-performing APIs into their applications. On the API provider side of the equation, providers looking to offer reliable, well-performing APIs might look to avoid certain data centers with a reputation for worse-than-average latency or, worse, downtime.
It's detailed statistics like these that APImetrics keeps a close eye on, on behalf of its customers for whom APIs are mission critical. After years of gathering all sorts of performance data about APIs from across the API economy, APImetrics launched a new service -- API.Expert -- in January 2020 to make it easier to rank and compare API providers based on their performance. To find out more, I interviewed the company's CEO and co-founder David O'Neill, who not only talks about the virtues of the new service but also gives us a demonstration.
The interview can be consumed any way you like: via video, audio, or a full-text transcript. All versions are embedded below.
How API.Expert Ranks Performance of Common APIs Used By Developers
Editor's Note: This and other original video content (interviews, demos, etc.) from ProgrammableWeb can also be found on ProgrammableWeb's YouTube Channel.
Editor's note: ProgrammableWeb has started a podcast called ProgrammableWeb's Developers Rock Podcast. To subscribe to the podcast with an iPhone, go to ProgrammableWeb's iTunes channel. To subscribe via Google Play Music, go to our Google Play Music channel. Or point your podcatcher to our SoundCloud RSS feed or tune into our station on SoundCloud.
Transcript of: How API.Expert Ranks Performance of Common APIs Used By Developers
David Berlind: Hi, I'm David Berlind and this is another edition of ProgrammableWeb's Developers Rock Podcast. Of course developers, totally rock. We love them and that's why we do these shows to bring more information about what's going on around the industry to developers as well as API providers. And today with me is David O'Neill. He is not only another David, I love all other Davids, but he's also the CEO and co-founder of APImetrics. David, thanks very much for joining us on the show today.
David O'Neill: David, thank you so much for having me. It's a pleasure to be here.
Berlind: It's great to have you. So let's first start out, what does APImetrics do?
O'Neill: So APImetrics is an API monitoring platform that monitors any API from the outside in, as if you were an actual user or a partner built right into it.
Berlind: And when you say monitors the API, what is it that you monitor and who would care about this?
O'Neill: We monitor the functionality of the actual APIs provided, ideally the production APIs. Typically, our customers are customer success teams, CIO offices, people on the sharp end of the gap between developer operations and customer support, where it's entirely possible for everything to look like it's working but actually you've got a significant outage that may only affect one customer or one particular region of the world. And we've got a solution that spots, across different clouds, where the performance problems are and warns you if things are not as good as you expect them to be.
Berlind: Wait a minute, wait a minute, wait, wait. ProgrammableWeb is all about how wonderful and great APIs are. Nothing ever goes wrong. Are you trying to tell me that sometimes things go wrong with APIs?
O'Neill: I couldn't possibly comment, but yes. And usually in pretty entertaining ways. We've got hours of war stories on APIs that looked like they were working just great, but actually had failed in very entertaining ways. My favorite is a major UK bank had an API that suddenly got a lot faster. Crack the champagne. Everybody was excited. They had a tenfold speed increase. It was a week before anybody noticed. They stopped returning any data.
Berlind: Oh, no! So that's the sort of failure that APImetrics would spot and say, maybe send an alert out: "Hey, you've got a problem!"
O'Neill: Yeah, that's exactly what we look for. So we actually check that you are getting back what you expected. And I know this used to annoy one of our competitors, but 200 is not always okay.
Berlind: 200, and we're referring to... 200 is the HTTP status code that comes back when a request to a web server or a web-based API endpoint succeeds. So that's what you're talking about. 200, okay.
O'Neill: HTTP codes lie as I've begun to —
Berlind: No, they don't. Come on. They don't lie! (Laughter)
O'Neill: They're set up incorrectly. We've got a lot of customers who use 200 for everything. So even if it fails, it returns a code saying, "okay. Yeah, that didn't work".
Berlind: That's not recommended. Don't set up 200 as your return code for everything your API does. There have got to be a few other codes in there. And by the way, we often say this about documentation: if you have a bunch of "error" codes or "okay" codes, it matters that you have a list of them and what they mean, right?
O'Neill: Absolutely. But the interesting thing for us is that when we actually start returning data, people sometimes haven't even seen the error codes their own systems put out. We had one conversation with a client where the DevOps engineer said, "Well, that code can't possibly be there. It's impossible. That means there's been a complete system failure and it's rebooting." And it did that about five or six times a day; the system would be down most of the day. And they went away, had a look at a different logging system, and went, "Oh yeah, the system's pretty much down all day. You didn't notice."
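O'Neill's point that "200 is not always okay" is easy to illustrate in code. Here's a minimal sketch of a response check that validates both the status code and the shape of the payload; the field names and rules are purely illustrative, not APImetrics' actual logic:

```python
import json

def check_api_response(status_code, body, required_fields=("id", "balance")):
    """Return (ok, reason). A response only passes if the status code is
    healthy AND the payload actually contains the data we expected."""
    if status_code != 200:
        return False, f"unexpected status {status_code}"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False, "body is not valid JSON"
    if not payload:
        # The failure mode from the bank story: fast 200s, no data.
        return False, "200 OK but empty payload"
    missing = [f for f in required_fields if f not in payload]
    if missing:
        return False, f"200 OK but missing fields: {missing}"
    return True, "ok"

# A 200 with an empty body is caught, where a status-only check would pass:
print(check_api_response(200, "{}"))                      # (False, '200 OK but empty payload')
print(check_api_response(200, '{"id": 1, "balance": 42}'))  # (True, 'ok')
```

A status-only monitor would have celebrated the UK bank's "tenfold speed increase"; a payload check like this flags it immediately.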
Berlind: So let's put a couple of things together. First, we've got APIs out there that sometimes don't behave exactly the way we want them to behave. And then two, we've got APImetrics out there, and you monitor those APIs to make sure that they're healthy and running and providing the data that they should be providing. Why do we need another company like APImetrics to do this? It seems like this would be built into the API provisioning systems.
O'Neill: So this is a discussion we have a lot with potential clients, usually after they've had a bad day where people have complained about the service and said they've been down, and yet AWS CloudWatch or the gateway said, "Hey, everything's fine." What's happening? What we realized was, unless you're monitoring your APIs from where your customers are, or where they're building their application stacks, you don't actually have a clue what the API is doing. So one of the things that we were surprised to find is there are differences between clouds. An API that works great on Azure in one location may not work so well from another. There is a dependence in the industry on a small number of clouds, but the cloud from which the person building an application talks to your API may be different from yours. And there may be some fundamental networking incompatibilities between what they are building on and your infrastructure.
Berlind: I see.
O'Neill: You don't know about that. You could have a customer who complains all the time, but actually there's no easy fix or solution and that's the sort of thing that we help people understand.
Berlind: I see. Now when you monitor the API, do you also make requests of the different resources on the API to see if it's returning data? You go to that level of depth? Or is it more just checking to make sure that it's responding when it's being pinged, or however it is you do it? I don't know how you do it.
O'Neill: No. So, we create fully functional API calls with the correct layers of security that actually exercise the APIs, ideally in production.
Berlind: All right. You need to know something about the API and how to call it and what the expected return is to make sure that it's working properly then. Is that right?
O'Neill: Yes, absolutely. So we can integrate with common tools like Postman. You could take an entire Postman collection, for example, and run it through our infrastructure so you can see how the API is responding and whether it's passing all the checks you've set up. Or you can set them up inside our product. And then we look for networking performance issues. We look at whether you returned all the JSON you expect, whether you returned what the API was meant to return. That sort of level is where we excel. And then we do a bunch of machine-learning-based analysis on, essentially, the stability of performance. One of the things we noticed early on is that you can get deluged with data, and it's very easy to look at the average response times and go, "Well, that's a pretty good average," but miss whether there are actually trailing edges that could cause you problems. So we often see, and I can show you some examples later, APIs where the median response time might be half a second, which is okay, but 5% of calls take over 29 seconds. Actual seconds. So if you're doing hundreds of thousands of transactions and 5% of them take 30 seconds or more, that's a problem.
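The tail-latency point O'Neill makes here is easy to reproduce: a distribution whose median and average look fine can still hide a punishing tail. A sketch using only the Python standard library, with latency numbers made up to mirror his example (95% of calls at half a second, 5% at 30 seconds):

```python
import math
import statistics

def percentile(data, p):
    """Nearest-rank percentile (p in 0..100) of a dataset."""
    s = sorted(data)
    k = math.ceil(p / 100 * len(s)) - 1
    return s[k]

# Synthetic latencies in seconds: 95 fast calls, 5 pathological ones.
latencies = [0.5] * 95 + [30.0] * 5

print(statistics.median(latencies))  # 0.5   -- looks perfectly healthy
print(statistics.mean(latencies))    # 1.975 -- the average smears out the tail
print(percentile(latencies, 99))     # 30.0  -- the tail is where users suffer
```

This is why monitoring dashboards that report only the mean or median can miss exactly the problem O'Neill describes; you have to look at the high percentiles.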
Berlind: That is a problem. And I suppose when you're able to spot those problems, I mean that sort of response time, not to mention if it's just down altogether, this would also be important to API providers who have SLAs with their customers, right? Because sometimes customers will encode specific performance requirements and uptime requirements right into their SLAs. Is that an issue for some of your customers?
O'Neill: It's becoming an issue. I'll be honest, SLAs are still poorly enforced and poorly specified. But I believe over the next few years, and this is something we're passionate about, over the next few years, it will become essential. You can't build systems reliant on APIs and not specify the quality of service you'll get.
Berlind: I understand.
O'Neill: The other problem we do see with APIs is they are... how shall I put this... you have to prove the SLA was missed. You're very rarely told that the SLA was missed. So in the case of some very, very famous global cloud service providers, they'll only give you money back if you can prove that they didn't deliver to you. That's a lot of work and actually very hard to do, because they will bring receipts that prove that they were working just fine. And this is something we also see in the regulatory space around open banking APIs, where it's entirely possible for both parties to bring a regulator proof that they were working fine. And then the regulator is not technically able to determine whether bank "A" was up and TPP service "B" was wrong. That's a huge gap in the market that we're trying to fill with APImetrics.
Berlind: I see. So, let's see. You've got some news, I think before we arranged to do this interview you said, "Hey, we've got some news coming up." So why don't we dive into that? What's new?
O'Neill: Absolutely. So one of the things we realized is there's a lack of actionable data in the API space. There are great services like ProgrammableWeb and others that provide you with the APIs, but it's actually very hard to understand which APIs work better than others and how they rank in comparison to each other. So we're launching a service called API.Expert, at api.expert, that will provide rankings of common APIs people use. The ranking method we have created is called CASC. C-A-S-C: Cloud API Service Consistency. And that's a blended metric where we look at all the data points that come back, and it's like a credit score. We score it out of a thousand. The closer to a thousand, the closer to perfect the API is. As the score goes down, we expect to see more problems and more issues. So anything above 800 is pretty good. From 600 to 800, you'll see some issues in the performance. Below 600, the API is really not acceptable for use, in our opinion.
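The score bands O'Neill describes can be captured in a few lines. This is just a sketch of the interpretation he gives in the interview; the band labels are paraphrased, not APImetrics' official terminology, and the underlying blended-metric math is proprietary and not shown here:

```python
def interpret_casc(score):
    """Map a CASC score (0-1000, credit-score style) to the rough
    reading described in the interview: higher is closer to perfect."""
    if not 0 <= score <= 1000:
        raise ValueError("CASC scores run from 0 to 1000")
    if score > 800:
        return "pretty good"
    if score >= 600:
        return "some performance issues expected"
    return "not acceptable for use"

print(interpret_casc(950))  # pretty good
print(interpret_casc(700))  # some performance issues expected
print(interpret_casc(450))  # not acceptable for use
```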
Berlind: Okay. So, you said anything around 100 is good. You meant everything around a thousand is good.

O'Neill: I'm sorry. Everything around a thousand is good.
Berlind: I just want to double-check that. Okay. So you've got a scoring system. And when you're doing a comparison, do you typically compare APIs that target similar applications, like storage APIs or telephony APIs? Is that what you would typically compare, one to another, or do you just compare anything that the end user wants to compare?
O'Neill: So, the beauty of the scoring system is we can compare apples to bananas. We don't have to compare like for like. Obviously, if you're interested in storage systems, comparing Box and Dropbox is useful. At the moment we have a group of categories: we look at corporate infrastructure APIs, we look at social networking APIs, cryptocurrency exchanges, banking and tech APIs. And we're currently expanding our API coverage. And honestly, anybody out there who has an API they would like us to add to the list, they just need to set the API up in our system and tell us they want to share it. We've got a community system for sharing those into our public reporting system. So we are currently adding in things like telephony and so on.
Berlind: So, is there something to look at? Can you give us a little bit of a demonstration of this?
O'Neill: Absolutely. So let me share my screen. Let's do that. So here we have last month's data, and this is for corporate infrastructure. So we're looking at 13 different common corporate APIs here, from GitHub through Microsoft, Pivotal Tracker, Mailchimp, and a few storage systems. And what we're doing is looking at the CASC score for the month. So what we expect with these big APIs is we'll see they're all pretty good. They all have solid high scores, with the exception of Cisco Spark; we're not sure quite what's going on there.
O'Neill: And then we look at the pass rate. "Outliers" is essentially calls that fall outside of what we expect to be the statistically normal performance. And then we look at the median latency. So we can say that for December, GitHub had the best uptime, 100% uptime, and the fewest outliers based on the calls we did. Cisco Spark APIs, we counted four hours of downtime. That's actual periods where we just could not make successful API calls into our account with them. And then speed-wise, Slack was the fastest: 400 milliseconds at the median and, at the 99th percentile, 900 milliseconds. So everything on Slack took under a second. Compare the same stats for Cisco Spark.
Berlind: So those guys that Slack are not slacking at all. That's what you're saying?
O'Neill: They are not. No. They really have got that working. Cisco, we're not sure what's going on there. Bumping up...
Berlind: They're lacking a little spark in their API.
O'Neill: Haha! Yes, they are!
Berlind: Sorry, sorry I couldn't help myself on that one.
O'Neill: No, I'm sure we can find some more puns in a second as well. So, if we look at the latency data, and obviously our product, if you use it, goes into much more detail than this. This is just rolled-up information, but we want to keep the flavor of what people need to know to really understand how performance is working. So we have the best cloud for GitHub: if you're building an application, build on AWS in North America. That's probably suggesting North America is where they have the data center they're using. Whereas from South America on Azure, there's nearly a two to three X difference between North America and South America, and that's pretty common. We see huge global variations in performance and speed, and then you can pick different metrics. So I'm going to pick an example I know is weird looking. I'm going to go to DocuSign. So, a significant difference again between North America and, in this case, South Asia on Google. Medians run from 700 milliseconds and 500 milliseconds through to two and a half seconds. But if I look at something like DNS lookup time... and DNS, everybody feels that's just... you let the cloud do that. The cloud will sort it out for you. But actually, if we look at DocuSign, we see extremely long name lookup times. DNS resolution time is the time it takes the internet to figure out where the resource is that you're calling. So it's doing a name lookup to Google or whomever, and then it's trying to figure out, well, where is that server? Where do I send the query to? We see huge variation there. So you're seeing lookup times of 250 milliseconds, a quarter of a second, half a second. In an extreme case here, Google, South Asia: 1.5 seconds. So these are overheads on your API traffic. If we pick somebody else, let's say GitHub, same queries, you'll see it's markedly different. A lot of the calls only take four milliseconds.
Berlind: So that's indicating they spend a lot of time optimizing the infrastructure, the internet, to make sure no one has trouble finding GitHub.
O'Neill: These are things that API developers don't often think about because it's nothing to do with your API per se...
Berlind: It may not be anything to do with your API, whatever the problem is. And DNS in that case, yes, I understand. Right? But overall, any performance hit... there's a whole chain of events that happens every time we make a call, including the DNS, and somewhere in there something's going wrong. So you have to go and deconstruct the whole conversation and figure out where the breakdown is. But this is helping you do it. It helps you spot whether it might be in the DNS lookup or not.
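The DNS overhead O'Neill demonstrates can be measured directly with nothing but the standard library. A minimal sketch that times just the name-resolution step; it resolves localhost so it runs offline, but pointing it at a real API hostname (say, api.github.com) reproduces the kind of measurement shown in the demo:

```python
import socket
import time

def dns_lookup_ms(hostname, port=443):
    """Time only the name-resolution step of a request, in milliseconds.

    Note: results reflect the local resolver cache and network, which is
    exactly why APImetrics measures from many clouds and regions.
    """
    start = time.perf_counter()
    socket.getaddrinfo(hostname, port)  # resolve name -> address(es)
    return (time.perf_counter() - start) * 1000.0

# e.g. dns_lookup_ms("api.github.com") -- run it from different regions
# and clouds to see the variation discussed above.
print(f"localhost resolved in {dns_lookup_ms('localhost'):.2f} ms")
```

A quarter-second or more spent here is pure overhead added to every uncached call, before the API server even sees the request.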
O'Neill: Yes. And also just spot clouds people should avoid. So with some of our commercial clients, we see extreme differences. A call that works great on Azure from the UK doesn't work at all from an AWS location in Europe.
Berlind: Hmmm. Interesting.
O'Neill: I had one API we work with where no call from Finland is successful. Finland's just blocked by...
Berlind: Somebody, yeah, gremlins.
O'Neill: No one was really sure what the root cause is, but it's there. So these are things to know when you're setting up APIs: "Don't do that, use this." Being able to tell your developers what they should do, rather than just somebody going, "Well, I use AWS, I'll click deploy, default AWS East."
Berlind: You create a lot of data that your customers can use to make some decisions about their APIs and the infrastructure behind them. But at the same time, you can make some broad observations across the globe that apply to everybody, for example, the one you were just describing about Finland. Do you create a report for all of your customers to warn them off doing something like that? Since you're making the observation in the first place, you don't want to make them go out and figure it out for themselves, right?
O'Neill: No. So, we're actually working on a 2019 roll-up, and when we're ready, we'll put all of 2019's data up on API.Expert for all the APIs we monitor.
Berlind: Will you do a report? Some sort of qualitative report that somebody can review and say, "Hey, wait a minute. There's 10 things in here that David O'Neill is saying don't do, so we better not do those, because it just keeps bubbling up in APImetrics as being problematic." Do you give them a set of things to avoid, or best practices?
O'Neill: We do that, although a lot of it is case by case. But we will be issuing a report this year on some of the "do's and don'ts" to avoid things where we have noticed persistent problems and that we think it's just good practice to address. One example I'll give everyone for free: by all means use a CDN service like Akamai or Cloudflare. They're very good. They will give you vast improvements in resolution times. But this is something everyone should trust but verify. Don't just take it as a given that what they have done works. Actually go in and look and see whether you've got the improvement you wanted. We see customers who have paid money for CDN services that are not getting any noticeable improvement. Sometimes that's because they haven't told people where the resources are, or they haven't verified that they're...
Berlind: ...the CDN's configured properly...
O'Neill: Yeah. And that's something... I think there's a reliance in the industry, and this is the broader cloud, that the cloud infrastructure vendors do it all for you and it's semi-magical: I click deploy, and somebody in Seattle at Amazon or Microsoft has magically made all my problems go away. And the reality is that's not true. There is no magic elf who is going to make your system work perfectly so that you don't have to check anything.
Berlind: What?! There's no magic elf?
O'Neill: There are no magic elves. I'm sorry. I should've said this before Christmas.
Berlind: We should probably change that. We should probably suggest it. And by the way, that wouldn't be a bad job to have, to be a magic elf. Probably good benefits, if you could wing some magic on some APIs there. api.expert is where you find the service. Is this service available separately from the core APImetrics offering? Or is it bundled in for customers of APImetrics? How does that work?
O'Neill: No, API.Expert is entirely free. We are putting it out there as... The headline data is the weekly and monthly reports. They will be completely free, just as a resource people can look at. If you scroll down, we also have live data for the previous 24 hours, so if you want to see whether an API is up or not, we'll provide that data for free. If you want to get into the weeds or into deeper details, then yes, you'll need an APImetrics account, and you'll need to contact us about that. We also are planning to white-label it, so if you have data sets you want to put out there and want monitored, we can do that as a managed service. We've actually got some interest from government and banking clients on that already.
Berlind: For API providers for example, that want to be transparent about the operational effectiveness of their business. Sure. Okay.
O'Neill: Yes. And if you want to sign up and actually do it for yourself, there's the core product. For API.Expert, we roll all the data up so you just get domain-level data. In the core product you'll get API-specific data, plus the ability to test deeper things like your own infrastructure, to verify down to a per-location granularity rather than a region, and a whole bunch of other tools for alerting and reporting.
Berlind: Oh, very cool. David O'Neill, the CEO and co-founder of APImetrics. Thanks for showing off your new service API Expert today.
O'Neill: No problem. Thank you for having us David. I really appreciate that and had a good time.
Berlind: It was great to have you. And thank you to everybody who watched this video. You know, if you want more videos just like this one, you can go to programmableweb.com, and we've got not only the videos loaded up there into articles, but we also have a full-text transcript of everything that was said in the interview so you can see that, too. Or if you just want to watch the videos, you can go to our YouTube channel at www.youtube.com/programmableweb. For now, I'm going to sign off. I'll see you at the next video. Thanks for joining us.