APIs need to be reliable and available, and sudden, unexpected spikes in traffic can put both at risk. One way to ward off this danger is rate limiting. Paul Tarjan, on the Stripe blog, takes you through the various types of rate limiting, when to use them, and why.
Paul starts by listing the cases where you should consider implementing rate limiting: significant increases in traffic from particular users, users whose misbehaving scripts make too many calls, lots of low-priority requests clogging up the service, and the occasional incident where things go badly wrong and you need to guarantee that high-priority requests still get through.
A rate limiter is simply a way to limit the rate at which a client makes requests. Just as important can be a load shedder, which at peak times drops low-priority requests to ensure that high-priority requests go through and the system stays up. With these definitions out of the way, Paul lists the four types of rate limiter and load shedder.
Request Rate Limiter
This is the standard approach: it limits the number of requests any user can make in a second. Stripe, Paul says, rejects many millions of requests a month through its request rate limiter. Paul recommends keeping the same limit in development and production so that clients don't experience any side effects from different limits in the two environments.
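As a minimal sketch of a request rate limiter (not Stripe's actual implementation), a fixed-window counter keyed by user and second is enough to show the idea; the class name and the `now` parameter, included to keep the example deterministic, are my own:

```python
import time
from collections import defaultdict

class RequestRateLimiter:
    """Fixed-window limiter: at most `limit` requests per user per second."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = defaultdict(int)  # (user_id, second) -> request count

    def allow(self, user_id, now=None):
        window = int(now if now is not None else time.time())
        key = (user_id, window)
        if self.counts[key] >= self.limit:
            return False  # over the per-second limit; caller rejects the request
        self.counts[key] += 1
        return True
```

In production the counter would live in a shared store such as Redis rather than in process memory, and a rejected request would typically surface as an HTTP 429 response.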
Concurrent Requests Limiter
This limits the number of concurrent requests any user can make, which is useful when you have resource-intensive requests that can be slow. With such requests, impatient users may retry and clog up your service. For just such cases, Stripe limits every user to 20 in-flight requests, which helps it control CPU-intensive endpoints. Only 12,000 requests a month are rejected by this limiter.
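A concurrent requests limiter only needs a per-user counter that is incremented when a request starts and decremented when it finishes. The limit of 20 comes from the article; the class and method names in this in-process sketch are invented:

```python
import threading
from collections import defaultdict
from contextlib import contextmanager

class ConcurrentRequestsLimiter:
    """Caps the number of in-flight requests per user."""

    def __init__(self, max_concurrent=20):
        self.max_concurrent = max_concurrent
        self.in_flight = defaultdict(int)  # user_id -> in-flight request count
        self.lock = threading.Lock()

    @contextmanager
    def acquire(self, user_id):
        with self.lock:
            if self.in_flight[user_id] >= self.max_concurrent:
                raise RuntimeError("too many concurrent requests")
            self.in_flight[user_id] += 1
        try:
            yield  # the request handler runs here
        finally:
            with self.lock:
                self.in_flight[user_id] -= 1  # always release the slot
```

In a real service the counter would be shared across API servers (again, Redis is a common choice), and the rejection would surface as an HTTP 429.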
Fleet Usage Load Shedder
This ensures that a certain percentage of your fleet is always available for high-priority requests. Paul says to divide your API methods into critical and non-critical; Stripe uses a Redis cluster to keep track of the number of requests of each type. If non-critical requests reach their 80% allocation, further non-critical requests are rejected with a 500 error. Stripe rarely triggers this type of load shedder, but it has stopped outages.
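The decision logic can be sketched as follows. Stripe tracks these counts in a Redis cluster; this illustrative version keeps them in memory and invents its own names, with the 80% allocation taken from the article:

```python
class FleetUsageLoadShedder:
    """Reserves a slice of fleet capacity for critical requests."""

    def __init__(self, capacity, non_critical_fraction=0.8):
        self.capacity = capacity
        self.non_critical_limit = int(capacity * non_critical_fraction)
        self.in_flight = {"critical": 0, "non_critical": 0}

    def admit(self, kind):
        total = self.in_flight["critical"] + self.in_flight["non_critical"]
        if total >= self.capacity:
            return False  # fleet completely full
        if kind == "non_critical" and self.in_flight["non_critical"] >= self.non_critical_limit:
            return False  # non-critical traffic hit its 80% allocation -> HTTP 500
        self.in_flight[kind] += 1
        return True

    def release(self, kind):
        self.in_flight[kind] -= 1
```

The effect is that even when non-critical traffic is saturating its allocation, the remaining 20% of the fleet still admits critical requests.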
Worker Utilization Shedder
Most API services use a set of workers to respond to requests in parallel. Your last line of defense if the workers get backed up is to shed low-priority requests; this kind of load shedder should only be triggered during major incidents. Stripe divides its traffic into four categories: critical methods, POSTs, GETs, and test-mode requests. If workers become too busy, low-priority requests are dropped, starting with test-mode traffic. You need to shed load slowly so you don't experience flapping, where the problem disappears and then reappears; it takes trial and error to get this right. Stripe rejected only 100 requests last month because of this shedder.
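One way to picture the shedding order is a utilization threshold per priority level: as workers get busier, lower-priority traffic is cut first. The threshold values below are invented for illustration, and a production shedder would ramp them gradually (with hysteresis) to avoid the flapping Paul warns about:

```python
# Traffic categories from the article, lowest priority first.
# Threshold values are illustrative, not Stripe's.
SHED_THRESHOLDS = {
    "test_mode": 0.70,  # shed first as workers fill up
    "get": 0.80,
    "post": 0.90,
    "critical": 1.00,   # only dropped when workers are fully saturated
}

def admit(priority, worker_utilization):
    """Admit a request if utilization is below its priority's threshold."""
    return worker_utilization < SHED_THRESHOLDS[priority]
```

At 75% utilization this sketch drops test-mode traffic but still serves GETs, POSTs, and critical methods; at 95% only critical traffic survives.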
Applying Rate Limiters
To apply a rate limiter, Paul recommends the token bucket algorithm, which uses a centralized bucket host: each request takes a token from the bucket, and more tokens slowly drip back in. If the bucket is empty, you reject the request. You can implement the algorithm with Redis or Amazon ElastiCache.
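A minimal in-process token bucket looks like this (Stripe centralizes the bucket in Redis so all API servers share it; the `now` parameter here exists only to keep the example deterministic):

```python
import time

class TokenBucket:
    """Tokens drip in at `rate` per second up to `capacity`;
    each request takes one token, and an empty bucket means reject."""

    def __init__(self, rate, capacity, now=None):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)  # start full
        self.last_refill = now if now is not None else time.monotonic()

    def allow(self, now=None):
        now = now if now is not None else time.monotonic()
        # Drip in tokens for the time elapsed since the last check.
        self.tokens = min(self.capacity, self.tokens + (now - self.last_refill) * self.rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

With, say, `rate=10` and `capacity=100`, a user can burst up to 100 requests at once but then sustain only 10 per second.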
Paul recommends keeping a few things in mind before implementing your rate limiter. Show clear exceptions to users when the limit is reached, and roll your rate limiter out in a dark launch to see how it would affect your users before announcing it. Look at what traffic would have been blocked, decide whether that was right, and tune your parameters accordingly.