Fast-growing API-as-product business Algolia offers “search as a service.” Its real-time search API is used by e-commerce businesses (like Birchbox and JadoPado), media content providers (like Hacker News), and web products and services (like the Sunrise calendar and DigitalOcean).
Already, the French startup has onboarded more than 500 customers in over 50 countries. In 2014, Algolia saw 30% month-over-month revenue growth, and it has handled more than 105 billion user queries since opening its service.
With customer churn now below 3% (according to VP of Engineering Sylvain Utard), one of Algolia’s main selling points for maintaining and extending its customer base is the reliability and low latency of its search. CEO and Co-founder Nicolas Dessaigne has previously said that for Algolia, “Performance is our DNA.” To keep latency low and provide instant, integrated search for its customers, Algolia operates 12 data centers around the world.
Algolia uses bare metal servers with the fastest CPUs available (Xeon E5 at more than 3.5 GHz). To provide its API service, it relies on in-memory databases, so that all data being searched is held in memory, backed by SSDs to speed up indexing. Algolia and its customers write 10 terabytes of data to a single disk every day; with three disks per machine and hundreds of machines, that adds up to several petabytes of new and updated data being written daily.
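The "several petabytes" figure follows from simple multiplication. The sketch below reproduces it; the per-disk volume and disk count come from the figures above, while the machine counts (100 and 300) are assumptions standing in for "hundreds of machines."

```python
# Back-of-the-envelope estimate of the daily write volume.
TB_PER_DISK_PER_DAY = 10   # stated in the article
DISKS_PER_MACHINE = 3      # stated in the article
MACHINES = 100             # assumption: "hundreds" taken at its lower bound


def daily_write_volume_pb(machines: int = MACHINES) -> float:
    """Return total data written per day in petabytes (1 PB = 1000 TB)."""
    total_tb = TB_PER_DISK_PER_DAY * DISKS_PER_MACHINE * machines
    return total_tb / 1000


print(daily_write_volume_pb())     # 100 machines -> 3.0 PB per day
print(daily_write_volume_pb(300))  # 300 machines -> 9.0 PB per day
```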
Algolia also applies its own kernel settings to every new Linux machine that comes online, and has optimized for failover by replicating data across three servers, so that even if search fails on two of them, performance does not degrade.
Dessaigne says that Algolia’s API call usage is shifting from servers to browsers and mobile apps, so the only way for the company to reduce latency is to get close to the end users who are doing the searches.
As a result, to support customers who rely on its search service, Algolia has created a distributed search network that lets customers choose which data centers around the world their data is replicated to. Wherever a customer’s users are, they need to be able to search against the closest server. “It is like a CDN for search,” Dessaigne has previously said.
At API Days Mediterranea last week, Nicolas Baissas, Customer Solutions Engineer at Algolia, shared eight “pro tips” on how Algolia manages its distributed API architecture.
The presentation was well received by the API Days Mediterranea audience, with one attendee, Ali Kheyrollahi (@aliostad), singling it out on May 6, 2015 as one of the best of the conference.
1. Logistics: No Ideal Worldwide Provider
Baissas pointed out that it is not as easy as just choosing one global cloud provider for their distributed architecture system. For example, Amazon doesn’t have global coverage (it does not have servers located in India, Eastern Europe, or Africa). As a result, Algolia uses eight providers to ensure it has a local server close to all of its user base. CEO Dessaigne has also previously noted that API providers wanting to build a similar global infrastructure must anticipate their needs: it can take up to three months to get new bare metal servers online.
2. Beware of Pricing
Data infrastructure costs also vary a lot by region. Servers in Brazil, for example, face additional taxes. For API providers, this can have an impact on their business model: “Do you change your business model in different regions?” Baissas asks. Dessaigne has previously noted that Algolia negotiated a unique price with providers in order to maintain a consistent business model across all regions.
3. Replication and Sync: KISS
Baissas says one of the most complex parts of a distributed architecture is deciding how data syncing will occur. Algolia’s solution was to keep it simple: a master server is responsible for one-to-one synchronization with all regional servers to ensure a single version of truth across the global distribution infrastructure.
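The one-to-one topology described above can be sketched in a few lines. This is an illustrative in-process model only, not Algolia's implementation: the `Master` and `Replica` classes, the version counter, and the region names are all assumptions, and a real system would push updates over the network with retries rather than through direct method calls.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Replica:
    """A regional server that only ever receives updates from the master."""
    region: str
    data: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def apply(self, version: int, key: str, value: str) -> None:
        self.data[key] = value
        self.version = version


@dataclass
class Master:
    """Single writer: every change goes here first, then to each replica."""
    replicas: List[Replica]
    data: Dict[str, str] = field(default_factory=dict)
    version: int = 0

    def write(self, key: str, value: str) -> None:
        self.version += 1
        self.data[key] = value
        # One-to-one sync: the master pushes the change to every replica.
        for replica in self.replicas:
            replica.apply(self.version, key, value)


replicas = [Replica("eu"), Replica("us"), Replica("asia")]  # hypothetical regions
master = Master(replicas)
master.write("sku-1", "red shoes")
master.write("sku-1", "blue shoes")
print(all(r.data == master.data for r in replicas))  # every replica matches
```

Because replicas never talk to each other, there is no conflict resolution to design: the master's state is the only version that matters.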
4. The Key is in the DNS
Algolia uses Anycast to help identify the closest server location when performing a search for end users. This relies on the geographic location of the user’s IP address to determine which data server should process the search. While this is not an exact science, it does work well in most cases.
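As a rough illustration of "closest server" selection, the sketch below picks the data center with the smallest great-circle distance to a user's location. It is a simplification under stated assumptions: the data center names and coordinates are hypothetical, the user's coordinates are assumed to have already been derived from an IP address, and real DNS-based routing follows network topology rather than pure geography.

```python
import math
from typing import Dict, Tuple

Coord = Tuple[float, float]  # (latitude, longitude) in degrees

# Hypothetical data center locations (approximate city coordinates).
DATA_CENTERS: Dict[str, Coord] = {
    "paris": (48.86, 2.35),
    "singapore": (1.35, 103.82),
    "san-jose": (37.34, -121.89),
}


def haversine_km(a: Coord, b: Coord) -> float:
    """Great-circle distance between two points on Earth, in kilometers."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def nearest_data_center(user: Coord) -> str:
    """Return the name of the data center geographically closest to the user."""
    return min(DATA_CENTERS, key=lambda name: haversine_km(user, DATA_CENTERS[name]))


print(nearest_data_center((-33.87, 151.21)))  # a user in Sydney -> singapore
print(nearest_data_center((51.51, -0.13)))    # a user in London -> paris
```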
5. TLD is Important
Top-level domains (TLDs) are the final segment of a domain name, for example ‘.com’ or ‘.au’. It turns out, says Baissas, that the TLD can in itself determine speed across a network. For example, domains ending in ‘.io’ tend to perform the slowest, as the .io domain has only six central servers. Algolia has found that ‘.net’ domains provide the best performance.
6. The Devil is in the Details
Baissas recommends that API providers keen to replicate the Algolia distributed architecture use a service like peeringdb.com to test peer access from a user perspective. For example, Australian search requests to a Singapore-based server may travel via Japan, then Hong Kong, and only then to Singapore because of a routing agreement Singapore has with Japan. This sort of routing can lead to search results being returned in 150 milliseconds rather than 40.
7. Build Your Own Monitoring
Algolia found it easiest to build its own monitoring system to manage its global data architecture. “Current solutions are not good enough, as you must test the quality worldwide,” says Baissas. Algolia’s monitoring infrastructure took one week to build and runs on cheap virtual private servers (VPS), with a VPS in each country. “The turning point for us was that we wanted to test if the indexing process worked, but this type of monitoring was difficult so we built our own,” says Baissas.
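A homegrown probe of this kind can be quite small. The sketch below is an assumption about what such a check might look like, not a description of Algolia's tooling: each location supplies a `check` callable (for example, index a test record and then search for it), and the runner records whether it succeeded and how long it took. In a real deployment each probe would run on its own VPS; here they run in one process so the example stays self-contained.

```python
import time
from typing import Callable, Dict


def run_probe(check: Callable[[], bool]) -> dict:
    """Run one check, returning its success flag and latency in milliseconds."""
    start = time.perf_counter()
    try:
        ok = bool(check())
    except Exception:
        # Any failure (timeout, connection error, bad response) counts as down.
        ok = False
    latency_ms = (time.perf_counter() - start) * 1000
    return {"ok": ok, "latency_ms": latency_ms}


def run_all(probes: Dict[str, Callable[[], bool]]) -> Dict[str, dict]:
    """Run the check registered for every location and collect the results."""
    return {location: run_probe(check) for location, check in probes.items()}


def failing_check() -> bool:
    raise ConnectionError("indexing endpoint unreachable")


# Hypothetical probes; real checks would call the search and indexing API.
results = run_all({"fr": lambda: True, "br": failing_check})
for location, result in results.items():
    print(location, result["ok"], round(result["latency_ms"], 3))
```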
8. Failover in API Clients
Baissas recommends that a distributed architecture do more than rely on DNS to shepherd API calls to the nearest data server. Algolia has ended up implementing retries in its API clients, so that an API call first tries the closest location and then falls back on the master machines in the region.
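The retry behavior described above can be sketched as a loop over an ordered host list. This is a minimal illustration, not Algolia's client code: the host names, the `send` callable, and the `AllHostsFailed` exception are hypothetical, and a production client would also add timeouts and backoff.

```python
from typing import Any, Callable, Sequence


class AllHostsFailed(Exception):
    """Raised when every host in the fallback list has been tried and failed."""


def search_with_failover(
    hosts: Sequence[str],
    send: Callable[[str, str], Any],
    query: str,
) -> Any:
    """Try each host in order and return the first successful response."""
    errors = []
    for host in hosts:
        try:
            return send(host, query)
        except ConnectionError as exc:
            errors.append((host, exc))  # remember the failure, try the next host
    raise AllHostsFailed(errors)


def fake_send(host: str, query: str) -> dict:
    """Stand-in transport: the closest host is down, the others answer."""
    if host == "closest.example.net":
        raise ConnectionError("unreachable")
    return {"host": host, "hits": [query]}


# Closest location first, then the region's master machines (hypothetical names).
hosts = ["closest.example.net", "master-1.example.net", "master-2.example.net"]
print(search_with_failover(hosts, fake_send, "shoes")["host"])  # master-1.example.net
```

Putting the fallback in the client means a request can still succeed when DNS points at a server that has just gone down.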
Baissas argues that the “Future of APIs is distributed.” While building a worldwide infrastructure has been a challenge for Algolia, it has now become a key asset in its business approach and a key point of differentiation in its market.
For more details on Algolia’s distributed API application architecture, Co-founder and CTO Julien Lemoine has documented the infrastructure on the High Scalability blog.