The Amazon SimpleDB API

Amazon has once again led the industry by launching its latest infrastructure API, SimpleDB, a programmable database in the cloud (you can see more at our new SimpleDB API profile). It's a forward-thinking approach to a pay-as-you-go, scalable database, very much in line with Amazon's other popular infrastructure services like the S3 API for storage and the EC2 API for virtual computing. As they describe it:

Amazon SimpleDB is a Web Service for running queries on structured data in real time. This service works in close conjunction with Amazon Simple Storage Service (Amazon S3) and Amazon Elastic Compute Cloud (Amazon EC2), collectively providing the ability to store, process and query data sets in the cloud. These services are designed to make web-scale computing easier and more cost-effective for developers.

Traditionally, this type of functionality has been accomplished with a clustered relational database that requires a sizable upfront investment, brings more complexity than is typically needed, and often requires a DBA to maintain and administer. In contrast, Amazon SimpleDB is easy to use and provides the core functionality of a database - real-time lookup and simple querying of structured data - without the operational complexity. Amazon SimpleDB requires no schema, automatically indexes your data and provides a simple API for storage and access. This eliminates the administrative burden of data modeling, index maintenance, and performance tuning. Developers gain access to this functionality within Amazon’s proven computing environment, are able to scale instantly, and pay only for what they use.

Pricing is based on machine utilization ($0.14 per Amazon SimpleDB Machine Hour consumed), data transfer ($0.10 per GB - all data transfer in, $0.18 or less for data out), and structured data storage ($1.50 per GB-month).
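
To make the pricing concrete, here is a rough back-of-the-envelope sketch in Python. The rates are the ones quoted above; the function name and the sample workload are hypothetical, and the data-out rate uses the quoted $0.18 ceiling, so the result is an upper bound rather than an actual bill.

```python
# Rates quoted in the announcement (USD).
MACHINE_HOUR_RATE = 0.14        # per SimpleDB machine hour
TRANSFER_IN_PER_GB = 0.10       # per GB transferred in
TRANSFER_OUT_PER_GB_MAX = 0.18  # per GB transferred out ("$0.18 or less")
STORAGE_PER_GB_MONTH = 1.50     # per GB-month of structured data


def estimate_monthly_cost(machine_hours, gb_in, gb_out, gb_stored):
    """Return an upper-bound monthly cost in USD for the given usage.

    Hypothetical helper for illustration; not part of any Amazon library.
    """
    return (
        machine_hours * MACHINE_HOUR_RATE
        + gb_in * TRANSFER_IN_PER_GB
        + gb_out * TRANSFER_OUT_PER_GB_MAX
        + gb_stored * STORAGE_PER_GB_MONTH
    )


# Made-up workload: 10 machine hours, 5 GB in, 20 GB out, 2 GB stored.
cost = estimate_monthly_cost(10, 5, 20, 2)
print(f"Estimated upper bound: ${cost:.2f}")
```

For this made-up workload the four components are $1.40, $0.50, $3.60 and $3.00, so storage and outbound transfer dominate even at a small scale.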

Like S3, there's "Simple" in the name for a reason: it isn't aiming for a multitude of features but rather focuses on performing a core infrastructure service well. For example, the core data structure is like a hash or dictionary, not a full-blown relational model. There has already been a lot of discussion and debate about some of the tradeoffs here, such as the schemaless model, the 1024-character limit per attribute, and the need to zero-pad integers because query comparisons are lexicographical (see TechMeme for more). The outside developer community will likely build wrappers, libraries, and frameworks to work around and adapt to these.
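
The zero-padding issue is easy to see with plain string comparison. The Python sketch below does not call SimpleDB at all; it only illustrates why numbers stored as strings sort incorrectly and how fixed-width padding fixes that. The `zero_pad` helper and the width of 5 are assumptions made for illustration, and a real wrapper library would also need a scheme for negative numbers.

```python
def zero_pad(n, width=5):
    """Encode a non-negative integer as a fixed-width, zero-padded string.

    Hypothetical helper; the width must be chosen to fit the largest
    value you expect to store.
    """
    if n < 0:
        raise ValueError("this sketch only handles non-negative integers")
    text = str(n)
    if len(text) > width:
        raise ValueError(f"{n} does not fit in {width} digits")
    return text.zfill(width)


# Values stored as plain strings compare character by character.
raw = ["9", "10", "100"]
print(sorted(raw))      # ['10', '100', '9']  (not numeric order)

# Padding to a common width makes string order match numeric order.
padded = [zero_pad(int(v)) for v in raw]
print(sorted(padded))   # ['00009', '00010', '00100']
```

The same idea applies to any comparison in a query: `"9" > "10"` is true for strings, while `"00009" < "00010"` gives the answer a numeric comparison would.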

It seems clear that database storage in the cloud is a service that most of the major API providers will eventually offer. To track this, we've added a Database line item to our API Scorecard.