Amazon Redshift is a petabyte-scale data warehouse service that makes it simple and cost-effective to analyze data using existing business intelligence tools. It is optimized for datasets ranging from a few hundred gigabytes to a petabyte or more and costs less than $1,000 per terabyte per year, a tenth the cost of most traditional data warehousing solutions. Amazon has announced that Redshift is now out of limited preview and generally available.
What is nice is the long list of vendors willing to partner with Amazon on Redshift, further underlining Amazon's status as not only the leading but also the most open cloud computing platform today. The Redshift partners are Actuate, Birst, Jaspersoft, MicroStrategy, Pentaho, Pervasive and Tableau Software (Business Intelligence partners); Attunity, Informatica and Talend (Data Integration partners); and Cognizant and Full360 (Systems Integrators and Consulting partners). For now, however, Amazon Redshift is available only in US East (N. Virginia), with additional regions coming soon.
Like every AWS service, you can create and manipulate an Amazon Redshift cluster using a set of web service APIs. You can create a cluster programmatically, keep it around as long as you need it, and then delete it, all with a couple of calls. You can connect to Amazon Redshift using many different SQL client applications, including SQL Workbench. The final step is to load data. You can bulk load data from Amazon S3 into an Amazon Redshift database with a single COPY command. To take advantage of parallelism, you can split your data into multiple files within the same S3 folder, and the bulk load command will load them in parallel across the Compute Nodes.
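The bulk load described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the table name, bucket, key prefix and credential placeholders are hypothetical, and the helper functions are not part of any AWS library. It relies on the fact that COPY treats the S3 path as a key prefix and loads every object that matches it, which is what lets the split files load in parallel.

```python
from typing import List


def split_into_parts(lines: List[str], parts: int) -> List[List[str]]:
    """Distribute rows round-robin into `parts` chunks of similar size."""
    chunks: List[List[str]] = [[] for _ in range(parts)]
    for index, line in enumerate(lines):
        chunks[index % parts].append(line)
    return chunks


def part_keys(prefix: str, parts: int) -> List[str]:
    """S3 object keys that share one prefix, one key per chunk."""
    return [f"{prefix}part-{number:04d}" for number in range(parts)]


def build_copy_statement(table: str, bucket: str, prefix: str,
                         access_key: str, secret_key: str,
                         delimiter: str = "|") -> str:
    """Build one COPY statement covering every object under the prefix."""
    return (
        f"COPY {table} FROM 's3://{bucket}/{prefix}' "
        f"CREDENTIALS 'aws_access_key_id={access_key};"
        f"aws_secret_access_key={secret_key}' "
        f"DELIMITER '{delimiter}';"
    )


if __name__ == "__main__":
    # Hypothetical names throughout; substitute your own table and bucket.
    rows = ["1|alice", "2|bob", "3|carol", "4|dave", "5|erin"]
    for key, chunk in zip(part_keys("sales/", 2), split_into_parts(rows, 2)):
        print(key, chunk)  # each chunk would be uploaded to S3 under this key
    print(build_copy_statement("sales", "my-bucket", "sales/",
                               "<access-key-id>", "<secret-access-key>"))
```

The resulting statement can be run from any SQL client connected to the cluster, such as SQL Workbench. A reasonable starting point is to make the number of files a multiple of the number of Compute Nodes, so that each node receives a similar share of the load.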
Amazon Redshift integrates directly with Amazon S3 and Amazon DynamoDB. Using AWS Data Pipeline, you can pull data from Amazon Elastic MapReduce, Amazon RDS, and your Amazon EC2 databases. Amazon Redshift supports Amazon VPC out of the box, and you can encrypt all your data and backups with just a few clicks.
The AWS documentation for the service includes the following guides:
- Amazon Redshift Cluster Management Guide – Shows you how to create and manage Amazon Redshift clusters. If you are an application developer, you can use the Amazon Redshift Query API to manage clusters programmatically. Additionally, the AWS SDKs for Java, .NET and other languages provide class libraries that wrap the underlying Amazon Redshift API to simplify your programming tasks. If you prefer a more interactive way of managing clusters, you can use the Amazon Redshift console and the AWS command line interface (AWS CLI). For information about the API and CLI, see the corresponding AWS reference manuals.
- Amazon Redshift Database Developer Guide – If you are a database developer, this guide explains how to design, build, query, and maintain the databases that make up your data warehouse.
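To make the programmatic cluster management mentioned above more concrete, here is a minimal Python sketch of the create, use, delete lifecycle. It assumes a client object exposing `create_cluster` and `delete_cluster` with the parameter names used by the AWS SDK for Python (for example `boto3.client("redshift")`); the helper names, the cluster identifier and the node type string are hypothetical and must be replaced with values valid for your account.

```python
from typing import Any, Callable, Dict


def create_cluster_request(cluster_id: str, node_type: str, node_count: int,
                           master_user: str, master_password: str) -> Dict[str, Any]:
    """Build the keyword arguments for a CreateCluster call."""
    request: Dict[str, Any] = {
        "ClusterIdentifier": cluster_id,
        "NodeType": node_type,  # caller supplies a node type valid for the account
        "MasterUsername": master_user,
        "MasterUserPassword": master_password,
        "ClusterType": "single-node" if node_count == 1 else "multi-node",
    }
    if node_count > 1:
        request["NumberOfNodes"] = node_count
    return request


def run_with_temporary_cluster(client: Any, request: Dict[str, Any],
                               work: Callable[[str], Any]) -> Any:
    """Create a cluster, run `work` against it, and always delete it afterwards."""
    cluster_id = request["ClusterIdentifier"]
    client.create_cluster(**request)
    try:
        # A real script would wait here until the cluster reports it is available.
        return work(cluster_id)
    finally:
        client.delete_cluster(ClusterIdentifier=cluster_id,
                              SkipFinalClusterSnapshot=True)
```

Passing the client in as an argument keeps the sketch independent of any particular SDK version and makes it easy to exercise with a stub before pointing it at a real account. Note that `SkipFinalClusterSnapshot=True` discards the data, so it only suits throwaway clusters.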
Amazon Redshift supports two types of data warehouse nodes: a High Storage Extra Large (XL) with 2 TB of storage and a High Storage Eight Extra Large (8XL) with 16 TB of storage. The price for an 8XL node is simply eight times the price of an XL node. Detailed pricing is available on the Amazon Redshift pricing page, and prices range from $0.850 per hour on demand to $0.114 per hour for reserved single-node instances.
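A quick calculation helps relate these hourly prices to the per-terabyte-per-year figure quoted earlier. The Python sketch below uses only the numbers in this post. One assumption is labeled in the code: the $0.114 figure is treated as an effective rate for one terabyte, because that reading is the one that lands just under $1,000 per terabyte per year. Check the official pricing page before relying on it.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours, ignoring leap years


def annual_cost_per_tb(hourly_price: float, storage_tb: float) -> float:
    """Yearly cost of one terabyte, given an hourly price and the capacity it buys."""
    return round(hourly_price * HOURS_PER_YEAR / storage_tb, 2)


XL_ON_DEMAND_HOURLY = 0.850  # from the post: on-demand XL node
XL_STORAGE_TB = 2
EIGHT_XL_STORAGE_TB = 16

# An 8XL costs eight times an XL and stores eight times as much,
# so the cost per terabyte is the same for both node types.
xl_per_tb = annual_cost_per_tb(XL_ON_DEMAND_HOURLY, XL_STORAGE_TB)
eight_xl_per_tb = annual_cost_per_tb(8 * XL_ON_DEMAND_HOURLY, EIGHT_XL_STORAGE_TB)

# Assumption: $0.114 is an effective hourly rate for one terabyte.
reserved_per_tb = annual_cost_per_tb(0.114, 1)

print(f"XL on demand:  ${xl_per_tb:,.2f} per TB per year")
print(f"8XL on demand: ${eight_xl_per_tb:,.2f} per TB per year")
print(f"Reserved:      ${reserved_per_tb:,.2f} per TB per year")
```

Under these assumptions, on-demand usage works out to roughly $3,723 per terabyte per year for either node type, while the reserved rate comes to about $998.64, consistent with the "less than $1,000 per terabyte per year" claim.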
This of course puts Amazon straight at the top of the cloud-based data warehouse market, and it will be interesting to see how traditional data warehouse vendors, as well as players like Windows Azure and Google Cloud APIs, react to this.
Terabytes of Data? Just another Amazon Redshift API call away!