Amazon's Redshift for big data analytics -- the pros and cons

The cloud-based big data analytics database service is now available in limited release; here's what you should know

Amazon Web Services recently made its low-cost big data analysis service, Redshift, available to a limited number of users. You can think of Redshift as a public cloud meeting a big, honking relational database designed to support a data warehouse. You can also expect to see this cloud service tossed right into the faces of the big database vendors -- namely, Oracle.

Using the AWS Management Console or the Amazon Redshift API, enterprises can provision a single-node 2TB data warehouse. Alternatively, they can opt for a multinode cluster built from 2TB High Storage Extra Large (XL) nodes or 16TB High Storage Eight Extra Large (8XL) nodes. Each XL node comes with 15GB of RAM, and each 8XL node comes with 120GB. Pricing is a reasonable 85 cents per hour for an XL node and $6.80 per hour for an 8XL node.
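
To make those hourly rates concrete, here is a rough estimate of on-demand compute cost. This is a minimal sketch that assumes the per-node rates quoted above, a cluster left running around the clock, and an average of 730 hours per month. It ignores reserved-pricing discounts, backup storage, and data transfer, and the function names are hypothetical.

```python
from decimal import Decimal

# On-demand hourly prices per node as quoted in the article (early 2013).
# Current AWS node types and prices differ; treat these as illustrative only.
HOURLY_PRICE = {
    "XL": Decimal("0.85"),   # 2TB storage, 15GB RAM per node
    "8XL": Decimal("6.80"),  # 16TB storage, 120GB RAM per node
}
STORAGE_TB = {"XL": 2, "8XL": 16}

# Assumption: an always-on cluster, ~24 * 365 / 12 hours per month.
HOURS_PER_MONTH = 730


def monthly_cost(node_type: str, nodes: int) -> Decimal:
    """Estimated on-demand compute cost for a cluster running all month."""
    return HOURLY_PRICE[node_type] * nodes * HOURS_PER_MONTH


def cost_per_tb_month(node_type: str) -> Decimal:
    """Monthly cost of one node divided by that node's storage in TB."""
    return monthly_cost(node_type, 1) / STORAGE_TB[node_type]


if __name__ == "__main__":
    print(monthly_cost("XL", 4))      # 2482.00
    print(cost_per_tb_month("XL"))    # 310.25
    print(cost_per_tb_month("8XL"))   # 310.25
```

`Decimal` avoids floating-point rounding on currency. At these rates, both node types cost the same per terabyte ($310.25 per TB per month), so the choice between them comes down to cluster shape rather than unit price.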


As with any other technology, you have to consider the good and the bad aspects of Redshift. Here's what's good:

  • The ability to provision huge databases as needed, without going through a costly and slow procurement process to obtain the hardware and software
  • The ability to scale to handle huge databases, perhaps well beyond the petabyte range
  • The potential to use an elastic set of resources to return result sets with enough speed to be actually relevant when operating a business
  • The potential to save huge amounts of money over the years versus the cost of using your own hardware and software

And the bad:

  • The possibility of outages; your internal data warehouse goes down at times too, but a cloud failure will be public and will give cloud computing a black eye internally
  • The costs of data migration and integration; in many instances, you'll need huge amounts of bandwidth to transmit the data from internal systems to the cloud-hosted Redshift, or you'll be shipping USB drives via FedEx to Amazon Web Services
  • A lack of best practices; we've only just started using public cloud-hosted data warehouses and clearly have some things to learn
  • The possibility of higher costs; although many organizations will find cost savings with cloud-hosted databases such as Redshift, others will discover that their cloud computing bill is much higher than anticipated -- perhaps exceeding the cost of an on-premises database
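
The bandwidth problem is easy to quantify. The sketch below estimates how many days a bulk upload would take and compares that with shipping drives. It assumes decimal terabytes, a sustained link utilization of 80 percent, and a two-day shipping time. All three figures, and the function names, are illustrative assumptions rather than AWS numbers.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes: float, link_mbps: float, efficiency: float = 0.8) -> float:
    """Days needed to push `terabytes` of data over a `link_mbps` connection.

    `efficiency` is the assumed fraction of the nominal link rate actually achieved.
    """
    bits = terabytes * 1e12 * 8                   # decimal TB -> bits
    bits_per_second = link_mbps * 1e6 * efficiency
    return bits / bits_per_second / SECONDS_PER_DAY


def faster_to_ship(terabytes: float, link_mbps: float, shipping_days: float = 2.0) -> bool:
    """True if shipping drives (hypothetical transit time) would beat the network."""
    return transfer_days(terabytes, link_mbps) > shipping_days


if __name__ == "__main__":
    # 10TB over a 100Mbps line at 80% utilization.
    print(f"{transfer_days(10, 100):.1f} days")   # 11.6 days
    print(faster_to_ship(10, 100))                # True
```

Under these assumptions, even a modest 10TB warehouse ties up a 100Mbps line for well over a week. That is why physically shipping disks becomes a serious option.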

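Whether the cloud bill ends up higher is also a question of arithmetic over time. The following sketch finds the first month in which cumulative cloud spending overtakes an on-premises deployment that has an upfront hardware and software outlay plus ongoing running costs. Every input is a hypothetical placeholder to be replaced with real quotes. The model deliberately leaves out staffing, depreciation, and discounts.

```python
import math
from typing import Optional


def crossover_month(cloud_monthly: float, onprem_upfront: float,
                    onprem_monthly: float) -> Optional[int]:
    """First month in which cumulative cloud cost exceeds cumulative on-premises cost.

    Cloud cost after m months:       cloud_monthly * m
    On-premises cost after m months: onprem_upfront + onprem_monthly * m
    Returns None if the cloud never becomes more expensive.
    """
    monthly_gap = cloud_monthly - onprem_monthly
    if monthly_gap <= 0:
        return None  # cloud is never costlier per month, so it never catches up
    # Smallest integer m with monthly_gap * m > onprem_upfront.
    return math.floor(onprem_upfront / monthly_gap) + 1


if __name__ == "__main__":
    # Hypothetical inputs: $2,482/month in the cloud versus
    # $50,000 upfront plus $1,000/month on premises.
    print(crossover_month(2482, 50_000, 1_000))  # 34
```

In this made-up scenario, the cloud is cheaper for the first 33 months and more expensive from month 34 onward. That kind of result is worth knowing before committing to a migration.
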
I predict Redshift will succeed, as will others like it. However, let's open our eyes before we begin the migration. We need to take a breath and do some planning.

This article, "Amazon's Redshift for big data analytics -- the pros and cons," originally appeared at InfoWorld.com. Read more of David Linthicum's Cloud Computing blog and track the latest developments in cloud computing at InfoWorld.com. For the latest business technology news, follow InfoWorld.com on Twitter.

Copyright © 2013 IDG Communications, Inc.