Amazon Redshift Light Analysis

Good

  • The ability to provision huge databases as needed, without going through a costly and slow procurement process to obtain the hardware and software
  • The ability to scale to handle huge databases, perhaps well beyond the petabyte range
  • The potential to use an elastic set of resources to return result sets with enough speed to be actually relevant when operating a business
  • The potential to save huge amounts of money over the years versus the cost of using your own hardware and software
  • Built and optimized for aggregation queries over large data sets. To answer a question with Redshift, we just write a SQL query and get an answer within a few minutes, if not seconds.
  • From a user perspective, any of our developers can write a SQL query and have an answer to their question in less than 5 minutes. Moving from even a Hadoop-based workflow to an interactive console session with Redshift is a major improvement. I assume moving from an RDBMS to Redshift would be a big improvement as well.
  • Additionally, since much of the user-facing part of Redshift is based on PostgreSQL, there is a large ecosystem of mature, well-documented tools and libraries for us to take advantage of.
  • The web management console Amazon provides with Redshift is impressive. For a 1.0 product, the console is comprehensive and offers much more information than we expected it to.
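
To make the "just write a SQL query" point concrete, here is a minimal sketch of the kind of aggregation query meant above. It is only an illustration under stated assumptions: SQLite (from the Python standard library) stands in for Redshift so the example runs anywhere, and the clicks table, its columns, and the sample rows are hypothetical. Against a real cluster, the same GROUP BY query would be sent through a PostgreSQL-compatible client instead.

```python
import sqlite3


def daily_click_totals(rows):
    """Aggregate (day, link, hits) rows into per-day totals.

    SQLite is an assumed local stand-in for Redshift; the table
    and column names are hypothetical.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE clicks (day TEXT, link TEXT, hits INTEGER)")
    conn.executemany("INSERT INTO clicks VALUES (?, ?, ?)", rows)
    # A typical ad hoc aggregation: distinct links and total hits per day.
    query = """
        SELECT day, COUNT(DISTINCT link) AS links, SUM(hits) AS total_hits
        FROM clicks
        GROUP BY day
        ORDER BY day
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


# Hypothetical sample data, only for illustration.
sample = [
    ("2013-04-01", "a", 3),
    ("2013-04-01", "b", 5),
    ("2013-04-02", "a", 7),
]
print(daily_click_totals(sample))
```

The point of the sketch is the shape of the workflow: one declarative query, no job to write or schedule, which is what makes the move from a Hadoop-based workflow feel so different.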

Bad

  • The possibility of outages; an internal data warehouse goes down at times too, but any cloud failure will be public and will give cloud computing a black eye internally
  • The costs of data migration and integration; in many instances, you’ll need huge amounts of bandwidth to transmit the data from internal systems to the cloud-hosted Redshift, or you’ll be shipping USB drives via FedEx to Amazon Web Services
  • A lack of best practices; we have only just started with public cloud-hosted data warehouses and clearly have some things to learn
  • The possibility of higher costs; although many organizations will find cost savings with cloud-hosted databases such as Redshift, many will discover that their cloud computing bill is much higher than anticipated, perhaps exceeding the cost of an on-premises database
  • Security concerns with the public cloud, including the risk of data leakage
  • Depending on which instance accesses Redshift, traffic may travel over the public internet rather than MPLS, so access can be slow
  • Write performance of Amazon Redshift is relatively low compared to "classical" relational databases in your own data center, since all data has to be uploaded into the cloud
  • High Availability
  • Full table scan
  • Data Loading
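
The bandwidth concern in the migration bullet above is easy to check with back-of-the-envelope arithmetic. The sketch below estimates upload time from data size and link speed. The function name, the 100 Mbit/s link, and the 0.8 efficiency factor are all assumptions chosen for illustration, not measurements.

```python
def transfer_hours(data_bytes, link_mbps, efficiency=0.8):
    """Estimate the hours needed to upload data_bytes over a network link.

    link_mbps is the nominal link speed in megabits per second.
    efficiency is an assumed fraction of that speed actually achieved.
    """
    if link_mbps <= 0 or not 0 < efficiency <= 1:
        raise ValueError("link_mbps must be > 0 and efficiency in (0, 1]")
    bits = data_bytes * 8
    usable_bits_per_second = link_mbps * 1_000_000 * efficiency
    return bits / usable_bits_per_second / 3600


TB = 10**12  # decimal terabyte

# Hypothetical data sizes over an assumed 100 Mbit/s uplink.
for size_tb in (1, 10, 100):
    hours = transfer_hours(size_tb * TB, link_mbps=100)
    print(f"{size_tb} TB over 100 Mbit/s: about {hours / 24:.1f} days")
```

Under these assumptions a single terabyte takes a bit over a day, while 100 TB takes months, which is why shipping drives can be the faster option for an initial load.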

Very Good Reference: 

http://word.bitly.com/post/48854093418/speeding-things-up-with-redshift

Summary from above:

Redshift can speed up and expand our ad hoc data analysis.




About Me

Over 20 years of experience developing software to support multi-million dollar revenue scale and leading global engineering teams. Hands-on leadership in building and mentoring software engineering teams. I love History as a subject and also run regularly long distances to keep myself functional.
