How to choose the right CDW: Redshift or Snowflake
- Adam Wealand
- April 2, 2020
The big data and analytics industry has mainly lived up to the hype and transformed into the underlying force that drives businesses. Over the past 15 years, the vast amount of data generated every second has grown exponentially. This has led to the emergence of the enterprise cloud data warehouse (CDW for short) – the latest technology that efficiently manages large analytics workloads.
When I am at conferences, I receive many questions from our visitors on cloud warehouses (and how our solutions work with this technology). Questions span from what, who, why – and many more. The most common question is which enterprise cloud data warehouse is the “best.”
Data warehouses are critical to efficiently utilizing data to derive business insights. When folks ask which one is best – they really mean, “Which data warehouse is best for my business?”
There are three CDW giants in the market at the moment:
Here is an unbiased look at comparing two popular cloud warehouse providers to help you make the right choice. We will look at two of the most popular relational DBMS database models on the market, Redshift, and Snowflake. Although there are more similarities than differences, these differences are quite significant.
What is Redshift?
Redshift can be described as a fully-managed, cloud-ready petabyte-scale data warehouse service that can be seamlessly integrated with business intelligence (BI) tools. So all you have to do is Extract, Transform, Load (ETL) into the warehouse to start making smarter business decisions.
Amazon makes it quite easy for you to start with a few hundred gigabytes of data and scale up or down seamlessly, based on immediate demands. This allows businesses to leverage their data to acquire valuable business insights about themselves or their customers.
To launch your cloud data warehouse, you have to launch a set of nodes known as a Redshift cluster. Once you have provisioned the cluster, data sets can be uploaded to run data analysis queries.
What is Snowflake?
Like Redshift, Snowflake is also a powerful relational database management system. It’s offered as an analytic data warehouse for structured and semi-structured data that follows a Software-as-a-Service (SaaS) model.
This means that it’s not built on top of an existing database or a big data software platform. Instead, Snowflake uses an SQL database engine with unique architecture that was specifically designed for the cloud. This data and analytics solution is also fast, user-friendly, and offers more flexibility than traditional data warehouses.
Both Redshift ETL and Snowflake ETL have an abundance of similarities between the two solutions. However, there are additional unique capabilities and other functionalities that come with each platform. If you’re considering running your data analytics workload entirely on the cloud, for example, the similarities between these two solutions are far greater than their differences.
Snowflake offers cloud-based data storage and analytics in the form of the Snowflake Elastic Data Warehouse. In this scenario, users store data using cloud-based hardware and software. Data is stored in Amazon S3. If you’re using Snowflake ETL, you can leverage the public cloud ecosystem.
Both of these cloud warehouse systems are powerful and offer some unique features to managing data. However, there are definitely differences….
To choose the right solution for your company, you should, at the very least, compare integration, maintenance, and costs. I’ll do just that in the following sections, along with the side-by-side pros and cons of both the solutions.
Comparing Redshift and Snowflake: Integration
If your company is already working with AWS, then Redshift might seem like the natural choice. However, you can also find Snowflake on the AWS Marketplace with on-demand functions.
If you’re already leveraging AWS services, Redshift can be integrated seamlessly.
However, if you are going to use Snowflake, it’s important to note that it doesn’t have the same integrations as Redshift. This will make it challenging to integrate the data warehouse with tools such as AWS Athena.
However, Snowflake makes up for this with a variety of integration options like Apache Spark. While Redshift is the more established solution, Snowflake has made some significant strides over the last few years.
Comparing Redshift and Snowflake: Maintenance
With Amazon’s Redshift, users compete over available resources in a cluster. This problem doesn’t exist with Snowflake since users can start different data warehouses (of various sizes) to look at the same data.
When it comes to vacuuming and cleaning tables, Snowflake provides a turnkey solution. With Redshift, cleaning tables can become a problem as it can be challenging to scale up or down.
Redshift Resize operations can also become expensive and lead to significant downtime. Since compute and storage are separate in Snowflake, users do not have to copy data to scale up or down but can switch data compute capacity at will.
Comparing Redshift and Snowflake: Costs
Both Snowflake ETL and Redshift ETL have very different pricing models.
Redshift calculates costs based on a per-hour per-node basis. So customers can calculate their monthly cost in a simple formula:
Redshift Monthly Cost = [Price Per Hour] x [Cluster Size] x [Hours per Month]
Snowflake’s charges heavily depend on monthly usage patterns. Each bill is generated by the hour for each virtual data warehouse. Besides, data storage costs are also separate from computational costs.
Storage costs on Snowflake can start at an average compressed amount at a flat rate per terabyte. This will be accrued daily and billed each month. However, compute costs are per second or credit. Snowflake’s compute is referred to as virtual warehouses. The size of the virtual warehouse determines how fast queries will run.
This can quickly become complicated because Snowflake offers eight tiers of computational warehouses, with the smallest cluster costing one credit per hour. These costs will double as you go up a level up to 128 credits per hour for the 4XL tier.
As a result, Redshift tends to be less expensive compared to Snowflake’s on-demand pricing.
Comparing Redshift and Snowflake: Pros & Cons
Your data warehouse decision has to be made based on your daily usage patterns and the amount of data you’re going to work with.
The choice between Redshift and Snowflake will be relative to your resources and specific business demands. Snowflake is a better platform to start and grow with. Redshift is a solid cost-efficient solution for enterprise-level implementations.
Regardless of which cloud data warehouse you choose, Datameer Spotlight can help you migrate and harness your data safely and securely. If you decide to go with both data warehouses, as many large organizations do, Datameer Spotlight will integrate Redshift and Snowflake, and your team will develop insights 20 times faster across the enterprise for blazing-fast analytics and data science. Reach out to one of our experts for a free trial.