How to choose the right CDW: Redshift or Snowflake

  • Datameer, Inc.
  • April 2, 2020
texture of red cargo ship container located with blue sky backgr

The big data and analytics industry has mainly lived up to the hype and transformed into the underlying force that drives businesses. Over the past 15 years, the vast amount of data generated every second has grown exponentially. This has led to the emergence of the enterprise cloud data warehouse (CDW for short) – the latest technology that efficiently manages large analytics workloads.

When I am at conferences, I receive many questions from our visitors on cloud warehouses (and how our solutions work with this technology). Questions span from what, who, why – and many more. The most common question is which enterprise cloud data warehouse is the “best.”

Data warehouses are critical to efficiently utilizing data to derive business insights. When folks ask which one is best – they really mean, “ Which data warehouse is best for my business? ” 

There are three CDW giants in the market at the moment:

Amazon Redshift

Google BigQuery

Snowflake

Here is an unbiased look at comparing two popular cloud warehouse providers to help you make the right choice.  We will look at two of the most popular relational DBMS database models on the market, Redshift, and Snowflake. Although there are more similarities than differences, these differences are quite significant.

What is Redshift?

Redshift can be described as a fully-managed, cloud-ready petabyte-scale data warehouse service that can be seamlessly integrated with business intelligence (BI) tools. So all you have to do is Extract, Transform, Load (ETL) into the warehouse to start making smarter business decisions. 

Amazon makes it quite easy for you to start with a few hundred gigabytes of data and scale up or down seamlessly, based on immediate demands. This allows businesses to leverage their data to acquire valuable business insights about themselves or their customers.

To launch your cloud data warehouse, you have to launch a set of nodes known as a Redshift cluster. Once you have provisioned the cluster, data sets can be uploaded to run data analysis queries. 

What is Snowflake?

Like Redshift, Snowflake is also a powerful relational database management system. It’s offered as an analytic data warehouse for structured and semi-structured data that follows a Software-as-a-Service (SaaS) model.

This means that it’s not built on top of an existing database or a big data software platform. Instead, Snowflake uses an SQL database engine with unique architecture that was specifically designed for the cloud. This data and analytics solution is also fast, user-friendly, and offers more flexibility than traditional data warehouses. 

Both Redshift ETL and Snowflake ETL have an abundance of similarities between the two solutions. However, there are additional unique capabilities and other functionalities that come with each platform. If you’re considering running your data analytics workload entirely on the cloud, for example, the similarities between these two solutions are far greater than their differences.

Snowflake offers cloud-based data storage and analytics in the form of the Snowflake Elastic Data Warehouse. In this scenario, users store data using cloud-based hardware and software. Data is stored in Amazon S3. If you’re using Snowflake ETL, you can leverage the public cloud ecosystem.

Both of these cloud warehouse systems are powerful and offer some unique features to managing data. However, there are definitely differences….

To choose the right solution for your company, you should, at the very least, compare integration, maintenance, and costs.  I’ll do just that in the following sections, along with the side-by-side pros and cons of both the solutions.

Comparing Redshift and Snowflake: Integration

If your company is already working with AWS, then Redshift might seem like the natural choice. However, you can also find Snowflake on the AWS Marketplace with on-demand functions.

If you’re already leveraging AWS services, Redshift can be integrated seamlessly.

However, if you are going to use Snowflake, it’s important to note that it doesn’t have the same integrations as Redshift. This will make it challenging to integrate the data warehouse with tools such as AWS Athena.

However, Snowflake makes up for this with a variety of integration options like Apache Spark. While Redshift is the more established solution, Snowflake has made some significant strides over the last few years.

Comparing Redshift and Snowflake: Maintenance

With Amazon’s Redshift, users compete over available resources in a cluster.  This problem doesn’t exist with Snowflake since users can start different data warehouses (of various sizes) to look at the same data.

When it comes to vacuuming and cleaning tables, Snowflake provides a turnkey solution. With Redshift, cleaning tables can become a problem as it can be challenging to scale up or down.

Redshift Resize operations can also become expensive and lead to significant downtime. Since compute and storage are separate in Snowflake, users do not have to copy data to scale up or down but can switch data compute capacity at will. 

Comparing Redshift and Snowflake: Costs

Both Snowflake ETL and Redshift ETL have very different pricing models.

Redshift calculates costs based on a per-hour per-node basis. So customers can calculate their  monthly cost in a simple formula:

Redshift Monthly Cost = [Price Per Hour] x [Cluster Size] x [Hours per Month]

Snowflake’s charges heavily depend on monthly usage patterns. Each bill is generated by the hour for each virtual data warehouse. Besides, data storage costs are also separate from computational costs.

Storage costs on Snowflake can start at an average compressed amount at a flat rate per terabyte. This will be accrued daily and billed each month. However, compute costs are per second or credit. Snowflake’s compute is referred to as virtual warehouses. The size of the virtual warehouse determines how fast queries will run.

This can quickly become complicated because Snowflake offers eight tiers of computational warehouses, with the smallest cluster costing one credit per hour. These costs will double as you go up a level up to 128 credits per hour for the 4XL tier.

As a result, Redshift tends to be less expensive compared to Snowflake’s on-demand pricing.

Comparing Redshift and Snowflake: Pros & Cons

Redshift Pros and Cons and Snowflake Pros and Cons

Conclusion

Your data warehouse decision has to be made based on your daily usage patterns and the amount of data you’re going to work with.  The choice between Redshift and Snowflake will be relative to your resources and specific business demands. Snowflake is a better platform to start and grow with. Redshift is a solid cost-efficient solution for enterprise-level implementations.

Data Transformation

If Snowflake is your choice, Datameer is the premier data transformation solution for Snowflake.

Datameer SaaS Data Transformation is the industry’s first collaborative, multi-persona data transformation platform integrated into Snowflake.  The multi-persona UI, with no-code, low-code, and code (SQL) tools, brings together your entire team – data engineers, analytics engineers, analysts, and data scientists – on a single platform to collaboratively transform and model data.  Catalog-like data documentation and knowledge sharing facilitate trust in the data and crowd-sourced data governance.  Direct integration into Snowflake keeps data secure and lowers costs by leveraging Snowflake’s scalable compute and storage.

Learn more about our innovative SaaS data transformation solution, Sign up for your free trial today!

Related Posts