7 Ways to Save Cost on Snowflake - For Real!
- Ndz Anthony
- October 3, 2023
Snowflake has emerged as a leading data warehousing solution thanks to its speed, flexibility, and scalability. However, while powerful, Snowflake’s pricing model, which is based on the amount of storage and compute used, can lead to high costs if not managed properly. In this blog post, we’ll explore seven practical ways to save costs on Snowflake.
Snowflake’s Pricing Model
Before we dive into cost-saving strategies, it’s important to understand how Snowflake charges its customers. Snowflake’s pricing is based on three key factors:
- Storage Costs: Snowflake charges for the storage space used by your data. When data is loaded, Snowflake reorganizes it into its internal optimized, compressed, columnar format and stores this optimized data in cloud storage. The cost is based on the average amount of compressed data stored each month, including data retained for Time Travel and Fail-safe.
- Compute Costs: Snowflake processes queries using “virtual warehouses”. Each virtual warehouse is an MPP (massively parallel processing) compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses. The cost is measured in credits and depends on the size of the warehouse and how long it runs, billed per second with a 60-second minimum each time the warehouse starts or resumes. A running warehouse consumes credits even when no queries are executing.
- Cloud Services Costs: The cloud services layer in Snowflake is a collection of services that coordinate activities across Snowflake. These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch. The cost is based on the usage of these services, but Snowflake bills only the portion of daily cloud services usage that exceeds 10% of that day’s warehouse compute usage.
It’s also worth noting that Snowflake’s pricing can vary based on the specific Snowflake edition you’re using. Different editions come with different features and capabilities, which can impact the overall cost.
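To see how these components show up on a real account, one option is to query Snowflake’s built-in usage views. The sketch below assumes your role can read the shared SNOWFLAKE database (by default only ACCOUNTADMIN can) and uses the METERING_DAILY_HISTORY view in the ACCOUNT_USAGE schema. The 30-day window is an arbitrary choice.

```sql
-- Daily credits over the last 30 days, split into compute and cloud services.
-- CREDITS_ADJUSTMENT_CLOUD_SERVICES is the (negative) adjustment Snowflake applies
-- for cloud services usage that falls within the daily allowance.
SELECT
    usage_date,
    service_type,
    SUM(credits_used_compute)              AS compute_credits,
    SUM(credits_used_cloud_services)       AS cloud_services_credits,
    SUM(credits_adjustment_cloud_services) AS cloud_services_adjustment,
    SUM(credits_billed)                    AS credits_billed
FROM snowflake.account_usage.metering_daily_history
WHERE usage_date >= DATEADD(day, -30, CURRENT_DATE())
GROUP BY usage_date, service_type
ORDER BY usage_date, service_type;
```

Storage is tracked separately, for example in the STORAGE_USAGE view of the same schema.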
Now that we understand how Snowflake’s pricing works, let’s explore some effective strategies for reducing these costs.
1. Optimize Your Storage
One of the key areas where you can save costs in Snowflake is storage. Snowflake charges based on the amount of data stored in its optimized, compressed, columnar format. Therefore, managing your storage effectively can lead to significant cost savings.
Here are a few strategies to optimize your storage in Snowflake:
- Regularly clean up old or unnecessary data: Over time, your Snowflake environment can accumulate a lot of data that is no longer needed. Regularly deleting this data can free up storage space and reduce costs.
- Use Time Travel wisely: Snowflake’s Time Travel feature allows you to access historical data for a configurable retention period (1 day by default, and up to 90 days on Enterprise Edition and higher). However, this historical data counts towards your storage usage. Therefore, it’s important to set a Time Travel retention period that balances your need for historical data with your storage costs.
- Compress your data: Snowflake automatically compresses your data, but the level of compression can depend on the data type and structure. Therefore, it can be beneficial to choose data types and structures that result in higher compression.
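The strategies above could be applied roughly as follows. This is a minimal sketch: the table analytics.public.web_events, its event_date column, and the two-year cutoff are hypothetical, and the last query assumes your role can read the SNOWFLAKE.ACCOUNT_USAGE schema.

```sql
-- Shorten Time Travel retention on a table that does not need long history.
ALTER TABLE analytics.public.web_events SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Remove rows that are no longer needed (hypothetical column and cutoff).
DELETE FROM analytics.public.web_events
WHERE event_date < DATEADD(year, -2, CURRENT_DATE());

-- List the tables holding the most storage, including Time Travel and Fail-safe bytes.
SELECT
    table_catalog,
    table_schema,
    table_name,
    active_bytes      / POWER(1024, 3) AS active_gb,
    time_travel_bytes / POWER(1024, 3) AS time_travel_gb,
    failsafe_bytes    / POWER(1024, 3) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted = FALSE
ORDER BY active_bytes + time_travel_bytes + failsafe_bytes DESC
LIMIT 20;
```

Keep in mind that deleted rows continue to occupy storage until their Time Travel and Fail-safe periods have passed, so the savings from a cleanup appear with a delay.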
For more detailed strategies on optimizing your storage in Snowflake, check out this guide from Snowflake’s official documentation.
2. Picking Your Snowflake Edition: It’s All About the Fit
As you set off on your Snowflake journey, you’ll find yourself at a crossroads: which edition to choose? It’s a bit like shopping for a new outfit. You want something that fits you perfectly, not too loose, not too tight, and certainly not overpriced for the features you get.
Snowflake offers a wardrobe of editions, each tailored for different needs:
- Standard: Think of it as your everyday casual wear. It’s comfortable, practical, and gives you unlimited access to Snowflake’s standard features.
- Enterprise: This is your business suit, designed for those who need a bit more. It adds features such as multi-cluster warehouses, Time Travel of up to 90 days, and materialized views. (Role-based access control, by contrast, is available in every edition.)
- Business Critical: This is your high-security armor, equipped with advanced features like customer-managed keys and database failover and failback for business continuity.
- Virtual Private Snowflake (VPS): This is the most advanced edition, offering dedicated, managed, and secure environments for your most sensitive workloads.
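Just as you wouldn’t wear a tuxedo to a casual brunch, a small business might not need the robust features of the VPS edition. On the other hand, a larger enterprise handling sensitive data might find the Standard edition too basic for their needs. Editions also differ in price per credit, with higher editions costing more, so choosing an edition whose features you never use means paying a premium on every hour of compute.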
3. Use Auto-Suspend and Auto-Resume
One of the most effective ways to manage costs is by leveraging the Auto-Suspend and Auto-Resume features. These features are designed to optimize the usage of your virtual warehouses, ensuring you’re not paying for compute resources when they’re not in use.
Here’s how it works:
- Auto-Suspend: This feature automatically suspends your warehouse after a period of inactivity, which you can specify. When a warehouse is suspended, it doesn’t consume any compute resources, and therefore, you’re not billed for it. I’ve found that setting a short auto-suspend period can lead to significant cost savings, especially for warehouses that are used sporadically throughout the day.
- Auto-Resume: This feature automatically resumes a suspended warehouse when a new query is submitted. This ensures that your warehouse is ready to go when you need it, without you having to manually start it up. It’s a great way to maintain convenience and productivity while keeping costs in check.
By default, both auto-suspend and auto-resume are enabled. However, you can change these settings based on your needs. For instance, if your warehouse is used heavily during business hours but not at all overnight, you might set a short auto-suspend period and keep auto-resume enabled. This way, your warehouse automatically shuts down during the quiet hours, saving you money, and springs back into action when the workday begins.
Remember, these features are tools in your cost management toolkit. Use them wisely, and they can help you significantly reduce your Snowflake costs. For more information on how to configure these settings, check out this guide from Snowflake’s official documentation.
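As a rough illustration, both settings can be changed with a single ALTER WAREHOUSE statement. The warehouse name reporting_wh and the 60-second value are assumptions for the example, not recommendations for every workload.

```sql
-- Suspend after 60 seconds of inactivity and resume automatically on the next query.
-- AUTO_SUSPEND is expressed in seconds.
ALTER WAREHOUSE reporting_wh SET
    AUTO_SUSPEND = 60
    AUTO_RESUME = TRUE;

-- Check the current settings (see the auto_suspend and auto_resume columns).
SHOW WAREHOUSES LIKE 'reporting_wh';
```

A very short suspend period is not always the cheapest choice. Each resume is billed for at least 60 seconds, and a suspended warehouse loses its local data cache, so queries right after a resume may run slower.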
4. Leverage Snowflake’s Data Sharing Feature
Snowflake’s Data Sharing feature allows you to share data across your organization, or even with external business partners, without creating copies of the data. This not only saves storage space, but also ensures that everyone is working with the same, up-to-date data.
Here’s how you can leverage this feature to save costs:
- Share data without duplication: Traditional methods of data sharing often involve creating and distributing copies of the data. With Snowflake, you can share data directly from your existing tables and views. This eliminates the need for data duplication, saving storage space and reducing costs.
- Maintain data consistency: When you share data in Snowflake, any changes you make to the data are immediately reflected for all users. This ensures that everyone is working with the same, consistent data, reducing the need for data reconciliation and error correction.
- Control access to data: Snowflake allows you to control who can access the data you share, and what they can do with it. This allows you to securely share sensitive data, without the risk of unauthorized access or data leaks.
To share data in Snowflake, you create a share, grant the share privileges on the objects (tables, views, etc.) you want to include, and then add the accounts you want to share with. Users in those accounts can then access the shared data through a database created from the share.
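The sharing workflow might look like the following sketch. All object and account names (sales_share, sales_db, orders, partner_org.partner_account, provider_acct) are hypothetical placeholders, and the statements assume a role that is allowed to create shares.

```sql
-- Provider side: create the share and grant it access to specific objects.
CREATE SHARE sales_share;
GRANT USAGE ON DATABASE sales_db TO SHARE sales_share;
GRANT USAGE ON SCHEMA sales_db.public TO SHARE sales_share;
GRANT SELECT ON TABLE sales_db.public.orders TO SHARE sales_share;

-- Provider side: make the share visible to a consumer account.
ALTER SHARE sales_share ADD ACCOUNTS = partner_org.partner_account;

-- Consumer side: create a read-only database on top of the share.
CREATE DATABASE shared_sales FROM SHARE provider_acct.sales_share;
```

No data is copied in this process. The consumer pays for the compute used to query the shared data, while the provider continues to pay for storage.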
5. Optimize Query Performance
Not all queries are created equal. Some are efficient, getting you the data you need quickly and without consuming excessive resources. Others can be resource hogs, taking up more of your compute time (and budget) than necessary.
What you need to do is:
- Identify and optimize your queries: Use the query history and the Query Profile to find queries that run for a long time or scan far more data than they return. Once you’ve identified inefficient queries, you can take steps to optimize them. This might involve rewriting the query, defining clustering keys on very large tables, or changing your data model (standard Snowflake tables don’t use traditional indexes). Remember, even small improvements in query efficiency can add up to big savings over time.
- Use the QUERY_TAG parameter: This session parameter allows you to assign a tag to the queries you run. This can be incredibly useful for tracking and optimizing your queries, as you can filter your query history by tag to see exactly how your queries are performing.
Here’s an example of how you can set QUERY_TAG before running a query:
ALTER SESSION SET QUERY_TAG = 'Department Analysis';
SELECT department_id, COUNT(*) FROM employee_table GROUP BY department_id;
In this example, we’re assigning the tag ‘Department Analysis’ to every query that runs in the session after the ALTER SESSION statement. This allows us to easily find these queries in our query history and monitor their performance.
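Once queries carry a tag, their performance can be reviewed from the query history. The sketch below assumes the tag was set through the QUERY_TAG session parameter and that your role can read SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY. The 7-day window and the limit of 20 rows are arbitrary.

```sql
-- Slowest queries tagged 'Department Analysis' over the last 7 days.
-- TOTAL_ELAPSED_TIME is reported in milliseconds.
SELECT
    query_id,
    warehouse_name,
    start_time,
    total_elapsed_time / 1000 AS elapsed_seconds,
    bytes_scanned
FROM snowflake.account_usage.query_history
WHERE query_tag = 'Department Analysis'
  AND start_time >= DATEADD(day, -7, CURRENT_TIMESTAMP())
ORDER BY total_elapsed_time DESC
LIMIT 20;
```

Views in ACCOUNT_USAGE are populated with some delay, so very recent queries may not appear immediately.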
6. Monitor and Control Usage
As the volume of data we handle continues to grow, with an estimated 5.3 billion internet users worldwide in 2022 according to Statista, it’s more crucial than ever to keep a close eye on our data usage. Efficient data management isn’t just about handling large volumes of data; it’s also about managing our resources wisely.
Snowflake provides comprehensive usage analytics that allow you to see exactly how your resources are being used. This includes information on storage usage, compute usage, and the performance of your queries.
For instance, you might find that a particular warehouse is using more compute resources than necessary, or that a set of queries is taking longer to execute than expected. With this information, you can optimize your warehouses, rewrite inefficient queries, or even restructure your data to improve performance and reduce costs.
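One way to spot a warehouse that consumes more than expected is to rank warehouses by credits used. This is a minimal sketch that assumes read access to the SNOWFLAKE.ACCOUNT_USAGE schema. The 30-day window is an assumption you would adjust to your billing cycle.

```sql
-- Credits consumed per warehouse over the last 30 days, highest first.
SELECT
    warehouse_name,
    SUM(credits_used_compute)        AS compute_credits,
    SUM(credits_used_cloud_services) AS cloud_services_credits,
    SUM(credits_used)                AS total_credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD(day, -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name
ORDER BY total_credits DESC;
```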
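Snowflake also allows you to set resource monitors to prevent cost overruns. A resource monitor tracks credit usage against a quota and can notify you or suspend warehouses when a threshold is reached. This way, you can proactively manage your costs and avoid unpleasant surprises at the end of the month.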
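Such a limit can be expressed as a resource monitor attached to a warehouse. In this sketch the monitor name monthly_cap, the warehouse name reporting_wh, the quota of 100 credits, and the thresholds are all hypothetical values. Creating resource monitors typically requires the ACCOUNTADMIN role.

```sql
-- Cap a warehouse at 100 credits per month: notify at 80%, suspend at 100%.
CREATE RESOURCE MONITOR monthly_cap WITH
    CREDIT_QUOTA = 100
    FREQUENCY = MONTHLY
    START_TIMESTAMP = IMMEDIATELY
    TRIGGERS
        ON 80 PERCENT DO NOTIFY
        ON 100 PERCENT DO SUSPEND;

-- Attach the monitor to the warehouse it should control.
ALTER WAREHOUSE reporting_wh SET RESOURCE_MONITOR = monthly_cap;
```

With SUSPEND, running queries are allowed to finish before the warehouse stops. SUSPEND_IMMEDIATE cancels them instead, which is stricter on cost but more disruptive.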
7. Take Advantage of Snowflake’s Partner Network
No tool is an island. The real power comes from the ecosystem of solutions that work together to help you manage, analyze, and derive insights from your data. Snowflake understands this, which is why they’ve built a robust partner network.
One of the partners that I’ve had the opportunity to work with is Datameer. Now, I’m not here to sell you on Datameer, but rather to share my experiences and observations in the context of cost optimization.
How can Datameer help you optimize your Snowflake usage?
- Simplify data exploration and analysis: Datameer enables your analysts to explore, prepare, and share trusted Snowflake datasets, which can lead to more efficient data usage and cost savings.
- Manage resources and cost: One of the things I appreciate about Datameer is its focus on cost control. It gives you full visibility of your Snowflake’s analytics resources and costs, allowing you to manage your resources effectively and stay within budget.
- Fully embrace the Snowflake data cloud: Datameer is designed and optimized for Snowflake to reduce data movement and maximize the return on your Snowflake investment.
- Create a semantic layer for your business: Datameer promotes trust, collaboration, and faster decision-making with a business-friendly view of Snowflake datasets, lineage, and metadata.
According to Ryan Goodman, VP of Analytics at Reliant Funding, Datameer has “lowered our data engineering time by 500%”. This is just one example of how Datameer can help businesses optimize their Snowflake usage and reduce costs.
While it’s important to manage and optimize your Snowflake usage, it’s equally important to have the right tools to help you do so. Datameer is one such tool that I am sure can make your Snowflake journey more efficient and cost-effective.