The Golden Snowflake Warehouse Metrics Part III – Load Monitoring

  • Martin Nettling
  • February 15, 2023

TL;DR: We observed that we waste a bigger part of our Snowflake compute costs on idling warehouses and wanted to understand why, so we set up Snowflake load monitoring. Based on the warehouse metrics productivity , suspendability , credits-per-hour , and cost-effectiveness , we assess our Snowflake warehouses for the impact of idle time. We conclude that there is significant potential to save compute costs by adjusting the load distribution among our warehouses.

Snowflake Load Monitoring

In a previous post, we looked into the states of our Snowflake warehouses and observed that many warehouses spend more time in an idle state than time in a busy state. Busy and idle warehouses burn an equal amount of credits per time, and in fact, our organization spent more than 75% of the compute costs in 2022 on idle warehouse time.

Problem

We want to avoid idling warehouses to save compute costs. For this purpose, we need more data about the utilization of our Snowflake warehouses to understand the situation and take action.

In this blog post, we explain how we measure the utilization of our Snowflake warehouses with an emphasis on the impact of idle time. The goal is to save compute costs by adjusting the workload distribution among our warehouses.

Results and Conclusion

We defined the warehouse metrics productivity , suspendability , credits-per-hour , and cost-effectiveness . We utilize these metrics as key performance indicators (KPIs) for the utilization of our Snowflake warehouses.

WAREHOUSE busy (%) idle (%) suspended (%) productivity (%) suspendability (%) Cost-effective- ness (s / $) Credits / h
ENGINEERING 2.76 22.66 74.58 10.87 76.70 654 0.254
E2E 6.90 13.00 80.09 34.67 86.03 1902 0.199
INTEGRATION_TEST 7.25 5.45 87.31 57.11 94.13 3973 0.127
LOAD 2.75 8.46 88.79 24.53 91.29 1493 0.112
DEMO 0.55 8.31 91.13 6.26 91.60 241 0.089
SALES 1.48 4.85 93.67 23.33 95.08 3221 0.063
QA_ONLY 0.93 4.00 95.07 18.79 95.96 1743 0.049
DAP_TEST 0.56 4.06 95.38 12.14 95.91 453 0.046
Our eight most expensive warehouses and the distribution of warehouse states and the warehouse metrics productivity , suspendability , cost-effectiveness, and credits-per-hour for 2022.

We computed the four metrics for all our warehouses and selected the eight most expensive warehouses according to the credits-per-hour metric, as depicted in the table above. We can draw some overall conclusions as follows.

  1. For most warehouses, we measure productivity values between 10% and 35% meaning that these warehouses spend much more time in an idle state than in a busy state.
  2. The measured suspendability values are mostly over 90% meaning that these warehouses can suspend well (and save compute costs!) in times without load. Please note that we changed the auto-suspend parameter from the default of 600s to 60s to increase the suspendability.
  3. We see differences in the cost-effectiveness of our warehouses by a factor of up to 16, indicating that the amount of compute time per credit varies heavily among our warehouses.

The ENGINEERING warehouse consumes the most credits per hour and has a cost-effectiveness of ~11 minutes of compute time for each credit ( e.g., six times less than the cost-effectiveness of the INTEGRATION_TEST warehouse). Therefore we have a more detailed look at this warehouse. It has a productivity of ~10%, meaning that each query executed by this warehouse causes a tenfold more idle time and hence a tenfold more cost per query compared to a warehouse under constant load. We measure a suspendability of ~75%, meaning that one-quarter of the load-free time is spent in an idle state and three-quarters in a suspended state.

Snowflake Load monitoring

Busy state, idle state, and suspended state of the ENGINEERING warehouse for a typical hour during a workday.

These metrics indicate that we sporadically fire short queries to the ENGINEERING warehouse. Many queries result in an auto-resume of the warehouse, are executed in a few seconds, and leave the warehouse idle for about a minute. The distribution of the warehouse states in the figure above confirms this hypothesis. The ENGINEERING warehouse is the best site to optimize our compute costs. Assuming we can halve the amount of idle time here by re-distributing the workload, we would save 45% of the compute costs for our top credit-burning warehouse. Halving idle time in all warehouses would save us 40% of our total compute costs—an encouraging finding to take action.

Solution – Snowflake Load Monitoring

We utilize the warehouse metrics productivity , suspendability , credits-per-hour , and cost-effectiveness to gauge the impact of the workload on our Snowflake warehouses. These metrics are based on the warehouse states busy, idle, and suspended, described in our post on warehouse states . The four warehouse metrics are defined as follows.

Productivity : Do you feel that your Snowflake warehouse is not very productive? You might well be right. The productivity metric is defined as the ratio of busy time to the sum of busy and idle time, and values range from zero (only idle time, no busy time) to 100% (no idle time at all). The productivity is usually well below 100%, meaning you pay for idle time, which can account for the lion’s share of your Snowflake costs.

Suspendability : You do not want to pay for an idling warehouse, but you have to. The suspendability metric is the ratio of suspended time to the sum of suspended and idle time. The values range from zero (no suspended time) to 100% (no idle time).   If it were possible to set the auto-suspend time to 0, there would be no idle time, and the suspendability would always be 100%. Since the auto-suspend time has a minimum of one minute, the suspendability will typically be below 100%, and the resulting costs for pure idle time might surprise you.

Credits-per-hour : Your active warehouse consumes credits depending on the warehouse size. You can lower the costs when your warehouse can suspend from time to time or choose a smaller warehouse size. This metric helps to focus on the warehouses, which account for the lion’s share of your compute costs.

Cost-effectiveness : You want as much compute time per credit as you can. The cost-effectiveness metric is defined as the ratio of compute time to burned credits (unit is seconds per credit) and helps you to assess whether your credits are well spent. The lower the cost-effectiveness, the more you should consider the workload distribution.

Outlook

We wanted to learn how we utilize our Snowflake warehouses and needed KPIs to assess which warehouses have the greatest potential to save compute costs; we did this by setting up Snowflake load monitoring. We could prioritize our warehouses using the warehouse metrics productivity, suspendability, credits-per-hour, and cost-effectiveness for future efforts. We will continue to work on warehouse metrics and can’t wait to see what other surprises are in store for us.

Contact us today to give feedback, or get in touch with us!

Authors: Martin Nettling, André Rouél, Hendrik Treutler 

Related Posts

Top 5 Snowflake tools for Analysts- talend

Top 5 Snowflake Tools for Analysts

  • Ndz Anthony
  • February 26, 2024