Nico Acosta for Propel

Posted on May 23, 2023 • Originally published at propeldata.com

How to reduce Snowflake costs: A five-point checklist

#snowflake #datawarehouse #cloudcosts

Facing high Snowflake costs?

Check out our practitioner's five-point checklist to reduce Snowflake costs and optimize your data warehouse usage.

Snowflake is an incredibly powerful and scalable data warehouse, but without proper management, your costs can skyrocket and get out of control. In this checklist, we will explore five key strategies for optimizing Snowflake costs, ensuring that your data warehouse remains efficient and cost-effective.

1. Set the correct warehouse auto-suspend

What is the Auto-Suspend feature?

Auto-Suspend is a Snowflake feature that allows your warehouse to automatically suspend itself after a specified period of inactivity. This helps to reduce costs by preventing unnecessary credit usage when the warehouse is not in use.

The default value and why it's inefficient

The default auto-suspend value is 10 minutes, but this is often too long for many workloads. For example, if a query lasts 20 seconds, the warehouse will still be charged for 10 minutes of usage, leading to unnecessary costs.

Example query for querying and modifying auto-suspend

To query the auto-suspend settings for all warehouses, use the following query:

SHOW WAREHOUSES;

The auto_suspend column specifies how long a running warehouse can remain inactive, in seconds, before automatically suspending and stopping credit usage. A null value means that the warehouse will never automatically suspend. You’ll want to avoid having any warehouse with a null auto-suspend. Typically, 60 , meaning an auto-suspend of a minute is a good place to start.

To modify the auto-suspend setting for a specific warehouse to 1 minute (60 seconds), you can run the following:

ALTER WAREHOUSE "<warehouse_name>"
  SET AUTO_SUSPEND = 60;

2. Right-size your warehouses: monitor remote disk spillage

What is remote disk spillage, and why it causes costs to increase

Remote disk spillage occurs when a virtual warehouse runs out of memory and begins spilling intermediate results to remote storage. This can lead to increased query times and higher Snowflake costs due to increased I/O operations and additional credit usage.

Identify the warehouses that have remote disk spillable

To monitor remote disk spillage, you can use Snowflake's QUERY_HISTORY function. Here's an example query to identify the warehouses with remote disk spillage over the last 30 days:

SELECT  
    WAREHOUSE_NAME,
    SUM(BYTES_SPILLED_TO_REMOTE_STORAGE) as TOTAL_BYTES_SPILLED_TO_REMOTE_STORAGE
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY WAREHOUSE_NAME;

Fixing remote disk spillage by increasing warehouse size

To fix remote disk spillage, you can increase the size of your warehouse. By doing so, you will allocate more memory to the warehouse and reduce the need for remote storage, ultimately lowering costs. With a larger warehouse, your queries will complete more than twice as fast. Make sure to monitor and adjust warehouse sizes as needed to prevent spillage.

3. Use a serving layer for high-concurrency sub-second queries

Sometimes, you require high-concurrency sub-second queries. These are critical for customer-facing dashboards, data APIs, and usage metering, where fast response times are essential.

How a serving layer on top of Snowflake works

A serving layer on top of Snowflake syncs data to high-speed storage, optimizes it, and serves it via an API without consuming Snowflake credits. This reduces Snowflake costs while providing high-performance analytics to your end-user applications.

It is a better alternative to traditional embedded analytics because it reduces the load on Snowflake. Furthermore, by providing an API, it offers greater flexibility and control.

Propel's data API platform provides engineering teams with a unified platform for delivering high-performance customer-facing analytics. Its serving layer solution offers an analytics backend with a GraphQL API and React UI component library, requiring no infrastructure scaling or management. To learn more about Propel, read the docs.

4. Identify and fix inefficient queries

Inefficient queries are the source of a lot of waste. In this section, we show you how to identify them and share some tips to optimize them.

How to identify inefficient queries

Inefficient queries can significantly contribute to higher Snowflake costs. To identify them, you can monitor the QUERY_HISTORY view and look for queries with long execution times or high resource consumption.

You can use the following query to identify your slowest queries in the last 30 days in Snowflake:

SELECT  
    QUERY_TEXT,       
    SUM(TOTAL_ELAPSED_TIME) AS TOTAL_ELAPSED_TIME,
    SUM(BYTES_SCANNED) AS BYTES_SCANNED
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE START_TIME >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY QUERY_TEXT
ORDER BY TOTAL_ELAPSED_TIME DESC

Tips to improve query efficiency

To make queries more efficient, consider the following tips:

Use selective filtering and avoid SELECT *.
Optimize JOIN operations.
Make sure tables have an ORDER BY.
Limit the number of rows returned using LIMIT.
Use materialized views for pre-computed results.
Leverage clustering keys to improve query performance.

5. Set up Resource Monitors

What are Resource Monitors?

Resource Monitors in Snowflake allow you to track and manage your credit usage, helping to prevent unexpected costs. You can set thresholds that, when exceeded, trigger actions such as suspending a warehouse or sending notifications.

Example: Creating a Resource Monitor

To create a Resource Monitor that suspends a warehouse when credit usage exceeds a specified amount, you can use the following SQL command:

CREATE RESOURCE MONITOR "monitor_name"
  WITH CREDIT_QUOTA = <quota>
  TRIGGERS
    ON 100 PERCENT DO SUSPEND;

Conclusion

Optimizing Snowflake costs is crucial for maintaining an efficient and cost-effective data warehouse. By following this five-point checklist, you can optimize warehouse usage by preventing remote disk spillage, adjusting auto-suspend settings, leveraging a serving layer for high-concurrency queries, and monitoring resource consumption. So, start implementing these strategies today and make the most of your Snowflake investment.

Top comments (0)

Some comments may only be visible to logged-in visitors. Sign in to view all comments.

DEV Community