Skip to main content
CacheSnowflake

Snowflake Caching

By Ottobre 20, 2023Marzo 5th, 2024No Comments

Snowflake is a cloud data platform that provides fast and scalable data warehousing, data sharing, and data analytics. One of the features that makes Snowflake efficient and cost-effective is its caching mechanism. It allows you to reuse previously computed results and avoid unnecessary data transfers. In this article, I will introduce the three layers of the Snowflake caching mechanism.

Metadata Cache

Snowflake metadata cache is the first layer of caching that Snowflake uses to store information about the data stored in Snowflake. Metadata cache helps Snowflake optimize query planning and execution by providing information such as:

  • Data structures: Metadata cache stores information about how data is organized in Snowflake, such as schemas, tables, columns, and partitions. This information helps Snowflake determine the best way to access and process the data for a given query.
  • Data statistics: Metadata cache stores information about the characteristics of data, such as the number of rows, the distribution of values, and the cardinality of columns. This information helps Snowflake estimate the cost and complexity of a query and choose the most efficient execution plan.
  • Data freshness: Metadata cache stores information about the last modified time of the data and the queries that have accessed or updated the data. This information helps Snowflake check if the data has changed since the last query and if the result cache or warehouse cache can be used to return the results.

Metadata Cache Updates

Snowflake metadata cache is updated automatically whenever the data or the queries change. Snowflake also uses a smart invalidation mechanism to ensure that the metadata cache is always consistent with the latest state of the data. Metadata cache is one of the reasons why Snowflake can provide fast and accurate query results on the cloud.

Query Results Cache

The Query Result Cache is the second layer of caching that Snowflake uses to store information about the data stored in Snowflake.

It’s a mechanism to improve query performance by storing the results of queries for a certain period. When a similar query is executed, Snowflake checks the query result cache to see if the results are already available. If so, Snowflake bypasses query execution and retrieves the results directly from the cache. This can save time and resources, especially for complex or frequently run queries.

Query Result Cache Persistence

You can overridden the default query result cache at the account, user, and session level using the USE_CACHED_RESULT session parameter. The query result cache has some limitations and conditions that affect its behavior:

  • The query result cache expires after 24 hours, unless a query reuses it. The cache can persist up to 31 days if many queries use it.
  • The query must syntactically match a previously executed query.
  • The table data contributing to the query result must not have changed.
  • The query must not include functions that must be evaluated at execution time.

Virtual Warehouse Cache

The warehouse cache of Snowflake is the third layer of caching that Snowflake uses. It is a feature that allows a virtual warehouse to store the data that it has scanned from tables in a local disk cache. This can improve the performance of queries that access the same or similar data, as they can read from the cache instead of fetching the data again from the tables.

The warehouse cache is different from the query result cache, which stores the results of queries for a certain period and can be accessed by any warehouse, on the other hand, warehouse cache is specific to each warehouse and is only available when the warehouse is running.

Warehouse Cache Behavior

The warehouse cache is managed automatically by Snowflake and does not require any configuration or tuning. However, there are some factors that affect the behavior and effectiveness of the warehouse cache:

  • The warehouse cache is dropped when the warehouse is suspended or resized.
  • The warehouse cache is shared among all clusters in a multi-cluster warehouse.
  • The warehouse cache depends on the micro-partitions that are scanned by the queries.

Useful Links

Using Query Result Cache: https://docs.snowflake.com/en/user-guide/querying-persisted-results

Optimizing the warehouse cache: https://docs.snowflake.com/en/user-guide/performance-query-warehouse-cache

Auteur

Leave a Reply