
Caching Mechanisms

Snowflake’s caching architecture is one of the most frequently tested topics on the COF-C02 exam. Unlike traditional databases that rely on a single buffer pool, Snowflake operates three independent caching layers, each serving a different purpose, persisting for a different duration, and carrying different cost implications. Mastering all three — and understanding exactly when each fires and when it does not — is essential for both the exam and real-world performance optimisation.

📖 Key Terms — Caching Mechanisms

Result Cache
A centralised, Cloud Services-layer store that holds the complete result set of every successfully executed query for up to 24 hours. Served at zero credit cost when a cache hit occurs.

Local Disk Cache
Per-virtual-warehouse SSD storage that caches raw columnar micro-partition data fetched from cloud object storage. Persists only while the warehouse is running.

Metadata Cache
A persistent, Snowflake-managed store in the Cloud Services layer that holds table and column statistics — row counts, min/max values, null counts, and distinct value estimates — used for query planning and pruning.

Cache Hit
The condition where a query or data request is satisfied entirely from a cache layer without accessing underlying cloud storage or re-executing compute.

Cache Invalidation
The process by which a cached entry is marked stale or removed. For the result cache, invalidation occurs when underlying table data changes via DML or DDL.

Cache Warming
The process of populating the local disk cache by executing queries that bring micro-partition data from cloud storage into warehouse SSD storage.

Non-Deterministic Function
A SQL function that can return different results on successive calls with identical inputs — for example, CURRENT_TIMESTAMP(), RANDOM(), and UUID_STRING(). Queries containing these functions cannot be served from the result cache.

Micro-Partition
Snowflake's internal storage unit. Each micro-partition is a contiguous, columnar, compressed file of approximately 50–500 MB of uncompressed data. Caching operates at this granularity.

Cloud Services Layer
Snowflake's always-on control plane responsible for authentication, query parsing and optimisation, metadata management, and hosting the result cache and metadata cache.

Serverless Credits
A credit-consumption model for Snowflake background services (such as Automatic Clustering and Snowpipe) that do not require a user-managed virtual warehouse.


Why Caching Matters on the Exam

The COF-C02 exam regularly presents scenarios where a candidate must identify which cache layer is responsible for a performance outcome, or explain why a cache did not fire. Common question patterns include:

  • A query returns instantly the second time — which cache is responsible?
  • A warehouse is suspended overnight — what happens to cached data?
  • Two different warehouses run the same query — do they share a cache?
  • A query uses CURRENT_TIMESTAMP() — will the result cache help?

Understanding the answers to these questions requires a thorough grasp of all three layers.

🎯 Exam Priority — Know All Three Caches

The COF-C02 exam tests each of the three cache layers separately. You must be able to distinguish between them by: where they live (Cloud Services vs warehouse SSD), how long they persist, what they store, and what invalidates them. Confusing the result cache with the local disk cache is one of the most common candidate errors.


The Three-Layer Caching Architecture

Snowflake Three-Layer Caching Architecture

The Cloud Services layer at the top hosts two of the three caches: the Result Cache, holding complete result sets for up to 24 hours, and the Metadata Cache, holding table statistics (row counts, min/max values, null counts, and distinct value estimates). Below it, the Compute layer contains multiple independent virtual warehouses, each with its own Local Disk Cache (SSD). A client query first checks the Result Cache; on a miss, Cloud Services routes it to a virtual warehouse, which checks its Local Disk Cache; on a miss there, the warehouse fetches micro-partition files from remote cloud storage (S3, Azure Blob, or GCS) and writes them to SSD for reuse. The Metadata Cache feeds the optimiser so partitions can be pruned before any data is read. The Result Cache and Metadata Cache are centralised and shared across the account; the Local Disk Cache is strictly per-warehouse and is lost on suspension.

Three-layer Snowflake caching architecture: Result Cache and Metadata Cache in Cloud Services layer, Local Disk Cache (SSD) per virtual warehouse, and cloud object storage at the bottom.

The three layers operate independently and serve different stages of query execution:

| Cache Layer | Location | What Is Stored | Duration | Cost When Hit |
| --- | --- | --- | --- | --- |
| Result Cache | Cloud Services | Complete result sets | Up to 24 hours | Zero credits |
| Local Disk Cache | Virtual warehouse SSD | Raw micro-partition data | While warehouse is running | Reduced credits vs cloud read |
| Metadata Cache | Cloud Services | Table/column statistics | Persistent (Snowflake managed) | Zero credits |

Layer 1 — Result Cache (Query Result Cache)

The result cache is the first cache layer checked for any incoming query. When Snowflake receives a SQL statement, the Cloud Services layer computes a fingerprint of the query text and checks whether an identical result set already exists in the result cache.

How the Result Cache Works

  1. The user submits a SQL query.
  2. Cloud Services parses the query and computes a cache key from the exact SQL text (including whitespace and case).
  3. If a matching entry exists in the result cache and is still valid (not invalidated by DML/DDL on the underlying tables), the result set is returned immediately.
  4. No virtual warehouse is activated. Zero compute credits are consumed.
  5. If no cache hit, the query proceeds to compute execution and the result is stored in the result cache upon completion.

Result Cache Hit vs Miss Decision Flow

The flowchart walks every incoming query through five checks in sequence: (1) is the SQL text byte-for-byte identical to a previously cached query, including case and whitespace? (2) has the entry been invalidated by DML or DDL on the underlying tables since the result was cached? (3) is the cached result less than 24 hours old? (4) does the query contain non-deterministic functions such as CURRENT_TIMESTAMP, RANDOM, UUID_STRING, or a sequence reference? (5) is the user running under the same role context as when the result was cached? Failing any check routes the query to a virtual warehouse for execution, after which the new result is stored in the result cache alongside a timestamp and a table change-detection token. Passing all checks returns the cached result immediately with zero compute credits consumed.

Flowchart of result cache decision logic: identical SQL text, no invalidation, within 24 hours, no non-deterministic functions, same role context — all five must be true for a cache hit.

Result Cache Rules β€” Exam Critical

🎯 Result Cache — Five Conditions for a Cache Hit

All five of the following conditions must be true for a result cache hit to occur:

  1. The SQL text must be byte-for-byte identical — same case, same whitespace, same schema qualification.
  2. The cached result must be less than 24 hours old. (Each reuse of a cached result resets the 24-hour retention period, up to a maximum of 31 days from first execution.)
  3. The underlying tables must not have been modified by DML or DDL since the result was cached.
  4. The query must not contain non-deterministic functions (CURRENT_TIMESTAMP, RANDOM, UUID_STRING, sequence references, etc.).
  5. The executing user must have the same role context as the original query.

If any one of these conditions fails, Snowflake falls through to full query execution.
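The five checks can be sketched as a single decision function. Everything below (the hash-based cache key, the entry fields, the function list) is an illustrative simplification, not Snowflake's internal implementation:

```python
import hashlib
import time

# Illustrative model of the five result-cache checks; names and data
# structures are invented for this sketch, not Snowflake internals.
NON_DETERMINISTIC = ("CURRENT_TIMESTAMP", "RANDOM", "UUID_STRING", "NEXTVAL")
TTL_SECONDS = 24 * 3600

def cache_key(sql_text):
    # Byte-for-byte: no normalisation of case or whitespace, so
    # "select 1" and "SELECT 1" produce different keys.
    return hashlib.sha256(sql_text.encode("utf-8")).hexdigest()

def is_cache_hit(sql_text, role, entry, now):
    if entry is None:                           # 1. identical SQL cached before?
        return False
    if now - entry["cached_at"] > TTL_SECONDS:  # 2. less than 24 hours old?
        return False
    if entry["tables_changed"]:                 # 3. no DML/DDL since caching?
        return False
    if any(fn in sql_text.upper() for fn in NON_DETERMINISTIC):  # 4. deterministic?
        return False
    if role != entry["role"]:                   # 5. same role context?
        return False
    return True

sql = "SELECT COUNT(*) FROM orders;"
now = time.time()
cache = {cache_key(sql): {"cached_at": now, "tables_changed": False, "role": "ANALYST"}}

print(is_cache_hit(sql, "ANALYST", cache.get(cache_key(sql)), now))
# True: all five checks pass
print(is_cache_hit(sql.lower(), "ANALYST", cache.get(cache_key(sql.lower())), now))
# False: the lower-cased text hashes to a different key, so no entry exists
```

Note how the case change alone produces a miss: the lookup happens on the raw text, before any semantic analysis of the query.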

What the Result Cache Stores and Where

The result cache lives in the Cloud Services layer, meaning it is:

  • Centralised — shared across all virtual warehouses in the account
  • Account-scoped — multiple warehouses or users can benefit from a result cached by any warehouse
  • Transparent — no user configuration is required; it is always active

The cache entry stores the complete result set. For very large result sets, this can consume significant Cloud Services storage, but this is managed entirely by Snowflake and has no user-visible cost impact.

Non-Deterministic Functions β€” Why They Block Caching

A non-deterministic function is one whose output can differ between calls even when given the same inputs. Because Snowflake cannot guarantee that re-serving a cached result would be correct (the function might produce a different value today), any query containing such a function is ineligible for result caching.

Queries That Will NOT Hit the Result Cache
-- These queries CANNOT be served from the result cache
-- because they contain non-deterministic functions.

-- CURRENT_TIMESTAMP returns a different value on every call
SELECT order_id, CURRENT_TIMESTAMP() AS run_time
FROM orders
WHERE order_date = '2024-01-15';

-- RANDOM() produces a new random value on every execution
SELECT TOP 100 customer_id, RANDOM() AS sample_weight
FROM customers;

-- UUID_STRING generates a unique identifier on each call
SELECT UUID_STRING() AS batch_id, COUNT(*) AS row_count
FROM transactions;

-- Sequence references (NEXTVAL) advance the sequence on each call
SELECT MY_SEQ.NEXTVAL, product_name
FROM products;

-- -------------------------------------------------------
-- These queries CAN be served from the result cache
-- (deterministic, no DML on underlying tables since last run)

SELECT COUNT(*) FROM orders WHERE order_date = '2024-01-15';

SELECT region, SUM(revenue) AS total_revenue
FROM sales
GROUP BY region
ORDER BY total_revenue DESC;

SELECT customer_id, first_name, last_name
FROM customers
WHERE country = 'GB';

SHOW Commands and the Result Cache

SHOW commands (e.g., SHOW TABLES, SHOW SCHEMAS, SHOW WAREHOUSES) retrieve metadata from the Cloud Services layer. These commands can benefit from the result cache in a similar fashion to standard queries, and they do not require a running virtual warehouse because they operate entirely within the Cloud Services layer.


Layer 2 — Local Disk Cache (Data Cache / Warehouse Cache)

The local disk cache is fundamentally different from the result cache. Rather than storing final query results, it stores raw columnar micro-partition data β€” the actual files fetched from cloud object storage (S3, Azure Blob, or GCS). This cache lives on the SSD storage attached to each virtual warehouse’s compute nodes.

How the Local Disk Cache Works

When a virtual warehouse executes a query, it must fetch the relevant micro-partition files from cloud object storage. These fetches are slower than reading from local SSD. Snowflake automatically caches the fetched micro-partition data on the warehouse’s SSD. On the next query that requires the same micro-partitions, Snowflake reads from SSD rather than cloud storage — significantly faster.

Local Disk Cache — Warm vs Cold Warehouse Behaviour

A side-by-side comparison of a cold warehouse (empty local disk cache) and a warm warehouse (populated local disk cache). On the cold side, the first query after a resume finds an empty SSD cache, so every required micro-partition (P1–P5 in the diagram) must be fetched over the network from remote cloud storage (S3, Azure Blob, or GCS) and written to SSD before it can be read: high latency. On the warm side, a subsequent query on the same data reads P1–P5 directly from the populated SSD without touching cloud storage: low latency. Two annotations capture the critical points: the local disk cache is lost when the warehouse is suspended or terminated (the next resume starts cold), and two different warehouses never share it, since each has its own independent SSD layer.

Side-by-side diagram: cold warehouse fetching from cloud storage vs warm warehouse reading micro-partitions from local SSD cache. Cache is lost on warehouse suspension.

Key Characteristics of the Local Disk Cache

🔑 Local Disk Cache Is Per-Warehouse and Ephemeral

The local disk cache is strictly per-virtual-warehouse. Two warehouses running the same query against the same table do not share local disk cache. Each warehouse must independently warm its own cache. Additionally, the local disk cache is entirely lost when a warehouse is suspended or terminated — the next resume starts with a completely cold cache.

Cache persistence:

  • Persists as long as the warehouse remains in a Running state
  • Lost immediately on Suspend or Terminate
  • Multi-cluster warehouses: each compute cluster within a multi-cluster warehouse maintains its own SSD cache

Cache size:

  • Determined by the warehouse size (larger warehouses have more nodes with more aggregate SSD capacity)
  • Not user-configurable directly; scales with warehouse size selection

Cache eviction:

  • Uses an LRU (Least Recently Used) eviction policy when SSD capacity is reached
  • Older, less-frequently-accessed micro-partition data is evicted first
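The read path and LRU eviction described above can be sketched as a toy read-through cache. The capacity, partition sizes, and fetch callback are all invented for illustration, not Snowflake internals:

```python
from collections import OrderedDict

# Toy model of the warehouse read path: serve from SSD when possible,
# otherwise fetch from cloud storage, evicting least-recently-used
# micro-partitions when SSD capacity is exceeded.
class LocalDiskCache:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.used_mb = 0
        self.partitions = OrderedDict()  # partition_id -> size_mb, in LRU order

    def read(self, partition_id, size_mb, fetch_from_cloud):
        if partition_id in self.partitions:
            self.partitions.move_to_end(partition_id)  # refresh LRU position
            return "ssd"                               # warm read: no network fetch
        fetch_from_cloud(partition_id)                 # cold read: slow object-storage fetch
        while self.partitions and self.used_mb + size_mb > self.capacity_mb:
            _, evicted_mb = self.partitions.popitem(last=False)  # evict LRU entry
            self.used_mb -= evicted_mb
        self.partitions[partition_id] = size_mb
        self.used_mb += size_mb
        return "cloud"

cache = LocalDiskCache(capacity_mb=300)
cloud_fetches = []
reads = [cache.read(p, 100, cloud_fetches.append)
         for p in ["P1", "P2", "P1", "P3", "P4", "P2"]]

print(reads)          # ['cloud', 'cloud', 'ssd', 'cloud', 'cloud', 'cloud']
print(cloud_fetches)  # ['P1', 'P2', 'P3', 'P4', 'P2']
```

Reading P4 forces the eviction of P2 (the least recently used entry), so the final read of P2 goes back to cloud storage — the same pattern a warehouse shows when its working set exceeds SSD capacity.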

Cache Warming Strategies

Because the local disk cache is lost on suspension, workload patterns and auto-suspend settings have a direct impact on cache effectiveness:

Optimising Local Disk Cache Warming for Repeated Workloads

Step 1: Identify Repeated Query Patterns

Use QUERY_HISTORY in the ACCOUNT_USAGE schema to identify which tables and columns are queried most frequently by your workload. Tables that are queried many times per hour benefit most from cache warming.

SELECT
  query_text,
  COUNT(*) AS execution_count,
  AVG(bytes_scanned) AS avg_bytes_scanned,
  AVG(percentage_scanned_from_cache) AS avg_cache_pct
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY query_text
ORDER BY execution_count DESC
LIMIT 50;

Step 2: Tune Auto-Suspend to Match Workload Patterns

For bursty workloads where queries arrive in waves, increase the auto-suspend delay to keep the warehouse running (and its cache warm) between query bursts. For overnight batch jobs that run once, a shorter auto-suspend is acceptable.

-- Keep warehouse alive for 10 minutes of idle time
-- to preserve the warm cache between query bursts
ALTER WAREHOUSE my_analytics_wh
SET AUTO_SUSPEND = 600; -- 600 seconds = 10 minutes
💡 The credit cost of keeping a warehouse running for an extra few minutes is often far less than the cost of re-scanning terabytes of cloud storage due to a cold cache.

Step 3: Use Dedicated Warehouses for Hot Datasets

Assign a dedicated virtual warehouse to workloads that repeatedly query the same set of tables (e.g., a reporting warehouse for a specific data mart). This ensures cache warming efforts are not diluted by unrelated queries from other teams.

💡 Shared warehouses serve many different workloads, reducing the likelihood that any given dataset stays warm in the cache.

Step 4: Monitor Cache Hit Rates in Query Profile

Open the Query Profile for any query in Snowsight. The 'Bytes Scanned from Cache' metric shows what percentage of data was read from local SSD vs cloud storage. A high cache percentage indicates a warm warehouse and efficient cache utilisation.

-- Alternatively, query QUERY_HISTORY for cache statistics
SELECT
  query_id,
  query_text,
  bytes_scanned,
  -- fraction of scanned bytes served from the warehouse's local disk cache
  percentage_scanned_from_cache
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
ORDER BY bytes_scanned DESC
LIMIT 20;

Step 5: Consider Warehouse Sizing for Cache Capacity

Larger warehouse sizes have more compute nodes and therefore more aggregate SSD storage. If your dataset is very large and cache misses are frequent despite a warm warehouse, consider scaling up the warehouse size to increase total SSD cache capacity.

💡 Scaling up increases both compute power and SSD cache capacity — but only scale up if cache misses are the identified bottleneck, not simply slow query execution.

Layer 3 — Metadata Cache (Cloud Services Layer)

The metadata cache is the most fundamental and persistent of the three layers. It is maintained entirely by Snowflake in the Cloud Services layer and stores statistical metadata about every table and its micro-partitions.

What the Metadata Cache Stores

For every table in Snowflake, the metadata cache maintains:

  • Row count — total number of rows in the table
  • Min / Max values — minimum and maximum values per column per micro-partition (used for pruning)
  • Null count — number of null values per column per micro-partition
  • Distinct value estimates — approximate cardinality per column
  • Micro-partition file references — locations of all micro-partition files in cloud storage

How Queries Use the Metadata Cache

ℹ️ Queries Answered Entirely from Metadata — No Warehouse Required

Certain query patterns can be answered entirely from the metadata cache, requiring no virtual warehouse activation and consuming zero compute credits:

  • SELECT COUNT(*) FROM table_name with no WHERE clause — the row count is in metadata
  • SELECT MIN(clustering_key_col), MAX(clustering_key_col) FROM table_name — min/max from metadata
  • SHOW TABLES, SHOW SCHEMAS, SHOW COLUMNS — entirely metadata-served
  • DESCRIBE TABLE table_name — metadata only

These are common exam distractors: candidates may assume a warehouse is needed, but these queries bypass compute entirely.

Metadata Cache and Query Pruning

The metadata cache is what enables micro-partition pruning — Snowflake’s most impactful performance optimisation. Before a virtual warehouse reads a single byte of data, the Cloud Services optimiser consults the metadata cache to determine which micro-partitions cannot possibly contain rows matching the WHERE clause.

For example, if a query filters on order_date = '2024-03-15' and a particular micro-partition’s metadata shows min_date = '2024-05-01' and max_date = '2024-07-31', that micro-partition is pruned (skipped) without being read. This happens at the Cloud Services layer before the warehouse executes.
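The pruning decision in the example above amounts to a range check against per-partition min/max metadata. A minimal sketch, with invented partition statistics:

```python
from datetime import date

# Invented per-partition min/max metadata for an order_date column,
# mirroring the example in the text. Not Snowflake internals.
partitions = {
    "P1": {"min_date": date(2024, 1, 1), "max_date": date(2024, 2, 29)},
    "P2": {"min_date": date(2024, 3, 1), "max_date": date(2024, 4, 30)},
    "P3": {"min_date": date(2024, 5, 1), "max_date": date(2024, 7, 31)},
}

def prune(partitions, target):
    # Keep only partitions whose [min, max] range could contain the target;
    # everything else is skipped before any data file is read.
    return [pid for pid, stats in partitions.items()
            if stats["min_date"] <= target <= stats["max_date"]]

# WHERE order_date = '2024-03-15': only P2 can match; P1 and P3 are pruned.
print(prune(partitions, date(2024, 3, 15)))  # ['P2']
```

A filter outside every partition's range (say, a 2023 date) prunes all partitions, so the warehouse reads nothing at all.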

Metadata cache persistence: The metadata cache is persistent and managed by Snowflake. It is updated automatically whenever DML (INSERT, UPDATE, DELETE, MERGE) or DDL operations occur on a table. Unlike the result cache and local disk cache, users have no mechanism to invalidate or manage the metadata cache — Snowflake handles this entirely.


Cache Comparison — Side by Side

Result Cache vs Local Disk Cache

| Feature | Result Cache | Local Disk Cache |
| --- | --- | --- |
| Location | Cloud Services layer (centralised) | Virtual warehouse SSD (per-warehouse) |
| What is cached | Complete query result sets | Raw micro-partition data files |
| Shared across warehouses? | Yes — any warehouse can hit a result cached by any other warehouse | No — each warehouse has its own isolated SSD cache |
| Credit cost on cache hit | Zero credits — no warehouse activation | Reduced credits vs cloud storage read, but warehouse still runs |
| Cache duration | Up to 24 hours (invalidated by DML/DDL) | While warehouse is running only (lost on suspend) |
| Invalidation trigger | DML or DDL on underlying tables; 24-hour expiry | Warehouse suspend or terminate |
| Non-deterministic functions | Block a cache hit — query must re-execute | No impact — caches raw data regardless of function use |
| Role context required? | Yes — same role as original cached query | No — role has no bearing on the SSD data cache |
| Exact SQL match required? | Yes — byte-for-byte identical SQL text | No — caches at the data-block level, not the query level |
| User configuration required? | None — on by default (can be disabled per session via USE_CACHED_RESULT = FALSE) | None — automatic; auto-suspend tuning affects warming |

Cache Invalidation Rules

⚠️ Cache Invalidation — Common Exam Trap

A frequently tested scenario: a query runs, returns results in 100ms (result cache hit). Then someone runs an INSERT on the underlying table. The very next identical query will NOT hit the result cache — the DML has invalidated it. This catches candidates who assume the 24-hour window is absolute.

Result Cache Invalidation

The result cache is invalidated for a specific query when:

  1. Any DML (INSERT, UPDATE, DELETE, MERGE, COPY INTO) is executed on any table referenced by the query
  2. DDL changes to the table (ALTER TABLE, DROP COLUMN, etc.)
  3. 24 hours have elapsed since the result was cached (even without any data changes)
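One way to picture the change detection behind rules 1 and 2: each cached entry carries a version snapshot of the tables it read, and any DML or DDL bumps the live version. This is an illustrative model, not Snowflake's actual mechanism:

```python
# Invented table-version registry; any DML/DDL on a table bumps its version,
# which silently invalidates every cached result that depends on it.
table_version = {"fact_sales": 7}

cached_entry = {
    "result": [("NORTH", 12345), ("SOUTH", 9876)],
    "versions": {"fact_sales": 7},   # snapshot taken when the result was cached
}

def still_valid(entry):
    # Valid only if no referenced table has changed since caching.
    return all(table_version[t] == v for t, v in entry["versions"].items())

print(still_valid(cached_entry))   # True: a repeat query could be a cache hit

table_version["fact_sales"] += 1   # an INSERT/UPDATE/ALTER runs on fact_sales

print(still_valid(cached_entry))   # False: the next identical query re-executes
```

The invalidation is passive: nothing is deleted when the INSERT runs; the stale entry simply fails the validity check at the next lookup.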

Local Disk Cache Invalidation

The local disk cache is lost when:

  1. The virtual warehouse is suspended (even briefly)
  2. The virtual warehouse is terminated
  3. For multi-cluster warehouses: when an individual cluster scales down

Metadata Cache Invalidation

The metadata cache is not invalidated by users — Snowflake updates it automatically and continuously. When DML occurs, the metadata for affected micro-partitions is updated immediately. The metadata cache is never “stale” from a correctness standpoint.

Testing Cache Behaviour — Practical Examples

-- ============================================================
-- RESULT CACHE TESTING
-- ============================================================

-- Run this query first — it executes against the warehouse
SELECT region, SUM(sales_amount) AS total_sales
FROM fact_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY region
ORDER BY total_sales DESC;

-- Run the EXACT same query again — it should hit the result cache
-- Check the Query Profile: "Query Result Reuse" indicator
SELECT region, SUM(sales_amount) AS total_sales
FROM fact_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY region
ORDER BY total_sales DESC;

-- Inserting data invalidates the result cache for fact_sales
INSERT INTO fact_sales VALUES (DEFAULT, 'NORTH', 99999, '2024-01-01');

-- Now the same query will NOT hit the result cache — it was invalidated
SELECT region, SUM(sales_amount) AS total_sales
FROM fact_sales
WHERE sale_date BETWEEN '2024-01-01' AND '2024-03-31'
GROUP BY region
ORDER BY total_sales DESC;

-- ============================================================
-- METADATA CACHE EXAMPLES — NO WAREHOUSE NEEDED
-- ============================================================

-- COUNT(*) on a large table — served from the metadata cache
-- No warehouse credit consumption
SELECT COUNT(*) FROM fact_sales;

-- MIN/MAX on a well-clustered column — from metadata
SELECT MIN(sale_date), MAX(sale_date) FROM fact_sales;

-- ============================================================
-- CHECKING CACHE USAGE IN QUERY HISTORY
-- ============================================================

SELECT
  query_id,
  query_text,
  execution_status,
  -- Result-cache hits typically show 0 bytes scanned and a very short
  -- elapsed time; the Query Profile shows a "Query Result Reuse" node
  bytes_scanned,
  start_time,
  end_time,
  total_elapsed_time / 1000 AS elapsed_seconds
FROM SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY
WHERE start_time >= DATEADD('hour', -1, CURRENT_TIMESTAMP())
ORDER BY start_time DESC
LIMIT 20;

Practical Cache Strategies

💡 Practical Tip — Standardise SQL Formatting for Result Cache Hits

Because the result cache requires byte-for-byte identical SQL, teams that generate queries programmatically (via BI tools, dbt, application code) should standardise SQL formatting: consistent capitalisation, whitespace, and schema qualification. Even a trailing space or a different alias can prevent a cache hit. Many BI tools have a “SQL normalisation” or “cache key” setting for this purpose.
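As a sketch of what such client-side normalisation might do (the helper is hypothetical; this is application-side hygiene, not a Snowflake feature):

```python
import re

def normalise_sql(sql):
    # Collapse every run of whitespace to a single space and trim the ends,
    # so formatting differences alone never change the query text that gets
    # fingerprinted. Case is deliberately left alone: string literals and
    # quoted identifiers are case-sensitive.
    return re.sub(r"\s+", " ", sql).strip()

a = "SELECT region,\n       SUM(revenue) AS total_revenue\nFROM sales\nGROUP BY region"
b = "SELECT region, SUM(revenue) AS total_revenue FROM sales GROUP BY region"

print(normalise_sql(a) == normalise_sql(b))  # True: both now share one query text
```

Applied at the point where queries are generated, the two formatting variants above collapse to a single text and therefore a single result-cache entry.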

When to Rely on Each Cache

| Scenario | Best Cache | Strategy |
| --- | --- | --- |
| Dashboard with fixed date-range queries | Result Cache | Standardise SQL; same role for all dashboard users |
| Large fact table scanned repeatedly in one session | Local Disk Cache | Keep warehouse running; increase auto-suspend delay |
| COUNT(*) and simple aggregations | Metadata Cache | No action needed — automatic |
| Ad-hoc analytical queries (different every time) | Local Disk Cache | Warm warehouse; consider a dedicated warehouse per team |
| Queries with CURRENT_DATE in the WHERE clause | Local Disk Cache | Result cache won’t fire (the date changes daily) |

Cheat Sheet — Caching for the Exam

📋 Quick Reference — Snowflake Caching (COF-C02)

⚡ Result Cache

  • Location: Cloud Services layer (centralised, account-wide)
  • Duration: up to 24 hours; invalidated by DML/DDL on referenced tables
  • Credit cost: zero (no warehouse activation needed)
  • SQL match: byte-for-byte identical (case and whitespace sensitive)
  • Role required: same role context (a different role means a cache miss)
  • Blocked by: non-deterministic functions (CURRENT_TIMESTAMP, RANDOM, etc.)
  • Shared across warehouses: yes (any warehouse can benefit)
  • User config needed: none (on by default; can be disabled per session with USE_CACHED_RESULT = FALSE)

💾 Local Disk Cache

  • Location: virtual warehouse SSD (per-warehouse, not shared)
  • Duration: while the warehouse is running (lost on suspend/terminate)
  • What is stored: raw micro-partition files (columnar data blocks)
  • Credit cost: reduced vs a cloud fetch (the warehouse is still active)
  • Shared across warehouses: no (each warehouse is independent)
  • Invalidation: warehouse suspend/terminate (not DML)
  • Warming strategy: run queries on the same data; keep the warehouse alive
  • Multi-cluster: per cluster, not shared (each cluster has its own SSD)

📊 Metadata Cache

  • Location: Cloud Services layer (centralised, always available)
  • Duration: persistent (Snowflake managed, always current)
  • What is stored: row counts, min/max values, null counts (per column per micro-partition)
  • Credit cost: zero (no warehouse needed for metadata-only queries)
  • Updates: automatic on DML/DDL (always accurate)
  • Used for: pruning and query optimisation before execution begins
  • COUNT(*) queries: served from metadata (no warehouse needed)
  • User management: none required (fully Snowflake managed)

Knowledge Check — Quizzes

  1. A BI dashboard runs the same SQL query every 5 minutes using the same Snowflake role. After the first run, subsequent executions return results in under 100 milliseconds. No one has modified the underlying tables. Which cache is responsible?

  2. A virtual warehouse processes a large query that scans 500 GB of data from cloud storage. The warehouse then auto-suspends due to inactivity. Two hours later, the warehouse resumes and runs the same query again. What is the expected behaviour?

  3. Which of the following SQL queries will ALWAYS be ineligible for the result cache, regardless of when it runs or whether the underlying data has changed?


Flashcards — Caching

Q: What are the three caching layers in Snowflake?
A: 1. Result Cache (Query Result Cache): stores complete result sets in Cloud Services for up to 24 hours; zero credits on a hit. 2. Local Disk Cache: stores raw micro-partition data on each warehouse's SSD; persists while the warehouse runs. 3. Metadata Cache: stores table/column statistics (row counts, min/max, nulls) in Cloud Services; persistent and Snowflake-managed.

Q: What five conditions must ALL be true for a result cache hit to occur?
A: 1. The SQL text must be byte-for-byte identical (case and whitespace sensitive). 2. The cached result must be less than 24 hours old. 3. The underlying tables must not have been modified by DML or DDL since caching. 4. The query must not contain non-deterministic functions (CURRENT_TIMESTAMP, RANDOM, etc.). 5. The executing user must be operating under the same role context as the original query.

Q: What happens to the local disk cache when a virtual warehouse is suspended?
A: It is completely lost. All micro-partition data cached on the warehouse's SSD is discarded. When the warehouse resumes, it starts with a cold cache and must re-fetch all required micro-partitions from cloud object storage on the next query execution.

Q: Do two different virtual warehouses share the local disk cache?
A: No. The local disk cache is strictly per-warehouse; each virtual warehouse has its own independent SSD layer. If Warehouse A caches micro-partitions for Table X, Warehouse B cannot access them and must fetch and cache them independently from cloud storage.

Q: Which types of queries can be answered entirely from the metadata cache without activating a virtual warehouse?
A: SELECT COUNT(*) FROM a table with no WHERE clause (the row count is in metadata), SELECT MIN(col)/MAX(col) on clustering key columns (min/max stored per partition), SHOW TABLES / SHOW SCHEMAS / SHOW COLUMNS, and DESCRIBE TABLE. All of these consume zero compute credits.

Summary

Snowflake’s three caching layers form a tiered defence against unnecessary computation and cloud storage reads:

  • Result Cache — the most powerful, serving entire result sets at zero cost, but requiring strict conditions (identical SQL, same role, no DML, no non-deterministic functions, within 24 hours)
  • Local Disk Cache — reduces cloud storage I/O by keeping frequently accessed micro-partitions on warehouse SSD, but is per-warehouse and ephemeral (lost on suspend)
  • Metadata Cache — always-on, Snowflake-managed statistics that power pruning and answer simple aggregate queries without a warehouse

For the exam, the most critical distinctions are: the result cache is centralised and zero cost; the local disk cache is per-warehouse and ephemeral; and the metadata cache is persistent and automatic. Understanding exactly what invalidates each cache — and when each will NOT fire — is the key to answering exam scenario questions correctly.

Study flashcards β†’