📖

Key Terms — Query Performance

Query Profile

Snowsight's visual execution plan for a completed query — shows operator tree, timing, pruning stats, and spilling information.

—

Micro-Partition Pruning

Snowflake's automatic optimisation that skips micro-partitions whose metadata (min/max) indicates they cannot match the query's WHERE clause.

—

Spilling

When query intermediate data exceeds warehouse memory and overflows to local SSD, then to remote cloud storage — a key performance warning sign.

—

Result Cache

A 24-hour Cloud Services cache that returns identical query results instantly with zero warehouse credits.

—

EXPLAIN

SQL command that shows the logical execution plan without actually running the query — no credits consumed.

—

Scale Up

Increasing warehouse size to provide more memory/compute per query — fixes spilling.

—

Scale Out

Adding multi-cluster capacity to handle more concurrent queries — fixes queueing.

Query Lifecycle

Every SQL statement in Snowflake follows the same execution path through three layers:

Query Execution Flow

Flow diagram: 1) User submits SQL. 2) Cloud Services Layer parses, optimises, and creates execution plan. 3) Virtual Warehouse executes the plan against micro-partitions in cloud storage. 4) Results returned via Cloud Services Layer. Each step labelled with what happens: parse/compile, optimise/prune metadata, distribute to compute nodes, scan/join/aggregate, return result set.

Flow diagram showing SQL submission through Cloud Services parsing and optimisation, then execution on Virtual Warehouse compute nodes, then results returned

Cloud Services — parses SQL, resolves objects, checks privileges, generates optimised execution plan, determines which micro-partitions to scan
Virtual Warehouse — executes the plan: scans micro-partitions, applies filters, performs joins, aggregations, and sorts
Cloud Services — returns results to client, caches result for potential reuse

ℹ️No Indexes, No VACUUM, No Manual Tuning

Unlike traditional databases, Snowflake has no indexes to create, no statistics to update, and no VACUUM to run. The query optimiser uses micro-partition metadata (min/max, NDV, null counts) automatically. Your main levers for performance are: warehouse sizing, clustering keys, and writing efficient SQL.

The Query Profile

The Query Profile in Snowsight is your most important diagnostic tool. It shows the actual execution plan as a tree of operator nodes.

Query Profile Operator Tree

Tree diagram of a typical Query Profile: Result node at top, below it an Aggregate node, feeding into a Join node with two branches — left branch shows TableScan on ORDERS with Filter, right branch shows TableScan on CUSTOMERS. Each node shows percentage of total time, rows processed, bytes scanned. The Join node consumes 45% of time. The TableScan shows partitions scanned 120 of 5000 total, indicating good pruning.

Key Operator Nodes

Node	Purpose
TableScan	Reads micro-partitions from storage — shows pruning stats
Filter	Applies WHERE clause predicates
Join	Combines data from multiple tables (hash join, merge join)
Aggregate	GROUP BY, COUNT, SUM operations
Sort	ORDER BY and merge join preparation
Exchange	Redistributes data between compute nodes — often the most expensive

What to Look For

Partitions scanned vs total — Is pruning effective?
Bytes spilled to local/remote storage — Is the warehouse too small?
Percentage bars — Which operators consume the most time?
Exploding joins — Does a join produce far more rows than its inputs?

Viewing Query Profile Programmatically

1-- Get the last query's ID
2SELECT LAST_QUERY_ID();
3
4-- Retrieve operator stats for any query
5SELECT *
6FROM TABLE(GET_QUERY_OPERATOR_STATS(LAST_QUERY_ID()));

Micro-Partition Pruning

Pruning is Snowflake’s primary query optimisation. Every micro-partition stores column-level metadata (min/max values). When your WHERE clause specifies a range or equality predicate, Snowflake compares the filter against each partition’s metadata and skips partitions that cannot contain matching rows.

Micro-Partition Pruning in Action

Eight micro-partitions labelled MP1-MP8 with date ranges. A WHERE clause for dates in February prunes 6 partitions (greyed out) and scans only 2 partitions (highlighted green). Shows: Partitions scanned 2 of 8 total — 75% pruning efficiency.

Diagram showing eight micro-partitions with date ranges, WHERE clause pruning six and scanning two

⚠️Pruning Killer: Functions on WHERE Columns

Applying a function to a column in WHERE disables pruning on that column. WHERE YEAR(order_date) = 2024 prevents Snowflake from using min/max metadata. Rewrite as WHERE order_date >= '2024-01-01' AND order_date < '2025-01-01' to preserve pruning.

Checking Pruning Effectiveness

1-- Run a filtered query and check Query Profile for partition stats
2SELECT order_id, customer_id, order_date, total_amount
3FROM sales.orders
4WHERE order_date BETWEEN '2024-06-01' AND '2024-06-30'
5AND region = 'EMEA';
6
7-- In Query Profile, click TableScan node:
8-- Partitions scanned: 150 / Partitions total: 12,000
9-- = 98.75% of partitions pruned

Spilling to Disk

When a query’s intermediate data exceeds warehouse memory, Snowflake spills data through a hierarchy:

Spilling Hierarchy

Three-tier pyramid: Top (green) = Memory/RAM (fastest, all processing ideally here). Middle (amber) = Local SSD (slower, data spills when memory exceeded). Bottom (red) = Remote Cloud Storage S3/Blob/GCS (slowest, severe performance degradation). Arrows point downward labelled Overflow.

Pyramid showing Memory at top, Local SSD in middle, Remote Storage at bottom

🔑Fixing Spilling: Scale Up the Warehouse

The primary fix for spilling is to increase warehouse size (e.g., Medium to Large). Each size doubles memory and compute. If spilling persists, also optimise the query: reduce data volume with better filters, avoid SELECT *, or break into smaller steps with temporary tables.

Finding Queries That Spill

1SELECT query_id, query_text, warehouse_name, warehouse_size,
2     execution_time / 1000 AS exec_seconds,
3     bytes_spilled_to_local_storage,
4     bytes_spilled_to_remote_storage
5FROM snowflake.account_usage.query_history
6WHERE bytes_spilled_to_remote_storage > 0
7ORDER BY bytes_spilled_to_remote_storage DESC
8LIMIT 20;

Query History

INFORMATION_SCHEMA vs ACCOUNT_USAGE Query History

Feature

INFORMATION_SCHEMA.QUERY_HISTORY

ACCOUNT_USAGE.QUERY_HISTORY

Retention

14 days

✓365 days

Latency

✓Real-time (no delay)

Up to 45 minutes

Scope

Current database only

✓Entire account

Dropped objects

Not available after drop

✓Available for full retention

Privileges

✓Any role with database access

IMPORTED PRIVILEGES on SNOWFLAKE db

🎯Exam Focus: 14 Days vs 365 Days

The exam tests the difference: INFORMATION_SCHEMA = 14 days, real-time, database-scoped. ACCOUNT_USAGE = 365 days, 45-minute latency, full account.

EXPLAIN Command

Using EXPLAIN

1-- Logical plan without execution (zero credits)
2EXPLAIN USING TABULAR
3SELECT c.name, SUM(o.total) AS total_spend
4FROM customers c
5JOIN orders o ON c.id = o.customer_id
6WHERE o.order_date >= '2024-01-01'
7GROUP BY c.name
8ORDER BY total_spend DESC;
9
10-- EXPLAIN shows operators and expected rows
11-- Query Profile shows ACTUAL execution after completion

Warehouse Sizing Strategy

Warehouse Sizing Decision Process

Identify the Problem

Check Query Profile for spilling, poor pruning, or queue times. Examine top 10 slowest queries via ACCOUNT_USAGE.

💡Look for bytes_spilled_to_remote_storage > 0 first — that's the biggest red flag.

Spilling? Scale Up

Increase warehouse size (Medium to Large, Large to X-Large). Each step doubles memory and compute.

SQL

ALTER WAREHOUSE analytics_wh SET WAREHOUSE_SIZE = 'X-LARGE';

Queueing? Scale Out

Enable multi-cluster warehouses (Enterprise+). Adds clusters for concurrent queries.

SQL

ALTER WAREHOUSE analytics_wh SET
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD';

Over-provisioned? Scale Down

If queries finish in seconds with no spilling, use a smaller warehouse to save credits.

SQL

ALTER WAREHOUSE dashboard_wh SET WAREHOUSE_SIZE = 'X-SMALL';

Scale Up vs Scale Out

Feature

Scale Up (Larger Warehouse)

Scale Out (Multi-Cluster)

Problem solved

Slow individual queries, spilling

Too many concurrent queries queueing

Effect on single query

✓Faster — more memory/compute per query

No change — same per-query resources

Effect on concurrency

Minimal improvement

✓Significant — queries distributed across clusters

Cost impact

Doubles credits per size step

Linear with clusters added

Best for

ETL, large aggregations, heavy joins

Dashboards, BI, many concurrent users

Common Anti-Patterns

⚠️Performance Anti-Patterns

SELECT * on large tables — reads all columns, Snowflake is columnar — specify only needed columns
Functions on WHERE columns — disables pruning: WHERE YEAR(date) = 2024
Unnecessary DISTINCT — forces expensive sort/aggregate
Cartesian joins — missing join conditions
Wrong data types — VARCHAR compared to NUMBER forces implicit casting
Overly complex CTEs — deeply nested CTEs can confuse the optimiser

Cheat Sheet

📋

Quick Reference

Query Performance Quick Reference

🔍

Query Profile

Access

Snowsight → Query History → select query → Query Profile

Pruning

TableScan node: partitions scanned vs total

Spilling

Sort/Join nodes: bytes spilled to local/remote

⚡

Warehouse Sizing

Spilling

Scale UP — increase warehouse size

Queueing

Scale OUT — multi-cluster (Enterprise+)

Result Cache

Same SQL + unchanged data = free, 24h TTL— No warehouse needed

📊

Query History

INFO_SCHEMA

14 days, real-time, database scope

ACCOUNT_USAGE

365 days, 45-min latency, full account

Practice Quiz

Query Performance

A Sort operator is spilling 50 GB to remote storage. What is the MOST appropriate fix?

Query Performance

A user runs the same SELECT twice in 10 minutes. Data unchanged. Warehouse is suspended. What happens on the second run?

Query Performance

A query filters WHERE YEAR(created_at) = 2024. Only 200 of 10,000 partitions are pruned. What is the best fix?

Flashcards

Query Performance

QUESTION

What are the two spilling locations and which is worse?

Click to reveal answer

ANSWER

First: local SSD (attached to warehouse nodes). Second: remote cloud storage (S3/Blob/GCS). Remote spilling is much slower and indicates the warehouse should be scaled up immediately.

Click to see question

Query Performance

QUESTION

What three conditions must be met for the result cache to fire?

Click to reveal answer

ANSWER

1) Exact same SQL text. 2) Underlying data unchanged since caching. 3) Same role and context (database, schema). Cache lasts 24 hours. No warehouse needed — served from Cloud Services.

Click to see question

Query Performance

QUESTION

INFORMATION_SCHEMA vs ACCOUNT_USAGE: retention and latency?

Click to reveal answer

ANSWER

INFORMATION_SCHEMA: 14 days, real-time, current database. ACCOUNT_USAGE: 365 days, up to 45-minute latency, full account scope.

Click to see question

Query Performance

QUESTION

How does EXPLAIN differ from Query Profile?

Click to reveal answer

ANSWER

EXPLAIN shows the logical plan BEFORE execution — zero credits. Query Profile shows ACTUAL execution AFTER completion — real timing, data volumes, and spilling.

Click to see question

Query Performance

QUESTION

When should you scale up vs scale out?

Click to reveal answer

ANSWER

Scale UP: individual queries are slow or spilling (more memory per query). Scale OUT (multi-cluster): many concurrent queries queueing (more clusters for concurrency).

Click to see question

Resources

Next Steps

Reinforce what you just read

Study the All flashcards with spaced repetition to lock it in.

Study flashcards →