
Key Terms β€” Snowflake SQL Essentials


VARIANT

A Snowflake data type that can store any semi-structured data, including JSON, Avro, ORC, Parquet, and XML. It holds up to 16 MB of compressed data per value.

β€”

FLATTEN

A Snowflake table function used with LATERAL to explode arrays or objects in a VARIANT column into individual rows.


Window Function

A SQL function that performs a calculation across a set of rows related to the current row (a window), without collapsing the result set like GROUP BY.

β€”

QUALIFY

A Snowflake-specific SQL clause that filters rows based on the result of a window function, avoiding the need for a subquery or CTE.

β€”

PIVOT

A SQL operator that rotates rows into columns, turning distinct values in one column into separate column headers (wide format).

β€”

UNPIVOT

The reverse of PIVOT β€” rotates columns into rows, converting wide-format data into a tall/narrow format.

β€”

GENERATOR

A Snowflake table function that produces synthetic rows without reading from any source table, useful for testing and data generation.

β€”

TABLESAMPLE

A SQL clause that returns a random sample of rows from a table, specified either as a percentage of rows or as a fixed number of rows.

β€”

PARSE_JSON

A Snowflake function that converts a JSON-formatted string into a VARIANT value for semi-structured data processing.


MATCH_RECOGNIZE

A SQL clause that performs row-pattern recognition, enabling complex event processing across sequences of rows.


Semi-Structured Data and the VARIANT Type

Snowflake is built for both structured and semi-structured data. The VARIANT data type is the cornerstone of Snowflake’s semi-structured data support, allowing you to store and query JSON, Avro, Parquet, ORC, and XML natively without needing to pre-define a schema.

🎯Exam Focus: VARIANT Fundamentals

The COF-C02 exam heavily tests VARIANT behaviour. Remember these key facts:

  • VARIANT can store any semi-structured data format (JSON, Avro, Parquet, ORC, XML)
  • Maximum size: 16 MB per row of compressed VARIANT data
  • Path access uses the colon (:) operator, not dot notation at the top level
  • Casting uses the double-colon (::) operator: col:key::STRING
  • VARIANT preserves the original data type internally (string vs number vs boolean)

Accessing Data Within a VARIANT Column

Snowflake provides several ways to traverse VARIANT data. The most common is the path operator (:) which navigates nested structures using key names.

VARIANT Path Access and Type Casting
-- Assume a table: events(id INT, payload VARIANT)
-- where payload contains: {"user": {"id": 42, "name": "Alice"}, "tags": ["a","b"]}

-- Basic path access β€” returns VARIANT
SELECT payload:user:name FROM events;

-- Cast to a specific SQL type using ::
SELECT
    payload:user:id::NUMBER AS user_id,
    payload:user:name::STRING AS user_name,
    payload:active::BOOLEAN AS is_active
FROM events;

-- GET() function β€” equivalent to the : operator
SELECT GET(payload, 'user') FROM events;

-- GET_PATH() function β€” dot-notation path
SELECT GET_PATH(payload, 'user.name')::STRING FROM events;

-- TYPEOF() β€” inspect the internal data type of a VARIANT value
SELECT
    TYPEOF(payload:user:id) AS id_type,     -- returns 'INTEGER'
    TYPEOF(payload:user:name) AS name_type, -- returns 'VARCHAR'
    TYPEOF(payload:tags) AS tags_type       -- returns 'ARRAY'
FROM events;

-- PARSE_JSON() β€” convert a string to VARIANT
SELECT PARSE_JSON('{"key": "value", "count": 99}') AS parsed;

-- OBJECT_CONSTRUCT() β€” build a VARIANT object from key-value pairs
SELECT OBJECT_CONSTRUCT('id', 1, 'name', 'Alice', 'score', 99.5) AS obj;

-- ARRAY_CONSTRUCT() β€” build a VARIANT array
SELECT ARRAY_CONSTRUCT(10, 20, 30, 'text', NULL) AS arr;

VARIANT Data Model β€” Path Access

How Snowflake stores a VARIANT column internally as a self-describing columnar structure, and how the colon operator traverses nested keys at query time. Each leaf value is stored with its native type tag, enabling efficient casting without full JSON parsing at runtime.

ℹ️VARIANT Storage: Columnar Under the Hood

Although VARIANT appears to store raw JSON as a blob, Snowflake internally decomposes it into a columnar representation. This means queries that access a specific path (e.g., payload:user:id) can benefit from column pruning and micro-partition pruning β€” they do not need to scan the full VARIANT blob for every row.


The FLATTEN Function

FLATTEN is a Snowflake table function that takes an array or object within a VARIANT column and expands it into multiple rows β€” one per element. It is used with LATERAL to correlate each exploded row back to its parent row.

FLATTEN β€” Exploding Arrays and Objects
-- Table: users(id INT, profile VARIANT)
-- profile: {"name": "Bob", "tags": ["admin","editor","viewer"]}

-- Explode the tags array β€” one row per tag
SELECT
    u.id,
    u.profile:name::STRING AS user_name,
    f.value::STRING AS tag,
    f.index AS tag_position
FROM users u,
    LATERAL FLATTEN(INPUT => u.profile:tags) f;

-- FLATTEN columns available:
-- f.VALUE β€” the element value (VARIANT)
-- f.KEY   β€” the key name (for objects; NULL for arrays)
-- f.INDEX β€” the zero-based position in the array
-- f.PATH  β€” full path to the element from the root
-- f.SEQ   β€” sequence number unique per input row
-- f.THIS  β€” the containing array/object

-- Flattening an OBJECT (key-value pairs)
SELECT f.key, f.value::STRING
FROM (SELECT PARSE_JSON('{"a":1,"b":2,"c":3}') AS obj),
    LATERAL FLATTEN(INPUT => obj) f;

-- OUTER => TRUE β€” keep rows even when the array is NULL or empty
SELECT u.id, f.value::STRING AS tag
FROM users u,
    LATERAL FLATTEN(INPUT => u.profile:tags, OUTER => TRUE) f;

-- RECURSIVE => TRUE β€” flatten nested arrays/objects recursively
SELECT f.path, f.value
FROM (SELECT PARSE_JSON('{"a":{"b":{"c":42}}}') AS nested),
    LATERAL FLATTEN(INPUT => nested, RECURSIVE => TRUE) f;
πŸ’‘OUTER => TRUE for Missing Arrays

If a row has a NULL or missing array path, a standard FLATTEN will silently drop that row from results. Use OUTER => TRUE to retain parent rows with no array elements β€” the FLATTEN output columns will be NULL for those rows.


Window Functions

Window functions perform calculations across a window of rows related to the current row. Unlike GROUP BY, they do not collapse the result set β€” every input row remains in the output.

Window Function Anatomy

Breakdown of a window function expression: the function name (ROW_NUMBER, RANK, SUM, etc.), the OVER clause, the optional PARTITION BY that defines sub-groups, the ORDER BY that sorts within each partition, and the optional frame specification (ROWS BETWEEN / RANGE BETWEEN) that limits which rows are included in the calculation.

Window Function Examples β€” Ranking and Navigation
-- ROW_NUMBER(): unique sequential number per partition
SELECT
    order_id,
    customer_id,
    order_date,
    ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
FROM orders;

-- RANK(): same rank for ties, then skips numbers (1,1,3,4...)
SELECT
    product_id, sales,
    RANK() OVER (ORDER BY sales DESC) AS sales_rank
FROM product_sales;

-- DENSE_RANK(): same rank for ties, no gaps (1,1,2,3...)
SELECT
    product_id, sales,
    DENSE_RANK() OVER (ORDER BY sales DESC) AS dense_rank
FROM product_sales;

-- LEAD() and LAG() β€” access rows ahead of or behind the current row
SELECT
    order_date,
    revenue,
    LAG(revenue, 1, 0) OVER (ORDER BY order_date) AS prev_day_revenue,
    LEAD(revenue, 1, 0) OVER (ORDER BY order_date) AS next_day_revenue,
    revenue - LAG(revenue, 1, 0) OVER (ORDER BY order_date) AS day_over_day_change
FROM daily_revenue;

-- FIRST_VALUE() and LAST_VALUE() within a window frame
SELECT
    department,
    employee,
    salary,
    FIRST_VALUE(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_max_salary,
    LAST_VALUE(salary) OVER (
        PARTITION BY department ORDER BY salary DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS dept_min_salary
FROM employees;

-- Running total and moving average
SELECT
    order_date,
    daily_sales,
    SUM(daily_sales) OVER (ORDER BY order_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS running_total,
    AVG(daily_sales) OVER (ORDER BY order_date
        ROWS BETWEEN 6 PRECEDING AND CURRENT ROW) AS seven_day_avg
FROM daily_sales;

Window Function Quick Reference

πŸ†
Ranking Functions
ROW_NUMBER()
Unique sequential integer per partition β€” no tiesβ€” Always unique, even for equal values
RANK()
Rank with gaps after ties (1,1,3,4...)β€” Skips the next number after a tie
DENSE_RANK()
Rank without gaps after ties (1,1,2,3...)β€” No skipped numbers
NTILE(n)
Divides rows into n equal-sized buckets (quartiles, deciles)β€” Returns bucket number 1..n
PERCENT_RANK()
Relative rank as 0.0 to 1.0 within partitionβ€” (rank-1)/(total rows-1)
CUME_DIST()
Cumulative distribution β€” fraction of rows <= currentβ€” Always > 0, always <= 1
πŸ”­
Navigation Functions
LAG(col, n, def)
Value from n rows before current rowβ€” Default returned if no prior row
LEAD(col, n, def)
Value from n rows after current rowβ€” Default returned if no next row
FIRST_VALUE(col)
First value in the window frameβ€” Frame spec matters β€” use UNBOUNDED
LAST_VALUE(col)
Last value in the window frameβ€” Defaults to CURRENT ROW β€” specify frame!
NTH_VALUE(col, n)
nth value in the window frameβ€” 1-based index
πŸ“Š
Aggregate Window Functions
SUM() OVER
Running total or partition sumβ€” Add ORDER BY for running total
AVG() OVER
Moving average across frameβ€” ROWS BETWEEN for fixed-size windows
COUNT() OVER
Count of rows in windowβ€” Useful for partition size
MIN() / MAX() OVER
Partition min/max without collapsing rows
πŸͺŸ
Frame Specification
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
All rows from start of partition to current rowβ€” Running total
ROWS BETWEEN 6 PRECEDING AND CURRENT ROW
Last 7 rows including currentβ€” 7-day moving window
ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
All rows in the partitionβ€” Equivalent to partition aggregate
RANGE BETWEEN
Includes rows with equal ORDER BY valuesβ€” Default frame type when ORDER BY present

QUALIFY β€” Snowflake’s Filter on Window Functions

QUALIFY is a Snowflake-specific SQL clause that filters rows based on the result of a window function. It is evaluated after window functions are computed and after HAVING, which means you can reference the window function result directly without wrapping the query in a subquery or CTE.

🎯QUALIFY Is Snowflake-Specific

QUALIFY does not exist in standard SQL or most other databases (though DuckDB has added it). It is a favourite exam topic because it demonstrates Snowflake’s SQL extensions. Know the query execution order: FROM β†’ WHERE β†’ GROUP BY β†’ HAVING β†’ WINDOW FUNCTIONS β†’ QUALIFY β†’ SELECT β†’ ORDER BY β†’ LIMIT.

QUALIFY β€” Deduplication and Top-N Filtering
-- Get the most recent order per customer (deduplication)
-- WITHOUT QUALIFY β€” requires a subquery:
SELECT * FROM (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) AS rn
    FROM orders
) WHERE rn = 1;

-- WITH QUALIFY β€” cleaner single-level query:
SELECT *
FROM orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

-- Top 3 products per category by sales
SELECT category, product_id, sales
FROM product_sales
QUALIFY DENSE_RANK() OVER (PARTITION BY category ORDER BY sales DESC) <= 3;

-- Remove duplicate rows, keeping the row with the highest id
SELECT *
FROM raw_events
QUALIFY ROW_NUMBER() OVER (PARTITION BY event_key ORDER BY id DESC) = 1;

-- QUALIFY with a WHERE clause β€” WHERE runs before QUALIFY
SELECT *
FROM orders
WHERE order_status = 'completed'
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

QUALIFY vs Subquery β€” Deduplication Approaches

Syntax complexity β€” QUALIFY: βœ“ single SELECT statement, no nesting; subquery/CTE: requires a wrapper query
Standard SQL β€” QUALIFY: Snowflake extension, not portable; subquery/CTE: βœ“ standard SQL, portable across databases
Readability β€” QUALIFY: βœ“ highly readable, the intent is clear; subquery/CTE: can be verbose for simple deduplication
Performance β€” both produce the same execution plan
Use in views β€” fully supported by both
Filter on multiple window functions β€” QUALIFY: βœ“ can reference several window functions in one clause; subquery/CTE: requires aliasing or multiple subqueries
Exam relevance β€” QUALIFY: βœ“ tested as a Snowflake-specific feature; subquery/CTE: baseline SQL knowledge only

PIVOT and UNPIVOT

PIVOT and UNPIVOT are SQL operators for reshaping data between wide (pivoted) and tall (normalised) formats. Snowflake supports both as part of the FROM clause.

PIVOT and UNPIVOT Examples
-- Source data: sales(region TEXT, month TEXT, amount NUMBER)
-- rows: ('North','Jan',100), ('North','Feb',200), ('South','Jan',150)...

-- PIVOT: rotate month values into columns
SELECT *
FROM sales
PIVOT(SUM(amount) FOR month IN ('Jan', 'Feb', 'Mar', 'Apr'))
    AS p (region, jan_sales, feb_sales, mar_sales, apr_sales);
-- Result: one row per region, columns jan_sales, feb_sales, mar_sales, apr_sales

-- UNPIVOT: rotate columns back into rows
-- Source: monthly_summary(region TEXT, jan NUMBER, feb NUMBER, mar NUMBER)
SELECT region, month_name, monthly_amount
FROM monthly_summary
UNPIVOT(monthly_amount FOR month_name IN (jan, feb, mar));
-- Result: three rows per region with month_name and monthly_amount

-- Dynamic PIVOT using Snowflake Scripting (for unknown category values)
-- Step 1: get distinct months
SET months = (
    SELECT '[''' || LISTAGG(DISTINCT month, ''',''') || ''']'
    FROM sales
);
-- Step 2: use in PIVOT (requires dynamic SQL via a stored procedure in practice)

-- ANY keyword for dynamic pivot (Snowflake 8.x+)
SELECT *
FROM sales
PIVOT(SUM(amount) FOR month IN (ANY ORDER BY month));
ℹ️PIVOT Column Values Must Be Static

In standard PIVOT syntax, the IN (...) list must contain literal values known at query-write time. For truly dynamic pivots where you do not know the column values in advance, you need to build the SQL string dynamically using a Stored Procedure or Snowflake Scripting. Snowflake’s IN (ANY) extension (introduced in newer versions) allows dynamic pivoting without scripting.
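As a concrete illustration of the scripted approach, a minimal Snowflake Scripting sketch might look like the block below. It reuses the illustrative sales table and month column from the examples above, and assumes an environment (such as a Snowsight worksheet) that accepts anonymous blocks; treat it as a sketch rather than production code.

```sql
DECLARE
    month_list STRING;
    stmt       STRING;
    res        RESULTSET;
BEGIN
    -- Build a quoted, comma-separated list of the distinct month values
    SELECT LISTAGG(DISTINCT '''' || month || '''', ', ')
      INTO :month_list
      FROM sales;

    -- Assemble the PIVOT statement with the discovered values and run it
    stmt := 'SELECT * FROM sales ' ||
            'PIVOT(SUM(amount) FOR month IN (' || month_list || '))';
    res  := (EXECUTE IMMEDIATE :stmt);
    RETURN TABLE(res);
END;
```

The same two-step shape β€” discover the distinct values, then splice them into a generated statement β€” is what the IN (ANY) extension now does for you automatically.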


GENERATOR and TABLESAMPLE

These two features are useful for testing, data generation, and sampling β€” and they appear in the COF-C02 exam.

GENERATOR and TABLESAMPLE
-- GENERATOR: produce synthetic rows
-- ROWCOUNT => number of rows to generate
-- SEQ4() or SEQ8() provide unique sequential integers
SELECT SEQ4() AS id,
       UNIFORM(1, 100, RANDOM())::INT AS score,
       DATEADD('day', SEQ4(), '2024-01-01')::DATE AS event_date
FROM TABLE(GENERATOR(ROWCOUNT => 1000000))
LIMIT 10;

-- GENERATOR with TIMELIMIT β€” run for N seconds instead of a fixed row count
SELECT SEQ4() AS id, RANDOM() AS val
FROM TABLE(GENERATOR(TIMELIMIT => 5));

-- TABLESAMPLE: return a random sample of rows
-- Percentage-based (the default method is row-level BERNOULLI sampling)
SELECT * FROM large_table SAMPLE (10); -- approx 10% of rows

-- Row-count based sampling
SELECT * FROM large_table SAMPLE (1000 ROWS);

-- Bernoulli sampling (row-level β€” more random, slower)
SELECT * FROM large_table SAMPLE BERNOULLI (5);

-- Block sampling (micro-partition level β€” faster, less random)
SELECT * FROM large_table SAMPLE BLOCK (5);

-- Reproducible sample with SEED (supported only with SYSTEM | BLOCK sampling)
SELECT * FROM orders SAMPLE BLOCK (10) SEED (42);

GENERATOR vs TABLESAMPLE

GENERATOR creates new rows from scratch using Snowflake's table function β€” no source table is needed, and the output volume is controlled by ROWCOUNT or TIMELIMIT. TABLESAMPLE reads from an existing table and returns a statistical subset, either at the block (micro-partition) level for speed or at the Bernoulli (row) level for better randomness.


TIME_SLICE and Date/Time Functions

TIME_SLICE is a Snowflake-specific function for grouping timestamps into fixed-duration buckets β€” extremely useful for time-series analysis.

TIME_SLICE and Key Date/Time Functions
-- TIME_SLICE: group timestamps into fixed intervals
-- Syntax: TIME_SLICE(timestamp_col, slice_length, 'interval_unit')
SELECT
    TIME_SLICE(event_ts, 15, 'MINUTE') AS fifteen_min_bucket,
    COUNT(*) AS event_count
FROM events
GROUP BY 1
ORDER BY 1;

-- Also supports HOUR, DAY, WEEK, MONTH, QUARTER, YEAR
SELECT TIME_SLICE(order_date, 1, 'MONTH') AS month_start FROM orders;

-- DATE_TRUNC: truncate a timestamp to a specified granularity
SELECT
    DATE_TRUNC('month', CURRENT_TIMESTAMP), -- first day of the current month
    DATE_TRUNC('week', CURRENT_TIMESTAMP),  -- Monday of the current week
    DATE_TRUNC('hour', CURRENT_TIMESTAMP);  -- start of the current hour

-- DATEADD: add/subtract intervals
SELECT
    DATEADD('day', 7, CURRENT_DATE),        -- 7 days from today
    DATEADD('month', -3, CURRENT_DATE),     -- 3 months ago
    DATEADD('hour', 12, CURRENT_TIMESTAMP);

-- DATEDIFF: difference between two dates in the specified unit
SELECT
    DATEDIFF('day', '2024-01-01', '2024-12-31') AS days_diff,     -- 365 (2024 is a leap year)
    DATEDIFF('month', '2024-01-01', '2024-12-31') AS months_diff; -- 11

-- DATE_PART and EXTRACT: extract components
SELECT
    DATE_PART('year', CURRENT_DATE),
    DATE_PART('month', CURRENT_DATE),
    DATE_PART('dow', CURRENT_DATE),          -- day of week (0 = Sunday)
    EXTRACT(epoch FROM CURRENT_TIMESTAMP);   -- seconds since 1970-01-01

-- Timezone conversion
SELECT
    CONVERT_TIMEZONE('UTC', 'Europe/London', CURRENT_TIMESTAMP) AS london_time,
    TO_TIMESTAMP_NTZ('2024-06-01 12:00:00') AS no_timezone_ts;

Conditional Expressions and NULL Handling

Conditional Expressions and NULL Handling Functions
-- IFF: compact if-then-else (Snowflake-specific)
SELECT IFF(salary > 50000, 'Senior', 'Junior') AS level FROM employees;

-- CASE WHEN: standard SQL conditional
SELECT
    CASE
        WHEN score >= 90 THEN 'A'
        WHEN score >= 80 THEN 'B'
        WHEN score >= 70 THEN 'C'
        ELSE 'F'
    END AS grade
FROM student_scores;

-- DECODE: value-based mapping (Oracle-style β€” Snowflake supports it)
SELECT DECODE(status, 'A', 'Active', 'I', 'Inactive', 'Unknown') FROM users;

-- NULL handling functions
SELECT
    NVL(commission, 0) AS commission,                  -- replace NULL with 0
    NVL2(commission, 'Yes', 'No') AS has_commission,   -- NVL2(col, not-null-val, null-val)
    NULLIF(revenue, 0) AS safe_revenue,                -- return NULL if revenue = 0
    COALESCE(phone, mobile, email, 'none') AS contact, -- first non-null
    ZEROIFNULL(discount) AS discount,                  -- NULL -> 0
    NULLIFZERO(quantity) AS quantity                   -- 0 -> NULL
FROM sales;

-- String functions
SELECT
    SPLIT('a,b,c,d', ',') AS arr,                      -- returns an ARRAY (VARIANT)
    SPLIT_PART('hello.world.test', '.', 2) AS part,    -- 'world'
    TRIM(' hello ') AS trimmed,
    LTRIM('---hello', '-') AS left_trim,
    CONTAINS('hello world', 'world') AS found,         -- TRUE
    STARTSWITH('hello', 'hel') AS sw,                  -- TRUE
    ENDSWITH('hello', 'llo') AS ew,                    -- TRUE
    RPAD('abc', 6, '*') AS rpadded,                    -- 'abc***'
    LPAD('42', 5, '0') AS lpadded,                     -- '00042'
    CHARINDEX('world', 'hello world') AS pos           -- 7
FROM dual;
πŸ’‘NVL2 Argument Order β€” Easy to Mix Up

NVL2(expr, not_null_result, null_result) β€” the second argument is returned when expr IS NOT NULL, and the third argument when expr IS NULL. This is the opposite order from what many developers expect intuitively.
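A quick way to fix the order in memory is to run NVL2 on literals β€” no table is required:

```sql
SELECT
    NVL2('x',  'expr was NOT NULL', 'expr was NULL'),  -- returns 'expr was NOT NULL'
    NVL2(NULL, 'expr was NOT NULL', 'expr was NULL');  -- returns 'expr was NULL'
```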


MATCH_RECOGNIZE β€” Row Pattern Matching

MATCH_RECOGNIZE is an advanced SQL feature for detecting patterns across sequences of rows. It is used in event sequence analysis, fraud detection, and session analysis.

MATCH_RECOGNIZE β€” Pattern Detection
-- Detect a 'V-shape' pattern in stock prices:
-- a decrease followed by an increase
SELECT *
FROM stock_prices
MATCH_RECOGNIZE (
    PARTITION BY ticker
    ORDER BY trade_date
    MEASURES
        MATCH_NUMBER() AS match_num,
        FIRST(trade_date) AS start_date,
        LAST(trade_date) AS end_date,
        FIRST(close_price) AS start_price,
        MIN(close_price) AS bottom_price,
        LAST(close_price) AS end_price
    ONE ROW PER MATCH
    AFTER MATCH SKIP TO NEXT ROW
    PATTERN (DOWN+ UP+)
    DEFINE
        -- inside DEFINE, LAG() navigates within the match β€” no OVER clause
        DOWN AS close_price < LAG(close_price),
        UP AS close_price > LAG(close_price)
);
πŸ“MATCH_RECOGNIZE Exam Depth

For COF-C02, know that MATCH_RECOGNIZE exists in Snowflake for row-pattern matching and understand its use case (event sequence analysis). You are unlikely to need to write a complete MATCH_RECOGNIZE query from scratch in the exam, but you should recognise the syntax and purpose.


StepByStep: Writing a Full Analytical Query

Building an Analytical Query with VARIANT, Window Functions, and QUALIFY

1
Load and Parse Semi-Structured Data

Start with a raw events table that stores JSON payloads in a VARIANT column. Extract the relevant fields and cast them to proper SQL types.

SQL
-- Raw table: raw_events(id INT, payload VARIANT, loaded_at TIMESTAMP)
-- payload: {"session_id": "abc123", "user_id": 42, "event": "purchase",
--            "items": [{"sku":"X1","qty":2},{"sku":"X2","qty":1}], "total": 89.99}

CREATE OR REPLACE VIEW v_events AS
SELECT
id,
payload,                    -- kept so Step 2 can FLATTEN payload:items
payload:session_id::STRING  AS session_id,
payload:user_id::NUMBER     AS user_id,
payload:event::STRING       AS event_type,
payload:total::FLOAT        AS order_total,
loaded_at
FROM raw_events
WHERE payload:event::STRING = 'purchase';
πŸ’‘Always filter on VARIANT paths in a view to isolate relevant event types before downstream processing.
2
Explode Nested Arrays with FLATTEN

Use LATERAL FLATTEN to expand the items array so each ordered item becomes its own row, preserving the parent session context.

SQL
CREATE OR REPLACE VIEW v_order_items AS
SELECT
e.id           AS event_id,
e.session_id,
e.user_id,
e.order_total,
f.value:sku::STRING  AS sku,
f.value:qty::INT     AS quantity,
f.index              AS item_position
FROM v_events e,
LATERAL FLATTEN(INPUT => e.payload:items) f;
3
Add Window Functions for Per-User Ranking

Rank each purchase by total value per user, and compute a running total of spend.

SQL
SELECT
user_id,
session_id,
order_total,
loaded_at,
ROW_NUMBER()  OVER (PARTITION BY user_id ORDER BY order_total DESC)    AS spend_rank,
SUM(order_total) OVER (PARTITION BY user_id ORDER BY loaded_at
  ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)                    AS cumulative_spend
FROM v_events;
4
Filter with QUALIFY to Get Top Purchase Per User

Use QUALIFY to keep only each user's highest-value order without a subquery wrapper.

SQL
SELECT
user_id,
session_id,
order_total,
loaded_at
FROM v_events
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY order_total DESC) = 1;
πŸ’‘QUALIFY executes after window functions are computed, so you can reference the window function result directly without aliasing.

Practice Quizzes

SQL Basics

You have a VARIANT column called data containing a JSON object with a price key set to 9.99. Which expression correctly returns the price as a NUMBER?

SQL Basics

Which of the following is a Snowflake-specific SQL clause NOT found in standard SQL or most other databases?

SQL Basics

A table has an `events` VARIANT column containing an array. Which FLATTEN option ensures that rows with a NULL or empty array are still included in the output?


Flashcard Review

SQL Basics
QUESTION

What is the difference between RANK() and DENSE_RANK() in a window function?

ANSWER

RANK() assigns the same rank to ties but then skips the next rank number β€” for example, two rows tied at rank 2 both get rank 2, and the next row gets rank 4 (not 3). DENSE_RANK() also assigns the same rank to ties but does NOT skip numbers β€” the next row after two tied rank-2 rows gets rank 3.
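The difference is easy to see on an inline VALUES list (hypothetical literal data, no table required):

```sql
SELECT v,
       RANK()       OVER (ORDER BY v) AS rnk,
       DENSE_RANK() OVER (ORDER BY v) AS drnk
FROM (VALUES (10), (20), (20), (30)) AS t(v);
-- v=10 -> rnk=1, drnk=1
-- v=20 -> rnk=2, drnk=2 (both tied rows)
-- v=30 -> rnk=4, drnk=3
```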

SQL Basics
QUESTION

What does TYPEOF() return when called on a VARIANT value that contains a JSON array?

ANSWER

TYPEOF() returns the string 'ARRAY' for a VARIANT value that holds a JSON array. Other possible return values include 'VARCHAR', 'INTEGER', 'DECIMAL', 'DOUBLE', 'BOOLEAN', 'OBJECT', and 'NULL_VALUE' (for a JSON null). It returns SQL NULL if the VARIANT itself is NULL.
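These return values can be checked directly on JSON literals (a minimal sketch, no table required):

```sql
SELECT
    TYPEOF(PARSE_JSON('[1,2,3]'))  AS t_array,   -- 'ARRAY'
    TYPEOF(PARSE_JSON('"hello"'))  AS t_string,  -- 'VARCHAR'
    TYPEOF(PARSE_JSON('42'))       AS t_int,     -- 'INTEGER'
    TYPEOF(PARSE_JSON('null'))     AS t_null;    -- 'NULL_VALUE'
```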

SQL Basics
QUESTION

What is the key difference between TABLESAMPLE BERNOULLI and TABLESAMPLE BLOCK?

ANSWER

BERNOULLI sampling operates at the individual row level β€” each row is included independently with the specified probability. This produces better randomness but requires scanning all rows. BLOCK sampling operates at the micro-partition level β€” entire micro-partitions are either included or excluded. This is much faster (Snowflake can skip excluded partitions entirely) but less precisely random.

SQL Basics
QUESTION

In what order does Snowflake evaluate the QUALIFY clause relative to other SQL clauses?

ANSWER

The evaluation order is: FROM β†’ WHERE β†’ GROUP BY β†’ HAVING β†’ WINDOW FUNCTIONS β†’ QUALIFY β†’ SELECT (column aliasing) β†’ DISTINCT β†’ ORDER BY β†’ LIMIT/FETCH. QUALIFY runs after window functions are computed, which is why you can reference window function results directly in QUALIFY without a subquery.

SQL Basics
QUESTION

What does OBJECT_CONSTRUCT('k1', v1, 'k2', v2) return and what type is it?

ANSWER

OBJECT_CONSTRUCT returns a VARIANT value containing a JSON object built from the alternating key-value pairs. The keys must be strings, and the values can be any SQL type β€” they are automatically converted to their VARIANT equivalents. NULL values are omitted from the object by default. Use OBJECT_CONSTRUCT_KEEP_NULL to retain NULL values.
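The NULL-handling difference is easy to demonstrate on literals (a minimal sketch):

```sql
SELECT
    OBJECT_CONSTRUCT('a', 1, 'b', NULL)           AS drops_null,  -- {"a": 1}
    OBJECT_CONSTRUCT_KEEP_NULL('a', 1, 'b', NULL) AS keeps_null;  -- {"a": 1, "b": null}
```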


Additional Resources

Official Snowflake Documentation

