Working with semi-structured data like JSON is common for data engineers and analysts. Snowflake offers tools to manage JSON, XML, and Avro data types through its VARIANT column type. In this tutorial, we will learn how to query JSON data in Snowflake using simple SELECT statements, the colon syntax (:) for accessing JSON fields, and the LATERAL FLATTEN function to break arrays into multiple rows. By following examples, you will discover how to extract and work with nested data, enhancing the flexibility and power of your Snowflake queries.
Snowflake Interview Questions and Answers
1. What is Snowflake?
Answer:
Snowflake is a cloud-based data warehouse that provides scalability, concurrency, and high performance. It separates storage and compute, allowing independent scaling.
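For example, compute can be resized or scaled out without moving any stored data. A minimal sketch, assuming a warehouse named my_warehouse already exists (names are illustrative):
-- Resize compute independently of storage; no data is moved or reloaded
ALTER WAREHOUSE my_warehouse SET WAREHOUSE_SIZE = 'MEDIUM';
-- Scale out for concurrency (multi-cluster warehouses require Enterprise edition or higher)
ALTER WAREHOUSE my_warehouse SET MAX_CLUSTER_COUNT = 3;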
2. Explain Snowflake Architecture.
Answer:
- Database Storage: Stores structured and semi-structured data.
- Compute Layer (Virtual Warehouses): Handles queries. Multiple warehouses can access the same data simultaneously.
- Cloud Services Layer: Handles authentication, optimization, metadata, and query parsing.
3. What are Snowflake Virtual Warehouses?
Answer:
Virtual warehouses are compute clusters used for query execution. They can be started/stopped independently.
Example:
CREATE WAREHOUSE my_warehouse
WITH WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;
4. How does Snowflake handle semi-structured data?
Answer:
- Supports VARIANT, OBJECT, and ARRAY data types.
- Uses FLATTEN() to query nested JSON, XML, or Avro data.
SELECT value:id, value:name
FROM my_table, LATERAL FLATTEN(input => json_column);
5. What is Time Travel in Snowflake?
Answer:
Time Travel allows querying historical data (up to 90 days for Enterprise edition).
-- Query a table as it existed 1 hour ago
SELECT * FROM my_table AT (OFFSET => -3600);
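Time Travel also supports querying by timestamp and restoring dropped objects. A brief sketch, assuming the retention window still covers these points in time (the timestamp is illustrative):
-- Query the table as of a specific timestamp
SELECT * FROM my_table AT (TIMESTAMP => '2024-01-01 00:00:00'::TIMESTAMP_LTZ);
-- Restore a dropped table within the retention period
UNDROP TABLE my_table;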
6. What is a Snowflake Stage?
Answer:
Stages are locations for loading/unloading data. Types: Internal (in Snowflake) and External (S3, Azure, GCS).
CREATE STAGE my_stage
URL='s3://mybucket/data/'
CREDENTIALS=(AWS_KEY_ID='xxx' AWS_SECRET_KEY='yyy');
/*
You do not declare INTERNAL or EXTERNAL explicitly: a stage created with a URL (such as an S3 path) is treated as an external stage, while a stage without a URL is internal.
*/
/*
Alternatively, you can use a storage integration instead of embedding credentials:
*/
CREATE STORAGE INTEGRATION s3_integration
TYPE = EXTERNAL_STAGE
STORAGE_PROVIDER = S3
ENABLED = TRUE
STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeS3Role'
STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/data/');
CREATE STAGE my_stage
URL='s3://mybucket/data/'
STORAGE_INTEGRATION = s3_integration;
7. How to load CSV into Snowflake?
COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1);
8. What is Snowflake Cloning?
Answer:
Cloning creates zero-copy clones of databases, schemas, or tables. It’s fast and space-efficient.
CREATE TABLE my_table_clone CLONE my_table;
9. Explain Snowflake Streams.
Answer:
Streams track changes (INSERT, UPDATE, DELETE) on a table for CDC (Change Data Capture) purposes.
CREATE STREAM my_stream ON TABLE my_table;
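Querying the stream returns the changed rows plus metadata columns describing each change. A quick sketch using the stream created above:
-- The result set includes METADATA$ACTION (INSERT/DELETE), METADATA$ISUPDATE,
-- and METADATA$ROW_ID alongside the table's own columns
SELECT * FROM my_stream;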
10. What is a Snowflake Task?
Answer:
Tasks are scheduled SQL queries or procedures. They can be chained.
CREATE TASK my_task
WAREHOUSE = my_warehouse
SCHEDULE = 'USING CRON 0 * * * *'
AS
INSERT INTO audit_table SELECT * FROM raw_table;
11. What is the difference between Snowflake Internal and External Stage?
| Type | Location | Use case |
|---|---|---|
| Internal | Snowflake | Small files, staging |
| External | S3/Azure/GCS | Large files, cloud storage |
12. How to create a Snowflake Table?
CREATE TABLE employees (
emp_id INT,
emp_name STRING,
salary NUMBER(10,2),
join_date DATE
);
13. How to handle NULLs in Snowflake?
SELECT COALESCE(salary, 0) AS salary
FROM employees;
14. How to update data in Snowflake?
UPDATE employees
SET salary = salary * 1.1
WHERE salary < 50000;
15. How to delete data?
DELETE FROM employees
WHERE emp_id = 100;
16. How to merge data in Snowflake?
MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);
17. What is Snowflake Schema vs Star Schema?
- Star Schema: One fact table with multiple dimension tables.
- Snowflake Schema: Dimension tables are normalized further into multiple tables.
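A minimal illustration of the difference, using hypothetical product tables (table and column names are made up for this example):
-- Star schema: the fact table references a denormalized dimension
CREATE TABLE dim_product_star (
  product_id INT,
  product_name STRING,
  category_name STRING   -- category stored directly in the dimension
);
-- Snowflake schema: the same dimension normalized into two tables
CREATE TABLE dim_category (
  category_id INT,
  category_name STRING
);
CREATE TABLE dim_product (
  product_id INT,
  product_name STRING,
  category_id INT         -- references dim_category
);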
18. What is Zero-Copy Cloning?
Answer:
Cloning tables/databases without duplicating data, using metadata pointers.
19. What is Snowflake Data Sharing?
Answer:
Securely share live data with other Snowflake accounts without copying.
CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;
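Typically you also grant access down to the schema and table level, then add consumer accounts. A sketch continuing the example above (the schema, table, and consumer account identifier are placeholders):
GRANT USAGE ON SCHEMA my_db.public TO SHARE my_share;
GRANT SELECT ON TABLE my_db.public.employees TO SHARE my_share;
-- Add the consumer account (placeholder org and account names)
ALTER SHARE my_share ADD ACCOUNTS = consumer_org.consumer_account;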
20. What is the difference between VARIANT, OBJECT, and ARRAY?
| Type | Purpose |
|---|---|
| VARIANT | Any semi-structured type |
| OBJECT | Key-value pairs (JSON) |
| ARRAY | Ordered list |
21. How to query JSON in Snowflake?
SELECT data:name, data:age
FROM my_table;
22. What is Snowflake Task vs Stream?
- Stream: Captures data changes.
- Task: Automates queries/procedures, can use stream as source.
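A common combination is a task that runs only when its source stream has new data. A minimal sketch, reusing my_stream and my_warehouse from earlier examples (target table and columns are illustrative):
CREATE TASK process_changes_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('MY_STREAM')
AS
  INSERT INTO audit_table (id, name)   -- hypothetical target columns
  SELECT id, name FROM my_stream;      -- consuming the stream in DML advances its offset
-- Remember to start it: ALTER TASK process_changes_task RESUME;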
23. How to create a materialized view?
CREATE MATERIALIZED VIEW mv_emp AS
SELECT emp_id, salary FROM employees;
24. How to scale Snowflake Warehouse automatically?
ALTER WAREHOUSE my_warehouse SET AUTO_SUSPEND = 300;
ALTER WAREHOUSE my_warehouse SET AUTO_RESUME = TRUE;
25. What are transient tables?
- No Fail-safe and at most 1 day of Time Travel (lower storage cost).
- Persist across sessions until explicitly dropped, unlike temporary tables.
CREATE TRANSIENT TABLE temp_table (...);
26. Difference between Temporary and Transient tables
| Feature | Temporary Table | Transient Table |
|---|---|---|
| Scope | Session-only | Persists across sessions |
| Cleanup | Auto-dropped at session end | Must be dropped manually |
| Fail-safe | No Fail-safe | No Fail-safe |
27. How to copy JSON data into Snowflake?
COPY INTO my_table
FROM @my_stage/data.json
FILE_FORMAT=(TYPE=JSON);
28. How to get current warehouse usage?
SELECT * FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(DATE_RANGE_START => DATEADD('hour', -1, CURRENT_TIMESTAMP())))
WHERE WAREHOUSE_NAME = 'MY_WAREHOUSE';
29. What is Snowflake Fail-safe?
- Provides 7-day recovery after Time Travel retention ends.
- Used for emergency recovery only.
30. How to optimize Snowflake queries?
- Use clustering keys for large tables (see the sketch after this list).
- Minimize cross joins.
- Use CTEs carefully.
- Avoid SELECT *.
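Building on the clustering-key tip above, a brief sketch (the column choice is illustrative):
-- Define a clustering key on columns commonly used in range filters
ALTER TABLE employees CLUSTER BY (join_date);
-- Check how well the table is clustered on that key
SELECT SYSTEM$CLUSTERING_INFORMATION('employees', '(join_date)');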
31. What is a Snowflake Role?
- Snowflake uses role-based access control (RBAC).
- Privileges are granted to roles, and roles are granted to users.
CREATE ROLE analyst;
GRANT SELECT ON TABLE employees TO ROLE analyst;
32. How to secure Snowflake?
- Use MFA, network policies (see the sketch below), and RBAC.
- Data encryption is always-on.
- Mask sensitive data using dynamic data masking.
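As an illustration of the network-policy point above, a minimal sketch (the policy name and IP range are placeholders):
CREATE NETWORK POLICY corp_only_policy
  ALLOWED_IP_LIST = ('203.0.113.0/24');   -- placeholder CIDR range
-- Apply the policy at the account level
ALTER ACCOUNT SET NETWORK_POLICY = corp_only_policy;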
33. Dynamic Data Masking example
CREATE MASKING POLICY ssn_mask AS (val STRING)
RETURNS STRING ->
CASE
WHEN CURRENT_ROLE() IN ('FULL_ACCESS') THEN val
ELSE 'XXX-XX-XXXX'
END;
34. Difference between Snowflake Account and User?
- Account: Entire Snowflake environment.
- User: Individual login within an account.
35. How to unload data to S3?
COPY INTO @my_stage/output/
FROM my_table
FILE_FORMAT=(TYPE=CSV);
36. How to check query history?
SELECT * FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE USER_NAME='MY_USER';
37. How to create a secure view?
CREATE SECURE VIEW my_secure_view AS
SELECT * FROM employees;
- Secure views hide the view definition from unauthorized users and prevent underlying data from being exposed through query optimizations.
38. How to handle large tables efficiently?
- Use micro-partitions effectively.
- Use clustering keys for range queries.
- Avoid unnecessary scans.
39. How to find duplicate records?
SELECT emp_id, COUNT(*)
FROM employees
GROUP BY emp_id
HAVING COUNT(*) > 1;
40. How to rename a table?
ALTER TABLE old_name RENAME TO new_name;
41. How to copy data from one table to another?
INSERT INTO table2 SELECT * FROM table1;
42. How to truncate a table?
TRUNCATE TABLE employees;
43. What is the difference between VARIANT and STRING for JSON?
- VARIANT: Snowflake-native semi-structured type; fields can be queried with the colon (:) syntax.
- STRING: Plain text; it cannot be queried as JSON until converted (e.g., with PARSE_JSON).
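If JSON arrives as plain text, it can be converted to VARIANT with PARSE_JSON and then queried with the colon syntax. A small sketch (column names are illustrative):
-- Convert a JSON string to VARIANT, then access a field with : and cast it
SELECT PARSE_JSON(json_string_col):name::STRING AS name
FROM my_table;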
44. How to create a Snowflake Procedure?
CREATE OR REPLACE PROCEDURE add_bonus(salary_threshold NUMBER)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  UPDATE employees SET salary = salary * 1.1
  WHERE salary < :salary_threshold;   -- bind the argument with a colon inside SQL statements
  RETURN 'Done';
END;
$$;
45. How to call a procedure?
CALL add_bonus(50000);
46. How to set session parameters?
ALTER SESSION SET TIMEZONE='UTC';
ALTER SESSION SET QUERY_TAG='MY_TAG';
47. How to list all databases?
SHOW DATABASES;
48. How to clone a database?
CREATE DATABASE db_clone CLONE db_original;
49. How to create a user in Snowflake?
CREATE USER test_user
PASSWORD='StrongPass123'
DEFAULT_ROLE = 'analyst'
DEFAULT_WAREHOUSE = 'my_warehouse';
50. How to grant privileges to a user?
GRANT ROLE analyst TO USER test_user;
GRANT USAGE ON DATABASE my_db TO ROLE analyst;
Snowpipe
Here’s a complete list of Snowpipe interview questions, categorized by level and with concise answers + examples 👇
🧩 BASIC Snowpipe Interview Questions
1️⃣ What is Snowpipe?
Snowpipe is Snowflake’s continuous data ingestion service that automatically loads data into tables as soon as it’s available in a stage (internal or external like S3, Azure Blob, or GCS).
It provides near real-time loading using auto-ingest notifications or REST API.
2️⃣ How does Snowpipe differ from COPY INTO?
| Feature | COPY INTO | Snowpipe |
|---|---|---|
| Mode | Manual/batch | Continuous/streaming |
| Trigger | User runs command | Event (notification) or REST API |
| Latency | Minutes to hours | Seconds to minutes |
| Cost | Uses warehouse compute | Uses Snowflake-managed compute |
| Use case | Scheduled batch load | Real-time or micro-batch ingestion |
3️⃣ What are the two ways Snowpipe can be triggered?
- Auto-ingest (event notifications) → S3, Azure, GCS: the cloud event notifies Snowflake automatically when a new file arrives.
- REST API / SDK call → manual trigger from code (e.g., Python, Lambda, Airflow).
4️⃣ What is a “Stage” in Snowpipe?
A Stage is a location (internal or external) where files are placed before loading into Snowflake.
Snowpipe listens to that stage for new files.
Example:
CREATE STAGE my_stage
URL='s3://mybucket/data/'
STORAGE_INTEGRATION = my_integration;
5️⃣ How do you create a Snowpipe?
CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"');
✅ AUTO_INGEST = TRUE enables event-driven loading.
6️⃣ How does Snowpipe achieve near real-time ingestion?
- It monitors the stage (via event notifications or REST API).
- When new files arrive, it queues the load request immediately.
- It uses serverless compute for ingestion — no warehouse needed.
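You can inspect a pipe's state and pending file count with SYSTEM$PIPE_STATUS. A quick sketch, assuming a pipe named my_pipe:
-- Returns a JSON document with executionState, pendingFileCount, etc.
SELECT SYSTEM$PIPE_STATUS('my_pipe');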
⚙️ INTERMEDIATE Snowpipe Interview Questions
7️⃣ What permissions are needed for Snowpipe to read from S3?
You must configure:
- Storage Integration with IAM role
- The IAM role must grant: s3:GetObject, s3:GetObjectVersion, s3:ListBucket
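On the Snowflake side, DESC STORAGE INTEGRATION exposes the values (IAM user ARN and external ID) that go into the S3 role's trust policy. A quick sketch, assuming the s3_integration created earlier:
-- Look up STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID
-- and add them to the IAM role's trust relationship in AWS
DESC STORAGE INTEGRATION s3_integration;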
8️⃣ What happens if Snowpipe fails to load a file?
- The error is logged in the LOAD_HISTORY view or the COPY_HISTORY table function.
- Snowpipe retries automatically for transient errors.
- Bad records can be handled using the ON_ERROR clause.
Example:
COPY INTO my_table
FROM @my_stage
ON_ERROR = 'CONTINUE';
9️⃣ How can you check Snowpipe load status?
SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE PIPE_NAME = 'MY_PIPE'
ORDER BY LAST_LOAD_TIME DESC;
🔟 How is Snowpipe billed?
- You are not billed for warehouse compute.
- You are billed for Snowpipe's serverless compute time, which depends on:
  - Amount of data loaded
  - Number of files
  - Complexity of file formats
1️⃣1️⃣ Can you use Snowpipe for internal stages?
✅ Yes, but auto-ingest is supported only for external stages (S3, Azure, GCS).
For internal stages, use REST API to trigger loads manually.
1️⃣2️⃣ How can you pause or resume a Snowpipe?
ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE; -- Pause
ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = FALSE; -- Resume
1️⃣3️⃣ Can Snowpipe load JSON, Parquet, or Avro?
✅ Yes, it supports all Snowflake file formats:
- CSV
- JSON
- PARQUET
- AVRO
- ORC
Just define a FILE_FORMAT during creation.
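A named file format keeps this reusable across pipes. A short sketch for JSON (object names are illustrative):
CREATE FILE FORMAT my_json_format TYPE = 'JSON' STRIP_OUTER_ARRAY = TRUE;
CREATE PIPE my_json_pipe
  AUTO_INGEST = TRUE
AS
  COPY INTO my_json_table
  FROM @my_stage
  FILE_FORMAT = (FORMAT_NAME = 'my_json_format');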
🧠 ADVANCED Snowpipe Interview Questions
1️⃣4️⃣ What is the difference between “auto-ingest” and “manual Snowpipe”?
| Method | Trigger | Use Case |
|---|---|---|
| Auto-ingest | Cloud event notification (S3 → Snowflake) | Fully automated pipelines |
| Manual | REST API call | Custom workflows (Lambda, Airflow, etc.) |
1️⃣5️⃣ How does Snowpipe guarantee exactly-once loading?
- It keeps track of file load history using internal metadata.
- If a file is re-sent, it’s ignored unless the file name changes.
- This ensures idempotency.
1️⃣6️⃣ How do you monitor Snowpipe performance?
Use Snowflake Account Usage views:
SELECT *
FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY
WHERE PIPE_NAME = 'MY_PIPE';
Or use AWS CloudWatch / Azure Monitor logs for external events.
1️⃣7️⃣ How do you handle schema evolution in Snowpipe?
- Use MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE.
- Add new columns with ALTER TABLE.
- Use copy options to handle missing or extra fields.
Example:
CREATE PIPE my_pipe
AS
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;
1️⃣8️⃣ Can you use Snowpipe with Streams & Tasks?
✅ Yes — common pattern:
- Snowpipe → loads raw data into staging table
- Stream → tracks new rows
- Task → transforms and merges into target tables
Sample code
Here’s a full end-to-end Snowflake example for Snowpipe → Stream → Task with concise sample code. This is a very common pattern in modern Snowflake ingestion pipelines.
1️⃣ Create Raw Table
CREATE OR REPLACE TABLE raw_sales (
order_id INT,
customer_id INT,
order_date DATE,
amount NUMBER(10,2)
);
2️⃣ Create Snowpipe (auto-ingest from S3 stage)
Step 2a: Create Stage
CREATE OR REPLACE STAGE sales_stage
URL='s3://mybucket/sales/'
STORAGE_INTEGRATION = my_s3_integration;
Step 2b: Create Pipe
CREATE OR REPLACE PIPE sales_pipe
AUTO_INGEST = TRUE
AS
COPY INTO raw_sales
FROM @sales_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1)
ON_ERROR = 'CONTINUE';
✅ This will automatically load any new files in the S3 stage into raw_sales.
3️⃣ Create Stream (tracks newly loaded rows in raw table)
CREATE OR REPLACE STREAM raw_sales_stream
ON TABLE raw_sales
APPEND_ONLY = TRUE;
- APPEND_ONLY = TRUE → tracks newly inserted rows only.
- The stream acts as a Change Data Capture (CDC) mechanism.
4️⃣ Create Curated Table (final target)
CREATE OR REPLACE TABLE curated_sales (
order_id INT,
customer_id INT,
order_date DATE,
amount NUMBER(10,2),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
5️⃣ Create Task (runs periodically to transform/merge new rows)
CREATE OR REPLACE TASK transform_sales_task
WAREHOUSE = compute_wh
SCHEDULE = 'USING CRON 0 * * * * UTC' -- every hour
AS
MERGE INTO curated_sales tgt
USING (
SELECT *
FROM raw_sales_stream
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
amount = src.amount,
updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN INSERT (
order_id, customer_id, order_date, amount, updated_at
)
VALUES (
src.order_id, src.customer_id, src.order_date, src.amount, CURRENT_TIMESTAMP
);
What it does:
- Reads new rows from the stream.
- Merges them into the curated table: updates existing rows, inserts new ones.
- Runs automatically on the schedule (or can be event-driven if desired).
1️⃣ CRON syntax in Snowflake Tasks
Snowflake uses a standard cron expression (5 fields) to schedule tasks:
MIN HOUR DOM MON DOW
| Field | Meaning | Allowed Values |
|---|---|---|
| MIN | Minute of the hour | 0–59 |
| HOUR | Hour of the day | 0–23 |
| DOM | Day of the month | 1–31 |
| MON | Month | 1–12 or JAN–DEC |
| DOW | Day of the week | 0–6 (Sun=0) or SUN–SAT |
Note: In Snowflake, you also append the time zone, e.g., UTC, America/Los_Angeles, etc.
2️⃣ '0 * * * * UTC' Explained
0 * * * * UTC
| Field | Value | Meaning |
|---|---|---|
| MIN | 0 | Run at minute 0 |
| HOUR | * | Every hour |
| DOM | * | Every day of the month |
| MON | * | Every month |
| DOW | * | Every day of the week |
| TZ | UTC | Run in UTC timezone |
✅ Interpretation:
Run at the start of every hour (e.g., 00:00, 01:00, 02:00 UTC …).
3️⃣ Examples
| Cron Expression | Run Schedule |
|---|---|
| 0 * * * * UTC | Every hour at minute 0 |
| 30 9 * * * UTC | Every day at 09:30 UTC |
| 0 0 * * 1 UTC | Every Monday at 00:00 UTC |
| 0 0 1 * * UTC | First day of every month at 00:00 UTC |
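To change an existing task's schedule, suspend it first and then alter it. A brief sketch using the task from the pipeline above:
ALTER TASK transform_sales_task SUSPEND;
ALTER TASK transform_sales_task SET SCHEDULE = 'USING CRON 30 9 * * * UTC';
-- resume the task afterwards (see the next step)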
6️⃣ Activate Task
ALTER TASK transform_sales_task RESUME;
✅ Summary Flow
- S3 → New CSV uploaded
- Snowpipe → auto-loads into raw_sales
- Stream → tracks new rows in raw_sales_stream
- Task → periodically merges data into curated_sales
That creates a continuous ingestion + transformation pipeline.
1️⃣9️⃣ How to manually trigger Snowpipe via REST API?
Using Snowflake’s endpoint:
POST https://<account>.snowflakecomputing.com/v1/data/pipes/<pipe_name>/insertFiles
With JSON body:
{ "files": [ { "path": "data/file1.csv" }, { "path": "data/file2.csv" } ] }
2️⃣0️⃣ What are common interview scenarios?
- File arrives late or incomplete — how to handle?
- Duplicate file uploads — what happens?
- How to backfill data for missed files? (see the REFRESH sketch below)
- Compare Snowpipe vs Streams + Tasks latency trade-offs.
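For the backfill scenario, ALTER PIPE ... REFRESH queues files already in the stage (staged within the last 7 days) that have not yet been loaded. A brief sketch (the prefix path is a placeholder):
-- Queue recently staged files that Snowpipe has not loaded yet
ALTER PIPE my_pipe REFRESH;
-- Optionally limit the refresh to a path prefix
ALTER PIPE my_pipe REFRESH PREFIX = 'data/2024/';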
🚀 Bonus: Common Real-world Design Pattern
S3 → Event Notification → SQS → Snowpipe → Raw Table → Stream → Task → Curated Table
That’s the standard modern Snowflake ingestion pipeline architecture.
⚙️ End-to-End Flow (in few words)
| Step | Component | Purpose (few words) |
|---|---|---|
| 1️⃣ S3 | Data landing zone | Raw files uploaded (CSV, JSON, etc.) |
| 2️⃣ Event Notification | Triggers automation | Detects new file arrival in S3 |
| 3️⃣ SQS | Message queue | Delivers file event to Snowflake reliably |
| 4️⃣ Snowpipe | Continuous ingestion | Auto-loads new files into raw table |
| 5️⃣ Raw Table | Staging area | Stores unprocessed raw data |
| 6️⃣ Stream | Change tracker | Tracks newly loaded rows (CDC-style) |
| 7️⃣ Task | Scheduled transform | Runs SQL to process/merge data |
| 8️⃣ Curated Table | Final dataset | Cleaned, transformed, analytics-ready data |
🧠 In One Line
Files land in S3 → Event triggers SQS → Snowpipe loads raw → Stream tracks changes → Task transforms → Curated table ready for analytics.