Working with semi-structured data such as JSON is routine for data engineers and analysts. Snowflake handles JSON, XML, and Avro through its VARIANT column type. In this tutorial, we will learn how to query JSON data in Snowflake using simple SELECT statements, the colon syntax (:) for accessing JSON fields, and the LATERAL FLATTEN function to expand arrays into multiple rows. Through worked examples, you will see how to extract and work with nested data, adding flexibility and power to your Snowflake queries.
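
As a quick preview, here is a minimal sketch of both techniques; the table events and its VARIANT column payload are hypothetical names used only for illustration.

-- Colon syntax pulls nested fields; LATERAL FLATTEN expands the items array into rows.
SELECT
    e.payload:customer:name::STRING AS customer_name,
    f.value:sku::STRING             AS sku,
    f.value:qty::INT                AS qty
FROM events e,
     LATERAL FLATTEN(input => e.payload:items) f;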

Snowflake Interview Questions and Answers

1. What is Snowflake?

Answer:
Snowflake is a cloud-based data warehouse that provides scalability, concurrency, and high performance. It separates storage and compute, allowing independent scaling.

2. Explain Snowflake Architecture.

Answer:

  • Database Storage: Stores structured and semi-structured data.
  • Compute Layer (Virtual Warehouses): Handles queries. Multiple warehouses can access the same data simultaneously.
  • Cloud Services Layer: Handles authentication, optimization, metadata, and query parsing.

3. What are Snowflake Virtual Warehouses?

Answer:
Virtual warehouses are compute clusters used for query execution. They can be started/stopped independently.

Example:

CREATE WAREHOUSE my_warehouse
WITH WAREHOUSE_SIZE = 'XSMALL'
AUTO_SUSPEND = 300
AUTO_RESUME = TRUE;

4. How does Snowflake handle semi-structured data?

Answer:

  • Supports VARIANT, OBJECT, and ARRAY data types.
  • Uses FLATTEN() to query nested JSON, XML, or Avro data.

Example:

SELECT value:id, value:name
FROM my_table, LATERAL FLATTEN(input => json_column);

5. What is Time Travel in Snowflake?

Answer:
Time Travel allows querying historical data (up to 90 days for Enterprise edition).

-- Query a table as it existed 1 hour ago
SELECT * FROM my_table AT (OFFSET => -3600);

6. What is a Snowflake Stage?

Answer:
Stages are locations for loading/unloading data. Types: Internal (in Snowflake) and External (S3, Azure, GCS).

CREATE STAGE my_stage
URL='s3://mybucket/data/'
CREDENTIALS=(AWS_KEY_ID='xxx' AWS_SECRET_KEY='yyy');

/*
INTERNAL or EXTERNAL does not need to be specified: when the stage has a cloud
URL such as an S3 path, Snowflake treats it as an external stage.
*/

/*
Alternatively, you can use a storage integration instead of embedding credentials:
*/
CREATE STORAGE INTEGRATION s3_integration
  TYPE = EXTERNAL_STAGE
  STORAGE_PROVIDER = S3
  ENABLED = TRUE
  STORAGE_AWS_ROLE_ARN = 'arn:aws:iam::123456789012:role/SnowflakeS3Role'
  STORAGE_ALLOWED_LOCATIONS = ('s3://mybucket/data/');

CREATE STAGE my_stage
URL='s3://mybucket/data/'
STORAGE_INTEGRATION = s3_integration;


7. How to load CSV into Snowflake?

COPY INTO my_table
FROM @my_stage/file.csv
FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1);

8. What is Snowflake Cloning?

Answer:
Cloning creates zero-copy clones of databases, schemas, or tables. It’s fast and space-efficient.

CREATE TABLE my_table_clone CLONE my_table;

9. Explain Snowflake Streams.

Answer:
Streams track changes (INSERT, UPDATE, DELETE) on a table for CDC (Change Data Capture) purposes.

CREATE STREAM my_stream ON TABLE my_table;

10. What is Snowflake Task?

Answer:
Tasks are scheduled SQL queries or procedures. They can be chained.

CREATE TASK my_task
  WAREHOUSE = my_warehouse
  SCHEDULE = 'USING CRON 0 * * * * UTC'
AS
  INSERT INTO audit_table SELECT * FROM raw_table;

11. What is the difference between Snowflake Internal and External Stage?

Type     | Location     | Use case
Internal | Snowflake    | Small files, staging
External | S3/Azure/GCS | Large files, cloud storage
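
For contrast, here is a minimal sketch of the internal-stage workflow (the stage and file names are hypothetical; PUT must be run from SnowSQL or another client, and it gzip-compresses files by default):

CREATE STAGE my_internal_stage;

PUT file:///tmp/employees.csv @my_internal_stage;

COPY INTO employees
FROM @my_internal_stage/employees.csv.gz
FILE_FORMAT = (TYPE = CSV SKIP_HEADER = 1);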

12. How to create a Snowflake Table?

CREATE TABLE employees (
  emp_id INT,
  emp_name STRING,
  salary NUMBER(10,2),
  join_date DATE
);

13. How to handle NULLs in Snowflake?

SELECT COALESCE(salary, 0) AS salary
FROM employees;

14. How to update data in Snowflake?

UPDATE employees
SET salary = salary * 1.1
WHERE salary < 50000;

15. How to delete data?

DELETE FROM employees
WHERE emp_id = 100;

16. How to merge data in Snowflake?

MERGE INTO target_table t
USING source_table s
ON t.id = s.id
WHEN MATCHED THEN UPDATE SET t.value = s.value
WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value);

17. What is Snowflake Schema vs Star Schema?

  • Star Schema: One fact table with multiple dimension tables.
  • Snowflake Schema: Dimension tables are normalized further into multiple tables.
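
A minimal DDL sketch of the difference, using a hypothetical product dimension: the star schema keeps category attributes inline, while the snowflake schema normalizes them out.

-- Star schema: denormalized dimension
CREATE TABLE dim_product_star (
  product_id    INT,
  product_name  STRING,
  category_name STRING   -- category attributes kept inline
);

-- Snowflake schema: normalized dimensions
CREATE TABLE dim_category (
  category_id   INT,
  category_name STRING
);

CREATE TABLE dim_product (
  product_id    INT,
  product_name  STRING,
  category_id   INT      -- references dim_category
);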

18. What is Zero-Copy Cloning?

Answer:
Cloning tables/databases without duplicating data, using metadata pointers.


19. What is Snowflake Data Sharing?

Answer:
Securely share live data with other Snowflake accounts without copying.

CREATE SHARE my_share;
GRANT USAGE ON DATABASE my_db TO SHARE my_share;

20. What is the difference between VARIANT, OBJECT, and ARRAY?

Type    | Purpose
VARIANT | Any semi-structured type
OBJECT  | Key-value pairs (JSON)
ARRAY   | Ordered list
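
A small sketch showing how each type is produced and queried (the literal values are just for illustration):

SELECT
    PARSE_JSON('{"name":"Ada","age":36}')      AS v,  -- VARIANT
    OBJECT_CONSTRUCT('name', 'Ada', 'age', 36) AS o,  -- OBJECT
    ARRAY_CONSTRUCT(1, 2, 3)                   AS a;  -- ARRAY

-- Field access uses the same colon syntax on VARIANT and OBJECT values
SELECT PARSE_JSON('{"name":"Ada"}'):name::STRING;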

21. How to query JSON in Snowflake?

SELECT data:name, data:age
FROM my_table;

22. What is Snowflake Task vs Stream?

  • Stream: Captures data changes.
  • Task: Automates queries/procedures, can use stream as source.
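
A minimal sketch of combining the two (all names hypothetical); the WHEN clause skips runs while the stream has no new rows:

CREATE TASK load_changes_task
  WAREHOUSE = my_warehouse
  SCHEDULE = '5 MINUTE'
  WHEN SYSTEM$STREAM_HAS_DATA('my_stream')
AS
  INSERT INTO target_table
  SELECT id, value FROM my_stream;  -- consuming the stream in DML advances its offset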

23. How to create a materialized view?

CREATE MATERIALIZED VIEW mv_emp AS
SELECT emp_id, salary FROM employees;

24. How to scale Snowflake Warehouse automatically?

ALTER WAREHOUSE my_warehouse SET AUTO_SUSPEND = 300;
ALTER WAREHOUSE my_warehouse SET AUTO_RESUME = TRUE;
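
AUTO_SUSPEND/AUTO_RESUME control pausing and waking a warehouse. Scaling out for concurrency uses a multi-cluster warehouse (an Enterprise-edition feature); a minimal sketch:

ALTER WAREHOUSE my_warehouse SET
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3          -- add clusters under concurrent load
  SCALING_POLICY = 'STANDARD';   -- or 'ECONOMY' to favor cost over responsiveness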

25. What are transient tables?

  • Time Travel is limited to at most 1 day, and there is no Fail-safe period (lower storage cost).
  • Persist until explicitly dropped, unlike temporary tables, which are session-scoped.

CREATE TRANSIENT TABLE temp_table (...);

26. Difference between Temporary and Transient tables

Temp Table   | Transient Table
Session-only | Persistent
Auto-dropped | Needs manual drop
No fail-safe | No fail-safe

27. How to copy JSON data into Snowflake?

COPY INTO my_table
FROM @my_stage/data.json
FILE_FORMAT=(TYPE=JSON);

28. How to get current warehouse usage?

SELECT *
FROM TABLE(INFORMATION_SCHEMA.WAREHOUSE_LOAD_HISTORY(WAREHOUSE_NAME => 'MY_WAREHOUSE'));

29. What is Snowflake Fail-safe?

  • Provides 7-day recovery after Time Travel retention ends.
  • Used for emergency recovery only.

30. How to optimize Snowflake queries?

  • Use clustering keys for large tables.
  • Minimize cross joins.
  • Use CTEs carefully.
  • Avoid SELECT *.
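
For example, a clustering key on a large (hypothetical) fact table:

ALTER TABLE sales CLUSTER BY (order_date, region);

-- Check how well the table is clustered on those columns
SELECT SYSTEM$CLUSTERING_INFORMATION('sales', '(order_date, region)');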

31. What is a Snowflake Role?

  • Snowflake uses role-based access control (RBAC).
  • Roles grant privileges to users.

CREATE ROLE analyst;
GRANT SELECT ON TABLE employees TO ROLE analyst;

32. How to secure Snowflake?

  • Use MFA, network policies, RBAC.
  • Data encryption is always-on.
  • Mask sensitive data using dynamic data masking.

33. Dynamic Data Masking example

CREATE MASKING POLICY ssn_mask AS (val STRING) 
RETURNS STRING ->
CASE
  WHEN CURRENT_ROLE() IN ('FULL_ACCESS') THEN val
  ELSE 'XXX-XX-XXXX'
END;
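
The policy takes effect only after it is attached to a column; for example, assuming employees has an ssn column:

ALTER TABLE employees MODIFY COLUMN ssn
  SET MASKING POLICY ssn_mask;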

34. Difference between Snowflake Account and User?

  • Account: Entire Snowflake environment.
  • User: Individual login within an account.

35. How to unload data to S3?

COPY INTO @my_stage/output/
FROM my_table
FILE_FORMAT=(TYPE=CSV);

36. How to check query history?

SELECT * FROM TABLE(INFORMATION_SCHEMA.QUERY_HISTORY())
WHERE USER_NAME='MY_USER';

37. How to create a secure view?

CREATE SECURE VIEW my_secure_view AS
SELECT * FROM employees;
  • Secure views hide the view definition from unauthorized roles and prevent query optimizations that could expose the underlying data.

38. How to handle large tables efficiently?

  • Use micro-partitions effectively.
  • Use clustering keys for range queries.
  • Avoid unnecessary scans.

39. How to find duplicate records?

SELECT emp_id, COUNT(*)
FROM employees
GROUP BY emp_id
HAVING COUNT(*) > 1;

40. How to rename a table?

ALTER TABLE old_name RENAME TO new_name;

41. How to copy data from one table to another?

INSERT INTO table2 SELECT * FROM table1;

42. How to truncate a table?

TRUNCATE TABLE employees;

43. What is the difference between VARIANT and STRING for JSON?

  • VARIANT: Snowflake-native JSON type. Can query using :.
  • STRING: Just plain text, cannot be queried as JSON.
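
If JSON has been stored as STRING, it can be converted with PARSE_JSON and then queried with the colon syntax; a small sketch assuming a raw_json column:

SELECT PARSE_JSON(raw_json):customer:name::STRING AS customer_name
FROM my_table;

-- TRY_PARSE_JSON returns NULL instead of failing on malformed input
SELECT TRY_PARSE_JSON(raw_json) AS data FROM my_table;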

44. How to create a Snowflake Procedure?

CREATE OR REPLACE PROCEDURE add_bonus(salary_threshold NUMBER)
RETURNS STRING
LANGUAGE SQL
AS
$$
BEGIN
  UPDATE employees SET salary = salary * 1.1
  WHERE salary < salary_threshold;
  RETURN 'Done';
END;
$$;

45. How to call a procedure?

CALL add_bonus(50000);

46. How to set session parameters?

ALTER SESSION SET TIMEZONE='UTC';
ALTER SESSION SET QUERY_TAG='MY_TAG';

47. How to list all databases?

SHOW DATABASES;

48. How to clone a database?

CREATE DATABASE db_clone CLONE db_original;

49. How to create a user in Snowflake?

CREATE USER test_user
PASSWORD='StrongPass123'
DEFAULT_ROLE = 'analyst'
DEFAULT_WAREHOUSE = 'my_warehouse';

50. How to grant privileges to a user?

GRANT ROLE analyst TO USER test_user;
GRANT USAGE ON DATABASE my_db TO ROLE analyst;

Snowpipe

Here’s a complete list of Snowpipe interview questions, categorized by level and with concise answers + examples 👇


🧩 BASIC Snowpipe Interview Questions

1️⃣ What is Snowpipe?

Snowpipe is Snowflake’s continuous data ingestion service that automatically loads data into tables as soon as it’s available in a stage (internal or external like S3, Azure Blob, or GCS).
It provides near real-time loading using auto-ingest notifications or REST API.


2️⃣ How does Snowpipe differ from COPY INTO?

Feature  | COPY INTO              | Snowpipe
Mode     | Manual/batch           | Continuous/streaming
Trigger  | User runs command      | Event (notification) or REST API
Latency  | Minutes to hours       | Seconds to minutes
Cost     | Uses warehouse compute | Uses Snowflake-managed compute
Use case | Scheduled batch load   | Real-time or micro-batch ingestion

3️⃣ What are the two ways Snowpipe can be triggered?

  1. Auto-ingest (Event notifications) → S3, Azure, GCS
    • Cloud event notifies Snowflake automatically when a new file arrives.
  2. REST API / SDK call → Manual trigger from code (e.g., Python, Lambda, Airflow).

4️⃣ What is a “Stage” in Snowpipe?

A Stage is a location (internal or external) where files are placed before loading into Snowflake.
Snowpipe listens to that stage for new files.

Example:

CREATE STAGE my_stage
URL='s3://mybucket/data/'
STORAGE_INTEGRATION = my_integration;

5️⃣ How do you create a Snowpipe?

CREATE PIPE my_pipe
AUTO_INGEST = TRUE
AS
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"');

AUTO_INGEST = TRUE enables event-driven loading.


6️⃣ How does Snowpipe achieve near real-time ingestion?

  • It monitors the stage (via event notifications or REST API).
  • When new files arrive, it queues the load request immediately.
  • It uses serverless compute for ingestion — no warehouse needed.

⚙️ INTERMEDIATE Snowpipe Interview Questions

7️⃣ What permissions are needed for Snowpipe to read from S3?

You must configure:

  • Storage Integration with IAM role
  • IAM Role must grant:
    • s3:GetObject, s3:GetObjectVersion, s3:ListBucket
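
After the storage integration is created, Snowflake reports the IAM user and external ID that must go into the IAM role's trust policy; a sketch using the integration from the earlier example:

DESC INTEGRATION s3_integration;
-- Copy STORAGE_AWS_IAM_USER_ARN and STORAGE_AWS_EXTERNAL_ID from the output
-- into the trust relationship of the role referenced by STORAGE_AWS_ROLE_ARN.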

8️⃣ What happens if Snowpipe fails to load a file?

  • Load errors can be inspected with the COPY_HISTORY table function or the ACCOUNT_USAGE.COPY_HISTORY view.
  • Snowpipe retries automatically for transient errors.
  • Bad records can be skipped using the ON_ERROR copy option.

Example:

COPY INTO my_table
FROM @my_stage
ON_ERROR = 'CONTINUE';

9️⃣ How can you check Snowpipe load status?

SELECT * FROM SNOWFLAKE.ACCOUNT_USAGE.COPY_HISTORY
WHERE PIPE_NAME = 'MY_PIPE'
ORDER BY LAST_LOAD_TIME DESC;

🔟 How is Snowpipe billed?

  • You are not billed for warehouse compute, but
  • You are billed for Snowpipe compute time, which depends on:
    • Amount of data loaded
    • Number of files
    • Complexity of file formats

1️⃣1️⃣ Can you use Snowpipe for internal stages?

✅ Yes, but auto-ingest is supported only for external stages (S3, Azure, GCS).
For internal stages, use REST API to trigger loads manually.


1️⃣2️⃣ How can you pause or resume a Snowpipe?

ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = TRUE;   -- Pause
ALTER PIPE my_pipe SET PIPE_EXECUTION_PAUSED = FALSE;  -- Resume

1️⃣3️⃣ Can Snowpipe load JSON, Parquet, or Avro?

✅ Yes, it supports all Snowflake file formats:

  • CSV
  • JSON
  • PARQUET
  • AVRO
  • ORC

Just define a FILE_FORMAT during creation.


🧠 ADVANCED Snowpipe Interview Questions

1️⃣4️⃣ What is the difference between “auto-ingest” and “manual Snowpipe”?

Method      | Trigger                                   | Use Case
Auto-ingest | Cloud event notification (S3 → Snowflake) | Fully automated pipelines
Manual      | REST API call                             | Custom workflows (Lambda, Airflow, etc.)

1️⃣5️⃣ How does Snowpipe guarantee exactly-once loading?

  • It keeps track of file load history using internal metadata.
  • If a file is re-sent, it’s ignored unless the file name changes.
  • This ensures idempotency.

1️⃣6️⃣ How do you monitor Snowpipe performance?

Use Snowflake Account Usage views:

SELECT * 
FROM SNOWFLAKE.ACCOUNT_USAGE.PIPE_USAGE_HISTORY 
WHERE PIPE_NAME = 'MY_PIPE';

Or use AWS CloudWatch / Azure Monitor logs for external events.


1️⃣7️⃣ How do you handle schema evolution in Snowpipe?

  • Use MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE so columns are matched by name rather than position.
  • Add new columns with ALTER TABLE.
  • Use copy options (e.g., ERROR_ON_COLUMN_COUNT_MISMATCH = FALSE for CSV) to tolerate missing or extra fields.

Example:

CREATE PIPE my_pipe 
AS
COPY INTO my_table
FROM @my_stage
FILE_FORMAT = (TYPE = 'JSON')
MATCH_BY_COLUMN_NAME = CASE_INSENSITIVE;

1️⃣8️⃣ Can you use Snowpipe with Streams & Tasks?

✅ Yes — common pattern:

  • Snowpipe → loads raw data into staging table
  • Stream → tracks new rows
  • Task → transforms and merges into target tables

Sample code

Here’s a full end-to-end Snowflake example for Snowpipe → Stream → Task with concise sample code. This is a very common pattern in modern Snowflake ingestion pipelines.


1️⃣ Create Raw Table

CREATE OR REPLACE TABLE raw_sales (
    order_id INT,
    customer_id INT,
    order_date DATE,
    amount NUMBER(10,2)
);

2️⃣ Create Snowpipe (auto-ingest from S3 stage)

Step 2a: Create Stage

CREATE OR REPLACE STAGE sales_stage
URL='s3://mybucket/sales/'
STORAGE_INTEGRATION = my_s3_integration;

Step 2b: Create Pipe

CREATE OR REPLACE PIPE sales_pipe
AUTO_INGEST = TRUE
AS
COPY INTO raw_sales
FROM @sales_stage
FILE_FORMAT = (TYPE = 'CSV' FIELD_OPTIONALLY_ENCLOSED_BY='"' SKIP_HEADER=1)
ON_ERROR = 'CONTINUE';

✅ This will automatically load any new files in the S3 stage into raw_sales.
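
For auto-ingest to fire, the S3 bucket's event notification must target the SQS queue Snowflake creates for the pipe; the queue ARN appears in the pipe metadata:

SHOW PIPES LIKE 'sales_pipe';
-- The notification_channel column holds the SQS ARN to set as the target
-- of the bucket's "object created" event notification.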


3️⃣ Create Stream (tracks newly loaded rows in raw table)

CREATE OR REPLACE STREAM raw_sales_stream
ON TABLE raw_sales
APPEND_ONLY = TRUE;
  • APPEND_ONLY = TRUE → tracks newly inserted rows only.
  • The stream acts as a Change Data Capture (CDC) mechanism.

4️⃣ Create Curated Table (final target)

CREATE OR REPLACE TABLE curated_sales (
    order_id INT,
    customer_id INT,
    order_date DATE,
    amount NUMBER(10,2),
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

5️⃣ Create Task (runs periodically to transform/merge new rows)

CREATE OR REPLACE TASK transform_sales_task
WAREHOUSE = compute_wh
SCHEDULE = 'USING CRON 0 * * * * UTC'  -- every hour
AS
MERGE INTO curated_sales tgt
USING (
    SELECT *
    FROM raw_sales_stream
) src
ON tgt.order_id = src.order_id
WHEN MATCHED THEN UPDATE SET
    amount = src.amount,
    updated_at = CURRENT_TIMESTAMP
WHEN NOT MATCHED THEN INSERT (
    order_id, customer_id, order_date, amount, updated_at
)
VALUES (
    src.order_id, src.customer_id, src.order_date, src.amount, CURRENT_TIMESTAMP
);
  • What it does:
    1. Reads new rows from the stream.
    2. MERGE into curated table: update existing rows, insert new rows.
    3. Runs automatically according to schedule (or can be event-driven if desired).

Aside: CRON syntax in Snowflake Tasks

Snowflake uses a standard cron expression (5 fields) to schedule tasks:

MIN HOUR DOM MON DOW
Field | Meaning            | Allowed Values
MIN   | Minute of the hour | 0–59
HOUR  | Hour of the day    | 0–23
DOM   | Day of the month   | 1–31
MON   | Month              | 1–12 or JAN–DEC
DOW   | Day of the week    | 0–6 (Sun=0) or SUN–SAT

Note: In Snowflake, you also append the time zone like UTC, America/Los_Angeles, etc.


'0 * * * * UTC' Explained

0 * * * * UTC
Field | Value | Meaning
MIN   | 0     | Run at minute 0
HOUR  | *     | Every hour
DOM   | *     | Every day of the month
MON   | *     | Every month
DOW   | *     | Every day of the week
TZ    | UTC   | Run in the UTC time zone

Interpretation:

Run at the start of every hour (e.g., 00:00, 01:00, 02:00 UTC …).


More CRON examples

Cron Expression | Run Schedule
0 * * * * UTC   | Every hour at minute 0
30 9 * * * UTC  | Every day at 09:30 UTC
0 0 * * 1 UTC   | Every Monday at 00:00 UTC
0 0 1 * * UTC   | First day of every month at 00:00 UTC

6️⃣ Activate Task

ALTER TASK transform_sales_task RESUME;

✅ Summary Flow

  1. S3 → New CSV uploaded
  2. Snowpipe → Auto-loads into raw_sales
  3. Stream → Tracks new rows in raw_sales_stream
  4. Task → Periodically merges data into curated_sales

That creates a continuous ingestion + transformation pipeline.


1️⃣9️⃣ How to manually trigger Snowpipe via REST API?

Using Snowflake’s endpoint:

POST https://<account>.snowflakecomputing.com/v1/data/pipes/<pipe_name>/insertFiles

With JSON body:

{ "files": [ { "path": "data/file1.csv" }, { "path": "data/file2.csv" } ] }

2️⃣0️⃣ What are common interview scenarios?

  • File arrives late or incomplete — how to handle?
  • Duplicate file uploads — what happens?
  • How to backfill data for missed files?
  • Compare Snowpipe vs Streams + Tasks latency trade-offs.
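
For the backfill scenario, one common answer is ALTER PIPE ... REFRESH, which queues files already sitting in the stage that the pipe has not loaded (only files staged within the last 7 days are considered):

ALTER PIPE my_pipe REFRESH;

-- Optionally limit the refresh to a sub-path
ALTER PIPE my_pipe REFRESH PREFIX = 'data/2024-01-01/';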

🚀 Bonus: Common Real-world Design Pattern

S3 → Event Notification → SQS → Snowpipe → Raw Table → Stream → Task → Curated Table

That’s the standard modern Snowflake ingestion pipeline architecture.

⚙️ End-to-End Flow (in a few words)

Step | Component          | Role                 | Purpose
1️⃣   | S3                 | Data landing zone    | Raw files uploaded (CSV, JSON, etc.)
2️⃣   | Event Notification | Triggers automation  | Detects new file arrival in S3
3️⃣   | SQS                | Message queue        | Delivers file event to Snowflake reliably
4️⃣   | Snowpipe           | Continuous ingestion | Auto-loads new files into raw table
5️⃣   | Raw Table          | Staging area         | Stores unprocessed raw data
6️⃣   | Stream             | Change tracker       | Tracks newly loaded rows (CDC-style)
7️⃣   | Task               | Scheduled transform  | Runs SQL to process/merge data
8️⃣   | Curated Table      | Final dataset        | Cleaned, transformed, analytics-ready data

🧠 In One Line

Files land in S3 → Event triggers SQS → Snowpipe loads raw → Stream tracks changes → Task transforms → Curated table ready for analytics.