Delta Live Tables (DLT) in Databricks is a modern, fully managed framework for building reliable ETL pipelines that support both streaming and batch data. It simplifies data engineering by enabling automated data quality checks, event logging, and pipeline monitoring. With DLT, you can create tables, views, and expectations from notebooks or SQL scripts, track pipeline execution through event logs, and efficiently handle duplicates and streaming data. Whether using Core or Pro features, DLT helps teams maintain production-ready pipelines with lineage, auditing, and governance, making it ideal for modern data workflows.
1️⃣ Event Logs
Q: How many event logs do we need?
A: Each DLT pipeline automatically generates one event log.
- 10 pipelines → 10 event logs.
- Stored under:
dbfs:/pipelines/<pipeline_name>/system/events
Q: Can I create a centralized event log?
A: ✅ Yes. You can union multiple event logs to create a centralized monitoring view:
from functools import reduce

paths = [
    "dbfs:/pipelines/pipeline1/system/events",
    "dbfs:/pipelines/pipeline2/system/events"
]

# Read each pipeline's event log individually and union them,
# since each path is a separate Delta table.
logs = [spark.read.format("delta").load(p) for p in paths]
combined = reduce(lambda a, b: a.unionByName(b), logs)
combined.createOrReplaceTempView("all_event_logs")
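Once the view exists, one query can monitor every pipeline. A small usage sketch, assuming the `timestamp`, `pipeline_name`, `level`, and `message` columns described in the schema below:

```python
# Surface recent errors across all pipelines from the combined view
spark.sql("""
    SELECT timestamp, pipeline_name, message
    FROM all_event_logs
    WHERE level = 'ERROR'
    ORDER BY timestamp DESC
""").show(truncate=False)
```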
Q: How to query the events folder?
A:
pipeline_name = "silver_orders_pipeline"
events_path = f"dbfs:/pipelines/{pipeline_name}/system/events"
event_log_df = spark.read.format("delta").load(events_path)
display(event_log_df)
- Filter by `level` for ERROR/WARN, or explode `expectations` to check failed rules (see the sketch below).
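A minimal Python sketch of both checks, building on `event_log_df` from the snippet above; the nested `failed_records` field inside `expectations` is an assumption:

```python
from pyspark.sql.functions import col, explode

# Keep only warning and error events
problems_df = event_log_df.filter(col("level").isin("ERROR", "WARN"))

# Flatten the expectations array to inspect which rules failed
expectations_df = (
    event_log_df
    .select("timestamp", "pipeline_name", explode("expectations").alias("exp"))
    .filter(col("exp.failed_records") > 0)   # failed_records field name is an assumption
)

display(problems_df)
display(expectations_df)
```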
Q: What schema does the event log have?
| Column | Description |
|---|---|
| timestamp | Event time |
| level | INFO/WARN/ERROR |
| pipeline_id | Pipeline identifier |
| pipeline_name | Name of pipeline |
| batch_id | Batch number |
| origin | Source table/stream info |
| target | Target table info |
| message | Event description |
| expectations | Array of data quality results |
| metrics | Map of processed rows, failures, latency |
Q: What is event_log() in SQL?
A: `event_log()` is a built-in DLT SQL function to query the event log for a table or pipeline:
SELECT * FROM event_log(TABLE(`catalog`.`schema`.`table`))
- Returns the same schema as above.
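The same TVF can also be called from Python through `spark.sql`; a small sketch reusing the fully qualified table from the quick-reference section below:

```python
# Query event-log entries for one table via the event_log() TVF
table_ref = "`prod_catalog`.`sales`.`silver_orders`"
dq_events = spark.sql(f"SELECT * FROM event_log(TABLE({table_ref}))")
display(dq_events)
```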
2️⃣ DLT Pipelines and Assets
Q: What types of assets can a DLT pipeline include?
| Asset Type | Description | Example |
|---|---|---|
| Notebook | Python/SQL notebook defining tables & expectations | /Repos/dlt_pipelines/orders_pipeline |
| Python file/module | .py file with DLT table definitions | /Workspace/dlt/code/orders_pipeline.py |
| SQL Script | SQL notebook with CREATE LIVE TABLE statements | /Repos/dlt_sql/orders_pipeline.sql |
| Delta Table references | Existing Delta tables as sources | dlt.read("bronze_orders") |
| Config files | JSON or YAML for environment parameters | /Workspace/dlt/config/prod_config.json |
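These assets are wired together in the pipeline's settings. A hedged sketch, shown as a Python dict mirroring the JSON settings structure and reusing the paths from the table above (verify field names against your workspace):

```python
# Sketch of DLT pipeline settings expressed as a Python dict
# (same shape as the pipeline's JSON settings; fields beyond
# "libraries" and "configuration" should be checked in your workspace).
pipeline_settings = {
    "name": "orders_pipeline",
    "libraries": [
        {"notebook": {"path": "/Repos/dlt_pipelines/orders_pipeline"}},
        {"file": {"path": "/Workspace/dlt/code/orders_pipeline.py"}},
    ],
    "configuration": {
        "env": "prod",
        "config_path": "/Workspace/dlt/config/prod_config.json",
    },
}
```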
Q: What is a DLT object?
A: Any entity managed or tracked by DLT:
- Tables (`@dlt.table`)
- Views (`@dlt.view`)
- Pipelines
- Expectations
- Event logs
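A short sketch contrasting a view and a table; the `bronze_orders` source and the `order_date`/`customer_id` columns are assumptions:

```python
import dlt

# A view is computed for use inside the pipeline and is not persisted as a target dataset
@dlt.view
def recent_orders():
    return dlt.read("bronze_orders").where("order_date >= date_sub(current_date(), 7)")

# A table is materialized, tracked in the event log, and readable by downstream consumers
@dlt.table
def orders_summary():
    return dlt.read("recent_orders").groupBy("customer_id").count()
```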
Q: DLT options – Core vs Pro?
| Tier | Features |
|---|---|
| Core | Batch & streaming pipelines, tables/views, expectations, event logs, schema evolution |
| Pro | Core + continuous streaming, REST API, versioning, Unity Catalog integration, SLA/alerting, centralized monitoring |
3️⃣ DLT Data Handling
Q: DLT duplicates handling?
- By default, duplicates are not removed in append mode.
- Strategies to handle duplicates:
  - Drop duplicates: `dlt.read("bronze_orders").dropDuplicates(["order_id"])`
  - MERGE/Upsert for idempotent pipelines
  - Expectations to surface data-quality violations in the event log (note that `@dlt.expect` constraints are evaluated per record, so a cross-row check such as `count_distinct(order_id) = count(*)` belongs in a separate validation query)
  - Streaming watermarks to handle late-arriving duplicates (see the sketch after this list)
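A minimal sketch of the watermark strategy, assuming a streaming `bronze_orders` source with an `order_ts` event-time column:

```python
import dlt

@dlt.table(comment="Orders deduplicated within a 10-minute watermark")
def silver_orders_deduped():
    return (
        dlt.read_stream("bronze_orders")
        .withWatermark("order_ts", "10 minutes")   # bound state kept for late data
        .dropDuplicates(["order_id", "order_ts"])  # drop repeats of the same order
    )
```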
Q: Can DLT handle duplicates automatically?
- ❌ Not automatically — you must define deduplication logic (dropDuplicates or merge)
- ✅ DLT tracks duplicates via expectations in event logs
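For the merge/upsert route, DLT's `APPLY CHANGES` API is the usual tool. A hedged sketch in which the `cdc_orders` source and the `order_id`/`event_ts` columns are assumptions:

```python
import dlt

# Target streaming table that APPLY CHANGES maintains
dlt.create_streaming_table("silver_orders_upserted")

dlt.apply_changes(
    target="silver_orders_upserted",
    source="cdc_orders",        # streaming source carrying change records
    keys=["order_id"],          # rows sharing a key are upserted rather than appended
    sequence_by="event_ts",     # the latest record per key wins
)
```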
Q: DLT append flow?
- Source data → `dlt.read()` / `dlt.read_stream()`
- Transformations & expectations
- Append to target table via `@dlt.table`
- Event logs capture batch info and DQ metrics
- Downstream consumers read appended data
✅ 4️⃣ Quick Reference Examples
Python DLT table (Append + Expectation):
import dlt

@dlt.table
@dlt.expect("positive_amount", "amount > 0")
def silver_orders():
    # Deduplicate on order_id before appending to the silver table
    return dlt.read_stream("bronze_orders").dropDuplicates(["order_id"])
SQL DLT table example:
CREATE LIVE TABLE silver_orders
COMMENT "Deduplicated orders"
AS
SELECT DISTINCT *
FROM LIVE.bronze_orders
WHERE amount > 0;
Query event log SQL example:
SELECT *
FROM event_log(TABLE(`prod_catalog`.`sales`.`silver_orders`))
WHERE level IN ('ERROR','WARN')
ORDER BY timestamp DESC;
Centralized event log view (Python):
from functools import reduce

paths = [
    "dbfs:/pipelines/pipeline1/system/events",
    "dbfs:/pipelines/pipeline2/system/events"
]

# Read each pipeline's event log and union them into one monitoring view
logs = [spark.read.format("delta").load(p) for p in paths]
combined = reduce(lambda a, b: a.unionByName(b), logs)
combined.createOrReplaceTempView("all_event_logs")
This compilation covers:
- Event logs (`event_log()`, schema, querying)
- Pipeline assets (notebooks, SQL, Python)
- DLT objects (tables, views, expectations)
- DLT tiers (Core vs Pro)
- Duplicate handling and append flow