Delta Live Tables (DLT) in Databricks is a fully managed framework for building reliable ETL pipelines over both streaming and batch data. It simplifies data engineering with automated data quality checks (expectations), event logging, and pipeline monitoring. With DLT you can define tables, views, and expectations from notebooks or SQL scripts, track pipeline execution through event logs, and handle duplicates and streaming data efficiently. Whichever product edition you use (Core, Pro, or Advanced), DLT helps teams maintain production-ready pipelines with lineage, auditing, and governance, making it a good fit for modern data workflows.

1️⃣ Event Logs

Q: How many event logs do we need?
A: Each DLT pipeline automatically generates one event log.

  • 10 pipelines → 10 event logs.
  • Stored under the pipeline's storage location: <storage-location>/system/events
    (with default storage this is dbfs:/pipelines/<pipeline-id>/system/events).
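
A quick way to see which event logs exist is to list the default storage root. This is a minimal sketch and assumes your pipelines use the default dbfs:/pipelines/ storage location (folder names there are pipeline IDs):

# List pipeline storage folders under the default root and print the
# expected event log path for each one
for entry in dbutils.fs.ls("dbfs:/pipelines/"):
    print(entry.path + "system/events")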

Q: Can I create a centralized event log?
A: ✅ Yes. You can union multiple event logs to create a centralized monitoring view:

from functools import reduce

# Event log locations for the pipelines to monitor
# (default storage layout: dbfs:/pipelines/<pipeline-id>/system/events)
paths = [
    "dbfs:/pipelines/pipeline1/system/events",
    "dbfs:/pipelines/pipeline2/system/events"
]

# Read each event log (a Delta table) and union them into one DataFrame
event_logs = [spark.read.format("delta").load(p) for p in paths]
combined = reduce(lambda a, b: a.unionByName(b), event_logs)
combined.createOrReplaceTempView("all_event_logs")
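
Once the temp view exists, you can query it with Spark SQL. A minimal usage sketch that counts events by severity (level is part of the event log schema described below):

# Count events per severity level across all registered pipelines
spark.sql("""
    SELECT level, COUNT(*) AS event_count
    FROM all_event_logs
    GROUP BY level
    ORDER BY event_count DESC
""").show()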

Q: How to query the events folder?
A:

# The event log lives at <storage-location>/system/events;
# with default storage the folder name is the pipeline ID
pipeline_id = "<pipeline-id>"
events_path = f"dbfs:/pipelines/{pipeline_id}/system/events"
event_log_df = spark.read.format("delta").load(events_path)
display(event_log_df)
  • Filter on level to isolate ERROR/WARN events, or drill into the expectation results to find failed rules (see the sketch below).
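
A minimal sketch of that filter, reusing the event_log_df loaded above:

from pyspark.sql.functions import col

# Keep only warning/error events, most recent first
problems_df = (
    event_log_df
    .filter(col("level").isin("ERROR", "WARN"))
    .orderBy(col("timestamp").desc())
)
display(problems_df.select("timestamp", "level", "message"))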

Q: What schema does the event log have?

  Column         Description
  timestamp      Event time
  level          INFO / WARN / ERROR
  pipeline_id    Pipeline identifier
  pipeline_name  Name of the pipeline
  batch_id       Batch number
  origin         Source table/stream info
  target         Target table info
  message        Event description
  expectations   Array of data quality (expectation) results
  metrics        Map of processed rows, failures, latency

  Note: in the physical event log several of these fields are nested rather than top-level
  columns, e.g. pipeline_id and pipeline_name sit inside the origin struct, and expectation
  results and metrics sit inside the details column.

Q: What is event_log() in SQL?

  • event_log() is a built-in table-valued function in Databricks SQL/DLT for querying the
    event log of a pipeline, or of a table managed by a pipeline:
SELECT * FROM event_log(TABLE(`catalog`.`schema`.`table`))
  • It also accepts a pipeline ID string, e.g. event_log('<pipeline-id>').
  • The result exposes the same kind of information as the schema above.

2️⃣ DLT Pipelines and Assets

Q: What types of assets can a DLT pipeline take?

  Asset Type              Description                                          Example
  Notebook                Python/SQL notebook defining tables & expectations   /Repos/dlt_pipelines/orders_pipeline
  Python file/module      .py file with DLT table definitions                  /Workspace/dlt/code/orders_pipeline.py
  SQL script              SQL notebook with CREATE LIVE TABLE statements       /Repos/dlt_sql/orders_pipeline.sql
  Delta table references  Existing Delta tables as sources                     dlt.read("bronze_orders")
  Config files            JSON or YAML for environment parameters              /Workspace/dlt/config/prod_config.json
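
These assets are attached to a pipeline through its settings. A minimal sketch of the relevant part of a pipeline settings payload, built here as a Python dict (the paths are hypothetical, and the exact library entry types available can vary by workspace version):

import json

# Sketch of pipeline settings attaching notebook/file assets and config parameters
pipeline_settings = {
    "name": "orders_pipeline",
    "libraries": [
        {"notebook": {"path": "/Repos/dlt_pipelines/orders_pipeline"}},
        {"file": {"path": "/Workspace/dlt/code/orders_pipeline.py"}},
    ],
    "configuration": {"env": "prod"},  # exposed to pipeline code via spark.conf
}
print(json.dumps(pipeline_settings, indent=2))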

Q: What does "DLT object" mean?

  • Any entity managed and tracked by DLT (see the sketch below for a table and a view defined together):
    • Tables (@dlt.table)
    • Views (@dlt.view)
    • Pipelines
    • Expectations
    • Event logs
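
A small sketch of two of these object types defined together; the source table name is hypothetical:

import dlt

@dlt.view(comment="Raw orders exposed as a DLT view (not persisted)")
def bronze_orders_vw():
    return spark.read.table("raw_db.orders")   # hypothetical source table

@dlt.table(comment="Persisted DLT table built from the view")
def bronze_orders():
    return dlt.read("bronze_orders_vw")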

Q: DLT product editions – Core vs Pro vs Advanced?

  Edition   Features
  Core      Batch & streaming append pipelines, tables/views, event logs, schema evolution
  Pro       Core + change data capture (APPLY CHANGES INTO) and slowly changing dimensions (SCD)
  Advanced  Pro + data quality expectations (expect, expect_or_drop, expect_or_fail)
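
The edition is chosen per pipeline via the edition field in the pipeline settings. A tiny sketch (the pipeline name is hypothetical):

# Edition is set in the pipeline settings: CORE, PRO, or ADVANCED
pipeline_settings = {
    "name": "silver_orders_pipeline",
    "edition": "ADVANCED",   # needed if the pipeline uses expectations
}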

3️⃣ DLT Data Handling

Q: How does DLT handle duplicates?

  • By default, duplicates are not removed in append mode.
  • Strategies to handle duplicates:
    1. Drop duplicates:
       dlt.read("bronze_orders").dropDuplicates(["order_id"])
    2. MERGE/upsert (or APPLY CHANGES INTO) for idempotent pipelines
    3. Expectations to flag bad records; note that expectations are evaluated per row, so an
       aggregate check like "no duplicate order_id" is better done in a separate monitoring
       query or view over the target table
    4. Streaming watermarks to bound state while deduplicating late-arriving data
       (see the sketch below)
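
A minimal sketch of watermark-based deduplication in a streaming DLT table; the event_time column name is an assumption:

import dlt

@dlt.table(comment="Orders deduplicated within a 10-minute watermark")
def silver_orders_dedup():
    return (
        dlt.read_stream("bronze_orders")
        # Bound the dedup state: only consider events up to 10 minutes late
        .withWatermark("event_time", "10 minutes")
        .dropDuplicates(["order_id", "event_time"])
    )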

Q: Can DLT handle duplicates automatically?

  • ❌ Not automatically — you must define deduplication logic (dropDuplicates or merge)
  • ✅ DLT tracks duplicates via expectations in event logs

Q: DLT append flow?

  1. Source data → dlt.read() / dlt.read_stream()
  2. Transformations & expectations
  3. Append to target table via @dlt.table
  4. Event logs capture batch info and DQ metrics
  5. Downstream consumers read appended data
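
For step 1, the source is often a streaming ingest table. A minimal sketch using Auto Loader; the landing path and file format are assumptions:

import dlt

@dlt.table(comment="Bronze orders ingested incrementally with Auto Loader")
def bronze_orders():
    return (
        spark.readStream
        .format("cloudFiles")                 # Auto Loader
        .option("cloudFiles.format", "json")  # assumed source format
        .load("/Volumes/landing/orders/")     # hypothetical landing path
    )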

4️⃣ Quick Reference Examples

Python DLT table (Append + Expectation):

import dlt

@dlt.table(comment="Deduplicated orders with a basic data quality check")
@dlt.expect("positive_amount", "amount > 0")   # record (but do not drop) rows with non-positive amounts
def silver_orders():
    # Stream from the bronze table and drop duplicate order IDs;
    # in production, pair this with a watermark to bound streaming state
    return dlt.read_stream("bronze_orders").dropDuplicates(["order_id"])
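
@dlt.expect only records violations; to actually filter bad rows you can use @dlt.expect_or_drop (or @dlt.expect_or_fail to abort the update). A small variant sketch:

import dlt

@dlt.table
@dlt.expect_or_drop("positive_amount", "amount > 0")   # drop violating rows instead of only logging them
def silver_orders_strict():
    return dlt.read_stream("bronze_orders").dropDuplicates(["order_id"])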

SQL DLT table example:

CREATE LIVE TABLE silver_orders
COMMENT "Deduplicated orders"
AS
SELECT DISTINCT *
FROM LIVE.bronze_orders
WHERE amount > 0;

Query event log SQL example:

SELECT *
FROM event_log(TABLE(`prod_catalog`.`sales`.`silver_orders`))
WHERE level IN ('ERROR','WARN')
ORDER BY timestamp DESC;

Centralized event log view (Python):

from functools import reduce

paths = [
    "dbfs:/pipelines/pipeline1/system/events",
    "dbfs:/pipelines/pipeline2/system/events"
]
# Union the per-pipeline event logs into one monitoring view
event_logs = [spark.read.format("delta").load(p) for p in paths]
combined = reduce(lambda a, b: a.unionByName(b), event_logs)
combined.createOrReplaceTempView("all_event_logs")

This compilation covers:

  • Event logs (event_log(), schema, querying)
  • Pipeline assets (notebooks, SQL, Python)
  • DLT objects (tables, views, expectations)
  • DLT tiers (Core vs Pro)
  • Duplicate handling and append flow