Databricks DLT with @dp: A Complete Guide to Streaming and Batch Processing

Modern data engineering requires handling both real-time streaming data and traditional batch workloads efficiently. Databricks has evolved Delta Live Tables (DLT) into Lakeflow Declarative Pipelines, enabling data engineers to build reliable and scalable pipelines with less code.

Using the dp module, developers can define streaming tables and materialized views declaratively. Databricks automatically manages dependencies, execution order, monitoring, and recovery.

In this guide, you’ll learn how to use @dp.table for streaming pipelines and @dp.materialized_view for batch processing with practical examples.

What is dp in Databricks?

The dp module stands for Declarative Pipelines and is imported as:

from pyspark import pipelines as dp

Instead of manually orchestrating ETL jobs, you define datasets and transformations. Databricks handles the pipeline execution behind the scenes.

Benefits of Using dp

Simplified pipeline development
Automatic dependency tracking
Built-in monitoring and observability
Support for both streaming and batch processing
Easy implementation of Medallion Architecture
Reduced operational complexity

Understanding Streaming Tables

Streaming tables continuously process incoming data as it arrives.

Common streaming sources include:

Amazon S3
Azure Data Lake Storage
Google Cloud Storage
Apache Kafka
Change Data Capture (CDC) streams

Example: Creating a Bronze Streaming Table

Suppose CSV files are continuously arriving in cloud storage.

			
from pyspark import pipelines as dp
@dp.table(
    name="bronze_orders"
)
def bronze_orders():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/Volumes/raw/orders/")
    )

		

How It Works

spark.readStream enables streaming mode.
Auto Loader detects new files automatically.
Data is processed incrementally.
A streaming table named bronze_orders is created.

Building a Silver Streaming Layer

After ingesting raw data, the next step is cleansing and validation.

Example: Bronze to Silver Transformation

			
from pyspark import pipelines as dp
@dp.table(
    name="silver_orders"
)
def silver_orders():
    return (
        spark.readStream.table("bronze_orders")
        .filter("order_amount > 0")
    )

		

Purpose of the Silver Layer

Remove invalid records
Standardize data formats
Apply business rules
Improve data quality

Understanding Materialized Views

Materialized Views are optimized for batch processing and reporting workloads.

Unlike streaming tables, materialized views store precomputed results that can be refreshed automatically when source data changes.

Example: Creating a Gold Layer Materialized View

			
from pyspark import pipelines as dp
@dp.materialized_view(
    name="daily_sales"
)
def daily_sales():
    return (
        spark.table("silver_orders")
        .groupBy("order_date")
        .sum("order_amount")
    )

		

What Happens?

Reads clean data from the Silver layer
Aggregates sales by date
Stores precomputed results
Improves dashboard and reporting performance

Complete Medallion Architecture Example

The following example demonstrates a complete Bronze-Silver-Gold pipeline.

Pipeline Code

			
from pyspark import pipelines as dp
# Bronze Layer
@dp.table(name="bronze_orders")
def bronze_orders():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/Volumes/raw/orders/")
    )
# Silver Layer
@dp.table(name="silver_orders")
def silver_orders():
    return (
        spark.readStream.table("bronze_orders")
        .filter("order_amount > 0")
    )
# Gold Layer
@dp.materialized_view(name="daily_sales")
def daily_sales():
    return (
        spark.table("silver_orders")
        .groupBy("order_date")
        .sum("order_amount")
    )

		

Data Flow

			
Raw Files
    ↓
Bronze (Streaming)
    ↓
Silver (Streaming)
    ↓
Gold (Materialized View)

		

Working with Multiple Streaming Sources

Organizations often receive data from multiple locations or business units.

Example: Combining US and EU Streams

			
from pyspark import pipelines as dp
dp.create_streaming_table("all_orders")
@dp.append_flow(target="all_orders")
def us_orders():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/data/us/")
    )
@dp.append_flow(target="all_orders")
def eu_orders():
    return (
        spark.readStream
        .format("cloudFiles")
        .option("cloudFiles.format", "csv")
        .load("/data/eu/")
    )

		

Advantages

Consolidates multiple sources
Simplifies downstream processing
Supports scalable ingestion patterns

How to Run a Lakeflow Declarative Pipeline

Step 1: Create a Notebook

Create a Databricks notebook and add your pipeline code.

from pyspark import pipelines as dp

Step 2: Create a Pipeline

Navigate to:

			
Workflows
   → Lakeflow Declarative Pipelines
   → Create Pipeline

Step 3: Configure Pipeline Settings

Provide the following information:

Notebook Path
Catalog
Schema
Storage Location

Example:

			
Catalog : demo
Schema  : bronze

Step 4: Select an Execution Mode

Triggered Mode

Runs the pipeline once and then stops.

Best suited for:

Daily ETL jobs
Scheduled reports
Batch processing

Continuous Mode

Runs continuously and processes incoming data automatically.

Best suited for:

Real-time analytics
Event-driven architectures
Streaming applications

Streaming Tables vs Materialized Views

Feature	Streaming Table	Materialized View
Read Method	spark.readStream	spark.table
Processing Type	Streaming	Batch
Decorator	@dp.table	@dp.materialized_view
Real-Time Updates	Yes	No
Reporting Workloads	Limited	Excellent
Aggregations	Supported	Recommended

Best Practices for Databricks DLT Pipelines

Use the Medallion Architecture

Separate your data into:

Bronze (Raw Data)
Silver (Cleaned Data)
Gold (Business-Ready Data)

Keep Transformations Modular

Create smaller pipeline functions rather than one large transformation.

Use Materialized Views for Reporting

Aggregations and dashboard datasets perform better when stored as materialized views.

Monitor Pipeline Health

Leverage Databricks pipeline monitoring to identify failures and performance bottlenecks quickly.

Conclusion

Databricks Lakeflow Declarative Pipelines simplify modern data engineering by providing a declarative framework for building both streaming and batch pipelines. Using @dp.table for streaming ingestion and @dp.materialized_view for batch transformations enables teams to develop scalable, maintainable, and production-ready data pipelines with minimal operational effort.

By adopting these patterns, organizations can implement robust Medallion Architectures and accelerate their journey toward real-time analytics and data-driven decision-making.

Srini

Data Engineer with deep AI and Generative AI expertise, crafting high-performance data pipelines in PySpark, Databricks, and SQL. Skilled in Python, AWS, and Linux—building scalable, cloud-native solutions for smart applications.

Fediverse reactions

Latest Posts

How to Create a Subagent in Claude Code: A Step-by-Step Guide

July 7, 2026
Migrating from Hive Metastore to Unity Catalog in Databricks: Complete Step-by-Step Guide

June 14, 2026
2-Minute Random Topics to Improve English Speaking Skills

June 14, 2026
Spark Structured Streaming vs Spark Declarative Pipelines (SDP): What Every Data Engineer Should Know

June 7, 2026

What is dp in Databricks?

Benefits of Using dp

Understanding Streaming Tables

Example: Creating a Bronze Streaming Table

How It Works

Building a Silver Streaming Layer

Example: Bronze to Silver Transformation

Purpose of the Silver Layer

Understanding Materialized Views

Example: Creating a Gold Layer Materialized View

What Happens?

Complete Medallion Architecture Example

Pipeline Code

Data Flow

Working with Multiple Streaming Sources

Example: Combining US and EU Streams

Advantages

How to Run a Lakeflow Declarative Pipeline

Step 1: Create a Notebook

Step 2: Create a Pipeline

Step 3: Configure Pipeline Settings

Step 4: Select an Execution Mode

Triggered Mode

Continuous Mode

Streaming Tables vs Materialized Views

Best Practices for Databricks DLT Pipelines

Use the Medallion Architecture

Keep Transformations Modular

Use Materialized Views for Reporting

Monitor Pipeline Health

Conclusion

Like this:

Fediverse reactions

Latest Posts

How to Create a Subagent in Claude Code: A Step-by-Step Guide

Migrating from Hive Metastore to Unity Catalog in Databricks: Complete Step-by-Step Guide

2-Minute Random Topics to Improve English Speaking Skills

Spark Structured Streaming vs Spark Declarative Pipelines (SDP): What Every Data Engineer Should Know

Subscribe for DAILY TIPS

Databricks DLT with @dp: A Complete Guide to Streaming and Batch Processing

What is dp in Databricks?

Benefits of Using dp

Understanding Streaming Tables

Example: Creating a Bronze Streaming Table

How It Works

Building a Silver Streaming Layer

Example: Bronze to Silver Transformation

Purpose of the Silver Layer

Understanding Materialized Views

Example: Creating a Gold Layer Materialized View

What Happens?

Complete Medallion Architecture Example

Pipeline Code

Data Flow

Working with Multiple Streaming Sources

Example: Combining US and EU Streams

Advantages

How to Run a Lakeflow Declarative Pipeline

Step 1: Create a Notebook

Step 2: Create a Pipeline

Step 3: Configure Pipeline Settings

Step 4: Select an Execution Mode

Triggered Mode

Continuous Mode

Streaming Tables vs Materialized Views

Best Practices for Databricks DLT Pipelines

Use the Medallion Architecture

Keep Transformations Modular

Use Materialized Views for Reporting

Monitor Pipeline Health

Conclusion

Share this:

Like this:

Fediverse reactions

Latest Posts

How to Create a Subagent in Claude Code: A Step-by-Step Guide

Migrating from Hive Metastore to Unity Catalog in Databricks: Complete Step-by-Step Guide

2-Minute Random Topics to Improve English Speaking Skills

Spark Structured Streaming vs Spark Declarative Pipelines (SDP): What Every Data Engineer Should Know

Discover more from Srinimf