Modern data engineering requires handling both real-time streaming data and traditional batch workloads efficiently. Databricks has evolved Delta Live Tables (DLT) into Lakeflow Declarative Pipelines, enabling data engineers to build reliable and scalable pipelines with less code.
Using the dp module, developers can define streaming tables and materialized views declaratively. Databricks automatically manages dependencies, execution order, monitoring, and recovery.
In this guide, you’ll learn how to use @dp.table for streaming pipelines and @dp.materialized_view for batch processing with practical examples.
What is dp in Databricks?
The dp module stands for Declarative Pipelines and is imported as:
from pyspark import pipelines as dp
Instead of manually orchestrating ETL jobs, you define datasets and transformations. Databricks handles the pipeline execution behind the scenes.
Benefits of Using dp
- Simplified pipeline development
- Automatic dependency tracking
- Built-in monitoring and observability
- Support for both streaming and batch processing
- Easy implementation of Medallion Architecture
- Reduced operational complexity
Understanding Streaming Tables
Streaming tables continuously process incoming data as it arrives.
Common streaming sources include:
- Amazon S3
- Azure Data Lake Storage
- Google Cloud Storage
- Apache Kafka
- Change Data Capture (CDC) streams
Example: Creating a Bronze Streaming Table
Suppose CSV files are continuously arriving in cloud storage.
from pyspark import pipelines as dp@dp.table( name="bronze_orders")def bronze_orders(): return ( spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("/Volumes/raw/orders/") )
How It Works
spark.readStreamenables streaming mode.- Auto Loader detects new files automatically.
- Data is processed incrementally.
- A streaming table named
bronze_ordersis created.
Building a Silver Streaming Layer
After ingesting raw data, the next step is cleansing and validation.
Example: Bronze to Silver Transformation
from pyspark import pipelines as dp@dp.table( name="silver_orders")def silver_orders(): return ( spark.readStream.table("bronze_orders") .filter("order_amount > 0") )
Purpose of the Silver Layer
- Remove invalid records
- Standardize data formats
- Apply business rules
- Improve data quality
Understanding Materialized Views
Materialized Views are optimized for batch processing and reporting workloads.
Unlike streaming tables, materialized views store precomputed results that can be refreshed automatically when source data changes.
Example: Creating a Gold Layer Materialized View
from pyspark import pipelines as dp@dp.materialized_view( name="daily_sales")def daily_sales(): return ( spark.table("silver_orders") .groupBy("order_date") .sum("order_amount") )
What Happens?
- Reads clean data from the Silver layer
- Aggregates sales by date
- Stores precomputed results
- Improves dashboard and reporting performance
Complete Medallion Architecture Example
The following example demonstrates a complete Bronze-Silver-Gold pipeline.
Pipeline Code
from pyspark import pipelines as dp# Bronze Layer@dp.table(name="bronze_orders")def bronze_orders(): return ( spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("/Volumes/raw/orders/") )# Silver Layer@dp.table(name="silver_orders")def silver_orders(): return ( spark.readStream.table("bronze_orders") .filter("order_amount > 0") )# Gold Layer@dp.materialized_view(name="daily_sales")def daily_sales(): return ( spark.table("silver_orders") .groupBy("order_date") .sum("order_amount") )
Data Flow
Raw Files ↓Bronze (Streaming) ↓Silver (Streaming) ↓Gold (Materialized View)
Working with Multiple Streaming Sources
Organizations often receive data from multiple locations or business units.
Example: Combining US and EU Streams
from pyspark import pipelines as dpdp.create_streaming_table("all_orders")@dp.append_flow(target="all_orders")def us_orders(): return ( spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("/data/us/") )@dp.append_flow(target="all_orders")def eu_orders(): return ( spark.readStream .format("cloudFiles") .option("cloudFiles.format", "csv") .load("/data/eu/") )
Advantages
- Consolidates multiple sources
- Simplifies downstream processing
- Supports scalable ingestion patterns
How to Run a Lakeflow Declarative Pipeline
Step 1: Create a Notebook
Create a Databricks notebook and add your pipeline code.
from pyspark import pipelines as dp
Step 2: Create a Pipeline
Navigate to:
Workflows → Lakeflow Declarative Pipelines → Create Pipeline
Step 3: Configure Pipeline Settings
Provide the following information:
- Notebook Path
- Catalog
- Schema
- Storage Location
Example:
Catalog : demoSchema : bronze
Step 4: Select an Execution Mode
Triggered Mode
Runs the pipeline once and then stops.
Best suited for:
- Daily ETL jobs
- Scheduled reports
- Batch processing
Continuous Mode
Runs continuously and processes incoming data automatically.
Best suited for:
- Real-time analytics
- Event-driven architectures
- Streaming applications
Streaming Tables vs Materialized Views
| Feature | Streaming Table | Materialized View |
|---|---|---|
| Read Method | spark.readStream | spark.table |
| Processing Type | Streaming | Batch |
| Decorator | @dp.table | @dp.materialized_view |
| Real-Time Updates | Yes | No |
| Reporting Workloads | Limited | Excellent |
| Aggregations | Supported | Recommended |
Best Practices for Databricks DLT Pipelines
Use the Medallion Architecture
Separate your data into:
- Bronze (Raw Data)
- Silver (Cleaned Data)
- Gold (Business-Ready Data)
Keep Transformations Modular
Create smaller pipeline functions rather than one large transformation.
Use Materialized Views for Reporting
Aggregations and dashboard datasets perform better when stored as materialized views.
Monitor Pipeline Health
Leverage Databricks pipeline monitoring to identify failures and performance bottlenecks quickly.
Conclusion
Databricks Lakeflow Declarative Pipelines simplify modern data engineering by providing a declarative framework for building both streaming and batch pipelines. Using @dp.table for streaming ingestion and @dp.materialized_view for batch transformations enables teams to develop scalable, maintainable, and production-ready data pipelines with minimal operational effort.
By adopting these patterns, organizations can implement robust Medallion Architectures and accelerate their journey toward real-time analytics and data-driven decision-making.






Start Discussion