You can automate deployment and operation of DLT pipelines — just like CI/CD for notebooks, jobs, or models — using Databricks REST APIs, CLI, or Infrastructure-as-Code tools such as Terraform.
🧩 1. Core Concept
A DLT pipeline defines:
- Data source (input tables, files, CDC streams)
- Transformation logic (in Python or SQL, using @dlt.table / CREATE LIVE TABLE)
- Target tables (Bronze/Silver/Gold)
- Expectations (data quality rules)
- Settings (clusters, storage, libraries, etc.)
Automation means you deploy, configure, and trigger this pipeline without manually using the UI.
🚀 2. Automation Methods
✅ Option 1: Databricks REST API
Use the DLT Pipelines API to:
- Create or Update a pipeline
- Start/Stop a pipeline run
- Monitor pipeline status
Example (using curl or from Python):
curl -X POST https://<databricks-instance>/api/2.0/pipelines \
-H "Authorization: Bearer <DATABRICKS_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
  "name": "auto_dlt_pipeline",
  "storage": "dbfs:/pipelines/auto_dlt_storage",
  "libraries": [
    {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
  ],
  "clusters": [
    {"label": "default", "num_workers": 2}
  ],
  "development": false
}'
Then trigger it:
curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
-H "Authorization: Bearer <DATABRICKS_TOKEN>"
✅ Option 2: Databricks CLI
Using the CLI simplifies scripting in CI/CD tools (like GitHub Actions, Azure DevOps, Jenkins).
Example:
databricks pipelines create --json @pipeline_config.json
databricks pipelines start-update <pipeline-id>
(Command names and flags differ between the legacy CLI and the newer unified Databricks CLI; check databricks pipelines --help for your version.)
✅ Option 3: Terraform
Databricks provides a Terraform provider for DLT automation:
resource "databricks_pipeline" "my_dlt" {
name = "auto_dlt_pipeline"
storage = "dbfs:/pipelines/auto_dlt_storage"
library {
notebook {
path = "/Repos/org/project/dlt_pipeline"
}
}
cluster {
label = "default"
num_workers = 2
}
continuous = true
}
Then run:
terraform init
terraform apply
Terraform ensures idempotent deployment — i.e., same config → same environment.
🔁 3. Integrating with CI/CD
Typical CI/CD flow:
- Source Control (GitHub/GitLab) → store notebooks + configs
- CI Stage: Lint/test notebooks, validate DLT syntax
- CD Stage:
- Use REST API or Terraform to deploy to Databricks workspace
- Use Databricks CLI or API to trigger pipeline run
- Monitor with GET /api/2.0/pipelines/<id>/updates
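A small Python sketch of interpreting that monitoring response; the updates list and state field names follow the Pipelines API docs, but treat the exact shape as an assumption to verify:

```python
# Decide whether the most recent pipeline update has finished.
# Assumes GET /api/2.0/pipelines/<id>/updates returns an "updates" list
# (most recent first) whose entries carry a "state" field.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def latest_update_state(updates_response):
    """Return the state of the most recent update, or None if there are none."""
    updates = updates_response.get("updates", [])
    return updates[0].get("state") if updates else None

# Example response shape with illustrative values:
sample = {"updates": [{"update_id": "u-2", "state": "COMPLETED"},
                      {"update_id": "u-1", "state": "FAILED"}]}
state = latest_update_state(sample)
finished = state in TERMINAL_STATES
```

A CD stage would call this in a polling loop and fail the build when the state lands on FAILED or CANCELED.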
📊 4. Monitoring and Alerts
Use DLT event logs or Databricks SQL dashboard to:
- Track pipeline success/failure
- Monitor expectations (data quality)
- Send alerts via Databricks SQL alerts or Webhook
🧠 Example End-to-End Flow
GitHub → Azure DevOps → Databricks DLT
- Developer pushes changes to GitHub (DLT notebooks).
- Azure DevOps pipeline triggers.
- Pipeline uses Terraform or REST API to update DLT config.
- Starts the pipeline run.
- Monitors completion and logs results to Slack or Teams.
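The last step, posting results to Slack or Teams, usually amounts to sending a small JSON payload to an incoming webhook. A minimal sketch assuming Slack's simple text payload format; the helper name is ours:

```python
def build_run_notification(pipeline_name, state):
    """Build a Slack-style incoming-webhook payload summarizing a run."""
    icon = "✅" if state == "COMPLETED" else "❌"
    return {"text": f"{icon} DLT pipeline '{pipeline_name}' finished with state: {state}"}

msg = build_run_notification("auto_dlt_pipeline", "COMPLETED")
# Send with: requests.post(WEBHOOK_URL, json=msg)
```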
⚡ Pro Tips
- Set development = false for production runs.
- Always version your DLT notebooks in Repos (Git).
- Store secrets (DB credentials, tokens) in Databricks Secrets.
- Enable expectations in Advanced DLT to enforce data quality.
- Use continuous pipelines for near-real-time updates.
Where to Set development = false in Databricks DLT
The development flag controls whether your DLT pipeline runs in development mode (True) or production mode (False).
🧩 What It Means
development = true → Development Mode
- Uses fewer cluster resources
- Automatically reloads notebooks on code changes
- Useful for testing & debugging
development = false → Production Mode
- Uses a stable, optimized cluster
- Does not auto-reload notebooks
- Ideal for scheduled or automated production runs
🧱 Where to Change It
1️⃣ In the Databricks UI
When creating or editing a DLT pipeline:
- Go to Workflows → Delta Live Tables → Your Pipeline
- Click Edit
- In the right-side Pipeline settings panel, look for Development mode
- Toggle OFF (this sets development = false)
- Click Save
2️⃣ In JSON Config (REST API or CLI)
If you deploy using REST API or JSON:
{
  "name": "auto_dlt_pipeline",
  "storage": "dbfs:/pipelines/auto_dlt_storage",
  "development": false,
  "libraries": [
    {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
  ],
  "clusters": [
    {"label": "default", "num_workers": 4}
  ]
}
✅ Set "development": false in the JSON payload.
3️⃣ In Terraform
When using Terraform to automate deployment:
resource "databricks_pipeline" "my_dlt" {
name = "prod_dlt_pipeline"
storage = "dbfs:/pipelines/prod_dlt_storage"
development = false # <— this makes it Production Mode
library {
notebook {
path = "/Repos/org/project/dlt_pipeline"
}
}
cluster {
label = "default"
num_workers = 4
}
}
✅ The key line:
development = false
⚡ Summary
| Deployment Method | Where to Change | Example Value |
|---|---|---|
| Databricks UI | Toggle Development mode off | — |
| REST API / JSON | "development": false | JSON config |
| Terraform | development = false | HCL config |
Deploying 100 DLT Pipelines into Production
Deploying 100 Delta Live Tables (DLT) pipelines into production requires automation, version control, and orchestration — you definitely don’t want to do this manually. Let’s break it down step by step.
⚙️ 1️⃣ Strategy Overview
To deploy 100 pipelines efficiently:
- Source Control
- Store all DLT pipelines as notebooks or SQL scripts in Git repos (one repo per team/project or a mono-repo).
- Versioning ensures reproducibility.
- Pipeline Config Template
- Create a JSON or Terraform template for pipeline deployment.
- Each pipeline has configurable fields:
{ "name": "<PIPELINE_NAME>", "storage": "dbfs:/pipelines/<PIPELINE_NAME>", "development": false, "configuration": {"source": "git", "notebook_path": "/Repos/project/<PIPELINE_NAME>"}, "clusters": [{"label": "default", "num_workers": 4}], "continuous": true }
- Automation Tool / Orchestration
- REST API → Create / update pipelines programmatically.
- Terraform → Define all pipelines as resources and apply in batch.
- CI/CD system → Orchestrate deployment across all pipelines.
🧱 2️⃣ Using REST API Automation
- Prepare a JSON config for each pipeline (can be generated programmatically in Python or bash).
import json

pipelines = ["pipeline_01", "pipeline_02", "pipeline_03"]  # up to 100

for p in pipelines:
    config = {
        "name": p,
        "storage": f"dbfs:/pipelines/{p}",
        "development": False,
        "libraries": [{"notebook": {"path": f"/Repos/project/{p}"}}],
        "clusters": [{"label": "default", "num_workers": 4}],
        "continuous": True,
    }
    with open(f"{p}.json", "w") as f:
        json.dump(config, f)
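Rather than listing all 100 names by hand, the list and its configs can themselves be generated. A sketch under the same illustrative naming scheme:

```python
# Generate all 100 zero-padded pipeline names and one config dict per name.
pipeline_names = [f"pipeline_{i:02d}" for i in range(1, 101)]

configs = {
    name: {
        "name": name,
        "storage": f"dbfs:/pipelines/{name}",
        "development": False,
        "continuous": True,
    }
    for name in pipeline_names
}
```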
- Deploy pipelines via REST API loop:
for p in pipeline_01 pipeline_02 pipeline_03; do
  curl -X POST https://<databricks-instance>/api/2.0/pipelines \
    -H "Authorization: Bearer <DATABRICKS_TOKEN>" \
    -H "Content-Type: application/json" \
    -d @"${p}.json"
done
This scales to 100+ pipelines.
- Optionally, trigger each pipeline run after creation:
curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
-H "Authorization: Bearer <DATABRICKS_TOKEN>"
🧱 3️⃣ Using Terraform (Recommended for Large Scale)
- Define each pipeline as a Terraform resource:
resource "databricks_pipeline" "pipeline_01" {
name = "pipeline_01"
storage = "dbfs:/pipelines/pipeline_01"
development = false
continuous = true
library {
notebook {
path = "/Repos/project/pipeline_01"
}
}
cluster {
label = "default"
num_workers = 4
}
}
# Repeat for pipeline_02, pipeline_03... Or use a module with for_each
- Apply all pipelines:
terraform init
terraform apply -auto-approve
✅ Benefits:
- Idempotent (won’t recreate existing pipelines)
- Version controlled
- Easy to update all pipelines in batch
⚙️ 4️⃣ Scheduling / Triggering Pipelines
After deployment:
- Databricks Jobs → Schedule each pipeline run automatically
- REST API / CI/CD → Trigger runs programmatically if needed
- Monitoring → Use GET /pipelines/{pipeline_id}/updates for all pipelines
⚡ 5️⃣ Best Practices
- Use naming conventions for pipelines (prod_<team>_<data_domain>).
- Use Terraform modules to avoid duplicating config 100 times.
- Store secrets in Databricks Secret Scopes — don’t hardcode credentials.
- Test pipelines in development mode first, then deploy with development = false to prod.
- Monitor logs centrally (event_log table, Databricks SQL dashboards).