You can automate deployment and operation of DLT pipelines — just like CI/CD for notebooks, jobs, or models — using Databricks REST APIs, CLI, or Infrastructure-as-Code tools such as Terraform.
🧩 1. Core Concept
A DLT pipeline defines:
- Data source (input tables, files, CDC streams)
- Transformation logic (in Python or SQL, using @dlt.table / CREATE LIVE TABLE)
- Target tables (Bronze/Silver/Gold)
- Expectations (data quality rules)
- Settings (clusters, storage, libraries, etc.)
Automation means you deploy, configure, and trigger this pipeline without manually using the UI.
🚀 2. Automation Methods
✅ Option 1: Databricks REST API
Use the DLT Pipelines API to:
- Create or Update a pipeline
- Start/Stop a pipeline run
- Monitor pipeline status
Example (using curl or from Python):
curl -X POST https://<databricks-instance>/api/2.0/pipelines \
-H "Authorization: Bearer <DATABRICKS_TOKEN>" \
-H "Content-Type: application/json" \
-d '{
  "name": "auto_dlt_pipeline",
  "storage": "dbfs:/pipelines/auto_dlt_storage",
  "libraries": [
    {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
  ],
  "clusters": [
    {"label": "default", "num_workers": 2}
  ],
  "development": false
}'
Then trigger it:
curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
-H "Authorization: Bearer <DATABRICKS_TOKEN>"
✅ Option 2: Databricks CLI
Using the CLI simplifies scripting in CI/CD tools (like GitHub Actions, Azure DevOps, Jenkins).
Example:
databricks pipelines create --json @pipeline_config.json
databricks pipelines start-update <pipeline-id>
(Command names and flags differ between the legacy CLI and the newer unified Databricks CLI; check databricks pipelines --help for your version.)
✅ Option 3: Terraform
Databricks provides a Terraform provider for DLT automation:
resource "databricks_pipeline" "my_dlt" {
name = "auto_dlt_pipeline"
storage = "dbfs:/pipelines/auto_dlt_storage"
library {
notebook {
path = "/Repos/org/project/dlt_pipeline"
}
}
cluster {
label = "default"
num_workers = 2
}
continuous = true
}
Then run:
terraform init
terraform apply
Terraform ensures idempotent deployment — i.e., same config → same environment.
🔁 3. Integrating with CI/CD
Typical CI/CD flow:
- Source Control (GitHub/GitLab) → store notebooks + configs
- CI Stage: Lint/test notebooks, validate DLT syntax
- CD Stage:
- Use REST API or Terraform to deploy to Databricks workspace
- Use Databricks CLI or API to trigger pipeline run
- Monitor with GET /api/2.0/pipelines/<id>/updates
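A small Python sketch of interpreting that monitoring response; the updates list and state field names follow the Pipelines API docs, but treat the exact shape as an assumption to verify:

```python
# Decide whether the most recent pipeline update has finished.
# Assumes GET /api/2.0/pipelines/<id>/updates returns an "updates" list
# (most recent first) whose entries carry a "state" field.
TERMINAL_STATES = {"COMPLETED", "FAILED", "CANCELED"}

def latest_update_state(updates_response):
    """Return the state of the most recent update, or None if there are none."""
    updates = updates_response.get("updates", [])
    return updates[0].get("state") if updates else None

# Example response shape with illustrative values:
sample = {"updates": [{"update_id": "u-2", "state": "COMPLETED"},
                      {"update_id": "u-1", "state": "FAILED"}]}
state = latest_update_state(sample)
finished = state in TERMINAL_STATES
```

A CD stage would call this in a polling loop and fail the build when the state lands on FAILED or CANCELED.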
📊 4. Monitoring and Alerts
Use DLT event logs or Databricks SQL dashboard to:
- Track pipeline success/failure
- Monitor expectations (data quality)
- Send alerts via Databricks SQL alerts or Webhook
🧠 Example End-to-End Flow
GitHub → Azure DevOps → Databricks DLT
- Developer pushes changes to GitHub (DLT notebooks).
- Azure DevOps pipeline triggers.
- Pipeline uses Terraform or REST API to update DLT config.
- Starts the pipeline run.
- Monitors completion and logs results to Slack or Teams.
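The last step, posting results to Slack or Teams, usually amounts to sending a small JSON payload to an incoming webhook. A minimal sketch assuming Slack's simple text payload format; the helper name is ours:

```python
def build_run_notification(pipeline_name, state):
    """Build a Slack-style incoming-webhook payload summarizing a run."""
    icon = "✅" if state == "COMPLETED" else "❌"
    return {"text": f"{icon} DLT pipeline '{pipeline_name}' finished with state: {state}"}

msg = build_run_notification("auto_dlt_pipeline", "COMPLETED")
# Send with: requests.post(WEBHOOK_URL, json=msg)
```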
⚡ Pro Tips
- Set development = false for production runs.
- Always version your DLT notebooks in Repos (Git).
- Store secrets (DB credentials, tokens) in Databricks Secrets.
- Enable expectations in Advanced DLT to enforce data quality.
- Use continuous pipelines for near-real-time updates.
Where to Set development = false in Databricks DLT
The development flag controls whether your DLT pipeline runs in development mode (True) or production mode (False).
🧩 What It Means
development = true → Development Mode
- Uses fewer cluster resources
- Automatically reloads notebooks on code changes
- Useful for testing & debugging
development = false → Production Mode
- Uses a stable, optimized cluster
- Does not auto-reload notebooks
- Ideal for scheduled or automated production runs
🧱 Where to Change It
1️⃣ In the Databricks UI
When creating or editing a DLT pipeline:
- Go to Workflows → Delta Live Tables → Your Pipeline
- Click Edit
- In the right-side Pipeline settings panel, look for Development mode
- Toggle OFF (this sets development = false)
- Click Save
2️⃣ In JSON Config (REST API or CLI)
If you deploy using REST API or JSON:
{
  "name": "auto_dlt_pipeline",
  "storage": "dbfs:/pipelines/auto_dlt_storage",
  "development": false,
  "libraries": [
    {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
  ],
  "clusters": [
    {"label": "default", "num_workers": 4}
  ]
}
✅ Set "development": false in the JSON payload.
3️⃣ In Terraform
When using Terraform to automate deployment:
resource "databricks_pipeline" "my_dlt" {
name = "prod_dlt_pipeline"
storage = "dbfs:/pipelines/prod_dlt_storage"
development = false # <— this makes it Production Mode
library {
notebook {
path = "/Repos/org/project/dlt_pipeline"
}
}
cluster {
label = "default"
num_workers = 4
}
}
✅ The key line:
development = false
⚡ Summary
| Deployment Method | Where to Change | Example Value |
|---|---|---|
| Databricks UI | Toggle Development mode off | — |
| REST API / JSON | "development": false | JSON config |
| Terraform | development = false | HCL config |
Deploying 100 DLT Pipelines into Production
Deploying 100 Delta Live Tables (DLT) pipelines into production requires automation, version control, and orchestration — you definitely don’t want to do this manually. Let’s break it down step by step.
⚙️ 1️⃣ Strategy Overview
To deploy 100 pipelines efficiently:
- Source Control
- Store all DLT pipelines as notebooks or SQL scripts in Git repos (one repo per team/project or a mono-repo).
- Versioning ensures reproducibility.
- Pipeline Config Template
- Create a JSON or Terraform template for pipeline deployment.
- Each pipeline has configurable fields:
{ "name": "<PIPELINE_NAME>", "storage": "dbfs:/pipelines/<PIPELINE_NAME>", "development": false, "configuration": {"source": "git", "notebook_path": "/Repos/project/<PIPELINE_NAME>"}, "clusters": [{"label": "default", "num_workers": 4}], "continuous": true }
- Automation Tool / Orchestration
- REST API → Create / update pipelines programmatically.
- Terraform → Define all pipelines as resources and apply in batch.
- CI/CD system → Orchestrate deployment across all pipelines.
🧱 2️⃣ Using REST API Automation
- Prepare a JSON config for each pipeline (can be generated programmatically in Python or bash).
import json

pipelines = ["pipeline_01", "pipeline_02", "pipeline_03"]  # up to 100

for p in pipelines:
    config = {
        "name": p,
        "storage": f"dbfs:/pipelines/{p}",
        "development": False,
        "libraries": [{"notebook": {"path": f"/Repos/project/{p}"}}],
        "clusters": [{"label": "default", "num_workers": 4}],
        "continuous": True,
    }
    with open(f"{p}.json", "w") as f:
        json.dump(config, f)
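Rather than listing all 100 names by hand, the list and its configs can themselves be generated. A sketch under the same illustrative naming scheme:

```python
# Generate all 100 zero-padded pipeline names and one config dict per name.
pipeline_names = [f"pipeline_{i:02d}" for i in range(1, 101)]

configs = {
    name: {
        "name": name,
        "storage": f"dbfs:/pipelines/{name}",
        "development": False,
        "continuous": True,
    }
    for name in pipeline_names
}
```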
- Deploy pipelines via REST API loop:
for p in pipeline_01 pipeline_02 pipeline_03; do
  curl -X POST https://<databricks-instance>/api/2.0/pipelines \
    -H "Authorization: Bearer <DATABRICKS_TOKEN>" \
    -H "Content-Type: application/json" \
    -d @"${p}.json"
done
This scales to 100+ pipelines.
- Optionally, trigger each pipeline run after creation:
curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
-H "Authorization: Bearer <DATABRICKS_TOKEN>"
🧱 3️⃣ Using Terraform (Recommended for Large Scale)
- Define each pipeline as a Terraform resource:
resource "databricks_pipeline" "pipeline_01" {
name = "pipeline_01"
storage = "dbfs:/pipelines/pipeline_01"
development = false
continuous = true
library {
notebook {
path = "/Repos/project/pipeline_01"
}
}
cluster {
label = "default"
num_workers = 4
}
}
# Repeat for pipeline_02, pipeline_03... Or use a module with for_each
- Apply all pipelines:
terraform init
terraform apply -auto-approve
✅ Benefits:
- Idempotent (won’t recreate existing pipelines)
- Version controlled
- Easy to update all pipelines in batch
⚙️ 4️⃣ Scheduling / Triggering Pipelines
After deployment:
- Databricks Jobs → Schedule each pipeline run automatically
- REST API / CI/CD → Trigger runs programmatically if needed
- Monitoring → Use GET /pipelines/{pipeline_id}/updates for all pipelines
⚡ 5️⃣ Best Practices
- Use naming conventions for pipelines (prod_<team>_<data_domain>).
- Use Terraform modules to avoid duplicating config 100 times.
- Store secrets in Databricks Secret Scopes — don’t hardcode credentials.
- Test pipelines in development mode first, then deploy with development = false to prod.
- Monitor logs centrally (event_log table, Databricks SQL dashboards).