You can automate deployment and operation of DLT pipelines — just like CI/CD for notebooks, jobs, or models — using Databricks REST APIs, CLI, or Infrastructure-as-Code tools such as Terraform.

🧩 1. Core Concept

A DLT pipeline defines:

  • Data source (input tables, files, CDC streams)
  • Transformation logic (in Python or SQL using @dlt.table / CREATE LIVE TABLE)
  • Target tables (Bronze/Silver/Gold)
  • Expectations (data quality rules)
  • Settings (clusters, storage, libraries, etc.)
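For example, a minimal Python DLT notebook combining these pieces might look like this (a sketch with hypothetical table names and input path; it only runs inside a DLT pipeline, not as a standalone script):

```python
# Minimal DLT notebook sketch: Bronze ingest + Silver cleanup with an expectation.
# Table names and the input path are hypothetical; runs only inside a DLT pipeline.
import dlt
from pyspark.sql.functions import col

@dlt.table(comment="Bronze: raw events loaded incrementally from cloud storage")
def bronze_events():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/mnt/raw/events")  # hypothetical input path
    )

@dlt.table(comment="Silver: cleaned events")
@dlt.expect_or_drop("valid_id", "event_id IS NOT NULL")  # data quality rule
def silver_events():
    return dlt.read_stream("bronze_events").select("event_id", "ts", col("payload"))
```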

Automation means you deploy, configure, and trigger this pipeline without manually using the UI.

🚀 2. Automation Methods

Option 1: Databricks REST API

Use the DLT Pipelines API to:

  • Create or Update a pipeline
  • Start/Stop a pipeline run
  • Monitor pipeline status

Example (using curl or from Python):

curl -X POST https://<databricks-instance>/api/2.0/pipelines \
  -H "Authorization: Bearer <DATABRICKS_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "auto_dlt_pipeline",
        "storage": "dbfs:/pipelines/auto_dlt_storage",
        "libraries": [
          {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
        ],
        "clusters": [
          {"label": "default", "num_workers": 2}
        ],
        "development": false
      }'

Then trigger it:

curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
  -H "Authorization: Bearer <DATABRICKS_TOKEN>"

Option 2: Databricks CLI

Using the CLI simplifies scripting in CI/CD tools (like GitHub Actions, Azure DevOps, Jenkins).

Example:

databricks pipelines create --json @pipeline_config.json
databricks pipelines start-update <pipeline-id>

(Flag names differ between the legacy and current CLI versions; check `databricks pipelines --help` for yours.)

Option 3: Terraform

Databricks provides a Terraform provider for DLT automation:

resource "databricks_pipeline" "my_dlt" {
  name    = "auto_dlt_pipeline"
  storage = "dbfs:/pipelines/auto_dlt_storage"

  library {
    notebook {
      path = "/Repos/org/project/dlt_pipeline"
    }
  }

  cluster {
    label       = "default"
    num_workers = 2
  }

  continuous = true
}

Then run:

terraform init
terraform apply

Terraform ensures idempotent deployment — i.e., same config → same environment.

🔁 3. Integrating with CI/CD

Typical CI/CD flow:

  1. Source Control (GitHub/GitLab) → store notebooks + configs
  2. CI Stage: Lint/test notebooks, validate DLT syntax
  3. CD Stage:
    • Use REST API or Terraform to deploy to Databricks workspace
    • Use Databricks CLI or API to trigger pipeline run
    • Monitor with GET /api/2.0/pipelines/<id>/updates
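The monitoring step can be sketched as a small poller against the updates endpoint shown above (host, token, and the terminal state names are assumptions to adjust for your workspace):

```python
import json
import time
import urllib.request

def updates_url(host: str, pipeline_id: str) -> str:
    """URL of the 'list pipeline updates' endpoint for one pipeline."""
    return f"{host}/api/2.0/pipelines/{pipeline_id}/updates"

def wait_for_update(host: str, token: str, pipeline_id: str,
                    poll_seconds: int = 30) -> str:
    """Poll the latest update until it reaches a terminal state."""
    terminal = {"COMPLETED", "FAILED", "CANCELED"}  # assumed state names
    while True:
        req = urllib.request.Request(
            updates_url(host, pipeline_id),
            headers={"Authorization": f"Bearer {token}"},
        )
        with urllib.request.urlopen(req) as resp:
            # The response lists updates, most recent first (assumption).
            state = json.load(resp)["updates"][0]["state"]
        if state in terminal:
            return state
        time.sleep(poll_seconds)
```

A CI/CD job would call `wait_for_update(...)` right after triggering the run and fail the stage when the returned state is not COMPLETED.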

📊 4. Monitoring and Alerts

Use DLT event logs or Databricks SQL dashboard to:

  • Track pipeline success/failure
  • Monitor expectations (data quality)
  • Send alerts via Databricks SQL alerts or Webhook

🧠 Example End-to-End Flow

GitHub → Azure DevOps → Databricks DLT

  1. Developer pushes changes to GitHub (DLT notebooks).
  2. Azure DevOps pipeline triggers.
  3. Pipeline uses Terraform or REST API to update DLT config.
  4. Starts the pipeline run.
  5. Monitors completion and logs results to Slack or Teams.
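Step 5 (logging results to Slack or Teams) can be sketched as a POST to an incoming webhook; the webhook URL is a placeholder you create in Slack or Teams:

```python
import json
import urllib.request

def build_message(pipeline: str, state: str) -> dict:
    """Slack incoming-webhook payload summarizing a pipeline run."""
    emoji = ":white_check_mark:" if state == "COMPLETED" else ":x:"
    return {"text": f"{emoji} DLT pipeline `{pipeline}` finished with state {state}"}

def notify(webhook_url: str, pipeline: str, state: str) -> None:
    """POST the summary message to the (hypothetical) webhook URL."""
    req = urllib.request.Request(
        webhook_url,
        data=json.dumps(build_message(pipeline, state)).encode(),
        headers={"Content-Type": "application/json"},
    )
    urllib.request.urlopen(req)
```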

⚡ Pro Tips

  • Use development mode = false for production.
  • Always version your DLT notebooks in Repos (Git).
  • Store secrets (DB credentials, tokens) in Databricks Secrets.
  • Enable expectations in Advanced DLT to enforce data quality.
  • Use continuous pipelines for near-real-time updates.

Where to Set development = false in Databricks DLT

The development flag controls whether your DLT pipeline runs in development mode (true) or production mode (false).

🧩 What It Means

  • development = true → Development Mode
    • Uses fewer cluster resources
    • Automatically reloads notebooks on code changes
    • Useful for testing & debugging
  • development = false → Production Mode
    • Uses a stable, optimized cluster
    • Does not auto-reload notebooks
    • Ideal for scheduled or automated production runs

🧱 Where to Change It

1️⃣ In the Databricks UI

When creating or editing a DLT pipeline:

  1. Go to Workflows → Delta Live Tables → Your Pipeline
  2. Click Edit
  3. In the right-side Pipeline settings panel, look for Development mode
  4. Toggle OFF (this sets development = false)
  5. Click Save

2️⃣ In JSON Config (REST API or CLI)

If you deploy using REST API or JSON:

{
  "name": "auto_dlt_pipeline",
  "storage": "dbfs:/pipelines/auto_dlt_storage",
  "development": false,
  "libraries": [
    {"notebook": {"path": "/Repos/org/project/dlt_pipeline"}}
  ],
  "clusters": [
    {
      "label": "default",
      "num_workers": 4
    }
  ]
}

✅ Set "development": false in the JSON payload.
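To flip the flag on an existing pipeline programmatically, you can use the Pipelines API edit call. A sketch follows, assuming the edit endpoint is a PUT that replaces the whole settings object, so the complete spec must be resent:

```python
import json
import urllib.request

def to_production(spec: dict) -> dict:
    """Return a copy of a pipeline spec with development mode turned off."""
    prod = dict(spec)
    prod["development"] = False
    return prod

def edit_pipeline(host: str, token: str, pipeline_id: str, spec: dict) -> None:
    """PUT the full updated spec back (the edit endpoint replaces settings wholesale)."""
    req = urllib.request.Request(
        f"{host}/api/2.0/pipelines/{pipeline_id}",
        data=json.dumps(to_production(spec)).encode(),
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
        method="PUT",
    )
    urllib.request.urlopen(req)
```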

3️⃣ In Terraform

When using Terraform to automate deployment:

resource "databricks_pipeline" "my_dlt" {
  name        = "prod_dlt_pipeline"
  storage     = "dbfs:/pipelines/prod_dlt_storage"
  development = false   # <— this makes it Production Mode

  library {
    notebook {
      path = "/Repos/org/project/dlt_pipeline"
    }
  }

  cluster {
    label       = "default"
    num_workers = 4
  }
}

✅ The key line: development = false

Summary

  Deployment Method | Where to Change           | Example
  Databricks UI     | Pipeline settings panel   | Toggle Development mode off
  REST API / JSON   | JSON payload              | "development": false
  Terraform         | HCL config                | development = false

Deploy 100 DLT Pipelines into Production

Deploying 100 Delta Live Tables (DLT) pipelines into production requires automation, version control, and orchestration — you definitely don’t want to do this manually. Let’s break it down step by step.

⚙️ 1️⃣ Strategy Overview

To deploy 100 pipelines efficiently:

  1. Source Control
    • Store all DLT pipelines as notebooks or SQL scripts in Git repos (one repo per team/project or a mono-repo).
    • Versioning ensures reproducibility.
  2. Pipeline Config Template
    • Create a JSON or Terraform template for pipeline deployment.
    • Each pipeline has configurable fields:

      {
        "name": "<PIPELINE_NAME>",
        "storage": "dbfs:/pipelines/<PIPELINE_NAME>",
        "development": false,
        "libraries": [{"notebook": {"path": "/Repos/project/<PIPELINE_NAME>"}}],
        "clusters": [{"label": "default", "num_workers": 4}],
        "continuous": true
      }
  3. Automation Tool / Orchestration
    • REST API → Create / update pipelines programmatically.
    • Terraform → Define all pipelines as resources and apply in batch.
    • CI/CD system → Orchestrate deployment across all pipelines.

🧱 2️⃣ Using REST API Automation

  1. Prepare a JSON config for each pipeline (can be generated programmatically in Python or bash).
import json

pipelines = ["pipeline_01", "pipeline_02", "pipeline_03"]  # up to 100

for p in pipelines:
    config = {
        "name": p,
        "storage": f"dbfs:/pipelines/{p}",
        "development": False,
        "libraries": [{"notebook": {"path": f"/Repos/project/{p}"}}],
        "clusters": [{"label": "default", "num_workers": 4}],
        "continuous": True
    }
    with open(f"{p}.json", "w") as f:
        json.dump(config, f)
  2. Deploy pipelines via REST API loop:
for p in pipeline_01 pipeline_02 pipeline_03
do
  curl -X POST https://<databricks-instance>/api/2.0/pipelines \
       -H "Authorization: Bearer <DATABRICKS_TOKEN>" \
       -H "Content-Type: application/json" \
       -d @"${p}.json"
done

This scales to 100+ pipelines.
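The create response includes the new pipeline_id, which the trigger call needs. A Python sketch that records a name → id map while creating (host/token placeholders as above):

```python
import json
import urllib.request

def extract_pipeline_id(create_response: dict) -> str:
    """The create endpoint returns the new pipeline's id in 'pipeline_id'."""
    return create_response["pipeline_id"]

def create_all(host: str, token: str, configs: list[dict]) -> dict[str, str]:
    """Create each pipeline and return {name: pipeline_id} for later triggering."""
    ids = {}
    for cfg in configs:
        req = urllib.request.Request(
            f"{host}/api/2.0/pipelines",
            data=json.dumps(cfg).encode(),
            headers={"Authorization": f"Bearer {token}",
                     "Content-Type": "application/json"},
        )
        with urllib.request.urlopen(req) as resp:
            ids[cfg["name"]] = extract_pipeline_id(json.load(resp))
    return ids
```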

  3. Optionally, trigger each pipeline run after creation:
curl -X POST https://<databricks-instance>/api/2.0/pipelines/<pipeline_id>/updates \
  -H "Authorization: Bearer <DATABRICKS_TOKEN>"

🧱 3️⃣ Using Terraform (Recommended for Large Scale)

  • Define each pipeline as a Terraform resource:
resource "databricks_pipeline" "pipeline_01" {
  name        = "pipeline_01"
  storage     = "dbfs:/pipelines/pipeline_01"
  development = false
  continuous  = true

  library {
    notebook {
      path = "/Repos/project/pipeline_01"
    }
  }

  cluster {
    label       = "default"
    num_workers = 4
  }
}

# Repeat for pipeline_02, pipeline_03... Or use a module with for_each
  • Apply all pipelines:
terraform init
terraform apply -auto-approve

✅ Benefits:

  • Idempotent (won’t recreate existing pipelines)
  • Version controlled
  • Easy to update all pipelines in batch

⚙️ 4️⃣ Scheduling / Triggering Pipelines

After deployment:

  1. Databricks Jobs → Schedule each pipeline run automatically
  2. REST API / CI/CD → Trigger runs programmatically if needed
  3. Monitoring → Use GET /pipelines/{pipeline_id}/updates for all pipelines
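A bulk status check across all deployed pipelines can be sketched by listing pipelines and collapsing the result into a name → state report (the `statuses` and `state` field names are assumptions about the list endpoint's response shape; paging is ignored here):

```python
import json
import urllib.request

def api_get(host: str, token: str, path: str) -> dict:
    """GET a Databricks REST API path and return the parsed JSON."""
    req = urllib.request.Request(
        f"{host}{path}", headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def summarize(statuses: list[dict]) -> dict[str, str]:
    """Collapse the list response into {pipeline name: state}."""
    return {s["name"]: s.get("state", "UNKNOWN") for s in statuses}

def report(host: str, token: str) -> dict[str, str]:
    # 'statuses' is assumed to hold the per-pipeline summaries
    listing = api_get(host, token, "/api/2.0/pipelines")
    return summarize(listing.get("statuses", []))
```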

5️⃣ Best Practices

  • Use naming conventions for pipelines (prod_<team>_<data_domain>).
  • Use Terraform modules to avoid duplicating config 100 times.
  • Store secrets in Databricks Secret Scopes — don’t hardcode credentials.
  • Test pipelines in development mode first, then deploy with development=false to prod.
  • Monitor logs centrally (event_log table, Databricks SQL dashboards).