If you’re working with production workloads in Databricks, analyzing job logs is crucial for debugging, monitoring, and automation.

While the Databricks UI provides a convenient way to check logs, many data engineers and developers prefer to access logs programmatically—especially for automation, reporting, or alerts.

In this post, you’ll learn how to programmatically read Databricks job logs using Python and REST APIs, along with a few additional methods.

🔧 Why Read Logs Programmatically?

  • ✅ Automate error tracking or health monitoring
  • ✅ Build dashboards for job status
  • ✅ Integrate logs with monitoring tools
  • ✅ Store logs in S3, Azure Blob, or internal systems
  • ✅ Trigger actions based on job failures

🛠️ 1. Read Logs Using Databricks REST API in Python

The Databricks REST API v2.1 provides powerful endpoints to access job run data, including logs.

✅ Step 1: Setup Authentication

Generate a Personal Access Token (PAT) from User Settings in your Databricks workspace, then use it like this:

import requests
DATABRICKS_INSTANCE = 'https://<your-databricks-instance>'  # e.g., https://adb-123456.7.azuredatabricks.net
TOKEN = 'dapiXXXXXXXXXXXXXXXXXXXX'
HEADERS = {'Authorization': f'Bearer {TOKEN}'}
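
If you'd rather not hardcode credentials, a safer pattern is to read them from environment variables. A minimal sketch (the variable names DATABRICKS_HOST and DATABRICKS_TOKEN are just conventions you set yourself):

import os
import requests

# Read the workspace URL and token from environment variables instead of hardcoding them
DATABRICKS_INSTANCE = os.environ['DATABRICKS_HOST']  # e.g., https://adb-123456.7.azuredatabricks.net
TOKEN = os.environ['DATABRICKS_TOKEN']
HEADERS = {'Authorization': f'Bearer {TOKEN}'}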

✅ Step 2: List Job Runs

You can list past runs of a specific job by ID:

job_id = 12345  # Replace with your actual Job ID
response = requests.get(
    f'{DATABRICKS_INSTANCE}/api/2.1/jobs/runs/list?job_id={job_id}',
    headers=HEADERS
)
runs = response.json()
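
Before drilling into a single run, it helps to see what runs/list returned. A quick way to summarize the response (run_id, state.life_cycle_state, and state.result_state are fields of the Jobs 2.1 run object):

# Print a one-line summary of each returned run
for run in runs.get('runs', []):
    state = run.get('state', {})
    print(run['run_id'],
          state.get('life_cycle_state'),     # e.g., PENDING, RUNNING, TERMINATED
          state.get('result_state', 'N/A'))  # e.g., SUCCESS or FAILED once the run finishes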

✅ Step 3: Get Logs and Notebook Output

To fetch the output of a specific run (for multi-task jobs, runs/get-output expects the run_id of an individual task rather than the parent job run):

run_id = runs['runs'][0]['run_id']  # Fetch latest run ID
log_response = requests.get(
    f'{DATABRICKS_INSTANCE}/api/2.1/jobs/runs/get-output?run_id={run_id}',
    headers=HEADERS
)
log_output = log_response.json()
print("Logs:", log_output.get("logs", "No logs available"))
print("Notebook Output:", log_output.get("notebook_output", {}).get("result", "No notebook output"))

You can also check, in the same response (see the snippet below):

  • error → the error message, if the run failed
  • metadata → run state, cluster details, and more
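
A minimal check using those fields (names as returned by runs/get-output):

# Surface failures and basic run metadata
if log_output.get('error'):
    print('Run failed with error:', log_output['error'])

metadata = log_output.get('metadata', {})
print('Result state:', metadata.get('state', {}).get('result_state'))
print('Run page:', metadata.get('run_page_url'))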

📂 2. Read Logs from DBFS (Databricks File System)

If you write logs to DBFS from your code (e.g., using print() or logging), you can read them like this:

# Within a Databricks notebook
dbutils.fs.head("dbfs:/logs/my-job-log.txt")
# From Python code running on the cluster (the /dbfs FUSE mount maps to DBFS)
with open("/dbfs/logs/my-job-log.txt") as f:
    print(f.read())

To enable this, make sure your job actually writes its logs to that path, for example:

import logging
import os

os.makedirs("/dbfs/logs", exist_ok=True)  # Ensure the target directory exists
# Writing through the /dbfs FUSE mount stores the file in DBFS so it can be read back later
logging.basicConfig(filename="/dbfs/logs/my-job-log.txt", level=logging.INFO)
logging.info("This log will be stored and can be read later.")

💻 3. Optional: Use databricks-cli for Scripting

If you prefer shell scripts, install the Databricks CLI:

databricks configure --token

Then fetch a run's output. With the legacy CLI the command is:

databricks runs get-output --run-id <run_id>

Newer versions of the Databricks CLI expose the equivalent command as databricks jobs get-run-output <run_id>.

This CLI can be incorporated into CI/CD pipelines and monitoring scripts.
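
For example, a Python helper in a CI/CD pipeline could shell out to the CLI and parse its JSON output. A sketch assuming the legacy CLI syntax shown above and a CLI that is already installed and configured:

import json
import subprocess

# Call the CLI and parse the JSON it prints to stdout
result = subprocess.run(
    ['databricks', 'runs', 'get-output', '--run-id', '123456'],  # Replace 123456 with a real run ID
    capture_output=True, text=True, check=True
)
output = json.loads(result.stdout)
print(output.get('logs', 'No logs available'))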

🧠 Best Practices

  • Use print() or Python logging to generate readable logs
  • Store logs in a consistent path in DBFS or a cloud bucket
  • Automate log retrieval and integrate with alerting systems (see the sketch after this list)
  • Combine with Databricks webhooks or monitoring tools like Prometheus or PagerDuty
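
As a concrete example of that automation point, here is a minimal polling sketch that flags the latest failed run of a job; send_alert is a hypothetical placeholder for your own alerting integration, and the host/token are read from environment variables as before:

import os
import requests

DATABRICKS_INSTANCE = os.environ['DATABRICKS_HOST']
HEADERS = {'Authorization': f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

def send_alert(message):
    # Hypothetical placeholder: swap in your alerting integration (email, Slack, PagerDuty, ...)
    print('ALERT:', message)

def check_latest_run(job_id):
    # Ask the Jobs API for the most recent run of this job
    resp = requests.get(
        f'{DATABRICKS_INSTANCE}/api/2.1/jobs/runs/list?job_id={job_id}&limit=1',
        headers=HEADERS
    )
    resp.raise_for_status()
    runs = resp.json().get('runs', [])
    if runs and runs[0].get('state', {}).get('result_state') == 'FAILED':
        send_alert(f"Job {job_id} failed (run_id={runs[0]['run_id']})")

check_latest_run(12345)  # Replace with your actual Job ID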

📌 Conclusion

Reading Databricks job logs programmatically gives you full control over monitoring and automation. Whether you’re troubleshooting failures or building dashboards, the Databricks API, DBFS, and CLI offer flexible options.

If you’re running production jobs at scale, log automation isn’t just a good practice—it’s essential.