Accessing a PostgreSQL database securely is a foundational task for cloud-based data pipelines. Whether you’re building ETL jobs on AWS Glue or running advanced analytics in Databricks, managing database credentials properly — without hardcoding — is essential.

This blog compares two secure methods to connect to PostgreSQL:

  • AWS Glue + AWS Secrets Manager
  • Databricks + Secret Scopes with Access Control Lists (ACLs)

We’ll walk through connection steps, security controls, and how Databricks ACLs ensure your secrets stay safe.

✅ Scenario

You need to:

  • Connect to a PostgreSQL database.
  • Prevent hardcoding of credentials.
  • Use a secure, auditable, and role-restricted method.

🔐 Option 1: AWS Glue + AWS Secrets Manager

🔸 Overview

AWS Secrets Manager stores secrets securely, such as:

  • Username
  • Password
  • Host, Port, DB Name

It encrypts secrets using AWS KMS, supports automatic rotation, and integrates with IAM for fine-grained access control.

🔸 Example (PySpark in Glue)

import boto3, json

def get_secret(secret_name):
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

secret = get_secret("postgres-creds")

jdbc_url = f"jdbc:postgresql://{secret['host']}:{secret['port']}/{secret['dbname']}"
df = spark.read.format("jdbc")\
    .option("url", jdbc_url)\
    .option("user", secret["username"])\
    .option("password", secret["password"])\
    .option("driver", "org.postgresql.Driver")\
    .option("dbtable", "public.my_table")\
    .load()

✅ Pros

  • Central secret storage across all AWS services.
  • IAM-enforced access control.
  • Supports secret rotation and audit logging via CloudTrail.

💼 Option 2: Databricks + Secret Scopes + ACLs

Databricks Secret Scopes let you store credentials securely within a workspace. They are ideal for notebook and job-based environments.

🔸 Secret Scope Types

  1. Databricks-backed scopes (default)
  2. Azure Key Vault-backed scopes (for Azure Databricks)

🔒 What are Databricks ACLs?

Access Control Lists (ACLs) in Databricks control who can read or write secrets in a given scope.

You can:

  • Allow specific users or groups to read secrets.
  • Restrict write/delete privileges to admins only.
  • Prevent accidental or malicious exposure of credentials.

🔹 ACL Example

To allow the analytics-team group to read from postgres-creds scope:

databricks secrets put-acl --scope postgres-creds --principal analytics-team --permission READ

Permissions available:

  • READ: User can read secrets
  • WRITE: User can create, update, delete secrets
  • MANAGE: Full access, including ACL changes

🔹 Enable ACLs on a scope

databricks secrets put-acl --scope postgres-creds --principal admins --permission MANAGE

Note: ACLs must be enabled at workspace level by admin.

🔸 Databricks PostgreSQL Connection Example

username = dbutils.secrets.get(scope="postgres-creds", key="username")
password = dbutils.secrets.get(scope="postgres-creds", key="password")
host = dbutils.secrets.get(scope="postgres-creds", key="host")
dbname = dbutils.secrets.get(scope="postgres-creds", key="dbname")

jdbc_url = f"jdbc:postgresql://{host}:5432/{dbname}"

df = spark.read.format("jdbc") \
    .option("url", jdbc_url) \
    .option("user", username) \
    .option("password", password) \
    .option("driver", "org.postgresql.Driver") \
    .option("dbtable", "public.my_table") \
    .load()

⚖️ Feature Comparison: AWS vs. Databricks

FeatureAWS Glue + Secrets ManagerDatabricks + Secret Scopes + ACLs
Secret storageCentralized in Secrets ManagerWorkspace-scoped or Azure Vault
IAM/ACL support✅ IAM roles + CloudTrail✅ ACL-based user/group control
Audit logs✅ CloudTrail✅ Workspace audit logs
Secret rotation✅ Built-in rotation❌ Manual (unless external vault)
Ease of useModerate (code-heavy)Very easy (notebook-native)
Cross-service integration✅ Lambda, EC2, Glue, etc.✅ Notebooks, Jobs, Workflows
Fine-grained permissions✅ IAM policies✅ ACLs per secret scope

🧠 Developer Experience

  • AWS Glue requires managing IAM, boto3 code, and JSON parsing — powerful, but verbose.
  • Databricks offers a simpler approach via dbutils.secrets.get() and easy ACL configuration — ideal for fast development cycles.

🛡️ Best Practices for Secrets in Both

  • Never hardcode credentials in notebooks or jobs.
  • Use IAM/ACL policies with least privilege.
  • Audit secret access periodically.
  • Rotate credentials regularly.
  • Add ?ssl=true to PostgreSQL JDBC URLs for encrypted connections.

🧾 Conclusion

If you’re deciding between AWS Glue and Databricks for connecting to PostgreSQL:

Use CaseRecommendation
Enterprise with AWS-native stackSecrets Manager + Glue for central management
Notebook-driven workflowsDatabricks + Secret Scopes with ACLs
Need auto-rotation of secretsPrefer AWS Secrets Manager
Need rapid experimentationPrefer Databricks Secret Scopes

Regardless of platform, secure secrets handling is a non-negotiable. Tools like Secrets Manager and Databricks ACLs ensure your pipelines are not only efficient but compliant and safe.

Fediverse reactions

Discover more from Srinimf

Subscribe now to keep reading and get access to the full archive.

Continue reading