When building data pipelines on AWS, you’ll often combine AWS Glue with services like S3, Kinesis, Kafka, SNS, EventBridge, Lambda, Redshift, and Athena. Each service plays a distinct role in data ingestion, transformation, and analytics. This guide covers the most crucial aspects of each service: parallelism, scalability, performance optimization, failure handling, and access control. Whether you are preparing for an interview or working on real-world projects, it will help you quickly refresh the fundamentals of AWS Glue and its surrounding ecosystem.

I cover:

  • Parallelism
  • Failures & Failure handling
  • Performance
  • Scalability
  • Access & Permissions

This way, you get a complete refresh in ~1 hour.

🔹 AWS Glue

Parallelism

  • Achieved via Spark executors/partitions, workers (Standard/G.1X/G.2X), and JDBC partitioned reads.
  • DynamicFrames are distributed like Spark DataFrames, so transformations run in parallel across partitions.
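To make JDBC partitioned reads concrete, here is a pure-Python sketch of how Spark (and therefore Glue) splits a JDBC read into parallel range queries when you supply `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions`. The function below is illustrative, not Spark's actual implementation:

```python
def jdbc_partition_predicates(column, lower, upper, num_partitions):
    """Split a numeric column range into one WHERE clause per partition,
    mirroring how Spark's JDBC reader generates parallel range queries.
    The first partition also catches NULLs and values below the range;
    the last catches values at or above the final boundary."""
    stride = (upper - lower) // num_partitions
    predicates = []
    for i in range(num_partitions):
        lo = lower + i * stride
        if i == 0:
            predicates.append(f"{column} < {lo + stride} OR {column} IS NULL")
        elif i == num_partitions - 1:
            predicates.append(f"{column} >= {lo}")
        else:
            predicates.append(f"{column} >= {lo} AND {column} < {lo + stride}")
    return predicates

# 4 executors can each run one of these range queries concurrently:
print(jdbc_partition_predicates("id", 0, 100, 4))
```

Each predicate becomes an independent query, so four workers can read the table concurrently instead of through a single connection.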

Failures & Handling

  • Job retries configurable.
  • Streaming jobs use checkpointing.
  • Errors logged to CloudWatch.
  • Schema evolution handled via Glue Data Catalog updates.

Performance

  • Predicate pushdown, partition pruning.
  • Columnar formats (Parquet/ORC).
  • Optimize spark.sql.shuffle.partitions.
  • Use DataFrames instead of DynamicFrames where possible.

Scalability

  • Auto-scaling workers.
  • Can run workflows with multiple dependent jobs.
  • Handles TB–PB scale data when partitioned well.

Access & Permissions

  • IAM role required with S3/DB/Redshift/Kinesis permissions.
  • Lake Formation can provide fine-grained table/column security.

🔹 Amazon S3

Parallelism

  • Parallel reads/writes by Glue, Athena, Redshift COPY, or Lambda using multiple threads.

Failures & Handling

  • Upload retries (the AWS SDKs automatically retry failed requests).
  • Versioning to recover deleted/overwritten files.

Performance

  • Use Parquet/ORC instead of CSV/JSON.
  • Partition folders by date/region.
  • Avoid many small files (optimize file size ~128 MB).
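The small-files point is worth quantifying. A minimal sketch (the helper name and 128 MB target are illustrative conventions, not an AWS API) of how a compaction job might decide its output file count:

```python
import math

TARGET_FILE_MB = 128  # a common sweet spot for Parquet files on S3

def compaction_plan(file_sizes_mb):
    """Given the sizes of small input files (in MB), return how many
    ~128 MB output files a compaction job should produce."""
    total = sum(file_sizes_mb)
    return max(1, math.ceil(total / TARGET_FILE_MB))

# 1,000 files of 1 MB each collapse into 8 right-sized files,
# cutting per-object request overhead dramatically:
print(compaction_plan([1] * 1000))  # -> 8
```

Fewer, larger files mean fewer S3 GET requests and less task-scheduling overhead for Glue, Athena, and Redshift Spectrum.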

Scalability

  • Virtually unlimited objects/storage.
  • Scales automatically with request rates.

Access & Permissions

  • Controlled via IAM roles, bucket policies, ACLs.
  • Can use KMS encryption for compliance.

🔹 Amazon Kinesis

Parallelism

  • Parallelism via shards (each shard = 1 MB/sec or 1,000 records/sec write, 2 MB/sec read).
  • Glue or Lambda can process shards in parallel.

Failures & Handling

  • Records retained 24 hours by default (extendable up to 365 days) — the retry window.
  • Dead-letter queues (DLQ) with Lambda for failed records.
  • Consumer retries on checkpoint failures.

Performance

  • Increase shard count for throughput.
  • Use enhanced fan-out for high-performance consumers.

Scalability

  • Shards can be split/merged to scale, or on-demand mode can manage capacity automatically.
  • Managed by AWS, scales with demand.

Access & Permissions

  • IAM roles/policies for producers/consumers.
  • VPC endpoints for private access.

🔹 Apache Kafka (MSK)

Parallelism

  • Parallelism via partitions per topic.
  • Consumers can read partitions in parallel.
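The key-to-partition mapping is what makes partition-level parallelism safe: the same key always lands on the same partition, preserving per-key ordering. Kafka's default partitioner uses murmur2; this sketch substitutes a stdlib hash purely to illustrate the deterministic hash-mod scheme:

```python
import zlib

def partition_for(key: bytes, num_partitions: int) -> int:
    """Map a record key to a partition. Kafka's default partitioner
    uses murmur2; crc32 here just illustrates the hash-mod idea:
    deterministic per key, spread across partitions overall."""
    return zlib.crc32(key) % num_partitions

# Same key -> same partition, so per-key ordering is preserved
# even though six consumers read the six partitions in parallel.
assert partition_for(b"user-42", 6) == partition_for(b"user-42", 6)
```

This is also why increasing the partition count of an existing topic can break per-key ordering: the modulus changes, so keys remap.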

Failures & Handling

  • Offsets stored in Kafka or external DBs.
  • Consumer retries configurable.
  • Replication across brokers for fault tolerance.

Performance

  • Tune partition count, batch size, compression.
  • Use multiple brokers for throughput.

Scalability

  • Add more brokers & partitions.
  • Horizontally scalable cluster.

Access & Permissions

  • IAM (for MSK) or SASL/SSL for client authentication.
  • Fine-grained access via Kafka ACLs.

🔹 Amazon SNS

Parallelism

  • Fan-out to multiple subscribers in parallel (Lambda, SQS, HTTP).

Failures & Handling

  • Retry policy for undelivered messages.
  • DLQ support with SQS.

Performance

  • High throughput pub/sub.
  • Message filtering for efficient delivery.
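Filter policies are worth a concrete sketch. This simplified matcher shows the core idea of SNS subscription filtering: a subscriber only receives messages whose attributes match the policy. (Real policies also support prefix, numeric-range, and anything-but operators, which this sketch omits.)

```python
def matches_filter(policy: dict, attributes: dict) -> bool:
    """Simplified SNS filter-policy check: every policy key must appear
    in the message attributes with one of the allowed values."""
    return all(attributes.get(key) in allowed
               for key, allowed in policy.items())

policy = {"event_type": ["order_placed", "order_shipped"]}

print(matches_filter(policy, {"event_type": "order_placed"}))    # True
print(matches_filter(policy, {"event_type": "order_cancelled"})) # False
```

Because filtering happens in SNS, non-matching subscribers are never invoked, which saves Lambda invocations and SQS traffic downstream.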

Scalability

  • Fully managed; scales automatically to very high message volumes.

Access & Permissions

  • Controlled with IAM policies and topic policies.
  • Encryption with KMS supported.

🔹 Amazon EventBridge

Parallelism

  • Routes events to multiple targets in parallel.

Failures & Handling

  • Retries failed targets.
  • DLQ via SQS.
  • Archive/replay for event recovery.

Performance

  • Near real-time delivery.
  • Rule filtering to reduce unnecessary processing.
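Rule filtering follows the same spirit as SNS filter policies but matches against the event body itself. A simplified sketch of EventBridge pattern matching (real rules also support prefix, numeric, and exists matchers not shown here):

```python
def event_matches(pattern: dict, event: dict) -> bool:
    """Simplified EventBridge rule matching: each pattern field must
    exist in the event with a value from the pattern's allowed list;
    nested objects are matched recursively."""
    for key, expected in pattern.items():
        if key not in event:
            return False
        if isinstance(expected, dict):
            if not isinstance(event[key], dict) or not event_matches(expected, event[key]):
                return False
        elif event[key] not in expected:
            return False
    return True

rule = {"source": ["aws.s3"], "detail": {"eventName": ["PutObject"]}}
evt = {"source": "aws.s3", "detail": {"eventName": "PutObject"}}
print(event_matches(rule, evt))  # True
```

Only matching events reach a rule's targets, so tight patterns keep downstream Lambdas and queues from processing noise.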

Scalability

  • Scales automatically with event volume.

Access & Permissions

  • IAM roles required for publishing/consuming events.
  • Resource-based policies on event buses.

🔹 AWS Lambda

Parallelism

  • Event-driven: each event triggers one function instance.
  • Account concurrency limit (default 1,000, can increase).
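The concurrency limit matters because steady-state concurrency follows Little's law: arrival rate times average duration. A quick sketch for spotting throttling risk (the function is illustrative, not an AWS API):

```python
import math

def required_concurrency(invocations_per_sec: float, avg_duration_sec: float) -> int:
    """Little's law: steady-state concurrent executions =
    arrival rate x average duration. Compare the result against the
    account concurrency limit (default 1,000) to spot throttling risk."""
    return math.ceil(invocations_per_sec * avg_duration_sec)

# 500 req/sec at 800 ms each needs ~400 concurrent executions,
# comfortably inside the default 1,000 limit:
print(required_concurrency(500, 0.8))  # -> 400
```

If the result approaches the account limit, request a limit increase or shorten function duration before traffic grows.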

Failures & Handling

  • Automatic retries for async invokes (two retries by default).
  • DLQ/SNS/SQS for failed events.

Performance

  • Cold start overhead (mitigated with provisioned concurrency).
  • Tune memory allocation: more memory also means proportionally more CPU and network.

Scalability

  • Scales automatically with request load.
  • Burst concurrency supported.

Access & Permissions

  • IAM execution role defines access.
  • Can run inside VPC for private resources.

AWS Lambda itself does not manage schema evolution.
Lambda is just a compute service — it processes the payload (JSON, Avro, Parquet, etc.) that you send in. Schema evolution comes into play when Lambda is integrated with data services like Glue, Kafka, Kinesis, or EventBridge.


🔹 How Schema Evolution Relates to Lambda

  1. With AWS Glue Data Catalog
    • If your Lambda function reads data from an S3 bucket registered in Glue, the schema is stored in the Glue Data Catalog.
    • Schema evolution (new columns, type changes) is managed in Glue, not in Lambda.
    • Your Lambda code must be written defensively (e.g., handle missing/new fields in JSON).
  2. With AWS Glue Schema Registry (Kafka/Kinesis)
    • If Lambda consumes from Kafka (MSK) or Kinesis with a schema registered in Glue Schema Registry, it validates the message schema before processing.
    • Schema evolution (like backward/forward compatibility rules) is enforced by the Schema Registry, not Lambda.
    • If the schema breaks compatibility, Lambda may fail unless you add fallback handling.
  3. With EventBridge
    • EventBridge allows schema discovery and stores event structure in the Schema Registry.
    • Lambda subscribers can use this schema (code bindings can be auto-generated).
    • If the schema evolves, Lambda still gets the event, but your function must be updated to handle new/changed fields.

🔹 Best Practices for Handling Schema Evolution in Lambda

  • Use default values for missing fields when parsing payloads.
  • Wrap parsing logic in try/except to handle unknown/new fields gracefully.
  • Version control your schema in Glue Schema Registry if using Kafka/Kinesis.
  • Test backward compatibility before deploying Lambda updates.
  • Log unexpected fields so you can adapt your function when schema changes.
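Putting those best practices together, here is a minimal defensive-parsing sketch for a Lambda handler (field names and defaults are hypothetical examples, not from any real schema):

```python
import json
import logging

logger = logging.getLogger(__name__)

# Hypothetical expected schema: field name -> default for missing values
EXPECTED_FIELDS = {"order_id": None, "amount": 0.0, "currency": "USD"}

def parse_event(raw: str) -> dict:
    """Parse a payload defensively: apply defaults for missing fields,
    log unknown fields, and survive unparseable input."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        logger.error("Unparseable payload: %r", raw)
        return dict(EXPECTED_FIELDS)
    # Defaults cover fields the producer has not sent yet (old schema)
    record = {k: payload.get(k, default) for k, default in EXPECTED_FIELDS.items()}
    # Log fields we do not know about yet (producer's schema evolved)
    unknown = set(payload) - set(EXPECTED_FIELDS)
    if unknown:
        logger.warning("Unknown fields (schema evolved?): %s", sorted(unknown))
    return record

print(parse_event('{"order_id": "A1", "coupon": "X"}'))
# -> {'order_id': 'A1', 'amount': 0.0, 'currency': 'USD'}
```

The function tolerates both older producers (missing fields get defaults) and newer ones (extra fields are logged, not fatal), which is exactly the flexibility evolving schemas demand.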

Summary:
Schema evolution is not native to Lambda — it’s handled by Glue Schema Registry (for Kafka/Kinesis), Glue Data Catalog (for S3/ETL), or EventBridge Schema Registry. Lambda just needs to be coded flexibly to handle evolving schemas without breaking.

🔹 Amazon Redshift

Parallelism

  • MPP (Massively Parallel Processing) architecture with leader + compute nodes.
  • COPY command loads data in parallel from S3.
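A common scripting pattern is generating the COPY statement programmatically, e.g. from an orchestration job. A sketch (table name, S3 prefix, and role ARN below are placeholders):

```python
def build_copy(table: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Build a Redshift COPY statement. COPY reads all objects under
    the S3 prefix in parallel across compute-node slices, so splitting
    the data into multiple files speeds up the load."""
    return (
        f"COPY {table} FROM '{s3_prefix}' "
        f"IAM_ROLE '{iam_role_arn}' FORMAT AS PARQUET;"
    )

# Placeholder identifiers for illustration:
sql = build_copy(
    "sales",
    "s3://my-data-lake/sales/",
    "arn:aws:iam::123456789012:role/redshift-load",
)
print(sql)
```

For best parallelism, split input data into a number of similarly sized files that is a multiple of the cluster's slice count.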

Failures & Handling

  • Cluster snapshots and automated backups.
  • WLM (Workload Management) queues and prioritizes queries under load.

Performance

  • Use columnar storage & compression.
  • Sort keys & distribution keys tuned.
  • Spectrum for external S3 queries.

Scalability

  • RA3 nodes → separate compute & storage scaling.
  • Elastic Resize for cluster scaling.

Access & Permissions

  • IAM roles for S3/Glue access.
  • Redshift-specific users/roles for table-level permissions.

🔹 Amazon Athena

Parallelism

  • Queries execute in parallel across multiple nodes.
  • Parallel reads from S3 partitions.

Failures & Handling

  • Queries fail on schema mismatches, bad data, or permission issues.
  • No automatic retries: fix the schema/data and re-run the query.

Performance

  • Partition pruning.
  • Columnar storage (Parquet/ORC).
  • Avoid small files (compaction).

Scalability

  • Serverless; scales automatically with query size.
  • Handles TB–PB of data depending on partitions.

Access & Permissions

  • Uses Glue Data Catalog or Hive Metastore.
  • IAM policies for S3 + Athena queries.

✅ This gives you a complete 360° view for each service in interview language.