Preparing for an AWS interview? 🚀 Here’s a collection of frequently asked AWS questions with clear and concise answers. This covers Lambda, Glue, S3, PySpark, IAM, and SQL queries that often come up in interviews.

1. Parallel Processing in Lambda

Q: How can you achieve parallel processing in AWS Lambda?
A:

  • Each Lambda invocation processes one event. To process in parallel:
    • Use multiple concurrent executions (e.g., via S3 events, SQS, or Kinesis).
    • Enable parallelism within the function using async or multiprocessing libraries.
  • Best practice: Use SQS + Lambda for distributed event-driven workloads.
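For in-function parallelism, note that the Lambda environment has no /dev/shm, so multiprocessing.Pool and Queue will not work there; threads are the simpler option for I/O-bound fan-out. A minimal sketch, where process_record is a made-up placeholder for the real per-record work:

```python
from concurrent.futures import ThreadPoolExecutor

def process_record(record):
    # Hypothetical per-record work (e.g., an API call or S3 read).
    return record * 2

def handler(event, context=None):
    # Fan the batch out across threads inside a single invocation.
    records = event.get("records", [])
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(process_record, records))
```

For CPU-bound work, multiprocessing.Process with Pipe does work in Lambda; for true scale-out, prefer many concurrent invocations via SQS as noted above.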

2. Boto3 Client vs Resource

Q: What’s the difference between boto3.client() and boto3.resource()?
A:

  • client → Low-level API mapping directly to AWS service calls. Returns JSON/dictionaries.
  • resource → High-level, object-oriented API. Provides Python objects for easier manipulation.

Example:

import boto3

# Create S3 client
s3 = boto3.client('s3')
print(s3.list_buckets())

# Create S3 resource
s3_res = boto3.resource('s3')
for bucket in s3_res.buckets.all():
    print(bucket.name)

3. Adding External Libraries to Lambda

Q: How do you add external dependencies to AWS Lambda?
A:

  1. Lambda Layer → Package dependencies separately and attach to functions.
  2. Deployment Package → Zip function code + dependencies and upload.
  3. Container Image → Package code and libraries in a Docker image (up to 10 GB).

4. AWS Glue PySpark Job Failures

Q: What are common reasons Glue PySpark jobs fail?
A:

  • Schema mismatch between source and target.
  • Out-of-memory (too many shuffles).
  • Incorrect DynamicFrame ↔ DataFrame conversions.
  • Job bookmarks causing duplicates or skips.
  • JDBC partitioning or timeout issues.
  • Missing partition pruning / pushdown.

✅ Debug with CloudWatch logs, enable Spark UI, and optimize with resolveChoice, dropNullFields, partition pruning.

5. Get Day from Date in PySpark

from pyspark.sql import functions as F

df = df.withColumn("day", F.dayofmonth(F.col("date_col")))
df = df.withColumn("day_name", F.date_format(F.col("date_col"), "EEEE"))

  • dayofmonth() → 1–31
  • dayofweek() → 1–7 (1 = Sunday, 7 = Saturday)
  • date_format() → custom formatting (e.g., "EEEE" for the full day name)

6. Cluster Configuration in AWS Glue Spark

Q: Where do you configure Glue Spark job resources?
A:

  • In Glue Job settings:
    • Choose Worker type (Standard, G.1X, G.2X, Z.2X).
    • Specify number of workers.
    • Add Spark configs via --conf in job parameters.

Example: 10 workers of G.1X = 10 DPUs (1 DPU per G.1X worker).
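The same settings can be passed when starting the job from code. A sketch of the boto3 call (the job name and Spark config value are illustrative, and the actual API call is left commented out since it needs AWS credentials):

```python
# WorkerType and NumberOfWorkers are start_job_run parameters;
# the "--conf" argument injects extra Spark settings.
run_kwargs = {
    "JobName": "my-etl-job",      # illustrative job name
    "WorkerType": "G.1X",         # 1 DPU per worker
    "NumberOfWorkers": 10,
    "Arguments": {"--conf": "spark.sql.shuffle.partitions=200"},
}

# import boto3
# glue = boto3.client("glue")
# glue.start_job_run(**run_kwargs)
```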

7. Scalability in Lambda

Q: How does Lambda scale, and how do you control it?
A:

  • Lambda scales automatically per event.
  • Controls:
    • Reserved concurrency → Guarantees capacity.
    • Provisioned concurrency → Pre-warms functions to reduce cold starts.
    • Account concurrency limits (default = 1000).
  • Use SQS or Kinesis to buffer bursts and Step Functions for orchestration.

8. DPU vs Workers in AWS Glue

  • DPU (Data Processing Unit): Legacy unit of compute. 1 DPU = 4 vCPU + 16 GB memory.
  • Workers: Modern abstraction (Glue 2.0+). You choose worker type and count.
Worker Type | vCPUs | Memory (GB) | Equivalent DPUs
Standard    | 4     | 16          | 1 DPU
G.1X        | 4     | 16          | 1 DPU
G.2X        | 8     | 32          | 2 DPUs
G.025X      | 2     | 4           | 0.25 DPU
Z.2X        | 8     | 64          | 2 DPUs

✅ Workers give better flexibility over CPU and memory vs DPUs.
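The worker-to-DPU mapping is easy to capture in a small helper for capacity estimates; the per-worker DPU figures below are the assumption (G.1X = 1, G.2X = 2, G.025X = 0.25, Z.2X = 2):

```python
# DPUs per worker, per the mapping discussed above.
DPU_PER_WORKER = {
    "G.1X": 1.0,
    "G.2X": 2.0,
    "G.025X": 0.25,
    "Z.2X": 2.0,
}

def total_dpus(worker_type: str, num_workers: int) -> float:
    """Approximate total DPUs for a Glue job configuration."""
    return DPU_PER_WORKER[worker_type] * num_workers
```

For example, total_dpus("G.1X", 10) gives 10.0.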

9. S3 Bucket – Access Denied Issues

Q: Why do I still get Access Denied even if IAM role allows S3?
A: Even with an IAM Allow, access can still be blocked by:

  • Bucket policy has an explicit Deny.
  • Block Public Access enabled.
  • ACLs / Object Ownership conflicts.
  • KMS key policy doesn’t allow decryption.
  • VPC Endpoint policy restrictions.
  • SCP (Service Control Policy) in Organizations.
  • Cross-account → Needs IAM + bucket policy permissions.

10. SQL – Departments with > 5 Employees (Using CTE)

Tables:

  • employee(emp_id, emp_name, dept_id)
  • department(dept_id, dept_name)

Query:

WITH emp_count AS (
    SELECT dept_id, COUNT(*) AS emp_cnt
    FROM employee
    GROUP BY dept_id
)
SELECT d.dept_id, d.dept_name, e.emp_cnt
FROM emp_count e
JOIN department d ON e.dept_id = d.dept_id
WHERE e.emp_cnt > 5;

✅ This returns departments that have more than 5 employees.
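The same CTE runs unchanged on SQLite, so the query can be sanity-checked locally with Python's sqlite3 (the sample data is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_id INT, emp_name TEXT, dept_id INT)")
conn.execute("CREATE TABLE department (dept_id INT, dept_name TEXT)")
conn.executemany("INSERT INTO department VALUES (?, ?)",
                 [(1, "Engineering"), (2, "HR")])
# Six employees in Engineering, two in HR (made-up sample data).
rows = [(i, f"emp_{i}", 1 if i <= 6 else 2) for i in range(1, 9)]
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

query = """
WITH emp_count AS (
    SELECT dept_id, COUNT(*) AS emp_cnt
    FROM employee
    GROUP BY dept_id
)
SELECT d.dept_id, d.dept_name, e.emp_cnt
FROM emp_count e
JOIN department d ON e.dept_id = d.dept_id
WHERE e.emp_cnt > 5
"""
print(conn.execute(query).fetchall())  # only Engineering has > 5 employees
```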

11. Palindrome Check for a Number

def is_palindrome(num: int) -> bool:
    """
    Check if a number is a palindrome.
    Returns True if palindrome, False otherwise.
    """
    return str(num) == str(num)[::-1]


# Call the function
number = 121
if is_palindrome(number):
    print(f"{number} is a palindrome")
else:
    print(f"{number} is not a palindrome")
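Interviewers sometimes follow up by asking for a version without string conversion. A digit-reversal sketch:

```python
def is_palindrome_math(num: int) -> bool:
    """Check a non-negative integer by reversing its digits arithmetically."""
    if num < 0:
        return False
    original, reversed_num = num, 0
    while num > 0:
        # Peel off the last digit and append it to the reversed number.
        reversed_num = reversed_num * 10 + num % 10
        num //= 10
    return original == reversed_num
```

For example, is_palindrome_math(121) is True and is_palindrome_math(123) is False.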

Final Thoughts

These questions cover real-world AWS interview scenarios across Lambda, Glue, IAM, and PySpark. When answering, focus on:

  • Explaining why something happens (not just how).
  • Giving short practical examples.
  • Highlighting best practices (like using SQS for parallel Lambda scaling, or partition pruning in Glue).