Introduction

One common challenge beginners face while working with AWS Glue is handling S3 paths that contain dynamically changing folder names.

This usually happens when files are written by applications, APIs, Lambda functions, or ETL jobs that generate hash-based or UUID-based folders for every run.

For example, your S3 path may look like this:

s3://my-bucket/output/
8d3f1a92/result/
c2b7e5f1/result/
9af1b2c8/result/

Here, the middle folder (8d3f1a92, c2b7e5f1) changes dynamically every execution.

Because of this, Glue crawler may:

  • create multiple unwanted partitions
  • fail to create a clean table
  • treat the hash folder as part of the schema

In this blog, we’ll understand why this issue happens and how to fix it properly.


Understanding the Problem

Let’s say your actual folder structure is:

s3://project-bucket/results/
123abc456/result/data.parquet
789xyz111/result/data.parquet

The folder names:

  • 123abc456
  • 789xyz111

are dynamic variables

These may be:

  • request IDs
  • hash values
  • UUIDs
  • timestamp-based folders

The issue is that the crawler expects stable folder structures.

When you crawl this path:

s3://project-bucket/results/

Glue scans every level and may treat the dynamic folder as a partition column.

That makes querying difficult.


Why This Happens in AWS Glue

AWS Glue crawler works best with predictable folder hierarchies.

For example:

s3://bucket/sales/year=2026/month=04/

This is perfect because:

  • folder names are meaningful
  • partitions are stable

But dynamic folders like:

s3://bucket/output/9a8b7c/result/

have no business meaning.

So Glue doesn’t know what to do with them.


Best Solution: Use Wildcard Path Pattern

This is the easiest and most effective solution.

Instead of creating the table from the top-level path, create it using a wildcard.

Use this path:

s3://project-bucket/results/*/result/

The * means:

“ignore any folder name at this level”

This automatically handles changing folder names.


Step-by-Step Fix

Step 1: Identify the Stable Folder Pattern

Example actual paths:

s3://project-bucket/results/abc123/result/
s3://project-bucket/results/def456/result/
s3://project-bucket/results/ghi789/result/

Common stable pattern:

results/*/result/

Step 2: Create Table Using Athena

Open Amazon Athena and run:

CREATE EXTERNAL TABLE result_table (
customer_id string,
amount double,
created_date string
)
STORED AS PARQUET
LOCATION 's3://project-bucket/results/*/result/';

This creates one clean table.


Step 3: Validate the Data

Run:

SELECT * FROM result_table LIMIT 10;

Now Athena reads data from all dynamic folders.


Alternative Fix: Create a Curated Folder

For production systems, this is the best practice.

Instead of querying dynamic folders directly, move data into a clean folder.

Example:

raw/
abc123/result/
xyz456/result/curated/
result/

How to Do This Using Glue Job

Use a Glue PySpark job:

from pyspark.sql import SparkSessionspark = SparkSession.builder.getOrCreate()df = spark.read.parquet("s3://project-bucket/results/*/result/")df.write.mode("overwrite").parquet(
"s3://project-bucket/curated/result/"
)

Now your crawler can safely crawl:

s3://project-bucket/curated/result/

This gives a stable table.


Why This Is Better

Benefits of curated path:

  • no dynamic folder confusion
  • easy crawler setup
  • faster queries
  • better performance
  • easier debugging

This is the standard approach used in real projects.


Common Beginner Mistake

Many beginners try this:

s3://project-bucket/results/

This causes:

  • messy partitions
  • incorrect schema
  • multiple tables

Avoid crawling unstable paths.


Best Practice Recommendation

Use this approach:

Raw Layer
results/hash/result/Processing Layer
Glue JobCurated Layer
curated/result/

This is a professional data engineering design.


Final Conclusion

If your S3 path contains dynamically changing variables such as hashes, UUIDs, or run IDs, do not directly crawl the parent folder.

Instead, use one of these approaches:

Quick Fix

Use wildcard path:

*/result/

Production Fix

Move data to curated folder and crawl that path.

This keeps your Glue tables clean and query-ready.

Start Discussion

This site uses Akismet to reduce spam. Learn how your comment data is processed.