Introduction

One common challenge beginners face while working with AWS Glue is handling S3 paths that contain dynamically changing folder names.

This usually happens when files are written by applications, APIs, Lambda functions, or ETL jobs that generate hash-based or UUID-based folders for every run.

For example, your S3 path may look like this:

s3://my-bucket/output/
   8d3f1a92/result/
   c2b7e5f1/result/
   9af1b2c8/result/

Here, the middle folder (8d3f1a92, c2b7e5f1) changes dynamically every execution.

Because of this, Glue crawler may:

create multiple unwanted partitions
fail to create a clean table
treat the hash folder as part of the schema

In this blog, we’ll understand why this issue happens and how to fix it properly.

Understanding the Problem

Let’s say your actual folder structure is:

s3://project-bucket/results/
   123abc456/result/data.parquet
   789xyz111/result/data.parquet

The folder names:

123abc456
789xyz111

are dynamic variables

These may be:

request IDs
hash values
UUIDs
timestamp-based folders

The issue is that the crawler expects stable folder structures.

When you crawl this path:

s3://project-bucket/results/

Glue scans every level and may treat the dynamic folder as a partition column.

That makes querying difficult.

Why This Happens in AWS Glue

AWS Glue crawler works best with predictable folder hierarchies.

For example:

s3://bucket/sales/year=2026/month=04/

This is perfect because:

folder names are meaningful
partitions are stable

But dynamic folders like:

s3://bucket/output/9a8b7c/result/

have no business meaning.

So Glue doesn’t know what to do with them.

Best Solution: Use Wildcard Path Pattern

This is the easiest and most effective solution.

Instead of creating the table from the top-level path, create it using a wildcard.

Use this path:

s3://project-bucket/results/*/result/

The * means:

“ignore any folder name at this level”

This automatically handles changing folder names.

Step-by-Step Fix

Step 1: Identify the Stable Folder Pattern

Example actual paths:

s3://project-bucket/results/abc123/result/
s3://project-bucket/results/def456/result/
s3://project-bucket/results/ghi789/result/

Common stable pattern:

results/*/result/

Step 2: Create Table Using Athena

Open Amazon Athena and run:

CREATE EXTERNAL TABLE result_table (
    customer_id string,
    amount double,
    created_date string
)
STORED AS PARQUET
LOCATION 's3://project-bucket/results/*/result/';

This creates one clean table.

Step 3: Validate the Data

Run:

SELECT * FROM result_table LIMIT 10;

Now Athena reads data from all dynamic folders.

Alternative Fix: Create a Curated Folder

For production systems, this is the best practice.

Instead of querying dynamic folders directly, move data into a clean folder.

Example:

raw/
   abc123/result/
   xyz456/result/curated/
   result/

How to Do This Using Glue Job

Use a Glue PySpark job:

from pyspark.sql import SparkSessionspark = SparkSession.builder.getOrCreate()df = spark.read.parquet("s3://project-bucket/results/*/result/")df.write.mode("overwrite").parquet(
    "s3://project-bucket/curated/result/"
)

Now your crawler can safely crawl:

s3://project-bucket/curated/result/

This gives a stable table.

Why This Is Better

Benefits of curated path:

no dynamic folder confusion
easy crawler setup
faster queries
better performance
easier debugging

This is the standard approach used in real projects.

Common Beginner Mistake

Many beginners try this:

s3://project-bucket/results/

This causes:

messy partitions
incorrect schema
multiple tables

Avoid crawling unstable paths.

Best Practice Recommendation

Use this approach:

Raw Layer
   results/hash/result/Processing Layer
   Glue JobCurated Layer
   curated/result/

This is a professional data engineering design.

Final Conclusion

If your S3 path contains dynamically changing variables such as hashes, UUIDs, or run IDs, do not directly crawl the parent folder.

Instead, use one of these approaches:

Quick Fix

Use wildcard path:

*/result/

Production Fix

Move data to curated folder and crawl that path.

This keeps your Glue tables clean and query-ready.

Srini

Data Engineer with deep AI and Generative AI expertise, crafting high-performance data pipelines in PySpark, Databricks, and SQL. Skilled in Python, AWS, and Linux—building scalable, cloud-native solutions for smart applications.

Start Discussion Cancel reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Latest Posts

AWS Glue Crawler Issue with Dynamic S3 Folder Paths? Here’s the Complete Fix

April 19, 2026
How to Create a Generic Stored Procedure for KPI Calculation (SQL + AWS Lambda)

April 4, 2026
Unlocking the Power of Databricks Genie: A Comprehensive Guide

March 24, 2026
Secure S3 File Upload Using API Gateway, Lambda & PostgreSQL (Complete AWS Architecture Guide

March 14, 2026