LiDAR data is widely used in autonomous driving, drone mapping, and 3D terrain modeling. In this guide, we build an end-to-end machine learning pipeline using AWS S3 for storage and Amazon SageMaker for training and inference.
This guide includes Practical Practice Steps after every main section so you can perform each task in a real AWS environment.
🔷 Step 1: Upload LiDAR Data into S3
Organize S3 like:
lidar/raw/
lidar/preprocessed/
lidar/models/
lidar/output/
Upload your LiDAR .tif, .las, and .laz files using the AWS Console or the CLI:
aws s3 cp ./lidar/ s3://your-bucket/lidar/raw/ --recursive
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Log in to AWS Console
2️⃣ Open S3 Service
3️⃣ Create a bucket named: your-lidar-project
4️⃣ Create folders:
lidar/raw/
lidar/preprocessed/
lidar/models/
lidar/output/
5️⃣ Upload sample LiDAR files (download from: USGS, OpenTopography, or Kaggle datasets)
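If you prefer to script this setup instead of clicking through the console, here is a minimal boto3 sketch. The bucket name your-lidar-project comes from the practice steps above; the region is an assumption you should replace with your own:

import boto3

region = "eu-west-1"  # assumption: replace with your own region
bucket = "your-lidar-project"
s3 = boto3.client("s3", region_name=region)

# Create the bucket; us-east-1 is the one region where you must omit CreateBucketConfiguration
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# "Folders" in S3 are just key prefixes; zero-byte objects make them visible in the console
for prefix in ["lidar/raw/", "lidar/preprocessed/", "lidar/models/", "lidar/output/"]:
    s3.put_object(Bucket=bucket, Key=prefix)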
🔷 Step 2: Launch SageMaker Studio
Open SageMaker → Studio → Launch.
Install libraries:
pip install laspy rasterio tensorflow numpy
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Open Amazon SageMaker
2️⃣ Click SageMaker Studio > Launch App
3️⃣ Create a new notebook
4️⃣ Run:
!pip install laspy rasterio matplotlib numpy tensorflow
5️⃣ Use S3 browser inside Studio to view uploaded LiDAR files
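To confirm the notebook can actually see your uploads without leaving Studio, a quick boto3 listing works as well (assuming the bucket and prefix names from Step 1):

import boto3

s3 = boto3.client("s3")

# List the raw LiDAR files uploaded in Step 1
resp = s3.list_objects_v2(Bucket="your-lidar-project", Prefix="lidar/raw/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])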
🔷 Step 3: Preprocess LiDAR Data
Example code to read a LAS file and normalize its elevation values:
import laspy
import numpy as np

# Read the LAS point cloud
las = laspy.read("sample.las")

# Stack x, y, z coordinates into an (N, 3) array
points = np.vstack((las.x, las.y, las.z)).T

# Min-max normalize elevation (z) to the range [0, 1]
points[:, 2] = (points[:, 2] - points[:, 2].min()) / np.ptp(points[:, 2])

np.save("processed.npy", points)
Upload processed data to S3:
aws s3 cp processed.npy s3://your-bucket/lidar/preprocessed/
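If the AWS CLI is not configured inside your Studio notebook, a boto3 upload does the same thing (your-bucket is the placeholder used in the CLI command above):

import boto3

s3 = boto3.client("s3")
# Upload the processed array to the preprocessed prefix
s3.upload_file("processed.npy", "your-bucket", "lidar/preprocessed/processed.npy")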
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Download a .las LiDAR file
2️⃣ Place it in your Studio notebook directory
3️⃣ Run the preprocessing code
4️⃣ Visualize a small sample:
import matplotlib.pyplot as plt

# Plot a 5,000-point sample, colored by normalized elevation
plt.scatter(points[:5000, 0], points[:5000, 1], c=points[:5000, 2])
plt.show()
5️⃣ Save processed file and upload to S3
🔷 Step 4: Train the Model Using SageMaker
Create TensorFlow Estimator:
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point='train.py',
    source_dir='training_code',    # folder containing train.py (see practice steps below)
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    framework_version='2.11',      # pick a TensorFlow version supported by SageMaker
    py_version='py39',
    output_path='s3://your-bucket/lidar/models/'
)

estimator.fit({'training': 's3://your-bucket/lidar/preprocessed/'})
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Create a new folder called training_code/
2️⃣ Add a file named train.py
3️⃣ Paste the TensorFlow model training code (a minimal train.py sketch follows after this list)
4️⃣ Upload the folder to SageMaker Studio
5️⃣ Run the training job (SageMaker automatically pulls the training data from S3)
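The guide does not show the body of train.py, so here is a minimal sketch. It assumes the training channel contains the (N, 3) .npy point arrays produced in Step 3, and it uses a toy autoencoder purely as a stand-in for whatever model your task needs; the script layout (reading SM_CHANNEL_TRAINING, writing to SM_MODEL_DIR) follows SageMaker's script-mode conventions:

# train.py -- minimal sketch, not a production model
import argparse
import glob
import os

import numpy as np
import tensorflow as tf


def load_points(data_dir):
    """Concatenate every .npy point array found in the training channel."""
    arrays = [np.load(path) for path in glob.glob(os.path.join(data_dir, "*.npy"))]
    return np.concatenate(arrays, axis=0).astype("float32")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker injects these environment variables in script mode
    parser.add_argument("--data_dir", default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"))
    parser.add_argument("--model_out", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--epochs", type=int, default=10)
    args, _ = parser.parse_known_args()

    points = load_points(args.data_dir)

    # Toy example: a small autoencoder that reconstructs (x, y, z) points.
    # Replace with your real task (ground classification, DEM regression, etc.).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(points, points, epochs=args.epochs, batch_size=1024)

    # Anything written under SM_MODEL_DIR is packaged into model.tar.gz after the job
    # finishes and uploaded to the estimator's output_path in S3.
    # Saving into a numbered subfolder gives TF Serving the layout it expects (TF2/Keras 2 SavedModel).
    model.save(os.path.join(args.model_out, "1"))

Place this file inside the training_code/ folder from step 1 so it matches the source_dir used by the estimator above.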
🔷 Step 5: Save Model Artifacts to S3
Inside the training container, train.py writes its model files to:
/opt/ml/model/
When the job finishes, SageMaker packages that directory into model.tar.gz and uploads it to the estimator's output_path:
s3://your-bucket/lidar/models/
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ After training completes, check SageMaker → Training Jobs
2️⃣ Open your job → navigate to Artifacts
3️⃣ Verify model.tar.gz is uploaded to your S3 bucket
4️⃣ Download it to inspect structure
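A minimal sketch for step 4, assuming the artifact was written under the lidar/models/ prefix; the exact key includes the training job name, so list the prefix first:

import tarfile
import boto3

s3 = boto3.client("s3")

# Find the model artifact(s) under the models prefix
resp = s3.list_objects_v2(Bucket="your-bucket", Prefix="lidar/models/")
keys = [obj["Key"] for obj in resp.get("Contents", []) if obj["Key"].endswith("model.tar.gz")]
print(keys)

# Download the first artifact and list its contents
s3.download_file("your-bucket", keys[0], "model.tar.gz")
with tarfile.open("model.tar.gz") as tar:
    tar.list()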
🔷 Step 6: Deploy Model to SageMaker Endpoint (Optional)
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large'
)
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Open SageMaker → Inference → Endpoints
2️⃣ Confirm your endpoint is active
3️⃣ Use the notebook to test:
output = predictor.predict(points[:100].tolist())
print(output)
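Real-time endpoints bill for every hour they stay up, so delete the endpoint once you are done testing:

# Tear down the endpoint (and its endpoint configuration) to stop charges
predictor.delete_endpoint()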
🔷 Step 7: Save Inference Output in S3
np.save("output.npy", output)
Upload:
aws s3 cp output.npy s3://your-bucket/lidar/output/
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Run inference on sample LiDAR points
2️⃣ Save prediction output locally
3️⃣ Upload output file into lidar/output/ folder in S3
4️⃣ Check S3 to confirm file upload
5️⃣ Visualize predictions in the notebook (see the sketch below)
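As a rough illustration of step 5, assuming output holds one or more predicted values per input point (the exact structure depends on your model; TensorFlow Serving responses are usually wrapped as {"predictions": [...]}), you could color the sample points by their predicted value:

import numpy as np
import matplotlib.pyplot as plt

# Unwrap the TF Serving response if needed, then take one value per point for coloring
preds = np.asarray(output["predictions"] if isinstance(output, dict) else output)
color = preds.reshape(len(preds), -1)[:, 0]   # adjust to your model's output shape

sample = points[:100]  # the same points that were sent to the endpoint
plt.scatter(sample[:, 0], sample[:, 1], c=color, cmap="viridis")
plt.colorbar(label="predicted value")
plt.show()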
🔶 Conclusion
Using AWS S3 + SageMaker, you built:
- ✔ A structured data pipeline in S3
- ✔ Preprocessed LiDAR point clouds
- ✔ A trained deep learning model
- ✔ Model artifacts stored in S3
- ✔ (Optional) A deployed real-time inference endpoint
- ✔ Prediction output saved back to S3
This cloud-native workflow is ideal for production-grade LiDAR analytics.