LiDAR data is widely used in autonomous driving, drone mapping, and 3D terrain modeling. In this guide, we build an end-to-end machine learning pipeline using AWS S3 for storage and Amazon SageMaker for training and inference.
This guide includes Practical Practice Steps after every main section so you can perform each task in a real AWS environment.
🔷 Step 1: Upload LiDAR Data into S3
Organize S3 like:
lidar/raw/
lidar/preprocessed/
lidar/models/
lidar/output/
Upload your LiDAR .tif, .las, and .laz files using the AWS Console or the CLI:
aws s3 cp ./lidar/ s3://your-bucket/lidar/raw/ --recursive
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Log in to AWS Console
2️⃣ Open S3 Service
3️⃣ Create a bucket named: your-lidar-project
4️⃣ Create folders:
lidar/raw/
lidar/preprocessed/
lidar/models/
lidar/output/
5️⃣ Upload sample LiDAR files (download from: USGS, OpenTopography, or Kaggle datasets)
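If you prefer to script this setup instead of clicking through the console, here is a minimal boto3 sketch. The bucket name your-lidar-project comes from the practice steps above; the region is an assumption you should replace with your own:

import boto3

region = "eu-west-1"  # assumption: replace with your own region
bucket = "your-lidar-project"
s3 = boto3.client("s3", region_name=region)

# Create the bucket; us-east-1 is the one region where you must omit CreateBucketConfiguration
if region == "us-east-1":
    s3.create_bucket(Bucket=bucket)
else:
    s3.create_bucket(
        Bucket=bucket,
        CreateBucketConfiguration={"LocationConstraint": region},
    )

# "Folders" in S3 are just key prefixes; zero-byte objects make them visible in the console
for prefix in ["lidar/raw/", "lidar/preprocessed/", "lidar/models/", "lidar/output/"]:
    s3.put_object(Bucket=bucket, Key=prefix)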
🔷 Step 2: Launch SageMaker Studio
Open SageMaker → Studio → Launch.
Install libraries:
pip install laspy rasterio tensorflow numpy
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Open Amazon SageMaker
2️⃣ Click SageMaker Studio > Launch App
3️⃣ Create a new notebook
4️⃣ Run:
!pip install laspy rasterio matplotlib numpy tensorflow
5️⃣ Use S3 browser inside Studio to view uploaded LiDAR files
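To confirm the notebook can actually see your uploads without leaving Studio, a quick boto3 listing works as well (assuming the bucket and prefix names from Step 1):

import boto3

s3 = boto3.client("s3")

# List the raw LiDAR files uploaded in Step 1
resp = s3.list_objects_v2(Bucket="your-lidar-project", Prefix="lidar/raw/")
for obj in resp.get("Contents", []):
    print(obj["Key"], obj["Size"])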
🔷 Step 3: Preprocess LiDAR Data
Example code to read a LAS file and normalize its elevation values:
import laspy
import numpy as np

# Read the LAS point cloud
las = laspy.read("sample.las")

# Stack x, y, z coordinates into an (N, 3) array
points = np.vstack((las.x, las.y, las.z)).T

# Min-max normalize elevation (z) to the range [0, 1]
points[:, 2] = (points[:, 2] - points[:, 2].min()) / np.ptp(points[:, 2])

np.save("processed.npy", points)
Upload processed data to S3:
aws s3 cp processed.npy s3://your-bucket/lidar/preprocessed/
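If the AWS CLI is not configured inside your Studio notebook, a boto3 upload does the same thing (your-bucket is the placeholder used in the CLI command above):

import boto3

s3 = boto3.client("s3")
# Upload the processed array to the preprocessed prefix
s3.upload_file("processed.npy", "your-bucket", "lidar/preprocessed/processed.npy")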
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Download a .las LiDAR file
2️⃣ Place it in your Studio notebook directory
3️⃣ Run the preprocessing code
4️⃣ Visualize a small sample:
import matplotlib.pyplot as plt

# Plot a 5,000-point sample, colored by normalized elevation
plt.scatter(points[:5000, 0], points[:5000, 1], c=points[:5000, 2])
plt.show()
5️⃣ Save processed file and upload to S3
🔷 Step 4: Train the Model Using SageMaker
Create TensorFlow Estimator:
from sagemaker.tensorflow import TensorFlow

estimator = TensorFlow(
    entry_point='train.py',
    source_dir='training_code',    # folder containing train.py (see practice steps below)
    role='arn:aws:iam::123456789012:role/SageMakerRole',
    instance_count=1,
    instance_type='ml.p3.2xlarge',
    framework_version='2.11',      # pick a TensorFlow version supported by SageMaker
    py_version='py39',
    output_path='s3://your-bucket/lidar/models/'
)

estimator.fit({'training': 's3://your-bucket/lidar/preprocessed/'})
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Create a new folder called training_code/
2️⃣ Add a file named train.py
3️⃣ Paste the TensorFlow model training code (a minimal train.py sketch follows after this list)
4️⃣ Upload the folder to SageMaker Studio
5️⃣ Run the training job (SageMaker automatically pulls the training data from S3)
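The guide does not show the body of train.py, so here is a minimal sketch. It assumes the training channel contains the (N, 3) .npy point arrays produced in Step 3, and it uses a toy autoencoder purely as a stand-in for whatever model your task needs; the script layout (reading SM_CHANNEL_TRAINING, writing to SM_MODEL_DIR) follows SageMaker's script-mode conventions:

# train.py -- minimal sketch, not a production model
import argparse
import glob
import os

import numpy as np
import tensorflow as tf


def load_points(data_dir):
    """Concatenate every .npy point array found in the training channel."""
    arrays = [np.load(path) for path in glob.glob(os.path.join(data_dir, "*.npy"))]
    return np.concatenate(arrays, axis=0).astype("float32")


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    # SageMaker injects these environment variables in script mode
    parser.add_argument("--data_dir", default=os.environ.get("SM_CHANNEL_TRAINING", "/opt/ml/input/data/training"))
    parser.add_argument("--model_out", default=os.environ.get("SM_MODEL_DIR", "/opt/ml/model"))
    parser.add_argument("--epochs", type=int, default=10)
    args, _ = parser.parse_known_args()

    points = load_points(args.data_dir)

    # Toy example: a small autoencoder that reconstructs (x, y, z) points.
    # Replace with your real task (ground classification, DEM regression, etc.).
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(16, activation="relu"),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(3),
    ])
    model.compile(optimizer="adam", loss="mse")
    model.fit(points, points, epochs=args.epochs, batch_size=1024)

    # Anything written under SM_MODEL_DIR is packaged into model.tar.gz after the job
    # finishes and uploaded to the estimator's output_path in S3.
    # Saving into a numbered subfolder gives TF Serving the layout it expects (TF2/Keras 2 SavedModel).
    model.save(os.path.join(args.model_out, "1"))

Place this file inside the training_code/ folder from step 1 so it matches the source_dir used by the estimator above.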
🔷 Step 5: Save Model Artifacts to S3
Inside the training container, train.py writes its model files to:
/opt/ml/model/
When the job finishes, SageMaker packages that directory into model.tar.gz and uploads it to the estimator's output_path:
s3://your-bucket/lidar/models/
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ After training completes, check SageMaker → Training Jobs
2️⃣ Open your job → navigate to Artifacts
3️⃣ Verify model.tar.gz is uploaded to your S3 bucket
4️⃣ Download it to inspect structure
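A minimal sketch for step 4, assuming the artifact was written under the lidar/models/ prefix; the exact key includes the training job name, so list the prefix first:

import tarfile
import boto3

s3 = boto3.client("s3")

# Find the model artifact(s) under the models prefix
resp = s3.list_objects_v2(Bucket="your-bucket", Prefix="lidar/models/")
keys = [obj["Key"] for obj in resp.get("Contents", []) if obj["Key"].endswith("model.tar.gz")]
print(keys)

# Download the first artifact and list its contents
s3.download_file("your-bucket", keys[0], "model.tar.gz")
with tarfile.open("model.tar.gz") as tar:
    tar.list()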
🔷 Step 6: Deploy Model to SageMaker Endpoint (Optional)
predictor = estimator.deploy(
    initial_instance_count=1,
    instance_type='ml.m5.large'
)
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Open SageMaker → Inference → Endpoints
2️⃣ Confirm your endpoint is active
3️⃣ Use the notebook to test:
output = predictor.predict(points[:100].tolist())
print(output)
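Real-time endpoints bill for every hour they stay up, so delete the endpoint once you are done testing:

# Tear down the endpoint (and its endpoint configuration) to stop charges
predictor.delete_endpoint()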
🔷 Step 7: Save Inference Output in S3
np.save("output.npy", output)
Upload:
aws s3 cp output.npy s3://your-bucket/lidar/output/
✔ PRACTICAL PRACTICE FOR USERS
1️⃣ Run inference on sample LiDAR points
2️⃣ Save prediction output locally
3️⃣ Upload output file into lidar/output/ folder in S3
4️⃣ Check S3 to confirm file upload
5️⃣ Visualize predictions in the notebook (see the sketch below)
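As a rough illustration of step 5, assuming output holds one or more predicted values per input point (the exact structure depends on your model; TensorFlow Serving responses are usually wrapped as {"predictions": [...]}), you could color the sample points by their predicted value:

import numpy as np
import matplotlib.pyplot as plt

# Unwrap the TF Serving response if needed, then take one value per point for coloring
preds = np.asarray(output["predictions"] if isinstance(output, dict) else output)
color = preds.reshape(len(preds), -1)[:, 0]   # adjust to your model's output shape

sample = points[:100]  # the same points that were sent to the endpoint
plt.scatter(sample[:, 0], sample[:, 1], c=color, cmap="viridis")
plt.colorbar(label="predicted value")
plt.show()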
🔶 Conclusion
Using AWS S3 + SageMaker, you built:
- ✔ A structured data pipeline in S3
- ✔ Preprocessed LiDAR point clouds
- ✔ A trained deep learning model
- ✔ Model artifacts stored in S3
- ✔ (Optional) A deployed real-time inference endpoint
- ✔ Prediction output saved back to S3
This cloud-native workflow is ideal for production-grade LiDAR analytics.