Here are 30 AWS Glue quiz questions with their answers to test and reinforce your understanding of AWS Glue.

General AWS Glue Concepts
- What is AWS Glue primarily used for?
- Answer: AWS Glue is used for data preparation, ETL (Extract, Transform, Load), and data cataloging.
- Which service does AWS Glue use to store metadata information?
- Answer: AWS Glue Data Catalog.
- What programming languages does AWS Glue support for writing ETL jobs?
- Answer: Python (PySpark) and Scala.
- What type of ETL jobs can you run with AWS Glue?
- Answer: Batch jobs and streaming ETL jobs.
- How does AWS Glue discover data?
- Answer: Through Crawlers that scan data sources and populate the Data Catalog with metadata.
Glue Crawlers
- What is the purpose of an AWS Glue Crawler?
- Answer: To scan data sources and automatically populate the Data Catalog with metadata.
- What types of data sources can AWS Glue Crawlers connect to?
- Answer: Amazon S3, Amazon DynamoDB, and databases reachable over JDBC, such as Amazon RDS and Amazon Redshift.
- How does AWS Glue determine the schema of data during crawling?
- Answer: It infers the schema based on data format and structure in the source.
- Can a single AWS Glue Crawler crawl multiple data sources?
- Answer: Yes. A crawler can connect to multiple data stores and consolidate metadata.
- What is a Glue Classifier in the context of Glue Crawlers?
- Answer: A classifier recognizes the format of the data (such as JSON, CSV, or a custom format) during a crawl so that the crawler can infer a schema for it.
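
To make the multi-source crawler answer concrete, here is a minimal sketch of the parameters a single crawler with both an S3 target and a JDBC target might take. It assumes the boto3 Glue client's `create_crawler` call; the crawler name, role ARN, database, bucket, and connection name are all hypothetical placeholders, and the helper `build_crawler_params` is an illustration rather than part of any AWS library.

```python
def build_crawler_params(name, role_arn, database, s3_paths,
                         jdbc_targets=(), classifiers=()):
    """Build keyword arguments for the Glue CreateCrawler API."""
    # One crawler can carry several targets of different kinds.
    targets = {"S3Targets": [{"Path": path} for path in s3_paths]}
    if jdbc_targets:
        targets["JdbcTargets"] = [
            {"ConnectionName": connection, "Path": path}
            for connection, path in jdbc_targets
        ]
    params = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": targets,
    }
    if classifiers:
        # Names of custom classifiers to try before the built-in ones.
        params["Classifiers"] = list(classifiers)
    return params


# All values below are hypothetical placeholders.
params = build_crawler_params(
    name="sales-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="sales_db",
    s3_paths=["s3://example-bucket/sales/"],
    jdbc_targets=[("example-jdbc-connection", "salesdb/public/%")],
)

# To create the crawler for real (needs boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_crawler(**params)
```

Keeping the parameter construction separate from the API call makes it easy to inspect or review the crawler definition before anything is created in an account.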
AWS Glue Data Catalog
- What are AWS Glue tables?
- Answer: Metadata representations of data sources (e.g., data in S3, databases) stored in the Glue Data Catalog.
- What is a Glue Database in the Data Catalog?
- Answer: A logical grouping of tables within the Data Catalog.
- Can the AWS Glue Data Catalog be integrated with Amazon Athena?
- Answer: Yes, the Glue Data Catalog can serve as Athena’s metadata store.
- How does AWS Glue Data Catalog handle data versioning?
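- Answer: The Data Catalog versions table schemas rather than the underlying data: when a table definition is updated, the previous definition can be retained as an earlier table version.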
- What service permissions are required for cross-account access to the AWS Glue Data Catalog?
- Answer: Resource-based policies and IAM roles with cross-account permissions.
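
As an illustration of the cross-account answer, the sketch below builds a Data Catalog resource policy that grants another account read access to one database. It is an assumption-laden example: the account IDs, region, and database name are hypothetical, the action list is only one plausible read-only set, and the consuming account still needs its own IAM permissions on top of this policy.

```python
import json


def build_catalog_read_policy(owner_account, consumer_account, region, database):
    """Return a resource policy granting read access to one catalog database."""
    catalog_arn = f"arn:aws:glue:{region}:{owner_account}:catalog"
    database_arn = f"arn:aws:glue:{region}:{owner_account}:database/{database}"
    tables_arn = f"arn:aws:glue:{region}:{owner_account}:table/{database}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The consuming account; its own IAM policies must also allow these actions.
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account}:root"},
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [catalog_arn, database_arn, tables_arn],
            }
        ],
    }


# Hypothetical account IDs, region, and database name.
policy = build_catalog_read_policy(
    owner_account="111122223333",
    consumer_account="444455556666",
    region="us-east-1",
    database="sales_db",
)
policy_json = json.dumps(policy, indent=2)

# To attach it to the catalog (needs boto3 and AWS credentials):
# import boto3
# boto3.client("glue").put_resource_policy(PolicyInJson=policy_json)
```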
AWS Glue ETL Jobs
- What is an AWS Glue ETL job?
- Answer: An ETL job extracts, transforms, and loads data between sources and destinations.
- What is the default compute environment for AWS Glue ETL jobs?
- Answer: AWS Glue jobs run in a managed Apache Spark environment.
- How can you monitor the progress of AWS Glue ETL jobs?
- Answer: By checking the job run status in the Glue console and reviewing the job's logs and metrics in Amazon CloudWatch.
- What is the purpose of bookmarks in AWS Glue?
- Answer: To track processed data so that jobs only process new data in subsequent runs.
- How do you specify the type of worker nodes in AWS Glue?
- Answer: Select a predefined worker type (e.g., G.1X or G.2X, or the legacy Standard type on older Glue versions) and the number of workers in the job configuration.
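
The bookmark and worker-type answers both come down to job configuration, which can be sketched as the parameters for the boto3 Glue client's `create_job` call. This is a hedged example: the job name, role ARN, and script location are hypothetical, the Glue version shown is an assumption, and `build_job_params` is an illustrative helper rather than an AWS API.

```python
def build_job_params(name, role_arn, script_location,
                     worker_type="G.1X", workers=2, bookmarks=True):
    """Build keyword arguments for the Glue CreateJob API."""
    bookmark_option = "job-bookmark-enable" if bookmarks else "job-bookmark-disable"
    return {
        "Name": name,
        "Role": role_arn,
        # "glueetl" selects a Spark ETL job.
        "Command": {
            "Name": "glueetl",
            "ScriptLocation": script_location,
            "PythonVersion": "3",
        },
        # Bookmarks are switched on through a special job argument.
        "DefaultArguments": {"--job-bookmark-option": bookmark_option},
        "GlueVersion": "4.0",  # assumed version for this sketch
        "WorkerType": worker_type,
        "NumberOfWorkers": workers,
    }


# Hypothetical names and locations.
job_params = build_job_params(
    name="sales-etl",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueJobRole",
    script_location="s3://example-bucket/scripts/sales_etl.py",
    worker_type="G.2X",
    workers=4,
)

# To create the job for real (needs boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_job(**job_params)
```

Enabling the bookmark option is only half of the picture: the job script itself also has to initialize and commit the job and pass a `transformation_ctx` to its sources for bookmark state to be recorded.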
AWS Glue Studio and Console
- What is AWS Glue Studio?
- Answer: A visual interface for designing, running, and managing ETL jobs in AWS Glue.
- In Glue Studio, what feature lets you preview a job’s data output before running it?
- Answer: The data preview feature.
- What does the AWS Glue Visual Job Editor enable users to do?
- Answer: Design ETL workflows using a drag-and-drop interface.
- Can you write custom code within AWS Glue Studio?
- Answer: Yes, Glue Studio allows for custom transformations using code snippets.
- What is Glue Studio’s primary benefit for non-technical users?
- Answer: It simplifies the ETL development process through a visual interface.
Glue Security, Networking, and Compliance
- What AWS feature allows Glue jobs to access data in private VPCs?
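- Answer: Glue connections configured with the VPC, subnet, and security groups of the data store, together with VPC endpoints (for example, for Amazon S3) where needed.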
- Which AWS service lets you control access to Data Catalog resources at the table and column level?
- Answer: AWS Lake Formation, whose permissions enforce fine-grained access controls on Data Catalog tables and columns.
- What type of encryption does AWS Glue support for data at rest?
- Answer: AWS Glue supports encryption at rest using AWS KMS.
- How can you ensure Glue jobs do not access unauthorized data sources?
- Answer: By restricting Glue job IAM roles and attaching policies to limit access.
- Can Glue ETL jobs write to cross-account S3 buckets?
- Answer: Yes, with the correct cross-account permissions and bucket policies.
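
For the encryption-at-rest answer, one way this is typically expressed is a Glue security configuration that a job then references. The sketch below builds the parameters for the boto3 Glue client's `create_security_configuration` call; the configuration name and KMS key ARN are hypothetical, and using a single key for all three settings is a simplifying assumption.

```python
def build_security_configuration(name, kms_key_arn):
    """Build keyword arguments for the Glue CreateSecurityConfiguration API."""
    return {
        "Name": name,
        "EncryptionConfiguration": {
            # Data the job writes to Amazon S3.
            "S3Encryption": [
                {"S3EncryptionMode": "SSE-KMS", "KmsKeyArn": kms_key_arn}
            ],
            # Job logs sent to Amazon CloudWatch.
            "CloudWatchEncryption": {
                "CloudWatchEncryptionMode": "SSE-KMS",
                "KmsKeyArn": kms_key_arn,
            },
            # Job bookmark state.
            "JobBookmarksEncryption": {
                "JobBookmarksEncryptionMode": "CSE-KMS",
                "KmsKeyArn": kms_key_arn,
            },
        },
    }


# Hypothetical configuration name and key ARN.
security_config = build_security_configuration(
    name="example-kms-config",
    kms_key_arn="arn:aws:kms:us-east-1:111122223333:key/EXAMPLE-KEY-ID",
)

# To create it for real (needs boto3 and AWS credentials):
# import boto3
# boto3.client("glue").create_security_configuration(**security_config)
```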
These questions and answers cover essential aspects of AWS Glue, its architecture, and various features, providing a good basis for understanding its functionality and best practices.