Here are 30 AWS Glue quiz questions with their answers to test and reinforce your understanding of AWS Glue.

30 AWS Glue Quiz Questions

General AWS Glue Concepts

  1. What is AWS Glue primarily used for?
    • Answer: AWS Glue is used for data preparation, ETL (Extract, Transform, Load), and data cataloging.
  2. Which service does AWS Glue use to store metadata information?
    • Answer: AWS Glue Data Catalog.
  3. What programming languages does AWS Glue support for writing ETL jobs?
    • Answer: Python (PySpark) and Scala.
  4. What type of ETL jobs can you run with AWS Glue?
    • Answer: Batch jobs and streaming ETL jobs.
  5. How does AWS Glue discover data?
    • Answer: Through Crawlers that scan data sources and populate the Data Catalog with metadata.

Glue Crawlers

  1. What is the purpose of an AWS Glue Crawler?
    • Answer: To scan data sources and automatically populate the Data Catalog with metadata.
  2. What types of data sources can AWS Glue Crawlers connect to?
    • Answer: Amazon S3, Amazon DynamoDB, and databases reachable over JDBC, such as Amazon RDS and Amazon Redshift.
  3. How does AWS Glue determine the schema of data during crawling?
    • Answer: It infers the schema based on data format and structure in the source.
  4. Can a single AWS Glue Crawler crawl multiple data sources?
    • Answer: Yes. A crawler can connect to multiple data stores and consolidate metadata.
  5. What is a Glue Classifier in the context of Glue Crawlers?
    • Answer: A classifier recognizes the format of the data (such as JSON, CSV, or a custom format) during a crawl so that the crawler can infer its schema.
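
To make the crawler answers above concrete, here is a minimal sketch of how a single crawler with several data stores might be defined through boto3. The crawler name, role ARN, bucket paths, connection name, and classifier name are all hypothetical placeholders; only the request is built here, and the actual call is left in a separate function that needs boto3 and AWS credentials.

```python
from typing import Any, Dict, List, Optional


def build_crawler_request(
    name: str,
    role_arn: str,
    database: str,
    s3_paths: List[str],
    jdbc_targets: Optional[List[Dict[str, str]]] = None,
    classifiers: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's glue.create_crawler call."""
    # One crawler can list several data stores under "Targets".
    targets: Dict[str, Any] = {"S3Targets": [{"Path": p} for p in s3_paths]}
    if jdbc_targets:
        # Each JDBC target names an existing Glue connection and an include path.
        targets["JdbcTargets"] = [
            {"ConnectionName": t["connection"], "Path": t["path"]}
            for t in jdbc_targets
        ]
    request: Dict[str, Any] = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,  # Data Catalog database that receives the tables
        "Targets": targets,
    }
    if classifiers:
        # Names of custom classifiers to attach to the crawler.
        request["Classifiers"] = classifiers
    return request


def create_crawler(request: Dict[str, Any]) -> None:
    """Send the request to AWS. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("glue").create_crawler(**request)


# Hypothetical values, for illustration only.
request = build_crawler_request(
    name="sales-crawler",
    role_arn="arn:aws:iam::111122223333:role/ExampleGlueCrawlerRole",
    database="sales_db",
    s3_paths=["s3://example-bucket/orders/", "s3://example-bucket/customers/"],
    jdbc_targets=[{"connection": "example-rds-connection", "path": "salesdb/%"}],
    classifiers=["example-csv-classifier"],
)
```

Calling create_crawler(request) would register the crawler; it still has to be started (for example with start_crawler) before any tables appear in the Data Catalog.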

AWS Glue Data Catalog

  1. What are AWS Glue tables?
    • Answer: Metadata representations of data sources (e.g., data in S3, databases) stored in the Glue Data Catalog.
  2. What is a Glue Database in the Data Catalog?
    • Answer: A logical grouping of tables within the Data Catalog.
  3. Can the AWS Glue Data Catalog be integrated with Amazon Athena?
    • Answer: Yes, the Glue Data Catalog can serve as Athena’s metadata store.
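  4. How does the AWS Glue Data Catalog handle versioning?
    • Answer: It versions table metadata, not the underlying data. Updating a table's schema creates a new table version, and earlier versions remain available for comparison.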
  5. What permissions are required for cross-account access to the AWS Glue Data Catalog?
    • Answer: A Data Catalog resource policy in the owning account that grants access to the other account, plus IAM permissions for the principals in that account.
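
As an illustration of the cross-account answer above, the sketch below builds a read-only Data Catalog resource policy for one database. The region, account IDs, and database name are hypothetical, and the action list is one plausible read-only set rather than a recommended baseline.

```python
import json
from typing import Any, Dict


def build_catalog_policy(
    region: str, owner_account: str, consumer_account: str, database: str
) -> Dict[str, Any]:
    """Build a resource policy that lets another account read one database."""
    arn_prefix = f"arn:aws:glue:{region}:{owner_account}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Grants access to the consumer account as a whole; its own IAM
                # policies decide which users and roles may actually use it.
                "Principal": {"AWS": f"arn:aws:iam::{consumer_account}:root"},
                "Action": [
                    "glue:GetDatabase",
                    "glue:GetTable",
                    "glue:GetTables",
                    "glue:GetPartitions",
                ],
                "Resource": [
                    f"{arn_prefix}:catalog",
                    f"{arn_prefix}:database/{database}",
                    f"{arn_prefix}:table/{database}/*",
                ],
            }
        ],
    }


def apply_catalog_policy(policy: Dict[str, Any]) -> None:
    """Attach the policy to the catalog. Requires boto3 and owner credentials."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("glue").put_resource_policy(PolicyInJson=json.dumps(policy))


# Hypothetical account IDs and names, for illustration only.
policy = build_catalog_policy("us-east-1", "111122223333", "444455556666", "sales_db")
policy_json = json.dumps(policy, indent=2)
```

The resource policy covers only the owner's side. Principals in the consuming account still need IAM policies that allow the same Glue actions on these resources.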

AWS Glue ETL Jobs

  1. What is an AWS Glue ETL job?
    • Answer: An ETL job extracts, transforms, and loads data between sources and destinations.
  2. What is the default compute environment for AWS Glue ETL jobs?
    • Answer: AWS Glue jobs run in a managed Apache Spark environment.
  3. How can you monitor the progress of AWS Glue ETL jobs?
    • Answer: By checking job run status in the Glue console and reviewing the job's logs and metrics in Amazon CloudWatch.
  4. What is the purpose of bookmarks in AWS Glue?
    • Answer: To track processed data so that jobs only process new data in subsequent runs.
  5. How do you specify the type of worker nodes in AWS Glue?
    • Answer: Select a predefined worker type (e.g., G.1X or G.2X; Standard on older Glue versions) and the number of workers in the job configuration.
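
The bookmark and worker-type answers above can be sketched as a job run request for boto3's glue.start_job_run. The job name and the extra argument are hypothetical; the worker type and worker count are example values, not sizing advice.

```python
from typing import Any, Dict, Optional


def build_job_run_request(
    job_name: str,
    worker_type: str = "G.1X",
    number_of_workers: int = 2,
    use_bookmarks: bool = True,
    extra_args: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for boto3's glue.start_job_run call."""
    arguments: Dict[str, str] = {
        # Special job parameter that switches job bookmarks on or off.
        "--job-bookmark-option": (
            "job-bookmark-enable" if use_bookmarks else "job-bookmark-disable"
        ),
    }
    if extra_args:
        arguments.update(extra_args)
    return {
        "JobName": job_name,
        "WorkerType": worker_type,
        "NumberOfWorkers": number_of_workers,
        "Arguments": arguments,
    }


def start_job_run(request: Dict[str, Any]) -> str:
    """Start the run and return its ID. Requires boto3 and valid credentials."""
    import boto3  # imported lazily so the builder above works offline

    response = boto3.client("glue").start_job_run(**request)
    return response["JobRunId"]


# Hypothetical job name and argument, for illustration only.
run_request = build_job_run_request(
    "example-nightly-etl",
    worker_type="G.2X",
    number_of_workers=4,
    extra_args={"--target_path": "s3://example-bucket/curated/"},
)
```

Enabling the bookmark option is only half of the setup: the job script itself also has to initialize and commit the job and pass a transformation_ctx to its sources for bookmarks to take effect.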

AWS Glue Studio and Console

  1. What is AWS Glue Studio?
    • Answer: A visual interface for designing, running, and managing ETL jobs in AWS Glue.
  2. In Glue Studio, what feature lets you preview a job's data output before running it?
    • Answer: The data preview feature.
  3. What does the AWS Glue Visual Job Editor enable users to do?
    • Answer: Design ETL workflows using a drag-and-drop interface.
  4. Can you write custom code within AWS Glue Studio?
    • Answer: Yes, Glue Studio allows for custom transformations using code snippets.
  5. What is Glue Studio’s primary benefit for non-technical users?
    • Answer: It simplifies the ETL development process through a visual interface.

Glue Security, Networking, and Compliance

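  1. What allows Glue jobs to access data stores inside a private VPC?
    • Answer: A Glue connection configured with the VPC, subnet, and security groups of the data store, together with VPC endpoints (for example, for Amazon S3) where the job must reach AWS services from inside the VPC.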
  2. Which AWS service lets you control access to Data Catalog tables at the table and column level?
    • Answer: AWS Lake Formation, whose permissions enforce fine-grained access controls on Data Catalog resources.
  3. What type of encryption does AWS Glue support for data at rest?
    • Answer: AWS Glue supports encryption at rest using AWS KMS.
  4. How can you ensure Glue jobs do not access unauthorized data sources?
    • Answer: By restricting Glue job IAM roles and attaching policies to limit access.
  5. Can Glue ETL jobs write to cross-account S3 buckets?
    • Answer: Yes, with the correct cross-account permissions and bucket policies.
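
For the cross-account write question above, a bucket policy in the bucket owner's account might look like the following sketch. The bucket name, prefix, and role ARN are hypothetical, and limiting the statement to s3:PutObject under a single prefix is an assumption about what the job needs.

```python
import json
from typing import Any, Dict


def build_cross_account_bucket_policy(
    bucket: str, prefix: str, glue_role_arn: str
) -> Dict[str, Any]:
    """Build a bucket policy that lets a Glue job role write under one prefix."""
    clean_prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowGlueJobWrites",
                "Effect": "Allow",
                # The IAM role that the Glue job in the other account assumes.
                "Principal": {"AWS": glue_role_arn},
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{clean_prefix}/*",
            }
        ],
    }


def apply_bucket_policy(bucket: str, policy: Dict[str, Any]) -> None:
    """Attach the policy. Requires boto3 and bucket-owner credentials."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("s3").put_bucket_policy(Bucket=bucket, Policy=json.dumps(policy))


# Hypothetical bucket, prefix, and role, for illustration only.
bucket_policy = build_cross_account_bucket_policy(
    "example-target-bucket",
    "/curated/",
    "arn:aws:iam::111122223333:role/ExampleGlueJobRole",
)
```

The Glue job role needs a matching IAM policy in its own account that allows s3:PutObject on the same resource; the bucket policy alone is not enough.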

These questions and answers cover essential aspects of AWS Glue, its architecture, and various features, providing a good basis for understanding its functionality and best practices.
