In AWS Glue, you can pass parameters to your jobs to control their behavior and customize their execution. Here’s a list of common parameters that can be used with AWS Glue jobs.

AWS Glue Job Parameters Examples
1. Job-Specific Parameters
These parameters are specific to the job configuration.
- --job-name: The name of the job. Inside the script, the name arrives as the JOB_NAME argument.
- --region: The AWS region where the job is executed.
- --temp-dir: The directory in S3 to use for temporary files. Glue's own built-in parameter for this is spelled --TempDir.
- --enable-continuous-cloudwatch-log: Whether to enable continuous logging to CloudWatch.
- --job-language: The programming language of the job (Python or Scala).
- --glue-version: The version of AWS Glue (e.g., 1.0, 2.0, 3.0).
- --max-concurrent-runs: The maximum number of concurrent runs for the job.
2. Script Parameters
These parameters are often used to configure your ETL logic within the script.
- --source-table: The name of the source table (could be a database table).
- --destination-path: The S3 path where the output data should be stored.
- --bookmark-keys: The keys to use for tracking processed data (used with bookmarks).
- --input-format: The input data format (e.g., CSV, JSON, Parquet).
- --output-format: The output data format.
3. AWS Glue Context Parameters
These parameters can be used to configure the Glue context.
- --enable-glue-datacatalog: Enables the AWS Glue Data Catalog as the Apache Spark Hive metastore.
- --job-bookmark-option: Defines the job bookmark option (e.g., job-bookmark-enable, job-bookmark-disable).
- --transformation-context: Context for transformations.
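Because the bookmark option only accepts a fixed set of values, it can help to validate it before submitting a job. The following is a minimal sketch; `context_arguments` is a hypothetical helper, not part of any AWS library, and it assumes the Data Catalog flag is switched on with the string "true".

```python
# Values accepted by --job-bookmark-option (job-bookmark-pause is the third documented value).
BOOKMARK_OPTIONS = {"job-bookmark-enable", "job-bookmark-disable", "job-bookmark-pause"}


def context_arguments(bookmark_option, use_data_catalog=True):
    """Build the Glue context arguments, rejecting unknown bookmark values."""
    if bookmark_option not in BOOKMARK_OPTIONS:
        raise ValueError(f"Unsupported bookmark option: {bookmark_option!r}")
    arguments = {"--job-bookmark-option": bookmark_option}
    if use_data_catalog:
        # Assumption: the flag is switched on with the string "true".
        arguments["--enable-glue-datacatalog"] = "true"
    return arguments


print(context_arguments("job-bookmark-enable"))
```

The returned dictionary has the same shape as the JSON passed to --arguments, so it can be merged with the script parameters before the job is started.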
4. AWS Glue Connection Parameters
Parameters for configuring connections to data sources.
- --connection-name: The name of the connection in the Glue Data Catalog.
- --connection-options: Connection options (like user, password, etc.).
- --jdbc-url: JDBC URL for the database connection.
5. Environment-Specific Parameters
Parameters to customize the job execution environment.
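- --temp-dir: Path to the temporary directory in S3.
- --iam-role: The IAM role that AWS Glue uses to access resources. The role is set on the job definition itself (the Role property) rather than passed as a script argument.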
6. Execution Parameters
Parameters used during the execution of the job.
- --retry-attempts: The number of retry attempts for the job. In the Glue API this is the MaxRetries property of the job definition.
- --timeout: Maximum execution time in minutes before the job is automatically terminated.
Example of Passing Parameters
You can pass parameters to an AWS Glue job through the AWS Management Console, the AWS CLI, or an SDK. Here’s an example using the AWS CLI:
aws glue start-job-run --job-name my-glue-job \
--arguments '{"--source-table":"my_source_table", "--destination-path":"s3://my-bucket/output/"}'
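The same arguments can also be assembled programmatically. The sketch below is one possible way to do it in Python; `build_job_arguments` is a hypothetical helper (not part of any AWS library) that adds the leading `--` to each key and converts values to strings, and the result is serialized for the CLI's --arguments option.

```python
import json


def build_job_arguments(params):
    """Return Glue-style arguments: each key gets a leading '--', each value becomes a string."""
    arguments = {}
    for name, value in params.items():
        key = name if name.startswith("--") else f"--{name}"
        arguments[key] = str(value)
    return arguments


job_arguments = build_job_arguments({
    "source-table": "my_source_table",
    "destination-path": "s3://my-bucket/output/",
})

# JSON string suitable for: aws glue start-job-run --arguments '<json>'
cli_arguments = json.dumps(job_arguments)
print(cli_arguments)

# With boto3 (assumed to be installed and configured), the dict is passed directly:
#   boto3.client("glue").start_job_run(JobName="my-glue-job", Arguments=job_arguments)
```

Keeping the arguments in a dictionary until the last step avoids hand-editing JSON on the command line, where quoting mistakes are easy to make.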
Accessing Parameters in Glue Script
In your Glue script (e.g., Python), you can access these parameters using the getResolvedOptions function:
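import sys
from awsglue.utils import getResolvedOptions

# Option names are listed without the leading "--".
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'source-table', 'destination-path'])
# Hyphens in option names become underscores in the returned dictionary.
source_table = args['source_table']
destination_path = args['destination_path']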
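Two details are easy to miss here: hyphens in option names come back as underscores in the dictionary keys, and every name passed to getResolvedOptions is treated as required. The sketch below uses plain argparse as a local stand-in to illustrate both points without a Glue environment. It is an assumption about how the helper behaves, not the awsglue implementation itself, and `resolve_options` and the `output-format` default are hypothetical.

```python
import argparse


def resolve_options(argv, required, optional=None):
    """Local stand-in for getResolvedOptions, built on argparse for illustration only."""
    parser = argparse.ArgumentParser()
    for name in required:
        parser.add_argument(f"--{name}", required=True)
    for name, default in (optional or {}).items():
        parser.add_argument(f"--{name}", default=default)
    # parse_known_args ignores arguments the script did not ask for.
    parsed, _unknown = parser.parse_known_args(argv[1:])
    return vars(parsed)


# Simulated sys.argv, shaped like what a Glue job receives.
argv = [
    "script.py",
    "--JOB_NAME", "my-glue-job",
    "--source-table", "my_source_table",
    "--destination-path", "s3://my-bucket/output/",
]

args = resolve_options(
    argv,
    required=["JOB_NAME", "source-table", "destination-path"],
    optional={"output-format": "parquet"},  # hypothetical default
)

print(args["source_table"])   # hyphen in the option name, underscore in the key
print(args["output_format"])  # falls back to the default when not passed
```

If you prefer to avoid the renaming altogether, naming parameters with underscores from the start (for example --source_table) keeps the CLI name and the dictionary key identical.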
AWS Glue parameters are flexible and configurable for your ETL jobs, letting you customize execution to meet your needs.
Understanding how to effectively use these parameters will help you build efficient and maintainable Glue jobs.
In AWS Glue, you can add additional libraries. These can include Python packages or JAR files. Use the --additional-python-modules parameter for Python jobs.
Use --extra-jars for Scala jobs to add third-party libraries not included in the standard AWS Glue environment.
Adding Additional Libraries
Here’s how you can specify additional libraries for your Glue job:
1. For Python Jobs
To include additional Python libraries, use the --additional-python-modules parameter. You can specify multiple modules separated by commas.
Example CLI Command:
aws glue start-job-run --job-name my-glue-job \
--arguments '{"--additional-python-modules":"pandas==1.2.3,requests==2.25.1"}'
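When the module list is long, building the comma-separated value from a mapping keeps it readable. This is a small sketch; `format_python_modules` is a hypothetical helper, and a version of None is taken to mean "no version specifier".

```python
def format_python_modules(modules):
    """Join {package: version-or-None} into the comma-separated value Glue expects."""
    parts = []
    for package, version in modules.items():
        parts.append(f"{package}=={version}" if version else package)
    return ",".join(parts)


modules_value = format_python_modules({"pandas": "1.2.3", "requests": "2.25.1"})
print(modules_value)  # pandas==1.2.3,requests==2.25.1
```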
Example Python Script: You can also access the parameters within your Glue script using getResolvedOptions:
from awsglue.utils import getResolvedOptions
import sys
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'additional-python-modules'])
# Hyphens in option names become underscores in the returned dictionary.
additional_modules = args['additional_python_modules']
2. For Scala Jobs
To include additional JAR files, use the --extra-jars parameter.
Example CLI Command:
aws glue start-job-run --job-name my-glue-job \
--arguments '{"--extra-jars":"s3://my-bucket/path/to/my-library.jar"}'
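If a job needs more than one JAR, the value can be prepared the same way as the Python module list. The sketch below assumes that multiple JARs are passed as complete S3 paths separated by commas; `format_extra_jars` is a hypothetical helper and the second path is made up for illustration.

```python
def format_extra_jars(jar_paths):
    """Validate S3 JAR paths and join them into one comma-separated value."""
    cleaned = []
    for path in jar_paths:
        path = path.strip()
        if not path.startswith("s3://") or not path.endswith(".jar"):
            raise ValueError(f"Expected an s3://.../*.jar path, got: {path!r}")
        cleaned.append(path)
    return ",".join(cleaned)


extra_jars = format_extra_jars([
    "s3://my-bucket/path/to/my-library.jar",
    "s3://my-bucket/path/to/another-library.jar",  # hypothetical second JAR
])
print(extra_jars)
```

Rejecting malformed paths before the job starts is cheaper than waiting for the run to fail while loading its dependencies.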
Important Considerations
- Library Availability: Ensure the libraries are compatible with AWS Glue Python/Scala.
- Package Versioning: When specifying Python libraries, you can specify versions using ==, >=, etc., to avoid compatibility issues.
- Size Limitations: Keep the total size of added libraries in mind, as there are limits on the script and additional libraries.
- Dependency Management: If your additional libraries have dependencies, ensure that they are resolved and included properly.
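As an illustration of the versioning point, the sketch below flags entries in a module list that are not pinned to an exact version. `unpinned_modules` is a hypothetical helper, and the regular expression is a simplification that only recognizes the `name==version` form.

```python
import re

# A requirement is treated as pinned only when it uses "==" with an explicit version.
PINNED = re.compile(r"^[A-Za-z0-9_.\-]+==[A-Za-z0-9_.!+\-]+$")


def unpinned_modules(modules_value):
    """Return entries of a comma-separated module list that lack an exact '==' pin."""
    entries = [entry.strip() for entry in modules_value.split(",") if entry.strip()]
    return [entry for entry in entries if not PINNED.match(entry)]


print(unpinned_modules("pandas==1.2.3,requests>=2.25.1,pypg"))
# ['requests>=2.25.1', 'pypg']
```

Exact pins make job runs reproducible, since an unpinned entry may resolve to a different version on a later run.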
Adding additional libraries to your AWS Glue jobs is straightforward. Use --additional-python-modules for Python jobs. Use --extra-jars for Scala jobs. This allows you to extend the functionality of your ETL processes by leveraging external libraries.
To add Python libraries such as pypg to your AWS Glue job, use the --additional-python-modules parameter when starting the job. It lets you specify the libraries, and optionally their versions, that you want to include.
Using --additional-python-modules
Here’s how you can include pypg along with other libraries in your AWS Glue job.
Example CLI Command
If you want to run a Glue job and include pypg along with other libraries like pandas and requests, you would use the AWS CLI as follows:
aws glue start-job-run --job-name my-glue-job \
--arguments '{"--additional-python-modules":"pypg==<version>,pandas==<version>,requests==<version>"}'
Replace <version> with the specific version number you want to use (e.g., 1.0.0 for pypg). If you want the latest version, you can omit the version specifier.
Example with Python Script
In your Glue script, you can access these parameters using getResolvedOptions:
from awsglue.utils import getResolvedOptions
import sys
# Retrieve job parameters
args = getResolvedOptions(sys.argv, ['JOB_NAME', 'additional-python-modules'])
# If you want to see the additional modules (hyphens become underscores in the key)
additional_modules = args['additional_python_modules']
print(f"Additional Python Modules: {additional_modules}")
# Your ETL logic here
# For example, import the libraries if necessary
# Depending on the library, you might import it here or use it in your logic.
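One way to confirm that the requested libraries actually loaded is to check, early in the script, whether each one can be imported. The sketch below is a rough approach under two assumptions: `missing_modules` is a hypothetical helper, and the package name is taken to be the same as the import name, which is not true for every library.

```python
import importlib.util
import re


def missing_modules(modules_value):
    """Return module names from a comma-separated requirement list that cannot be imported."""
    missing = []
    for entry in modules_value.split(","):
        # Strip any version specifier such as ==1.0, >=2.0 or ~=3.1.
        name = re.split(r"[=<>!~]", entry.strip(), maxsplit=1)[0].strip()
        if name and importlib.util.find_spec(name) is None:
            missing.append(name)
    return missing


# "json" ships with Python; the second name is a made-up placeholder.
print(missing_modules("json==1.0,definitely_not_installed_pkg>=2.0"))
```

In a Glue script, the argument passed to the helper would be the value read from the job parameters, and a non-empty result is a reasonable cue to fail fast with a clear message.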
Important Notes
- Library Availability: Ensure that pypg and any other libraries are compatible with the AWS Glue Python environment.
- Dependencies: If pypg has additional dependencies, ensure they are also included in the --additional-python-modules parameter.
- Testing: It’s a good idea to test your job in a development environment before deploying it to production. This ensures that all libraries are correctly loaded.
Conclusion
By specifying --additional-python-modules, you can easily extend the functionality of your AWS Glue jobs with additional libraries like pypg. This is useful for incorporating specific features or capabilities not included in the default Glue environment.