How to Use IAM Authentication for RDS PostgreSQL with Glue ETL Jobs
Amazon RDS enables you to use AWS Identity and Access Management (IAM) to manage database access for Amazon RDS for PostgreSQL DB instances. It is possible to use IAM authentication with Glue connections, but it is not well documented, so I will demonstrate how you can do it. In your Glue job, you can import the boto3 library and call its generate_db_auth_token method to generate a token, then pass that token as the password when connecting.
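In isolation, the token generation looks like this (the endpoint, user, and region are the placeholder values from the full script below):

import boto3

# Create an RDS client in the DB instance's region
client = boto3.client("rds", region_name="eu-west-1")

# The returned token is used in place of the database password;
# it is valid for 15 minutes
token = client.generate_db_auth_token(
    DBHostname="endpointofmydatabase.eu-west-1.rds.amazonaws.com",
    Port=5432,
    DBUsername="mydbuser",
    Region="eu-west-1",
)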
Here’s a simple Glue ETL script I wrote for testing. It connects to PostgreSQL using IAM authentication, reads data from a table and writes the output to S3:
import sys
import boto3
from awsglue.transforms import *
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from awsglue.context import GlueContext
from awsglue.job import Job

DBURL = "endpointofmydatabase.eu-west-1.rds.amazonaws.com"
DBUSER = "mydbuser"

# Generate a short-lived IAM authentication token to use as the password
client = boto3.client("rds", region_name="eu-west-1")
TOKEN = client.generate_db_auth_token(DBURL, 5432, DBUSER, "eu-west-1")

args = getResolvedOptions(sys.argv, ['JOB_NAME'])
sc = SparkContext()
glueContext = GlueContext(sc)
spark = glueContext.spark_session
job = Job(glueContext)
job.init(args['JOB_NAME'], args)

# Read from PostgreSQL, passing the IAM token in place of the password
datasource0 = glueContext.create_dynamic_frame_from_options(
    connection_type="postgresql",
    connection_options={
        "url": "jdbc:postgresql://" + DBURL + ":5432/mydatabase",
        "user": DBUSER,
        "password": TOKEN,
        "dbtable": "public.testtable",
    },
    transformation_ctx="datasource0",
)

# Write the result to S3 as JSON
datasink1 = glueContext.write_dynamic_frame.from_options(
    frame=datasource0,
    connection_type="s3",
    connection_options={"path": "s3://mybucketname/output"},
    format="json",
    transformation_ctx="datasink1",
)

job.commit()
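A few things to keep in mind when using this approach: the token returned by generate_db_auth_token is valid for 15 minutes, so generate it right before connecting rather than storing it. IAM database authentication must be enabled on the RDS instance, the database user must be granted the rds_iam role (GRANT rds_iam TO mydbuser;), and the IAM role your Glue job runs with needs rds-db:connect permission for that database user. IAM authentication also requires an SSL connection; if the connection is rejected, you may need to append SSL parameters (for example ?ssl=true&sslmode=require) to the JDBC URL.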