New Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse | Amazon Web Services



Amazon DynamoDB, a serverless NoSQL database, has been the go-to solution for more than a million customers for building low-latency, high-scale applications. As data grows, organizations are constantly looking for ways to extract valuable insights from operational data, which is often stored in DynamoDB. However, to make the most of this data in Amazon DynamoDB for analytics and machine learning (ML) use cases, customers often create their own data pipelines—a time-consuming infrastructure task that adds little unique value to their core business.

Starting today, you can use Amazon DynamoDB zero-ETL integration with Amazon SageMaker Lakehouse to run analytics and ML jobs in just a few clicks without consuming DynamoDB table capacity. Amazon SageMaker Lakehouse unifies all your data across Amazon S3 data lakes and Amazon Redshift data warehouses, helping you build powerful analytics and AI/ML applications on a single copy of data.

Zero-ETL is a set of integrations that eliminate or minimize the need to build ETL pipelines. This zero-ETL integration reduces the complexity of the engineering effort required to build and maintain data pipelines, benefiting users running analytics and ML workloads on operational data in Amazon DynamoDB without impacting production workflows.

Let’s get started
For the following demo, I need to set up a zero-ETL integration for my data in Amazon DynamoDB with an Amazon Simple Storage Service (Amazon S3) data lake managed by Amazon SageMaker Lakehouse. These prerequisites must be completed before setting up the zero-ETL integration. To learn more about the setup, check out the Amazon DynamoDB documentation page.
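If I'm scripting the prerequisites, one step is enabling point-in-time recovery (PITR) on the source DynamoDB table. Here's a minimal boto3 sketch; the table name is a placeholder, and actually sending the request requires AWS credentials:

```python
def pitr_request(table_name: str) -> dict:
    """Build the DynamoDB UpdateContinuousBackups parameters that
    turn on point-in-time recovery (PITR) for a table."""
    return {
        "TableName": table_name,
        "PointInTimeRecoverySpecification": {"PointInTimeRecoveryEnabled": True},
    }

def enable_pitr(table_name: str):
    """Send the request; needs boto3 installed and AWS credentials configured."""
    import boto3
    return boto3.client("dynamodb").update_continuous_backups(**pitr_request(table_name))

if __name__ == "__main__":
    # "my-orders-table" is a hypothetical table name.
    print(pitr_request("my-orders-table"))
```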

When all prerequisites are met, I can start this integration. I go to the AWS Glue console and select Zero-ETL integrations under Data Integration and ETL. Then I choose Create zero-ETL integration.

Here I have options for selecting the data source. I choose Amazon DynamoDB and choose Next.

Next, I need to configure the source and destination details. In the Source details section, I select my Amazon DynamoDB table. In the Destination details section, under AWS Glue Data Catalog, I specify the S3 bucket I set up.

To set up this integration, I need an IAM role that grants AWS Glue the necessary permissions. For instructions on configuring IAM permissions, see the Amazon DynamoDB documentation page. Also, if I haven’t configured resource policies for my AWS Glue Data Catalog, I can select Fix it for me to automatically add the required resource policies.
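As a sketch of what the console sets up for me, the integration role's trust policy must allow AWS Glue to assume it. The full permission policy (DynamoDB export, S3, and Glue Data Catalog access) is detailed in the documentation; this only shows the trust relationship shape:

```python
import json

def glue_trust_policy() -> dict:
    """Trust policy allowing the AWS Glue service to assume the
    integration role. The attached permission policies are separate."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "glue.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }

if __name__ == "__main__":
    print(json.dumps(glue_trust_policy(), indent=2))
```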

Here I have options to configure the output. Under Data partitioning, I can either use DynamoDB table keys for partitioning or specify custom partition keys. After the configuration is complete, I choose Next.

Because I selected the Fix it for me checkbox, I have to review the required changes and choose Continue before I can move on to the next step.

On the next page, I have the option to configure data encryption. I can use AWS Key Management Service (AWS KMS) or a custom encryption key. Then I give the integration a name and choose Next.

In the last step, I review the configurations. When I’m happy with them, I choose Create zero-ETL integration.

Once the initial data ingestion is complete, my zero-ETL integration will be ready to use. The completion time varies depending on the size of my source DynamoDB table.

If I navigate to Tables under Data Catalog in the left navigation pane, I can view more details, including the table schema. Under the hood, this zero-ETL integration uses Apache Iceberg to transform the data format and structure of my DynamoDB data in Amazon S3.

Finally, I can see that all my data is available in my S3 bucket.
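With the table registered in the Glue Data Catalog, one way to run analytics on it is Amazon Athena. A hedged sketch of a row-count query via boto3; the database, table, and results bucket names are placeholders:

```python
def athena_query_params(database: str, table: str, output_s3: str) -> dict:
    """Build StartQueryExecution parameters for a quick row count
    over the table created by the zero-ETL integration."""
    return {
        "QueryString": f'SELECT COUNT(*) FROM "{database}"."{table}"',
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3},
    }

def run_query(params: dict):
    """Submit the query; needs boto3 installed and AWS credentials configured."""
    import boto3
    return boto3.client("athena").start_query_execution(**params)

if __name__ == "__main__":
    # Hypothetical database, table, and results-bucket names.
    print(athena_query_params("my_lakehouse_db", "orders", "s3://my-athena-results/"))
```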

This zero-ETL integration greatly reduces the complexity and operational burden of moving data, allowing me to focus on gaining insights rather than managing pipelines.

Available now
This new zero-ETL capability is available in the following AWS Regions: US East (N. Virginia, Ohio), US West (Oregon), Asia Pacific (Hong Kong, Singapore, Sydney, Tokyo), and Europe (Frankfurt, Ireland, Stockholm).

Explore how to streamline data analytics workflows by integrating Amazon DynamoDB zero-ETL with Amazon SageMaker Lakehouse. For more information on how to get started, see the Amazon DynamoDB documentation page.

Happy building!
— Donnie
