Amazon SageMaker Lakehouse’s integrated access control is now available in Amazon Athena federated queries | Amazon Web Services

Today, we announced the next generation of Amazon SageMaker, a unified platform for data, analytics, and artificial intelligence that combines the widely adopted AWS machine learning and analytics capabilities. At its heart is SageMaker Unified Studio (preview), a single data and AI development environment for exploration, data preparation and integration, big data processing, rapid SQL analysis, model development and training, and generative AI application development. This announcement includes Amazon SageMaker Lakehouse, a feature that unifies data across data lakes and data warehouses to help you build powerful analytics and artificial intelligence and machine learning (AI/ML) applications on a single copy of data.

In addition to these launches, I’m excited to announce data catalog and permissions capabilities in Amazon SageMaker Lakehouse to help you centrally connect to, discover, and manage permissions to data sources.

Organizations today store data across multiple systems to optimize for specific use cases and scaling requirements. This often results in data being stored across data lakes, data warehouses, databases and streaming services. Analysts and data scientists face challenges when trying to connect and analyze data from these disparate sources. They must set up specialized connectors for each data source, manage multiple access policies, and often resort to copying data, resulting in increased costs and potential data inconsistencies.

The new feature addresses these challenges by simplifying the process of connecting to popular data sources, cataloging them, applying permissions, and making the data available for analysis through SageMaker Lakehouse and Amazon Athena. You can use AWS Glue Data Catalog as a single metadata repository for all data sources, regardless of location. This provides a centralized view of all available data.

Connections to data sources are created once and can be reused, so you don’t have to set up connections repeatedly. When you connect to a data source, its databases and tables are automatically cataloged and registered with AWS Lake Formation. After cataloging, you grant access to these databases and tables to data analysts, so they don’t have to go through separate steps to connect to each data source and don’t need to know the data source’s built-in secrets.

Lake Formation permissions can be used to define fine-grained access control (FGAC) policies across data lakes, data warehouses, and online transaction processing (OLTP) data sources, ensuring consistent enforcement when querying with Athena. Data remains in its original location, eliminating the need for costly and time-consuming data transfers or duplication. In the data catalog, you can create new connections or reuse existing ones and configure built-in connectors for multiple data sources, including Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Aurora, Amazon DynamoDB (preview), Google BigQuery, and more.
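To make the idea of a fine-grained policy concrete, here is a minimal sketch of a column-level grant expressed through the boto3 Lake Formation client. It is an illustration under assumptions, not the setup used in this post: the role ARN, database, table, and column names are hypothetical, and a federated catalog may also require a catalog identifier that is omitted here. The request is built as a plain dictionary so you can inspect it before sending anything to AWS.

```python
from typing import Any, Dict, List


def build_column_grant(principal_arn: str, database: str, table: str,
                       columns: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for the Lake Formation grant_permissions call.

    The grant gives SELECT on the listed columns only, so any column that is
    not listed stays unreadable for this principal.
    """
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }


def grant_columns(request: Dict[str, Any]) -> None:
    """Send the grant to Lake Formation (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works offline

    boto3.client("lakeformation").grant_permissions(**request)


# Hypothetical principal, database, table, and columns, for illustration only.
request = build_column_grant(
    principal_arn="arn:aws:iam::111122223333:role/marketing-analyst",
    database="default",
    table="customer",
    columns=["zipcode", "cust_id"],
)
print(request)
```

Calling `grant_columns(request)` would submit the grant. Keeping the builder separate from the call makes it easy to review or log the policy before it takes effect.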

Getting started with the integration between Athena and Lake Formation
To demonstrate this capability, I use a preconfigured environment that includes Amazon DynamoDB as a data source. The environment is set up with tables and data suited to showing the capability. I’m using the SageMaker Unified Studio (preview) interface for this demonstration.

To start, I go to SageMaker Unified Studio (preview) through the Amazon SageMaker domain. Here you can create and manage projects that serve as shared workspaces. These projects allow team members to collaborate, work with data, and develop ML models together. Creating a project automatically sets up AWS Glue Data Catalog databases, creates a catalog for Redshift Managed Storage (RMS) data, and provides the necessary permissions.

To manage projects, you can either view the full list of existing projects by selecting Browse all projects, or create a new project by selecting Create a project. I use two existing projects: sales-group, where administrators have full access rights to all data, and marketing-project, where analysts work with limited access to data. This setup illustrates the contrast between administrator-level access and restricted user access.

In this step, I set up a federated catalog for the target data source, which is Amazon DynamoDB. I go to Data in the left navigation bar and choose the + (plus) sign to Add data. I choose Add connection and then choose Next.

I choose Amazon DynamoDB and choose Next.

I enter the details and choose Add data. I now have a federated Amazon DynamoDB catalog created in SageMaker Lakehouse. This is where the administrator grants you access using resource policies. I have already configured resource policies in this environment. Now I’ll show you how fine-grained access control works in SageMaker Unified Studio (preview).

I start by selecting the sales-group project, where administrators maintain and have full access to customer data. This dataset contains fields such as zip codes, customer IDs, and phone numbers. To analyze this data, I can run queries using Query with Athena.

When I choose Query with Athena, the query editor starts automatically and provides a workspace where I can compose and run SQL queries against the lakehouse. This integrated query environment lets me explore and analyze data without leaving the project.
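The same kind of query can also be submitted outside the query editor. The following is a rough sketch using the boto3 Athena client, under several assumptions: the catalog, database, table, and workgroup names are hypothetical, and the workgroup is assumed to already have a query result location configured. The request is built separately from the call so it can be checked without AWS access.

```python
import time
from typing import Any, Dict


def build_athena_request(sql: str, catalog: str, database: str,
                         workgroup: str) -> Dict[str, Any]:
    """Build keyword arguments for the Athena start_query_execution call."""
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Catalog": catalog, "Database": database},
        "WorkGroup": workgroup,
    }


def run_query(request: Dict[str, Any], poll_seconds: float = 1.0) -> Dict[str, Any]:
    """Run the query and wait for it to finish (needs boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works offline

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(**request)["QueryExecutionId"]
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        status = execution["QueryExecution"]["Status"]
        if status["State"] in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if status["State"] != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason reported")
        raise RuntimeError(f"query {status['State']}: {reason}")
    return athena.get_query_results(QueryExecutionId=query_id)


# Hypothetical catalog, database, table, and workgroup names.
request = build_athena_request(
    sql="SELECT * FROM customer LIMIT 10",
    catalog="dynamodb_catalog",
    database="default",
    workgroup="primary",
)
print(request)
```

Because Lake Formation permissions are enforced at query time, the same request run under different roles can return different columns.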

In the second part, I demonstrate the analyst’s perspective by switching to the marketing-project environment. This helps verify that fine-grained access control permissions are properly implemented and restrict access to data as intended. Through sample queries, we can observe how analysts interact with data while being subject to the security controls in place.

Using the Query with Athena option, I run a SELECT statement on the table to verify access control. The results confirm that, as expected, I can view only the zip code and cust_id columns, while the phone column remains restricted based on the configured permissions.
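The effect of a column-level restriction can be pictured with a small local simulation. This sketch does not call AWS and does not reflect how Lake Formation is implemented. It only mimics the outcome described above, using made-up rows and an assumed `zipcode` spelling for the zip code column.

```python
from typing import Any, Dict, Iterable, List, Optional, Set

# Made-up rows for illustration; not data from the demo environment.
CUSTOMER_ROWS: List[Dict[str, Any]] = [
    {"zipcode": "98101", "cust_id": 1, "phone": "555-0100"},
    {"zipcode": "10001", "cust_id": 2, "phone": "555-0101"},
]

# Columns the analyst may read, matching the result described in the text.
ANALYST_COLUMNS: Set[str] = {"zipcode", "cust_id"}


def select(rows: List[Dict[str, Any]], allowed: Set[str],
           requested: Optional[Iterable[str]] = None) -> List[Dict[str, Any]]:
    """Emulate a SELECT under column-level permissions.

    requested=None behaves like SELECT *: only permitted columns come back.
    Naming a column outside the permitted set raises PermissionError in this
    simulation; the real service reports its own error.
    """
    if requested is None:
        columns = [c for c in rows[0] if c in allowed] if rows else []
    else:
        columns = list(requested)
        denied = [c for c in columns if c not in allowed]
        if denied:
            raise PermissionError(f"no access to column(s): {', '.join(denied)}")
    return [{c: row[c] for c in columns} for row in rows]


visible = select(CUSTOMER_ROWS, ANALYST_COLUMNS)
print(visible)
```

In this simulation the analyst never sees a `phone` key at all, which is the practical meaning of "restricted" here: the column is absent from the results rather than masked.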

With these new data catalog features and permissions in Amazon SageMaker Lakehouse, you can now streamline your data operations, improve security management, and accelerate AI/ML development while maintaining data integrity and compliance across your data ecosystem.

Now available
Data catalog and permissions in Amazon SageMaker Lakehouse simplify interactive analytics through federated queries by connecting multiple data sources to a unified catalog governed by Data Catalog permissions. This gives you a single place to define and enforce fine-grained security policies across data lakes, data warehouses, and OLTP data sources for high-performance queries.

You can use this capability in the following AWS Regions: US East (N. Virginia), US West (Oregon), US East (Ohio), Europe (Ireland), and Asia Pacific (Tokyo).

To get started with this new feature, visit the Amazon SageMaker Lakehouse documentation.

-Esra
