

Data Track Highlights from AWS re:Invent 2020

What a brilliant re:Invent 2020, with so many AWSome announcements to help builders innovate and move at pace. I am Yogesh Sharma, an AWS Community Builder, and here I highlight the most important announcements in the area of data.

Let’s look at the highlights:

Glue

Glue entered the serverless world

DataBrew
DataBrew is a visual data preparation tool that makes data prep easier, tackling the challenge that data scientists spend the majority of their time preparing data. Sounds great, doesn't it?

Elastic Views (Preview)
Builders can use SQL to define materialized views over source data in RDS, Aurora, and DynamoDB, and Elastic Views continuously keeps those views up to date in a variety of destinations, including Redshift, S3, and Elasticsearch Service.
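Elastic Views handles all of this for you, but a small mental model helps. The plain-Python sketch below shows what "continuously maintaining a materialized view" means: instead of re-running the query, each source change is applied incrementally to the target. The event shape (loosely modeled on DynamoDB Streams' INSERT/MODIFY/REMOVE events), the `orders_view` definition, and the dict standing in for the target store are all hypothetical; this is not how Elastic Views is implemented or configured.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Iterable, Optional

Row = Dict[str, Any]


@dataclass
class ChangeEvent:
    # Hypothetical event shape, loosely modeled on DynamoDB Streams:
    # event_name is "INSERT", "MODIFY" or "REMOVE".
    event_name: str
    key: str
    new_image: Optional[Row] = None


@dataclass
class MaterializedView:
    # Projection + filter playing the role of the view's SQL definition;
    # returns None when a source row should not appear in the view.
    project: Callable[[Row], Optional[Row]]
    # Stand-in for the target store (e.g. a Redshift table or an S3 prefix).
    target: Dict[str, Row] = field(default_factory=dict)

    def apply(self, events: Iterable[ChangeEvent]) -> None:
        """Incrementally apply source changes instead of recomputing the view."""
        for ev in events:
            if ev.event_name == "REMOVE":
                self.target.pop(ev.key, None)
                continue
            row = self.project(ev.new_image or {})
            if row is None:
                # Row no longer matches the view's filter: drop it from the target.
                self.target.pop(ev.key, None)
            else:
                self.target[ev.key] = row


def orders_view(item: Row) -> Optional[Row]:
    # Hypothetical view, roughly: SELECT order_id, total FROM orders WHERE status = 'SHIPPED'
    if item.get("status") != "SHIPPED":
        return None
    return {"order_id": item["order_id"], "total": item["total"]}


view = MaterializedView(project=orders_view)
view.apply([
    ChangeEvent("INSERT", "o1", {"order_id": "o1", "status": "SHIPPED", "total": 40}),
    ChangeEvent("INSERT", "o2", {"order_id": "o2", "status": "PENDING", "total": 15}),
    ChangeEvent("MODIFY", "o2", {"order_id": "o2", "status": "SHIPPED", "total": 15}),
    ChangeEvent("REMOVE", "o1"),
])
print(view.target)  # {'o2': {'order_id': 'o2', 'total': 15}}
```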

Schema Registry
It enables you to validate and control the evolution of streaming data using registered Apache Avro schemas, at no additional charge. It provides seamless integration with MSK, Kinesis, and Lambda.
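To make "controlling schema evolution" concrete, here is a simplified backward-compatibility check in the spirit of Avro's schema-resolution rules: a consumer reading with the new schema must still understand records written with the old one, so any field added in the new schema needs a default. Schemas are plain dicts shaped like Avro record schemas. This is an illustrative sketch, not the Schema Registry's API, and it deliberately ignores type promotion, unions, nested records, and aliases.

```python
from typing import Any, Dict, List

Schema = Dict[str, Any]  # Avro-style record schema: {"type": "record", "fields": [...]}


def backward_compatible(old: Schema, new: Schema) -> List[str]:
    """Return a list of problems; an empty list means readers on `new` can read `old` data.

    Simplified rules (a subset of Avro schema resolution):
    - a field present in both schemas must keep the same type;
    - a field only in `new` must declare a default;
    - a field only in `old` is fine (the reader simply ignores it).
    """
    old_fields = {f["name"]: f for f in old["fields"]}
    problems = []
    for f in new["fields"]:
        name = f["name"]
        if name in old_fields:
            if old_fields[name]["type"] != f["type"]:
                problems.append(f"type of '{name}' changed")
        elif "default" not in f:
            problems.append(f"new field '{name}' has no default")
    return problems


v1 = {"type": "record", "name": "Click", "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
]}
# Adds an optional field with a default: old records can still be read.
v2_ok = {"type": "record", "name": "Click", "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "url", "type": "string"},
    {"name": "referrer", "type": ["null", "string"], "default": None},
]}
# Changes a type and adds a required field: old records can no longer be read.
v2_bad = {"type": "record", "name": "Click", "fields": [
    {"name": "user_id", "type": "long"},
    {"name": "session", "type": "string"},
]}

print(backward_compatible(v1, v2_ok))   # []
print(backward_compatible(v1, v2_bad))  # ["type of 'user_id' changed", "new field 'session' has no default"]
```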

Lake Formation


Lake Formation is a service that helps builders create a data lake on AWS following best practices, with the AWS Well-Architected principles in mind. It helps customers build secure data lakes in the cloud in days instead of months. Lake Formation collects and catalogs data from databases and object storage, moves the data into an Amazon S3 data lake, cleans and classifies data using ML algorithms, and secures access to sensitive data.

AWS Lake Formation Transactions, Row-Level Security, and Acceleration (Preview)
This preview adds transactions, row-level security, and acceleration features to the data lake.
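Row-level security boils down to two users querying the same table and seeing different rows. The sketch below models only that idea, using a hypothetical `ROW_FILTERS` mapping from principal to row predicate and a deny-by-default rule for unknown principals; it is not Lake Formation's API.

```python
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]
Predicate = Callable[[Row], bool]

# Hypothetical per-principal row filters, e.g. "EU analysts see EU rows only".
ROW_FILTERS: Dict[str, Predicate] = {
    "eu_analyst": lambda r: r["region"] == "EU",
    "us_analyst": lambda r: r["region"] == "US",
    "admin": lambda r: True,
}


def query(table: List[Row], principal: str) -> List[Row]:
    """Return only the rows this principal may see; unknown principals see nothing."""
    allowed = ROW_FILTERS.get(principal, lambda r: False)
    return [row for row in table if allowed(row)]


sales = [
    {"id": 1, "region": "EU", "amount": 120},
    {"id": 2, "region": "US", "amount": 80},
    {"id": 3, "region": "EU", "amount": 45},
]

print([r["id"] for r in query(sales, "eu_analyst")])  # [1, 3]
print([r["id"] for r in query(sales, "intern")])      # []
```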

Amazon HealthLake to make sense of health data
Amazon HealthLake removes the heavy lifting of organizing, indexing, and structuring patient information, to provide a complete view of the health of individual patients and entire patient populations in a secure, compliant, and auditable manner.

Redshift


Automatic Table Optimization
A self-tuning capability that helps you achieve the performance benefits of sort and distribution keys without manual effort.
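Automatic Table Optimization picks these keys for you, but it helps to know what a distribution key does. With KEY distribution, Redshift places rows with the same value in the distribution column on the same slice, so when two tables are distributed on their join column the join can run locally on each slice. The sketch below imitates this with a hypothetical 4-slice cluster and `zlib.crc32` as the hash; Redshift's real hash function and slice count differ.

```python
import zlib
from collections import defaultdict
from typing import Any, Dict, List

Row = Dict[str, Any]
NUM_SLICES = 4  # hypothetical; real clusters have a node-type-dependent number of slices


def slice_for(key_value: Any) -> int:
    """Map a distribution-key value to a slice with a deterministic hash (crc32 here)."""
    return zlib.crc32(str(key_value).encode("utf-8")) % NUM_SLICES


def distribute(rows: List[Row], dist_key: str) -> Dict[int, List[Row]]:
    """KEY-style distribution: rows with equal dist_key values land on the same slice."""
    slices: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        slices[slice_for(row[dist_key])].append(row)
    return dict(slices)


def join_is_colocated(left: Dict[int, List[Row]], right: Dict[int, List[Row]], key: str) -> bool:
    """True if every right-side row finds its matching left-side row on the same slice."""
    for slice_id, rows in right.items():
        local_keys = {r[key] for r in left.get(slice_id, [])}
        if any(r[key] not in local_keys for r in rows):
            return False
    return True


customers = [{"customer_id": c, "name": n} for c, n in [(1, "Ana"), (2, "Bo"), (3, "Cy")]]
orders = [{"order_id": o, "customer_id": c} for o, c in [(10, 1), (11, 1), (12, 3), (13, 2)]]

# Both tables use customer_id as the distribution key, so the join on it is slice-local.
cust_slices = distribute(customers, "customer_id")
order_slices = distribute(orders, "customer_id")
print(join_is_colocated(cust_slices, order_slices, "customer_id"))  # True
```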

Amazon Redshift data sharing (preview)
With Amazon Redshift data sharing, you can rapidly onboard new analytics workloads and provision them with flexible compute resources to meet their workload-specific performance SLAs while allowing them to access common data.

AQUA (Advanced Query Accelerator) for Redshift (Preview)
AQUA is a high-speed cache on top of Redshift Managed Storage that can scale out and process data in parallel across many AQUA nodes. AQUA uses AWS-designed analytics processors that dramatically accelerate data compression, encryption, and data processing on queries that scan, filter, and aggregate large data sets.
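The parallel part is easier to picture with a toy model: each node scans, filters, and pre-aggregates its own partition, and only the small partial results are merged at the end. In the sketch below, threads stand in for hypothetical AQUA nodes and `node_scan` is an illustrative name; it shows the general pushdown pattern, not AQUA's hardware or interface.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (region, amount)


def node_scan(partition: List[Row], min_amount: int) -> Dict[str, int]:
    """Work done 'near the data': scan, filter, and pre-aggregate one partition."""
    partial: Dict[str, int] = {}
    for region, amount in partition:
        if amount >= min_amount:  # filter pushed down to the node
            partial[region] = partial.get(region, 0) + amount
    return partial


def parallel_sum_by_region(partitions: List[List[Row]], min_amount: int) -> Dict[str, int]:
    """Fan out to every partition in parallel, then merge the small partial aggregates."""
    with ThreadPoolExecutor(max_workers=max(1, len(partitions))) as pool:
        partials = list(pool.map(node_scan, partitions, [min_amount] * len(partitions)))
    total: Dict[str, int] = {}
    for partial in partials:
        for region, subtotal in partial.items():
            total[region] = total.get(region, 0) + subtotal
    return total


partitions = [
    [("EU", 100), ("US", 5), ("EU", 30)],
    [("US", 70), ("APAC", 10)],
    [("EU", 1), ("APAC", 90)],
]
print(parallel_sum_by_region(partitions, min_amount=10))  # {'EU': 130, 'US': 70, 'APAC': 100}
```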

Amazon Redshift ML (Preview)
With Amazon Redshift ML powered by Amazon SageMaker, you can use SQL statements to create and train machine learning models from your data in Amazon Redshift and then use these models directly in your queries and reports. Amazon Redshift ML automatically discovers and tunes the best model based on the training data using Amazon SageMaker Autopilot.

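Native JSON support (Preview)
A new SUPER data type lets you store and query JSON and other semi-structured data natively in Redshift using PartiQL.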

EMR

Amazon EMR is the industry-leading cloud big data platform for processing vast amounts of data using open source tools such as Apache Spark, Apache Hive, Apache HBase, Apache Flink, Apache Hudi, and Presto.

EMR Studio
An integrated development environment (IDE) that makes it easy for builders and data scientists to develop, visualize, and debug data engineering and data science applications.

EMR on EKS
With a few clicks in the Amazon EMR console, you can choose a big data framework and deploy an EMR workload to Amazon EKS. EMR automatically packages the workload into a container, and provides pre-built connectors for integrating with other AWS services. EMR then deploys the container on the EKS cluster, and manages scaling, logging, and monitoring of that workload.

Graviton2 instances
Running EMR on Graviton2 instances offers up to 30% lower cost and up to 15% better performance.

Amazon Managed Workflows for Apache Airflow (MWAA)

Developers and data engineers use Apache Airflow to manage workflows as scripts, monitor them via the user interface (UI), and extend their functionality through a set of powerful plugins. However, to use Apache Airflow, they need to install, maintain, and scale it manually. Now AWS solves this by offering MWAA for developers and data engineers to build and manage their workflows in the cloud without worrying about managing and scaling their Airflow platform's infrastructure.
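"Workflows as scripts" means a pipeline is ordinary code: a set of tasks plus the dependencies between them, run in dependency order. Airflow expresses this with DAG files, which MWAA picks up from an S3 bucket. To keep the example self-contained, the sketch below models only the core idea with the standard library's `graphlib` (Python 3.9+); the task names are hypothetical and this is not Airflow's API.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, List, Set


# Hypothetical tasks for a small daily pipeline.
def extract() -> str:
    return "extract: pulled raw files"


def transform() -> str:
    return "transform: cleaned records"


def load() -> str:
    return "load: wrote to warehouse"


def report() -> str:
    return "report: refreshed dashboard"


TASKS: Dict[str, Callable[[], str]] = {
    "extract": extract, "transform": transform, "load": load, "report": report,
}

# Each task maps to the set of upstream tasks it depends on.
DEPENDENCIES: Dict[str, Set[str]] = {
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def run_workflow(tasks: Dict[str, Callable[[], str]], deps: Dict[str, Set[str]]) -> List[str]:
    """Run every task once, only after all of its upstream tasks have finished."""
    sorter = TopologicalSorter({name: deps.get(name, set()) for name in tasks})
    log = []
    for name in sorter.static_order():  # raises graphlib.CycleError on circular dependencies
        log.append(tasks[name]())
    return log


for line in run_workflow(TASKS, DEPENDENCIES):
    print(line)
```

Because the workflow is just code, it can be versioned, reviewed, and tested like any other script, which is the property Airflow (and therefore MWAA) builds on.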

Keep learning and shining!
