Data Lake Houses

#aws

I am writing about the emerging connections between cloud data warehouses and cloud data lakes. This combination is called the 'data lake house' pattern (concept using AWS services shown below).

Over the years I've built many AWS Redshift data warehouses and a couple of AWS S3 data lakes. What about you? What types of AWS solutions are you building for data analytics lately?

I am particularly interested in hearing from those of you who are using AWS Lake Formation. It 'sits on top of' AWS S3, adding federated security and other key services to an AWS data lake. When I built data lakes, AWS Lake Formation wasn't yet available, so my teams had to build these control services manually.
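
To make the 'sits on top of' idea a bit more concrete, here is a minimal sketch of a table-level grant of the sort Lake Formation manages centrally. It assumes the boto3 `lakeformation` client and its `grant_permissions` call. The helper names, role ARN, database, and table are all hypothetical, and this is not how my teams built things before Lake Formation existed.

```python
from typing import Any, Dict, List


def build_table_grant(principal_arn: str, database: str, table: str,
                      permissions: List[str]) -> Dict[str, Any]:
    """Build keyword arguments for a Lake Formation table-level grant.

    The dictionary shape follows the GrantPermissions request:
    a principal, a resource, and a list of permissions.
    """
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": list(permissions),
    }


def apply_grant(client: Any, request: Dict[str, Any]) -> Any:
    """Send the grant using a client that exposes grant_permissions.

    In real use the client would be boto3.client("lakeformation");
    it is passed in so the builder can be tried without AWS access.
    """
    return client.grant_permissions(**request)


# Hypothetical principal and catalog names, for illustration only.
example_request = build_table_grant(
    "arn:aws:iam::111122223333:role/AnalystRole",
    "sales_lake",
    "orders",
    ["SELECT"],
)
```

With credentials configured, something like `apply_grant(boto3.client("lakeformation"), example_request)` would submit the grant. Keeping the request builder separate from the call makes it easy to review or log permissions before they are applied.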

I've also been looking at [AWS Glue DataBrew](https://aws.amazon.com/glue/features/databrew/). The data profiling looks to be very useful (shown below).

I am very curious about DataBrew's scalability and cost in production vs. more traditional ETL and/or Hadoop/Spark batch transform methods.

I'll be posting on this topic over the next few months. If this is of interest to you, follow me on Twitter as well. Let's connect!

Top comments (3)

Andrew Brown 🇨🇦 AWS Heroes • Apr 26 '21

Looking forward to hearing more. My knowledge of data warehousing is limited to certification knowledge until I have more real-world use-cases I need to put into practice.

Top2World • Aug 31 '21

Nice post :)
You can also check out Top 10 Most Expensive Houses in the World if you are interested.

Dendi Handian • Sep 6 '22

Gonna read this later