Sabiha Ali

Posted on • Originally published at Medium

AWS Data Lake

If your organization wants to generate business value from its data, it will need to run some kind of analytics, such as machine learning, over sources like log files, clickstreams, social media, and Internet-connected devices. Traditional data storage and analytics tools often cannot provide the agility and flexibility required to deliver relevant business insights. That’s why many organizations are shifting to a data lake architecture.

Let us assume your organization receives data from many sources. For example, you get some data from on-premises systems, some from clicks on your website, and some as CSV files to analyze, and you also want to run machine learning analysis on part of this data.

All of this data needs to be stored for analysis. Some of it can be purged after a while, and some must be retained for a longer period of time.
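The purge-versus-retain split can be written down as a retention policy. The sketch below assumes the data lands in Amazon S3 (the foundation discussed later in this post) and builds a dictionary in the shape of an S3 lifecycle configuration. The prefixes, rule IDs, and day counts are hypothetical values chosen only for illustration.

```python
import json


def build_lifecycle_rules(purge_prefix, purge_after_days,
                          retain_prefix, archive_after_days):
    """Return a dict shaped like an S3 lifecycle configuration.

    All prefixes, IDs, and day counts are illustrative assumptions.
    """
    return {
        "Rules": [
            {
                # Short-lived data: delete objects after a fixed number of days.
                "ID": "purge-short-lived-data",
                "Filter": {"Prefix": purge_prefix},
                "Status": "Enabled",
                "Expiration": {"Days": purge_after_days},
            },
            {
                # Long-lived data: keep it, but move it to colder storage.
                "ID": "archive-long-lived-data",
                "Filter": {"Prefix": retain_prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": archive_after_days, "StorageClass": "GLACIER"}
                ],
            },
        ]
    }


# Hypothetical layout: clickstream data is purged, finance data is archived.
config = build_lifecycle_rules("raw/clickstream/", 90, "raw/finance/", 365)
print(json.dumps(config, indent=2))
```

With boto3 installed and credentials configured, a dictionary like this could be passed as the `LifecycleConfiguration` argument of the S3 client's `put_bucket_lifecycle_configuration` call. The sketch stops short of that so it runs without an AWS account.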

A data lake is a centralized repository that allows you to store all your structured and unstructured data at any scale.

Is this the same as a data warehouse? That is a good question, but the answer is no. A data warehouse is a database optimized to analyze relational data coming from line-of-business applications. Its structure and schema are defined in advance and optimized for fast SQL queries. A data lake, by contrast, stores both relational and non-relational data, and the schema is not defined when the data is captured; it is applied later, when the data is read for analysis.
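To make the difference concrete, here is a minimal sketch of applying a schema at read time, using only the Python standard library. The record fields (`user`, `page`, `ms`, `device`, `temp_c`) are made up for illustration. The raw lines are stored as they arrive, and a schema is imposed only when a particular analysis reads them.

```python
import json

# Raw records land in the lake exactly as they arrive; nothing is enforced.
raw_lines = [
    '{"user": "a1", "page": "/home", "ms": "120"}',
    '{"user": "b2", "page": "/cart"}',
    '{"device": "sensor-7", "temp_c": 21.5}',
]


def read_clicks(lines):
    """Apply a click-event schema at read time, skipping records that don't fit."""
    for line in lines:
        record = json.loads(line)
        # Records without the required fields belong to some other dataset.
        if "user" not in record or "page" not in record:
            continue
        yield {
            "user": str(record["user"]),
            "page": str(record["page"]),
            # Optional field: cast to int when present, otherwise leave empty.
            "ms": int(record["ms"]) if "ms" in record else None,
        }


clicks = list(read_clicks(raw_lines))
print(clicks)
```

A warehouse would typically reject or transform the sensor record at load time. Here it simply stays in storage, available to a different reader with a different schema.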

On AWS, Amazon S3 serves as the data lake foundation. Because S3 is a managed service, this also eliminates server management for the storage layer.

When we build a data lake on Amazon S3, we can integrate it with other native AWS services to run:

· big data analytics

· artificial intelligence

· machine learning

· high performance computing

· media processing applications, and more

To gain insights from unstructured data, we can couple S3 with services like AWS Lake Formation and AWS Glue, which simplify both the creation of the data lake itself and the analysis of the curated data in it.

Services like AWS Glue, Amazon EMR, and Amazon Athena also make it easy to query your data lake directly.
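As a rough illustration of querying the lake directly, the sketch below assembles the parameters that an Amazon Athena query request takes. The database name, table name, and results bucket are hypothetical, and the code only builds the request so that it runs without AWS credentials.

```python
def build_athena_request(sql, database, output_location):
    """Assemble keyword arguments for an Athena query request.

    The parameter names follow Athena's StartQueryExecution API; the values
    passed in below are illustrative assumptions.
    """
    return {
        "QueryString": sql,
        "QueryExecutionContext": {"Database": database},
        # Athena writes query results to an S3 location of your choosing.
        "ResultConfiguration": {"OutputLocation": output_location},
    }


request = build_athena_request(
    "SELECT page, COUNT(*) AS views FROM clicks GROUP BY page",
    database="example_lake_db",                       # hypothetical database
    output_location="s3://example-athena-results/",   # hypothetical bucket
)

# With boto3 installed and AWS credentials configured, this could be sent with:
#   import boto3
#   boto3.client("athena").start_query_execution(**request)
print(request)
```

The query runs against files already sitting in S3, so there is no separate loading step before analysis.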

Happy Learning!

By

Sabiha Ali, Solutions Architect, ScaleCapacity
