DEV Community

luminousmen


Data Lake vs Data Warehouse

For a long time, I did not understand the concepts of Data Lake and Data Warehouse. It seemed to me that they were the same thing: a data store where I could find the data I needed and process it for my purposes.

I wasn't wrong, but there is a difference.

A Data Warehouse supports the flow of data from operational systems to analytics and decision systems by creating a single repository of data from various sources (both internal and external). In most cases, a Data Warehouse is a relational database that stores processed data optimized for gathering business insights 🔍. It collects data with a predetermined structure and schema coming from transactional systems and business applications, and the data is typically used for operational reporting and analysis.

πΉπ‘œπ‘Ÿ 𝑒π‘₯π‘Žπ‘šπ‘π‘™π‘’, 𝑙𝑒𝑑’𝑠 π‘ π‘Žπ‘¦ π‘¦π‘œπ‘’ β„Žπ‘Žπ‘£π‘’ π‘Ž π‘Ÿπ‘’π‘€π‘Žπ‘Ÿπ‘‘π‘  π‘π‘Žπ‘Ÿπ‘‘ π‘€π‘–π‘‘β„Ž π‘Ž π‘”π‘Ÿπ‘œπ‘π‘’π‘Ÿπ‘¦ π‘β„Žπ‘Žπ‘–π‘›. π‘‡β„Žπ‘’ π‘‘π‘Žπ‘‘π‘Žπ‘π‘Žπ‘ π‘’ π‘šπ‘–π‘”β„Žπ‘‘ β„Žπ‘œπ‘™π‘‘ π‘¦π‘œπ‘’π‘Ÿ π‘šπ‘œπ‘ π‘‘ π‘Ÿπ‘’π‘π‘’π‘›π‘‘ π‘π‘’π‘Ÿπ‘β„Žπ‘Žπ‘ π‘’π‘ , π‘€π‘–π‘‘β„Ž π‘Ž π‘”π‘œπ‘Žπ‘™ π‘‘π‘œ π‘Žπ‘›π‘Žπ‘™π‘¦π‘§π‘’ π‘π‘’π‘Ÿπ‘Ÿπ‘’π‘›π‘‘ π‘ β„Žπ‘œπ‘π‘π‘’π‘Ÿ π‘‘π‘Ÿπ‘’π‘›π‘‘π‘ . π‘‡β„Žπ‘’ π‘‘π‘Žπ‘‘π‘Ž π‘€π‘Žπ‘Ÿπ‘’β„Žπ‘œπ‘’π‘ π‘’ π‘šπ‘–π‘”β„Žπ‘‘ β„Žπ‘œπ‘™π‘‘ π‘Ž π‘Ÿπ‘’π‘π‘œπ‘Ÿπ‘‘ π‘œπ‘“ π‘Žπ‘™π‘™ π‘œπ‘“ π‘‘β„Žπ‘’ π‘–π‘‘π‘’π‘šπ‘  π‘¦π‘œπ‘’β€™π‘£π‘’ π‘’π‘£π‘’π‘Ÿ π‘π‘œπ‘’π‘”β„Žπ‘‘ π‘Žπ‘›π‘‘ 𝑖𝑑 π‘€π‘œπ‘’π‘™π‘‘ 𝑏𝑒 π‘œπ‘π‘‘π‘–π‘šπ‘–π‘§π‘’π‘‘ π‘ π‘œ π‘‘β„Žπ‘Žπ‘‘ π‘‘π‘Žπ‘‘π‘Ž 𝑠𝑐𝑖𝑒𝑛𝑑𝑖𝑠𝑑𝑠 π‘π‘œπ‘’π‘™π‘‘ π‘šπ‘œπ‘Ÿπ‘’ π‘’π‘Žπ‘ π‘–π‘™π‘¦ π‘Žπ‘›π‘Žπ‘™π‘¦π‘§π‘’ π‘Žπ‘™π‘™ π‘œπ‘“ π‘‘β„Žπ‘Žπ‘‘ π‘‘π‘Žπ‘‘π‘Ž.

Although data warehouses can handle unstructured data, they don't do so in the most efficient manner. With so much data out there 📈, it can get expensive to store all of your data in a database or a data warehouse. Also, data that goes into a data warehouse needs to be processed before it gets stored; with today's massive amount of unstructured data, that could take significant time and resources. In response, businesses started maintaining Data Lakes, which store all of an enterprise's structured and unstructured data at scale in a cost-effective manner. Data Lakes store raw data, and can be set up without first having to define the data structure and schema.

Data Lakes allow users to run analytics without having to move the data to a separate analytics system.
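By contrast, a lake keeps records as they arrive and leaves the structure to whoever reads them. The sketch below imitates that with a temporary directory of JSON Lines files; the event types, field names, and the `read_purchases` helper are hypothetical, chosen only to show a schema being applied at read time and the analysis running directly on the stored files.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path

# Hypothetical raw events; records do not need to share a structure.
raw_events = [
    {"type": "purchase", "customer_id": 1, "item": "milk", "price": 1.20},
    {"type": "page_view", "url": "/offers", "customer_id": 2},
    {"type": "purchase", "customer_id": 2, "item": "bread", "price": 2.50},
    {"type": "support_chat", "text": "Where is my rewards card?"},
]

# The "lake": a directory of files written as-is, with no schema declared up front.
lake = Path(tempfile.mkdtemp())
(lake / "events.jsonl").write_text("\n".join(json.dumps(e) for e in raw_events))


def read_purchases(lake_dir):
    """Apply a schema at read time: keep only records that look like purchases."""
    for path in lake_dir.glob("*.jsonl"):
        for line in path.read_text().splitlines():
            record = json.loads(line)
            if record.get("type") == "purchase" and "price" in record:
                yield record


# The analysis runs against the raw files where they are stored.
items_sold = Counter(p["item"] for p in read_purchases(lake))
print(items_sold)
```

Writing is cheap here because nothing is validated, but every reader has to decide what a valid record looks like, which is the trade-off against the warehouse approach.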

AWS post on the topic

Photo by Tom Gainor on Unsplash


Thank you for reading!

Any questions? Leave your comment below to start fantastic discussions!

Check out my blog or come to say hi 👋 on Twitter or subscribe to my Telegram channel.
Plan your best!
