DEV Community

Beatrice Akaeme for AWS Community Builders


SERVERLESS DATA LAKE ON AWS

**WHAT IS A DATA LAKE?**

A data lake is a storage system where data is kept in its native, raw form. It allows organizations to store all of their structured and unstructured data at any scale. Because a data lake accepts both structured and unstructured data, you can collect and store your data as it is, without first having to structure it, and then run different types of analytics on it. Data lakes exist because we all use data in one form or another: we have systems of record, streaming data, batch data, and internal and external data. It is the combination of these different kinds of data sources that gives us powerful insights into what our users are doing and how the world around us works, and that helps us build more intelligent applications.
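To make "store it the way it is" concrete, here is a minimal Python sketch of schema-on-read: raw records are kept exactly as they arrived (JSON lines held in a string that stands in for a file in the lake), and a schema is applied only when a question is asked. The record fields and the `read_with_schema` helper are hypothetical, chosen only for illustration.

```python
import json

# Raw events stored as-is: heterogeneous JSON lines, no schema enforced on write.
RAW_EVENTS = """\
{"user": "ana", "action": "click", "ts": "2023-01-05T10:00:00Z"}
{"user": "ben", "action": "purchase", "amount": "19.99", "ts": "2023-01-05T10:02:00Z"}
{"user": "ana", "action": "purchase", "amount": "5.01", "device": "mobile"}
"""


def read_with_schema(raw: str, schema: dict) -> list[dict]:
    """Apply a schema at read time: keep only the requested fields and cast them.

    Fields missing from a record become None instead of being rejected on write.
    """
    rows = []
    for line in raw.splitlines():
        if not line.strip():
            continue
        record = json.loads(line)
        row = {}
        for field, cast in schema.items():
            value = record.get(field)
            row[field] = cast(value) if value is not None else None
        rows.append(row)
    return rows


# One analysis only cares about purchases and their amounts.
purchases = [
    r
    for r in read_with_schema(RAW_EVENTS, {"user": str, "action": str, "amount": float})
    if r["action"] == "purchase"
]
total = sum(r["amount"] for r in purchases)
print(round(total, 2))  # 25.0
```

The same raw data could later be read with a different schema (for example `{"device": str}`) without rewriting anything already stored.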

**WHY A DATA LAKE?**
AGILITY: A data lake gives a company agility, meaning the ability to move fast and access data quickly.
It provides a repository where consumers can quickly find the data they need and use it in their business projects.
A data lake lets you break down data silos and combine different types of analytics to gain insights and guide better business decisions.

ORGANIZATION OF DATA: A data lake helps organize, manage, and store all categories of data, especially in large corporations that work with thousands of datasets.
Such corporations don't have to worry about their data being scattered all over the place.

**PROCESSES INVOLVED IN BUILDING A DATA LAKE**
DATA INGESTION: Collecting data from various sources, such as systems of record, streaming data, and batch data, and bringing it into the lake.

DATA STORAGE: Amazon S3 is used to store the data securely and durably at low cost.

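DATA CATALOGUE: A catalogue holds metadata about all the data in your organization: which datasets exist, where they are stored, and what their schemas look like. It makes your data easier to find. On AWS, the AWS Glue Data Catalog fills this role.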

DATA DISCOVERY/SEARCHING: Data discovery tools let users search for, collect, and evaluate data across several sources.
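The four steps above can be sketched end to end in a few lines of Python. On AWS these roles would typically be played by Amazon S3 (storage) and the AWS Glue Data Catalog (catalogue), but this sketch uses plain dictionaries as stand-ins and calls no AWS API. The date-partitioned key layout (`year=/month=/day=`) is one common convention, and the names `ingest`, `register`, and `discover` are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CatalogEntry:
    """Metadata about a dataset: where it lives and what it looks like (not the data itself)."""
    name: str
    location: str
    columns: list[str]
    tags: set[str] = field(default_factory=set)


bucket: dict[str, bytes] = {}           # hypothetical stand-in for an S3 bucket: key -> body
catalog: dict[str, CatalogEntry] = {}   # hypothetical stand-in for a data catalogue


def ingest(source: str, day: date, filename: str, body: bytes) -> str:
    """Ingestion + storage: land raw data under a date-partitioned key prefix."""
    key = f"raw/{source}/year={day:%Y}/month={day:%m}/day={day:%d}/{filename}"
    bucket[key] = body
    return key


def register(name: str, prefix: str, columns: list[str], tags: set[str]) -> None:
    """Catalogue: record a dataset's location and schema."""
    catalog[name] = CatalogEntry(name, prefix, columns, tags)


def discover(term: str) -> list[str]:
    """Discovery: search catalogue metadata by dataset name, column name, or tag."""
    term = term.lower()
    return sorted(
        e.name
        for e in catalog.values()
        if term in e.name.lower()
        or any(term in c.lower() for c in e.columns)
        or term in {t.lower() for t in e.tags}
    )


key = ingest("web", date(2023, 1, 5), "clicks.json", b'{"user": "ana"}')
print(key)  # raw/web/year=2023/month=01/day=05/clicks.json
register("web_clicks", "raw/web/", ["user", "action", "ts"], {"marketing"})
register("orders", "raw/erp/", ["order_id", "user", "amount"], {"finance"})
print(discover("user"))     # ['orders', 'web_clicks']
print(discover("finance"))  # ['orders']
```

Keeping the catalogue separate from the data is what lets users find datasets by name, column, or tag without scanning the stored objects themselves.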

**DATA LAKE ON AWS**

Many organizations prefer to build their data lakes on a public cloud such as Amazon Web Services (AWS), for the following reasons:

Pay As You Go: When businesses use the public cloud, they pay only for the time and capacity they actually use. Because AWS operates at a very large scale, it can offer equivalent resources at a much lower cost than a privately owned business could achieve by buying small quantities of hardware itself. Certain features, such as database-, table-, column-, and tag-based access controls and cross-account sharing, are provided by AWS Lake Formation at no additional charge.

Upfront capital investment: With AWS, no upfront investment in hardware is required. For services that offer reservations, such as Reserved Instances, AWS provides three payment options (No Upfront, Partial Upfront, and All Upfront) that you can choose from based on your usage.

Fast Development: The tools needed for development are just a click away in the cloud, so you can build and deploy your data lake faster on AWS.

Great tools for your Data Lake: With AWS Lake Formation, setting up a data lake in the cloud becomes much simpler.

AWS Security: AWS provides secure infrastructure for your projects and access to one of the most complete platforms for big data.

Higher availability on AWS: You can build and deploy your data lake across multiple AWS Availability Zones (separate data centers), which makes it highly available and allows quicker disaster recovery.

Faster scalability: AWS offers highly scalable storage for a data lake, such as Amazon S3, which lets you cost-effectively build and scale data lakes of any size in a secure environment.

Reliable and Secure Storage: With Amazon S3, you get durable, reliable, and secure storage for your data lake. Amazon S3 is designed for 99.999999999% (11 nines) of data durability.
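The durability and pay-as-you-go points in this list can be checked with a little arithmetic. The Python sketch below first turns 99.999999999% annual durability into expected object losses, then compares paying for storage month by month against buying capacity sized for the busiest month. The 2-cent price and the growth pattern are hypothetical placeholders, not AWS prices, and the model ignores request, transfer, and other charges.

```python
from fractions import Fraction

# --- Durability: what "11 nines" means in expected losses ---
ELEVEN_NINES = Fraction(99_999_999_999, 100_000_000_000)  # 99.999999999% per year


def expected_annual_object_loss(num_objects: int, durability: Fraction) -> Fraction:
    """Expected number of objects lost per year, given an annual per-object durability."""
    return num_objects * (1 - durability)


loss = expected_annual_object_loss(10_000_000, ELEVEN_NINES)
print(f"Expected losses per year: {loss}, i.e. about one object every {1 / loss} years")

# --- Pay as you go vs. capacity bought for the peak ---
PRICE_CENTS_PER_GB_MONTH = 2  # hypothetical placeholder, not a real AWS price


def pay_as_you_go_cents(monthly_gb: list[int], price: int) -> int:
    """Pay only for the storage actually used each month."""
    return sum(gb * price for gb in monthly_gb)


def fixed_capacity_cents(monthly_gb: list[int], price: int) -> int:
    """Buy enough capacity for the busiest month and pay for it every month."""
    return max(monthly_gb) * len(monthly_gb) * price


usage = [100 * month for month in range(1, 13)]  # data grows from 100 GB to 1,200 GB
print(pay_as_you_go_cents(usage, PRICE_CENTS_PER_GB_MONTH) / 100)   # 156.0 (dollars)
print(fixed_capacity_cents(usage, PRICE_CENTS_PER_GB_MONTH) / 100)  # 288.0 (dollars)
```

With 10,000,000 objects, 11 nines works out to an expected loss of 1/10,000 of an object per year. For a lake whose data grows steadily, paying for what is used avoids paying for peak capacity during the early months.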

**SERVERLESS DATA LAKE**

AWS serverless services help us build and run data lakes easily.
How can we build a data lake without having to manage any servers, and which AWS services can we use to build it? Building a serverless data lake on AWS means that you, as the client, build and deploy your data lake without managing any servers yourself. Besides not having to manage servers, you pay only for what you consume, and the architecture scales as your data grows. This makes it more scalable, faster to release, and more flexible, at a lower cost.
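A typical serverless building block is an AWS Lambda function that runs only when new data lands in S3. Below is a minimal sketch of a Python Lambda handler receiving S3 "object created" notifications; the event is a trimmed-down version of the real notification format, which carries more fields. The `raw/` and `curated/` prefixes and the bucket name are hypothetical, and the actual read/transform/write with boto3 is left out so the sketch runs anywhere.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Sketch of a Lambda handler invoked by S3 'object created' notifications.

    For each new object under the (hypothetical) raw/ prefix, it works out where a
    processed copy would go under curated/. A real function would use boto3 to read
    the object, transform it, and write the result; that I/O is omitted here.
    """
    planned = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 notifications (e.g. spaces become '+').
        key = unquote_plus(record["s3"]["object"]["key"])
        if not key.startswith("raw/"):
            continue  # ignore anything outside the raw zone
        planned.append(
            {"bucket": bucket, "source": key, "target": "curated/" + key[len("raw/"):]}
        )
    return {"processed": len(planned), "objects": planned}


# Trimmed-down example of the event shape S3 sends to Lambda.
sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "my-data-lake"},  # hypothetical bucket name
                "object": {"key": "raw/web/year=2023/clicks+jan.json"},
            },
        }
    ]
}
print(lambda_handler(sample_event, None))
```

Because the function runs only when an object arrives, there is no idle server to manage or pay for between uploads.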

Why serverless?

Storage: A serverless data lake architecture offers highly available, scalable, secure, and durable storage with low latency. A serverless storage service such as Amazon S3 is well suited for this, in part because of its low cost. You can store many kinds of data from mobile apps, business applications, websites, and more.

Compute: Examples of AWS serverless compute services are AWS Glue, AWS Lambda, and AWS Fargate, each suited to different use cases. These services run your code on demand.

Management: The underlying infrastructure is owned and operated entirely by AWS, so the client doesn't need to worry about maintaining it and can focus on monitoring the day-to-day activities of the service.

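High Availability: AWS designs these services to be highly available. Amazon S3 Standard, for example, automatically stores data redundantly across multiple Availability Zones within a Region. Copying data to another Region is also possible with S3 Cross-Region Replication, but it has to be configured.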
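Replicating a data lake bucket to a second Region is set up per bucket with a replication configuration. The Python sketch below builds such a configuration as a plain dictionary in the shape accepted by boto3's `put_bucket_replication`; the account ID, role, bucket names, and `curated/` prefix are hypothetical, and the actual boto3 call is shown only as a comment so the sketch runs without AWS credentials. Versioning must be enabled on both the source and destination buckets for replication to work.

```python
import json


def replication_config(role_arn: str, destination_bucket: str, prefix: str) -> dict:
    """Build a ReplicationConfiguration that copies new objects under `prefix`
    to a destination bucket (placed in another Region for cross-Region replication)."""
    return {
        "Role": role_arn,  # IAM role that S3 assumes to replicate objects
        "Rules": [
            {
                "ID": "replicate-" + prefix.strip("/"),
                "Priority": 1,
                "Status": "Enabled",
                "Filter": {"Prefix": prefix},
                "DeleteMarkerReplication": {"Status": "Disabled"},
                "Destination": {"Bucket": f"arn:aws:s3:::{destination_bucket}"},
            }
        ],
    }


# Hypothetical account ID, role, and bucket names.
config = replication_config(
    role_arn="arn:aws:iam::123456789012:role/data-lake-replication",
    destination_bucket="my-data-lake-replica",
    prefix="curated/",
)
print(json.dumps(config, indent=2))

# With credentials configured, the configuration could be applied with boto3:
#   boto3.client("s3").put_bucket_replication(
#       Bucket="my-data-lake", ReplicationConfiguration=config)
```

Limiting the rule to a prefix such as `curated/` is one way to replicate only the processed data that downstream users depend on, rather than every raw file.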
