DEV Community

Cover image for #015 ETL vs ELT | What's the difference between ETL vs ELT | ETL by examples and real company cases
Kemal Cholovich
Kemal Cholovich

Posted on

#015 ETL vs ELT | What's the difference between ETL vs ELT | ETL by examples and real company cases

ETL vs ELT?

ETL stands for Extract, Transform, Load, and it is process in data warehousing where data is extracted from various sources, transformed to fit the data warehouse's schema, and then loaded into the data warehouse.

The process is typically used to move **data from a **source system, such as a transactional databases, CSVs or a spreadsheets, to a destination system, such as a data warehouses or a data lakes.

ELT stands for Extract, Load, Transform, and it is similar to ETL but the difference is that data is loaded into the data warehouse first and then transformed.

At the end, the main difference between ETL and ELT is the order of the two last steps, transforming and loading.

How to decide when to use ETL or ELT?

The choice between ETL | ELT is often influenced by the type of data sources, the complexity of the data transformation, the computer power of the system, and the scalability of the solution.

When to use ETL?

  1. When the data sources are not compatible with the data warehouse's schema, and

  2. Significant transformations need to be made before loading the data

ETL Example:

CASE 1:

You are collecting data from multiple sources (CSV, Excel, APIs...) and you want to store the data in a central location like a data warehouse for reporting and analysis.

CASE 2:

LIDL company wants to analyze sales data from multiple stores. The company would EXTRACT the sales data from the transactional databases of each store, TRANSFORM the data to match the format of the data warehouse, and then LOAD the data into the data warehouse for analysis.

During the extraction phase, the company may need to filter ** or **aggregate the data to only include the relevant information.

During the transformation phase, the company may need to clean and transform the data, such as converting data types or calculating new fields.

Finally, during the loading phase, the company may need to handle any conflicts or errors that arise when loading the data into the destination system.

The data from these sources likely have different structures and formats, so you would use ETL to EXTRACT the data, TRANSFORM it to fit a common schema , and then LOAD the data into the data warehouse. Simply, isn't it? ;)

When to use ELT?

  1. When the data sources are compatible with the data warehouse's schema, and

  2. The data only needs minimal transformations before being loaded into the data warehouse.

ELT Example:

CASE 1:

You are collecting data from a modern source such as a data lake that already has a similar structure and format as the data warehouse.
In that case you could use ELT. You should load **data into data warehouse first, then perform any necessary **transformations.

CASE 2:

Accenture's analytics services wants to analyze social media data. The company would extract data from social media platforms such as Twitter and Facebook, load the data into a data lake or a data warehouse, and then transform the data using SQL or other data manipulation tools.

During the extraction phase , the company need to _filter _or _aggregate _the data to only include the relevant information.

During the load phase , the company may need to handle any conflicts or errors that arise when loading the data into the destination system.

Finally, during the transformation phase , the company may need to clean and transform the data, such as converting data types or calculating new fields.

Thanks,
Kemal Cholovich

Top comments (0)

An Animated Guide to Node.js Event Loop

>> Check out this classic DEV post <<