DEV Community

Khalif Cooper


What is Data Engineering?!!!

Data engineering is not a new concept or a new career path. It's an old one. The core idea of data engineering, and what it's all about, has been around for a long time. The only difference is that now it's being redesigned, redeveloped, and modified.

Think of full-stack developers. A full-stack developer is just a back-end and a front-end developer merged together. Likewise, if you have ever worked with databases or know people who are DBAs, SQL developers, or ETL developers, then you have seen data engineering before. You just saw it in a different light.

In simple terms, data engineering consists of moving data from one place to another. To dive a little deeper, the most important part is the reasoning behind why we need data engineering.

Let's start with ETL. ETL sounds confusing at first, but we are going to start with the basics.

E stands for Extract. Extract means to pull, or withdraw, data from different sources. The source can be data from ESPN, Yahoo News, CNN, CBS, etc.

T stands for Transform. When you first extract data from these different sources, it may be unstructured, it may be slow to query, or you may need to join some of it together into one table. The transform step is where you do this work.

L stands for Load. This is the final part. It means storing the transformed data somewhere, such as a data lake or a data warehouse.

You can learn more about data lakes and data warehouses: here

For example, you might store the data in Google Cloud's BigQuery, Azure SQL Database, or Amazon Redshift. This is ETL. Congrats! You now know what ETL is, whether you always wanted to know or you want to teach someone else.
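To make the three steps concrete, here is a minimal sketch in Python. It is only an illustration: the sample records, the table name `headlines`, and the function names are made up for this example, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw records standing in for data pulled from two sources.
HEADLINES = [
    {"source_id": 1, "title": "  Local team wins  "},
    {"source_id": 2, "title": "Markets close higher"},
]
SOURCES = [
    {"source_id": 1, "name": "ESPN"},
    {"source_id": 2, "name": "CNN"},
]


def extract():
    """Extract: return the raw rows from each source (hard-coded here)."""
    return HEADLINES, SOURCES


def transform(headlines, sources):
    """Transform: clean up the titles and join both data sets into one table."""
    names = {s["source_id"]: s["name"] for s in sources}
    return [(names[h["source_id"]], h["title"].strip()) for h in headlines]


def load(rows, conn):
    """Load: store the transformed rows in a table."""
    conn.execute("CREATE TABLE IF NOT EXISTS headlines (source TEXT, title TEXT)")
    conn.executemany("INSERT INTO headlines VALUES (?, ?)", rows)
    conn.commit()


def run_pipeline(conn):
    """Run extract, transform, and load in order."""
    headlines, sources = extract()
    load(transform(headlines, sources), conn)


# SQLite in memory is an assumption made to keep the sketch self-contained.
conn = sqlite3.connect(":memory:")
run_pipeline(conn)
result = conn.execute(
    "SELECT source, title FROM headlines ORDER BY source"
).fetchall()
print(result)
```

In a real pipeline, `extract` would call an API or read files, and `load` would write to a warehouse instead of SQLite, but the shape of the three steps stays the same.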

There are also some tools around ETL that you will hear about, such as Apache Airflow, Luigi, AWS Glue, and Apache Spark. These are all tools that are either a part of the ETL process or make ETL pipelines more efficient at doing their job.

For example, Apache Airflow is used for scheduling jobs, much like cron, to say something like "update this table of users in this particular database every day at 6 a.m. Pacific time". There is a lot more to data engineering, but these are the bare bones to get you started.
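To give a feel for what "every day at 6 a.m. Pacific time" means to a scheduler, here is a small sketch using only the Python standard library. It is not Airflow code. The function name `next_daily_run` is made up for this example, and it assumes time zone data for `America/Los_Angeles` is available on the machine. In cron syntax, the same daily schedule is written `0 6 * * *`.

```python
from datetime import datetime, time, timedelta
from zoneinfo import ZoneInfo

# Assumption: "Pacific time" is represented by the America/Los_Angeles zone.
PACIFIC = ZoneInfo("America/Los_Angeles")


def next_daily_run(now, run_at=time(6, 0), tz=PACIFIC):
    """Return the next moment a daily job scheduled at `run_at` in `tz` should run."""
    local_now = now.astimezone(tz)
    # Try today's run time first.
    candidate = datetime.combine(local_now.date(), run_at, tzinfo=tz)
    # If that time has already passed, the next run is tomorrow.
    if candidate <= local_now:
        tomorrow = local_now.date() + timedelta(days=1)
        candidate = datetime.combine(tomorrow, run_at, tzinfo=tz)
    return candidate


before_six = datetime(2024, 3, 1, 5, 30, tzinfo=PACIFIC)
after_six = datetime(2024, 3, 1, 7, 0, tzinfo=PACIFIC)
print(next_daily_run(before_six))  # runs later the same day
print(next_daily_run(after_six))   # runs the following day
```

A scheduler like Airflow does this kind of bookkeeping for you, and also handles things this sketch ignores, such as retries and dependencies between tasks.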

I hope this tutorial was informative and gave you a general idea of what data engineering is. Stay tuned for my next data engineering article, where I am going to teach you how to write your own ETL pipeline in Python.

Top comments (2)

Shreya

This blog provides a comprehensive and insightful overview of what data engineering truly entails. In today's data-driven landscape, understanding the backbone of data management and processing is crucial, and this article does an excellent job of demystifying the world of data engineering.

I appreciate how the article breaks down the core components of data engineering, from data collection to data quality and governance. It's clear that data engineers play a pivotal role in ensuring that data is not just collected but transformed into valuable insights. The emphasis on data quality is particularly important, as it highlights the fact that data engineering is not just about quantity but also about the reliability and accuracy of the data.

The mention of the tools and technologies used in data engineering is also helpful. It provides a starting point for anyone looking to explore this field further. The diversity of tools, from databases to ETL (Extract, Transform, Load) tools, illustrates the complexity of data engineering and the need for a versatile skill set.

Moreover, the article rightly underscores the broader impact of data engineering on businesses and society. In an era where data is often referred to as the "new oil," data engineering is the refinery that makes that resource usable. It enables data-driven decision-making, ensures scalability, and addresses critical issues like data security and compliance.

Overall, this blog post serves as a valuable introduction to the world of data engineering. It not only defines the field but also highlights its significance in our data-centric world. For anyone curious about what data engineering is and why it matters, this article provides an excellent starting point.

Braisdom

1) SQL programming and SQL model design are necessary skills
2) Traditional SQL programming cannot be engineered
3) github.com/braisdom/ObjectiveSql is the best choice