DEV Community

Claire Maina

Data Engineering 101: Introduction to Data Engineering

A data engineer designs and builds systems that collect, store, manage, and analyze data. Companies collect large amounts of information about their business from various sources, and they need data engineers to make this information accessible, structured, and usable. Typical work includes building data pipelines, optimizing queries, creating automated systems, managing data warehouses, and developing data workflows.

Roles of a Data Engineer
• Build data pipelines – collecting data and building the data warehouses or data lakes that store it.
• Make data accessible – remodeling the data so that it is easy for all stakeholders to access, interpret, and manipulate. Excel, Power BI, and Tableau are some of the most commonly used tools.
• Optimize queries – updating existing queries so they keep meeting current business needs.
• Data maintenance – testing and maintaining the system to ensure it runs smoothly.
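
To make the pipeline role above more concrete, here is a minimal sketch of an extract, transform, load flow using only the Python standard library. Everything in it is an assumption made for illustration: the sample data in `RAW_CSV`, the `orders` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical raw input. In practice this would come from a file, an API, or a message queue.
RAW_CSV = """order_id,customer,amount
1,alice,10.50
2,bob,
3,alice,4.25
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing amount and convert fields to proper types."""
    cleaned = []
    for row in rows:
        if not row["amount"]:
            continue  # skip incomplete records
        cleaned.append((int(row["order_id"]), row["customer"], float(row["amount"])))
    return cleaned


def load(conn, records):
    """Load: write the cleaned records into a table (SQLite stands in for a warehouse)."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", records)
    conn.commit()


def run_pipeline(conn, text=RAW_CSV):
    """Run the three stages in order."""
    load(conn, transform(extract(text)))


def total_by_customer(conn):
    """Example of the kind of query a stakeholder might run on the loaded data."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    run_pipeline(conn)
    print(total_by_customer(conn))
```

Keeping each stage in its own function makes it easier to test and maintain the stages separately, which ties in with the maintenance role listed above. Real pipelines usually swap these pieces for tools such as Spark, Kafka, or Airflow.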

Skills needed

  • Distributed systems: Hadoop
  • Databases: MySQL
  • Data processing: Spark
  • Real-time data ecosystem: Kafka
  • Data orchestration: Airflow
  • Data science: pandas (Python library)

Software and Technology Requirements

  1. Cloud account - Google Cloud Platform (GCP), AWS, or Azure.
  2. Python 3 (for example, via the Anaconda distribution) and an IDE or text editor such as VS Code.
  3. A SQL database server (for example, MySQL) and a client such as MySQL Workbench, DBeaver, or DbVisualizer.
  4. Git for version control.
