Start by understanding the fundamentals: what data engineering is, the common tasks data engineers perform at different companies, common data terminology, and how to define your problems before you tackle them.
Work with SQL to use databases for storing, reading and updating data.
Learn the fundamentals of Python programming: working in notebooks; logic and functions; and data structures.
Get to grips with the public cloud. Pick one provider to start with; for instance, if you choose AWS, learn about the services it offers and work hands-on with them to handle data and applications in the cloud.
Learn how to connect large data sources in the cloud to create data lakes. Understand data analytics as it pertains to big data and data lakes.
Learn how to build data pipelines that get your data where you want it, when you want it, using tools like Apache Hadoop and Apache Spark.
Gain practical experience writing functions in Apache Spark to test quality metrics and learn how to document data lineage.
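To make a couple of these steps concrete, here is a minimal sketch of storing, reading and updating data with SQL, plus a simple quality metric. It uses Python's built-in `sqlite3` module as a small stand-in, not Apache Spark, and everything named here (the `trips` table, its columns, the sample rows, and the helpers `build_demo_db` and `null_rate`) is made up for illustration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a small, made-up `trips` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE trips (id INTEGER PRIMARY KEY, city TEXT, fare REAL)"
    )
    # Storing data: one row deliberately has a missing fare.
    conn.executemany(
        "INSERT INTO trips (city, fare) VALUES (?, ?)",
        [
            ("Nairobi", 350.0),
            ("Kampala", None),
            ("Nairobi", 420.0),
            ("Kigali", 275.0),
        ],
    )
    conn.commit()
    return conn


def null_rate(conn, table, column):
    """Quality metric: fraction of rows where `column` is NULL.

    Table and column names cannot be passed as SQL parameters, so they
    are interpolated here; only call this with trusted identifiers.
    """
    total, nulls = conn.execute(
        f"SELECT COUNT(*), SUM(CASE WHEN {column} IS NULL THEN 1 ELSE 0 END) "
        f"FROM {table}"
    ).fetchone()
    return nulls / total if total else 0.0


if __name__ == "__main__":
    conn = build_demo_db()

    # Reading data: trips per city.
    for city, n in conn.execute(
        "SELECT city, COUNT(*) FROM trips GROUP BY city ORDER BY city"
    ):
        print(city, n)

    print("null rate before:", null_rate(conn, "trips", "fare"))

    # Updating data: fill missing fares with a placeholder value.
    conn.execute("UPDATE trips SET fare = ? WHERE fare IS NULL", (0.0,))
    conn.commit()

    print("null rate after:", null_rate(conn, "trips", "fare"))
```

The same idea carries over to larger tools: a quality check is just a function that takes a dataset and returns a number you can compare against a threshold before the data moves to the next stage of a pipeline.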
You can read more about this in the presentation we gave for the Data Science East Africa Data Engineering Bootcamp here.
Here are some of the resources I used when I was getting started in data engineering.
Python
SQL
Amazon Web Services (AWS)
Microsoft Azure
Google Cloud Platform
Apache Spark
Please add any resources that you think would be useful for an aspiring data engineer in the comments section.
Thank you for reading all the way through, and all the best as you explore and build your world-class data engineering career ✌️.
Top comments (2)
Here is the advice that I wish I had known when I first began studying data engineering, and that I would advise everyone to be aware of.
Thank you for this. I've been wanting to shift into tech, data engineering especially. Thanks!