Many people are still choosing Data Science as a career based on an essay written a decade ago. Yes, it has been almost ten years since Harvard Business Review published "Data Scientist: The Sexiest Job of the 21st Century," yet we keep choosing Data Science as a professional path. However, if you scroll through LinkedIn lately, you will notice plenty of posts from self-described recovering data scientists who have become data engineers.
Many of us are drawn into Data Science only to realize, eventually, that it isn't for us. And while there is some gatekeeping around Data Engineering, many people who started out as data scientists appear to have switched over to it.
Why might that be? In this blog post, I hope to spare you the whole transition from data scientist to data engineer by giving you a few fundamental reasons why you might want to become a data engineer rather than a data scientist.
Disclaimer: Some readers might assume from the title that this is a "We Don't Need Data Scientists" or "Data Engineering Is Better Than Data Science" type of article. That's not its purpose. It is meant to discuss why someone may prefer being a data engineer. Of course, if you work at a small enough company, you might be a little of both.
If you enjoy building infrastructure, programming, and writing object-oriented code rather than merely procedural code to interact with data, you may be more of a data engineer.
Data engineers develop data pipelines, infrastructure, monitoring, and other components that aren't directly related to models. Data engineers operationalize, or productionize, a model: they take the analysis or Jupyter notebook a data scientist created and turn it into a sustainable, robust system, rather than simply pressing "run" on that notebook every day.
We prefer the discipline and process of building infrastructure over burying logic in data frames that no one else can access or adequately QA. Data engineers enjoy having a tangible end product. We don't want just an analysis; we want a table, a pipeline, a data warehouse, or a data lake.
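To make the contrast with rerunning a notebook a little more concrete, here is a minimal sketch of one productionized step, assuming a SQLite database with hypothetical `raw_orders` and `daily_revenue` tables. The hypothetical `run_daily_load` function rebuilds a single day's slice of the output table inside one transaction, so it can be retried or rerun without duplicating rows. A real pipeline would add scheduling, monitoring, and data quality checks on top of something like this.

```python
import sqlite3


def run_daily_load(conn: sqlite3.Connection, run_date: str) -> None:
    """Rebuild one day's rows in the hypothetical daily_revenue table.

    Deleting and re-inserting the day's rows inside a single transaction makes
    the job idempotent: rerunning it for the same run_date replaces that day's
    rows instead of duplicating them, and other days are left untouched.
    """
    with conn:  # commits if both statements succeed, rolls back if either fails
        # Clear any rows a previous run wrote for this day.
        conn.execute("DELETE FROM daily_revenue WHERE order_date = ?", (run_date,))
        # Recompute the day's per-customer revenue from the hypothetical raw_orders table.
        conn.execute(
            """
            INSERT INTO daily_revenue (order_date, customer_id, revenue)
            SELECT order_date, customer_id, SUM(amount)
            FROM raw_orders
            WHERE order_date = ?
            GROUP BY order_date, customer_id
            """,
            (run_date,),
        )
```

A scheduler such as cron would call `run_daily_load(conn, "2024-01-01")` once per day. Because the function is idempotent, retrying a failed day or backfilling an old one is just another call with the same date.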
Data engineers enjoy the sense of accomplishment that comes with finishing a project. Data science has an unending capacity to generate question after question, which can make your analysis endless. I have often watched my data science counterparts finish an analysis on a single data set, only to dig back into it because the business asked even more questions.
But as data engineers, we generally have a concrete deliverable: a table, a data pipeline, or something along those lines. Once we've built it, we're done. Sure, our stakeholders may say, "Oh, I wanted to add this column as well," but that's a new project or assignment, and we know the previous one is complete. To take on the new task, we would need to reprioritize our current work.
The same is not necessarily true in Data Science, where the work can become an endless chain of questions, none of which ever leads to a final answer.
If you enjoy open-ended work without a definitive end product, Data Science may be for you. However, if you prefer having a finished product at the end of the day, Data Engineering may be a better fit.
Another advantage of being a data engineer is that we are not always the center of attention. Data Science carries a lot of glamour, whereas data engineers can stay in the background, which many of us enjoy. Instead of spending a lot of time in front of co-workers explaining the impact of your model, you can often stay behind your keyboard and build tables for your partners.
So, being a data engineer is ideal if you prefer completing your tasks without attracting a lot of attention and questioning. You get to do your work, and once it's done, you hand it over to the data scientist, who analyzes the data and then stands in front of a stakeholder or manager to explain what the results mean.
In that way, data engineering is an excellent job for folks who prefer interacting with a keyboard over dealing with other people.
Of course, I'm not implying that all data engineers are introverts or dislike interacting with others. The opportunity to present and share your work is there if you want to take it.
And I will always be a proponent of improving your soft skills, especially communication. Strong communication skills can help you leap forward in your career. Why else do you think I started a YouTube channel?
However, the business is more concerned with the findings of analysts and data scientists. They care about how much money they will save, the final model, and the impact on the business. Data engineering is critical, and data scientists cannot do their jobs without it. But few people care how the sausage is made; they only care that it is on their plate.
Finally, if you prefer SQL over Pandas, you might be more of a data engineer.
In my experience, data scientists tend to prefer Pandas, while most data engineers lean toward SQL.
Both are ways to manipulate data. But if you need to run a sophisticated 1,000-line query, I can only imagine how unwieldy it would look in Pandas and how many chained function calls it would take.
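As a small illustration of why SQL holds up well for this kind of logic, here is a declarative query run through Python's built-in `sqlite3` module. The `customers` and `orders` tables and the `above_average_spenders` function are hypothetical. In Pandas, the same logic would typically become a `merge`, a `groupby(...).agg(...)`, and a boolean filter against the mean; in SQL it stays one statement, and longer queries tend to grow as additional named CTEs rather than as a trail of intermediate data frames.

```python
import sqlite3

# Hypothetical query: customers whose total spend is above the average across all customers.
ABOVE_AVERAGE_SQL = """
WITH customer_totals AS (
    -- One row per customer: number of orders and total spend
    SELECT c.customer_id, c.region, COUNT(*) AS n_orders, SUM(o.amount) AS total
    FROM customers AS c
    JOIN orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.customer_id, c.region
)
SELECT region, customer_id, n_orders, total
FROM customer_totals
WHERE total > (SELECT AVG(total) FROM customer_totals)
ORDER BY total DESC
"""


def above_average_spenders(conn: sqlite3.Connection) -> list[tuple]:
    """Return (region, customer_id, n_orders, total) rows, largest spenders first."""
    return conn.execute(ABOVE_AVERAGE_SQL).fetchall()
```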
What's fantastic is that we live in a world where Spark, Spark SQL, and Databricks exist, so we can all play in the same arena and use the same engine. Still, SQL will continue to be the language of data, since it has withstood the test of time.
So there you have it: four reasons why you should consider becoming a data engineer rather than a data scientist. If any of these points resonated with you, a career in data engineering may be worth pursuing.
It's easy to get caught up in the allure of Data Science, given that it's still a very sexy job. Still, other data-related areas, such as data engineering, involve completely different day-to-day tasks and team relationships.
The skills these jobs require, as well as their end deliverables, vary greatly. The bottom line is to be honest with ourselves and choose the career that suits us best.
Of course, there are other roles too, like analytics engineer and data analyst. So good luck finding the right career.
If you're interested in learning more about data engineering or data science, then consider these articles and videos.