Data Science is a career path that many people are still choosing based on an essay written a decade ago. Yes, it has been almost ten years since the Harvard Business Review essay was published, yet we are all still choosing Data Science as a professional path. However, if you have scrolled through LinkedIn recently, you will have noticed that many people describe themselves as recovering data scientists who have now become data engineers.
Many of us are drawn into Data Science only to realize eventually that it isn't for us. And, while there is some gatekeeping around Data Engineering, it appears that many people who started as data scientists have eventually switched over to Data Engineering.
What may be the reason for this? In this blog post, I hope to help you avoid the whole transition from data scientist to data engineer by giving you a few fundamental reasons why you should become a data engineer rather than a data scientist.
Disclaimer: Some who read the title of this article might assume it is some form of "We Don't Need Data Scientists" or "Data Engineering Is Better Than Data Science" article. That's not the purpose here. This article is meant to discuss why someone may prefer being a data engineer. Of course, if you work at a small enough company, you might be a little of both.
You Like Building Stuff
If you enjoy building infrastructure, programming, and writing object-oriented code rather than merely procedural code to interact with data, you may be more of a data engineer.
Data engineers develop data pipelines, infrastructure, monitoring, and other components that aren't directly related to models. Data engineers also operationalize or productionize a model, which means taking the analysis or Jupyter notebook that a data scientist created and turning it into a sustainable and robust system, rather than simply pressing "run" on that Jupyter notebook every day.
We prefer the discipline and process of constructing infrastructure over burying logic in data frames that no one else can access or adequately QA. Data engineers enjoy having a tangible end product. We don't want just an analysis; we want a table, a pipeline, a data warehouse, or a Data Lake.
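To make the idea of productionizing a notebook a little more concrete, here is a minimal sketch in Python. It only uses the standard library's sqlite3 module, and everything in it (the sample rows, the orders table, and the transform, load, and run_pipeline functions) is a hypothetical stand-in rather than anything from a real project. The point is simply that the logic lives in small, testable functions that validate their input and can be rerun safely.

```python
import sqlite3

# Hypothetical raw records, standing in for whatever a notebook would load.
RAW_ROWS = [
    {"order_id": 1, "customer": "ann", "amount": "10.50"},
    {"order_id": 2, "customer": "bob", "amount": "5.00"},
    {"order_id": 3, "customer": "", "amount": "7.25"},  # bad row: no customer
]


def transform(rows):
    """Cast types and split rows into valid and rejected records."""
    valid, rejected = [], []
    for row in rows:
        try:
            order_id = int(row["order_id"])
            amount = float(row["amount"])
        except (KeyError, ValueError):
            rejected.append(row)
            continue
        if not row.get("customer") or amount < 0:
            rejected.append(row)
            continue
        valid.append((order_id, row["customer"], amount))
    return valid, rejected


def load(conn, rows):
    """Idempotent load: a rerun replaces rows by primary key instead of duplicating them."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, customer, amount) VALUES (?, ?, ?)",
        rows,
    )
    conn.commit()


def run_pipeline(conn, rows):
    """Run transform then load, and report how many rows were kept and rejected."""
    valid, rejected = transform(rows)
    load(conn, valid)
    return len(valid), len(rejected)


pipeline_conn = sqlite3.connect(":memory:")
first_run = run_pipeline(pipeline_conn, RAW_ROWS)
second_run = run_pipeline(pipeline_conn, RAW_ROWS)  # safe to rerun
row_count = pipeline_conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first_run, second_run, row_count)
```

In a real system the in-memory database would be a warehouse, and a scheduler would call something like run_pipeline, but the shape is the same: a table at the end, not just an analysis.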
You Like Feeling Done
Data engineers enjoy the sense of accomplishment that comes with finishing a project. Data science has an unending capacity to generate question after question, making your analysis endless. I have often watched my data science counterparts finish an analysis on a single data set, only to have to dig into that data set again because the business came back with even more questions.
But, as data engineers, we have a concrete deliverable to work toward: a table, a data pipeline, or something along those lines. Once we've created it, we're done. Sure, our stakeholders may say, "Oh, I wanted to add this column as well," but that's a new project or assignment, and we already know we've completed the previous one. In order to take on this new task, we would need to reprioritize all of our current work.
That is not necessarily true in the field of Data Science. It can be an endless maze of questions, none of which ever leads to a final answer.
If having no actual end product is what you enjoy about your job, Data Science may be for you. However, if you prefer having a finished product at the end of the day, Data Engineering may be a better fit.
You Don't Like Being The Center of Attention
Another advantage of being a data engineer is that we are not always the center of attention. Data Science has a lot of sexiness and glamour, whereas data engineers can hide in the background, which many of us enjoy. Instead of spending a lot of time in front of co-workers explaining the impact of a model, we can often hide behind our keyboards and build our tables for our partners.
So, being a data engineer is ideal if you prefer completing your task without attracting a lot of attention and questioning. You get to do your work, and you know that once it's done, you can pass it over to the data scientist, who will analyze the data and then jump in front of a stakeholder or manager and explain what their results mean.
In that way, data engineering is an excellent job for folks who prefer interacting with a keyboard over dealing with other people.
Of course, I'm not implying that all data engineers are introverts or dislike interacting with others. The opportunity to talk and share with others is there if you want to take it.
And I will always be a proponent of improving your soft skills, especially communication. Strong communication skills can help you leap forward in your career. Why else do you think I started a YouTube channel?
However, the business is more concerned with the findings of analysts and data scientists. Stakeholders care about the amount of money they will save, the final model, and the impact on the business. Data engineering is critical, and data scientists cannot do their jobs without it. But few people care how the sausage is made; they only care that it is on their plate.
You Prefer SQL Over Pandas
Finally, if you prefer SQL over Pandas, you might be more of a data engineer.
In my experience, data scientists seem to prefer using Pandas, while most data engineers lean towards SQL.
In one way or another, both manipulate data. But if you need to execute a sophisticated, 1000-line query, I can only imagine how insane it would look in Pandas and how many chained function calls it would entail.
What's fantastic is that we live in a world where Spark, Spark SQL, and Databricks exist, so we can all play in the same arena and use a similar engine. Still, SQL will continue to be the language of data because it has withstood the test of time.
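To give a small taste of the difference, here is a hedged sketch that runs the same aggregation once in SQL and once in Pandas. It assumes pandas is installed and uses an in-memory SQLite database; the orders data and column names are made up purely for illustration. Even on a query this small, you can see how each SQL clause turns into another method call in Pandas.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data, invented for illustration.
orders = pd.DataFrame(
    {
        "customer": ["ann", "bob", "ann", "cat", "bob"],
        "amount": [10.0, 5.0, 20.0, 4.0, 2.5],
    }
)

# SQL version: load the frame into an in-memory SQLite table and query it.
sql_conn = sqlite3.connect(":memory:")
orders.to_sql("orders", sql_conn, index=False)
sql_result = pd.read_sql_query(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > 5
    ORDER BY total DESC
    """,
    sql_conn,
)

# Pandas version: one method call per SQL clause.
pandas_result = (
    orders.groupby("customer", as_index=False)  # GROUP BY customer
    .agg(total=("amount", "sum"))               # SUM(amount) AS total
    .query("total > 5")                         # HAVING SUM(amount) > 5
    .sort_values("total", ascending=False)      # ORDER BY total DESC
    .reset_index(drop=True)
)

print(sql_result)
print(pandas_result)
```

Both versions return the same two rows. Which one reads better is a matter of taste, but it is easy to picture how the Pandas chain grows once joins, window functions, and subqueries enter the picture.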
So Which Data Career Is Right For You?
So there you have it: four reasons why you should consider becoming a data engineer rather than a data scientist. If any of these points resonated with you, you might want to pursue a career as a data engineer.
It's easy to get caught up in the allure of Data Science, given that it is still seen as a very sexy job. Still, other data-related fields, such as data engineering, involve completely different day-to-day tasks and team relationships.
The skills necessary for these jobs, as well as the end deliverables, vary greatly. The bottom line is to be honest with ourselves and choose the career that suits us best.
Of course, there are still other roles like analytics engineer and data analyst. So good luck finding the right career.
If you're interested in learning more about data engineering or data science, then consider these articles and videos.
My Favorite Books For Data Engineers - From Streaming To Software Engineering
Which Managed Version Of Airflow Should You Use?
What Is Trino And How It Manages Big Data
What I Learned From 100+ Data Engineering Interviews - Interview Tips
Top comments (2)
Good article.
A lot of this is more to do with what companies think they need, and a lot to do with following trends and fashions. It becomes more about - we need to be doing data science and artificial intelligence without thinking about what the goals are. A significant amount of machine learning simply becomes about replacing code with ML which produces similar outputs without as much code.
I have played around enough with ML to see these variations. Where ML makes strong promises are where significant coding logic would be needed and updating significantly. I have written a couple of neat applications on top of WordNet, but I still suspect some AI/ML is needed to improve the quality of the output.
You're all done with your pipeline until an edge case pops up you hadn't planned for and you spend a while looking through logs to figure out what exactly went wrong in the process :)
One of the things you missed is the difference in expectations. Stakeholders expect data engineers to process data and get it into a state others can use. Stakeholders expect data scientists to cure cancer in a single sprint. It's a lot harder to manage expectations when there is so much hype built up around a field. I just keep telling people Tesla promised full self driving cars "next year" 3 years ago and I still have to drive myself to work. It took Google several years and PhDs to get Google Assistant to "understand" a single sentence.