DEV Community

Abel Peter


Data engineering vs Data science vs Data analysis.

Before the beginning of this year, I thought the three fields above were similar and, even worse, I would probably have used them interchangeably in a sentence. I figured out this wasn’t the case a few months ago, and this past week really settled the debate in my head for good. They obviously overlap, since they all trade in one commodity, data, but I will try to separate them and hopefully come up with a more complete picture of the whole.
Data is a valuable resource, especially in this century, where anything that can be automated is being automated. We need creative, efficient and feasible ways to collect this data, process it and make it readily available when the need arises. This is where data engineering comes in. Any insight that might be drawn from data depends heavily on the quality of that data, which makes data engineering fundamental to anyone in the business of turning data into useful information.
Coming from a data science/analytics background, where I was still taking on ‘medium’ level tasks, I would often find fancy machine learning algorithms performing worse than simpler ones like regression analysis, and I came to learn that the quality of the data played a bigger role in their performance than the choice of algorithm. Also, when dealing with poor-quality data, using different tools would yield widely varying results that raised more questions than answers.
Being able to optimize data for a certain tool or a particular problem is a very important skill in data science, and that’s why I decided to learn data engineering. Pairing these skills should make me a better problem solver in the data space.
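To make the idea of preparing data for a downstream tool a bit more concrete, here is a minimal sketch in plain Python. The records, the field names (`id`, `age`, `city`) and the helper functions are all hypothetical and exist only for illustration; a real pipeline would have its own rules for what counts as a usable row.

```python
from typing import Optional

# Hypothetical raw records, as they might arrive from a collection step.
raw_records = [
    {"id": 1, "age": "34", "city": " Nairobi "},
    {"id": 2, "age": None, "city": "Mombasa"},      # missing age
    {"id": 1, "age": "34", "city": " Nairobi "},    # duplicate of id 1
    {"id": 3, "age": "abc", "city": "Kisumu"},      # unparseable age
    {"id": 4, "age": "27", "city": "nakuru"},
]


def clean_record(record: dict) -> Optional[dict]:
    """Return a normalized copy of the record, or None if it is unusable."""
    age = record.get("age")
    if age is None or not str(age).strip().isdigit():
        return None
    return {
        "id": record["id"],
        "age": int(age),
        "city": record["city"].strip().title(),
    }


def clean_dataset(records: list) -> list:
    """Drop unusable rows and keep only the first row seen for each id."""
    seen_ids = set()
    cleaned = []
    for record in records:
        row = clean_record(record)
        if row is None or row["id"] in seen_ids:
            continue
        seen_ids.add(row["id"])
        cleaned.append(row)
    return cleaned


cleaned = clean_dataset(raw_records)
print(cleaned)
```

Of the five raw rows, only two survive in this sketch. Whether to drop bad rows, as done here, or to repair them is a design choice that depends on the problem at hand.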
Data science, the next field, is super diverse. Some people are of the view that it’s just statistics with a fancy name, which is partly true! I think what makes it different from good old stats is the robustness and efficiency brought about by programming tools and more data. This is one of the main reasons it has been widely adopted by scholars in their research. Before, only a really good statistician would be entrusted with highly complex studies, but with these tools available more people can take part in them. Also, instead of committing to one particular analysis framework, we can try several and choose the one that best fits our problem.
Aside from the stats, data science also involves building machine learning models, using different tools depending on the task, and using them to predict values for new, unseen data. These models usually come with an accuracy score that shows how well they predict the data. A very high accuracy score on the data a model was trained on is not always good news: it can be a sign of overfitting, meaning the model may fail to generalize to new data.
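The point about a high score failing to generalize can be sketched with nothing but the standard library. The example below is an illustration under stated assumptions, not a recipe: the synthetic data, the 30% label noise and both "models" (a 1-nearest-neighbour memorizer and a hand-written threshold rule) are made up for the demonstration.

```python
import random

random.seed(0)


def make_point():
    """One sample: a feature x in [0, 1) and a deliberately noisy label."""
    x = random.random()
    true_label = 1 if x > 0.5 else 0
    # Flip the label 30% of the time to imitate noisy, low-quality data.
    label = 1 - true_label if random.random() < 0.3 else true_label
    return x, label


train = [make_point() for _ in range(200)]
test = [make_point() for _ in range(200)]


def predict_memorizer(x):
    """1-nearest-neighbour: copy the label of the closest training point."""
    nearest = min(train, key=lambda point: abs(point[0] - x))
    return nearest[1]


def predict_threshold(x):
    """A much simpler, hand-written rule: a single cut at 0.5."""
    return 1 if x > 0.5 else 0


def accuracy(predict, data):
    """Fraction of samples in data whose label the predictor gets right."""
    correct = sum(1 for x, label in data if predict(x) == label)
    return correct / len(data)


memorizer_train_acc = accuracy(predict_memorizer, train)
memorizer_test_acc = accuracy(predict_memorizer, test)
threshold_train_acc = accuracy(predict_threshold, train)
threshold_test_acc = accuracy(predict_threshold, test)

print("memorizer  train/test:", memorizer_train_acc, memorizer_test_acc)
print("threshold  train/test:", threshold_train_acc, threshold_test_acc)
```

The memorizer scores perfectly on the data it has already seen, because every training point is its own nearest neighbour, yet it does noticeably worse on the held-out points. The simple rule has no such gap by construction, since it never looks at the training labels. Comparing training and test scores like this is one common way to spot a model that has memorized noise.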
Data analysis can be described as the entire process of working with data, from collecting and aggregating it to using it to tell a story about something. Data analysis is a lot older and broader than the other two fields, because human beings have been collecting data and trying to use it in all spheres of life since time immemorial. Medicine, finance, engineering, farming and many other fields all require data analysis to succeed. With technology, as stated earlier, our tooling has gotten a lot better: we have more data points and better data collection methods. In a business setting, data analysts are required to produce pitchbooks and dashboards used for presentations.
The data here should be intuitive enough for the average person so that important information can be relayed. Visualization tools such as Power BI and Tableau are leveraged to make presentation easier. Data analysts also rely heavily on the quality of the data if they are to convey any valuable information to the parties concerned.
Finally, the three fields are strongly related to each other, and poor performance in one is going to affect the others too. Since the ultimate goal is the development of business intelligence and insight, there needs to be a positive feedback loop such that the decisions made by management are data driven and the feedback from the market is relayed to the technical team, who then try to come up with better ways to leverage the data available.

Thank you for reading! Follow me for more articles on the topics above.
