Data Engineering vs Data Science

Charles Landau ・1 min read

I was in a local technology Slack and saw the question:

Tell me, what is data engineering and how is it different than data science? What [Python] tools do you use?

It's a good question and it's worth considering. Here's what I think:

GIF (https://media.giphy.com/media/4WF7SSNIxQMRMflOAE/giphy.gif): Ted Danson reading a slip of paper. Caption: "Okay, here we go"

Data engineers are concerned with how data:

  • Lands in a system
  • Moves through a system
  • Interacts with business processes and application logic
  • Is stored
  • Is governed

Definitions differ, but I think that's a good starting place.
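
To make those concerns a little more concrete, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: the CSV contents, the `orders` table, the validation rule, and the email-masking rule are hypothetical, and a real system would use dedicated tools for each step.

```python
import csv
import io
import sqlite3

# Hypothetical raw data "landing" in the system as CSV text.
RAW_CSV = """order_id,customer_email,amount
1,ana@example.com,19.99
2,,5.00
3,bo@example.com,not_a_number
4,cy@example.com,42.50
"""


def land(raw_text):
    """Landing: parse the raw CSV into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(raw_text)))


def validate(rows):
    """Moving + business logic: keep only rows that pass simple checks."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # drop rows whose amount is not numeric
        if not row["customer_email"]:
            continue  # drop rows with no customer attached
        clean.append((int(row["order_id"]), row["customer_email"], amount))
    return clean


def mask_email(email):
    """Governance: hide the local part of the address before storage."""
    _, domain = email.split("@", 1)
    return "***@" + domain


def store(conn, rows):
    """Storage: write validated, masked rows to a table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer_email TEXT, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(oid, mask_email(email), amt) for oid, email, amt in rows],
    )
    conn.commit()


def run_pipeline(raw_text=RAW_CSV):
    """Run land -> validate -> store against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    store(conn, validate(land(raw_text)))
    query = "SELECT order_id, customer_email, amount FROM orders ORDER BY order_id"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    print(run_pipeline())
```

Each function maps loosely onto one bullet above. In practice these stages tend to live in separate services, which is where the tools below come in.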

There is a huge and diverse ecosystem of tools out there. I would highlight the following, all of which have strong Python support:

  • Kafka for pub/sub messaging
  • Spark for data processing (including the Spark Streaming API for stream processing)
  • Airflow for workflow management
  • Pandas for data "wrangling"

And you can find tons of tools on the awesome list.
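
Of the tools listed, Pandas is the easiest to try without running any services, so here is a small sketch of what "wrangling" can look like. It assumes `pandas` is installed; the `events` data and its column names are made up for illustration.

```python
import pandas as pd

# Hypothetical event records; the columns are assumptions for illustration.
events = pd.DataFrame(
    {
        "user": ["ana", "bo", "ana", "cy", "bo"],
        "amount": ["19.99", "5.00", None, "42.50", "7.25"],
        "day": ["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02", "2020-01-02"],
    }
)


def wrangle(df):
    """Drop incomplete rows, fix column types, and total spend per user."""
    out = df.dropna(subset=["amount"]).copy()
    out["amount"] = out["amount"].astype(float)  # strings -> numbers
    out["day"] = pd.to_datetime(out["day"])      # strings -> timestamps
    return out.groupby("user", as_index=False)["amount"].sum()


totals = wrangle(events)

if __name__ == "__main__":
    print(totals)
```

Kafka, Spark, and Airflow each need a running broker, cluster, or scheduler, so they are harder to show in a few self-contained lines.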

Lastly, how do data scientists differ from data engineers? I would argue that both roles are concerned with connecting data to business processes. They differ the same way their names differ:

  • A data engineer is concerned with designing and building systems that make data available and actionable in a cost-effective manner.
  • A data scientist performs experiments, the results of which are (sometimes) actionable insights or automation.
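
As a rough illustration of the "experiment" half of that distinction, here is a sketch of a permutation test written with the standard library. The two lists stand in for data that an engineer's system would have made available; the numbers, the variant names, and the choice of a permutation test are all assumptions for the sake of the example.

```python
import random
import statistics

# Hypothetical order values for two page variants.
control = [20.0, 22.5, 19.0, 21.0, 23.5, 20.5]
treatment = [24.0, 26.5, 22.0, 25.0, 27.5, 23.0]


def permutation_p_value(a, b, n_resamples=5000, seed=0):
    """Estimate how often shuffled group labels give a mean difference
    at least as large (in absolute value) as the observed one."""
    rng = random.Random(seed)
    observed = statistics.mean(b) - statistics.mean(a)
    pooled = a + b  # new list, so the inputs are not modified
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = statistics.mean(pooled[len(a):]) - statistics.mean(pooled[:len(a)])
        if abs(diff) >= abs(observed):
            hits += 1
    return observed, hits / n_resamples


if __name__ == "__main__":
    observed, p_value = permutation_p_value(control, treatment)
    print(f"observed difference: {observed:.2f}, p-value: {p_value:.4f}")
```

Whether a result like this turns into an actionable insight is the "(sometimes)" part: the experiment may just as easily show no meaningful difference.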

I hope that explanation helps. If you see something I got wrong or didn't explain well, please chime in below 👇

Charles Landau

@charlesdlandau

Data Scientist | Sr. Consultant at Guidehouse
