Data scientists and machine learning developers are characterised by specialisation: just one in ten of them is involved end-to-end in the data science/machine learning (DS/ML) workflow. Excluding these generalists, around two-thirds of the remaining DS/ML developers are involved in three or fewer distinct stages of the workflow. As we will see later in this chapter, this is part of a longer-term trend towards specialisation in data science and machine learning. As working practices mature and technical complexity increases, developers have to specialise in order to stay competitive.
Of the individual stages of the DS/ML workflow, data exploration and analysis is the most commonly selected: nearly half of DS/ML developers said that they are involved in it. With around two in five also selecting model development or visualisation/presentation, it's clear that these three activities form the bedrock of DS/ML projects. In fact, nearly three-quarters of DS/ML developers are involved in at least one of these three stages, and 14% are involved in all of them. Whilst DS/ML developers are rarely involved end-to-end, most maintain a varied skill set that incorporates coding, statistics, and communication.
Further down the list, just under a third of DS/ML developers reported being involved in data/feature engineering or data ingestion. These somewhat less glamorous activities nevertheless underpin much of the DS/ML lifecycle.
DS/ML developers doing data engineering or ingestion tend to stay involved only in the early stages of the DS/ML workflow. The only other stages that data ingestion specialists are more likely than average to take on are data engineering and exploration/analysis. There are clearly some transferable skills here, but they rarely extend to model deployment or optimisation. By contrast, DS/ML developers involved in feature engineering often take part in model development as well, reflecting the iterative back-and-forth between the two activities.