A data science project goes through phases of discovery, scoping, data analysis, data cleaning, modeling, evaluation, and deployment. Data scientists work collaboratively with ‘the business’ to define a scope that can realistically be solved.
The tasks that come before modeling and deployment take up roughly 80% of the time spent. Figuring out whether the problem can be solved, and whether there is relevant data to back it up, is time-consuming.
I spoke with a professional data scientist recently on this topic and they said, “A good chunk of my job is scoping: telling people that ML is a bad solution to their problem (they should try something simpler first), that we don’t have the data to solve their problem (and how we might get it), or talking through potential solutions.”