Science is way more important than Data
The common perception of data science work is that you first need a lot of data, and only then do you formulate a hypothesis, run EDA, and so on.
Let me quote the simplest book on the topic: “Frantically, she tried to cast back for anything she’d read about what scientists were supposed to do. Her mind skipped gears, ground against itself, and spat back the instructions for doing a science investigation project:
- Step 1: Form a hypothesis.
- Step 2: Do an experiment to test your hypothesis.
- Step 3: Measure the results.
- Step 4: Make a cardboard poster.” (From “Harry Potter and the Methods of Rationality” by Eliezer Yudkowsky.)
Starting with data is an unbelievably common mistake, and this is where scientists can bring value to the business with their superpowers:
- By discussing the hypothesis with the business (product owner) early, you can agree on the problem you are jointly trying to solve and align data science work with business needs.
- By designing the experiment upfront, preferably together with the business (product owner), you can agree first on the measurements that would disprove your hypothesis: fail fast.
- Then you can agree that the measurements will be taken in a legal, compliant and ethical way, and you can proceed to take them and derive insight (and make cardboard posters using modern visualisation techniques).
- If you skip those steps and start with data (garbage collection), how would you know that the data are useful, legal, compliant and ethical, and that the insight generated has any value for the business?
Insight should lead to a decision and an action, and this is where agreeing on a hypothesis early helps to drive value for the business.
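To make the "agree before you measure" idea concrete, here is a minimal Python sketch of what could be written down with the product owner before any data is touched. Everything in it is hypothetical: the `ExperimentPlan` class, its field names, the checkout example and the 0.02 threshold are invented for illustration. It also deliberately ignores statistical significance, which a real experiment design would have to cover.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentPlan:
    """Hypothetical record of what is agreed before any data is collected."""

    hypothesis: str           # the claim under test
    metric: str               # the measurement both sides accept
    min_uplift: float         # agreed in advance: a smaller uplift rejects the hypothesis
    action_if_supported: str  # the decision the insight should lead to
    action_if_rejected: str   # the fail-fast exit

    def decide(self, control: float, treatment: float) -> str:
        """Map a measured result to the action agreed upfront."""
        uplift = treatment - control
        if uplift < self.min_uplift:
            return self.action_if_rejected
        return self.action_if_supported


# Invented example values, for illustration only.
plan = ExperimentPlan(
    hypothesis="A shorter checkout form increases conversion",
    metric="checkout conversion rate",
    min_uplift=0.02,
    action_if_supported="roll out the shorter form",
    action_if_rejected="drop the idea and move on",
)

print(plan.decide(control=0.10, treatment=0.11))
```

The point of the sketch is the ordering, not the arithmetic: the rejection criterion and both actions exist before the measurement does, so the result cannot be reinterpreted after the fact.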
Let me illustrate with an example:
A keynote speaker at a conference presents the results of running deep learning models over retina scans. He also highlights that the ML model classifies retina scans into two buckets, which are believed to be related to the patient's gender. “Isn’t it amazing?” The fellow next to me jumps in and, after a few questions, concludes: “You got the science part wrong.”
If you were a medical professional, would you make decisions on fertility or contraception plans based on retina scans? I hope not, particularly if data scientists are on the receiving end of the treatment :)
For data scientists: those lectures on the philosophy of science and the scientific method were not optional (to quote my professor; I missed them initially).
If you are given data and asked to derive insight, it is most likely too late. Engage early to validate the business needs, the value and the measurements. Even if data is collected legally for one purpose, it doesn’t mean the insight derived from it can be applied for another purpose, and to be useful, insight needs to be applied. Not all analytics is equally valuable: the most value for the business comes from prescriptive analytics (see Jerry Overton's course on the O’Reilly platform).
For business: scientists, and data scientists in particular, can help drive business decisions if engaged early and in the right way. Goldratt's famous business novels feature data scientists in nearly every one of them, helping to drive policy changes.