DEV Community

Discussion on: Exploratory Data Analysis; Much Time & Effort?

Fred Ross

My exploratory work tends to fall in one of three categories:

  1. Diagnose what went wrong with this system.
  2. Estimate if this project is worth pursuing.
  3. Try to poke holes in a model before deploying it.

It's all goal-directed, working against a mental model of what I'm doing. For (1), I'm trying to eliminate swathes of the system's control flow and data flow as quickly as possible. For (2), I'm getting rough estimates of the economic impact of an idea. For (3), I'm trying to find situations that will break a model, rather like the adversarial process a security researcher goes through.

Instead of writing up a huge list of things to explore, treat it the way Tukey did in Exploratory Data Analysis, the book that started the field. You start with a data set, then ask questions about it one at a time and answer each as fast as possible, since a quick check can often cut off a whole line of questioning at the very beginning. (It's a great book, by the way, especially for seeing what he developed to do this kind of work by hand.)
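
As one illustration of the kind of quick check described above, here is a minimal sketch of a five-number summary with outlier fences, two of the by-hand tools associated with Tukey's book. The function names and the sample latencies are hypothetical, and the hinge rule used (each half includes the median when the count is odd) is only one common convention.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, lower hinge, median, upper hinge, max) of a non-empty sequence."""
    xs = sorted(values)
    if not xs:
        raise ValueError("need at least one value")
    n = len(xs)
    # Assumed convention: each half includes the median when n is odd.
    half = (n + 1) // 2
    return (xs[0], median(xs[:half]), median(xs), median(xs[n - half:]), xs[-1])


def outside_fences(values, k=1.5):
    """Return values lying more than k hinge-spreads beyond either hinge."""
    _, lower, _, upper, _ = five_number_summary(values)
    spread = upper - lower
    low_fence, high_fence = lower - k * spread, upper + k * spread
    return [x for x in values if x < low_fence or x > high_fence]


# Hypothetical data: request latencies in milliseconds.
latencies_ms = [12, 15, 11, 14, 13, 250, 12, 16, 13]
summary = five_number_summary(latencies_ms)
suspects = outside_fences(latencies_ms)
print(summary)   # (11, 12, 13, 15, 250)
print(suspects)  # [250]
```

A summary like this takes seconds to produce, and a single value far outside the fences is often enough to decide which question to ask next.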

Matt Curcio

Thanks Fred,
I have seen references to John Tukey's book in my reading. It does look interesting.

Matt Curcio

Hi Fred, I am curious. What other Data Science blogs or sites do you follow?

Fred Ross

I don't.
