DEV Community

András Tóth


Tech stories: completely wrong data ignored

Reading about how a serious problem with fake networks got ignored at Facebook, I remembered my own little story about raising a problem and getting hammered for it. I think this was also part of the reason I was fired: "András, why are you spending time on this when you need to...".

The chilling feeling of discovering wrong data

The company I was working for wanted to use a hybrid strategy of collecting both quantitative data (registering "clicks" and other user behavior, then crunching the numbers) and qualitative data (observing user behavior through interviews). This was an excellent approach, but it depended on the quality of the data collected.

I was sitting at my desk when one of the team leads and our UX researcher were chatting about some data they were basing decisions on.

The assumption they had was that

90% of the users reading help articles are not logged in.

This was drawn from user behavior data. The data consisted of events: User X clicked Button Y and user X has these properties {...}.
Many teams were adding their own events to this.

I was pretty surprised to hear this number, so I turned to my right and asked:
"What event property did you use to determine this?"

They replied with a particular property (registered??? I can't recall the name, it's not important).

From my previous endeavours I already knew that this particular property was not really trustworthy. In fact, I had never understood how it worked (until I investigated it, and then I shook my head...).

I freaked out and started investigating how that property was set. It turned out that they had heard from somebody that this was the right property.

But it wasn't. In fact, there were about 3 properties and none of them actually told whether the user was logged in or not (but we fixed it by adding a fourth, which was accurate 🤫).

Bad data? Cool story bro!

After stopping my work and frantically investigating how those properties actually worked, I realized that the real share of logged-in users was around 50% or more: most people were logged in. We couldn't base our strategy on the totally false 90% value.
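To make the problem concrete, here is a minimal sketch of how the same "share of logged-in users" number can swing depending on which event property you trust. Everything in it is an assumption for illustration: the events are synthetic, and the property names `registered` and `has_session` are hypothetical, not the real ones from that system.

```python
# Synthetic events for illustration only. The property names
# ("registered", "has_session") are hypothetical.
events = [
    {"user": "u1", "event": "help_article_viewed", "registered": False, "has_session": True},
    {"user": "u2", "event": "help_article_viewed", "registered": False, "has_session": True},
    {"user": "u3", "event": "help_article_viewed", "registered": False, "has_session": False},
    {"user": "u4", "event": "help_article_viewed", "registered": True, "has_session": True},
]


def share_true(events, prop):
    """Fraction of events where `prop` is truthy; a missing property counts as False."""
    if not events:
        return 0.0
    return sum(1 for e in events if e.get(prop)) / len(events)


def disagreement(events, prop_a, prop_b):
    """Fraction of events where the two properties give different answers."""
    if not events:
        return 0.0
    differing = sum(1 for e in events if bool(e.get(prop_a)) != bool(e.get(prop_b)))
    return differing / len(events)


print(share_true(events, "registered"))
print(share_true(events, "has_session"))
print(disagreement(events, "registered", "has_session"))
```

If two properties that are supposed to mean the same thing disagree on a large share of events, that alone is a signal to stop and find out how each one is set before quoting any percentage.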

And then what happened? Nothing! 🙃

They did not really give a f--- about it, and I was even scolded for setting my work aside for a couple of days.

Cool story bro! Now go and do work.

I was shocked by this. In my spare time I went down the rabbit hole and realized that we collected data in so many flawed ways that it couldn't be trusted. This was partly because we had so many independent teams and partly because no one likes to change plans.

The decision had been made, and annoying facts were not supposed to disturb it.

I wrote up a big plan about what to do, and I did manage to warn one of our data scientists. And that was it.

A bit later I was fired for "not being a good cultural fit". Obviously there were more stories behind it, so stay tuned 😄.

And now the constructive part...

How to fix a "serious data collection flaw"

There are many aspects here.

Clean code, clean events

Most importantly, events need to be clean and clear. Their names should be meaningful and they should be regularly checked for validity. If you have multiple teams, you can still have some core data analysts who are responsible for improving the quality of the most important areas.
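One way to "regularly check for validity" is to keep a small registry of event definitions and validate incoming events against it. The sketch below is only an illustration under my own assumptions: the event name `help_article_viewed`, the property `is_logged_in` and the registry layout are all hypothetical, not something the company actually had.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class EventSchema:
    """Hypothetical definition of one event: its name and required properties."""
    name: str
    required: Dict[str, type]  # property name -> expected Python type


# Hypothetical registry, owned by the core data analysts.
SCHEMAS = {
    "help_article_viewed": EventSchema(
        name="help_article_viewed",
        required={"user_id": str, "article_id": str, "is_logged_in": bool},
    ),
}


def validate_event(event: dict) -> List[str]:
    """Return a list of problems found in the event; an empty list means it is valid."""
    schema = SCHEMAS.get(event.get("name"))
    if schema is None:
        return [f"unknown event name: {event.get('name')!r}"]

    problems = []
    props = event.get("properties", {})
    for prop, expected in schema.required.items():
        if prop not in props:
            problems.append(f"missing property: {prop}")
        elif not isinstance(props[prop], expected):
            actual = type(props[prop]).__name__
            problems.append(f"{prop} should be {expected.__name__}, got {actual}")
    return problems


print(validate_event({
    "name": "help_article_viewed",
    "properties": {"user_id": "u1", "article_id": "a1", "is_logged_in": "yes"},
}))
```

Run against a sample of real events on a schedule, a check like this surfaces undocumented events and properties with unexpected types before anyone builds a strategy on top of them.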

Shit data, shit decision => fix data and decision

It sucks to stop your work. But if you uncover this huge flaw you have to take it seriously. The entire team should take it seriously. Or at least the team lead or product manager.

Anticipate the disruption

Normalcy bias is at play here as well: "You just said that my assumptions were completely wrong, but I'll pretend it's no big deal and keep my dear assumptions."

If you expect disruptions, your sense of normalcy will include the occasional need to drop everything and work on a major flaw.

If politics is more important than accurate decisions, you have a cultural problem

There, I have said it. You have to have an ear and respect for the engineers, data scientists and low-key workers raising serious issues. Pretending the problems do not exist will cost you real $$$s (and also demoralise your workforce). An even worse strategy is to expect them to solve the problems in their spare time, since people will then take uncoordinated guerrilla actions, leading to an even more confusing system.

The mistreated issues will mount up eventually and you are going to shoot yourself in the foot. At least there's free cereal in the office kitchen and there are important motivational messages scattered on the walls.
