DEV Community

Rodrigo Matola


NEVER trust in unattended data

Cover image by Jen Theodore on Unsplash

I will start with the phrase

"Against data there are no arguments"

and show in this text that there are arguments. And many…

This text can be considered an addition to "Automation Metrics: some problems".

Contextualizing

Depending on the company, data and metrics can be difficult to get access to. They are often in the hands of another area (not IT or development), and gaining access to Google Analytics, for example, can be bureaucratic. And you may never get it.

For us QAs, this situation is very bad: we are left with no basis for deciding what to prioritize in tests and automation, and everything becomes guesswork.

However, sometimes (always?) loose data comes to your attention, such as the average age of customers, the operating systems and browsers they use, the average browsing time on the site/application, etc.

Since that's all you have, you end up basing your testing strategy on it. But is the information that came to you complete? Or, worse, is it even correct?

The examples I will use are real, but with adaptations to fit the text.

Android or iOS?

In a project that I worked on, we were doing very superficial tests on iOS, only happy paths. That's because the information we had was that about 70% of customers used Android. New features were prioritized for Android. Bugs on iOS, depending on the severity, would find a country house to vacation in and not be disturbed.

I think 30% is a very large number to ignore, but this strategy was agreed with the team and the business area.

In one review, after some devs and I had asked for more data, a dev brought some numbers he had gotten from Firebase.

We found that Android was used by 90% of customers (not 70%), and I was relieved to see that our strategy looked even more right. That relief lasted only until the next slide.

This slide showed that Android customers spent an average of 1.5 minutes on the app, while iOS customers spent around 7-10 minutes (I can't remember the exact time). This information immediately conflicted with the previous one in my head! A minute and a half could mean that people just installed the app to find out what it was about and never used it again! Or used it sporadically.

The 7 to 10 minutes was the time it took to complete a full flow in the app. In other words, maybe our focus should have been iOS and not Android! Was our whole strategy wrong?
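One way to see the conflict is to combine the two numbers instead of reading them separately. The sketch below is only a rough illustration: it assumes that every customer who is not on Android is on iOS (10%), that the averages are per customer, and the function name `ios_share_of_usage_time` is made up for this example.

```python
# Shares and average times quoted in the text.
# Assumption: everyone who is not on Android is on iOS (10%).
ANDROID_SHARE = 0.90
IOS_SHARE = 0.10
ANDROID_MINUTES = 1.5
IOS_MINUTES_RANGE = (7.0, 10.0)  # the text only remembers "7-10 minutes"


def ios_share_of_usage_time(ios_minutes: float) -> float:
    """Fraction of total in-app minutes that comes from iOS customers."""
    android_total = ANDROID_SHARE * ANDROID_MINUTES
    ios_total = IOS_SHARE * ios_minutes
    return ios_total / (android_total + ios_total)


for minutes in IOS_MINUTES_RANGE:
    share = ios_share_of_usage_time(minutes)
    print(f"iOS average {minutes:.0f} min -> {share:.0%} of usage time")
```

Under these assumptions, iOS customers are 10% of the base but account for roughly a third or more of the time actually spent in the app, which is a very different picture from "only 10%".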

I went to another team in the next Sprint as the apps were deprioritized for a while. I don't have any more information about the metrics...

Session time

Once, we had to set a session time, that is, how long a person could stay logged into the application. This was an Information Security (IS) requirement.

IS wanted us to set the session time to 3 minutes, which was enough for a query in the app. We also had that same 3-minute value as the average time extracted from Analytics, so we implemented it.

A few days (or weeks) later, we started to receive complaints that the app was logging users off by itself while they were browsing. We noticed that for some flows, 3 minutes was the time a person would spend if they went through the flow directly and without errors, that is, almost like an automation. Of course, ordinary people were taking longer than that.

A large share of these complaints came when we launched a new paid feature, previously restricted to a few customers, to basically the entire customer base. As new customers were unable to complete the registration process, they began to complain. Imagine the money lost during that time because of an incorrectly used metric.
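To illustrate why an average makes a poor timeout, here is a minimal sketch. The durations are hypothetical numbers invented for this example (they are not the project's data), chosen so that their average is exactly 3 minutes; `share_cut_off` is also just an illustrative helper.

```python
from statistics import mean

# Hypothetical flow durations in seconds; NOT data from the project.
durations = [90, 100, 120, 130, 150, 160, 170, 180, 200, 240, 300, 320]


def share_cut_off(durations: list[int], timeout: float) -> float:
    """Fraction of flows that take longer than the session timeout."""
    too_long = [d for d in durations if d > timeout]
    return len(too_long) / len(durations)


average = mean(durations)  # 180 seconds, i.e. 3 minutes
for timeout in (average, 240, 330):
    cut = share_cut_off(durations, timeout)
    print(f"timeout {timeout:.0f}s cuts off {cut:.0%} of flows")
```

With this made-up sample, a timeout equal to the average interrupts a third of the flows. Looking at how many people each candidate value would cut off is probably a safer way to choose a timeout than looking at the average alone.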

Average alone means nothing

The following example was adapted from UOL Educação (in Portuguese).

Your street has several children and you want to develop a product for them. You go from house to house asking the children's ages and set up the following table:

Ages: [9, 9, 9, 1, 1, 1]; 
Average = 5

Then you make your products aimed at 5-year-olds. Do you agree that 1-year-olds and 9-year-olds are in very different stages of development than 5-year-olds? You do not sell a single unit of your product.

"Average must be accompanied by at least the standard deviation"

If you calculate the standard deviation of this sample (treating the six children as the whole population), you will find a value of 4. That is, the ages sit at 5 plus or minus 4: despite the mean being 5, you can have children aged 5+4=9 and 5-4=1, exactly as in your sample!
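A quick way to check these numbers is Python's standard `statistics` module. This is just a sketch of the calculation. Note that the value of exactly 4 comes from the population standard deviation; the sample standard deviation, which divides by n-1, is a little larger.

```python
from statistics import mean, pstdev, stdev

ages = [9, 9, 9, 1, 1, 1]  # the ages from the table above

average = mean(ages)          # 5
population_sd = pstdev(ages)  # 4.0: treats the six children as the whole population
sample_sd = stdev(ages)       # about 4.38: treats them as a sample of a larger group

print(f"mean={average}, population sd={population_sd:.2f}, sample sd={sample_sd:.2f}")
print(f"mean - sd = {average - population_sd:.0f}, mean + sd = {average + population_sd:.0f}")
```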

Another example, which is the time taken for a hospital discharge process, was taken from Minitab.

[Figure: two graphs of hospital discharge times, from Minitab]

In the two graphs above, the average time is 35 minutes (dashed line), but the standard deviations are 6 and 20, respectively. That is, the average in the first case is more reliable than in the second.

In the first case you would typically wait between 29 and 41 minutes (30-40 minutes, rounding). But in the second, you would wait anywhere from 15 to 55 minutes! Imagine asking "How long is the wait?" and hearing "From 15 to 55 minutes". Would you consider that a reliable measure?
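The ranges above are simply the mean minus and plus one standard deviation. A small sketch of that arithmetic follows; `one_sd_range` is a name made up for this example. As a rule of thumb, if the data is roughly normally distributed (an assumption, since the text does not say so), about 68% of the values fall inside this range, so some waits will still fall outside it.

```python
def one_sd_range(mean: float, sd: float) -> tuple[float, float]:
    """Return (mean - sd, mean + sd)."""
    return (mean - sd, mean + sd)


# Mean and standard deviations quoted for the two hospital discharge graphs.
for sd in (6, 20):
    low, high = one_sd_range(35, sd)
    print(f"sd={sd}: {low}-{high} minutes, width {high - low} minutes")
```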

Use other central tendency measurements

Measures of central tendency indicate a value that best represents the entire data set, that is, they give the tendency of concentration of observed values. (Ref. Introdução à Estatística Econômica, UAB/FURG. In Portuguese)

In addition to the average, when doing a data analysis, also take a look at the

  • median: value that is in the middle of the sequence, when the data are arranged in ascending order;
  • mode: value or values that occur most frequently.
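As a small sketch, here are the three measures computed with Python's standard `statistics` module on the children's ages from the street example. The mean and the median both say 5, and only the mode reveals that there are really two groups.

```python
from statistics import mean, median, multimode

ages = [9, 9, 9, 1, 1, 1]  # ages from the street example earlier in the text

print("mean:  ", mean(ages))       # 5
print("median:", median(ages))     # 5.0 (average of the two middle values, 1 and 9)
print("modes: ", multimode(ages))  # [9, 1]
```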

See the dataset below, taken from <https://matematicabasica.net/media-moda-e-mediana/>

[Figure: dataset from matematicabasica.net]

You can see that the values are widely dispersed, and this is confirmed by the measurements:

  • average: 42.1
  • standard deviation: 23.2
  • median: 45.0
  • mode: 20.0 and 50.0 (bimodal)

Imagine that these were the ages of the people using your app and you need to focus your ads on one age group because your budget is low. What age would you focus on?

With that alone, can I make an accurate analysis?

No! For example, do our customers use Android or iOS? This is a hypothesis to be tested, so it is necessary to do a hypothesis test. It is also necessary to define the confidence interval.
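To give an idea of what a confidence interval looks like, here is a minimal sketch for the Android share. It uses the simple normal-approximation (Wald) interval for a proportion, which is only one of several possible methods. The sample of 200 surveyed customers is hypothetical, since the text never says how many customers were behind the 70%.

```python
from math import sqrt

Z_95 = 1.96  # z-value for a 95% confidence level (normal approximation)


def proportion_confidence_interval(
    successes: int, n: int, z: float = Z_95
) -> tuple[float, float]:
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    margin = z * sqrt(p * (1 - p) / n)
    return (max(0.0, p - margin), min(1.0, p + margin))


# Hypothetical sample: 140 Android users among 200 surveyed customers (70%).
low, high = proportion_confidence_interval(successes=140, n=200)
print(f"Android share: 95% CI from {low:.1%} to {high:.1%}")
print("Is 90% inside the interval?", low <= 0.90 <= high)
```

With this made-up sample, the interval stays far from 90%, so a second source reporting 90% would be a sign that the two numbers are measuring different things and should be questioned before either one drives the strategy.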

The data must also be checked for consistency, for outliers, and for which distribution it follows (normal, chi-square, etc.).
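As one possible outlier check, the sketch below applies the common 1.5 × IQR rule (Tukey's fences). The durations are hypothetical values invented for this example, and `iqr_outliers` is an illustrative helper, not a standard function.

```python
from statistics import mean, quantiles

# Hypothetical session durations in seconds; 900 is a deliberately odd value.
durations = [90, 100, 120, 130, 150, 160, 170, 180, 200, 240, 300, 900]


def iqr_outliers(values: list[float]) -> list[float]:
    """Values outside the 1.5 * IQR fences (Tukey's rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low_fence or v > high_fence]


outliers = iqr_outliers(durations)
cleaned = [v for v in durations if v not in outliers]
print("outliers:", outliers)
print(f"mean with outliers: {mean(durations):.0f}s, without: {mean(cleaned):.0f}s")
```

A single odd value is enough to move the average noticeably here. Whether an outlier should be dropped or investigated is a separate decision that depends on what the value represents.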

Anyway, did you realize that just an average is basically useless?

Conclusion

The purpose of this text is to warn that you cannot make decisions based on a single piece of information, not to claim that you need fancy calculations (such as hypothesis testing) or statistical training to make them.

When you read the news, you should look for more information to confirm that it is true. The same goes for data. You do not necessarily need to consult another data source, since you will often have only one, but look for other interpretations and analyses of the same dataset before making a decision.
