
Emma Donery

Originally published at emmadonery.hashnode.dev

Data Fallacies: 16 Common Data Fallacies and How to Avoid Them


The core advantage of data is that it tells you something about the world that you didn't know before. ~ Hilary Mason, data scientist and founder of Fast Forward Labs

Introduction👋

Data collection and analysis are essential to the long-term success of your business. Whether it be profitability analysis or looking for ways to improve processes, data guides the process. However, data analysis is not something to jump into blindly. While working on drawing meaningful conclusions from your data, you are likely to encounter many challenges. It may be difficult to collect or identify the right data streams. The data itself may be incomplete, inconsistent, or unavailable. However, having identified the data you need, and why you need it, you still may not be out of the woods.

What are Data Fallacies

Is your data tricking you? 🤔

Data fallacies are common tricks that data can play on you, leading to mistakes in data interpretation and analysis. This article points out the most common data fallacies and how to avoid them.

🔷 Cherry Picking 🍒


Cherry picking, also known as suppressing evidence or the fallacy of incomplete evidence, is the practice of selecting results that fit your claim and excluding those that don't. Put simply, it is the act of pointing to individual cases or data that seem to confirm a particular position while ignoring a significant portion of related cases or data that may contradict that position.

Example: An example of intentional cherry picking is when a person or organization cites only a small number of the studies published on a certain topic in order to make it look as if the scientific consensus matches their own position.

  • Cherry picking can be deliberate or accidental. Intentional cherry picking, which purposely omits available evidence, is often done to make arguments more persuasive and to support a particular position. Unintentional cherry picking is an example of how people process information in order to make decisions: people who engage in it tend to process information in a way that confirms the beliefs they already have. When people feel they are right and encounter new information, or recall old information, they tend to focus on what confirms their beliefs and ignore what contradicts them.

How to avoid :

  • The easiest way to avoid cherry-picking is not to do it! Researchers should always present the full range of their findings, not just the results that make them seem most credible.
  • When reporting on findings, researchers should be careful to avoid choosing words that imply meaning beyond what actually happened in the experiment.
  • Researchers can prevent cherry picking when choosing participants for a study and when presenting their research by adopting solid research practices, including expanding the sample size, carrying out double-blind studies, and carefully selecting their words.
  • Researchers should utilize respondents from a variety of backgrounds (where possible) to prevent cherry-picking over the course of conducting a study and to reduce the prejudice that occurs with a limited perspective. Or, if their study just needs a smaller set of participants, they shouldn't try to generalize their results to a wider audience.
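As a toy illustration (the effect sizes below are invented), compare the average effect across all studies with the average over only the "favorable" ones:

```python
from statistics import mean

# Hypothetical effect sizes from ten studies of the same question.
all_studies = [0.8, -0.3, 0.1, 0.9, -0.5, 0.2, -0.1, 0.7, -0.4, 0.0]

# Cherry picking: report only the studies that support the claim.
favorable = [e for e in all_studies if e > 0.5]

print(f"Mean effect, all studies:   {mean(all_studies):.2f}")  # 0.14
print(f"Mean effect, cherry-picked: {mean(favorable):.2f}")    # 0.80
```

The full evidence suggests almost no effect; the cherry-picked subset suggests a strong one.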

🔶 Cobra effect 🐍


The cobra effect is also known as a perverse incentive.

Definition: The Cobra Effect is when an attempted solution results in unintended consequences.

Origin: The term originates from the story of a policy pursued by the British colonial government in India to tackle the menace caused by a huge population of cobras out in the open. It tried to incentivize the capture of cobras by providing a bounty, but the policy led to an actual increase in the cobra population as people began to breed new cobras in order to seek the bounty.

The cobra effect is usually cited to emphasize that good intention alone does not necessarily translate into desirable results.

How to Avoid the Cobra Effect

There are a few ways to avoid the cobra effect in survey research.

  • The first step is always to be aware of the possibility that there may be an unexpected side effect to the survey or study you're running.
  • A second way to avoid the cobra effect is to have a robust test-and-repeat process for your surveys. By continually testing and retesting your surveys, you'll be able to spot potential issues before they become problems.
  • The third step is to be clear with your question wording and instructions. There are lots of ways that ambiguous language can lead to unintended consequences in surveys, so be especially careful here.
  • Finally, it may help to use a mixture of different types of questions in your surveys where possible. For example, you might use both multiple-choice and open-ended questions in order to get more accurate information about what your respondents are thinking and feeling.

🔷 Danger of summary Metrics


Summary metrics take a series of data and condense it into a single data point, like a total or an average. It can be misleading to look only at the summary metrics of a data set.

How to avoid: Never trust summary statistics alone; always visualize your data so you have a clearer picture of how and why your metrics are changing.
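The classic demonstration is Anscombe's quartet: four data sets with nearly identical summary statistics but wildly different shapes when plotted. A quick check of two of them, using only the standard library:

```python
from statistics import mean, stdev

# Two of the four data sets in Anscombe's quartet.
x1 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Nearly identical summaries, yet set 1 is a noisy line while set 4 is
# a vertical strip plus one outlier -- only a plot reveals that.
print(round(mean(y1), 2), round(mean(y4), 2))                 # 7.5 7.5
print(round(pearson(x1, y1), 2), round(pearson(x4, y4), 2))   # 0.82 0.82
```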

🔶 Data dredging


Also known as data fishing, data snooping, or data butchery.

Data dredging is the practice of performing many statistical tests and reporting only the ones that come back with an interesting result, while failing to acknowledge that those correlations may in fact be the result of chance. Simply put, it is seeking more information from a data set than it actually contains.

It is the inverse of Cherry Picking. With Cherry Picking you pick the data that is most interesting, and with Data Dredging you pick the conclusion that is most interesting.

Solution: First formulate a hypothesis and then test it. Do not use the same data to both construct and test your hypothesis.
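A minimal sketch of how dredging produces false positives: test many "hypotheses" on pure noise and count how many look significant in isolation (the ±10 threshold roughly corresponds to p < 0.05 for 100 fair coin flips):

```python
import random

random.seed(0)

# Dredge: run 1000 experiments on fair coins (pure noise) and flag any
# experiment whose result would look "significant" in isolation.
flagged = 0
for _ in range(1000):
    heads = sum(random.random() < 0.5 for _ in range(100))
    if abs(heads - 50) >= 10:  # roughly the p < 0.05 region
        flagged += 1

# Even though every coin is fair, dozens of experiments look
# "interesting" by chance alone.
print(flagged)
```

If you had formulated a single hypothesis first and tested it once, only that one test's error rate would apply.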

🔷 Gambler's Fallacy


The gambler's fallacy is also known as the Monte Carlo fallacy.

It is the mistaken belief that because something has happened more frequently than usual, it is now less likely to happen in the future, and vice versa.

How to avoid : Ensure you evaluate whether your assumptions are based on statistical likeliness or more personal intuition.
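You can check the fallacy empirically with a quick simulation: even immediately after a streak of five heads, the next flip of a fair coin still comes up heads about half the time:

```python
import random

random.seed(1)

flips = [random.random() < 0.5 for _ in range(200_000)]  # True = heads

# After a streak of five heads, is tails "due"? Record what actually
# happens on the very next flip.
next_after_streak = [flips[i + 5] for i in range(len(flips) - 5)
                     if all(flips[i:i + 5])]

frac_heads = sum(next_after_streak) / len(next_after_streak)
print(round(frac_heads, 2))  # close to 0.5 -- the coin has no memory
```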

🔶 False Causality


This is a common mistake, known by the Latin phrase 'cum hoc ergo propter hoc' ('with this, therefore because of this'). It is the error of assuming that when two events occur together, one must have caused the other.

Solution : When you see a correlation, no conclusion can be drawn about the existence or the direction of a cause-and-effect relationship. If you do not know the causation, do more research. In other words, never assume causation because of correlation alone; always gather more evidence.
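A small synthetic example: two series that merely share a common trend (here, time) are strongly correlated even though neither causes the other:

```python
import random
from statistics import mean, stdev

random.seed(2)

# Two independent series that both grow with time (a shared common
# cause), e.g. ice-cream sales and drowning incidents over the years.
# The numbers are purely synthetic.
t = range(100)
a = [i + random.gauss(0, 5) for i in t]
b = [2 * i + random.gauss(0, 5) for i in t]

def pearson(xs, ys):
    """Sample Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

# Strong correlation, yet neither series causes the other.
print(round(pearson(a, b), 2))
```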

🔷 Gerrymandering


It is the practice of deliberately manipulating boundaries of political districts in order to sway the result of an election.

It is possible in many political systems to change the boundaries of electoral districts to favor one party over another, for example by including more rural areas in one district to disadvantage the party that is more popular in cities. When examining data, a related phenomenon called the Modifiable Areal Unit Problem (MAUP) can occur: the outcome can vary depending on how you define the regions in which to aggregate your data, such as how you define "Northern counties." The scale of grouping also matters: whether you aggregate by postcode, county, or state, the results can vary greatly.
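A toy sketch (with invented precinct numbers) of how the same votes can produce opposite results under different district boundaries:

```python
# Nine precincts of 100 voters each; votes for party X per precinct.
precincts = [55, 55, 55, 55, 55, 55, 10, 10, 10]  # X has 40% overall

def seats_won(plan):
    """Count districts (groups of three precincts) where X has a majority."""
    return sum(sum(precincts[i] for i in district) > 150 for district in plan)

plan_a = [(0, 1, 2), (3, 4, 5), (6, 7, 8)]  # X narrowly wins two districts
plan_b = [(0, 3, 6), (1, 4, 7), (2, 5, 8)]  # X's voters diluted everywhere

print(seats_won(plan_a), seats_won(plan_b))  # 2 0 -- same votes, opposite result
```

With 40% of the vote, party X takes a majority of seats under one plan and none under the other.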

🔶 Hawthorne effect


Also known as the Observer Effect.

It is the effect that something changes just because you are observing it. It often occurs when collecting data on human research subjects.

Solution : When using human research subjects, it's important to analyze the resulting data with consideration for the Hawthorne effect.

🔷 McNamara Fallacy


It is also known as the quantitative fallacy.

The McNamara Fallacy is the mistake of making a decision based solely on metrics and ignoring all others. Relying solely on metrics in complex situations can cause you to lose sight of the bigger picture.

It is named for Robert McNamara, the US Secretary of Defense (1961–1968), who believed truth could only be found in data and statistical rigor. He measured success in the Vietnam War by enemy body count, ignoring other important signals such as the mood of the US public and the feelings of the Vietnamese people.

How to avoid : Although data and numbers can tell you a lot, you should not obsess over optimising numbers while ignoring all other information.

🔶 Overfitting


Overfitting is probably the best-known fallacy. According to Investopedia, overfitting is "an error that occurs in data modeling as a result of a particular function aligning too closely to a minimal set of data points."

Techniques to reduce overfitting:

  • Increase training data.
  • Reduce model complexity.
  • Early stopping during the training phase (monitor the loss during training; as soon as the validation loss begins to increase, stop training).
  • Ridge regularization and lasso regularization.
  • Use dropout for neural networks to tackle overfitting.
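A small sketch of the symptom, assuming NumPy is available: a degree-9 polynomial fitted to ten noisy points from a straight line achieves a far lower training error than a degree-1 fit, which is exactly the warning sign of memorized noise:

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten noisy samples from a simple straight line, y = 2x + noise.
x = np.linspace(0.0, 1.0, 10)
y = 2.0 * x + rng.normal(0.0, 0.3, size=10)

simple_fit = np.polyfit(x, y, deg=1)   # matches the true model's complexity
complex_fit = np.polyfit(x, y, deg=9)  # enough freedom to memorize the noise

def train_mse(coeffs):
    """Mean squared error on the points the model was fitted to."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

# The degree-9 fit threads through every training point, so its training
# error is near zero -- a warning sign, not an achievement: on fresh data
# it will generalize far worse than the honest degree-1 fit.
print(f"train MSE, degree 1: {train_mse(simple_fit):.4f}")
print(f"train MSE, degree 9: {train_mse(complex_fit):.2e}")
```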

🔷 Publication Bias


Publication bias is the failure to publish the results of a study on the basis of the direction or strength of the study's findings. How interesting a research finding is affects how likely it is to be published, distorting our impression of reality.

🔶 Regression Toward the mean


It is also called reversion to the mean or reversion to mediocrity.

Regression toward the mean refers to the fact that if one sample of a random variable is extreme, the next sampling of the same random variable is likely to be closer to its mean. It is just a fancy way of saying that when something unusually good or bad happens, over time it will revert towards the average.
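A quick simulation with synthetic "skill plus luck" scores: the top performers on one test score noticeably worse, on average, when retested:

```python
import random
from statistics import mean

random.seed(3)

# Each "player" has a fixed skill; each test score = skill + luck.
skill = [random.gauss(0, 1) for _ in range(1000)]
test1 = [s + random.gauss(0, 1) for s in skill]
test2 = [s + random.gauss(0, 1) for s in skill]

# Pick the top 10% performers on test 1 ...
top = sorted(range(1000), key=lambda i: test1[i], reverse=True)[:100]

# ... and watch their average fall back toward the mean on test 2:
print(round(mean(test1[i] for i in top), 2))  # high: skill AND good luck
print(round(mean(test2[i] for i in top), 2))  # lower: the luck doesn't repeat
```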

🔷 Sampling Bias


This occurs when conclusions are drawn from a set of data that isn't representative of the population you're trying to understand.

How to avoid : Make sure that your data sample represents the population accurately, so that whatever conclusions you draw about your sample also hold for your population.
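A small simulation with invented numbers: surveying only one subgroup produces a noticeably biased estimate of the population mean:

```python
import random
from statistics import mean

random.seed(4)

# Population: 70% desktop users (avg session ~10 min), 30% mobile (~4 min).
population = ([random.gauss(10, 2) for _ in range(7000)] +
              [random.gauss(4, 2) for _ in range(3000)])

true_mean = mean(population)  # around 8.2

# Biased sample: surveying only desktop users (the first group).
biased = random.sample(population[:7000], 500)

# Representative sample: drawn from the whole population.
representative = random.sample(population, 500)

print(round(true_mean, 1), round(mean(biased), 1), round(mean(representative), 1))
```

The biased sample lands near 10 minutes regardless of sample size, while the representative one lands near the true average.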

🔶 Simpson's Paradox


Also known as the Yule–Simpson effect.

It is a phenomenon in which a trend appears in different groups of data but disappears or reverses when the groups are combined.

How to avoid : This fallacy is difficult to overcome beforehand. However, if you ever encounter this weird phenomenon by finding a bias that reverses if you look at different groups within your data, then know that you have not necessarily made a mistake. You may simply have found an example of Simpson's paradox.
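The well-known kidney-stone data set (Charig et al., 1986) shows the reversal directly:

```python
# Kidney-stone treatment data (Charig et al., 1986): (successes, patients).
a_small, a_large = (81, 87), (192, 263)   # treatment A, by stone size
b_small, b_large = (234, 270), (55, 80)   # treatment B, by stone size

def rate(group):
    successes, patients = group
    return successes / patients

# A wins within EVERY group ...
print(rate(a_small) > rate(b_small))  # True  (93% vs 87%)
print(rate(a_large) > rate(b_large))  # True  (73% vs 69%)

# ... yet B wins when the groups are combined, because B was mostly
# given the easy (small-stone) cases and A mostly the hard ones.
a_all = (81 + 192, 87 + 263)
b_all = (234 + 55, 270 + 80)
print(rate(a_all) < rate(b_all))      # True  (78% vs 83%)
```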

🔷 Survivorship Bias


Survivorship bias is the error of drawing conclusions from an incomplete set of data because that data has 'survived' some selection criteria.

How to avoid : When concluding something about data that has survived a selection process, make sure you do not generalize this conclusion to the entire population.
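A toy example with invented fund returns: once the worst performers are removed from the data, the average of the "survivors" paints a far rosier picture:

```python
from statistics import mean

# Hypothetical annual returns (%) of 10 funds launched the same year.
returns = [12.0, 7.5, -15.0, 3.0, -22.0, 9.0, 1.5, -8.0, 14.0, 5.0]

# Funds that lost heavily were closed and vanish from today's listings.
survivors = [r for r in returns if r > -5]

print(round(mean(returns), 2))    # 0.7  -- the real average
print(round(mean(survivors), 2))  # 7.43 -- what the surviving data suggests
```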

🔶 Texas Sharpshooter Bias


It arises when a person has a large amount of data but focuses on only a small subset of it, in many cases because that subset leads to the most interesting conclusion.

It is named after a fictitious sharpshooter who lets off a lot of shots at the side of a barn, looks at it, finds a tight grouping of hits, paints a target around it, and then claims to be a 'great sharpshooter'. This bias is related to the clustering illusion, which is the tendency in human cognition to interpret patterns where none actually exist.

Data tip 💡 : Data is not about adding more to your plate. Data is about making sure you have the right things on your plate.

Final thoughts 💭

Analyzing data comes with its own pitfalls. Data is not, in and of itself, the key to success. However, when collected, analyzed, and used in a precise manner, with precise goals, it can make the difference between success and failure. To do that, you need to make sure that the entire process is free of bias. Now that you are aware of the pitfalls you may encounter, you are ready to tackle your data and start analyzing.

Please feel free to leave feedback.

We can connect on Twitter | Linkedin | Instagram

If you find my articles helpful, you can support my work by buying me a coffee.


Thank you
