Dave Amiana

Posted on Apr 26, 2021

Critical Introduction to Probability and Statistics: Fundamental Concepts & Learning Resources

#datascience #computerscience #machinelearning

Probability and statistics are probably the most widely applied mathematical concepts, and anyone can learn them. With great power comes great responsibility. Let us remember our philosophical grounds before we begin our scientific journey.

Most people use statistics like a drunk man uses a lamppost; more for support than illumination. - Andrew Lang

There is a $\frac{1}{6}$ chance of getting a 6 when you roll a die. Eating chocolate (given $X, Y, Z$) can reduce your body fat by up to 10%. Brand $X$ is statistically proven to have a significant effect on $Y$. Are headlines like these familiar to you? What do we mean by statistically proven? What methods were used in these analyses? How do [or should] we go about interpreting statistical results? More so, how do we understand statistics at all? Should we rely on method $X$ just because the author mentioned it in their study?

There are practical concerns with analyzing and interpreting statistical results. Consider a study that strongly recommends drug X to patients with illness Y but fails to report the false-positive rate of its analysis, probably because the analysis never accounted for false positives and false negatives. The drug could do more harm than good [and we cannot know that for sure], since the flawed statistical analysis and poor interpretation of the experimental results deviate from what could actually happen. Indeed, we should not just skim through the numbers and convince ourselves that a study’s claim is reasonable. We should check that its methodology is justified and that its premises actually lead to its conclusion. This scenario is just one example of why a well-grounded foundation in Probability and Statistics matters: it enables us to properly interpret results relative to the event with which we are concerned.

Given my fair share of experience in statistical modeling (for time-series forecasting) and data analysis, I have tried some of the most sophisticated statistical programs, such as SPSS, EViews, GMDH Shell, and GraphPad Prism. They helped me implement sophisticated statistical models that are well established in practice for problems akin to my research projects. But even with systematic statistical treatments, what troubled me in the end was interpreting the results of my experimental data. How can I make sense of my seemingly random data? Of course, a reasonable response would be to consult your research adviser, or to ask the researchers whose methods inspired your approach to elaborate on them. But if you happen to ask a lot of questions (Why did you use this value as your threshold? Why does this method make sense? How would you justify your assumption?), you are led to problems that have concerned the theory of statistics for years, namely the varying interpretations of probability.

I learned that a firm understanding of the reasoning behind the fancy methods in Statistics improves our ability to scrutinize scientific methodologies and interpret experimental results.

“It is unanimously agreed that statistics depends somehow on probability. But, as to what probability is and how it is connected with statistics, there has seldom been such complete disagreement and breakdown of communication since the Tower of Babel” (Savage, 1972).

In this article, I hope to convince you to (1) be more responsible when interpreting statistical results, so as to avoid jumping to conclusions; and (2) learn the subject, starting from a brief overview of and introduction to some of its most fundamental concepts. This is not to be confused with basic statistics, where concepts such as sampling, population, and central tendency are defined, although I have linked notable learning resources for the basics of Statistics. Since there is already a plethora of learning resources out there, I take a slightly different perspective to build a solid grasp of the subject, which should benefit readers from a broader set of backgrounds.

Be warned: P-hacking!

A focus on novel, confirmatory, and statistically significant results leads to substantial bias in the scientific literature. One type of bias, known as “p-hacking,” occurs when researchers collect or select data or statistical analyses until nonsignificant results become significant (Head, Holman, Lanfear, Kahn, & Jennions, 2015).
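To make the danger concrete, here is a minimal simulation sketch (an illustration, not taken from Head et al.). It assumes a hypothetical study in which no real effect exists, each outcome measure is checked with a two-sided z-test with known unit variance, and only the smallest p-value is reported. The function names, the sample size of 30, and the 0.05 threshold are all illustrative assumptions.

```python
import random
from statistics import NormalDist, fmean


def z_test_p_value(sample):
    """Two-sided p-value for H0: mean = 0, assuming known unit variance."""
    z = fmean(sample) * len(sample) ** 0.5
    return 2 * (1 - NormalDist().cdf(abs(z)))


def false_positive_rate(n_outcomes, n_studies=2000, n=30, alpha=0.05, seed=0):
    """Share of no-effect studies that report at least one 'significant' outcome."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        # Every outcome is pure noise, so any "significant" result is a false positive.
        p_values = [
            z_test_p_value([rng.gauss(0, 1) for _ in range(n)])
            for _ in range(n_outcomes)
        ]
        if min(p_values) < alpha:  # report only the best-looking outcome
            hits += 1
    return hits / n_studies


honest = false_positive_rate(n_outcomes=1)
hacked = false_positive_rate(n_outcomes=20)
print(f"one pre-planned outcome: {honest:.3f}")
print(f"best of 20 outcomes:     {hacked:.3f}")
```

With one pre-planned outcome, the false-positive rate should stay close to the nominal 5%. When the best of 20 independent outcomes is reported, theory puts it at about $1 - 0.95^{20} \approx 0.64$, which the simulation should approximate.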

Overview

Statistics is a branch of mathematics concerned with data collection, organization, analysis, interpretation, and presentation [1-2]. It deals with every aspect of data, including the planning of data collection in terms of the design of surveys and experiments [3]. While experimental design and surveying are important in the practice of statistical analysis, this article focuses only on the fundamental features of statistics, which better suits our purpose.

Statistics is concerned with the use of data in the context of uncertainty and decision making.

Two main statistical methods are used in data analysis [4]:

descriptive statistics, which summarize data from a sample using indexes such as the mean or standard deviation.
inferential statistics, which draw conclusions from data that are subject to random variation (e.g., observational errors, sampling variation).

Descriptive statistics are most often concerned with two sets of properties of a distribution (sample or population): central tendency (or location) seeks to characterize the distribution’s central or typical value, while dispersion (or variability) characterizes the extent to which members of the distribution depart from its center and each other.
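As a small illustration, the sketch below computes measures of central tendency and dispersion with Python’s standard `statistics` module. The two samples are made up for the example; they share the same mean and median but differ in spread, which is why location alone does not describe a distribution.

```python
import statistics

# Two hypothetical samples with the same centre but different spread.
tight = [4, 5, 5, 5, 5, 5, 5, 6]
wide = [1, 1, 1, 1, 9, 9, 9, 9]

for name, sample in (("tight", tight), ("wide", wide)):
    mean = statistics.fmean(sample)     # central tendency (location)
    median = statistics.median(sample)  # central tendency, less sensitive to outliers
    spread = statistics.pstdev(sample)  # dispersion (population standard deviation)
    print(f"{name}: mean={mean:.2f} median={median:.2f} sd={spread:.2f}")
```

Both samples have a mean of 5, but the standard deviation is 0.5 for the first and 4 for the second.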

Inferences in mathematical statistics are made under the framework of probability theory, which deals with the analysis of random phenomena.

For a broad outline of Statistics, see https://en.wikipedia.org/wiki/Outline_of_statistics

Thus, statistical analysis follows a pattern similar to that of inductive inference: the data from our observations serve as inputs, and the statistical procedure outputs a decision (based on some threshold or boundary) or an evaluation that goes beyond the data, such as a statistical model that predicts future events.

Although we can agree that, for the most part, Statistics is broadly divided into these two categories (inferential and descriptive), this division concerns statistical methods only. It does not account for the theory underlying Statistics, namely the interpretations of probability, such as the Bayesian school of thought and the classical (frequentist) approach, which we cover more closely in the succeeding sections.

History

The earliest writing on statistics was found in a 9th-century book entitled Manuscript on Deciphering Cryptographic Messages, written by Arab scholar Al-Kindi (801–873). He made a detailed presentation on how to use statistics and frequency analysis to decipher encrypted messages. Scholars believed that Al-Kindi’s work laid out the foundations for statistics and cryptanalysis [5–6].

Early applications of statistical thinking revolved around the need for states to base policy on demographic and economic data. Statistical modes of tracking and gaining insights have become relevant since then.

The mathematical foundations of modern statistics were laid in the 16th and 17th centuries with the development of probability theory by Gerolamo Cardano, Blaise Pascal, and Pierre de Fermat. Mathematical probability theory arose from the study of games of chance, in which the outcome is strongly influenced by a randomizing device such as dice, shuffled cards, or a roulette wheel.

The initial conception of probability involved only discrete entities, e.g., the number of people or the number of cards. In other words, it drew a sharp distinction between one entity and another of the same class and did not accommodate the notion of continuity, such as the infinitely many numbers between 0 and 1. The early methods for analyzing discrete entities were mainly combinatorial.

Over time, analytical methods (those of mathematical analysis, a formalization of Calculus) were brought in to incorporate continuity into the event space, which contains all the possible events for a given set of variables. Building on this, the Soviet mathematician Andrey Nikolaevich Kolmogorov presented his axiom system for probability theory in 1933, and it became the mostly undisputed axiomatic basis for modern probability theory.

Why Study Statistics?

Today, statistical methods are applied in all fields that involve decision making, for making accurate inferences from a collated body of data, and for making decisions in the face of uncertainty based on statistical methodology. The use of modern computers has expedited large-scale statistical computations and has also made possible new methods that are impractical to perform manually [7].

The mathematics that forms the basis of statistics stems from probability theory and has a firm axiomatic foundation and rigorously proved theorems [7]. However, the interpretations of probability that are mainly applied in statistical analysis remain a controversial subject in the philosophy of science and epistemology.

From Probability Theory to Statistics

Statistics was antedated by the theory of probability. In fact, any serious study of statistics must of necessity be preceded by a study of probability theory, since the theory of statistics is founded on it. While it is commonly agreed that statistics depends on probability, the questions of what probability is and how it is connected with statistics have met with considerable disagreement [8].

While a wide variety of statistical procedures remain relevant today, most of them rely on modern measure-theoretic probability theory (Kolmogorov), and the others on a close relative of it, as the means to express hypotheses and relate them to data.

Probability is the most important concept in modern science, especially as nobody has the slightest notion of what it means (Russell, 1929).

What does probability mean? The mathematical notion of probability does not provide an answer to this. Hence, the formal axiomatization of probability does not guarantee that it be held meaningful for all possible worlds [11].
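For reference, Kolmogorov’s axioms can be stated as follows. The notation is an assumption of this sketch rather than the article’s own: $\Omega$ is the set of all possible outcomes, $\mathcal{F}$ is the collection of events, and $P$ assigns a number to each event.

$$
P(A) \ge 0 \ \text{ for every event } A \in \mathcal{F}, \qquad P(\Omega) = 1, \qquad P\Big(\bigcup_{i=1}^{\infty} A_i\Big) = \sum_{i=1}^{\infty} P(A_i) \ \text{ for pairwise disjoint } A_1, A_2, \ldots
$$

Nothing in these three statements says whether $P$ measures a long-run frequency or a degree of belief, so the question of interpretation stays open.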

Interpretations of Probability Theory [10-11]

Since probability is one of the foremost concepts in scientific investigation, and its relevance extends to the philosophy of science (in the analysis and interpretation of theories), epistemology, and the philosophy of mind, the foundations of probability [and its interpretations], which are of utmost relevance to our understanding of statistics, bear at least indirectly, and sometimes directly, upon scientific and philosophical concerns.

The probability function, a particular kind of function used to express the measure of a set (Billingsley, 1995), may be interpreted as either physical or epistemic. In addition, the American philosopher Wesley C. Salmon (1966) provides a set of criteria for an adequate interpretation of probability, briefly reviewed as follows [11]:

Admissibility: An interpretation is admissible if the meanings assigned to the primitive terms in the interpretation transform the formal axioms, and consequently all the theorems, into true statements.
Ascertainability: This criterion requires that there be some method by which, in principle at least, we can ascertain values of probabilities.
Applicability: The interpretation of probability should serve as a guide relative to the domain of discourse (or field of interest).

According to Salmon (as cited in Hájek, 2019), most of the work will be done by the applicability criterion. That is to say, more or less, that our choice of interpretation should fit the world in which we are interested. For example, Bayesian methods are more appropriate when we know the prior distribution over our event space, as when rolling a die, whose geometric symmetry suggests a natural distribution. For most experiments, however, Bayesian methods require the researcher’s guess to set a prior distribution over the hypotheses. This is where other interpretations may seem more appropriate.

Because we are concerned with deepening our understanding of statistics, I limit this article to the most relevant interpretations, which can be classified as physical or epistemic. For a more detailed treatment of the interpretations of probability, the reader is invited to consult the Stanford Encyclopedia of Philosophy entry on Interpretations of Probability [11].

Physical: the frequency or propensity of the occurrence of a state of affairs, often referred to as the chance.
Epistemic: the degree of belief in the occurrence of the state of affairs, the willingness to act on its assumption, a degree of support or confirmation, or similar.
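Under the physical (frequency) reading, the $\frac{1}{6}$ chance of rolling a 6 describes what happens over a long run of rolls. A minimal simulation sketch, assuming a fair six-sided die modelled with Python’s pseudo-random generator (the function name and seed are illustrative):

```python
import random


def relative_frequency_of_six(n_rolls, seed=0):
    """Fraction of simulated fair-die rolls that come up 6."""
    rng = random.Random(seed)
    sixes = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 6)
    return sixes / n_rolls


for n in (10, 1_000, 100_000):
    print(f"{n:>7} rolls: {relative_frequency_of_six(n):.4f}")
```

The relative frequency should settle near $\frac{1}{6} \approx 0.167$ as the number of rolls grows. Under the epistemic reading, by contrast, the same number can be assigned before a single roll, as a degree of belief motivated by the die’s symmetry.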

According to University of Groningen philosophy professor Jan-Willem Romeijn (2017), the distinction should not be confused with that between objective and subjective probability. Both physical and epistemic probability can be given an objective and subjective character, in the sense that both can be taken as dependent or independent of a knowing subject and her conceptual apparatus.

Meanwhile, the long-standing debate between two interpretations of probability, one based on objective evidence and the other on subjective degrees of belief, led mathematicians such as Carl Friedrich Gauss and Pierre-Simon Laplace to search for alternatives more than 200 years ago. As a result, two competing schools of statistics developed: Bayesian theory and Frequentist theory.

Note that some authors may define the classical interpretation of probability as Bayesian while classical statistics is frequentist. To avoid this confusion, I will use the term classical to refer to the Frequentist theory.

In the following subsections, I will briefly define the key concepts of the Bayesian and Frequentist theories of Statistics, drawing on [14].

1. Bayesian Theory

The controversial key concept of the Bayesian school of thought is its assumption of prior probabilities, which may rest solely on the researcher’s guess about, or confidence in, their hypotheses. But there are also good reasons for using Bayesian methods over the Frequentist approach. The following highlights the key ideas behind why you should or should not use Bayesian methods for your analysis.

Bayesian inference depends on one’s degree of confidence in the chosen prior. Bayesian inference uses probabilities for both hypotheses and data; it depends on the prior and likelihood of observed data [14].
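In symbols, this dependence is Bayes’ theorem. The letters are this sketch’s notation, not the article’s: $H$ is a hypothesis and $D$ is the observed data.

$$
P(H \mid D) = \frac{P(D \mid H)\, P(H)}{P(D)}
$$

Here $P(H)$ is the prior, $P(D \mid H)$ is the likelihood of the data under the hypothesis, and $P(H \mid D)$ is the posterior. Since $P(D)$ does not depend on $H$, the posterior is proportional to the likelihood times the prior.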

Criticisms [14]

Subjective nature of selecting priors. There is no systematic method for selecting priors.
Hypotheses do not constitute outcomes of repeatable experiments, so assigning (subjective) probabilities to them is open to philosophical objection.

Reasons for using Bayesian Methods [14–15]

Using Bayesian methods is logically rigorous because once we have a prior distribution, all our calculations carry the certainty of deductive logic [14]. Philosophers of science usually come down strongly on the Bayesian side [15].
The simplicity of the Bayesian approach is especially appealing in a dynamic context, where data arrives sequentially, and where updating one’s beliefs is a natural practice [15].
By trying different priors, we can ascertain how sensitive our results are to the choice of priors [14].
It is relatively easier to communicate a result framed in terms of probabilities of hypotheses.
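The points about sequential updating and prior sensitivity can be sketched with a coin of unknown bias. The example assumes a Beta prior, a standard conjugate choice for coin-flip data that turns the update into adding counts; the priors, the flip counts, and the function names are all hypothetical.

```python
from fractions import Fraction


def update(prior, heads, tails):
    """Posterior Beta(a, b) parameters after observing coin flips."""
    a, b = prior
    return (a + heads, b + tails)


def posterior_mean(params):
    """Mean of a Beta(a, b) distribution: the estimated probability of heads."""
    a, b = params
    return Fraction(a, a + b)


# Sequential updating: two batches give the same posterior as one pooled batch.
uniform = (1, 1)
step_by_step = update(update(uniform, heads=3, tails=1), heads=4, tails=2)
all_at_once = update(uniform, heads=7, tails=3)
print(step_by_step == all_at_once)

# Prior sensitivity: same data (7 heads, 3 tails), three different priors.
priors = {"uniform": (1, 1), "sceptical of bias": (20, 20), "expects heads": (8, 2)}
for name, prior in priors.items():
    print(f"{name}: {float(posterior_mean(update(prior, 7, 3))):.3f}")
```

With the same 7 heads and 3 tails, the posterior mean is about 0.667 under the uniform prior, 0.540 under the sceptical prior, and 0.750 under the prior that already expects heads. How far apart these land is a direct measure of how much the conclusion leans on the prior.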

2. Frequentist Theory

While Bayesian methods rely on priors, Frequentism focuses on behavior. The frequentist approach uses conditional distributions of data given specific hypotheses. It does not depend on a subjective prior that may vary from one researcher to another. However, there are some objections that one has to keep in mind when deciding to use a frequentist approach.

Criticism [14]

Struggles to balance behavior over a family of possible distributions.
It is highly experimental; it does not carry the force of deductive logic. P-values depend on the exact experimental set-up (a p-value is the probability, assuming the null hypothesis holds, of data at least as extreme as what was observed).
P-values and significance levels (the p-value is compared against the significance level, a preset threshold, to reach an inferential decision) are notoriously prone to misinterpretation [14].
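The dependence on the experimental set-up can be illustrated with a well-known textbook example, added here for illustration: a coin lands 9 heads and 3 tails, and we test whether it is biased towards heads. The sketch assumes a one-sided test against a fair coin and compares two hypothetical designs that produce exactly the same data. The function names are illustrative.

```python
from fractions import Fraction
from math import comb


def p_value_fixed_n(heads=9, n=12):
    """Design A: toss exactly n times. P(at least `heads` heads | fair coin)."""
    return Fraction(sum(comb(n, k) for k in range(heads, n + 1)), 2 ** n)


def p_value_stop_at_tails(heads=9, tails=3):
    """Design B: toss until the `tails`-th tail. P(at least `heads` heads first | fair coin)."""
    # Equivalent event: fewer than `tails` tails among the first heads + tails - 1 tosses.
    m = heads + tails - 1
    return Fraction(sum(comb(m, k) for k in range(tails)), 2 ** m)


p_a = p_value_fixed_n()
p_b = p_value_stop_at_tails()
print(f"fixed 12 tosses:     p = {p_a} = {float(p_a):.4f}")
print(f"stop at third tail:  p = {p_b} = {float(p_b):.4f}")
```

The data are identical, yet the p-value is $299/4096 \approx 0.073$ when the number of tosses was fixed in advance and $67/2048 \approx 0.033$ when tossing stopped at the third tail. At a 0.05 significance level, only the second design would be declared significant.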

Reasons for using Frequentist Methods [14]

The frequentist approach dominated in the 20th century, and we have achieved tremendous scientific progress [14].
The Frequentist experimental design demands a careful description of the experiment and methods of analysis before starting — it helps control for the experimenter’s bias [14].

Comparison:

Comparative analysis by Little (2005):

| | Bayesian | Frequentist |
|---|---|---|
| `+` | complete | inferences well calibrated |
| | coherent | no need to specify prior distributions |
| | prescriptive | flexible range of procedures |
| | | unbiasedness, sufficiency, ancillarity... |
| | | widely applicable and dependable |
| | | asymptotic theory |
| | | easy to interpret |
| | | can be calculated by hand |
| | strong inference from model | strong model formulation & assessment |
| `-` | too subjective for scientific inference | incomplete |
| | denies the role of randomization for design | ambiguous |
| | requires and relies on full specification of a model (likelihood and prior) | incoherent |
| | | not prescriptive |
| | | not a unified theory |
| | | (over?)emphasis on asymptotic properties |
| | weak model formulation & assessment | weak inference from model |

The difference is that Bayesians aim for the best possible performance versus a single (presumably correct) prior distribution, while frequentists hope to do reasonably well no matter what the correct prior might be [15].

Important Notes:

The competition between the Frequentist approach to statistical analysis and Bayesian methods has been around for over 250 years. Each school of thought has been challenged by the other. The Bayesian method has been greatly criticized for its subjective nature, while the Frequentist method has been questioned over the justification of the probability thresholds from which it draws inferences (significance levels applied to p-values). It is worth noting that, although the Frequentist approach to Statistics prevailed in 20th-century science, the Bayesian method has seen a resurgence in 21st-century Statistics.

For a more detailed discussion of this matter, the reader is invited to consult [15].

Takeaways

Throughout this article, we learned the importance of statistics in science and discussed a case where it can go wrong (p-hacking). We discussed the foundations of Statistics, namely the roots of statistical thinking through history and the debate between interpretations of probability. To hone this understanding, the reader is advised to read through the learning materials listed in the succeeding section.

Learning Materials and external resources you should check out:

Brilliant.org | Statistics
Khan Academy | Statistics & Probability
MIT OCW | Introduction to Probability & Statistics
MIT OCW RES.6-012 | Introduction to Probability (2018)
MIT OCW 18.650 | Statistics for Applications (2016)
Dr. Todd Grande | Statistical Analyses using SPSS
Brandon Foltz | Statistics Playlist
Kevin deLaplante | Reasoning with Probabilities

References

[1] Romeijn, Jan-Willem (2014). “Philosophy of statistics”. Stanford Encyclopedia of Philosophy.
[2] Moses, Lincoln E. (1986). Think and Explain with Statistics. Addison-Wesley. ISBN 978-0-201-15619-5. pp. 1–3.
[3] Dodge, Y. (2006). The Oxford Dictionary of Statistical Terms. Oxford University Press. ISBN 0-19-920613-9.
[4] Lund Research Ltd. “Descriptive and Inferential Statistics”. statistics.laerd.com.
[5] Singh, Simon (2000). The Code Book: The Science of Secrecy from Ancient Egypt to Quantum Cryptography (1st Anchor Books ed.). New York: Anchor Books. ISBN 978-0-385-49532-5.
[6] Al-Kadi, Ibrahim A. (1992). “The origins of cryptology: The Arab contributions”. Cryptologia, 16(2), 97–126.
[7] Varberg, D. E. (1963). “The development of modern statistics”. The Mathematics Teacher, 56(4), 252–257.
[8] Savage, L. J. (1972). Foundations of Statistics (2nd ed.).
[9] Wikipedia contributors (2019, July 20). “Statistics”. In Wikipedia, The Free Encyclopedia. Retrieved 14:09, July 27, 2019, from https://bityl.co/6Xtx
[10] Romeijn, Jan-Willem (2017). “Philosophy of Statistics”. The Stanford Encyclopedia of Philosophy (Spring 2017 Edition), Edward N. Zalta (ed.). https://plato.stanford.edu/archives/spr2017/entries/statistics/
[11] Hájek, Alan (2019). “Interpretations of Probability”. The Stanford Encyclopedia of Philosophy (Fall 2019 Edition), Edward N. Zalta (ed.). https://plato.stanford.edu/archives/fall2019/entries/probability-interpret/
[12] Head, M. L., Holman, L., Lanfear, R., Kahn, A. T., & Jennions, M. D. (2015). “The Extent and Consequences of P-Hacking in Science”. PLoS Biol 13(3): e1002106. https://doi.org/10.1371/journal.pbio.1002106
[13] Salmon, W. (1966). The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press.
[14] Orloff, J. & Bloom, J. (n.d.). Comparison of frequentist and Bayesian inference. Retrieved from https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading20.pdf
[15] Efron, Bradley (2013). “A 250-year argument: Belief, behavior, and the bootstrap”. Bulletin of the American Mathematical Society, New Series, 50(1), 129–146. doi:10.1090/s0273-0979-2012-01374-5

DEV Community