Sameer Soin

Posted on Sep 9, 2021 • Originally published at thewanderingengineer.hashnode.dev

Probability vs Likelihood

#machinelearning #statistics #probability

When I first came across the term 'likelihood' in the same sentence as the word 'probability', I was confused and could not make sense of the statement. This happened while I was learning about maximum likelihood estimation for logistic regression. The confusion came from my everyday understanding that the likelihood of something is simply its probability, and from the often incorrect use of these terms interchangeably (incorrect, that is, in the technical sense used in statistics). I started digging further to make sense of it, and in this post I try to explain the difference between likelihood and probability.

Understanding Probability

The common understanding of the probability of an event is the ratio of the number of favorable outcomes to the total number of possible outcomes of the experiment. This ratio only works when every outcome is equally likely. (In the example of a housing dataset, the event is getting a house whose price is more than x, and the experiment is picking a house at random from a sample.)
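A minimal sketch of the counting definition for the event "price more than x". It assumes a small, made-up list of prices (in thousands) and a picking process in which every house is equally likely to be chosen. `prob_price_above` is a hypothetical helper name.

```python
from fractions import Fraction

# Hypothetical sample of house prices (in thousands); not real data.
prices = [72, 85, 91, 104, 118, 96, 130, 88]

def prob_price_above(prices, x):
    """P(price > x), assuming every house is equally likely to be picked."""
    favorable = sum(1 for p in prices if p > x)  # outcomes in the event
    return Fraction(favorable, len(prices))      # favorable / total

print(prob_price_above(prices, 100))  # prints 3/8
```

The equal-likelihood assumption is what makes a plain count valid. If the picking process favored some houses, the ratio would no longer be the probability.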

Let's look a little deeper at this definition. It rests on some underlying assumptions about the experiment, or about the process behind it. For example, every house may be equally likely to be picked, or houses from every neighborhood may be equally likely to be picked. These assumptions, together with the values that describe the process, are known as the parameters of the experiment, \(\theta\).

When the parameters \(\theta\) of the experiment are fixed, they give a specific probability density function (PDF) for the outcomes, or a probability mass function in the case of discrete variables. Given the PDF, you can compute the probability of an event as the area under the curve over the region covered by that event. The image below shows this for house prices more than x.

This means that if a house is picked at random, the probability of its price being more than x is the area of the shaded region in the image above.
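A minimal sketch of the area idea. It assumes, hypothetically, that prices are normally distributed with a mean of 90k and a standard deviation of 15k. The code approximates the shaded area by numerically integrating the PDF with the trapezoidal rule. It then compares that result with the closed-form tail probability from Python's `statistics.NormalDist`.

```python
from statistics import NormalDist

# Hypothetical parameters theta: house prices ~ Normal(mean=90k, sd=15k).
theta = NormalDist(mu=90_000, sigma=15_000)

def area_under_pdf(dist, lower, upper, steps=20_000):
    """Approximate the area under dist.pdf between lower and upper (trapezoidal rule)."""
    h = (upper - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(upper))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h

x = 100_000
# Integrate far enough into the right tail that the remaining area is negligible.
numeric = area_under_pdf(theta, x, x + 10 * theta.stdev)
exact = 1 - theta.cdf(x)  # closed-form P(price > x | theta)
print(f"numeric area = {numeric:.6f}, exact = {exact:.6f}")
```

The two numbers agree closely, which shows that "probability of price more than x" and "area under the PDF to the right of x" are the same quantity.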

Mathematical representation

Given the parameters \(\theta\), the probability of an event is formally represented as:

\(P(\text{price} > x \mid \text{mean} = y, \text{variance} = z)\).

*Note: For the purpose of this example, I have assumed that house prices follow a normal distribution with some mean and variance. The concept applies to any distribution. The mean and variance characterize the PDF: the mean sets its position and the variance sets its spread. That is why they are written on the right-hand side of the 'pipe' symbol in the formal representation.*
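Under the normal-distribution assumption in the note, this probability can be written out explicitly. Here \(\mu\) is the mean (y), \(\sigma^2\) is the variance (z), and \(\Phi\) is the standard normal cumulative distribution function:

```latex
P(\text{price} > x \mid \mu, \sigma^2)
  = \int_{x}^{\infty} \frac{1}{\sqrt{2\pi\sigma^2}}
    \exp\!\left(-\frac{(t-\mu)^2}{2\sigma^2}\right) dt
  = 1 - \Phi\!\left(\frac{x-\mu}{\sigma}\right)
```

The integral is the "area under the curve" to the right of x. The last expression is the same area, looked up through the standard normal CDF.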

Since the parameters \(\theta\) define the PDF, and changing them changes the PDF, this can also be written as:

\(P(\text{price} > x \mid \theta)\) or \(P(\text{data} \mid \theta)\)

This means that, with the process parameters held fixed, the probability changes only when the event changes. For example, \(P(\text{price} > 100\text{k} \mid \theta)\) will differ from \(P(\text{price} > 80\text{k} \mid \theta)\).
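A minimal sketch of "fixed \(\theta\), varying event". It reuses hypothetical normal parameters (mean 90k, standard deviation 15k). `prob_above` is an illustrative helper, not an established API.

```python
from statistics import NormalDist

# Hypothetical fixed parameters theta (mean 90k, sd 15k); only the event changes.
theta = NormalDist(mu=90_000, sigma=15_000)

def prob_above(dist, x):
    """P(price > x | theta): the area under the PDF to the right of x."""
    return 1 - dist.cdf(x)

p_100k = prob_above(theta, 100_000)
p_80k = prob_above(theta, 80_000)
print(f"P(price > 100k | theta) = {p_100k:.4f}")
print(f"P(price > 80k  | theta) = {p_80k:.4f}")
```

Because 80k and 100k sit symmetrically around the assumed mean, the two probabilities add up to 1. The larger one belongs to the easier-to-satisfy event, price more than 80k.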

Understanding Likelihood

Often, when modeling a real-world process, the parameters \(\theta\) are unknown. We observe the outcomes O and try to estimate parameters \(\theta\) that plausibly explain them. In other words, given the observed outcomes O, we try to find the PDF or probability distribution that best explains O.

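In the image above, each change to the model parameters \(\theta\) produces a different PDF. Each PDF gives a different value on the y-axis at the given data point x. These values are the likelihoods of the corresponding distributions: the height of each PDF at x, read as a function of \(\theta\) with x held fixed. For continuous data they are density values rather than probabilities, so they are not limited to the range 0 to 1.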
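A minimal sketch of reading PDF heights at one fixed point. The observed price of 100k, the candidate means, and the fixed 15k standard deviation are all hypothetical. Only the mean part of \(\theta\) is varied here, to keep the example small.

```python
from statistics import NormalDist

x_observed = 100_000  # one hypothetical observed price, held fixed

# Candidate means: the part of theta we vary (sd fixed at a hypothetical 15k).
candidate_means = [70_000, 85_000, 100_000, 115_000]

# Likelihood of each candidate distribution = height of its PDF at x_observed.
likelihoods = {mu: NormalDist(mu, 15_000).pdf(x_observed) for mu in candidate_means}

for mu, value in likelihoods.items():
    print(f"mu = {mu:>7}: L(theta | x) = {value:.3e}")

best_mu = max(likelihoods, key=likelihoods.get)
print("most plausible mean:", best_mu)
```

The data point never moves. Only the curve moves, and the curve centered on the observation gives the tallest value.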

*Therefore, likelihood measures how plausible a particular distribution (a particular choice of \(\theta\)) is as an explanation of the given data. The higher the likelihood, the better that distribution explains the observed data.*

Return to the housing price example used above to explain probability. There, the model or process parameters \(\theta\) describe how houses are picked. For example, the neighborhoods might not be equally likely to be picked because the process has an inherent bias toward one neighborhood over another. Changing \(\theta\) changes the PDF of house prices. It therefore changes the likelihood of that distribution for the same observed data.
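A rough sketch of a biased picking process, under clearly hypothetical assumptions:

- Two neighborhoods have normally distributed prices around 80k and 120k.
- A made-up process parameter `w` is the chance that the picking process chooses neighborhood A.
- The observed prices are a fixed, made-up list.

Treating the observations as independent, the log-likelihood is the sum of the logs of the mixture densities. Only `w` changes between rows.

```python
import math
from statistics import NormalDist

# Hypothetical price distributions for two neighborhoods.
neighborhood_a = NormalDist(mu=80_000, sigma=10_000)
neighborhood_b = NormalDist(mu=120_000, sigma=10_000)

# Hypothetical observed prices, held fixed while theta (here: w) varies.
observed = [78_000, 83_000, 85_000, 118_000, 76_000]

def log_likelihood(w, data):
    """log L(w | data), where w = P(picking a house from neighborhood A).

    Assumes independent observations, so the log-likelihood is a sum.
    """
    return sum(
        math.log(w * neighborhood_a.pdf(x) + (1 - w) * neighborhood_b.pdf(x))
        for x in data
    )

for w in (0.2, 0.5, 0.8):
    print(f"w = {w}: log-likelihood = {log_likelihood(w, observed):.2f}")
```

Four of the five made-up prices sit near neighborhood A's mean. Of the three settings, the process biased toward A (w = 0.8) therefore explains them best.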

Mathematical representation

Mathematically, the likelihood for given data is written as:

\(L(\theta \mid \text{data})\)

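This means the observed data are held fixed while the model parameters are varied. Numerically, \(L(\theta \mid \text{data}) = P(\text{data} \mid \theta)\). It is the same expression, read as a function of \(\theta\) instead of as a function of the data. As a result, the likelihood is not a probability distribution over \(\theta\), and its values over all \(\theta\) need not add (or integrate) to 1.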
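A minimal sketch using coin tosses. The tosses below are hypothetical data and are assumed independent. For such data, \(L(p \mid \text{data})\) is the product of \(p\) for every head and \(1 - p\) for every tail, which is exactly \(P(\text{data} \mid p)\) read as a function of \(p\). Sweeping \(p\) over a grid while the data stay fixed, and keeping the best value, is a crude form of the maximum likelihood estimation mentioned at the start of this post.

```python
# Hypothetical observed data: 10 coin tosses, 1 = heads, 0 = tails (7 heads).
tosses = [1, 1, 0, 1, 1, 1, 0, 1, 0, 1]

def likelihood(p, data):
    """L(p | data) for independent tosses with P(heads) = p."""
    result = 1.0
    for t in data:
        result *= p if t == 1 else (1 - p)
    return result

# Vary the parameter p over a grid while the data stay fixed.
grid = [i / 100 for i in range(101)]
best_p = max(grid, key=lambda p: likelihood(p, tosses))
print(best_p)  # prints 0.7
```

With 7 heads in 10 tosses, the likelihood peaks at p = 0.7, the observed fraction of heads.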

Summary

Probability describes how likely certain data are to occur when the model parameters are fixed. Likelihood describes how well a particular choice of model parameters explains the observed data.

The mathematical notations for probability and likelihood place the observed data and the model parameters in reverse order. In probability, \(\theta\) appears on the right-hand side of the pipe and is held fixed while the data vary. In likelihood, the observed data appear on the right-hand side and are held fixed while the model parameters \(\theta\) vary.
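To see the reversed roles concretely, here is a small sketch built on one function: the binomial probability of k heads in n tosses, a standard model chosen here for illustration.

- Read as a probability, with p fixed, its values over all k sum to 1.
- Read as a likelihood, with the data fixed at k = 7, its integral over p is 1/(n + 1), which is 1/11 for n = 10. So it is not a probability distribution over p.

```python
from math import comb

def binom_pmf(k, n, p):
    """P(k heads in n tosses | p), also readable as L(p | k heads in n tosses)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

n = 10

# Probability reading: fix p = 0.5, vary the data k. The values sum to 1.
total_prob = sum(binom_pmf(k, n, 0.5) for k in range(n + 1))

# Likelihood reading: fix the data at k = 7, vary p over [0, 1].
# Midpoint-rule integral; the exact value is 1 / (n + 1), not 1.
steps = 10_000
total_lik = sum(binom_pmf(7, n, (i + 0.5) / steps) for i in range(steps)) / steps

print(round(total_prob, 6), round(total_lik, 6))
```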

Note: The same concept applies to both discrete and continuous random variables.

I hope this post has helped you understand the crucial difference between probability and likelihood in statistics. If you have any questions or suggestions about the writing or the explanation of the concept, please let me know in the comments.

Comments (4)

Bernd Wechner • Sep 10 '21

Thanks for the article. I appreciate it and cherish probability theory. I did find it hard to read because of sloppy markdown though and recommend you replace:

(\theta)

with θ or (θ) depending on what you mean.

If struggling with Greek characters BTW you can rest confident in 2021 that almost any context in which you are writing or reading, unicode accepted (the days of uncertainty in this space while not completely behind us are effectively so). And so you can always search on-line for say "Unicode Theta" and copy/paste.

I'd also counsel against claims like:

"the often incorrect use of these terms interchangeably"

There is nothing, repeat, nothing incorrect about using these terms interchangeably. The claim rests in a misunderstanding or misrepresentation of language and its context. These terms are synonyms and they can be used interchangeably, as evidenced by the fact that they are!

When a specific discipline applies English words in specific ways, then these terms are defined clearly for use within that discipline and often found well defined in texts on the subject (albeit often omitted from academic papers when the discipline is clear from the context - typically the journal publishing the paper and its narrow focus).

In this case you're referring specifically to Probability Theory in which Probability is generally an idea attached to outcomes or results while Likelihood is attached to hypothesis of models. Both describe in their own way confidence or uncertainty (on a scale of 0 certainly not to 1 certainly so and anything in between with 0.5 roughly anyman's guess or could go either way who knows?).

But probability is a measure of likelihood in common parlance likelihood is measured by probability in common parlance and that is not an error, that is, the common tongue, as distinct from the in-discipline jargon. To wit when using the terms one needs a clear context to be tabled, and lacking one, the common parlance is generally the safe assumption as anyone in-discipline will (and should) make the context clear.

Finally, a stylistic tip for on-line writing: In your intro, start with the conclusion or premise if you will, the claim that the body will make clear .... I also found this a tad hard to read as I like many am an impatient on-line reader and I'm immediately into a section on what probability is and asking myself "cut to the chase" (as in I know what probability is). The introduction would work well to table the a priori claim that in probability theory, Probability is used to describe the chance of a given result or outcome, and Likelihood is used to describe the chance that a given hypothesis or model is accurate ... allow me to explain ... and then proceed into the two following sections. This whets the appetite, and draws a reader in if they want, and allows more experienced readers to nod, and think aha, good reminder and move on.

coinflips flip • Dec 6 '24

kindly tell me one thing that when we coin toss what is the probability of getting head or tails?

Bernd Wechner • Dec 23 '24

Is that a serious question?

Kyllie • May 29 '25

Great explanation! I always struggled with understanding the difference between probability and likelihood, especially when getting into machine learning models like logistic regression. The distinction you made about which variable is fixed really helped clarify things. Thanks for sharing this!