## DEV Community is a community of 865,621 amazing developers

We're a place where coders share, stay up-to-date and grow their careers.

Mukesh Mithrakumar

Posted on

# Why Probability for Deep Learning?

Unlike the world of computer scientists and software engineers where things are entirely deterministic and certain, the world of machine learning must always deal with uncertain quantities and sometimes stochastic (non-deterministic or randomly determined) quantities.

There are three possible sources of uncertainty:

1. Inherent stochasticity: These are systems that have inherent randomness. Like using the python rand() function which outputs random numbers each time you run, or the dynamics of subatomic particles in quantum mechanics which are described as probabilistic in quantum mechanics.

2. Incomplete observability: The best example for this is the Monty Hall problem, the one in the movie 21 Jim Sturgess gets asked, there are three doors and there's a ferrari behind one door and the other two lead to a goat. Watch the scene to understand how to solve the Monty Hall problem. In this even though the contestant's choice is deterministic, but from the contestant's point of view the outcome is uncertain and deterministic systems appear to be stochastic when you can't observe all the variables.

3. Incomplete modeling: Spoiler Warning, well at this point I doubt it's a spoiler! Well, at the end of End Game, when Iron man snapped away all of Thanos' forces, (I know, still recovering from the scene), we are left to wonder what happened to Gamora right, was she snapped away because she was with Thanos's forces initially or was she saved because she turned against Thanos. When we discard some information about the model the discarded information in this case whether Tony knew Gamora was good or bad results in an uncertainty in the model's predictions, in this case we don't know for certain if she is alive or not.

Okay, swear, last Avengers reference.

When Dr. Strange said we have 1 in 14 million chances of winning the war, he practically saw those 14 million futures, this is called frequentist probability, which defines an event's probability as the limit of its relative frequency in a large number of trials. But not always do we have Dr. Strange's time stone to see all the possible futures or events that are repeatable, in this case we turn to Bayesian probability, which uses probability to represent a degree of belief for certain events, with 1 indicating absolute certainty and 0 indicating absolute uncertainty.

Even though the frequentist probability is related to rates at which events occur and Bayesian probability is related to qualitative levels of certainty, we treat both of them as behaving the same and we use the exact same formulas to compute the probability of events.

This is section one of the Chapter on Probability and Information Theory with Tensorflow 2.0 of the Book Deep Learning with Tensorflow 2.0.

You can read this section and the following topics:

03.00 - Probability and Information Theory
03.01 - Why Probability?
03.02 - Random Variables
03.03 - Probability Distributions
03.04 - Marginal Probability
03.05 - Conditional Probability
03.06 - The Chain Rule of Conditional Probabilities
03.07 - Independence and Conditional Independence
03.08 - Expectation, Variance and Covariance
03.09 - Common Probability Distributions
03.10 - Useful Properties of Common Functions
03.11 - Bayes' Rule
03.12 - Technical Details of Continuous Variables
03.13 - Information Theory
03.14 - Structured Probabilistic Models

at Deep Learning With TF 2.0: 03.00- Probability and Information Theory. You can get the code for this article and the rest of the chapter here. Links to the notebook in Google Colab and Jupyter Binder are at the end of the notebook.