In 2017, a few months before she passed away, my mom had a dream that in the near future my wife and I would have a family of three daughters. I thought this dream prediction was an interesting study in "predictive life events", so I decided to write about it and combine it with basic math and statistical modeling concepts. After reading this article, you will know how to craft simple statistical models that assign probabilities to life events!
In this article, I will cover the following concepts:
- Many people think that measuring the environment is impossible or hard, or they don't know where to start. This article shows how to craft simple statistical models using basic statistical rules to quantify (put math labels on) life events. You will see how easily these models can be crafted (even in your head!) and applied to forecasts, predictions and future decision making
- One doesn't need sophisticated Artificial Intelligence, Machine Learning or Neural Nets for performing accurate forecasts; classic "old school" statistical foundations still apply to many use cases
- Creating statistical models is an exercise in extracting knowledge and applying information about the environment. Predictive modeling is as much an "art" as it is mathematics and statistics
- How "predictive" my mom's dream forecast was and what causal conclusions one can draw from these types of life events. For example, did my mom predict the afterlife's existence?
Note: All the images in the article were generated by Artificial Intelligence using Stable Diffusion 2.x [1]. The pictures depicting "daughters" were completely generated by artificial intelligence and any likeness to anyone is a coincidence.
Let's go back to one day in early 2017. I received a call from my parents: my mom had dreamed the night before that my wife and I would have a family of three daughters. At the time of my mom's predictive dream, my wife and I had been married for 2 years and had no kids together. We were just starting family planning. My mom had not been well, so I thought this was just one of those "comfort" dreams where my mom was connecting with my "future family". That day in 2017, I didn't think much of my mom's dream. Fast forward six years to 2023... I am blessed with a bigger family and YES, we have three young daughters!!! Our first daughter was born in 2018, our second in 2020 and our third in 2022. As soon as we found out the gender of our third baby in 2022, I started to think back to my mom's dream prediction of three daughters. How strange was this? Was this in fact a prediction from the "beyond"?
Calculating the Family Probability of Three Daughters - Part 1 - Naive Approach
Let's tackle the first question. How strange was my mom's predictive dream? Can a number be placed on it to measure the probability of my mom being correct? In other words, what is the probability of having a family of three daughters? One of the foundational rules of statistics can be used to calculate this: "The Statistical Multiplication Rule" [2]. The statistical multiplication rule, also known as the "rule of product," is a principle in probability that helps to calculate the probability of several events occurring together. It states that the probability of several independent events all occurring is the product of the probabilities of each event occurring individually.
For example, consider the probability of rolling a "4" on a six-sided die and also flipping a coin that lands on heads. The probability of rolling a 4 is 1/6 (one of the six sides of the die) and the probability of flipping a coin to show heads is 1/2 (50%). Using the statistical multiplication rule, the calculated probability of both events occurring together is 1/6 * 1/2 = 1/12 (8.3%). Simple, right? The statistical multiplication rule can be used to calculate the probability of complex events that are composed of multiple independent events.
The multiplication rule can be applied to my mom's predictive dream of three daughters to assign a numeric probability. Based on global birth data, the actual probability of having a daughter is slightly less than 49% [3]. To keep it simple, assume the probability of having a daughter [P(Daughter)] is 1/2 (50%). Furthermore, assume the gender of each baby is an independent event, which means having a prior daughter has no impact on the gender of a future baby. The total probability of having three daughters can then be calculated by multiplying the probability of having a daughter three times: 1/2 * 1/2 * 1/2 = 1/8 (12.5%).
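The two calculations above can be sketched in a few lines of Python. This is only a minimal illustration: the helper name `joint_probability` is my own, and the 0.49 figure is an assumed round value standing in for "slightly less than 49%".

```python
from fractions import Fraction
from math import prod

def joint_probability(probabilities):
    """Multiplication rule: multiply the probabilities of independent events."""
    return prod(probabilities)

# Rolling a 4 on a six-sided die AND flipping heads on a coin
die_and_coin = joint_probability([Fraction(1, 6), Fraction(1, 2)])

# Three daughters in a row, assuming P(Daughter) = 1/2 and independent births
three_daughters = joint_probability([Fraction(1, 2)] * 3)

# Same calculation with an assumed P(Daughter) of 0.49 (illustrative value only)
three_daughters_49 = joint_probability([0.49] * 3)

print(die_and_coin, f"{float(die_and_coin):.1%}")        # 1/12 8.3%
print(three_daughters, f"{float(three_daughters):.1%}")  # 1/8 12.5%
print(f"{three_daughters_49:.1%}")                       # 11.8%
```

Using the slightly lower real-world figure only moves the answer from 12.5% to about 11.8%, which is why rounding to 1/2 is a reasonable simplification here.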
So, is this it? Well, this is a very naive model of the prediction and only part of the solution, but a good start. There are several key problems with this simple model. To highlight a couple of issues:
- The model assumes three children will be born. The correct probability statement for the model above is "Given there are three children in a family, what is the probability that all three are girls?". This is called a conditional probability [4], as the probability is "conditioned" on the given event occurring.
- The model is missing a great deal of other information that influences whether a family has children at all. Many (biological/environmental/family/economic) events must "perfectly align" to have a baby; it clearly isn't as simple as forecasting the gender.
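The conditional statement in the first bullet can be made concrete by listing every possible family of three children and counting the all-girl case. This is a small sketch under the same simplifying assumptions as before (each child is equally likely to be a girl or a boy, and births are independent); the variable names are my own.

```python
from fractions import Fraction
from itertools import product

# Sample space: every ordered combination of Girl/Boy for three children.
# Listing it this way is what "given there are three children" means.
outcomes = list(product("GB", repeat=3))

# Keep only the outcomes where all three children are girls
all_girls = [o for o in outcomes if all(child == "G" for child in o)]

# With equally likely outcomes, probability = favorable / total
p_three_girls_given_three_children = Fraction(len(all_girls), len(outcomes))

print(len(outcomes))                       # 8
print(p_three_girls_given_three_children)  # 1/8
```

Nothing in this enumeration says how likely it is that a family has three children in the first place, which is the gap the next part tries to fill.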
Calculating the Family Probability of Three Daughters - Part 2 - Improved Model with Selected Independent Events
It turns out the model calculated above is quite naive, overly optimistic, and frankly not exactly correct. While the probability of a baby daughter is roughly 1/2 (50%), life is much more complex. Much more information is needed to account for a couple staying together and being able to have three kids. To simplify, the statistical model will have two key components:
- Calculated probability of three daughters, given a family of three children (calculated in Part 1 as 12.5%)
- Calculated probability of maintaining a family with a spouse and being able to have three children together
To calculate the probability of maintaining a family and being able to have children, the following could be considered:
- What if either my wife or I couldn't have kids?
- What if our marriage failed (before or during the family process) and ended in divorce?
- What if one of us lost our job and we couldn't afford to have more children?
- Going more out of the box...What if WW3 started or an asteroid was going to destroy the planet?
There are almost infinite scenarios to list! This is where the process of crafting statistical models becomes an exercise in understanding our environment and somewhat of an "art"! Which events should be considered in the model to make proper inferences, claims or forecasts? Highly unlikely events such as WW3 happening or a black hole swallowing up Earth would absolutely impact family planning, but they are probably not likely enough to change the statistical model. More importantly, which of these events are independent? Which ones can be used in the model with the statistical multiplication rule introduced earlier?
Some of the events that can be considered are more likely to happen on average (e.g. a marriage ending in divorce, infertility) than a catastrophic asteroid/comet impact. All these events have a non-zero (>0%) probability of happening even if the probability is super low (nothing in forecasting the future has a 0% probability). The good news is that many of these events have been studied with many years of empirical data and can simply be looked up at no cost using a search engine. For example, scientists estimate the chance of an asteroid/comet hitting Earth and causing catastrophic impact in a given year is 1 in 300,000 (or about 0.0003% probability) [5]. Since a catastrophic impact either happens or it doesn't (binary event), it can be restated another way: there is a 299,999 in 300,000 (or about 99.9997%) chance of an asteroid/comet NOT having catastrophic impact in a given year. It could be argued that a great many of these events shouldn't be taken into consideration in the statistical model of "a family with three daughters". It is generally accepted to exclude highly improbable events from models (e.g. a large asteroid/comet hitting Earth) as they change neither the overall "spirit" of the model nor the overall probability meaningfully. Excluding these events does potentially leave the statistical model unable to predict "black swan" events. For example, how many business/sales/inventory supply chain/stock market forecasts predicted a global pandemic (black swan event) in 2020? As the global pandemic has played out over 3 years so far, it turns out not many forecasts accounted for "black swan" events. Therefore, it is up to the statistician to decide which events to include.
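To see why such a rare event barely moves the model, here is a rough sketch of the asteroid numbers. The 30-year horizon and the helper name `chance_of_at_least_one` are my own assumptions for illustration, and treating each year as an independent trial is a simplification.

```python
# Estimated chance of a catastrophic impact in a given year (figure quoted above)
p_impact_per_year = 1 / 300_000

# Binary event: the probability of NO impact is the complement
p_no_impact_per_year = 1 - p_impact_per_year

def chance_of_at_least_one(p_per_trial, trials):
    """Chance the event happens at least once in `trials` independent trials."""
    return 1 - (1 - p_per_trial) ** trials

# Hypothetical 30-year window, e.g. the years in which a couple might raise a family
p_impact_in_30_years = chance_of_at_least_one(p_impact_per_year, 30)

print(f"{p_impact_per_year:.5%}")     # 0.00033%
print(f"{p_no_impact_per_year:.5%}")  # 99.99967%
print(f"{p_impact_in_30_years:.3%}")  # 0.010%
```

Multiplying a 12.5% probability by a survival factor this close to 100% changes nothing at the precision used in this article, which is the practical argument for leaving the event out.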
After adding a set of selected independent events, what does the model architecture look like visually? The diagram below displays the overall statistical structure. The foundation is still the calculated probability of having three daughters (given a family of three kids), multiplied by the probabilities of a selected set of independent events. Per the diagram below, the statistical model has now become a synthesis of math and subjectively selected independent events. Note that even with just some basic and plausible events added, the model's probability output has been cut roughly in half to 6.5%! The "simple" statistical model has now become quite complex, but arguably much more accurate. This is where you can clearly see it is up to the statistician to select the events to add into the model and compose the story/narrative that it will ultimately convey; this is where statistical modeling becomes "art".
The model structure above is meant to be instructive in nature, but it is much better than the model in Part 1. An external observer could argue for adding more high-impact events, such as the chance of getting a serious disease, the chance of a serious accident, the chance of having a special needs baby, etc. In fact, some thought leaders state that because the statistician is applying "structured game" modeling concepts (dice, coin flips, cards) and fixing a set of unknowns (the set of independent events), it really is impossible to perform this measurement. This misuse of simple statistical models to model complex life unknowns is called the Ludic fallacy [6]. The next part will demonstrate how all these various events can potentially be "collapsed" and replaced using very accurate empirical data.
Calculating the Family Probability of Three Daughters - Part 3 - Consolidate & Collapse Information
Clearly, adding many independent events adds inherent complexity to the statistical model. As mentioned earlier, which events are added to the model is directly influenced by the statistician's diligence, and the exercise can quickly fall into the Ludic fallacy [6]. This variability could greatly affect the predictive power of the statistical model. The second key issue with adding many independent events is that almost no events are truly independent. Independent events are ones where one event doesn't convey information about another event. For example, an asteroid hitting Earth probably is truly independent of a family having three daughters. However, many other events depend on one another, and statisticians need to be aware of this. For example, the probability of divorce can be influenced if one partner in the relationship can't have children or if one partner can't keep a job. Basically, some partners simply won't stick around through tough times, and that is a dependence. Another example: what if someone falls seriously ill, causing them to lose their job, their fertility or their partner, and thus never achieve a family of 3 children? Therefore, a great many of these events are only "naively" independent in the real world. Finally, unless one has the true intersections of the data, it is hard to calculate how much dependence there is between the events.
Rather than coming up with a selected set of independent events that are "just right" for the model, these events can be "collapsed" into an average probability with fewer assumptions by using empirical data. Luckily, we are living in the era of big data and these data sets are very accessible for research. There are families that have gone through family planning, have tried to have three children and potentially failed. I decided to use USA Census Bureau data, as it describes USA families that on average resemble my own situation in terms of economics, fertility and the ability of both parents to raise three or more children. In my opinion, the most recent USA census data is much more predictive for my model than global data, as I reside in the USA and I am tied more closely to its economy and environment. There are two key pieces of data from tens of thousands of USA families used in the updated model:
- Percentage of families having three or more children in a family - 12.6% (can be interpreted as a frequentist probability from historical data) [7]
- Percentage of families with both parents present - 72.9% (can be interpreted as a frequentist probability from historical data) [7]
Note: I am making the assumption that these two events are independent (for the most part). The structure of a family (i.e. single father, single mother or both parents) does not convey much information about how many children are present. Conversely, how many children are in a family doesn't convey much information about the structure of the family either. A good test of independence is asking simple questions of "knowing". For example, suppose someone told you there is a couple living several blocks from your residence, and that is the only piece of information you have. Then that person asked you: "How many kids does that couple have?". Without other pieces of information, you are unlikely to give a confident and accurate answer. Therefore, these two events from the Census Bureau could be considered independent and multiplied together using the statistical multiplication rule [2].
These two data sets can be interpreted as average probabilities and "collapse" all those possible independent events (divorce, economic, fertility, fear of catastrophe) on average into a single probability. The model is much simpler and backed by government-certified data! Furthermore, it can also be argued that empirical data eliminates the Ludic fallacy [6] to a large degree, as a great deal of the subjectivity has been replaced with census data. The statistical model with the updated census-derived family probabilities now forecasts a 1.1% probability of my mom's predictive dream coming true due to random chance. Basically, this is roughly a 1 in 100 chance.
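The collapsed model is just three numbers multiplied together. The sketch below reproduces it using the figures quoted above; the variable names are my own, and the calculation inherits the independence assumption from the note.

```python
# Part 1: probability of three girls, given a family of three children
p_three_girls_given_three_children = 0.5 ** 3  # 12.5%

# Census-based inputs quoted above, read as frequentist probabilities
p_three_or_more_children = 0.126  # families with three or more children
p_both_parents_present = 0.729    # families with both parents present

# Multiplication rule, assuming the three inputs are (roughly) independent
p_dream_by_chance = (
    p_three_girls_given_three_children
    * p_three_or_more_children
    * p_both_parents_present
)

print(f"{p_dream_by_chance:.1%}")              # 1.1%
print(f"1 in {round(1 / p_dream_by_chance)}")  # 1 in 87
```

Because every input is visible, anyone who disagrees with an assumption (for example, using "three or more" children rather than exactly three) can swap in their own number and see how the result changes.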
Why My Mom Didn't Predict Supernatural Existence
If this had been set up as a proper frequentist hypothesis test, one could conclude something called statistical significance [8]. This is because my mom's predictive dream was so unlikely to happen by chance (1.1% calculated probability) that it fell under a commonly used statistical threshold (5%); 1.1% < 5%, thus statistical significance. The term statistical significance [8] sounds impressive, but pragmatically it does not allow someone to claim anything about the environment, nor does it prove things like the afterlife or supernatural existence. You have probably heard the phrase "correlation does not imply causation". While this model did not perform a correlation exercise, the same spirit applies. So, how does one interpret or use a statistical model? One can use the model to ask the question, "Is this weird or likely to happen just by chance?" and if the probability is low, one should investigate further. Based on the model probability, in my opinion one can claim that this predictive dream was highly unlikely to be due to chance, and a further investigation is warranted. So, let's investigate further.
John Allen Paulos is a mathematician and author who has written about the role of probability and statistics in everyday life, including the topic of dreams and their potential relationship to the future. In his book "A Mathematician Plays the Stock Market", Paulos discusses the idea that some people believe dreams can be predictive, offering an example of someone who dreams about a specific stock and then sees that stock rise in value the next day. Paulos notes that while this may seem like more than a coincidence, it is an example of the "law of truly large numbers," which states that with a large enough number of opportunities, even very unlikely outcomes are likely to occur. Paulos goes on to explain that it is natural for people to look for patterns and connections in the world around them, including in their dreams, and that this can lead to the illusion of causality when there is none. In other words, while it may seem like a dream was predictive of a future event, the reality is that the event was likely to happen anyway, and the dream was simply a coincidence.
Highly recommended book Innumeracy for basic statistical understanding
In his other book "Innumeracy", Paulos derives a simple statistical model (much like the one above) for predictive dreams, showing that they are potentially not as rare as people assume. He estimates that on a given night, the probability of a dream having some predictive nature just by random chance is 1/10,000 (0.01% probability). Highly unlikely, right? However, a year gives you 365 nights, each with a 0.01% chance of a predictive dream. This is a little tricky to calculate if you are not versed in some statistics. It is NOT calculated as 365 days * 0.01% = 3.65%. There are several ways to do this, but one simple way to think about it is using the statistical multiplication rule [2]:
- First, set up the solution as a binary event problem. This means there are just two possible outcomes on a given day: 1) having a predictive dream that results in an accurate outcome or 2) NOT having a predictive dream. For simplicity, no other outcomes are possible in this system.
- Second, assign probabilities. Since the system has only two possible outcomes, the probabilities must add up to 100%. Therefore, the probability of having a predictive dream per day is 0.01% and the probability of NOT having a predictive dream per day must be 99.99% (100% - 0.01% = 99.99%).
- Third, use the multiplication rule to calculate the probability over the time period. The probability of NOT having a predictive dream in 365 days is simply 99.99% * 99.99% * 99.99% (multiplied 365 times), or 99.99%^365 = 96.4%. This should make sense: as the number of days grows, the chance of never having a predictive dream shrinks.
- Fourth, now that the probability of NOT having a predictive dream in a year (365 days) has been calculated as 96.4%, the probability of having one can be calculated. Using the same complement rule, if the probability of NOT having even a single predictive dream in a year is 96.4%, then the probability of having at least one predictive dream in a year must be the complement: 100% - 96.4% = 3.6%.
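The four steps above translate almost line for line into Python. This is a minimal sketch using Paulos's 1-in-10,000 estimate as quoted; the variable names are my own.

```python
# Steps 1 and 2: binary event with complementary probabilities
p_dream_per_night = 1 / 10_000               # 0.01%
p_no_dream_per_night = 1 - p_dream_per_night  # 99.99%

# Step 3: multiplication rule over 365 independent nights
nights = 365
p_no_dream_in_year = p_no_dream_per_night ** nights

# Step 4: complement gives "at least one predictive dream in a year"
p_at_least_one_in_year = 1 - p_no_dream_in_year

# The shortcut that is NOT the right method, shown for comparison
naive_sum = p_dream_per_night * nights

print(f"{p_no_dream_in_year:.1%}")      # 96.4%
print(f"{p_at_least_one_in_year:.2%}")  # 3.58%
print(f"{naive_sum:.2%}")               # 3.65%
```

The naive sum lands close to the correct answer here only because the nightly probability is tiny. It overcounts years with more than one predictive dream, and with larger probabilities or longer periods it can even exceed 100%, which the complement method never does.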
Therefore, Paulos claims that in a given random year, a random human has a 3.6% chance of having at least one predictive dream. Still a pretty low chance. Now consider this probability over multiple years, all the people you know or all the people in the USA and the probability of a predictive dream actually becomes quite likely and almost certain to happen. Some examples of long periods of time with calculations:
- In a 20-year time period and assuming the probability of having a predictive dream per year is consistently 3.6%; there is a ~52% chance (better than half) that a person will have a predictive dream in a 20-year period. (Note: this is calculated using the technique above or other methods)
- If you know 100 people/acquaintances and assuming the probability of having a predictive dream per year is consistently 3.6%; there is a 97.4% chance that at least one of those 100 people will have had a predictive dream in a given year! (Note: this is calculated using the technique above or other methods)
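Both figures in the list above come from the same complement technique, applied to years and to people instead of nights. A short sketch, where the helper name `at_least_once` is my own and each year or person is assumed to be an independent trial:

```python
def at_least_once(p_per_trial, trials):
    """Chance an event with probability p_per_trial occurs at least once in `trials` independent trials."""
    return 1 - (1 - p_per_trial) ** trials

p_dream_per_year = 0.036  # yearly probability derived earlier

# One person over a 20-year period
p_in_20_years = at_least_once(p_dream_per_year, 20)

# At least one of 100 acquaintances within a single year
p_among_100_people = at_least_once(p_dream_per_year, 100)

print(f"{p_in_20_years:.0%}")       # 52%
print(f"{p_among_100_people:.1%}")  # 97.4%
```

Changing the second argument shows how quickly "rare" turns into "nearly certain" as the number of chances grows.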
These predictive dream probabilities become highly unimpressive under the law of truly large numbers: even if the probability is very small, the event becomes likely given enough chances (e.g. Earth will eventually be hit by a catastrophic asteroid/comet given enough time). However, this relies on several huge assumptions about the environment, so let's consider the devil's advocate predictive dream scenario:
- Assume the person having predictive dreams is keeping track of their dreams and noting the details. If someone's aunt or mom is calling them every day telling them about their various dreams and she hits on an occasional detail, that probably isn't very impressive. However, if someone gets a call out of the blue one day about a predictive dream that happens to be highly detailed and correct, that is a little "stranger" and much less easily explained by the law of truly large numbers, as it is essentially a sample of one.
- Predictive dreams are usually relayed after the fact, where hindsight bias [9] can become relevant. For example, someone gets that big promotion at work and calls their parents to share the good news; then their mom responds "I dreamt last week that you would get a promotion! I just knew it!". It is a lot less impressive for someone to say they knew it was going to happen after an event has happened.
- Finally, most predictive dreams have little substantive detail and are loosely interpreted as predictive. Great examples are Nostradamus, astrology, or a fortune cookie, where the details are quite squishy and could apply to many things in one's life. Astrology and fortune cookies are phrased in amorphous ways, so that almost anyone can interpret them as having specific meaning for them individually.
What is interesting in my mom's dream scenario is that my mom's predictive dream was highly specific: having three daughters with my wife. It wasn't a nebulous astrology/fortune teller prediction of "you will have a successful family". Most importantly, I don't ever remember my mom having predictive dreams or telling me about them. This is probably my personal bias speaking and I am sure my mom did have predictive dreams she told me about, but I honestly can't name another case. If my mom had had even a handful of predictive dreams and had been calling me about them, I would be in fundamental agreement with John Allen Paulos's interpretation. To be clear, I do not think any afterlife or supernatural existence was confirmed by the statistical model. However, given the specificity of the predictive dream and how it was basically a single prediction event, I do believe there was "something" special here that wasn't explainable by just random chance.
Summary and Conclusion
My mom's unique dream prediction came true without her even knowing our family plans and all the challenges my family could face. There are many things that I can't explain and can't answer. I want to believe that there was some kind of supernatural connection, but I am a logical person and can't ignore the math and science in this world. I think about this a lot: what if my mom had never dreamed this dream? What if this never happened? I am a strong believer that everything happens for a reason and that there is a higher purpose for all of us. While my mom's predictive dream was the vehicle for the article, the core concept was to show an approach to crafting simple statistical models. In this article, I:
- Introduced an approach to quantify life events using simple statistical models
- Showed how many probabilities have been studied and can simply be looked up, which can then be used as inputs into the model
- Illuminated how to keep the model approach, techniques and data as transparent as possible. This way the reader can draw their own conclusions and challenge the approach accordingly. Basically, don't "lie with statistics" and paint your own biased narratives with statistics.
- Demonstrated how to document personal biases and potentially include a devil's advocate position to try to reach a "middle ground" narrative
- Exhibited that statistical models should be used as tools that can illuminate "rare" events. Good statistical models are basically indicators of "strange" and "is it weird". However, they should NOT be used as tools to draw causal conclusions. Usually much further thought and investigation should be done
[1] Stable Diffusion on Hugging Face: https://huggingface.co/spaces/stabilityai/stable-diffusion
[2] General Multiplication Rule (Explanation & Examples): https://www.statology.org/general-multiplication-rule/
[3] Birth rates for male to female are skewed to male across the world. On average 51%+ of new baby births are male: https://ourworldindata.org/sex-ratio-at-birth
[4] Conditional Probability foundations: https://www.investopedia.com/terms/c/conditional_probability.asp
[5] Chance of catastrophic impact from asteroid or comet: https://stardate.org/astro-guide/faqs/what-chance-earth-being-hit-comet-or-asteroid
[6] Ludic fallacy (Explanation & Examples): https://en.wikipedia.org/wiki/Ludic_fallacy
[7] US Census America's Families and Living Arrangements: https://www.census.gov/data/tables/2022/demo/families/cps-2022.html
[8] Statistical Significance Overview: https://www.investopedia.com/terms/s/statistical-significance.asp
[9] Hindsight Bias (Explanation & Examples): https://en.wikipedia.org/wiki/Hindsight_bias