The Strange Math That Predicts (Almost) Anything
By Veritasium
Key Concepts:
- Law of Large Numbers: The average outcome of independent trials approaches the expected value as the number of trials increases.
- Independence vs. Dependence: Whether events influence one another. Independent events don't; dependent events do.
- Markov Chain: A sequence of events where the probability of each event depends only on the state attained in the previous event.
- Transition Probabilities: The probabilities of moving from one state to another in a Markov chain.
- Monte Carlo Method: A computational technique that uses random sampling to obtain numerical results.
- PageRank: An algorithm used by Google to rank websites in search results based on the network of links pointing to them.
- Damping Factor: A probability used in PageRank to ensure that a random surfer can jump to any page on the web, preventing them from getting stuck in loops.
- Attention (in Language Models): A mechanism that allows a model to focus on the most relevant parts of the input when making predictions.
- Feedback Loops: Situations where the output of a system influences its input, making it difficult to predict its behavior.
1. The Math Feud in Russia:
- In 1905, Russia was divided between Tsarists (supporting the Tsar) and Socialists (demanding political reform).
- This division extended to mathematicians: Pavel Nekrasov (Tsarist, "Tsar of Probability") vs. Andrey Markov (Socialist, "Andrey The Furious").
- Nekrasov argued math could explain free will and the will of God, while Markov, an atheist, criticized Nekrasov's work as unrigorous.
2. Nekrasov's Argument for Free Will:
- Nekrasov believed the Law of Large Numbers implied independence.
- He observed convergence in social statistics (e.g., Belgian marriages from 1841-1845, averaging around 29,000 per year).
- He reasoned that because these statistics followed the Law of Large Numbers, the decisions causing them (marriage, crime, birth) must be independent acts of free will.
3. Markov's Counter-Argument and Markov Chains:
- Markov aimed to prove that dependent events could also follow the Law of Large Numbers.
- He used "Eugene Onegin" by Alexander Pushkin, analyzing the first 20,000 letters.
- He found that vowel-vowel pairs occurred only 6% of the time, far less than the 18% expected if letters were independent.
- Markov created a "prediction machine" (Markov chain) with states for vowels and consonants and transition probabilities between them.
- This machine demonstrated that even with dependent events, the ratio of vowels to consonants converged to the observed values (43% vowels, 57% consonants).
- Markov concluded that observing convergence in social statistics doesn't prove independence or free will.
- "Thus, free will is not necessary to do probability."
- Markov's work allowed for probability calculations with dependent events, a significant breakthrough.
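Markov's two-state machine can be sketched in a few lines of Python. The transition probabilities below are illustrative, not Markov's actual figures: they are chosen so that vowel-vowel pairs occur about 6% of the time and the long-run vowel fraction settles near the 43% cited above.

```python
import random

# Hypothetical transition probabilities, picked to reproduce the summary's
# figures (~6% vowel-vowel pairs, ~43% vowels overall in the long run):
#   P(vowel | previous letter was a vowel)     = 0.13
#   P(vowel | previous letter was a consonant) = 0.66
P_VOWEL_AFTER = {"V": 0.13, "C": 0.66}

def simulate(n_steps, seed=0):
    """Run the two-state vowel/consonant chain; return the vowel fraction."""
    rng = random.Random(seed)
    state, vowels = "C", 0
    for _ in range(n_steps):
        state = "V" if rng.random() < P_VOWEL_AFTER[state] else "C"
        vowels += state == "V"
    return vowels / n_steps

for n in (100, 10_000, 1_000_000):
    print(n, round(simulate(n), 3))  # drifts toward ~0.43 as n grows
```

Even though consecutive letters are dependent, the running vowel fraction still converges — which is exactly Markov's point against Nekrasov.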
4. The Manhattan Project and the Monte Carlo Method:
- Stanislaw Ulam, a mathematician, worked on the Manhattan Project.
- He sought to understand neutron behavior inside a nuclear bomb to determine the amount of uranium-235 needed.
- After suffering from encephalitis, Ulam played Solitaire and wondered about the probability of winning a randomly shuffled game.
- He realized he could approximate the answer by playing hundreds of games and counting wins.
- Ulam applied this idea to neutron behavior, simulating random outcomes.
- John von Neumann recognized the power of Ulam's idea but noted that neutron behavior is dependent, requiring a Markov chain.
- They created a simplified Markov chain to model neutron scattering, absorption, and fission.
- This chain was run on the ENIAC, one of the first electronic computers, to calculate the multiplication factor (k).
- If k > 1, the reaction grows exponentially, leading to a bomb.
- Ulam named the method "Monte Carlo" after his uncle, a gambler, and the Monte Carlo Casino.
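The scatter/absorb/escape/fission chain can be sketched as a toy Monte Carlo simulation. The probabilities and neutron yield below are made-up illustrative numbers, not real nuclear cross-sections:

```python
import random

# Per-collision outcome probabilities (toy numbers, not real physics)
P_SCATTER, P_ABSORB, P_ESCAPE, P_FISSION = 0.5, 0.2, 0.15, 0.15

def follow_neutron(rng):
    """Follow one neutron through the chain until it is absorbed, escapes,
    or causes fission. Return the number of new neutrons it produces."""
    while True:
        r = rng.random()
        if r < P_SCATTER:
            continue                          # scattered: another collision
        if r < P_SCATTER + P_ABSORB:
            return 0                          # absorbed by the material
        if r < P_SCATTER + P_ABSORB + P_ESCAPE:
            return 0                          # escaped the core
        return 2 + (rng.random() < 0.5)       # fission: 2 or 3 new neutrons

def estimate_k(trials, seed=0):
    """Monte Carlo estimate of the multiplication factor k: the average
    number of new neutrons produced per neutron."""
    rng = random.Random(seed)
    return sum(follow_neutron(rng) for _ in range(trials)) / trials

k = estimate_k(200_000)
print(f"k ≈ {k:.2f}")   # true expected value here is 0.3 * 2.5 = 0.75, so k < 1
```

With these toy numbers k < 1, so a chain reaction dies out; raising the fission probability pushes k past 1, the exponential-growth regime the Manhattan Project scientists were solving for.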
5. Google's PageRank Algorithm:
- In the mid-1990s, the internet exploded, creating a need for better search engines.
- Early search engines ranked pages by keyword frequency, which was easily manipulated.
- Sergey Brin and Larry Page at Stanford developed PageRank, modeling the web as a Markov chain.
- Each webpage is a state, and links between pages are transitions.
- The more links a page receives, the higher its rank. However, links from highly ranked pages are worth more.
- A "damping factor" was introduced: at each step the random surfer has a 15% chance of jumping to a uniformly random page, preventing them from getting stuck in loops and ensuring every part of the web gets explored.
- PageRank allowed Google to provide more relevant search results.
- Google was initially called BackRub, then renamed after "googol" (the number 10^100); the spelling "Google" arose from a misspelling.
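The ranking scheme above can be sketched as power iteration on a tiny hypothetical four-page web (the link structure here is invented for illustration):

```python
# Minimal PageRank via power iteration on a hypothetical 4-page web.
# links[i] lists the pages that page i links to. A damping factor of 0.85
# means a 15% chance of teleporting to a random page at every step.
links = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
N, d = 4, 0.85

rank = [1 / N] * N
for _ in range(100):                      # iterate until ranks settle
    new = [(1 - d) / N] * N               # teleport share for every page
    for page, outgoing in links.items():
        for target in outgoing:
            new[target] += d * rank[page] / len(outgoing)
    rank = new

print([round(r, 3) for r in rank])        # page 2 ranks highest
```

Page 2 wins because three pages link to it, including the already-high-ranked page 0 — links from highly ranked pages are worth more, as noted above. Page 3, which nothing links to, keeps only its 15%/N teleport share.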
6. Language Models and Markov Chains:
- Claude Shannon, the father of information theory, used Markov chains to predict text.
- He found that using more previous words as predictors improved the accuracy of the predictions.
- Modern language models extend this Markov-style next-word prediction, using "attention" mechanisms rather than fixed transition tables.
- Attention allows the model to focus on the most relevant parts of the input.
- However, feedback loops, where the output of a model becomes training data for future models, can lead to a "dull, stable state."
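Shannon's scheme of predicting each word from the one before it amounts to a word-level Markov chain. A minimal sketch, trained on a tiny made-up corpus:

```python
import random
from collections import defaultdict

# Word-level Markov chain in the spirit of Shannon's experiments:
# predict each word from the single word before it (a hypothetical corpus).
corpus = "the cat sat on the mat and the cat ran".split()

follows = defaultdict(list)
for prev, word in zip(corpus, corpus[1:]):
    follows[prev].append(word)            # duplicates encode frequency

def generate(start, n_words, seed=0):
    """Walk the chain, picking each next word among observed followers."""
    rng = random.Random(seed)
    out = [start]
    for _ in range(n_words - 1):
        options = follows.get(out[-1])
        if not options:                   # dead end: word never seen followed
            break
        out.append(rng.choice(options))
    return " ".join(out)

print(generate("the", 8))
```

Using more preceding words as context (bigrams, trigrams, ...) sharpens the predictions, as Shannon found — attention in modern models is a learned, flexible version of choosing which context matters.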
7. Limitations and Power of Markov Chains:
- Markov chains don't work well for systems with strong feedback loops, such as global warming.
- However, for many dependent systems, Markov chains offer a way of doing probability.
- Markov chains are "memoryless," meaning they only consider the current state, simplifying complex systems.
- "Problem-solving is often a matter of cooking up an appropriate Markov chain."
8. Card Shuffling and Randomness:
- Riffle shuffling a deck of 52 cards seven times makes it basically random.
- Overhand shuffling requires over 2,000 shuffles to achieve randomness.
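A riffle shuffle itself is a random process that can be simulated; the standard mathematical model for it is the Gilbert-Shannon-Reeds riffle, sketched below (a simplified illustration, not the analysis behind the "seven shuffles" result):

```python
import random

def riffle(deck, rng):
    """One Gilbert-Shannon-Reeds riffle: cut the deck at a binomial point,
    then interleave, dropping from each half with probability proportional
    to how many cards it has left."""
    cut = sum(rng.random() < 0.5 for _ in deck)   # binomial cut point
    left, right = deck[:cut], deck[cut:]
    out = []
    while left or right:
        if rng.random() < len(left) / (len(left) + len(right)):
            out.append(left.pop(0))
        else:
            out.append(right.pop(0))
    return out

rng = random.Random(0)
deck = list(range(52))
for _ in range(7):            # seven riffles: approximately well mixed
    deck = riffle(deck, rng)
print(deck[:10])
```

The sequence of deck orderings is itself a Markov chain on the 52! possible arrangements: each shuffle depends only on the current order, not on how the deck got there.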
9. Conclusion:
- The development of Markov chains stemmed from a math feud between Nekrasov and Markov.
- Markov's work has had a profound impact on various fields, including nuclear physics (Monte Carlo method), search engines (PageRank), and language models.
- Markov chains provide a powerful tool for modeling and predicting the behavior of complex, dependent systems by focusing on the current state and ignoring the past.