You walk into a room full of confused college students, each wearing shirts bearing formulae they can neither remember nor explain. They are attempting some sort of coordinated dance, but are so disorganized that they accidentally bump into each other and fall over. The lights are flickering and one of the students steps on a rake, launching a textbook up into the fluorescent lights above and cracking open the bulb, releasing poisonous mercury into the room. In a panic, the professor throws a stone paperweight through the window to get some fresh air.

Every one of these students gets a C and goes to the bar to promptly forget everything that just happened.

This is not a commentary on the students. This is what a typical statistics class feels like to a mathematician. There's no foundation. No axioms followed by proofs, only a confused cacophony of formulas and statistical tests that the careful student can perform.

The mathematical discipline that underlies probability and statistics is called Measure Theory.

To begin, we start with set theory. What we are trying to find is a way to measure the size of a set, and this notion of size needs to be well behaved on infinite sets. This "measure" of a set will be interpreted as the probability that an event happens.

In order to make a concrete definition of a measure we can use to compute probabilities, let's start with the set ℝ of real numbers. P(ℝ) is called the power set of ℝ: it's the set of all subsets of ℝ.
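The power set of ℝ is far too large to enumerate, but the idea can be seen on a small finite set. The following is a minimal sketch; the helper name `power_set` and the three-element base set are illustrative assumptions, not part of the construction above.

```python
from itertools import chain, combinations

def power_set(base):
    """Return every subset of `base` (a finite set) as a list of frozensets."""
    items = list(base)
    # Collect the subsets of size 0, 1, ..., len(items).
    all_sizes = (combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(subset) for subset in chain.from_iterable(all_sizes)]

subsets = power_set({1, 2, 3})

# A set with n elements has 2**n subsets.
assert len(subsets) == 2 ** 3
# The empty set and the whole set are always among them.
assert frozenset() in subsets
assert frozenset({1, 2, 3}) in subsets
```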

The basic shape of the measure will look like this:
$$m : \Omega \to \mathbb{R}$$
where Ω ⊂ P(ℝ), that is, Ω is a collection of subsets of ℝ.

Also, we will want the following properties out of a measure:

The empty set has 0 measure: $ m(\emptyset) = 0 $
Countable additivity: $ m(\bigcup_i E_i) = \sum_i m(E_i) $ whenever $ \{ E_i \}_i $ is a countable collection of pairwise disjoint sets in $ \Omega $.
Non-negativity: $ m(E) \geq 0 $ for all $ E \in \Omega $
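These properties can be checked by hand on a toy example. The sketch below assumes a hypothetical finite sample space (the faces of a fair die) and a uniform measure `m`; neither comes from the construction in this post. On a finite set only finite additivity can be tested, so this illustrates the properties rather than proving them.

```python
from fractions import Fraction

# Hypothetical finite sample space: the faces of a fair six-sided die.
SAMPLE_SPACE = frozenset(range(1, 7))

def m(event):
    """Uniform measure: the fraction of the sample space that `event` covers."""
    event = frozenset(event)
    if not event <= SAMPLE_SPACE:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(SAMPLE_SPACE))

evens = {2, 4, 6}
low_odds = {1, 3}

# The empty set has measure 0.
assert m(set()) == 0

# Additivity: for disjoint events, the measure of the union is the sum.
assert evens.isdisjoint(low_odds)
assert m(evens | low_odds) == m(evens) + m(low_odds)

# Non-negativity: no event has negative measure.
assert all(m({face}) >= 0 for face in SAMPLE_SPACE)
```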

This motivates a structure called a sigma-algebra. We don't want just any collection Ω ⊂ P(ℝ). In order to satisfy the properties of a measure, we want Ω to contain the empty set and the whole set ℝ. Because of countable additivity, we want Ω to be closed under countable unions, and we want closure under set complement too. Closure under complement is what lets us do things like take the probability that an event doesn't happen. Put symbolically, if E is in Ω, then m(E) is defined, and so is m(ℝ-E).

And if m is a probability measure, meaning the whole set has measure 1, then:
$$m(E) + m(\mathbb{R} - E) = 1$$
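The closure requirements on Ω can be checked by brute force when the underlying set is finite. The sketch below is an illustration under that assumption; the function name `is_sigma_algebra` and the four-element set are hypothetical. For a finite collection, closure under pairwise unions already gives closure under countable unions, since only finitely many distinct sets are available.

```python
from itertools import combinations

def is_sigma_algebra(collection, whole):
    """Check the sigma-algebra conditions for subsets of a finite set `whole`."""
    whole = frozenset(whole)
    sets = {frozenset(s) for s in collection}
    # Must contain the empty set and the whole set.
    if frozenset() not in sets or whole not in sets:
        return False
    # Must be closed under complement.
    if any(whole - s not in sets for s in sets):
        return False
    # Must be closed under union (pairwise is enough for a finite collection).
    return all(a | b in sets for a, b in combinations(sets, 2))

whole = {1, 2, 3, 4}

# Contains both halves of the set, so complements and unions stay inside.
good = [set(), {1, 2}, {3, 4}, {1, 2, 3, 4}]
# Contains {1} but not its complement {2, 3, 4}.
bad = [set(), {1}, {1, 2, 3, 4}]

assert is_sigma_algebra(good, whole)
assert not is_sigma_algebra(bad, whole)
```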

The French mathematician Henri Lebesgue was trying to improve on Riemann integration, and in doing so introduced a measure of the size of sets that we can use for probability:

Lebesgue Outer Measure

$$m^*(A) = \inf \{ Z_A \} $$

where

$$Z_A = \left\{ \sum_{n=1}^\infty \ell(I_n) : A \subseteq \bigcup_{n=1}^\infty I_n \right\}$$

Here the $ I_n $ are intervals, indexed by n, and $ \ell(I_n) $ is the length of the interval $ I_n $.

In words: the Lebesgue outer measure of a set is the infimum (greatest lower bound), taken over all ways of covering the set with intervals, of the sum of the lengths of the covering intervals.

The collection $ \{ I_n \}_n $ of intervals whose union contains A is called a cover of A.
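To see how the infimum over covers behaves, here is a minimal sketch that covers a finite set of points with intervals whose lengths shrink geometrically. The function names and the choice of points are assumptions made for illustration. The n-th point gets an interval of length ε/2ⁿ, so the total length of the cover stays below ε however small ε is chosen. Each total is an element of $ Z_A $, so the outer measure of this finite set can be no larger than any of them, which suggests that it is 0.

```python
from fractions import Fraction

def shrinking_cover(points, eps):
    """Cover the n-th point (n = 1, 2, ...) with an open interval of length eps / 2**n."""
    cover = []
    for n, x in enumerate(points, start=1):
        half_width = Fraction(eps) / 2 ** (n + 1)
        cover.append((x - half_width, x + half_width))
    return cover

def total_length(cover):
    """Sum of the lengths of the intervals in the cover."""
    return sum(right - left for left, right in cover)

def is_covered(points, cover):
    """True if every point lies inside at least one open interval of the cover."""
    return all(any(left < x < right for left, right in cover) for x in points)

# Hypothetical set A: the eleven points 0, 1/10, ..., 1.
points = [Fraction(k, 10) for k in range(11)]

for eps in (Fraction(1), Fraction(1, 100), Fraction(1, 10**6)):
    cover = shrinking_cover(points, eps)
    assert is_covered(points, cover)
    # The geometric series keeps the total strictly below eps.
    assert total_length(cover) < eps
    print(f"eps = {eps}, total cover length = {total_length(cover)}")
```

Exact rational arithmetic via `Fraction` is used so that the comparison with ε is not affected by floating-point rounding.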
