Probability theory

Let $\Omega$ be a set with a probability measure $\mathbb{P}$ defined on some sigma-algebra. A random variable is a measurable function $X:\Omega\to \mathbb{R}$. (Roughly speaking, this means $\mathbb{P}\{\omega: X(\omega)\in (a,b)\}$ is defined for every interval $(a,b)$.) It is hard to study a random variable directly; instead we study $X$ through its probability distribution. The probability distribution of $X$ is the measure $\mu$ on $\mathbb{R}$ given by $\mu(a,b) = \mathbb{P}\{\omega\in \Omega: X(\omega)\in (a,b)\}$; it is a probability measure on $\mathbb{R}$. If $d\mu = f(x)\,dx$ for some function $f$, we call $f$ the density function of $X$.

Expectations

The expectation of $X$ is given by $EX = \int_{\Omega} X(\omega)\,d\mathbb{P}(\omega)$. To compute expectations of a random variable, or of a function of a random variable, one usually uses the following handy formula: $E(g(X)) = \int_{\mathbb{R}} g(x)\,d\mu(x)$, where $\mu$ is the probability distribution of $X$.
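The formula $E(g(X)) = \int_{\mathbb{R}} g(x)\,d\mu(x)$, where $\mu$ is the distribution of $X$, can be sanity-checked numerically. The following minimal Python sketch, with $X$ uniform on $[0,1]$ and $g(x) = x^2$ as purely illustrative choices, compares a Monte Carlo estimate of $E(g(X))$ on the sample-space side with a Riemann sum for $\int g\,d\mu$ on the distribution side:

```python
import random

# X uniform on [0, 1] and g(x) = x^2 (illustrative choices).
random.seed(0)

def g(x):
    return x * x

# Sample-space side: Monte Carlo estimate of E(g(X)).
n = 200_000
mc = sum(g(random.random()) for _ in range(n)) / n

# Distribution side: Riemann sum for the integral of g(x) d_mu(x) = x^2 dx on [0, 1].
m = 10_000
quad = sum(g((k + 0.5) / m) for k in range(m)) / m

print(mc, quad)  # both approximate 1/3
```

Both numbers approximate $\int_0^1 x^2\,dx = \frac{1}{3}$: the two sides of the formula agree.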

Variance

The variance of $X$ is the squared $L^2$ norm of $X - E(X)$, i.e. $D(X) = E((X - E(X))^2) = E(X^2) - (E(X))^2$.

The Characteristic function

Let $X$ be a random variable. The characteristic function of $X$ is its Fourier transform, given by $\Phi_X(s) = E(e^{-2\pi i sX})$.

  • Let $d\mu$ be the law of distribution of $X$; then $\Phi_X(s) = \mathcal F\mu(s)$. Recall that this is given by $\mathcal F\mu(s) = \int_{\mathbb{R}} e^{-2\pi i s x}\,d\mu(x)$.
  • If $X$ has a density function $f(t)$, then $\Phi_X(s) = \int_{-\infty}^{+\infty} f(t) e^{-2\pi i s t}\,dt = \mathcal F f(s)$.

Characteristic function and moments of $X$

  1. The characteristic function detects the moments of $X$. By the derivative theorem, $$\frac{d^k}{ds^k}\Phi_X(s) = \mathcal F((-2\pi i x)^k \mu)(s) = \int_{-\infty}^{+\infty}(-2\pi i x)^k e^{-2\pi i s x}\,d\mu(x) = (-2\pi i)^k \int x^k e^{-2\pi i s x}\,d\mu(x).$$ Setting $s = 0$ we get $E(X^k) = \frac{1}{(-2\pi i)^k}\Phi_X^{(k)}(0)$.

  2. Conversely, if $\Phi_X$ has a global Taylor expansion (this is the case when $X$, i.e. $d\mu$, is compactly supported, so that its Fourier transform is analytic by the Paley–Wiener theorem),

    $$\Phi_X(s) = \sum_{k\geq 0}\frac{\Phi_X^{(k)}(0)}{k!}s^{k} = \sum_{k\geq 0} \frac{(-2\pi i s)^k E(X^k)}{k!}.$$
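Item 1 can be illustrated numerically (a sketch; the uniform distribution on $[-\frac12,\frac12]$ is chosen only because its characteristic function $\frac{\sin(\pi s)}{\pi s}$ is explicit). A finite-difference second derivative of $\Phi_X$ at $0$ recovers $E(X^2) = \frac{1}{12}$:

```python
import math

# Characteristic function of the uniform distribution on [-1/2, 1/2]:
# Phi(s) = sin(pi s) / (pi s), with Phi(0) = 1.
def phi(s):
    if s == 0.0:
        return 1.0
    return math.sin(math.pi * s) / (math.pi * s)

# Second derivative at 0 by central finite differences.
h = 1e-3
phi2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2

# E(X^2) = Phi''(0) / (-2 pi i)^2 = Phi''(0) / (-4 pi^2); the exact value is 1/12.
m2 = phi2 / (-4 * math.pi ** 2)
print(m2)
```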

Uncertainty principle

A complex function $\psi$ is called a wave function if $|\psi(x)|^2\,dx$ is a finite measure on $\mathbb{R}$. We normalize so that it is a probability measure. We also assume $\psi\in \mathcal S$ for simplicity, so that $\mathcal F\psi$ is also in $\mathcal S$. The uncertainty principle says that the variances of the random variables with densities $|\psi|^2$ and $|\mathcal F\psi|^2$ cannot both be small. In other words, $\psi$ and $\mathcal F\psi$ cannot be simultaneously localized.

Theorem (Uncertainty principle)
$$\int_{\mathbb{R}} x^2 |\psi(x)|^2\,dx \cdot \int_{\mathbb R} s^2 |\mathcal F\psi(s)|^2\,ds \geq \frac{1}{16\pi^2}.$$

Proof. By our normalization we have $\int |\psi(x)|^2\,dx = 1$. Integration by parts gives $1 = \int \psi(x)\overline{\psi(x)}\,dx = x|\psi(x)|^2\big|_{-\infty}^{+\infty} - \int x\,d(\psi\bar{\psi}) = 0 - \int x(\psi'\bar{\psi} + \psi\bar{\psi'})\,dx$. Taking absolute values, $1 = \left|\int_{-\infty}^{+\infty} x(\bar{\psi}\psi' + \psi\bar{\psi'})\,dx\right| \leq \int_{-\infty}^{+\infty} 2|x\psi||\psi'|\,dx \leq 2\|x\psi\|_{L^2}\|\psi'\|_{L^2}$ by the Cauchy–Schwarz inequality. Squaring, we get

$$4\|x\psi(x)\|^2_{L^2} \cdot \|\psi'(x)\|_{L^2}^2 \geq 1.$$

Now we look at the derivative term. By the Plancherel identity and the derivative theorem, $\|\psi'\|_{L^2}^2 = \|\mathcal F\psi'\|_{L^2}^2 = \|2\pi i s\,\mathcal F\psi(s)\|_{L^2}^2 = \int 4\pi^2 s^2 |\mathcal F\psi(s)|^2\,ds$. Plugging this into the inequality above gives $16\pi^2 \|x\psi(x)\|_{L^2}^2 \cdot \|s\,\mathcal F\psi(s)\|^2_{L^2} \geq 1$, as claimed.
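As a numerical check of the theorem (a sketch): for the Gaussian wave function $\psi(x) = (2\pi)^{-1/4}e^{-x^2/4}$, for which $|\psi|^2$ is the standard normal density, the product of the two integrals should attain the bound $\frac{1}{16\pi^2}$, the Gaussian being the extremal case:

```python
import math

# psi(x) = (2 pi)^(-1/4) exp(-x^2 / 4): |psi(x)|^2 is the standard normal density.
c = (2 * math.pi) ** -0.25
def psi(x):
    return c * math.exp(-x * x / 4)

# Midpoint Riemann sums on [-L, L].
L, n = 10.0, 2000
dx = 2 * L / n
xs = [-L + (k + 0.5) * dx for k in range(n)]

# Position spread: integral of x^2 |psi(x)|^2 dx (equals 1 for this psi).
var_x = sum(x * x * psi(x) ** 2 for x in xs) * dx

# F psi(s) = integral of psi(x) e^{-2 pi i s x} dx; real-valued since psi is even.
def fpsi(s):
    return sum(psi(x) * math.cos(2 * math.pi * s * x) for x in xs) * dx

# Frequency spread: integral of s^2 |F psi(s)|^2 ds (the Gaussian decays fast,
# so a short interval in s suffices).
S, m = 1.0, 400
ds = 2 * S / m
var_s = 0.0
for k in range(m):
    s = -S + (k + 0.5) * ds
    var_s += s * s * fpsi(s) ** 2 * ds

product = var_x * var_s
print(var_x, var_s, product, 1 / (16 * math.pi ** 2))
```

The printed product matches $\frac{1}{16\pi^2}\approx 0.00633$ up to discretization error.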

Central Limit Theorem

Independent random variables

Random variables $X$ and $Y$ are said to be independent provided $E(f(X)g(Y)) = E(f(X))E(g(Y))$ for all measurable $f, g$. There are two main consequences of independence that we are going to use:

  1. $E(XY) = E(X)E(Y)$. This implies $D(X+Y) = D(X) + D(Y)$, where $D(X) = E(X^2) - (E(X))^2$ is the variance of $X$.
  2. Let $f_X, f_Y$ be the probability density functions of $X, Y$. Then the density of $X+Y$ satisfies $f_{X+Y} = f_X * f_Y$.
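Consequence 2 is easy to check numerically (a sketch): convolving the uniform density on $[-\frac12,\frac12]$ with itself should produce the triangle density $f_{X+Y}(x) = \max(1-|x|,\,0)$:

```python
# Numerical convolution of the uniform density Pi on [-1/2, 1/2] with itself.
# The exact answer is the triangle density f_{X+Y}(x) = max(1 - |x|, 0).
def Pi(x):
    return 1.0 if -0.5 <= x <= 0.5 else 0.0

def conv(f, g, x, n=2000, L=1.0):
    # Midpoint Riemann sum for (f * g)(x) = integral of f(t) g(x - t) dt.
    dt = 2 * L / n
    return sum(f(-L + (k + 0.5) * dt) * g(x - (-L + (k + 0.5) * dt))
               for k in range(n)) * dt

for x in (0.0, 0.25, 0.5, 0.9):
    print(x, conv(Pi, Pi, x), max(1 - abs(x), 0.0))
```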

Statement of Central limit theorem

Two random variables $X, Y$ are called identically distributed provided that their laws of distribution are the same (i.e. correspond to the same probability measure on $\mathbb{R}$).

We call a family $(X_i)_{i=1}^n$ of independent, identically distributed random variables i.i.d. for short.

We are interested in the behaviour of the distribution of $X_1+\cdots+X_n$ as $n\to \infty$. For example, consider a point starting at the origin of $\mathbb{R}$. At every second, the point moves a random distance, uniformly distributed in $[-\frac{1}{2},\frac{1}{2}]$ and independent of the previous moves. Then the location of the point at the $n$-th second is $X_1+\cdots+X_n$, where the $X_i$ are uniformly distributed random variables on $[-\frac{1}{2},\frac{1}{2}]$. What does the probability distribution of the location look like when $n$ is large? In this case we can calculate the distribution directly, since the densities are convolutions. However, it turns out that successive convolutions are very complicated even in this simplest case. The following is taken from this wikipedia page.

The frequency point of view

Assume the $X_i$ are i.i.d. with common density function $\Pi(x) = \begin{cases} 1, & x\in [-\frac{1}{2},\frac{1}{2}] \\ 0, & \text{else.} \end{cases}$ Then $E(X_i) = 0$ and $D(X_i) = E(X_i^2) = \int_{-\frac{1}{2}}^{\frac{1}{2}} x^2\,dx = \frac{1}{12}$, so $D(X_1+\cdots+X_n) = \frac{n}{12}$. Let $S_n := \frac{X_1+\cdots+X_n}{\sqrt{n/12}}$ be the standard normalization, so that $D(S_n) = 1$.

We have $\Phi_{X_1+\cdots+X_n}(s) = (\mathcal F\Pi(s))^n = \left(\frac{\sin(\pi s)}{\pi s}\right)^n$. Note that $\Phi_{X/a}(s) = E(e^{-2\pi i s \frac{X}{a}}) = E(e^{-2\pi i \frac{s}{a} X}) = \Phi_X(\frac{s}{a})$, so

$$\Phi_{S_n}(s) = \left( \frac{\sin\left(\frac{2\sqrt{3}\pi s}{\sqrt{n}}\right)}{\frac{2\sqrt{3}\pi s}{\sqrt{n}}}\right)^n.$$

For every fixed $s$, when $n$ is large, $\frac{s}{\sqrt{n}}$ is near $0$. By Taylor expansion we have $\frac{\sin x}{x} = \frac{x - \frac{x^3}{3!} + \frac{x^5}{5!} + o(x^5)}{x} = 1 - \frac{x^2}{2\cdot 3} + O(x^4)$. Actually in this case we have a global Taylor series, so the "$O(x^4)$" is concretely the tail of the power series $\sum_{k\geq 0,\ k\text{ even}} \frac{(-1)^{k/2} x^k}{(k+1)!}$, but we don't need it.

$$\begin{split} \Phi_{S_n}(s) &= \left(1 - \frac{1}{2\cdot 3}\left(\frac{2\sqrt 3 \pi s}{\sqrt n}\right)^2 + O\!\left(\frac{1}{n^2}\right)\right)^n \\ &= \left(1 - \frac{2\pi^2 s^2}{n} + O\!\left(\frac{1}{n^2}\right)\right)^n \\ &\to e^{-2\pi^2 s^2} \end{split}$$

when $n\to \infty$.
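This convergence is easy to observe numerically (a sketch): evaluate the exact formula for $\Phi_{S_n}(s)$ at a fixed $s$ and compare with the limit $e^{-2\pi^2 s^2}$ as $n$ grows:

```python
import math

def phi_Sn(s, n):
    # Exact characteristic function of the normalized sum of n uniforms.
    x = 2 * math.sqrt(3) * math.pi * s / math.sqrt(n)
    return (math.sin(x) / x) ** n

s = 0.3
limit = math.exp(-2 * math.pi ** 2 * s ** 2)
for n in (10, 100, 10_000, 1_000_000):
    print(n, phi_Sn(s, n), limit)
```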

Taking the inverse Fourier transform, we see that $f_{S_n}(x)\to \frac{1}{\sqrt{2\pi}}e^{-\frac{x^2}{2}}$, since the inverse Fourier transform is continuous.

The central limit theorem says this is true for general random variables.

Theorem

Let $(X_i)_{i=1}^{\infty}$ be a sequence of i.i.d. random variables with common expectation $\mu$ and standard deviation $\sigma$. Then $\frac{X_1+\cdots+X_n - n\mu}{\sqrt{n}\,\sigma}$ converges weakly to the Gaussian random variable. In other words,

$$\lim_{n\to \infty}\mathbb{P}\left(a < \frac{X_1+\cdots+X_n - n\mu}{\sqrt{n}\,\sigma} < b \right) = \frac{1}{\sqrt{2\pi}}\int_{a}^{b} e^{-\frac{x^2}{2}}\,dx.$$

Here a Gaussian random variable means a random variable whose probability density function is $\frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}}$.
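A Monte Carlo illustration of the theorem (a sketch; exponential variables with $\mu = \sigma = 1$ are just one convenient, visibly non-symmetric choice):

```python
import math
import random

random.seed(42)

# X_i ~ Exp(1), so mu = sigma = 1 and the summands are far from symmetric.
n, trials = 500, 5000
a, b = -1.0, 1.0
hits = 0
for _ in range(trials):
    total = sum(random.expovariate(1.0) for _ in range(n))
    z = (total - n) / math.sqrt(n)  # (X_1 + ... + X_n - n*mu) / (sqrt(n)*sigma)
    if a < z < b:
        hits += 1

empirical = hits / trials
gaussian = math.erf(1 / math.sqrt(2))  # P(-1 < Z < 1) for the standard Gaussian
print(empirical, gaussian)
```

With $n = 500$ summands the empirical probability already matches $\frac{1}{\sqrt{2\pi}}\int_{-1}^{1} e^{-x^2/2}\,dx \approx 0.683$ to within Monte Carlo error.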

Proof of Central Limit theorem

By shifting if necessary we may assume the $X_i$ have expectation zero. Let $S_n := \frac{X_1+\cdots+X_n}{\sqrt{n}\,\sigma}$ be the normalized sum. The structure of the proof of the central limit theorem is:

  1. Note that by the scaling theorem, the characteristic function of the Gaussian random variable is $e^{-2\pi^2 s^2}$. Show that the i.i.d. assumption implies that $\Phi_{S_n}(s)$ converges pointwise to $e^{-2\pi^2 s^2}$.
  2. Use the fact that for probability measures, pointwise convergence of the Fourier transforms implies weak convergence. This is called the Lévy continuity theorem.

We first give the proof of 1 in full detail. Then we give a sketch of the proof of 2.

Proof of 1. The proof is similar to the computation above using Taylor expansion. In the general case, the Taylor expansion is $\Phi_X(s) = 1 - \frac{4\pi^2 \sigma^2 s^2}{2} + o(s^2)$, so for fixed $s$ (note that we think of $s$ as a constant and $n$ as the variable),

$$\Phi_{S_n}(s) = \left(1 - 2\pi^2\sigma^2 \left(\frac{s}{\sqrt{n}\,\sigma}\right)^2 + o\!\left(\frac{1}{n}\right) \right)^n = \left(1 - \frac{2\pi^2 s^2}{n} + o\!\left(\frac{1}{n}\right)\right)^n.$$

Write $\Phi_{S_n}(s) = (1+c_n)^n$, where $c_n = \frac{-2\pi^2 s^2 + e_n}{n}$ with $e_n\to 0$ as $n\to \infty$. For every $\epsilon > 0$, there exists $N$ large enough such that $|e_n| < \epsilon$ when $n > N$. Enlarging $N$ if necessary, we also have $\left(1+\frac{-2\pi^2 s^2 + \epsilon}{n}\right)^n < e^{-2\pi^2 s^2 + \epsilon} + \epsilon$ when $n > N$. So $\limsup_{n\to \infty} (1+c_n)^n \leq e^{-2\pi^2 s^2 + \epsilon} + \epsilon$ for every $\epsilon$. This implies $\limsup_{n\to \infty}(1+c_n)^n \leq e^{-2\pi^2 s^2}$. By the same argument, we can show that $\liminf_{n\to \infty}(1+c_n)^n \geq e^{-2\pi^2 s^2}$. So the limit exists and $\lim_{n\to \infty}\Phi_{S_n}(s) = e^{-2\pi^2 s^2}$. We have proved the pointwise convergence.
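For completeness, the Taylor coefficients used at the start of this proof follow from the moment formula $E(X^k) = \frac{1}{(-2\pi i)^k}\Phi_X^{(k)}(0)$ of the earlier section, together with $E(X) = 0$ and $E(X^2) = \sigma^2$ after centering:

```latex
\Phi_X'(0) = -2\pi i\, E(X) = 0, \qquad
\Phi_X''(0) = (-2\pi i)^2 E(X^2) = -4\pi^2\sigma^2,
\qquad\text{hence}\qquad
\Phi_X(s) = 1 - \frac{4\pi^2\sigma^2 s^2}{2} + o(s^2).
```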

The Lévy continuity theorem

Let $(\mu_n)_{n=1}^{\infty}$ be a sequence of probability measures, and let $\Phi_n$ be their characteristic functions (i.e. Fourier transforms). Note that these are uniformly continuous, being Fourier transforms of probability measures. Suppose $\Phi_n \to \phi$ pointwise. Then the following are equivalent:

(1) $\mu_n$ converges weakly to a probability measure $\mu$.

(2) $\phi$ is the Fourier transform of a probability measure $\mu$.

(3) $\phi$ is continuous at zero.

(4) The family $(\mu_n)_{n=1}^{\infty}$ is tight (to be defined later).

To see $(1)\implies(2)$, recall that by definition $\mu_n\to \mu$ weakly iff $(\mu_n, f)\to(\mu, f)$ for every bounded continuous function $f$. But $\Phi_n(s) = (\mu_n(x), e^{-2\pi i s x})$, and $e^{-2\pi i s x}$ is bounded and continuous in the $x$ variable for every $s$. This implies $\Phi_n(s) = (\mu_n(x), e^{-2\pi i x s}) \to (\mu(x), e^{-2\pi i x s})$ for every $s$. Since we already know that $\Phi_n(s)\to \phi(s)$ pointwise, this implies $\phi(s) = (\mu(x), e^{-2\pi i x s}) = \Phi_\mu(s)$.

$(2)\implies(3)$ follows from the remark that the Fourier transform of a probability measure is uniformly continuous, so in particular continuous at $0$.

Tightness of measures

Let $(\mu_n)_{n=1}^{\infty}$ be a family of probability measures.

We know that for any $\mu_i$, $\lim_{R\to \infty}\mu_i(-R,R) = 1$. The family is called tight provided that this holds uniformly, i.e. for every $\epsilon > 0$ there exists $R$ (not depending on $i$) such that $\mu_i(|x|>R) < \epsilon$ for every $i$.

Consequence of tightness: $(4)\implies(1)$.

We give a sketch proof here to illustrate how to use tightness.

Uniform estimate of $\mu_n(|x|>R)$ by the mean value of $\Phi_n$ near zero.

$(3)\implies(4)$

For $\epsilon > 0$, $\int_{-\epsilon}^{\epsilon}(1 - \Phi_n(t))\,dt = \int_{\mathbb R}\left[2\epsilon - \frac{\sin(2\pi\epsilon x)}{\pi x}\right]d\mu_n(x)$. So $\frac{1}{\epsilon}\int_{-\epsilon}^{\epsilon}(1-\Phi_n(t))\,dt = \int_{\mathbb{R}}\left[2 - 2\,\frac{\sin(2\pi\epsilon x)}{2\pi\epsilon x}\right]d\mu_n(x)$. The main part of the integrand is a scaling of $1-\operatorname{sinc}(x)$, which looks as below.

picture of 1-sinc

Observe that $1 - \frac{\sin x}{x} \geq \frac{1}{2}\chi_{\{|x| > 2\}}(x)$, so

$$2 - 2\,\frac{\sin(2\pi\epsilon x)}{2\pi\epsilon x} \geq \chi_{\{|2\pi\epsilon x| > 2\}}(x)$$

and it follows that

$$\frac{1}{\epsilon}\int_{-\epsilon}^{\epsilon}(1-\Phi_n(t))\,dt \geq \mu_n\left\{|x| > \frac{1}{\pi\epsilon}\right\}.$$
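A quick numerical sanity check of this inequality (a sketch; the standard Gaussian, whose characteristic function is $e^{-2\pi^2 t^2}$ in this convention, is just a convenient test measure):

```python
import math

# mu_n = standard Gaussian, with characteristic function Phi(t) = exp(-2 pi^2 t^2).
def phi(t):
    return math.exp(-2 * math.pi ** 2 * t ** 2)

eps = 0.1

# LHS: (1/eps) * integral over [-eps, eps] of (1 - Phi(t)) dt, midpoint rule.
m = 10_000
dt = 2 * eps / m
lhs = sum(1 - phi(-eps + (k + 0.5) * dt) for k in range(m)) * dt / eps

# RHS: mu{|x| > 1/(pi*eps)} for the standard Gaussian.
R = 1 / (math.pi * eps)
rhs = 1 - math.erf(R / math.sqrt(2))  # P(|X| > R) = 2 * (1 - CDF(R))

print(lhs, rhs)
```

The left side dominates the Gaussian tail mass, as the inequality predicts.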

Claim: For every $\delta > 0$, there exist a small $\epsilon_0$ and a large $N$, depending on $\delta$, such that for every $n > N$, $\frac{1}{\epsilon_0}\int_{-\epsilon_0}^{\epsilon_0}(1-\Phi_n(t))\,dt < \delta$; this implies $\mu_n\{|x| > \frac{1}{\pi\epsilon_0}\} < \delta$ for every $n > N$.

First we point out that the claim implies tightness. Given $\delta > 0$, we want to find $R$ large enough that $\mu_n\{|x| > R\} < \delta$ for every $n$. The claim gives an $N$ and an $R'$ such that $\mu_n\{|x| > R'\} < \delta$ for every $n > N$. To handle the finitely many remaining indices, choose for each $1\leq i \leq N$ an $R_i$ with $\mu_i\{|x| > R_i\} < \delta$, and let $R = \max\{R_1,\dots,R_N,R'\}$.

So it remains to prove the claim. The LHS is (twice) the mean value of $\Phi_n$ on a small interval near zero. Recall that $\Phi_n$ converges to $\phi$ pointwise and $\phi$ is continuous at zero, so $1-\phi$ is small near zero. These two conditions together give uniform control of the value of $\Phi_n$ in a small neighborhood of zero. To check this, $\left|\frac{1}{\epsilon}\int_{-\epsilon}^{\epsilon}(1 - \Phi_n(t))\,dt\right| \leq \left|\frac{1}{\epsilon}\int_{-\epsilon}^{\epsilon}(1 - \phi(t))\,dt\right| + \left|\frac{1}{\epsilon} \int_{-\epsilon}^{\epsilon}(\phi(t) - \Phi_n(t))\,dt\right|$. The first term is $\leq \frac{\delta}{2}$ for some small $\epsilon_0$ by the mean value theorem for integrals and the continuity of $\phi$. For this $\epsilon_0$, the second term is $\leq \frac{\delta}{2}$ when $n$ is large, by the dominated convergence theorem (dominated by the constant function $2$, because $|\Phi_n|\leq 1$ as Fourier transforms of probability measures, and pointwise convergence implies $|\phi|\leq 1$ as well).

Appendix