Let $\Omega$ be a set with a probability measure $P$ defined on some $\sigma$-algebra. A random variable is a measurable function $X:\Omega\to\mathbb{R}$ (roughly speaking, this means $P\{\omega: X(\omega)\in(a,b)\}$ is defined for every interval $(a,b)$). It is impractical to study a random variable directly; instead we study $X$ through its probability distribution. The probability distribution of $X$ is the measure $\mu$ on $\mathbb{R}$ given by $\mu(a,b)=P\{\omega\in\Omega: X(\omega)\in(a,b)\}$; it is a probability measure on $\mathbb{R}$. If $d\mu=f(x)\,dx$ for some function $f$, we call $f$ the density function of $X$.
The expectation of $X$ is given by $EX=\int_\Omega X(\omega)\,dP(\omega)$. To calculate integrals of a random variable, or of a function of a random variable, one usually uses the following handy formula: $E(f(X))=\int_{-\infty}^{+\infty} f(x)\,d\mu(x)$.
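As a quick numerical illustration of this formula (a minimal sketch; the uniform density on $[-\frac12,\frac12]$ and the choice $f(x)=x^2$ are illustrative assumptions, not from the text), we can approximate $E(f(X))=\int f(x)\,d\mu(x)$ by a Riemann sum:

```python
import numpy as np

# E f(X) = integral of f(x) dmu(x) for X ~ Uniform[-1/2, 1/2] and f(x) = x^2.
# The exact value is E(X^2) = 1/12.
dx = 1e-6
x = np.arange(-0.5, 0.5, dx) + dx / 2   # midpoints of the grid cells
approx = np.sum(x**2) * dx              # midpoint Riemann sum for the integral
print(approx)  # ≈ 1/12 ≈ 0.0833333
```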
The characteristic function $\Phi_X(s):=\mathcal{F}\mu(s)=\int_{-\infty}^{+\infty}e^{-2\pi i sx}\,d\mu(x)$ can detect the moments of $X$.
By the derivative theorem, $\frac{d^k}{ds^k}\Phi_X(s)=\mathcal{F}\big((-2\pi i x)^k\cdot\mu\big)(s)=\int_{-\infty}^{+\infty}(-2\pi i x)^k e^{-2\pi i sx}\,d\mu(x)=(-2\pi i)^k\int x^k e^{-2\pi i sx}\,d\mu(x)$. So setting $s=0$ we get $E(X^k)=\frac{1}{(-2\pi i)^k}\Phi_X^{(k)}(0)$.
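Here is a small numerical check of the moment formula (a sketch; it assumes $X$ uniform on $[-\frac12,\frac12]$, whose characteristic function $\sin(\pi s)/(\pi s)$ is exactly NumPy's normalized `np.sinc`): estimating $\Phi_X''(0)$ by a central finite difference recovers $E(X^2)=\frac1{12}$.

```python
import numpy as np

# Characteristic function of X ~ Uniform[-1/2, 1/2]:
# Phi_X(s) = sin(pi s)/(pi s), which numpy exposes as np.sinc.
phi = np.sinc

# Second derivative at 0 by a central finite difference.
h = 1e-3
phi2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h**2

# E(X^2) = Phi''(0) / (-2*pi*i)^2 = Phi''(0) / (-4*pi^2)
second_moment = phi2 / (-4 * np.pi**2)
print(second_moment)  # ≈ 1/12
```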
Conversely, if $\Phi_X$ has a global Taylor expansion (this is the case when $X$, i.e. $d\mu$, is compactly supported, so that its Fourier transform is analytic by the Paley–Wiener theorem), then the moments determine $\Phi_X$: $\Phi_X(s)=\sum_{k\ge 0}\frac{\Phi_X^{(k)}(0)}{k!}s^k=\sum_{k\ge 0}\frac{(-2\pi i)^k E(X^k)}{k!}s^k$.
A complex function $\psi$ is called a wave function if $|\psi(x)|^2\,dx$ is a finite measure on $\mathbb{R}$. We normalize so that it is a probability measure. We also assume $\psi\in\mathcal{S}$ for simplicity, so that $\mathcal{F}\psi\in\mathcal{S}$ as well. The uncertainty principle says that the variances of the random variables with densities $|\psi|^2$ and $|\mathcal{F}\psi|^2$ cannot both be small. In other words, the measurements $\psi$ and $\mathcal{F}\psi$ cannot be simultaneously localized.
Proof. By our normalization we have $\int|\psi(x)|^2\,dx=1$, and integration by parts gives $1=\int\psi(x)\overline{\psi(x)}\,dx=\big[x|\psi(x)|^2\big]_{-\infty}^{+\infty}-\int x\,d(\psi\bar\psi)=0-\int x\big(\psi'\bar\psi+\psi\overline{\psi'}\big)\,dx$. Taking absolute values we get $1=\big|\int_{-\infty}^{+\infty}x\big(\psi'\bar\psi+\psi\overline{\psi'}\big)\,dx\big|\le\int_{-\infty}^{+\infty}2|x\psi||\psi'|\,dx\le 2\|x\psi\|_{L^2}\|\psi'\|_{L^2}$ by the Cauchy–Schwarz inequality. Squaring, we get $4\|x\psi\|_{L^2}^2\|\psi'\|_{L^2}^2\ge 1$.
Now we look at the derivative term. By the Plancherel identity and the derivative theorem, $\|\psi'\|_{L^2}^2=\|\mathcal{F}\psi'\|_{L^2}^2=\|2\pi i s\,\mathcal{F}\psi(s)\|_{L^2}^2=4\pi^2\int s^2|\mathcal{F}\psi(s)|^2\,ds$. Plugging this into the inequality above, we get $16\pi^2\,\|x\psi(x)\|_{L^2}^2\cdot\|s\,\mathcal{F}\psi(s)\|_{L^2}^2\ge 1$, as claimed.
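Numerically, the Gaussian attains equality here (a sketch; the specific test function $\psi(x)=2^{1/4}e^{-\pi x^2}$ is my choice): under the convention $\mathcal{F}f(s)=\int f(x)e^{-2\pi i sx}\,dx$ used above, $e^{-\pi x^2}$ is its own Fourier transform, so both factors can be computed on one grid with plain Riemann sums.

```python
import numpy as np

dx = 1e-4
x = np.arange(-10, 10, dx) + dx / 2
psi = 2**0.25 * np.exp(-np.pi * x**2)   # normalized: integral of |psi|^2 is 1

norm = np.sum(psi**2) * dx              # should be 1
var_x = np.sum(x**2 * psi**2) * dx      # ||x psi||^2
var_s = var_x                           # F(psi) = psi for this Gaussian
product = 16 * np.pi**2 * var_x * var_s
print(norm, product)  # ≈ 1.0, ≈ 1.0 (equality case of the inequality)
```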
Random variables $X$ and $Y$ are said to be independent provided $E(f(X)g(Y))=E(f(X))\,E(g(Y))$ for every pair of measurable functions $f,g$. There are two main consequences of independence that we are going to use:
$E(XY)=E(X)E(Y)$. This implies $D(X+Y)=D(X)+D(Y)$, where $D(X)=E(X^2)-(E(X))^2$ is the variance of $X$.
Let $f_X, f_Y$ be the probability density functions of $X, Y$. Then the density of $X+Y$ satisfies $f_{X+Y}=f_X * f_Y$.
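The convolution formula can be seen numerically (a sketch; I take two independent uniforms on $[-\frac12,\frac12]$, whose sum has the triangular density on $[-1,1]$ with peak value $1$):

```python
import numpy as np

dx = 0.001
x = np.arange(-1.0, 1.0 + dx / 2, dx)
f = np.where(np.abs(x) <= 0.5, 1.0, 0.0)   # density of Uniform[-1/2, 1/2]

# f_{X+Y} = f_X * f_Y: discrete convolution scaled by dx approximates the integral.
g = np.convolve(f, f) * dx                 # triangle density on [-1, 1]

print(g.max())          # ≈ 1.0 (peak of the triangle at 0)
print(np.sum(g) * dx)   # ≈ 1.0 (g is still a probability density)
```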
Two random variables $X, Y$ are called identically distributed provided their laws of distribution are the same (i.e. they correspond to the same probability measure on $\mathbb{R}$).
To simplify, we call a family $(X_i)_{i=1}^{n}$ of independent, identically distributed random variables i.i.d.
We are interested in the behaviour of the distribution of $X_1+\cdots+X_n$ as $n\to\infty$. For example, consider a point starting at the origin of $\mathbb{R}$. Every second, the point moves by a random displacement of size $\le\frac12$, uniformly distributed and independent of the previous moves. Then the location of the point at the $n$-th second is $X_1+\cdots+X_n$, where the $X_i$ are uniform random variables on $[-\frac12,\frac12]$. What does the probability distribution of the location look like when $n$ is large? In this case we can calculate the distribution directly, since it is given by convolutions. However, it turns out that successive convolutions are very complicated even in this simplest case. The following is taken from the corresponding Wikipedia page.
Assume the $X_i$ are i.i.d. with common density function $\Pi(x)=\begin{cases}1,& x\in[-\frac12,\frac12]\\ 0,&\text{else.}\end{cases}$ Then $E(X_i)=0$ and $D(X_i)=E(X_i^2)=\int_{-1/2}^{1/2}x^2\,dx=\frac{1}{12}$, so $D(X_1+\cdots+X_n)=\frac{n}{12}$. Let $S_n:=\frac{X_1+\cdots+X_n}{\sqrt{n/12}}$ be the standard normalization, so that $D(S_n)=1$.
$\Phi_{X_1+\cdots+X_n}(s)=(\mathcal{F}\Pi(s))^n=\Big(\frac{\sin(\pi s)}{\pi s}\Big)^n$. Note that $\Phi_{aX}(s)=E(e^{-2\pi i s\,aX})=E(e^{-2\pi i\,as\,X})=\Phi_X(as)$, so $\Phi_{S_n}(s)=\Big(\frac{\sin\big(\pi s/\sqrt{n/12}\big)}{\pi s/\sqrt{n/12}}\Big)^n$.
For every fixed $s$, when $n$ is large, $s/\sqrt{n/12}$ will be near $0$. Then by Taylor expansion we have
$\frac{\sin(x)}{x}=\frac{x-\frac{x^3}{3!}+\frac{x^5}{5!}+o(x^5)}{x}=1-\frac{x^2}{2\cdot 3}+O(x^4)$. Actually, in this case we have a global Taylor series, so the "$O(x^4)$" is concretely given by the power series $\sum_{k\ge 4,\ k\text{ even}}\frac{(-1)^{k/2}\,x^k}{(k+1)!}$, but we don't need it.
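Putting the expansion to work: with $x=\pi s/\sqrt{n/12}$, the $n$-th power $(1-\frac{x^2}{6}+O(x^4))^n$ tends to $e^{-2\pi^2 s^2}$, and already for moderate $n$ the characteristic function of $S_n$ is numerically very close to this Gaussian limit (a sketch, using NumPy's normalized `np.sinc`):

```python
import numpy as np

def phi_Sn(s, n):
    # Characteristic function of S_n = (X_1 + ... + X_n)/sqrt(n/12)
    t = s / np.sqrt(n / 12)
    return np.sinc(t) ** n          # np.sinc(t) = sin(pi t)/(pi t)

n = 100_000
for s in [0.5, 1.0, 2.0]:
    # Compare against the limiting Gaussian characteristic function.
    print(s, phi_Sn(s, n), np.exp(-2 * np.pi**2 * s**2))
```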
Let $(X_i)_{i=1}^{\infty}$ be a sequence of i.i.d. random variables with common expectation $\mu$ and standard deviation $\sigma$. Then $\frac{X_1+\cdots+X_n-n\mu}{\sqrt{n}\,\sigma}$ converges weakly to the standard Gaussian random variable. In other words, $\lim_{n\to\infty}P\Big\{\frac{X_1+\cdots+X_n-n\mu}{\sqrt{n}\,\sigma}\le t\Big\}=\int_{-\infty}^{t}\frac{1}{\sqrt{2\pi}}e^{-x^2/2}\,dx$ for every $t$.
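A quick Monte Carlo sanity check of the statement (a sketch; the uniform steps on $[-\frac12,\frac12]$, with $\mu=0$ and $\sigma^2=\frac1{12}$, are my choice of example): the normalized sums should have mean $\approx 0$, variance $\approx 1$, and put mass $\approx 0.6827$ in $[-1,1]$, like the standard Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 200, 20_000
# X_i ~ Uniform[-1/2, 1/2]: mu = 0, sigma^2 = 1/12
steps = rng.uniform(-0.5, 0.5, size=(trials, n))
S = steps.sum(axis=1) / np.sqrt(n / 12)   # normalized sums

print(S.mean())                 # ≈ 0
print(S.var())                  # ≈ 1
print(np.mean(np.abs(S) <= 1))  # ≈ 0.6827, the standard Gaussian mass of [-1, 1]
```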
By shifting if necessary, we may assume the $X_i$ have expectation zero. Let $S_n:=\frac{X_1+\cdots+X_n}{\sqrt{n}\,\sigma}$ be the normalized sum. We will prove that $S_n$ converges weakly to the standard Gaussian.
The structure of the proof of the central limit theorem is:
1. Note that, by the scaling theorem, the characteristic function of the standard Gaussian random variable is $e^{-2\pi^2 s^2}$. Show that the i.i.d. assumption implies that $\Phi_{S_n}(s)$ converges pointwise to $e^{-2\pi^2 s^2}$.
2. Use the fact that, for probability measures, pointwise convergence of the Fourier transforms implies weak convergence. This is called Lévy's continuity theorem.
We first give the proof of 1 in full detail. Then we give a sketch of the proof of 2.
Proof of 1. The proof will be similar to the argument above using Taylor expansion. In the general case, the Taylor expansion is $\Phi_X(s)=1-\frac{4\pi^2\sigma^2 s^2}{2}+o(s^2)$, so for fixed $s$ (note that we think of $s$ as a constant and $n$ as the variable) we have $\Phi_{S_n}(s)=\Phi_X\Big(\frac{s}{\sqrt{n}\,\sigma}\Big)^n=\Big(1-\frac{2\pi^2 s^2}{n}+o\big(\tfrac1n\big)\Big)^n$.
Write $\Phi_{S_n}(s)=(1+c_n)^n$, where $c_n=\frac{-2\pi^2 s^2+e_n}{n}$ with $e_n\to 0$ as $n\to\infty$. For every $\epsilon>0$ there exists $N$ large enough such that $|e_n|<\epsilon$ whenever $n>N$. Enlarging $N$ if necessary, we also have $\big(1+\frac{-2\pi^2 s^2+\epsilon}{n}\big)^n<e^{-2\pi^2 s^2+\epsilon}+\epsilon$ for $n>N$. So $\limsup_{n\to\infty}(1+c_n)^n\le e^{-2\pi^2 s^2+\epsilon}+\epsilon$ for every $\epsilon$. This implies $\limsup_{n\to\infty}(1+c_n)^n\le e^{-2\pi^2 s^2}$. By the same argument, we can show that $\liminf_{n\to\infty}(1+c_n)^n\ge e^{-2\pi^2 s^2}$. So the limit exists and $\lim_{n\to\infty}\Phi_{S_n}(s)=e^{-2\pi^2 s^2}$. We have proved the pointwise convergence.
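The limit argument can be illustrated numerically (a sketch; the particular error sequence $e_n=n^{-1/2}$ is an arbitrary choice of a sequence tending to $0$):

```python
import numpy as np

s = 0.5
target = np.exp(-2 * np.pi**2 * s**2)
for n in [10**3, 10**5, 10**7]:
    e_n = n ** -0.5                       # any sequence tending to 0 will do
    c_n = (-2 * np.pi**2 * s**2 + e_n) / n
    print(n, (1 + c_n) ** n / target)     # ratio tends to 1
```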
Let $(\mu_n)_{n=1}^{\infty}$ be a sequence of probability measures and let $\Phi_n$ be their characteristic functions (i.e. Fourier transforms). Note that, being Fourier transforms of probability measures, they are uniformly continuous. Suppose $\Phi_n\to\phi$ pointwise. Then the following are equivalent:
(1) $\mu_n$ converges weakly to a probability measure $\mu$.
(2) $\phi$ is the Fourier transform of a probability measure $\mu$.
(3) $\phi$ is continuous at zero.
(4) The family $(\mu_n)_{n=1}^{\infty}$ is tight (to be defined later).
To see (1) $\Longrightarrow$ (2), recall that by definition $\mu_n\to\mu$ weakly iff $(\mu_n,f)\to(\mu,f)$ for every bounded continuous function $f$. But $\Phi_n(s)=(\mu_n(x),e^{-2\pi i sx})$, and $e^{-2\pi i sx}$ is bounded and continuous in the $x$ variable for every $s$. This implies $\Phi_n(s)=(\mu_n(x),e^{-2\pi i xs})\to(\mu(x),e^{-2\pi i xs})$ for every $s$. Since we already know that $\Phi_n(s)\to\phi(s)$ pointwise, this implies $\phi(s)=(\mu(x),e^{-2\pi i xs})=\Phi_\mu(s)$.
(2) $\Longrightarrow$ (3) follows from the remark that the Fourier transform of a probability measure is uniformly continuous, so in particular continuous at $0$.
Let (μn)n=1∞ be a family of probability measures.
We know that for each $\mu_i$, $\lim_{R\to\infty}\mu_i(-R,R)=1$. The family is called tight provided this holds uniformly, i.e. for every $\epsilon>0$ there exists $R$ (not depending on $i$) such that $\mu_i(|x|>R)<\epsilon$ for every $i$.
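For a non-example (a sketch; the family $\mu_n=N(0,n)$ is a standard illustration, not taken from the text above): the tail mass $\mu_n(|x|>R)=\operatorname{erfc}\big(R/\sqrt{2n}\big)$ tends to $1$ for every fixed $R$, so no single $R$ works for all $n$, and the family is not tight.

```python
import math

def tail_mass(R, n):
    # Mass of N(0, n) outside [-R, R]: 2*(1 - Phi(R/sqrt(n))) = erfc(R/sqrt(2n))
    return math.erfc(R / math.sqrt(2 * n))

# For fixed R, the tail mass grows toward 1 as n grows.
for n in [1, 100, 10_000]:
    print(n, tail_mass(10.0, n))
```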
We give a sketch of the proof here to illustrate how tightness is used.
Uniform estimate of $\mu_n(|x|>R)$ by the mean value of $\Phi_n$ near zero.
(3)⟹(4)
For $\epsilon>0$, $\int_{-\epsilon}^{\epsilon}(1-\Phi_n(t))\,dt=\int_{\mathbb{R}}\Big[2\epsilon-\frac{\sin(2\pi\epsilon x)}{\pi x}\Big]\,d\mu_n(x)$. So $\frac1\epsilon\int_{-\epsilon}^{\epsilon}(1-\Phi_n(t))\,dt=\int_{\mathbb{R}}\Big[2-\frac{\sin(2\pi\epsilon x)}{\pi\epsilon x}\Big]\,d\mu_n(x)$. The main part of the integrand is a scaling of $2(1-\operatorname{sinc}(x))$: it is nonnegative, vanishes at $0$, and, since $|\sin u|\le 1$, it is $\ge 2-\frac{1}{\pi\epsilon|x|}\ge 1$ when $|x|\ge\frac{1}{\pi\epsilon}$.
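The identity above can be sanity-checked numerically (a sketch; taking $\mu_n$ to be the uniform measure on $[-\frac12,\frac12]$, so $\Phi_n=\operatorname{sinc}$, and an arbitrary $\epsilon=0.3$, both my choices):

```python
import numpy as np

eps, dt = 0.3, 1e-5

# LHS: (1/eps) * integral over [-eps, eps] of 1 - Phi_n(t), with Phi_n = sinc.
t = np.arange(-eps, eps, dt) + dt / 2
lhs = np.sum(1 - np.sinc(t)) * dt / eps

# RHS: integral of 2 - sin(2*pi*eps*x)/(pi*eps*x) = 2 - 2*sinc(2*eps*x)
# against the uniform density on [-1/2, 1/2].
x = np.arange(-0.5, 0.5, dt) + dt / 2
rhs = np.sum(2 - 2 * np.sinc(2 * eps * x)) * dt

print(lhs, rhs)  # the two sides agree
```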
Claim: For every $\delta>0$, there exist a small $\epsilon_0$ and a large $N$, depending on $\delta$, such that for every $n>N$, $\frac{1}{\epsilon_0}\int_{-\epsilon_0}^{\epsilon_0}(1-\Phi_n(t))\,dt<\delta$; this implies $\mu_n\{|x|>\frac{1}{\pi\epsilon_0}\}<\delta$ for every $n>N$.
First we point out that this claim implies tightness. Given $\delta>0$, we want to find $R$ large enough that $\mu_n\{|x|>R\}<\delta$ for every $n$. The claim gives an $N$ and an $R'$ such that $\mu_n\{|x|>R'\}<\delta$ for every $n>N$. To handle the finitely many remaining indices, choose $R_i$ with $\mu_i\{|x|>R_i\}<\delta$ for each $1\le i\le N$, and let $R=\max\{R_1,\dots,R_N,R'\}$.
So it remains to prove the claim. The LHS is (twice) the mean value of $1-\Phi_n$ on a small interval near zero. Recall that $\Phi_n$ converges to $\phi$ pointwise and $\phi$ is continuous at zero, so $1-\phi$ is small near zero. These two conditions together give uniform control of $\Phi_n$ in a small neighborhood of zero. To check this, $\big|\frac1\epsilon\int_{-\epsilon}^{\epsilon}(1-\Phi_n(t))\,dt\big|\le\big|\frac1\epsilon\int_{-\epsilon}^{\epsilon}(1-\phi(t))\,dt\big|+\big|\frac1\epsilon\int_{-\epsilon}^{\epsilon}(\phi(t)-\Phi_n(t))\,dt\big|$. The first term is $\le\frac\delta2$ for some small $\epsilon_0$, by the mean value theorem for integrals and the continuity of $\phi$. For this $\epsilon_0$, the second term is $\le\frac\delta2$ when $n$ is large, by the dominated convergence theorem (dominated by the constant function $2$, because $|\Phi_n|\le 1$ as Fourier transforms of probability measures, and pointwise convergence implies $|\phi|\le 1$ as well).