1. What is the Central Limit Theorem?
The Central Limit Theorem is one of the cornerstones of probability theory. It states that for a fairly large sample size n, drawn from a population with finite variance σ², the sample mean is approximately normally distributed, with mean equal to the population mean μ and variance σ²/n.
What does CLT stand for?
CLT stands for Central Limit Theorem.
What makes the Central Limit Theorem so important?
This theorem shows that, regardless of the underlying distribution, when n is large and the population has finite mean and variance, the arithmetic mean of a sample approximately follows a normal distribution. The CLT tells us about the shape of the distribution of the sample mean over repeated trials.
Condition:
The CLT applies to sums (or averages) of the variables involved, so it requires their effects to combine in a linear, additive way.
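A quick way to see this is to simulate it. The sketch below is our own illustration, not from the original text: it assumes an Exponential(1) population (mean 1, variance 1, strongly right-skewed) and uses only Python's standard `random` and `statistics` modules. For each sample size n it draws many samples, records their means, and reports the mean, variance and skewness of those sample means. The mean should stay near 1, the variance should be close to 1/n, and the skewness should shrink towards 0 (the value for a normal distribution) as n grows.

```python
import random
import statistics

def sample_means(n, num_samples, rate=1.0, seed=0):
    """Draw num_samples samples of size n from an Exponential(rate) population
    (an illustrative choice, not from the source) and return their sample means."""
    rng = random.Random(seed)
    return [statistics.fmean(rng.expovariate(rate) for _ in range(n))
            for _ in range(num_samples)]

def skewness(xs):
    """Sample skewness: mean((x - m)^3) / sd^3, using the population-style sd."""
    m = statistics.fmean(xs)
    sd = statistics.pstdev(xs, m)
    return statistics.fmean((x - m) ** 3 for x in xs) / sd ** 3

# Exponential(1) has mean 1, variance 1 and skewness 2: far from normal.
for n in (1, 5, 30):
    means = sample_means(n, num_samples=5000)
    print(f"n={n:2d}  mean={statistics.fmean(means):.3f}  "
          f"var={statistics.pvariance(means):.4f} (theory {1 / n:.4f})  "
          f"skew={skewness(means):.2f} (theory {2 / n ** 0.5:.2f})")
```

For this population the skewness of the sample mean is 2/√n in theory, so the printed skewness should fall roughly in that proportion; the exact figures depend on the random seed.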
2. Definition of Central Limit Theorem
In the mathematical theory of probability, the Central Limit Theorem is expressed as follows:
Let $X_i$ ($i = 1, 2, \dots, n$) be independent random variables with $E(X_i) = \mu_i$ and $\operatorname{Var}(X_i) = \sigma_i^2$. Then, under certain general conditions, the random variable
$$S_n = X_1 + X_2 + \dots + X_n$$
is asymptotically normal with mean $\mu$ and standard deviation $\sigma$, where
$$\mu = \sum_{i=1}^{n} \mu_i, \qquad \sigma^2 = \sum_{i=1}^{n} \sigma_i^2.$$
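The definition does not require the $X_i$ to be identically distributed. As a hypothetical illustration (our choice, not the source's), take independent $X_i \sim \text{Uniform}(0, i)$, so that $\mu_i = i/2$ and $\sigma_i^2 = i^2/12$. The Python sketch below standardizes $S_n$ using $\mu = \sum \mu_i$ and $\sigma^2 = \sum \sigma_i^2$, then checks how often the result lands in $[-1.96, 1.96]$. If $S_n$ is close to normal, that fraction should be close to 0.95.

```python
import math
import random

def coverage(n=50, reps=4000, z=1.96, seed=1):
    """Fraction of standardized sums (S_n - mu) / sigma that fall in [-z, z],
    where S_n = X_1 + ... + X_n with independent X_i ~ Uniform(0, i)
    (a hypothetical, non-identically distributed example)."""
    mu = sum(i / 2 for i in range(1, n + 1))                      # mu = sum of mu_i
    sigma = math.sqrt(sum(i * i / 12 for i in range(1, n + 1)))   # sigma^2 = sum of sigma_i^2
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        s_n = sum(rng.uniform(0, i) for i in range(1, n + 1))
        if abs((s_n - mu) / sigma) <= z:
            hits += 1
    return hits / reps

print(coverage())   # near 0.95 if S_n is approximately normal
```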
The above definition also answers the well-known and frequently asked question "What is the Central Limit Theorem in Statistics?"
Laplace was the first to state this theorem, in 1812, and a proof under general conditions was given by Liapounoff in 1901. Generally speaking, the term "central limit theorem" refers to a family of weak-convergence theorems in probability theory.
3. Some particular cases of C.L.T.
Here, we consider some particular cases of the Central Limit Theorem:
De Moivre-Laplace Theorem
If
$$X_i = \begin{cases} 1 & \text{with probability } p \\ 0 & \text{with probability } q = 1 - p \end{cases}$$
then the distribution of the random variable $S_n = X_1 + X_2 + \dots + X_n$, where the $X_i$'s are independent, is asymptotically normal as $n \to \infty$.
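Proof: The M.G.F. of each $X_i$ is
$$M_{X_i}(t) = E(e^{tX_i}) = e^{t \cdot 1} p + e^{t \cdot 0} q = q + p e^{t} \qquad (*)$$
The M.G.F. of the sum $S_n = X_1 + X_2 + \dots + X_n$ is given by
$$M_{S_n}(t) = M_{X_1 + X_2 + \dots + X_n}(t) = M_{X_1}(t)\, M_{X_2}(t) \cdots M_{X_n}(t) = [M_{X_i}(t)]^n = (q + p e^{t})^n$$
(since the $X_i$'s are independent and identically distributed),
which is the M.G.F. of a binomial variate with parameters n and p.
Hence, by the uniqueness theorem of M.G.F.s,
$$S_n \sim B(n, p), \qquad E(S_n) = np = \mu \text{ (say)}, \qquad V(S_n) = npq = \sigma^2.$$
For the standardized variate $Z = (S_n - \mu)/\sigma$, writing the bracket as $\left[q e^{-pt/\sqrt{npq}} + p e^{qt/\sqrt{npq}}\right]^n$ and expanding the exponentials,
$$M_Z(t) = e^{-\mu t/\sigma} M_{S_n}(t/\sigma) = e^{-npt/\sqrt{npq}} \left[q + p e^{t/\sqrt{npq}}\right]^n = \left[1 + \frac{t^2}{2n} + O(n^{-3/2})\right]^n,$$
where $O(n^{-3/2})$ represents terms involving $n^{3/2}$ and higher powers of n in the denominator. Taking the limit as $n \to \infty$,
$$\lim_{n\to\infty} M_Z(t) = \lim_{n\to\infty} \left[1 + \frac{t^2}{2n} + O(n^{-3/2})\right]^n = \lim_{n\to\infty} \left[1 + \frac{t^2}{2n}\right]^n = e^{t^2/2},$$
which is the M.G.F. of a standard normal variate.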
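The limit can also be checked numerically. The Python sketch below is our own (the function name `log_mgf_z` is hypothetical). It evaluates $\log M_Z(t) = -npt/\sqrt{npq} + n \log\left(q + p e^{t/\sqrt{npq}}\right)$ in log space, using `math.log1p` and `math.expm1` to avoid overflow and loss of precision for large n, and compares $M_Z(t)$ with $e^{t^2/2}$ for the arbitrary choice t = 1, p = 0.3.

```python
import math

def log_mgf_z(t, n, p):
    """log M_Z(t) for Z = (S_n - np) / sqrt(npq) with S_n ~ B(n, p)."""
    sigma = math.sqrt(n * p * (1 - p))
    s = t / sigma
    # n * log(q + p e^s) = n * log(1 + p (e^s - 1)); log1p/expm1 keep precision for small s
    return -n * p * s + n * math.log1p(p * math.expm1(s))

t, p = 1.0, 0.3
for n in (10, 100, 1000, 100000):
    print(n, math.exp(log_mgf_z(t, n, p)))
print("limit e^(t^2/2) =", math.exp(t * t / 2))
```

The printed values should move towards $e^{1/2} \approx 1.6487$; the remaining gap shrinks roughly like $1/\sqrt{n}$, matching the $O(n^{-3/2})$ term inside the bracket being raised to the power n.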
Remarks: This theorem shows that the standardized binomial variate tends to the standard normal variate as n → ∞; in simple words, the binomial distribution tends to the normal distribution as n → ∞.
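To see the remark in practice, the sketch below (ours; it needs Python 3.8+ for `math.comb`, `math.isqrt` and `statistics.NormalDist`) compares the exact binomial probability $P(S_n \le k)$ with the normal approximation $\Phi\left((k - np)/\sqrt{npq}\right)$ at a point about two standard deviations above the mean, for p = ½ and increasing n. No continuity correction is applied, so part of the gap reflects the discreteness of the binomial; it should still shrink as n grows. The direct summation is intended only for n up to about 1,000, because larger binomial coefficients overflow a float.

```python
import math
from statistics import NormalDist

def binom_cdf(k, n, p):
    """Exact P(S_n <= k) for S_n ~ B(n, p), by direct summation (n up to ~1000)."""
    return sum(math.comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))

def normal_approx_cdf(k, n, p):
    """De Moivre-Laplace approximation: P(S_n <= k) ~ Phi((k - np) / sqrt(npq))."""
    return NormalDist(n * p, math.sqrt(n * p * (1 - p))).cdf(k)

p = 0.5
for n in (10, 100, 1000):
    k = n // 2 + math.isqrt(n)   # about two standard deviations (sqrt(n)/2 each) above the mean
    exact, approx = binom_cdf(k, n, p), normal_approx_cdf(k, n, p)
    print(f"n={n:4d}  exact={exact:.4f}  normal={approx:.4f}  gap={abs(exact - approx):.4f}")
```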
4. Lyapunov’s CLT
This theorem is named after the Russian mathematician Aleksandr Lyapunov. It states:
Let $X_1, X_2, X_3, \dots$ be independent random variables (not necessarily identically distributed), each with finite expected value $\mu_i$ and variance $\sigma_i^2$, and define
$$s_n^2 = \sum_{i=1}^{n} \sigma_i^2.$$
Suppose that for some $\delta > 0$ the moments $E|X_i - \mu_i|^{2+\delta}$ are finite and that their rate of growth is limited by Lyapunov's condition
$$\lim_{n\to\infty} \frac{1}{s_n^{2+\delta}} \sum_{i=1}^{n} E|X_i - \mu_i|^{2+\delta} = 0.$$
Then
$$\frac{1}{s_n} \sum_{i=1}^{n} (X_i - \mu_i)$$
converges in distribution to a standard normal variate as $n \to \infty$.
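Lyapunov's condition can be checked directly when the moments are known. In the hypothetical Python sketch below (our example, not the source's), the $X_i$ are independent Bernoulli variables whose success probabilities cycle through 0.1, 0.5 and 0.9, and δ = 1. For a Bernoulli(p) variable, $E|X - p|^{2+\delta} = p\,q^{2+\delta} + q\,p^{2+\delta}$ and $\sigma^2 = pq$, so the ratio in the condition can be computed exactly. Here it decreases like $1/\sqrt{n}$ and therefore tends to 0.

```python
def lyapunov_ratio(ps, delta=1.0):
    """Ratio sum_i E|X_i - mu_i|^(2+delta) / s_n^(2+delta) for independent
    X_i ~ Bernoulli(p_i), where s_n^2 = sum_i p_i (1 - p_i)."""
    s2 = sum(p * (1 - p) for p in ps)
    # X_i - p_i equals 1 - p_i with probability p_i and -p_i with probability 1 - p_i
    num = sum(p * (1 - p) ** (2 + delta) + (1 - p) * p ** (2 + delta) for p in ps)
    return num / s2 ** ((2 + delta) / 2)

def cycling_probs(n):
    """Hypothetical sequence p_1, p_2, ... cycling through 0.1, 0.5, 0.9."""
    return [(0.1, 0.5, 0.9)[i % 3] for i in range(n)]

for n in (3, 300, 30000):
    print(n, round(lyapunov_ratio(cycling_probs(n)), 4))
```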
5. Lindeberg-Lévy Theorem
Lindeberg and Lévy proved the following case of the Central Limit Theorem:
"If $X_1, X_2, X_3, \dots, X_n$ are independently and identically distributed random variables with $E(X_i) = \mu_1$ and $V(X_i) = \sigma_1^2$, then $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu = n\mu_1$ and variance $\sigma^2 = n\sigma_1^2$."
Here the assumptions made are:
- The variables are independently and identically distributed.
- $E(X_i^2)$ exists for all $i = 1, 2, \dots, n$.
Proof: Let $M_1(t)$ denote the M.G.F. of each of the deviations $(X_i - \mu_1)$, and let $M_Z(t)$ denote the M.G.F. of the standardized variate $Z = (S_n - \mu)/\sigma$, where $\mu = n\mu_1$ and $\sigma = \sqrt{n}\,\sigma_1$.
The first two moments about the origin of the deviation $(X_i - \mu_1)$ are
$$\mu_1' = E(X_i - \mu_1) = 0, \qquad \mu_2' = E(X_i - \mu_1)^2 = \sigma_1^2.$$
We have
$$M_1(t) = 1 + \mu_1' t + \mu_2' \frac{t^2}{2!} + \mu_3' \frac{t^3}{3!} + \dots = 1 + \frac{\sigma_1^2 t^2}{2} + O(t^3),$$
where $O(t^3)$ contains terms with $t^3$ and higher powers of t.
Since the $X_i$'s are independent, we get
$$M_Z(t) = M_{\sum (X_i - \mu_1)/\sigma}(t) = M_{\sum (X_i - \mu_1)}(t/\sigma) = \prod_{i=1}^{n} M_{X_i - \mu_1}(t/\sigma) = \left[M_1(t/\sigma)\right]^n = \left[M_1\!\left(\frac{t}{\sqrt{n}\,\sigma_1}\right)\right]^n = \left[1 + \frac{t^2}{2n} + O(n^{-3/2})\right]^n,$$
where $O(n^{-3/2})$ represents terms involving $n^{3/2}$ and higher powers of n in the denominator. Taking the limit as $n \to \infty$,
$$\lim_{n\to\infty} M_Z(t) = \lim_{n\to\infty}\left[1 + \frac{t^2}{2n} + O(n^{-3/2})\right]^n = \lim_{n\to\infty}\left[1 + \frac{t^2}{2n}\right]^n = e^{t^2/2},$$
which is the M.G.F. of a standard normal variate. Hence, by the uniqueness theorem of M.G.F.s, $S_n = X_1 + X_2 + \dots + X_n$ is asymptotically normal with mean $\mu = n\mu_1$ and variance $\sigma^2 = n\sigma_1^2$.
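The same limit can be checked numerically for a concrete i.i.d. case. In the Python sketch below we assume, for illustration only, that $X_i \sim \text{Uniform}(0, 1)$, so $\mu_1 = 1/2$, $\sigma_1^2 = 1/12$, and the deviation $X_i - 1/2$ has M.G.F. $M_1(t) = \sinh(t/2)/(t/2)$. The function `mgf_z` (a hypothetical name) evaluates $\left[M_1\left(t/(\sqrt{n}\,\sigma_1)\right)\right]^n$, which should approach $e^{t^2/2}$ as n grows.

```python
import math

SIGMA1 = math.sqrt(1 / 12)   # standard deviation of Uniform(0, 1) (illustrative choice)

def m1(t):
    """M.G.F. of the deviation X - 1/2 for X ~ Uniform(0, 1): sinh(t/2) / (t/2)."""
    if t == 0:
        return 1.0
    return math.sinh(t / 2) / (t / 2)

def mgf_z(t, n):
    """[M_1(t / (sqrt(n) * sigma_1))]^n, the M.G.F. of Z = (S_n - n/2) / (sqrt(n) * sigma_1)."""
    return m1(t / (math.sqrt(n) * SIGMA1)) ** n

t = 1.5
for n in (1, 10, 100, 10000):
    print(n, mgf_z(t, n))
print("e^(t^2/2) =", math.exp(t * t / 2))
```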
If a sequence satisfies Lyapunov's condition, it also satisfies Lindeberg's condition; however, the converse is not true.
6. Liapounoff’s Central Limit Theorem
This is the Central Limit Theorem for the general case in which the variables are not necessarily identically distributed, but some further conditions are imposed.
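Let $X_1, X_2, X_3, \dots, X_n$ be independent random variables such that $E(X_i) = \mu_i$ and $V(X_i) = \sigma_i^2$.
Suppose that the third absolute moment, say $\rho_i^3$, of $X_i$ about its mean exists, i.e.
$$\rho_i^3 = E\{|X_i - \mu_i|^3\}, \qquad i = 1, 2, 3, \dots, n,$$
is finite. Further, let $\rho^3 = \sum_{i=1}^{n} \rho_i^3$.
If $\lim_{n\to\infty} \rho/\sigma = 0$, the sum $X = X_1 + X_2 + \dots + X_n$ is asymptotically $N(\mu, \sigma^2)$, where
$$\mu = \sum_{i=1}^{n} \mu_i, \qquad \sigma^2 = \sum_{i=1}^{n} \sigma_i^2.$$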
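Remarks: About Liapounoff's theorem
If the variables $X_1, X_2, X_3, \dots, X_n$ are identically distributed, with third absolute moment $\rho_1^3$ and variance $\sigma_1^2$, then
$$\rho^3 = \sum_{i=1}^{n} \rho_i^3 = n\rho_1^3, \qquad \sigma^2 = \sum_{i=1}^{n} \sigma_i^2 = n\sigma_1^2,$$
so that
$$\frac{\rho}{\sigma} = \frac{n^{1/3}\rho_1}{n^{1/2}\sigma_1} = n^{-1/6}\,\frac{\rho_1}{\sigma_1} \to 0 \text{ as } n \to \infty.$$
Hence, for identically distributed variables, the conditions of Liapounoff's theorem are satisfied.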
It may be pointed out that the Lindeberg-Lévy theorem is not a particular case of Liapounoff's theorem, since the former does not assume the existence of the third moment.
7. Different forms of C.L.T
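The C.L.T. can be stated in other forms too. For i.i.d. random variables with mean μ and finite variance σ², and $S_n = X_1 + X_2 + \dots + X_n$, a common form is
$$\lim_{n\to\infty} P\left(a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b\right) = \Phi(b) - \Phi(a),$$
where Φ(·) is the distribution function of the standard normal variate. Equivalently, in terms of the sample mean $\bar{X}_n = S_n/n$,
$$\lim_{n\to\infty} P\left(a \le \frac{\sqrt{n}\,(\bar{X}_n - \mu)}{\sigma} \le b\right) = \Phi(b) - \Phi(a).$$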
Remarks:
- It does not matter whether the non-strict inequalities (≤) are changed to strict inequalities (<), since the limiting normal distribution is continuous.
- The CLT gives a good approximation in the binomial case when p = ½. For p close to 0 or 1, the CLT approximation still holds, but n has to be considerably larger.
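The second remark can be quantified. The Python sketch below (our own illustration) computes, for $S_n \sim B(n, p)$, the largest gap over all k between the exact CDF and the normal approximation. It applies the common continuity correction, evaluating the normal CDF at k + ½. That correction is our addition, not part of the source; it is used here so that the remaining error mainly reflects the skewness of the binomial. At the same n, the gap should be much smaller for p = ½ than for p = 0.05, and for p = 0.05 it should shrink only slowly as n grows.

```python
import math
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest |P(S_n <= k) - Phi((k + 0.5 - np) / sqrt(npq))| over k = 0..n,
    for S_n ~ B(n, p). Direct summation; intended for n up to a few hundred."""
    approx = NormalDist(n * p, math.sqrt(n * p * (1 - p)))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += math.comb(n, k) * p**k * (1 - p)**(n - k)   # exact CDF, accumulated term by term
        worst = max(worst, abs(cdf - approx.cdf(k + 0.5)))  # continuity-corrected normal CDF
    return worst

for p in (0.5, 0.05):
    print(f"p={p}:", [round(max_cdf_gap(n, p), 4) for n in (20, 100, 500)])
```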
8. Applications of Central Limit Theorem
(a) If $X_1, X_2, X_3, \dots$ are i.i.d. $B(r, p)$ variates and $S_n = X_1 + X_2 + \dots + X_n$, then
$$E(S_n) = \sum_{i=1}^{n} E(X_i) = nrp, \qquad V(S_n) = \sum_{i=1}^{n} V(X_i) = nrpq,$$
and, by the CLT, $S_n$ is asymptotically $N(nrp, nrpq)$.
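(b) Let $X_1, X_2, \dots$ be i.i.d. Bernoulli variates, i.e. $B(1, p)$. Then $S_n = X_1 + X_2 + \dots + X_n \sim B(n, p)$. Hence, applying the CLT with mean p and variance pq for each $X_i$, we get the De Moivre-Laplace theorem:
$$\lim_{n\to\infty} P\left(a \le \frac{S_n - np}{\sqrt{npq}} \le b\right) = \Phi(b) - \Phi(a).$$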
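(c) If $Y_n$ is distributed as $P(n)$ (Poisson with parameter n), then
$$\lim_{n\to\infty} P\left(a \le \frac{Y_n - n}{\sqrt{n}} \le b\right) = \Phi(b) - \Phi(a).$$
Thus, for instance,
$$\lim_{n\to\infty} P(Y_n \le n) = \frac{1}{2}, \quad \text{i.e.} \quad \sum_{k=0}^{n} \frac{e^{-n} n^k}{k!} \to \frac{1}{2} \text{ as } n \to \infty.$$
Proof: Let $X_1, X_2, \dots$ be i.i.d. $P(1)$ variates. Then
$$S_n = X_1 + X_2 + \dots + X_n \sim P(n),$$
so $Y_n$ has the same distribution as $S_n$, with $E(S_n) = V(S_n) = n$, and the CLT gives the limit stated above. In particular, taking $a = -\infty$ and $b = 0$, we get
$$P(Y_n \le n) \to \Phi(0) = \frac{1}{2} \text{ as } n \to \infty.$$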
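The limit is easy to check numerically. The Python sketch below (the name `poisson_cdf_at_mean` is ours) evaluates $\sum_{k=0}^{n} e^{-n} n^k / k!$ term by term in log space, using `math.lgamma(k + 1)` for $\log k!$ so that large n does not overflow. The values approach ½ from above, but slowly: the gap is roughly of order $1/\sqrt{n}$.

```python
import math

def poisson_cdf_at_mean(n):
    """P(Y_n <= n) = sum_{k=0}^{n} e^{-n} n^k / k! for Y_n ~ P(n)."""
    # Each term is exp(-n + k*log(n) - log(k!)), with log(k!) = lgamma(k + 1)
    return sum(math.exp(-n + k * math.log(n) - math.lgamma(k + 1)) for k in range(n + 1))

for n in (1, 10, 100, 10000):
    print(n, round(poisson_cdf_at_mean(n), 4))
```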
Relationship between the Central Limit Theorem and the Weak Law of Large Numbers
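1. Both the Central Limit Theorem and the Weak Law of Large Numbers (WLLN) hold for a sequence of i.i.d. random variables with finite mean μ and variance σ². However, in this case the CLT gives a stronger result than the WLLN, in the sense that the former provides an estimate of the probability $P\left(\left|\frac{S_n - n\mu}{n}\right| \le \epsilon\right)$, as given below:
$$P\left(\left|\frac{S_n - n\mu}{n}\right| \le \epsilon\right) \approx \Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right) - \Phi\left(-\frac{\epsilon\sqrt{n}}{\sigma}\right) = 2\Phi\left(\frac{\epsilon\sqrt{n}}{\sigma}\right) - 1,$$
where Φ(·) is the distribution function of the standard normal variate. On the other hand, the WLLN does not require the existence of the variance.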
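The difference can be made concrete. The Python sketch below evaluates the CLT estimate $2\Phi(\epsilon\sqrt{n}/\sigma) - 1$ for ε = 0.2 and σ = 1 (arbitrary illustrative values). For comparison it also computes the Chebyshev lower bound $1 - \sigma^2/(n\epsilon^2)$, which is the usual route to proving the WLLN but is our addition here, not something discussed in the source. Both tend to 1 as n grows, but only the CLT gives an approximate value of the probability for a given n.

```python
import math
from statistics import NormalDist

def clt_estimate(n, eps, sigma):
    """CLT estimate of P(|S_n/n - mu| <= eps): 2*Phi(eps*sqrt(n)/sigma) - 1."""
    return 2 * NormalDist().cdf(eps * math.sqrt(n) / sigma) - 1

def chebyshev_bound(n, eps, sigma):
    """Chebyshev lower bound 1 - sigma^2 / (n * eps^2), clipped at 0."""
    return max(0.0, 1 - sigma**2 / (n * eps**2))

for n in (25, 100, 400):
    print(n, round(clt_estimate(n, 0.2, 1.0), 4), round(chebyshev_bound(n, 0.2, 1.0), 4))
```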
2. For a sequence {Xn} of independent and uniformly bounded r.v.'s, the WLLN holds, and the CLT also holds provided
$$B_n = \operatorname{var}(X_1 + X_2 + \dots + X_n) = \sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2 \to \infty \text{ as } n \to \infty.$$
3. For a sequence {Xn} of independent r.v.'s, the CLT may hold even though the WLLN does not.
9. Uses of CLT
- The CLT explains why many commonly used estimators follow an approximately normal distribution. This means that tables of normal values (or the built-in functions of statistical software and programming languages such as R or C) can be used to construct confidence intervals and approximate p-values, which is very practical.
- The CLT applies to almost any kind of distribution, provided its variance is finite. This allows statisticians to develop standardized methods for extracting useful information from almost any sample, using CLT-based statistics and hypothesis tests.
- Under certain conditions, the CLT also holds for variables that are not independent.
10. Examples of Central Limit Theorem
1. Suppose a school has 1200 students, with 200 students in each of grades 7 to 12. Each student contributes their own marks, and the marks of different students are independent of each other.
Suppose we take 10 samples of 25 students each and find the mean grade of each sample. The first sample has a mean of 9.52, the second 9.32, and so on; the means of all 10 samples are shown below:
Sample (n = 25) | Average grade
1 | 9.52
2 | 9.32
3 | 9.08
4 | 8.80
5 | 9.48
6 | 9.36
7 | 9.48
8 | 10.12
9 | 9.64
10 | 9.35
When samples are taken repeatedly and their means calculated, the means themselves form a distribution. This is called the sampling distribution, because it represents the distribution of an estimate computed from repeated samples of the population. A histogram of the means of, say, 1,000 such samples would show the shape of this sampling distribution.
The distribution of the means of samples of size 25 looks roughly like a Gaussian (normal) distribution, even though the original distribution is uniform, and the distribution of the sample means tends more and more towards a normal distribution as n increases.
Alongside the central limit theorem, it is easy to show that the mean of this sampling distribution equals the population mean and that its variance equals the population variance divided by n; the square root of this variance, σ/√n, is known as the standard error. To conclude, this example shows that the mean of the sample means equals the population mean, and that the variance of the sample means gets smaller with
i) a decrease in the population variance, or
ii) an increase in the sample size.
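As a small worked check, the Python sketch below takes the ten sample means from the table in this example and computes their mean (an estimate of the population mean grade) and their standard deviation (an empirical estimate of the standard error for n = 25). Since the standard error is σ/√n, multiplying it by √25 = 5 gives a rough estimate of the population standard deviation σ. With only ten samples this is a crude estimate, and it assumes the samples were drawn independently.

```python
import statistics

# Average grades of the 10 samples (n = 25 students each), taken from the table in this example
sample_means = [9.52, 9.32, 9.08, 8.80, 9.48, 9.36, 9.48, 10.12, 9.64, 9.35]

grand_mean = statistics.fmean(sample_means)   # estimates the population mean
spread = statistics.stdev(sample_means)       # empirical standard error of the mean (n = 25)
print(f"mean of sample means = {grand_mean:.3f}")
print(f"sd of sample means   = {spread:.3f}")
# Standard error = sigma / sqrt(25), so sigma is roughly 5 times the spread (a crude estimate)
print(f"implied population sd ~ {spread * 5:.2f}")
```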
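2. Let $X_1, X_2, \dots$ be i.i.d. Poisson variates with parameter λ. Use the central limit theorem to estimate
$$P(120 \le S_n \le 160),$$
where $S_n = X_1 + X_2 + \dots + X_n$, λ = 2 and n = 75.
Solution:
Since the $X_i$'s are i.i.d. $P(\lambda)$, $E(X_i) = V(X_i) = \lambda$, so
$$E(S_n) = n\lambda, \qquad \operatorname{Var}(S_n) = \operatorname{Var}(X_1 + X_2 + \dots + X_n) = n\lambda.$$
Hence, by the Lindeberg-Lévy CLT (for large n),
$$S_n \sim N(n\lambda, n\lambda) = N(\mu = 150, \sigma^2 = 150) \quad \text{approximately, since } n = 75,\ \lambda = 2.$$
Therefore
$$P(120 \le S_n \le 160) \approx \Phi\left(\frac{160 - 150}{\sqrt{150}}\right) - \Phi\left(\frac{120 - 150}{\sqrt{150}}\right) = \Phi(0.816) - \Phi(-2.449) \approx 0.793 - 0.007 = 0.786.$$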
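The arithmetic of this solution can be reproduced in a few lines of Python. The sketch below computes the normal approximation with `statistics.NormalDist` and, for comparison, the exact probability: a sum of independent Poisson variates is again Poisson, so $S_n \sim P(n\lambda) = P(150)$ exactly. The helper `poisson_pmf` is our own. The two values should agree to within a couple of hundredths; no continuity correction is applied to the normal approximation.

```python
import math
from statistics import NormalDist

n, lam = 75, 2
mu = var = n * lam                  # E(S_n) = Var(S_n) = n * lambda = 150

# CLT (normal) approximation to P(120 <= S_n <= 160), without continuity correction
approx = NormalDist(mu, math.sqrt(var))
p_clt = approx.cdf(160) - approx.cdf(120)

def poisson_pmf(k, m):
    """P(Y = k) for Y ~ P(m), computed in log space to avoid overflow."""
    return math.exp(-m + k * math.log(m) - math.lgamma(k + 1))

# Exact value for comparison: S_n ~ P(n * lambda)
p_exact = sum(poisson_pmf(k, mu) for k in range(120, 161))
print(f"CLT approximation: {p_clt:.4f}   exact Poisson: {p_exact:.4f}")
```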
3. The probability distribution of the total distance covered in a random walk (biased or unbiased) tends towards a normal distribution.
4. Flipping a coin a large number n of times results in an approximately normal distribution for the total number of heads (or, equivalently, the total number of tails).
Try yourself:
- A distribution with unknown mean μ has variance equal to 1.5. Use the central limit theorem to find how large a sample should be taken from the distribution so that the probability is at least 0.95 that the sample mean will be within 0.5 of the population mean.
- The lifetime of a certain brand of electric tube light is considered a random variable with mean 1,200 hours and standard deviation 250 hours. Using the central limit theorem, find the probability that the average lifetime of 60 bulbs exceeds 1,400 hours.
Sir Francis Galton was an English Victorian statistician, progressive,
polymath, sociologist, psychologist, anthropologist, eugenicist, tropical
explorer, geographer, inventor, meteorologist, proto-geneticist, and psychometrician,
knighted in 1909. He produced over 340 papers and books and created the
statistical concept of correlation and widely promoted regression towards the
mean. He was the first man to apply statistical methods to the study of human
differences and inheritance of intelligence, and introduced the use of
questionnaires and surveys for collecting data on human communities, which he
needed for genealogical and biographical works and for his anthropometric
studies. He wrote:
"I
know of scarcely anything so apt to impress the imagination as the wonderful
form of cosmic order expressed by the “Law of Frequency of Error”. The
law would have been personified by the Greeks and deified, if they had known of
it. It reigns with serenity and in complete self-effacement, amidst the wildest
confusion. The huger the mob, and the greater the apparent anarchy, the more
perfect is its sway. It is the supreme Law
of Unreason. Whenever a large sample of chaotic elements are taken in hand
and marshaled in the order of their magnitude, an unsuspected and most
beautiful form of regularity proves to have been latent all along".