Lecture 1: CDF and EDF 1-3
CDF of $X_1, \cdots, X_n$. Let $F_{S_n}(x)$ be the CDF of $S_n$. Then $F_{S_n}(x)$ is determined by $F_{X_1,\cdots,X_n}(x_1, \cdots, x_n)$, which, under the IID case, is determined by $F(x)$ and $n$ (sample size).
Thus, when $X_1, \cdots, X_n \sim F$,
$$(F(x), n) \overset{\text{determine}}{\longrightarrow} F_{X_1,\cdots,X_n}(x_1, \cdots, x_n) \overset{\text{determine}}{\longrightarrow} F_{S_n}(x). \tag{1.1}$$
Mathematically speaking, there is a map $\Psi : \mathcal{F} \times \mathbb{N} \to \mathcal{F}$ such that
$$F_{S_n} = \Psi(F, n), \tag{1.2}$$
where $\mathcal{F}$ is the collection of all possible CDFs.
Example. Assume $X_1, \cdots, X_n \sim N(0, 1)$. Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample average. Then the CDF of $S_n$ is the CDF of $N(0, 1/n)$ by the property of a normal distribution. In this case, $F$ is the CDF of $N(0, 1)$. Now if we change the sampling distribution from $N(0, 1)$ to $N(1, 4)$, then the sample average $S_n$ has the CDF of $N(1, 4/n)$. Here you see that the CDF of the sample average, a statistic, changes when the sampling distribution $F$ changes (and the CDF of $S_n$ is clearly dependent on the sample size $n$). This is what equations (1.1) and (1.2) refer to.
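To make the example concrete, the two CDFs can be written in closed form. This is a worked sketch under two assumptions: $\Phi$ denotes the standard normal CDF (a symbol introduced here for illustration), and $N(1, 4)$ is read as mean 1 and variance 4, consistent with the notation $N(0, 1/n)$:

$$F_{S_n}(x) = P(S_n \le x) = \Phi\left(\sqrt{n}\,x\right) \quad \text{when } X_1, \cdots, X_n \sim N(0, 1),$$

$$F_{S_n}(x) = P(S_n \le x) = \Phi\left(\frac{\sqrt{n}\,(x - 1)}{2}\right) \quad \text{when } X_1, \cdots, X_n \sim N(1, 4).$$

Both expressions depend on $x$ only through the pair (sampling distribution, $n$), which is the dependence that the map $\Psi$ describes.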
Therefore, a key conclusion is:
Given $F$ and the sample size $n$, the distribution of any statistic from the random sample $X_1, \cdots, X_n$ is determined.
Even if we cannot write down the function $F_{S_n}(x)$ analytically, as long as we can sample from $F$, we can generate many random samples of size $n$, compute $S_n$ for each of them, and use the resulting values to approximate the distribution $F_{S_n}$.
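The simulation idea can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed procedure: it takes $S_n$ to be the sample average and $F$ to be $N(0, 1)$, as in the example, and the names `simulate_statistic`, `approx_cdf`, and `normal_cdf`, along with the values of `n`, `reps`, and the seed, are choices made here for illustration. The exact $N(0, 1/n)$ CDF is computed only to check the approximation.

```python
import math
import random


def sample_mean(draw, n, rng):
    """Return S_n, the average of n independent draws from the sampler `draw`."""
    return sum(draw(rng) for _ in range(n)) / n


def simulate_statistic(draw, n, reps, seed=0):
    """Generate `reps` independent size-n samples and return S_n of each one."""
    rng = random.Random(seed)
    return [sample_mean(draw, n, rng) for _ in range(reps)]


def approx_cdf(values, x):
    """Fraction of the simulated S_n values that are <= x."""
    return sum(v <= x for v in values) / len(values)


def normal_cdf(x, mean, var):
    """Exact CDF of N(mean, var), used here only as a check."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0 * var)))


n, reps = 25, 20000
# Sampling distribution F = N(0, 1); rng.gauss takes (mean, standard deviation).
draws = simulate_statistic(lambda rng: rng.gauss(0.0, 1.0), n, reps)

for x in (-0.2, 0.0, 0.3):
    # Columns: x, simulated approximation of F_{S_n}(x), exact N(0, 1/n) CDF.
    print(x, approx_cdf(draws, x), normal_cdf(x, 0.0, 1.0 / n))
```

Replacing the lambda with a sampler for another distribution, or `sample_mean` with another statistic, gives the same kind of approximation in cases where no closed form is available.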
Here you see that the CDF $F$ is very important in analyzing the distribution of any statistic. However, in practice the CDF $F$ is unknown to us; all we have is the random sample $X_1, \cdots, X_n$. So here comes the question:
Given a random sample $X_1, \cdots, X_n$, how can we estimate $F$?
1.3 EDF: Empirical Distribution Function
Let us first look at the function $F(x)$ more closely. Given a value $x_0$,
$$F(x_0) = P(X_i \le x_0)$$
for every $i = 1, \cdots, n$. Namely, $F(x_0)$ is the probability of the event $\{X_i \le x_0\}$.
A natural estimator of the probability of an event is the proportion of observations in our sample for which the event occurs. Thus, we use
$$\widehat{F}_n(x_0) = \frac{\text{number of } X_i \le x_0}{\text{total number of observations}} = \frac{\sum_{i=1}^{n} I(X_i \le x_0)}{n} = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x_0) \tag{1.3}$$
as the estimator of $F(x_0)$.
For every $x_0$, we can use such a quantity as an estimator, so the estimator of the CDF, $F(x)$, is $\widehat{F}_n(x)$. This estimator, $\widehat{F}_n(x)$, is called the empirical distribution function (EDF).
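As a small illustration of definition (1.3), the EDF can be computed by counting how many observations fall at or below a given point. The sketch below is one possible way to do this with the Python standard library; the helper name `make_edf` and the five data values are made up for illustration.

```python
from bisect import bisect_right


def make_edf(sample):
    """Return the function x -> (1/n) * #{i : X_i <= x} for the given sample."""
    xs = sorted(sample)
    n = len(xs)
    if n == 0:
        raise ValueError("sample must be non-empty")

    def edf(x):
        # On a sorted list, bisect_right(xs, x) equals the number of entries <= x.
        return bisect_right(xs, x) / n

    return edf


sample = [2.1, 0.4, 3.3, 0.4, 1.7]  # made-up observations
F_hat = make_edf(sample)

for x0 in (0.0, 0.4, 2.0, 3.3):
    print(x0, F_hat(x0))
```

With these five observations, two values are at or below 0.4 and three are at or below 2.0, so the function returns 2/5 and 3/5 at those points. It is 0 below the smallest observation and 1 from the largest observation onward.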