STAT/Q SCI 403: Introduction to Resampling Methods Spring 2017
Lecture 1: CDF and EDF
Instructor: Yen-Chi Chen
1.1 CDF: Cumulative Distribution Function
For a random variable $X$, its CDF $F(x)$ contains all the probabilistic structure of $X$. Here are some properties of $F(x)$:
- (probability) $0 \le F(x) \le 1$.
- (monotonicity) $F(x) \le F(y)$ for every $x \le y$.
- (right-continuity) $\lim_{x \to y^+} F(x) = F(y)$, where $y^+ = \lim_{\epsilon > 0,\, \epsilon \to 0}\, y + \epsilon$.
- $\lim_{x \to -\infty} F(x) = F(-\infty) = 0$.
- $\lim_{x \to +\infty} F(x) = F(+\infty) = 1$.
- $P(X = x) = F(x) - F(x^-)$, where $x^- = \lim_{\epsilon < 0,\, \epsilon \to 0}\, x + \epsilon$.
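For instance, the last property can be checked numerically for a discrete random variable; here is a minimal R sketch, with Binomial(5, 0.3) as an arbitrary choice (for an integer-valued variable, $F(x^-) = F(x-1)$):

```r
# For X ~ Binomial(5, 0.3), check P(X = 2) = F(2) - F(2^-)
x <- 2
pbinom(x, size = 5, prob = 0.3) - pbinom(x - 1, size = 5, prob = 0.3)  # F(x) - F(x^-)
dbinom(x, size = 5, prob = 0.3)                                        # P(X = x); identical
```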
Example. For a uniform random variable over $[0, 1]$, its CDF is
$$F(x) = \int_0^x 1\, du = x$$
when $x \in [0, 1]$, with $F(x) = 0$ if $x < 0$ and $F(x) = 1$ if $x > 1$.
Example. For an exponential random variable with parameter $\lambda$, its CDF is
$$F(x) = \int_0^x \lambda e^{-\lambda u}\, du = 1 - e^{-\lambda x}$$
when $x \ge 0$ and $F(x) = 0$ if $x < 0$. The following provides the CDF (left) and PDF (right) of an exponential random variable with $\lambda = 0.5$:
[Figure: the CDF $F(x)$ (left panel) and the PDF $p(x)$ (right panel) of Exponential(0.5), plotted over $[-1, 5]$.]
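This figure can be reproduced with a few lines of R, using the built-in exponential CDF and density `pexp` and `dexp` (a sketch; the plotting range matches the figure):

```r
par(mfrow = c(1, 2))                               # two panels side by side
curve(pexp(x, rate = 0.5), from = -1, to = 5,
      ylab = "F(x)", main = "Exponential(0.5)")    # CDF
curve(dexp(x, rate = 0.5), from = -1, to = 5,
      ylab = "p(x)", main = "Exponential(0.5)")    # PDF
```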
1.2 Statistics and Motivation of Resampling Methods
Given a sample $X_1, \cdots, X_n$ (not necessarily an IID sample), a statistic $S_n = S(X_1, \cdots, X_n)$ is a function of the sample.
Here are some common examples of a statistic:
- Sample mean (average):
$$S(X_1, \cdots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
- Sample maximum:
$$S(X_1, \cdots, X_n) = \max\{X_1, \cdots, X_n\}.$$
- Sample range:
$$S(X_1, \cdots, X_n) = \max\{X_1, \cdots, X_n\} - \min\{X_1, \cdots, X_n\}.$$
- Sample variance:
$$S(X_1, \cdots, X_n) = \frac{1}{n-1}\sum_{i=1}^{n} \left(X_i - \bar{X}_n\right)^2, \qquad \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i.$$
Here are some statistics that are useful but not as common as the previous examples:
- Number of observations above a threshold $t$:
$$S(X_1, \cdots, X_n) = \sum_{i=1}^{n} I(X_i > t).$$
- Rank of the first observation ($X_1$):
$$S(X_1, \cdots, X_n) = 1 + \sum_{i=2}^{n} I(X_i > X_1).$$
If $X_1$ is the largest number, then $S(X_1, \cdots, X_n) = 1$; if $X_1$ is the smallest number, then $S(X_1, \cdots, X_n) = n$.
- Sample second moment:
$$S(X_1, \cdots, X_n) = \frac{1}{n}\sum_{i=1}^{n} X_i^2.$$
The sample second moment is a consistent estimator of $E(X_i^2)$.
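To make these concrete, the following R sketch computes each of the statistics above on a simulated sample (the sample and the threshold $t = 1$ are arbitrary choices):

```r
set.seed(403)
x <- rnorm(20)              # an arbitrary simulated sample
mean(x)                     # sample mean
max(x)                      # sample maximum
max(x) - min(x)             # sample range
var(x)                      # sample variance (divides by n - 1)
mean(x^2)                   # sample second moment
sum(x > 1)                  # number of observations above t = 1
1 + sum(x[-1] > x[1])       # rank of the first observation
```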
Now we assume that our sample $X_1, \cdots, X_n$ is generated from a sampling distribution. Then the distribution of these $n$ numbers is determined by the joint CDF $F_{X_1, \cdots, X_n}(x_1, \cdots, x_n)$. In the IID case (sometimes called a random sample), the joint CDF is the product of the individual CDFs (and they are all the same because the observations are identically distributed). Thus, in the IID case, the individual CDF $F(x) = F_{X_1}(x)$ and the sample size $n$ determine the entire joint CDF.
A statistic $S_n = S(X_1, \cdots, X_n)$ is itself a random variable when the sample is random. Because $S_n$ is a function of the input data points $X_1, \cdots, X_n$, the distribution of $S_n$ is completely determined by the joint CDF of $X_1, \cdots, X_n$. Let $F_{S_n}(x)$ be the CDF of $S_n$. Then $F_{S_n}(x)$ is determined by $F_{X_1, \cdots, X_n}(x_1, \cdots, x_n)$, which, in the IID case, is determined by $F(x)$ and the sample size $n$.
Thus, when $X_1, \cdots, X_n \sim F$,
$$(F(x), n) \xrightarrow{\text{determine}} F_{X_1, \cdots, X_n}(x_1, \cdots, x_n) \xrightarrow{\text{determine}} F_{S_n}(x). \tag{1.1}$$
Mathematically speaking, there is a map $\Psi: \mathcal{F} \times \mathbb{N} \mapsto \mathcal{F}$ such that
$$F_{S_n} = \Psi(F, n), \tag{1.2}$$
where $\mathcal{F}$ is a collection of all possible CDFs.
Example. Assume $X_1, \cdots, X_n \sim N(0, 1)$. Let $S_n = \frac{1}{n}\sum_{i=1}^{n} X_i$ be the sample average. Then the CDF of $S_n$ is the CDF of $N(0, 1/n)$ by the properties of the normal distribution. In this case, $F$ is the CDF of $N(0, 1)$. Now if we change the sampling distribution from $N(0, 1)$ to $N(1, 4)$, then the sample average $S_n$ has the CDF of $N(1, 4/n)$. Here you see that the CDF of the sample average, a statistic, changes when the sampling distribution $F$ changes (and the CDF of $S_n$ clearly depends on the sample size $n$). This is what equations (1.1) and (1.2) refer to.
Therefore, a key conclusion is:

Given $F$ and the sample size $n$, the distribution of any statistic from the random sample $X_1, \cdots, X_n$ is determined.
Even if we cannot analytically write down the function $F_{S_n}(x)$, as long as we can sample from $F$, we can generate many sets of size-$n$ random samples, compute $S_n$ for each random sample, and thereby find out the distribution $F_{S_n}$.
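For instance, here is a minimal R sketch of this idea for the sample average of $n = 25$ points from $N(0, 1)$, where we know from the example above that $F_{S_n}$ is the CDF of $N(0, 1/n)$ (the number of replicates $B = 10000$ is an arbitrary choice):

```r
set.seed(1)
n <- 25; B <- 10000
S <- replicate(B, mean(rnorm(n)))     # B realizations of S_n
hist(S, freq = FALSE, breaks = 40)    # approximate distribution of S_n
curve(dnorm(x, sd = 1 / sqrt(n)),
      add = TRUE, col = "red")        # exact density of N(0, 1/n)
```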
Here you see that the CDF $F$ is very important in analyzing the distribution of any statistic. However, in practice the CDF $F$ is unknown to us; all we have is the random sample $X_1, \cdots, X_n$. So here comes the question:

Given a random sample $X_1, \cdots, X_n$, how can we estimate $F$?
1.3 EDF: Empirical Distribution Function
Let us first look at the function $F(x)$ more closely. Given a value $x_0$,
$$F(x_0) = P(X_i \le x_0)$$
for every $i = 1, \cdots, n$. Namely, $F(x_0)$ is the probability of the event $\{X_i \le x_0\}$.
A natural estimator of the probability of an event is the proportion of times that event occurs in our sample. Thus, we use
$$\widehat{F}_n(x_0) = \frac{\text{number of } X_i \le x_0}{\text{total number of observations}} = \frac{\sum_{i=1}^{n} I(X_i \le x_0)}{n} = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x_0) \tag{1.3}$$
as the estimator of $F(x_0)$.
For every $x_0$, we can use such a quantity as an estimator, so the estimator of the CDF $F(x)$ is $\widehat{F}_n(x)$. This estimator, $\widehat{F}_n(x)$, is called the empirical distribution function (EDF).
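In R, the EDF is implemented by the built-in function `ecdf`; the following sketch evaluates both the direct formula (1.3) and `ecdf` at an arbitrary point $x_0 = 1.6$:

```r
x <- c(1, 1.2, 1.5, 2, 2.5)   # the 5 observations from the example below
x0 <- 1.6
mean(x <= x0)                 # direct computation of (1.3): 3/5 = 0.6
Fhat <- ecdf(x)               # ecdf() returns the EDF as a function
Fhat(x0)                      # same value: 0.6
```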
Example. Here is an example of the EDF of 5 observations: 1, 1.2, 1.5, 2, 2.5:
[Figure: the EDF $\widehat{F}_n(x)$ of the 5 observations, plotted with `ecdf(x)`.]
There are 5 jumps, each located at the position of an observation. Moreover, the height of each jump is the same: $\frac{1}{5}$.
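This figure can be reproduced with a single line of base R (the `plot` method for `ecdf` objects draws exactly this step function):

```r
plot(ecdf(c(1, 1.2, 1.5, 2, 2.5)))  # 5 jumps, each of height 1/5
```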
Example. While the previous example might not look like an idealized CDF, the following provides a comparison of the EDF and the true CDF, where we generate $n = 100$ and $n = 1000$ random points from the standard normal $N(0, 1)$:
[Figure: the EDF $\widehat{F}_n(x)$ based on $n = 100$ and $n = 1000$ points from $N(0, 1)$, overlaid on the true CDF.]
The red curve indicates the true CDF of the standard normal. Here you can see that when the sample size
is large, the EDF is pretty close to the true CDF.
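The two panels can be reproduced with a short R sketch (assuming base graphics; the red curve is the true CDF):

```r
set.seed(6)
for (n in c(100, 1000)) {
  plot(ecdf(rnorm(n)), main = paste0("n=", n))  # EDF of n points from N(0,1)
  curve(pnorm(x), add = TRUE, col = "red")      # true CDF of N(0,1)
}
```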
1.3.1 Properties of the EDF
Because the EDF is the average of $I(X_i \le x)$, we first study the properties of $I(X_i \le x)$. For simplicity, let $Y_i = I(X_i \le x)$. What kind of random variable is $Y_i$?
Here is the breakdown of $Y_i$:
$$Y_i = \begin{cases} 1, & \text{if } X_i \le x \\ 0, & \text{if } X_i > x. \end{cases}$$
So $Y_i$ only takes the values 0 and 1, which means it is actually a Bernoulli random variable! We know that a Bernoulli random variable has a parameter $p$ that determines the probability of outputting 1. What is the parameter $p$ for $Y_i$?
$$p = P(Y_i = 1) = P(X_i \le x) = F(x).$$
Therefore, for a given $x$,
$$Y_i \sim \mathrm{Ber}(F(x)).$$
This implies
$$E(I(X_i \le x)) = E(Y_i) = F(x), \qquad \mathrm{Var}(I(X_i \le x)) = \mathrm{Var}(Y_i) = F(x)(1 - F(x))$$
for a given $x$.
Now what about $\widehat{F}_n(x)$? Recall that $\widehat{F}_n(x) = \frac{1}{n}\sum_{i=1}^{n} I(X_i \le x) = \frac{1}{n}\sum_{i=1}^{n} Y_i$. Then
$$E\left(\widehat{F}_n(x)\right) = E(I(X_1 \le x)) = F(x),$$
$$\mathrm{Var}\left(\widehat{F}_n(x)\right) = \frac{\sum_{i=1}^{n} \mathrm{Var}(Y_i)}{n^2} = \frac{F(x)(1 - F(x))}{n}.$$
What does this tell us about using $\widehat{F}_n(x)$ as an estimator of $F(x)$?
First, at each $x$, $\widehat{F}_n(x)$ is an unbiased estimator of $F(x)$:
$$\mathrm{bias}\left(\widehat{F}_n(x)\right) = E\left(\widehat{F}_n(x)\right) - F(x) = 0.$$
Second, the variance converges to 0 as $n \to \infty$. By Lemma 0.3, this implies that for a given $x$,
$$\widehat{F}_n(x) \xrightarrow{P} F(x),$$
i.e., $\widehat{F}_n(x)$ is a consistent estimator of $F(x)$.
In addition to the above properties, the EDF also has the following interesting feature: for a given $x$,
$$\sqrt{n}\left(\widehat{F}_n(x) - F(x)\right) \xrightarrow{D} N(0, F(x)(1 - F(x))).$$
Namely, $\widehat{F}_n(x)$ is asymptotically normally distributed around $F(x)$ with variance $F(x)(1 - F(x))/n$.
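A small simulation illustrates this asymptotic normality; the sketch below uses the arbitrary choices $n = 100$, $x_0 = 0$, and $F = N(0, 1)$ (so $F(x_0) = 0.5$), and overlays the normal approximation on a histogram of simulated values of $\widehat{F}_n(x_0)$:

```r
set.seed(2)
n <- 100; x0 <- 0; B <- 5000
# B independent copies of the EDF at x0 when F = N(0, 1), so F(x0) = 0.5
Fhat <- replicate(B, mean(rnorm(n) <= x0))
hist(Fhat, freq = FALSE, breaks = 30)
# Normal approximation: mean F(x0), variance F(x0)(1 - F(x0))/n
curve(dnorm(x, mean = 0.5, sd = sqrt(0.5 * 0.5 / n)), add = TRUE, col = "red")
```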
Example. Assume $X_1, \cdots, X_{100} \sim F$, where $F$ is a uniform distribution over $[0, 2]$. Questions:
- What will be the expectation of $\widehat{F}_n(0.8)$?
$$E\left(\widehat{F}_n(0.8)\right) = F(0.8) = P(X \le 0.8) = \int_0^{0.8} \frac{1}{2}\, dx = 0.4.$$
- What will be the variance of $\widehat{F}_n(0.8)$?
$$\mathrm{Var}\left(\widehat{F}_n(0.8)\right) = \frac{F(0.8)(1 - F(0.8))}{100} = \frac{0.4 \times 0.6}{100} = 2.4 \times 10^{-3}.$$
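Both answers are easy to verify by simulation; here is a minimal sketch ($B = 10000$ replicates is an arbitrary choice):

```r
set.seed(3)
B <- 10000
# B independent copies of the EDF at 0.8 with X_1, ..., X_100 ~ Unif[0, 2]
Fhat <- replicate(B, mean(runif(100, min = 0, max = 2) <= 0.8))
mean(Fhat)   # close to 0.4
var(Fhat)    # close to 2.4e-3
```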
Remark. The above analysis shows that for a given $x$,
$$\left|\widehat{F}_n(x) - F(x)\right| \xrightarrow{P} 0.$$
This is related to pointwise convergence in mathematical analysis (you may have learned this in STAT 300). We can extend this result to a uniform sense:
$$\sup_x \left|\widehat{F}_n(x) - F(x)\right| \xrightarrow{P} 0.$$
However, deriving such uniform convergence requires more involved probability tools, so we will not cover it here. But an important fact is that such uniform convergence in probability can be established under some conditions.
Question to think about: how would you construct a 95% confidence interval of $F(x)$ for a given $x$?
Remark. The EDF can be used to test whether a sample is from a known distribution or whether two samples are from the same distribution. The former is called the goodness-of-fit test and the latter is called the two-sample test.
Assume that we want to test whether $X_1, \cdots, X_n$ are from a known distribution $F_0$ (goodness-of-fit test). There are three common approaches to carrying out this test. The first one is the KS test (Kolmogorov–Smirnov test; https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test), where the test statistic is the KS statistic
$$T_{KS} = \sup_x \left|\widehat{F}_n(x) - F_0(x)\right|.$$
The second approach is the Cramér–von Mises test (https://en.wikipedia.org/wiki/Cram%C3%A9r%E2%80%93von_Mises_criterion), which uses the Cramér–von Mises statistic as the test statistic:
$$T_{CM} = \int \left(\widehat{F}_n(x) - F_0(x)\right)^2 dF_0(x).$$
The third approach is the Anderson–Darling test (https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test), and the test statistic is
$$T_{AD} = n \int \frac{\left(\widehat{F}_n(x) - F_0(x)\right)^2}{F_0(x)(1 - F_0(x))}\, dF_0(x).$$
We reject the null hypothesis ($H_0: X_1, \cdots, X_n \sim F_0$) when the test statistic is greater than some threshold depending on the significance level $\alpha$. Note that while we present the goodness-of-fit versions of these test statistics here, there is a corresponding two-sample version of each of them.
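In R, the KS test is available as the base function `ks.test`; the Cramér–von Mises and Anderson–Darling tests are not in base R, but implementations exist in add-on packages (e.g., `goftest`). The sketch below computes $T_{KS}$ by hand, using the fact that the supremum is attained at a jump of the EDF, and compares it with `ks.test`:

```r
set.seed(4)
x <- rnorm(100)            # sample to test against F0 = N(0, 1)
n <- length(x)
F0 <- pnorm(sort(x))       # F0 evaluated at the order statistics
# At the i-th jump, the EDF is (i-1)/n just before and i/n just after,
# so the supremum over all x reduces to a maximum over the jumps:
T.KS <- max(pmax((1:n) / n - F0, F0 - (0:(n - 1)) / n))
T.KS
ks.test(x, "pnorm")        # base R; its statistic D equals T.KS
```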
1.4 Inverse of a CDF
Let $X$ be a continuous random variable with CDF $F(x)$. Let $U$ be a uniform random variable over $[0, 1]$. Now we define a new random variable
$$W = F^{-1}(U),$$
where $F^{-1}$ is the inverse of the CDF. What will this random variable $W$ be?
To understand $W$, we examine its CDF $F_W$:
$$F_W(w) = P(W \le w) = P(F^{-1}(U) \le w) = P(U \le F(w)) = \int_0^{F(w)} 1\, dx = F(w) - 0 = F(w).$$
Thus, $F_W(w) = F(w)$ for every $w$, which implies that the random variable $W$ has the same CDF as the random variable $X$! So this leads to a simple way to generate a random variable from $F$ as long as we know $F^{-1}$: we first generate a random variable $U$ from a uniform distribution over $[0, 1]$, and then feed the generated value into the function $F^{-1}$. The resulting random number, $F^{-1}(U)$, has CDF $F$.
This interesting fact also leads to the following result. Consider a random variable $V = F(X)$, where $F$ is the CDF of $X$. Then the CDF of $V$ is
$$F_V(v) = P(V \le v) = P(F(X) \le v) = P(X \le F^{-1}(v)) = F(F^{-1}(v)) = v$$
for any $v \in [0, 1]$. Therefore, $V$ is actually a uniform random variable over $[0, 1]$.
Example. Here is a method of generating a random variable $X \sim \mathrm{Exp}(\lambda)$ from a uniform random variable over $[0, 1]$. We have already calculated that for an $\mathrm{Exp}(\lambda)$, the CDF is
$$F(x) = 1 - e^{-\lambda x}$$
when $x \ge 0$. Thus, $F^{-1}(u)$ will be
$$F^{-1}(u) = -\frac{1}{\lambda}\log(1 - u).$$
So the random variable
$$W = F^{-1}(U) = -\frac{1}{\lambda}\log(1 - U)$$
will be an $\mathrm{Exp}(\lambda)$ random variable.
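Here is a minimal R sketch of this generator, checked against the target CDF with the KS test from the previous section ($\lambda = 0.5$ is an arbitrary choice):

```r
set.seed(5)
lambda <- 0.5
U <- runif(10000)                  # U ~ Unif[0, 1]
W <- -log(1 - U) / lambda          # W = F^{-1}(U), should be Exp(lambda)
ks.test(W, "pexp", rate = lambda)  # compare W against the Exp(0.5) CDF
```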