Order Statistics
1 Introduction and Notation
Let $X_1, X_2, \ldots, X_{15}$ be a random sample of size 15 from the uniform distribution over the interval $(0, 1)$. Here are three different realizations of such samples.
Because these samples come from a uniform distribution, we expect them to be spread out “ran-
domly” and “evenly” across the interval (0, 1). (You might think that you are seeing some sort of
clustering but keep in mind that you are looking at a measly selection of only three samples. After
collecting more samples I’m sure your view would change!)
Consider the single smallest value from each of these three samples, highlighted here.
Collect the minimums onto a single graph.
Not surprisingly, they are down towards zero! It would be pretty difficult to get a sample of 15
uniforms on (0, 1) that has a minimum up by the right endpoint of 1. In fact, we will show that if
we kept collecting minimums of samples of size 15, they would have a probability density function
that looks like this.
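One way to see this empirically is with a quick simulation. Here is a minimal sketch (assuming numpy and matplotlib are available) that collects the minimums of many samples of size 15 and plots their histogram; the number of replications is arbitrary.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n, reps = 15, 10_000

# Each row is one sample of 15 uniforms; take the minimum of each sample.
minimums = rng.uniform(0.0, 1.0, size=(reps, n)).min(axis=1)

# Histogram of the collected minimums, scaled to integrate to 1.
plt.hist(minimums, bins=50, density=True)
plt.xlabel("sample minimum")
plt.ylabel("density")
plt.show()
```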
Notation: Let $X_1, X_2, \ldots, X_n$ be a random sample of size $n$ from some distribution. We denote the order statistics by
$$
\begin{aligned}
X_{(1)} &= \min(X_1, X_2, \ldots, X_n) \\
X_{(2)} &= \text{the 2nd smallest of } X_1, X_2, \ldots, X_n \\
&\;\;\vdots \\
X_{(n)} &= \max(X_1, X_2, \ldots, X_n).
\end{aligned}
$$
(Another commonly used notation is $X_{1:n}, X_{2:n}, \ldots, X_{n:n}$ for the min through the max, respectively.)
In what follows, we will derive the distributions and joint distributions for each of these statistics
and groups of these statistics. We will consider continuous random variables only. Imagine
taking a random sample of size 15 from the geometric distribution with some fixed parameter p.
The chances are very high that you will have some repeated values and not see 15 distinct values.
For example, suppose we observe 7 distinct values. While it would make sense to talk about the
minimum or maximum value here, it would not make sense to talk about the 12th largest value in
this case. To further confuse the matter, the next sample might have a different number of distinct
values! Any analysis of the order statistics for this discrete distribution would have to be well-defined in what would likely be an ad hoc way. (For example, one might define them conditional on the number of distinct values observed.)
2 The Distribution of the Minimum
Suppose that $X_1, X_2, \ldots, X_n$ is a random sample from a continuous distribution with pdf $f$ and cdf $F$. We will now derive the pdf for $X_{(1)}$, the minimum value of the sample. For order statistics, it is usually easier to begin by considering the cdf. The game plan will be to relate the cdf of the minimum to the behavior of the individual sampled values $X_1, X_2, \ldots, X_n$, for which we know the pdf and cdf.
The cdf for the minimum is
$$F_{X_{(1)}}(x) = P(X_{(1)} \le x).$$
Imagine a random sample falling in such a way that the minimum is below a fixed value $x$. This can happen in many different ways.
In other words,
$$F_{X_{(1)}}(x) = P(X_{(1)} \le x) = P(\text{at least one of } X_1, X_2, \ldots, X_n \text{ is} \le x).$$
There are many ways for the individual $X_i$ to fall so that the minimum is less than or equal to $x$. Considering all of the possibilities is a lot of work! On the other hand, the minimum is greater than $x$ if and only if all of the $X_i$ are greater than $x$, so it is easy to relate the probability $P(X_{(1)} > x)$ back to the individual $X_i$. Thus, we consider
$$
\begin{aligned}
F_{X_{(1)}}(x) = P(X_{(1)} \le x) &= 1 - P(X_{(1)} > x) \\
&= 1 - P(X_1 > x, X_2 > x, \ldots, X_n > x) \\
&= 1 - P(X_1 > x)\,P(X_2 > x) \cdots P(X_n > x) && \text{by independence} \\
&= 1 - [P(X_1 > x)]^n && \text{because the } X_i \text{ are identically distributed} \\
&= 1 - [1 - F(x)]^n.
\end{aligned}
$$
So, we have that the pdf for the minimum is
$$f_{X_{(1)}}(x) = \frac{d}{dx} F_{X_{(1)}}(x) = \frac{d}{dx}\left\{1 - [1 - F(x)]^n\right\} = n[1 - F(x)]^{n-1} f(x).$$
Going back to the uniform example of Section 1, we had $f(x) = I_{(0,1)}(x)$ and
$$F(x) = \begin{cases} 0, & x < 0 \\ x, & 0 \le x < 1 \\ 1, & x \ge 1. \end{cases}$$
The pdf for the minimum in this case is
$$f_{X_{(1)}}(x) = n[1 - x]^{n-1} I_{(0,1)}(x).$$
This is the pdf of the Beta distribution with parameters 1 and $n$. Thus, we can write $X_{(1)} \sim \text{Beta}(1, n)$.
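We can check this by simulation. Here is a minimal sketch (assuming numpy and scipy are available) that compares simulated minimums of samples of 15 uniforms against the Beta(1, 15) distribution; the seed and replication count are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 15, 100_000

minimums = rng.uniform(size=(reps, n)).min(axis=1)

# Compare the empirical distribution of the minimums to Beta(1, n).
ks = stats.kstest(minimums, stats.beta(1, n).cdf)
print(f"KS statistic: {ks.statistic:.4f}  (should be small)")
print(f"sample mean:  {minimums.mean():.4f}  vs  1/(n+1) = {1/(n+1):.4f}")
```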
3 The Distribution of the Maximum
Again consider our random sample $X_1, X_2, \ldots, X_n$ from a continuous distribution with pdf $f$ and cdf $F$. We will now derive the pdf for $X_{(n)}$, the maximum value of the sample. As with the minimum, we will consider the cdf and try to relate it to the behavior of the individual sampled values $X_1, X_2, \ldots, X_n$.
The cdf for the maximum is
$$F_{X_{(n)}}(x) = P(X_{(n)} \le x).$$
Imagine a random sample falling in such a way that the maximum is below a fixed value $x$. This will happen if and only if all of the $X_i$ are below $x$.
Thus, we have
$$
\begin{aligned}
F_{X_{(n)}}(x) = P(X_{(n)} \le x) &= P(X_1 \le x, X_2 \le x, \ldots, X_n \le x) \\
&= P(X_1 \le x)\,P(X_2 \le x) \cdots P(X_n \le x) && \text{by independence} \\
&= [P(X_1 \le x)]^n && \text{because the } X_i \text{ are identically distributed} \\
&= [F(x)]^n.
\end{aligned}
$$
Taking the derivative, we get the pdf for the maximum:
$$f_{X_{(n)}}(x) = \frac{d}{dx} F_{X_{(n)}}(x) = \frac{d}{dx} [F(x)]^n = n[F(x)]^{n-1} f(x).$$
In the case of the random sample of size 15 from the uniform distribution on $(0, 1)$, the pdf is
$$f_{X_{(n)}}(x) = n x^{n-1} I_{(0,1)}(x),$$
which is the pdf of the $\text{Beta}(n, 1)$ distribution.
Not surprisingly, most of the probability or “mass” for the maximum is piled up near the right endpoint of 1.
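As before, a minimal simulation sketch (assuming numpy and scipy; the seed and replication count are arbitrary) compares the simulated maximums against Beta(15, 1).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 15, 100_000

maximums = rng.uniform(size=(reps, n)).max(axis=1)

# Compare the empirical distribution of the maximums to Beta(n, 1).
ks = stats.kstest(maximums, stats.beta(n, 1).cdf)
print(f"KS statistic: {ks.statistic:.4f}  (should be small)")
print(f"sample mean:  {maximums.mean():.4f}  vs  n/(n+1) = {n/(n+1):.4f}")
```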
4 The Joint Distribution of the Minimum and Maximum
Let’s go for the joint cdf of the minimum and the maximum,
$$F_{X_{(1)}, X_{(n)}}(x, y) = P(X_{(1)} \le x, X_{(n)} \le y).$$
It is not clear how to write this in terms of the individual $X_i$. Consider instead the relationship
$$P(X_{(n)} \le y) = P(X_{(1)} \le x, X_{(n)} \le y) + P(X_{(1)} > x, X_{(n)} \le y). \qquad (1)$$
We know how to write out the term on the left-hand side. The first term on the right-hand side is what we want to compute. As for the final term,
$$P(X_{(1)} > x, X_{(n)} \le y),$$
note that this is zero if $x \ge y$. (In this case, $P(X_{(1)} \le x, X_{(n)} \le y) = P(X_{(n)} \le y)$, and (1) gives us only $P(X_{(n)} \le y) = P(X_{(n)} \le y)$, which is both true and uninteresting!) So, we consider the case that $x < y$. Note then that
$$P(X_{(1)} > x, X_{(n)} \le y) = P(x < X_1 \le y, \, x < X_2 \le y, \, \ldots, \, x < X_n \le y) \overset{\text{iid}}{=} [P(x < X_1 \le y)]^n = [F(y) - F(x)]^n.$$
Thus, from (1), we have that
$$
\begin{aligned}
F_{X_{(1)}, X_{(n)}}(x, y) = P(X_{(1)} \le x, X_{(n)} \le y) &= P(X_{(n)} \le y) - P(X_{(1)} > x, X_{(n)} \le y) \\
&= [F(y)]^n - [F(y) - F(x)]^n.
\end{aligned}
$$
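Before differentiating, here is a quick Monte Carlo sanity check of this joint cdf for the uniform case, a minimal sketch assuming numpy is available; the particular values of $x$ and $y$ are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 15, 200_000
x, y = 0.1, 0.9                      # fixed points with x < y

samples = rng.uniform(size=(reps, n))
mins, maxs = samples.min(axis=1), samples.max(axis=1)

empirical = np.mean((mins <= x) & (maxs <= y))
# For Uniform(0, 1), F(t) = t on (0, 1), so the formula is y^n - (y - x)^n.
formula = y**n - (y - x)**n
print(f"Monte Carlo: {empirical:.4f}   formula: {formula:.4f}")
```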
Now the joint pdf is
$$
\begin{aligned}
f_{X_{(1)}, X_{(n)}}(x, y) &= \frac{\partial}{\partial x} \frac{\partial}{\partial y} \left\{ [F(y)]^n - [F(y) - F(x)]^n \right\} \\
&= \frac{\partial}{\partial x} \left\{ n[F(y)]^{n-1} f(y) - n[F(y) - F(x)]^{n-1} f(y) \right\} \\
&= n(n-1)[F(y) - F(x)]^{n-2} f(x) f(y).
\end{aligned}
$$
This holds for $x < y$ and for $x$ and $y$ both in the support of the original distribution.
For the sample of size 15 from the uniform distribution on $(0, 1)$, the joint pdf for the min and max is
$$f_{X_{(1)}, X_{(n)}}(x, y) = 15 \cdot 14 \cdot [y - x]^{13} \, I_{(0,y)}(x) \, I_{(0,1)}(y).$$
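As a sanity check, this joint pdf should integrate to 1 over the region $0 < x < y < 1$. Here is a minimal sketch using scipy.integrate.dblquad (the variable names are mine).

```python
from scipy.integrate import dblquad

n = 15

def pdf(x, y):
    # Joint pdf of (min, max) for n Uniform(0,1)'s, valid on 0 < x < y < 1.
    return n * (n - 1) * (y - x) ** (n - 2)

# Inner variable x runs from 0 to y; outer variable y runs from 0 to 1.
total, _ = dblquad(pdf, 0, 1, lambda y: 0.0, lambda y: y)
print(f"integral of the joint pdf: {total:.6f}")   # should print 1.000000
```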
A Heuristic:
Since $X_1, X_2, \ldots, X_n$ are assumed to come from a continuous distribution, the min and max are also continuous, and the joint pdf does not represent probability; it is a surface under which volume represents probability. However, if we bend the rules and think of the joint pdf as probability, we can develop a heuristic method for remembering it.
Suppose (though it is not true) that
$$f_{X_{(1)}, X_{(n)}}(x, y) = P(X_{(1)} = x, X_{(n)} = y).$$
This would mean that we need one value in the sample $X_1, X_2, \ldots, X_n$ to fall at $x$, one value to fall at $y$, and the remaining $n - 2$ values to fall in between.
The “probability” that one of the $X_i$ is $x$ is “like” $f(x)$. (Remember, we are bending the rules here in order to develop a heuristic. This probability is, of course, actually 0 for a continuous random variable.)

The “probability” that one of the $X_i$ is $y$ is “like” $f(y)$.

The probability that one of the $X_i$ is in between $x$ and $y$ is (actually) $F(y) - F(x)$.
The sample can fall many ways to give us a minimum at $x$ and a maximum at $y$. For example, imagine that $n = 5$. We might get $X_3 = x$, $X_1 = y$, and the remaining $X_2, X_4, X_5$ in between $x$ and $y$. This would happen with “probability”
$$f(x)[F(y) - F(x)]^3 f(y).$$
Another possibility is that we get $X_5 = x$ and $X_2 = y$ and the remaining $X_1, X_3, X_4$ in between $x$ and $y$. This would also happen with “probability”
$$f(x)[F(y) - F(x)]^3 f(y).$$
We have to add this “probability” up as many times as there are scenarios. So, let’s count them. There are $5!$ different ways to lay down the $X_i$. For each one, there are $3!$ different ways to lay down the remaining values in between that will result in the same min and max. So, we need to divide these redundancies out, for a total of $5!/3! = (5)(4)$ ways to get that min at $x$ and max at $y$.

In general, for a sample of size $n$, there are $n!$ different ways to lay down the $X_i$. For each one, there are $(n-2)!$ different ways that result in the same min and max. So, there are a total of $n!/(n-2)! = n(n-1)$ ways to get that min at $x$ and max at $y$.
Thus, the “probability” of getting a minimum of $x$ and a maximum of $y$ is
$$n(n-1) f(x) [F(y) - F(x)]^{n-2} f(y),$$
which looks an awful lot like the formula we derived above!
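If you want to verify the counting for $n = 5$, a tiny sketch with itertools (purely illustrative) confirms that there are $5 \cdot 4 = 20$ distinct choices of which observation lands at the min and which at the max.

```python
from itertools import permutations
from math import factorial

n = 5
# Lay the five observations down in every possible order; record which index
# lands at the minimum (first position) and which at the maximum (last position).
assignments = {(p[0], p[-1]) for p in permutations(range(n))}
print(len(assignments), factorial(n) // factorial(n - 2), n * (n - 1))   # 20 20 20
```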
5 The Joint Distribution for All of the Order Statistics
We wish now to find the pdf
$$f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n).$$
This time, we will start with the heuristic aid.
Suppose that $n = 3$ and we want to find
$$f_{X_{(1)}, X_{(2)}, X_{(3)}}(x_1, x_2, x_3) \;\text{“=”}\; P(X_{(1)} = x_1, X_{(2)} = x_2, X_{(3)} = x_3).$$
The first thing to notice is that this probability will be 0 if we don’t have $x_1 < x_2 < x_3$. (Note that we need strict inequalities here. For a continuous distribution, we will never see repeated values, so the minimum and second smallest, for example, could not take on the same value.)
Fix values $x_1 < x_2 < x_3$. How could a sample of size 3 fall so that the minimum is $x_1$, the next smallest is $x_2$, and the largest is $x_3$? We could observe
$$X_1 = x_1, \; X_2 = x_2, \; X_3 = x_3,$$
or
$$X_1 = x_2, \; X_2 = x_1, \; X_3 = x_3,$$
or
$$X_1 = x_2, \; X_2 = x_3, \; X_3 = x_1,$$
or...
There are $3!$ possibilities to list. The “probability” for each is $f(x_1) f(x_2) f(x_3)$. Thus,
$$f_{X_{(1)}, X_{(2)}, X_{(3)}}(x_1, x_2, x_3) \;\text{“=”}\; P(X_{(1)} = x_1, X_{(2)} = x_2, X_{(3)} = x_3) = 3! \, f(x_1) f(x_2) f(x_3).$$
For general $n$, we have
$$f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n) \;\text{“=”}\; P(X_{(1)} = x_1, X_{(2)} = x_2, \ldots, X_{(n)} = x_n) = n! \, f(x_1) f(x_2) \cdots f(x_n),$$
which holds for $x_1 < x_2 < \cdots < x_n$ with all $x_i$ in the support of the original distribution. The joint pdf is zero otherwise.
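A quick Monte Carlo check of this formula for $n = 3$ uniforms: the joint pdf should be constant and equal to $3! = 6$ on the ordered region. The sketch below (assuming numpy; the evaluation point and box width are arbitrary) estimates the density as a probability divided by a small volume.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(4)
n, reps = 3, 2_000_000
eps = 0.05                               # half-width of a small box around each coordinate
point = np.array([0.2, 0.5, 0.8])        # a point with x1 < x2 < x3

order_stats = np.sort(rng.uniform(size=(reps, n)), axis=1)
in_boxes = np.all(np.abs(order_stats - point) < eps, axis=1)

# P(order statistics land in the small boxes) / volume approximates the joint
# pdf at the point, which should be close to 3! = 6 for Uniform(0, 1) samples.
estimate = in_boxes.mean() / (2 * eps) ** n
print(f"estimated joint pdf: {estimate:.2f}   n! = {factorial(n)}")
```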
The Formalities:
The joint cdf,
$$P(X_{(1)} \le x_1, X_{(2)} \le x_2, \ldots, X_{(n)} \le x_n),$$
is a little hard to work with. Instead, we consider something similar:
$$P(y_1 < X_{(1)} \le x_1, \, y_2 < X_{(2)} \le x_2, \, \ldots, \, y_n < X_{(n)} \le x_n)$$
for values $y_1 < x_1 \le y_2 < x_2 \le y_3 < x_3 \le \cdots \le y_n < x_n$.
This can happen if
$$y_1 < X_1 \le x_1, \, y_2 < X_2 \le x_2, \, \ldots, \, y_n < X_n \le x_n,$$
or if
$$y_1 < X_5 \le x_1, \, y_2 < X_3 \le x_2, \, \ldots, \, y_n < X_{n-2} \le x_n,$$
or...
Because of the constraints on the $x_i$ and $y_i$, these are disjoint events. So, we can add these $n!$ probabilities, which will all be the same, together to get
$$P(y_1 < X_{(1)} \le x_1, \ldots, y_n < X_{(n)} \le x_n) = n! \, P(y_1 < X_1 \le x_1, \ldots, y_n < X_n \le x_n).$$
Note that
$$P(y_1 < X_1 \le x_1, \ldots, y_n < X_n \le x_n) \overset{\text{indep}}{=} \prod_{i=1}^{n} P(y_i < X_i \le x_i) = \prod_{i=1}^{n} [F(x_i) - F(y_i)].$$
So,
$$P(y_1 < X_{(1)} \le x_1, \ldots, y_n < X_{(n)} \le x_n) = n! \prod_{i=1}^{n} [F(x_i) - F(y_i)]. \qquad (2)$$
The left-hand side is
$$\int_{y_n}^{x_n} \int_{y_{n-1}}^{x_{n-1}} \cdots \int_{y_1}^{x_1} f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(u_1, u_2, \ldots, u_n) \, du_1 \, du_2 \cdots du_n.$$
Taking derivatives
$$\frac{\partial}{\partial x_1} \frac{\partial}{\partial x_2} \cdots \frac{\partial}{\partial x_n}$$
gives
$$f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n).$$
Differentiating both sides of (2) with respect to $x_1, x_2, \ldots, x_n$ gives us
$$f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n) = n! \, f(x_1) f(x_2) \cdots f(x_n),$$
which holds for $x_1 < x_2 < \cdots < x_n$ with all $x_i$ in the support of the original distribution. The pdf is zero otherwise.
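To see this differentiation step concretely, here is a small symbolic sketch with sympy for $n = 2$, treating $F$ as a generic cdf so that its derivative plays the role of $f$.

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols("x1 x2 y1 y2")
F = sp.Function("F")                  # a generic cdf; F' plays the role of the pdf f

n = 2
rhs = sp.factorial(n) * (F(x1) - F(y1)) * (F(x2) - F(y2))

# The mixed partial derivative with respect to x1 and x2 kills the y-terms and
# leaves n! * f(x1) * f(x2), written by sympy as derivatives of F.
print(sp.diff(rhs, x1, x2))           # 2*Derivative(F(x1), x1)*Derivative(F(x2), x2)
```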
6 The Distribution of $X_{(i)}$
We can get the marginal pdf for the $i$th order statistic $X_{(i)}$ by taking the joint pdf for all order statistics from Section 5 and integrating out the unwanted $x_j$.
Let’s start by integrating out $x_1$. Since the support of the joint pdf for the order statistics includes the constraint $x_1 < x_2 < \cdots < x_n$, the limits of integration are $-\infty$ to $x_2$.
$$
\begin{aligned}
f_{X_{(2)}, \ldots, X_{(n)}}(x_2, \ldots, x_n) &= \int_{-\infty}^{x_2} f_{X_{(1)}, X_{(2)}, \ldots, X_{(n)}}(x_1, x_2, \ldots, x_n) \, dx_1 \\
&= \int_{-\infty}^{x_2} n! \, f(x_1) f(x_2) \cdots f(x_n) \, dx_1 \\
&= n! \, f(x_2) \cdots f(x_n) \int_{-\infty}^{x_2} f(x_1) \, dx_1 \\
&= n! \, f(x_2) \cdots f(x_n) \, F(x_2)
\end{aligned}
$$
for $x_2 < x_3 < \cdots < x_n$.
Now let’s integrate out $x_2$, which goes from $-\infty$ to $x_3$.
$$
\begin{aligned}
f_{X_{(3)}, \ldots, X_{(n)}}(x_3, \ldots, x_n) &= \int_{-\infty}^{x_3} f_{X_{(2)}, \ldots, X_{(n)}}(x_2, \ldots, x_n) \, dx_2 \\
&= n! \, f(x_3) \cdots f(x_n) \int_{-\infty}^{x_3} \underbrace{F(x_2)}_{u} \, \underbrace{f(x_2) \, dx_2}_{du} \\
&= n! \, f(x_3) \cdots f(x_n) \left. \tfrac{1}{2} [F(x_2)]^2 \right|_{x_2 = -\infty}^{x_2 = x_3} \\
&= n! \, f(x_3) \cdots f(x_n) \, \tfrac{1}{2} \big( [F(x_3)]^2 - [\underbrace{F(-\infty)}_{0}]^2 \big) \\
&= \frac{n!}{2} \, f(x_3) \cdots f(x_n) [F(x_3)]^2,
\end{aligned}
$$
which holds for $x_3 < x_4 < \cdots < x_n$.
The next time through, we will integrate out $x_3$ from $-\infty$ to $x_4$. Using $u = F(x_3)$ and $du = f(x_3) \, dx_3$, we get
$$f_{X_{(4)}, \ldots, X_{(n)}}(x_4, \ldots, x_n) = \frac{n!}{(3)(2)} \, f(x_4) \cdots f(x_n) [F(x_4)]^3.$$
Continue until we reach $X_{(i)}$:
$$f_{X_{(i)}, \ldots, X_{(n)}}(x_i, \ldots, x_n) = \frac{n!}{(i-1)!} \, f(x_i) \cdots f(x_n) [F(x_i)]^{i-1},$$
which holds for $x_i < x_{i+1} < \cdots < x_n$.
Now, we start integrating off x’s from the other side.
$$
\begin{aligned}
f_{X_{(i)}, \ldots, X_{(n-1)}}(x_i, \ldots, x_{n-1}) &= \int_{x_{n-1}}^{\infty} f_{X_{(i)}, \ldots, X_{(n)}}(x_i, \ldots, x_n) \, dx_n \\
&= \frac{n!}{(i-1)!} \, f(x_i) \cdots f(x_{n-1}) [F(x_i)]^{i-1} \int_{x_{n-1}}^{\infty} f(x_n) \, dx_n \\
&= \frac{n!}{(i-1)!} \, f(x_i) \cdots f(x_{n-1}) [F(x_i)]^{i-1} [1 - F(x_{n-1})]
\end{aligned}
$$
for $x_i < x_{i+1} < \cdots < x_{n-1}$.
$$
\begin{aligned}
f_{X_{(i)}, \ldots, X_{(n-2)}}(x_i, \ldots, x_{n-2}) &= \int_{x_{n-2}}^{\infty} f_{X_{(i)}, \ldots, X_{(n-1)}}(x_i, \ldots, x_{n-1}) \, dx_{n-1} \\
&= \frac{n!}{(i-1)!} \, f(x_i) \cdots f(x_{n-2}) [F(x_i)]^{i-1} \int_{x_{n-2}}^{\infty} f(x_{n-1}) [1 - F(x_{n-1})] \, dx_{n-1}
\end{aligned}
$$
Letting $u = 1 - F(x_{n-1})$, so that $du = -f(x_{n-1}) \, dx_{n-1}$, we get
$$
\begin{aligned}
f_{X_{(i)}, \ldots, X_{(n-2)}}(x_i, \ldots, x_{n-2}) &= \frac{n!}{(i-1)!} \, f(x_i) \cdots f(x_{n-2}) [F(x_i)]^{i-1} \left\{ -\tfrac{1}{2} [1 - F(x_{n-1})]^2 \right\}\Big|_{x_{n-1} = x_{n-2}}^{x_{n-1} = \infty} \\
&= \frac{n!}{2(i-1)!} \, f(x_i) \cdots f(x_{n-2}) [F(x_i)]^{i-1} [1 - F(x_{n-2})]^2
\end{aligned}
$$
for $x_i < x_{i+1} < \cdots < x_{n-2}$.
The next time through we will integrate out $x_{n-2}$ from $x_{n-3}$ to $\infty$. Note that
$$\int_{x_{n-3}}^{\infty} f(x_{n-2}) [\underbrace{1 - F(x_{n-2})}_{u}]^2 \, dx_{n-2} = \left. -\tfrac{1}{3} [1 - F(x_{n-2})]^3 \right|_{x_{n-2} = x_{n-3}}^{x_{n-2} = \infty} = \tfrac{1}{3} [1 - F(x_{n-3})]^3.$$
Thus,
$$f_{X_{(i)}, \ldots, X_{(n-3)}}(x_i, \ldots, x_{n-3}) = \frac{n!}{(3)(2)(i-1)!} \, f(x_i) \cdots f(x_{n-3}) [F(x_i)]^{i-1} [1 - F(x_{n-3})]^3$$
for $x_i < x_{i+1} < \cdots < x_{n-3}$.
Continuing all the way down to the marginal pdf for $X_{(i)}$ alone, we get
$$f_{X_{(i)}}(x_i) = \frac{n!}{(n-i)!\,(i-1)!} \, [F(x_i)]^{i-1} f(x_i) [1 - F(x_i)]^{n-i}$$
for $-\infty < x_i < \infty$. (This may be further restricted by indicators in $f(x_i)$.)
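For the uniform distribution on $(0, 1)$, this formula reduces to the Beta$(i, n-i+1)$ pdf, which we can check by simulation. Here is a minimal sketch (assuming numpy and scipy; the choices of $n$, $i$, and the replication count are arbitrary).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, i, reps = 15, 4, 100_000

# The i-th smallest value from each simulated sample of n uniforms.
ith = np.sort(rng.uniform(size=(reps, n)), axis=1)[:, i - 1]

# For Uniform(0, 1), the marginal pdf of X_(i) is the Beta(i, n - i + 1) density.
ks = stats.kstest(ith, stats.beta(i, n - i + 1).cdf)
print(f"KS statistic: {ks.statistic:.4f}  (should be small)")
print(f"sample mean:  {ith.mean():.4f}  vs  i/(n+1) = {i/(n+1):.4f}")
```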
The Heuristic:
We once again will think of the continuous random variables $X_1, X_2, \ldots, X_n$ as discrete and $f_{X_{(i)}}(x_i)$ as the “probability” that the $i$th order statistic is at $x_i$. First note that there are $n!$ different ways to arrange the $x$’s. We need to put one at $x_i$, which will happen with “probability” $f(x_i)$. We need to put $i - 1$ below $x_i$, which will happen with probability $[F(x_i)]^{i-1}$, and we need to put $n - i$ above $x_i$, which will happen with probability $[1 - F(x_i)]^{n-i}$. There are $(i-1)!$ different ways to arrange the $x$’s chosen to go below $x_i$. These arrangements are redundant and need to be divided out; hence, we have $(i-1)!$ in the denominator. There are $(n-i)!$ different ways to arrange the $x$’s chosen to go above $x_i$. These arrangements are also redundant and need to be divided out. Thus, we also have $(n-i)!$ in the denominator.
7 The Joint Distribution of $X_{(i)}$ and $X_{(j)}$ for $i < j$
As in Section 6, one could start with the joint pdf for all of the order statistics and integrate out
the unwanted ones. The result will be
$$f_{X_{(i)}, X_{(j)}}(x_i, x_j) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!} \, [F(x_i)]^{i-1} f(x_i) \, [F(x_j) - F(x_i)]^{j-i-1} f(x_j) \, [1 - F(x_j)]^{n-j}$$
for $-\infty < x_i < x_j < \infty$.
Can you convince yourself of this heuristically?
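As a numerical check of this formula in the uniform case, the sketch below (assuming numpy and scipy; the choices of $n$, $i$, $j$, $a$, and $b$ are arbitrary) compares a Monte Carlo estimate of $P(X_{(i)} \le a, X_{(j)} \le b)$ with a double integral of the stated joint pdf.

```python
import numpy as np
from math import factorial
from scipy.integrate import dblquad

rng = np.random.default_rng(6)
n, i, j, reps = 15, 3, 12, 200_000
a, b = 0.3, 0.7                        # evaluate P(X_(i) <= a, X_(j) <= b)

order_stats = np.sort(rng.uniform(size=(reps, n)), axis=1)
empirical = np.mean((order_stats[:, i - 1] <= a) & (order_stats[:, j - 1] <= b))

# For Uniform(0, 1): F(t) = t and f(t) = 1 on (0, 1).
const = factorial(n) / (factorial(i - 1) * factorial(j - i - 1) * factorial(n - j))

def pdf(xj, xi):
    # First argument is the inner integration variable (x_j); second is outer (x_i).
    return const * xi ** (i - 1) * (xj - xi) ** (j - i - 1) * (1 - xj) ** (n - j)

# Integrate over {0 < x_i <= a, x_i < x_j <= b}.
theoretical, _ = dblquad(pdf, 0, a, lambda xi: xi, lambda xi: b)
print(f"Monte Carlo: {empirical:.4f}   double integral: {theoretical:.4f}")
```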