A Constructive Proof of the Glivenko-Cantelli Theorem

10/25/2021
by Daniel Salnikov, et al.

1 Abstract

The Glivenko-Cantelli theorem states that the empirical distribution function converges uniformly almost surely to the theoretical distribution for a random variable X ∈ ℝ. This is an important result because it establishes the fact that sampling does capture the dispersion measure the distribution function F imposes. In essence, sampling permits one to learn and infer the behavior of F by only looking at observations from X. The probabilities that are inferred from samples 𝐗 will become more precise as the sample size increases and more data becomes available. Therefore, it is valid to study distributions via samples. The proof presented here is constructive, meaning that the result is derived directly from the fact that the empirical distribution function converges pointwise almost surely to the theoretical distribution. The work includes a proof of this preliminary statement and attempts to motivate the intuition one gets from sampling techniques when studying the regions in which a model concentrates probability. The sets where dispersion is described with precision by the empirical distribution function will eventually cover the entire sample space.

2 Preliminaries

The following definitions will be used throughout the proof.

Definition.

Empirical Distribution Function [mood]
Let X be a random variable, and let X₁, …, Xₙ be a random sample from X. The empirical distribution function assigns probabilities by counting how many observations are less than or equal to x; in essence:

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbb{1}_{(-\infty,\, x]}(X_i).$$
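
As an aside (not from the paper), the definition translates directly into code. A minimal Python sketch, assuming NumPy is available, that evaluates F̂_n at a set of points:

    import numpy as np

    def edf(sample, points):
        """Empirical distribution function: for each point x, the
        fraction of observations that fall in (-inf, x]."""
        sample = np.asarray(sample)
        return np.array([np.mean(sample <= x) for x in np.atleast_1d(points)])

    rng = np.random.default_rng(0)
    sample = rng.normal(size=1000)
    print(edf(sample, [-1.0, 0.0, 1.0]))  # roughly [0.16, 0.50, 0.84]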

As a preliminary result, the work includes a proof of the following theorem:

Theorem 1.

Let X be a random variable, assume that X ∼ F and that X₁, …, Xₙ is a random sample from X; then the empirical distribution function converges almost surely to the theoretical distribution at each point x ∈ ℝ.

Proof.

For x ∈ ℝ, it is possible to index a stochastic process as follows:

$$Y_i(x) = \mathbb{1}_{(-\infty,\, x]}(X_i), \qquad i = 1, \dots, n.$$

Therefore, Y_i(x) ∼ Bernoulli(F(x)) and

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i(x).$$

Then for each fixed x the random variable Y_i(x) is a Bernoulli random variable that arises from indexing the stochastic process by points in the sample space of X. Moreover, for all i the Y_i(x) have finite variances and means,

$$\mathbb{E}\bigl[Y_i(x)\bigr] = F(x), \qquad \mathrm{Var}\bigl[Y_i(x)\bigr] = F(x)\bigl(1 - F(x)\bigr) \le \tfrac{1}{4} < \infty.$$

Thus, by the Strong Law of Large Numbers [rigProb, rossMod], for fixed x the sample mean of the Y_i(x) converges almost surely to the mean,

$$\hat{F}_n(x) = \frac{1}{n} \sum_{i=1}^{n} Y_i(x) \xrightarrow{\;a.s.\;} \mathbb{E}\bigl[Y_1(x)\bigr] = F(x).$$

Since the choice of x is arbitrary, for all fixed x ∈ ℝ it is true that

$$\mathbb{P}\Bigl(\lim_{n \to \infty} \hat{F}_n(x) = F(x)\Bigr) = 1. \qquad \blacksquare$$

It is worth noting that the speed of convergence depends on how heavy the tails of F are, as well as on its kurtosis and asymmetry. The more probability is concentrated far from the center, the larger the samples the process will need in order to cover those regions empirically.
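
To see this numerically, the following hedged Python sketch (the models, the tail point, and the sample sizes are illustrative choices, not the author's) compares the pointwise error at a tail point for a light-tailed and a heavy-tailed model:

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    x = 3.0  # a point in the right tail
    for n in (100, 10_000, 1_000_000):
        light = rng.normal(size=n)           # light tails
        heavy = rng.standard_cauchy(size=n)  # heavy tails
        err_light = abs(np.mean(light <= x) - stats.norm.cdf(x))
        err_heavy = abs(np.mean(heavy <= x) - stats.cauchy.cdf(x))
        print(f"n={n:>9}: |F̂_n(x)-F(x)|  normal={err_light:.5f}  cauchy={err_heavy:.5f}")

Both errors shrink as n grows; the heavy-tailed model concentrates more probability far from the center, so larger samples are needed before its tail regions are represented well.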

3 Constructive proof

Theorem 2 (Glivenko-Cantelli Theorem).

Let X be a random variable, assume X ∼ F and that X₁, …, Xₙ is a random sample from X; then as the sample size approaches infinity the empirical distribution function converges uniformly almost surely to the theoretical distribution. In essence,

$$\mathbb{P}\Bigl(\lim_{n \to \infty}\, \sup_{x \in \mathbb{R}} \bigl|\hat{F}_n(x) - F(x)\bigr| = 0\Bigr) = 1.$$

The following proof is motivated by ideas used in the proof of Egorov’s theorem in [grabs].

Proof.

Assume that X ∼ F and let X₁, …, Xₙ be a random sample from X. For each x ∈ ℝ define A_x as:

$$A_x = \Bigl\{\omega \in \Omega : \lim_{n \to \infty} \hat{F}_n(x)(\omega) = F(x)\Bigr\}.$$

Then by Theorem 1,

$$\mathbb{P}(A_x) = 1.$$

Now for a fixed ε > 0 and n ∈ ℕ define B_n(x, ε) as:

$$B_n(x, \varepsilon) = \bigcap_{m \ge n} \Bigl\{\omega \in \Omega : \bigl|\hat{F}_m(x)(\omega) - F(x)\bigr| < \varepsilon\Bigr\}.$$

Thus, it is possible to verify that

$$B_n(x, \varepsilon) \subseteq B_{n+1}(x, \varepsilon).$$

In essence, once the points fall into the ball no further empirical distribution functions in the sequence escape from the convergence radius. Also, for all ω ∈ A_x and ε > 0 there exists an N ∈ ℕ such that if m ≥ N, then

$$\bigl|\hat{F}_m(x)(\omega) - F(x)\bigr| < \varepsilon.$$

Thus,

$$\omega \in B_N(x, \varepsilon).$$

And,

$$A_x \subseteq \bigcup_{n=1}^{\infty} B_n(x, \varepsilon).$$

Furthermore, for all ω ∈ A_x, by Theorem 1, as the sample size increases it is possible to find the required N among the integers. Therefore, the union (over all integers) of points where the sequence does not escape the convergence radius will include A_x,

$$1 = \mathbb{P}(A_x) \le \mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} B_n(x, \varepsilon)\Bigr) \le 1.$$

Hence, ω ∈ A_x implies that ω ∈ ⋃ₙ B_n(x, ε), so ℙ(⋃ₙ B_n(x, ε)) = 1. Now, consider the following: the B_n(x, ε) form an increasing sequence of events,

$$B_1(x, \varepsilon) \subseteq B_2(x, \varepsilon) \subseteq \cdots \nearrow \bigcup_{n=1}^{\infty} B_n(x, \varepsilon).$$

So that, by continuity of the probability measure from below,

$$\lim_{n \to \infty} \mathbb{P}\bigl(B_n(x, \varepsilon)\bigr) = \mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} B_n(x, \varepsilon)\Bigr) = 1.$$

And, for any δ > 0, it is possible to find N(x, ε, δ) ∈ ℕ to get

$$\mathbb{P}\bigl(B_{N(x,\varepsilon,\delta)}(x, \varepsilon)\bigr) \ge 1 - \delta.$$

Then, for all m ≥ N(x, ε, δ),

$$\mathbb{P}\Bigl(\bigl|\hat{F}_m(x) - F(x)\bigr| < \varepsilon\Bigr) \ge 1 - \delta.$$

Now fix ε > 0 and k ∈ ℕ with 1/k < ε, and note that, given the previous limits, it is possible to find grid points x₁ ≤ x₂ ≤ ⋯ ≤ x_{k−1} such that

$$x_j = \inf\Bigl\{x \in \mathbb{R} : F(x) \ge \tfrac{j}{k}\Bigr\}, \qquad j = 1, \dots, k - 1,$$

so that F increases by at most 1/k strictly between consecutive grid points. Then, for all possible values of x, monotonicity of F and of F̂_n gives

$$\sup_{x \in \mathbb{R}} \bigl|\hat{F}_n(x) - F(x)\bigr| \le \max_{1 \le j \le k-1} \max\Bigl\{\bigl|\hat{F}_n(x_j) - F(x_j)\bigr|,\; \bigl|\hat{F}_n(x_j^-) - F(x_j^-)\bigr|\Bigr\} + \frac{1}{k},$$

where F̂_n(x_j⁻) and F(x_j⁻) denote left limits; Theorem 1 applies to these as well, since F̂_n(x_j⁻) is the sample mean of the indicators 𝟙_{(−∞, x_j)}(X_i). Since the grid is finite, as n → ∞ the limit results in:

$$\limsup_{n \to \infty}\, \sup_{x \in \mathbb{R}} \bigl|\hat{F}_n(x) - F(x)\bigr| \le \frac{1}{k} \quad \text{on } D_k = \bigcap_{j=1}^{k-1} \bigl(A_{x_j} \cap A_{x_j}^-\bigr),$$

where A_{x_j}⁻ is the convergence set for the left limit at x_j. However, if ω ∈ D = ⋂_{k=1}^∞ D_k, then for all k ∈ ℕ and all n large enough, it is true that

$$\sup_{x \in \mathbb{R}} \bigl|\hat{F}_n(x)(\omega) - F(x)\bigr| \le \frac{2}{k}.$$

In essence, if ω ∈ D, then F̂_n → F uniformly. And, note that D has the following probability

$$\mathbb{P}(D) = \mathbb{P}\Bigl(\bigcap_{k=1}^{\infty} D_k\Bigr) = 1,$$

since it is a countable intersection of events of probability one. Therefore, the probability that F̂_n → F uniformly is equal to one; in essence, F̂_n → F uniformly almost surely. ∎
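
As a numerical companion (an illustration, not part of the proof): since F̂_n is a right-continuous step function and F is nondecreasing, the supremum sup_x |F̂_n(x) − F(x)| is attained at the sample's jump points, which yields the usual Kolmogorov-Smirnov-style computation. A Python sketch, assuming NumPy and SciPy:

    import numpy as np
    from scipy import stats

    def sup_distance(sample, cdf):
        """sup_x |F̂_n(x) - F(x)|, attained at the jump points of F̂_n."""
        z = np.sort(np.asarray(sample))
        n = z.size
        F = cdf(z)
        # Compare F with the EDF just after (i/n) and just before ((i-1)/n) each jump.
        d_plus = np.max(np.arange(1, n + 1) / n - F)
        d_minus = np.max(F - np.arange(0, n) / n)
        return max(d_plus, d_minus)

    rng = np.random.default_rng(2)
    for n in (10, 100, 1_000, 10_000):
        print(f"n={n:>6}: sup|F̂_n - F| = {sup_distance(rng.normal(size=n), stats.norm.cdf):.4f}")

The printed distances shrink toward zero as n grows, as the theorem asserts.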

4 Discussion

The proof exhibits how the observed variation in the sample 𝐗 represents the dispersion that is characterized by the model F. Thus, as the sample size increases, the empirical process will be a more efficient representation of the model. Nonetheless, the theorem requires the use of nice intervals (and, in higher dimensions, classes of sets), as well as strong assumptions about independence between different observations. However, it also motivates further study of how empirical processes and nonparametric methods can be used to study the concentration of measure that characterizes the data under analysis.

Appendix

The sampling process eventually covers all areas with positive probability. If X is discrete, then each atom x carries probability

$$\mathbb{P}(X = x) = p(x).$$

Thus if p(x) > 0, then as n → ∞

$$\mathbb{P}\bigl(x \notin \{X_1, \dots, X_n\}\bigr) = \bigl(1 - p(x)\bigr)^n \longrightarrow 0.$$

Similarly, if

$$\bigl[X_{(1)}, X_{(n)}\bigr]$$

denotes the range a sample from an absolutely continuous random variable covers, then the probability that the sampling process will not cover a particular point x is:

$$\mathbb{P}\bigl(x \notin [X_{(1)}, X_{(n)}]\bigr) = \mathbb{P}\bigl(x < X_{(1)}\bigr) + \mathbb{P}\bigl(x > X_{(n)}\bigr) = \bigl(1 - F(x)\bigr)^n + F(x)^n.$$

Note that the intersection is an absurdity (x < X_{(1)} and X_{(n)} < x would force x < x), thus it is the empty set and it has zero probability. Moreover, as n → ∞ the sampling process will eventually sample from less concentrated regions. Suppose that 0 < F(x) < 1; then

$$\bigl(1 - F(x)\bigr)^n \longrightarrow 0,$$

And,

$$F(x)^n \longrightarrow 0.$$

Therefore, as the sample size increases the range of coverage grows to capture both the left and right tails. In essence, if 0 < F(x) < 1, then as n → ∞,

$$\mathbb{P}\bigl(x \in [X_{(1)}, X_{(n)}]\bigr) = 1 - \bigl(1 - F(x)\bigr)^n - F(x)^n \longrightarrow 1.$$
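
The coverage identity is easy to check by simulation. A Python sketch (the standard normal model, the point x = 1.5, and the sample sizes are illustrative assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    x = 1.5
    F_x = stats.norm.cdf(x)
    for n in (5, 20, 100):
        # 50,000 samples of size n; check whether x lies in [X_(1), X_(n)].
        samples = rng.normal(size=(50_000, n))
        covered = np.mean((samples.min(axis=1) <= x) & (x <= samples.max(axis=1)))
        theory = 1 - (1 - F_x) ** n - F_x ** n
        print(f"n={n:>4}: simulated={covered:.4f}  theory={theory:.4f}")

The two columns agree, and both approach one as the sample size grows.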

This motivates the study of how samples from the empirical distribution and a possible guess/initial distribution could report the regions in which F appears to concentrate more probability. Suppose that {I₁, …, I_k} is a partition of ℝ; then the observed clusters should resemble the concentration that a Dirichlet distribution defined over the probabilities (F(I₁), …, F(I_k)) establishes for each interval I_j. This connection between order statistics, the empirical distribution function, and the measure concentrated in each observed and/or theorized interval motivates the construction of the Dirichlet process and other nonparametric techniques. Moreover, it suggests that boundaries at the tails might need specific techniques to observe such data in a sample.

Notation

  • δ_x(A): Dirac measure with respect to the set A and point x;

  • F: Distribution function of the random variable X;

  • ℙ(A): Probability measure of the event A;

  • ℙ_F(A): Probability measure of the event A under the model F's assumptions;

  • X_(1): Smallest order statistic;

  • X_(n): Largest order statistic;

  • Ω: Sample space of the random variable X; all possible observations belong to this set;

  • 𝐗 = (X₁, …, Xₙ): Random sample assumed to arise from the model F; a finite collection of independent and identically distributed random variables that are described by the dispersion model F;

  • 𝟙_A(x): Indicator function for x with respect to the set A.

References