Measurability Aspects of the Compactness Theorem for Sample Compression Schemes

05/25/2012 ∙ by Damjan Kalajdzievski, et al.

It was proved in 1998 by Ben-David and Litman that a concept space has a sample compression scheme of size d if and only if every finite subspace has a sample compression scheme of size d. In the compactness theorem, measurability of the hypotheses of the created sample compression scheme is not guaranteed; at the same time, measurability of the hypotheses is a necessary condition for learnability. In this thesis we discuss when a sample compression scheme, created from compression schemes on finite subspaces via the compactness theorem, has measurable hypotheses. We show that if X is a standard Borel space with a d-maximum and universally separable concept class C, then (X,C) has a sample compression scheme of size d with universally Borel measurable hypotheses. Additionally, we introduce a new variant of compression scheme called a copy sample compression scheme.

1.1 Vapnik-Chervonenkis Dimension

We begin with the definitions of a concept space and the VC dimension associated to a concept space.

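A concept space is a pair $(X, \mathcal{C})$ consisting of a set $X$ equipped with a set $\mathcal{C}$ of subsets of $X$. $X$ is referred to as the domain, and $\mathcal{C}$ is referred to as the concept class. For a subset $A$ of $X$, denote

$$\mathcal{C} \cap A = \{ C \cap A : C \in \mathcal{C} \}$$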

and we say that is a subspace of if and .

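[[vapnik:264]] We say that a subset $A$ of $X$ is shattered by $\mathcal{C}$ if $\mathcal{C} \cap A = 2^A$, that is, if every subset of $A$ is of the form $C \cap A$ for some $C \in \mathcal{C}$.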

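[[vapnik:264]] The Vapnik-Chervonenkis dimension or VC-dimension of $(X, \mathcal{C})$ (denoted $\mathrm{VC}(X, \mathcal{C})$, or $\mathrm{VC}(\mathcal{C})$ when $X$ is understood) is

$$\sup \{ |A| : A \subseteq X \text{ is finite and shattered by } \mathcal{C} \}.$$

In particular, if the value is infinite, we say $\mathrm{VC}(\mathcal{C}) = \infty$.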
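For a finite concept space the definitions of shattering and VC-dimension can be checked by brute force. The following is a minimal sketch, assuming concepts are given as Python sets over a finite domain and the class is non-empty; the function names `restrict`, `is_shattered` and `vc_dimension` are illustrative and do not come from the text.

```python
from itertools import combinations


def restrict(concepts, subset):
    """Trace of the class on `subset`: the set {C ∩ subset : C in concepts}."""
    s = frozenset(subset)
    return {frozenset(c) & s for c in concepts}


def is_shattered(concepts, subset):
    """A finite set A is shattered iff the trace contains all 2^|A| subsets of A."""
    return len(restrict(concepts, subset)) == 2 ** len(set(subset))


def vc_dimension(domain, concepts):
    """Largest size of a shattered subset of a finite domain (brute force).

    Assumes `concepts` is non-empty, so the empty set is always shattered.
    """
    domain = list(domain)
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(concepts, a) for a in combinations(domain, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so no larger set can work.
            break
    return best
```

For instance, `vc_dimension([1, 2, 3], [{1}, {1, 2}, {1, 2, 3}])` evaluates the class of initial segments of a three-point chain. The search is exponential in the size of the domain, so it is only meant for small illustrative spaces.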

The following are some elementary or well known examples of VC dimension which can be found in every text on statistical learning.

Example

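Let $X$ be any infinite set and $\mathcal{C} = 2^X$. Then clearly $\mathrm{VC}(\mathcal{C}) = \infty$, because every finite $A \subseteq X$ has $\mathcal{C} \cap A = 2^A$ and so is shattered.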

Example

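Let $X$ be any totally ordered set with at least two elements, and let

$$\mathcal{C} = \{ I_x : x \in X \},$$

where $I_x = \{ y \in X : y \le x \}$ is an initial segment of $X$. For any $a, b \in X$ where $a \ne b$, without loss of generality $a < b$, we have

$$I_a \cap \{b\} = \emptyset \quad \text{and} \quad I_b \cap \{b\} = \{b\},$$

hence $\{b\}$ is shattered; however, every $I_x$ containing $b$ also contains $a$, so $\{b\} \notin \mathcal{C} \cap \{a, b\}$ and $\{a, b\}$ is not shattered. Therefore $\mathrm{VC}(\mathcal{C}) = 1$.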

Example

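Let $X = \mathbb{R}^2$ and

$$\mathcal{C} = \{ [a_1, b_1] \times [a_2, b_2] : a_1 \le b_1,\ a_2 \le b_2 \},$$

the class of axis-parallel rectangles. Clearly $\mathcal{C}$ shatters $\{(-1,0), (0,1), (1,0), (0,-1)\}$. Now let

$$A = \{x_1, x_2, x_3, x_4, x_5\} \subseteq \mathbb{R}^2$$

be given. Without loss of generality $x_1$ is the leftmost point, $x_2$ is the highest point, $x_3$ is the rightmost point, and $x_4$ is the lowest point. Since every rectangle $R \in \mathcal{C}$ with

$$\{x_1, x_2, x_3, x_4\} \subseteq R$$

contains the bounding box of $A$, we have

$$x_5 \in R,$$

and so

$$\{x_1, x_2, x_3, x_4\} \notin \mathcal{C} \cap A.$$

Therefore $\mathrm{VC}(\mathcal{C}) = 4$.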
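The argument above can be checked numerically. This is a sketch under the assumption that the example concerns closed axis-parallel rectangles in the plane, which is what the leftmost/highest/rightmost/lowest argument suggests. It relies on the observation that a subset $B$ of a finite point set is cut out by some rectangle exactly when the bounding box of $B$ contains no other point of the set. The helper names are hypothetical.

```python
from itertools import combinations


def bounding_box(points):
    """Smallest closed axis-parallel rectangle containing `points`."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return min(xs), max(xs), min(ys), max(ys)


def rectangle_picks_out(points, subset):
    """True if some closed axis-parallel rectangle R has R ∩ points == subset."""
    subset = set(subset)
    if not subset:
        # A rectangle placed away from the finitely many points misses all of them.
        return True
    x0, x1, y0, y1 = bounding_box(subset)
    return all(
        p in subset or not (x0 <= p[0] <= x1 and y0 <= p[1] <= y1)
        for p in points
    )


def shattered_by_rectangles(points):
    """Check every subset of `points` against the bounding-box criterion."""
    points = list(points)
    return all(
        rectangle_picks_out(points, sub)
        for k in range(len(points) + 1)
        for sub in combinations(points, k)
    )
```

A diamond of four points is shattered, while adding a fifth point inside the diamond's bounding box breaks shattering, since the four outer points can no longer be selected without the inner one.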

Unless otherwise specified, from now on we will consider $(X, \mathcal{C})$ to be our concept space, and $d = \mathrm{VC}(\mathcal{C})$.

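[[vapnik:264]] The $n$-th shatter coefficient of $\mathcal{C}$ is defined to be

$$s(\mathcal{C}, n) = \max \{ |\mathcal{C} \cap A| : A \subseteq X,\ |A| = n \}.$$

Note that $s(\mathcal{C}, n) \le 2^n$, with equality if and only if some $n$-point subset of $X$ is shattered.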

Notation

Let $\binom{n}{\le d}$ denote $\sum_{i=0}^{d} \binom{n}{i}$.

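[Sauer-Shelah Lemma [MR0307902]] Let $\mathrm{VC}(\mathcal{C}) = d$. Then every $A \subseteq X$ with $|A| = n$ satisfies $|\{ C \cap A : C \in \mathcal{C} \}| \le \sum_{i=0}^{d} \binom{n}{i}$.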
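The Sauer-Shelah bound can be compared against the actual number of traces on small examples. The sketch below assumes a finite domain and concepts given as Python sets; `binom_le` and `shatter_coefficient` are illustrative names, with `binom_le(n, d)` standing for the sum of binomial coefficients $\binom{n}{0} + \dots + \binom{n}{d}$.

```python
from itertools import combinations
from math import comb


def binom_le(n, d):
    """Sum of binomial coefficients C(n, 0) + ... + C(n, d)."""
    return sum(comb(n, i) for i in range(d + 1))


def shatter_coefficient(domain, concepts, n):
    """Maximum number of distinct traces {C ∩ A} over all n-point subsets A.

    Assumes 0 <= n <= len(domain) and a non-empty class.
    """
    concepts = [frozenset(c) for c in concepts]
    return max(
        len({c & frozenset(a) for c in concepts})
        for a in combinations(list(domain), n)
    )
```

For a class of VC-dimension $d$, `shatter_coefficient(domain, concepts, n)` should never exceed `binom_le(n, d)`; classes for which the two always agree are the maximum classes discussed in the next section.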

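We can consider $\mathcal{C}$ as a "function class", a family of $\{0,1\}$-valued functions on $X$: let

$$\mathcal{F}_{\mathcal{C}} = \{ \chi_C : C \in \mathcal{C} \},$$

where $\chi_C$ is the indicator function of $C$ on $X$. Similarly, if $\mathcal{F}$ is a family of $\{0,1\}$-valued functions on $X$ we can get a concept class

$$\mathcal{C}_{\mathcal{F}} = \{ f^{-1}(1) : f \in \mathcal{F} \}.$$

We define shattering for a function class as follows: $A \subseteq X$ is shattered by $\mathcal{F}$ if $\{ f|_A : f \in \mathcal{F} \} = \{0,1\}^A$. We can see that $\mathcal{C}$ shatters $A$ iff $\mathcal{F}_{\mathcal{C}}$ shatters $A$, and $\mathcal{F}$ shatters $A$ iff $\mathcal{C}_{\mathcal{F}}$ shatters $A$, so the two notions are equivalent.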
In the future we will consider concepts as functions, but will still use set relations and operations on concepts, which will have the obvious meaning; for instance $x \in f$ will be the same as $f(x) = 1$, $f \subseteq g$ the same as $\operatorname{support}(f) \subseteq \operatorname{support}(g)$, $f \cap g$ the same as $\min(f, g)$, etc.

1.2 Maximum and Maximal Classes

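The following definitions are due to [welzl87rangespaces]. Let $d \in \mathbb{N}$. A concept class $\mathcal{C}$ is $d$-maximum if for every finite $A \subseteq X$,

$$|\mathcal{C} \cap A| = \binom{|A|}{\le d} = \sum_{i=0}^{d} \binom{|A|}{i}.$$

A concept class $\mathcal{C}$ is $d$-maximal if $\mathrm{VC}(\mathcal{C}) = d$, and for any $C \subseteq X$ with $C \notin \mathcal{C}$ we have $\mathrm{VC}(\mathcal{C} \cup \{C\}) > d$.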

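Note that if $\mathcal{C}$ is $d$-maximum, then $\mathrm{VC}(\mathcal{C}) = d$, because for finite $A \subseteq X$, if $|A| \le d$ then

$$|\mathcal{C} \cap A| = \binom{|A|}{\le d} = 2^{|A|},$$

so $A$ is shattered, and if $|A| > d$ then

$$|\mathcal{C} \cap A| = \binom{|A|}{\le d} < 2^{|A|},$$

so $A$ is not shattered.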

As a consequence of Zorn’s Lemma, every concept class of VC dimension $d$ is contained in a $d$-maximal concept class.

Maximum does not necessarily imply maximal and vice versa. Also note that if $(X, \mathcal{C})$ is $d$-maximum, any subspace of $(X, \mathcal{C})$ is $d$-maximum as well, but this is not necessarily the case for $d$-maximal.

Example

Let ,
. It is easy to check is -maximal but not -maximum since

Example ([Floyd95samplecompression])

Let ,
. It is easy to check is -maximal but not -maximum since

Example

Let and . For any finite, without loss of generality with , we have that

thus $\mathcal{C}$ is $1$-maximum. However, $\mathcal{C}$ is not $1$-maximal since $\mathrm{VC}(\mathcal{C} \cup \{\emptyset\}) = 1$. Note that any concept space $(X, \mathcal{C})$ where $X$ is totally ordered with no minimal element, and where $\mathcal{C}$ is the set of all initial segments, is $1$-maximum. This is also the case if $X$ has at least two elements, where $\mathcal{C}$ is the set of all initial segments and the empty set.
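On a finite domain the $d$-maximum condition can be tested directly by counting traces on every subset. The sketch below assumes concepts are Python sets over a finite domain; `is_d_maximum` is an illustrative name. It shows, on a three-point chain, the role played by the empty set (or the absence of a minimal element) in the remark above.

```python
from itertools import combinations
from math import comb


def is_d_maximum(domain, concepts, d):
    """Check that every subset A has exactly C(|A|,0) + ... + C(|A|,d) traces."""
    domain = list(domain)
    concepts = [frozenset(c) for c in concepts]
    for n in range(len(domain) + 1):
        target = sum(comb(n, i) for i in range(d + 1))
        for a in combinations(domain, n):
            a = frozenset(a)
            if len({c & a for c in concepts}) != target:
                return False
    return True


# Initial segments of the chain 0 < 1 < 2, with and without the empty set.
segments = [{0}, {0, 1}, {0, 1, 2}]
segments_with_empty = [set()] + segments
```

With the empty set included, the chain example is 1-maximum. Without it, the one-point set containing the minimal element has only one trace instead of two, so the condition fails.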

Remark

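If $X$ is finite, then $d$-maximum implies $d$-maximal.
If $\mathcal{C}$ is $d$-maximum, then any $C \subseteq X$ with $C \notin \mathcal{C}$ has

$$|\mathcal{C} \cup \{C\}| = \binom{|X|}{\le d} + 1 > \binom{|X|}{\le d},$$

hence by Sauer’s Lemma $\mathrm{VC}(\mathcal{C} \cup \{C\}) > d$, and therefore $\mathcal{C}$ is $d$-maximal.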

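[[welzl87rangespaces]] Let $(X, \mathcal{C})$ be finite with VC-dimension $d$. For $x \in X$, there are at most $\binom{|X| - 1}{\le d - 1}$ sets $C \in \mathcal{C}$ such that $x \notin C$ and $C \cup \{x\} \in \mathcal{C}$.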

Proof.

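Let $x \in X$, and

$$\mathcal{C}^x = \{ C \in \mathcal{C} : x \notin C \text{ and } C \cup \{x\} \in \mathcal{C} \}.$$

Suppose

$$|\mathcal{C}^x| > \binom{|X| - 1}{\le d - 1}.$$

Then, regarding $\mathcal{C}^x$ as a concept class on $X \setminus \{x\}$,

$$|\mathcal{C}^x| > \binom{|X \setminus \{x\}|}{\le d - 1},$$

thus by Sauer’s Lemma $\mathrm{VC}(\mathcal{C}^x) \ge d$. Let $x_1, \dots, x_d$ be points in $X \setminus \{x\}$ shattered by $\mathcal{C}^x$, and let $A = \{x_1, \dots, x_d, x\}$. Now by the definition of $\mathcal{C}^x$, for each $B \subseteq \{x_1, \dots, x_d\}$ there is $C \in \mathcal{C}^x$ such that $C \cap \{x_1, \dots, x_d\} = B$, hence

$$C \cap A = B \quad \text{and} \quad (C \cup \{x\}) \cap A = B \cup \{x\},$$

so $A$ is shattered by $\mathcal{C}$, contradicting $\mathrm{VC}(\mathcal{C}) = d$. ∎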

[[welzl87rangespaces]] Let $(X, \mathcal{C})$ be finite with VC-dimension $d$. The concept space $(X, \mathcal{C})$ is $d$-maximum if and only if $|\mathcal{C}| = \binom{|X|}{\le d}$.

Proof.

If $\mathcal{C}$ is $d$-maximum then by the definition, applied to $A = X$, $|\mathcal{C}| = |\mathcal{C} \cap X| = \binom{|X|}{\le d}$.

For the converse, we will use induction on $|X|$.
If , then is maximum and

Assume the statement of the theorem is true for all where , and let have . Let and let . By the induction hypothesis, it suffices to show that

By lemma 1.2.7,
has size at most . Define

We will show is injective. Suppose there is in such that

If , then

and if , then

so without loss of generality . We get that

hence , a contradiction, therefore is injective. Finally,

1.3 Concepts as Relations

In this section we will look at concept spaces defined as a relation on a pair of sets. This will allow us to characterize useful notions of embeddings for concept spaces as found in [Ben-david98combinatorialvariability]. It will also allow us to define the dual concept space of a concept space.

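We can define a concept class on a domain $X$ via a relation $R \subseteq X \times Y$ for some set $Y$, by $\mathcal{C} = \{ R_y : y \in Y \}$ where $R_y = \{ x \in X : (x, y) \in R \}$. Similarly, given $(X, \mathcal{C})$, the corresponding space in the form $(X, Y, R)$ is $(X, \mathcal{C}, \in)$. A subclass of $(X, Y, R)$ is $(X', Y', R \cap (X' \times Y'))$ where $X' \subseteq X$ and $Y' \subseteq Y$. This is convenient for defining the idea of a dual to a concept space as follows: given a concept space $(X, \mathcal{C})$, the dual concept space of $(X, \mathcal{C})$, denoted

$$(X, \mathcal{C})^*,$$

is

$$\big( \mathcal{C},\ \{ \{ C \in \mathcal{C} : x \in C \} : x \in X \} \big).$$

The dual concept space of a space represented as $(X, Y, R)$ can be thought of as $(Y, X, R^{-1})$, where $R^{-1} = \{ (y, x) : (x, y) \in R \}$.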
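For finite spaces the dual can be computed explicitly. The sketch below assumes the standard construction in which the dual domain is the concept class itself and each point $x$ of the original domain contributes the dual concept $\{ C : x \in C \}$; the function name `dual_space` is hypothetical.

```python
def dual_space(domain, concepts):
    """Return (dual_domain, dual_concepts) for a finite concept space.

    The dual domain is the set of concepts; the dual concept attached to a
    point x is the set of all concepts that contain x.
    """
    concepts = [frozenset(c) for c in concepts]
    dual_domain = set(concepts)
    dual_concepts = {frozenset(c for c in concepts if x in c) for x in domain}
    return dual_domain, dual_concepts
```

Note that two points lying in exactly the same concepts give the same dual concept, so the dual class may have fewer elements than the original domain.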

[[Ben-david98combinatorialvariability]] Let , be concept spaces. An embedding from to is a function such that for every


A generalized embedding from to is a function and a function such that for every ,

is weakly (generalized) embeddable in if every finite subclass of is (generalized) embeddable in .

The above notions partially order any set of concept spaces; if there exists an embedding or generalized embedding from to , we will denote that

or

respectively.
If is weakly embeddable in , or weakly generalized embeddable in , we will denote that

or

respectively.

Let us say that and are bi-embeddable if and .

A concept space may have some redundant points in as far as is concerned, but we can reduce it to its essential information by setting:

separates the points of and is bi-embeddable to via the quotient map for , and mapping each equivalence class to its (choose any) representative for
.

Remark

(1) .

(2) .

Notation

In the proof of the next proposition and throughout the further text we use the notation $A \triangle B$ for the symmetric difference of two sets; i.e. $A \triangle B = (A \setminus B) \cup (B \setminus A)$.

[[Ben-david98combinatorialvariability]] If then .

Proof.

Let be a finite subset of that is shattered, let

and let , be the generalized embedding from into . is injective because for , there exists . Without loss of generality . We have:

In either case and so . This also shows that is injective, hence

and therefore is shattered in . ∎

[[Laskowski92vapnik-chervonenkisclasses]] For any class :

Proof.

Since , it suffices to show the first inequality. Let be a set of cardinality . One has via . Noting that is embeddable in any class of the same or greater VC-dimension, , and thus . Therefore and so . ∎

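$\mathrm{VC}(\mathcal{C}) < \infty$ if and only if $\mathrm{VC}(\mathcal{C}^*) < \infty$, where $\mathcal{C}^*$ is the concept class of the dual space.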

2.1 Introduction of Sample Compression Schemes

Sample compression schemes, introduced by Littlestone and Warmuth ([Littlestone86relatingdata]), are naturally arising algorithms which learn concepts by compressing finite samples of concepts to subsets of size at most $d$.

The following notations will be used in the definitions of sample compression schemes, and throughout the text.

Notation

For let

let

where and is the function restricted to the domain , and let

We can similarly define

Notation

For two functions , with , let

be the notation for extending .

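For $d \in \mathbb{N}$, an unlabelled sample compression scheme of size d on $(X, \mathcal{C})$ is a function

$$\mathcal{H} : \{ \sigma \subseteq X : |\sigma| \le d \} \to \{0,1\}^X$$

with the property that for every $f \in \mathcal{C}$ and every finite $A \subseteq X$ there is $\sigma \subseteq A$ with $|\sigma| \le d$ such that

$$\mathcal{H}(\sigma)|_A = f|_A.$$

A labelled sample compression scheme of size d on $(X, \mathcal{C})$ is a function

$$\mathcal{H} : \{ f|_\sigma : f \in \mathcal{C},\ \sigma \subseteq X,\ |\sigma| \le d \} \to \{0,1\}^X$$

with the property that for every $f \in \mathcal{C}$ and every finite $A \subseteq X$ there is $\sigma \subseteq A$ with $|\sigma| \le d$ such that

$$\mathcal{H}(f|_\sigma)|_A = f|_A.$$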

We will call the range of a sample compression scheme the hypothesis class and denote it by .

Example

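Let $X$ be any totally ordered set, and let $\mathcal{C}$ be the set of all initial segments of $X$. Defining

$$\mathcal{H}_1(\emptyset) = \emptyset, \qquad \mathcal{H}_1(\{x\}) = \{ y \in X : y \le x \}$$

and

$$\mathcal{H}_2(\emptyset) = X, \qquad \mathcal{H}_2(\{x\}) = \{ y \in X : y < x \},$$

we will show $\mathcal{H}_1$ and $\mathcal{H}_2$ are unlabelled sample compression schemes of size $1$ on $(X, \mathcal{C})$. Given a sample $f = C|_A$ with $C \in \mathcal{C}$ and $A \subseteq X$ finite, if $f = 0$ on its domain then $\sigma = \emptyset$ and $\mathcal{H}_1(\emptyset)|_A = f$. Otherwise $x = \max f^{-1}(1)$ exists, and so

$$\mathcal{H}_1(\{x\})|_A = f.$$

Thus $\mathcal{H}_1$ is a sample compression scheme of size $1$ on $(X, \mathcal{C})$.
Similarly for $\mathcal{H}_2$, if $f = 1$ on its domain then $\sigma = \emptyset$ and $\mathcal{H}_2(\emptyset)|_A = f$. Otherwise $x = \min f^{-1}(0)$ exists, and so

$$\mathcal{H}_2(\{x\})|_A = f.$$

Therefore $\mathcal{H}_2$ is also a sample compression scheme of size $1$ on $(X, \mathcal{C})$.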
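The first scheme in this example can be exercised on a finite chain. This is a minimal sketch, assuming initial segments of the form $\{ y : y \le k \}$ on a finite set of integers (plus the empty set, which the scheme also handles); `compress`, `reconstruct` and `scheme_is_consistent` are illustrative names. A sample is compressed to its largest positively labelled point, or to the empty set if there is none.

```python
from itertools import combinations


def compress(sample):
    """Map a labelled sample {point: 0 or 1} to a set of at most one point."""
    positives = [x for x, label in sample.items() if label == 1]
    return frozenset([max(positives)]) if positives else frozenset()


def reconstruct(kept):
    """Hypothesis for a compressed set: {} -> all zeros, {x} -> indicator of {y <= x}."""
    if not kept:
        return lambda y: 0
    (x,) = kept
    return lambda y: 1 if y <= x else 0


def scheme_is_consistent(domain):
    """Check every sample of every initial segment (and of the empty concept)."""
    domain = sorted(domain)
    for k in [None] + domain:
        def concept(y, k=k):
            return 1 if (k is not None and y <= k) else 0
        for n in range(len(domain) + 1):
            for points in combinations(domain, n):
                sample = {p: concept(p) for p in points}
                h = reconstruct(compress(sample))
                if any(h(p) != sample[p] for p in points):
                    return False
    return True
```

The compressed set never carries labels, which is what makes this an unlabelled scheme: the hypothesis is determined by the retained point alone.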

If $(X, \mathcal{C})$ has an unlabelled compression scheme of size $d$, then $(X, \mathcal{C})$ has a labelled compression scheme of size $d$.

Proof.

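Let $(X, \mathcal{C})$ have an unlabelled compression scheme $\mathcal{H}$ of size $d$. For every labelled sample $g$ of size at most $d$ there is $\sigma \subseteq X$ with $|\sigma| \le d$ such that $\operatorname{dom}(g) = \sigma$, and so any function $\mathcal{H}'$ where $\mathcal{H}'(g) = \mathcal{H}(\operatorname{dom}(g))$ will be a labelled compression scheme of size $d$. ∎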

From now on we will only be dealing with unlabelled sample compression schemes unless otherwise mentioned.

[[Ben-david98combinatorialvariability]] If and has a (labelled or unlabelled) sample compression scheme of size , then also has a sample compression scheme of size and of the same type. If has a sample compression scheme of size , then every subspace has a sample compression scheme of size .

2.2 Compactness Theorem

[Compactness Theorem, Ben-David and Litman [Ben-david98combinatorialvariability]] A concept space $(X, \mathcal{C})$ has a sample compression scheme of size $d$ if and only if every finite subspace of $(X, \mathcal{C})$ has a sample compression scheme of size $d$.

The compactness theorem is true for both types of sample compression schemes and similarly for all forms of extended sample compression schemes given in a following section. We will provide the proof of the theorem for unlabelled sample compression schemes. The proof we provide is simpler and more direct than the proof in [Ben-david98combinatorialvariability] which is based on the Compactness Theorem of Predicate Logic. We use an approach with ultralimits, normally used in Analysis. (For preliminary information on filters and ultrafilters, see appendix A.2)

Proof.

Necessity: By Corollary 2.1.7, if $(X, \mathcal{C})$ has a sample compression scheme of size $d$, then every (finite) subspace of $(X, \mathcal{C})$ has a sample compression scheme of size $d$.
Sufficiency: For all denote the sample compression scheme of size for as . Let be an ultrafilter on containing the filter base

Define as

Note for given , is defined as the ultralimit of the net of zeros and ones along .

We will show is a sample compression scheme of size on . Let , and denote . Note that

(1)

We have that is finite so let . For letting

by (1) we see that

thus, by a property of ultrafilters, such that . Let and let

We have