1.1 Vapnik-Chervonenkis Dimension
We begin with the definitions of a concept space and the VC dimension associated to a concept space.
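A concept space is a pair $(X, \mathcal{C})$ consisting of a set $X$ equipped with a set $\mathcal{C}$ of subsets of $X$. $X$ is referred to as the domain, and $\mathcal{C}$ is referred to as the concept class. For a subset $A$ of $X$, denote
\[ \mathcal{C}|_A = \{ C \cap A : C \in \mathcal{C} \}, \]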
and we say that is a subspace of if and .
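[[vapnik:264]] We say that a subset $A$ of $X$ is shattered by $\mathcal{C}$ if $\mathcal{C}|_A = 2^A$.
[[vapnik:264]] The Vapnik-Chervonenkis dimension or VC-dimension of $(X,\mathcal{C})$ (denoted $\mathrm{VC}(X,\mathcal{C})$, or $\mathrm{VC}(\mathcal{C})$ when $X$ is understood) is
\[ \sup \{ |A| : A \subseteq X \text{ is finite and shattered by } \mathcal{C} \}. \]
In particular if the value is infinite, we say $\mathrm{VC}(\mathcal{C}) = \infty$.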
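For a finite concept space the definitions of shattering and VC dimension can be checked by brute force. The following is a minimal sketch, assuming concepts are given as Python sets; the function names `restrict`, `is_shattered` and `vc_dimension` are illustrative and do not come from the text, and the concept class is assumed to be nonempty.

```python
from itertools import combinations

def restrict(concepts, subset):
    """Return the traces {C ∩ A : C in concepts} as a set of frozensets."""
    subset = frozenset(subset)
    return {frozenset(c) & subset for c in concepts}

def is_shattered(concepts, subset):
    """A is shattered exactly when all 2^|A| subsets of A occur as traces."""
    return len(restrict(concepts, subset)) == 2 ** len(set(subset))

def vc_dimension(domain, concepts):
    """Brute-force VC dimension of a finite concept space (exponential time)."""
    domain = list(domain)
    best = 0
    for size in range(1, len(domain) + 1):
        if any(is_shattered(concepts, a) for a in combinations(domain, size)):
            best = size
        else:
            # Subsets of shattered sets are shattered, so no larger set can be.
            break
    return best

# Initial segments {y : y <= x} of the ordered set {1, 2, 3, 4}.
X = [1, 2, 3, 4]
segments = [set(range(1, x + 1)) for x in X]
power_set = [set(s) for r in range(len(X) + 1) for s in combinations(X, r)]
print(vc_dimension(X, segments))   # 1
print(vc_dimension(X, power_set))  # 4
```

The search stops at the first size with no shattered set, which is justified because every subset of a shattered set is itself shattered.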
The following are some elementary or well-known examples of VC dimension, which can be found in most texts on statistical learning.
Let $X$ be any infinite set and $\mathcal{C} = 2^X$. Then clearly $\mathrm{VC}(\mathcal{C}) = \infty$ because every finite $A \subseteq X$ has $\mathcal{C}|_A = 2^A$ and so is shattered.
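Let $X$ be any totally ordered set with at least two elements, and let
\[ \mathcal{C} = \{ I_x : x \in X \} \]
where $I_x = \{ y \in X : y \le x \}$ is an initial segment of $X$. For any $a, b \in X$ where $a \neq b$, without loss of generality $a < b$, we have
\[ I_a \cap \{b\} = \emptyset, \qquad I_b \cap \{b\} = \{b\}, \]
hence $\{b\}$ is shattered, however $\{b\} \notin \mathcal{C}|_{\{a,b\}}$ and so $\{a,b\}$ is not shattered. Therefore $\mathrm{VC}(\mathcal{C}) = 1$.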
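Let $X = \mathbb{R}^2$ and let $\mathcal{C}$ be the class of axis-parallel rectangles. Clearly $\mathcal{C}$ shatters a suitable set of four points, for example $\{(-1,0), (0,1), (1,0), (0,-1)\}$. Now let five distinct points
\[ a, b, c, d, e \in \mathbb{R}^2 \]
be given. Without loss of generality $a$ is the leftmost point, $b$ is the highest point, $c$ is the rightmost point, and $d$ is the lowest point. Since every axis-parallel rectangle containing $a, b, c, d$ also contains $e$, the set $\{a,b,c,d,e\}$ is not shattered. Therefore $\mathrm{VC}(\mathcal{C}) = 4$.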
Unless otherwise specified, from now on we will consider $(X,\mathcal{C})$ to be our concept space, and $d = \mathrm{VC}(\mathcal{C})$.
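[[vapnik:264]] The $n$-th shatter coefficient of $\mathcal{C}$ is defined to be
\[ s(\mathcal{C}, n) = \max \{ |\mathcal{C}|_A| : A \subseteq X, \ |A| = n \}. \]
Note that $s(\mathcal{C}, n) \le 2^n$, with equality if and only if $\mathcal{C}$ shatters some set of size $n$.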
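[Sauer-Shelah Lemma [MR0307902]] Let $d = \mathrm{VC}(\mathcal{C}) < \infty$ and $n \in \mathbb{N}$. Then
\[ s(\mathcal{C}, n) \le \sum_{i=0}^{d} \binom{n}{i}. \]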
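The Sauer-Shelah bound can be compared numerically with the shatter coefficients of a small class. The sketch below assumes the standard form of the bound, $\sum_{i=0}^{d}\binom{n}{i}$, and uses discrete intervals on five points (VC dimension 2, stated here rather than computed) as an illustrative class; the names `shatter_coefficient` and `sauer_bound` are hypothetical.

```python
from itertools import combinations
from math import comb

def shatter_coefficient(domain, concepts, n):
    """Largest number of traces C ∩ A over n-point subsets A (needs n <= |X|)."""
    concepts = [frozenset(c) for c in concepts]
    return max(
        len({c & frozenset(a) for c in concepts})
        for a in combinations(list(domain), n)
    )

def sauer_bound(n, d):
    """Sum of binomial coefficients binom(n, i) for i = 0..d."""
    return sum(comb(n, i) for i in range(d + 1))

# Discrete intervals {a, ..., b} on {1, ..., 5}, together with the empty set.
X = range(1, 6)
intervals = [set()] + [set(range(a, b + 1)) for a in X for b in X if a <= b]

for n in range(len(X) + 1):
    print(n, shatter_coefficient(X, intervals, n), sauer_bound(n, 2))
```

For this particular class the two printed columns agree for every $n$, so the bound is attained; for a general class only the inequality is guaranteed.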
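We can consider $\mathcal{C}$ as a “function class”, that is, a family of $\{0,1\}$-valued functions on $X$: Let
\[ \mathcal{F}_{\mathcal{C}} = \{ \chi_C : C \in \mathcal{C} \} \]
where $\chi_C$ is the indicator function of $C$ on $X$. Similarly, if $\mathcal{F}$ is a family of $\{0,1\}$-valued functions on $X$ we can get a concept class
\[ \mathcal{C}_{\mathcal{F}} = \{ f^{-1}(1) : f \in \mathcal{F} \}. \]
Shattering for a function class is defined as follows: $A \subseteq X$ is shattered by $\mathcal{F}$ if $\{ f|_A : f \in \mathcal{F} \} = \{0,1\}^A$. We can see that $\mathcal{C}$ shatters $A$ iff $\mathcal{F}_{\mathcal{C}}$ shatters $A$, and $\mathcal{F}$ shatters $A$ iff $\mathcal{C}_{\mathcal{F}}$ shatters $A$, so the two notions are equivalent.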
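In the future we will consider concepts as functions, but will still use set relations and operations on concepts, which will have the obvious meaning; for instance $x \in f$ will be the same as $f(x) = 1$, $f \subseteq g$ the same as $\mathrm{support}(f) \subseteq \mathrm{support}(g)$, $f \cap g$ the same as $\chi_{\mathrm{support}(f) \cap \mathrm{support}(g)}$, etc.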
1.2 Maximum and Maximal Classes
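The following definitions are due to [welzl87rangespaces]. Let $d \in \mathbb{N}$. A concept class $\mathcal{C}$ is d-maximum if for every finite $A \subseteq X$,
\[ |\mathcal{C}|_A| = \sum_{i=0}^{d} \binom{|A|}{i}. \]
A concept class $\mathcal{C}$ is d-maximal if $\mathrm{VC}(\mathcal{C}) = d$,
and for any $C \subseteq X$ with $C \notin \mathcal{C}$ we have $\mathrm{VC}(\mathcal{C} \cup \{C\}) > d$.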
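Note that if $\mathcal{C}$ is $d$-maximum, then $\mathrm{VC}(\mathcal{C}) = d$ because for finite $A \subseteq X$, if $|A| \le d$ then
\[ |\mathcal{C}|_A| = \sum_{i=0}^{d} \binom{|A|}{i} = 2^{|A|} \]
so $A$ is shattered, and if $|A| > d$ then
\[ |\mathcal{C}|_A| = \sum_{i=0}^{d} \binom{|A|}{i} < 2^{|A|} \]
so $A$ is not shattered.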
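On a finite domain the $d$-maximum condition can be verified directly by counting traces on every subset. The sketch below assumes the condition $|\mathcal{C}|_A| = \sum_{i=0}^{d}\binom{|A|}{i}$ for all $A \subseteq X$; the helper name `is_d_maximum` and the three-point example are illustrative choices, not taken from the text.

```python
from itertools import combinations
from math import comb

def is_d_maximum(domain, concepts, d):
    """Check that every subset A of a finite domain has exactly
    sum_{i<=d} binom(|A|, i) distinct traces C ∩ A."""
    domain = list(domain)
    concepts = [frozenset(c) for c in concepts]
    for size in range(len(domain) + 1):
        target = sum(comb(size, i) for i in range(d + 1))
        for a in combinations(domain, size):
            a = frozenset(a)
            if len({c & a for c in concepts}) != target:
                return False
    return True

X = [1, 2, 3]
segments = [{1}, {1, 2}, {1, 2, 3}]             # initial segments {y : y <= x}
print(is_d_maximum(X, segments + [set()], 1))   # True
print(is_d_maximum(X, segments, 1))             # False
```

Without the empty set the one-point set $\{1\}$ has only a single trace, which is why the second check fails.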
As a consequence of Zorn’s Lemma every concept class of VC dimension $d$ is contained in a $d$-maximal concept class.
Maximum does not necessarily imply maximal and vice versa. Also note that if $(X,\mathcal{C})$ is $d$-maximum, any subspace of $(X,\mathcal{C})$ is $d$-maximum as well, but this is not necessarily the case for $d$-maximal.
. It is easy to check is -maximal but not -maximum since
. It is easy to check is -maximal but not -maximum since
Let and . For any finite, without loss of generality with , we have that
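thus $\mathcal{C}$ is $1$-maximum. However, $\mathcal{C}$ is not $1$-maximal since there is a set outside $\mathcal{C}$ whose addition leaves the VC dimension equal to $1$. Note that any concept space $(X,\mathcal{C})$ where $X$ is totally ordered with no minimal element, and where $\mathcal{C}$ is the set of all initial segments, is $1$-maximum. This is also the case if $X$ has at least two elements, where $\mathcal{C}$ is the set of all initial segments and the empty set.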
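If $X$ is finite, then $d$-maximum implies $d$-maximal.
If $(X,\mathcal{C})$ is $d$-maximum, then any $\mathcal{C}' \subseteq 2^X$ with $\mathcal{C} \subsetneq \mathcal{C}'$ has
\[ |\mathcal{C}'| > |\mathcal{C}| = \sum_{i=0}^{d} \binom{|X|}{i}, \]
hence by Sauer’s Lemma $\mathrm{VC}(\mathcal{C}') > d$, and therefore $\mathcal{C}$ is $d$-maximal.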
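[[welzl87rangespaces]] Let $(X,\mathcal{C})$ be finite with VC-dimension $d$. For $x \in X$, there are at most $\sum_{i=0}^{d-1} \binom{|X|-1}{i}$ sets $C \in \mathcal{C}$ such that $x \notin C$ and $C \cup \{x\} \in \mathcal{C}$.
Let $Y = X \setminus \{x\}$, and
\[ \mathcal{C}^x = \{ C \in \mathcal{C} : x \notin C, \ C \cup \{x\} \in \mathcal{C} \}. \]
Suppose $|\mathcal{C}^x| > \sum_{i=0}^{d-1} \binom{|X|-1}{i}$; thus by Sauer’s Lemma $\mathrm{VC}(Y, \mathcal{C}^x) \ge d$. Let $x_1, \dots, x_d$ be points in $Y$ shattered by $\mathcal{C}^x$, and let $A = \{x_1, \dots, x_d, x\}$. Now by the definition of $\mathcal{C}^x$, for each $C \in \mathcal{C}^x$ there is $C' \in \mathcal{C}$ such that $C' = C \cup \{x\}$, hence
\[ \mathcal{C}|_A = 2^A, \]
contradicting $\mathrm{VC}(\mathcal{C}) = d$. ∎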
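[[welzl87rangespaces]] Let $(X,\mathcal{C})$ be finite with VC-dimension $d$. The concept space $(X,\mathcal{C})$ is $d$-maximum if and only if
\[ |\mathcal{C}| = \sum_{i=0}^{d} \binom{|X|}{i}. \]
If $(X,\mathcal{C})$ is $d$-maximum then by the definition
\[ |\mathcal{C}| = |\mathcal{C}|_X| = \sum_{i=0}^{d} \binom{|X|}{i}. \]
For the converse, we will use induction on $|X|$.
If $|X| = d$, then $\mathcal{C} = 2^X$ is $d$-maximum and
\[ |\mathcal{C}| = 2^d = \sum_{i=0}^{d} \binom{d}{i}. \]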
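Assume the statement of the theorem is true for all $(Y,\mathcal{D})$ where $|Y| = n-1 \ge d$, and let $(X,\mathcal{C})$ have $|X| = n$ and $|\mathcal{C}| = \sum_{i=0}^{d} \binom{n}{i}$. Let $x \in X$ and let $Y = X \setminus \{x\}$. By the induction hypothesis, it suffices to show that
\[ |\mathcal{C}|_Y| = \sum_{i=0}^{d} \binom{n-1}{i}. \]
By lemma 1.2.7,
\[ \mathcal{C}^x = \{ C \in \mathcal{C} : x \notin C, \ C \cup \{x\} \in \mathcal{C} \} \]
has size at most $\sum_{i=0}^{d-1} \binom{n-1}{i}$. Define
\[ \varphi : \mathcal{C} \setminus \mathcal{C}^x \to \mathcal{C}|_Y, \qquad \varphi(C) = C \setminus \{x\}. \]
We will show $\varphi$ is injective. Suppose there is $C_1 \neq C_2$ in $\mathcal{C} \setminus \mathcal{C}^x$ such that
\[ \varphi(C_1) = \varphi(C_2). \]
If $x \in C_1 \cap C_2$, then
\[ C_1 = \varphi(C_1) \cup \{x\} = \varphi(C_2) \cup \{x\} = C_2, \]
and if $x \notin C_1 \cup C_2$, then
\[ C_1 = \varphi(C_1) = \varphi(C_2) = C_2, \]
so without loss of generality $x \in C_2 \setminus C_1$. We get that
\[ C_1 \cup \{x\} = C_2 \in \mathcal{C}, \]
hence $C_1 \in \mathcal{C}^x$, a contradiction, therefore $\varphi$ is injective. Finally,
\[ |\mathcal{C}|_Y| \ge |\mathcal{C}| - |\mathcal{C}^x| \ge \sum_{i=0}^{d} \binom{n}{i} - \sum_{i=0}^{d-1} \binom{n-1}{i} = \sum_{i=0}^{d} \binom{n-1}{i}, \]
and the reverse inequality holds by Sauer’s Lemma.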
1.3 Concepts as Relations
In this section we will look at concept spaces defined as a relation on a pair of sets. This will allow us to characterize useful notions of embeddings for concept spaces as found in [Ben-david98combinatorialvariability]. It will also allow us to define the dual concept space of a concept space.
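We can define a concept class $\mathcal{C}$ on a domain $X$ via a relation $R \subseteq X \times Y$ for some set $Y$, by $\mathcal{C} = \{ R_y : y \in Y \}$ where $R_y = \{ x \in X : (x,y) \in R \}$. Similarly given $(X,\mathcal{C})$, the corresponding space in the form $(X,Y,R)$ is $(X, \mathcal{C}, \in)$. A subclass of $(X,Y,R)$ is $(X',Y',R')$ where $X' \subseteq X$, $Y' \subseteq Y$, and $R' = R \cap (X' \times Y')$. This is convenient for defining the idea of a dual to a concept space as follows: Given a concept space $(X,Y,R)$, the dual concept space of $(X,Y,R)$, denoted $(X,Y,R)^*$, is
\[ (Y, X, R^{-1}), \qquad R^{-1} = \{ (y,x) : (x,y) \in R \}. \]
The dual concept space of a space represented as $(X,\mathcal{C})$ can be thought of as
\[ \big( \mathcal{C}, \ \{ \{ C \in \mathcal{C} : x \in C \} : x \in X \} \big). \]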
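The relational view makes the dual easy to compute for small examples: swapping the two coordinates of the relation exchanges the roles of points and concepts. The following is a minimal sketch, assuming the relation is stored as a set of pairs; the names `concept_class` and `dual` and the toy relation are illustrative only.

```python
def concept_class(X, Y, R):
    """Concepts R_y = {x in X : (x, y) in R}, indexed by y in Y."""
    return {y: frozenset(x for x in X if (x, y) in R) for y in Y}

def dual(X, Y, R):
    """Swap the roles of points and concepts: (X, Y, R) -> (Y, X, R^{-1})."""
    return Y, X, {(y, x) for (x, y) in R}

X = {1, 2, 3}
Y = {"a", "b"}
R = {(1, "a"), (2, "a"), (2, "b"), (3, "b")}

primal = concept_class(X, Y, R)            # concepts are subsets of X
dual_class = concept_class(*dual(X, Y, R))  # concepts are subsets of Y
print(primal)
print(dual_class)
```

In the dual, each point $x$ becomes the concept consisting of the labels $y$ whose concept contains $x$; applying `dual` twice returns the original triple.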
[[Ben-david98combinatorialvariability]] Let , be concept spaces. An embedding from to is a function such that for every
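A generalized embedding from a concept space $(X,\mathcal{C})$ to a concept space $(X',\mathcal{C}')$ is a function $\varphi : X \to X'$ and a function $\psi : \mathcal{C} \to \mathcal{C}'$ such that for every $x \in X$ and $C \in \mathcal{C}$,
\[ x \in C \iff \varphi(x) \in \psi(C). \]
$(X,\mathcal{C})$ is weakly (generalized) embeddable in $(X',\mathcal{C}')$ if every finite subclass of $(X,\mathcal{C})$ is (generalized) embeddable in $(X',\mathcal{C}')$.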
The above notions partially order any set of concept spaces; if there exists an embedding or generalized embedding from to , we will denote that
If is weakly embeddable in , or weakly generalized embeddable in , we will denote that
Let us say that and are bi-embeddable if and .
A concept space may have some redundant points in as far as is concerned, but we can reduce it to its essential information by setting:
separates the points of and is bi-embeddable to via the quotient map for , and mapping each equivalence class to its (choose any) representative for
In the proof of the next proposition and throughout the further text we use the notation $\Delta$ for the symmetric difference of two sets; i.e.
\[ A \,\Delta\, B = (A \setminus B) \cup (B \setminus A). \]
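[[Ben-david98combinatorialvariability]] If $(X,\mathcal{C})$ is weakly generalized embeddable in $(X',\mathcal{C}')$ then $\mathrm{VC}(\mathcal{C}) \le \mathrm{VC}(\mathcal{C}')$.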
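Let $A$ be a finite subset of $X$ that is shattered, let
\[ \mathcal{D} = \mathcal{C}|_A = 2^A, \]
and let $\varphi$, $\psi$ be the generalized embedding from $(A,\mathcal{D})$ into $(X',\mathcal{C}')$. $\psi$ is injective because for $C_1 \neq C_2$ in $\mathcal{D}$, there exists $x \in C_1 \,\Delta\, C_2$. Without loss of generality $x \in C_1 \setminus C_2$. We have:
\[ \varphi(x) \in \psi(C_1), \qquad \varphi(x) \notin \psi(C_2), \]
and so $\psi(C_1) \neq \psi(C_2)$. The same argument applied to $\{x\} \in \mathcal{D}$ and a point $y \neq x$ of $A$ shows that $\varphi$ is injective, hence
\[ \mathcal{C}'|_{\varphi(A)} \supseteq \{ \psi(C) \cap \varphi(A) : C \in \mathcal{D} \} = \{ \varphi(C) : C \subseteq A \} = 2^{\varphi(A)}, \]
and therefore $\varphi(A)$, which has the same cardinality as $A$, is shattered in $\mathcal{C}'$. ∎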
[[Laskowski92vapnik-chervonenkisclasses]] For any class :
Since , it suffices to show the first inequality. Let be a set of cardinality . One has via . Noting that is embeddable in any class of the same or greater VC-dimension, , and thus . Therefore and so . ∎
$\mathrm{VC}(\mathcal{C}) < \infty$ if and only if $\mathrm{VC}(\mathcal{C}^*) < \infty$.
2.1 Introduction of Sample Compression Schemes
Sample compression schemes, introduced by Littlestone and Warmuth ([Littlestone86relatingdata]), are naturally arising algorithms which learn concepts by saving finite samples of concepts to subsets of size at most $d$.
The following notations will be used in the definitions of sample compression schemes, and throughout the text.
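\[ \mathcal{C}|_A = \{ f|_A : f \in \mathcal{C} \} \]
where $A \subseteq X$ and $f|_A$ is the function $f$ restricted to the domain $A$, and let
\[ \mathcal{C}|_{\mathrm{fin}} = \bigcup \{ \mathcal{C}|_A : A \subseteq X \text{ finite} \}. \]
We can similarly define
\[ [X]^{\le d} = \{ \sigma \subseteq X : |\sigma| \le d \}, \qquad \mathcal{C}|_{\le d} = \bigcup \{ \mathcal{C}|_\sigma : \sigma \in [X]^{\le d} \}. \]
For two functions $f$, $g$ with $\mathrm{dom}(f) \subseteq \mathrm{dom}(g)$, let
\[ f \sqsubseteq g \iff g|_{\mathrm{dom}(f)} = f \]
be the notation for $g$ extending $f$.
For $d \in \mathbb{N}$, an unlabelled sample compression scheme of size d on $(X,\mathcal{C})$ is a function
\[ \mathcal{H} : [X]^{\le d} \to \{0,1\}^X \]
with the property that
\[ \forall f \in \mathcal{C}|_{\mathrm{fin}} \ \ \exists \sigma \in [\mathrm{dom}(f)]^{\le d} : \ f \sqsubseteq \mathcal{H}(\sigma). \]
A labelled sample compression scheme of size d on $(X,\mathcal{C})$ is a function
\[ \mathcal{H} : \mathcal{C}|_{\le d} \to \{0,1\}^X \]
with the property that
\[ \forall f \in \mathcal{C}|_{\mathrm{fin}} \ \ \exists g \in \mathcal{C}|_{\le d} : \ g \sqsubseteq f \sqsubseteq \mathcal{H}(g). \]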
We will call the range of a sample compression scheme the hypothesis class and denote it by .
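Let $X$ be any totally ordered set, and let $\mathcal{C}$ be the set of all initial segments of $X$. Defining
\[ \mathcal{H}_1(\emptyset) = \chi_{\emptyset}, \quad \mathcal{H}_1(\{x\}) = \chi_{\{y : y \le x\}}, \qquad \mathcal{H}_2(\emptyset) = \chi_{X}, \quad \mathcal{H}_2(\{x\}) = \chi_{\{y : y < x\}}, \]
we will show $\mathcal{H}_1$ and $\mathcal{H}_2$ are unlabelled sample compression schemes of size $1$ on $(X,\mathcal{C})$. Given a sample $f$ (the restriction of a concept to a finite subset of $X$), if $f \equiv 0$ on its domain then $\emptyset \subseteq \mathrm{dom}(f)$ and $\mathcal{H}_1(\emptyset)$ extends $f$. Otherwise $x = \max f^{-1}(1)$ exists, and so
\[ \{x\} \subseteq \mathrm{dom}(f) \quad \text{and} \quad \mathcal{H}_1(\{x\}) \text{ extends } f. \]
Thus $\mathcal{H}_1$ is a sample compression scheme of size $1$ on $(X,\mathcal{C})$.
Similarly for $\mathcal{H}_2$, if $f \equiv 1$ on its domain then $\emptyset \subseteq \mathrm{dom}(f)$ and $\mathcal{H}_2(\emptyset)$ extends $f$. Otherwise $x = \min f^{-1}(0)$ exists, and so
\[ \{x\} \subseteq \mathrm{dom}(f) \quad \text{and} \quad \mathcal{H}_2(\{x\}) \text{ extends } f. \]
Therefore $\mathcal{H}_2$ is also a sample compression scheme of size $1$ on $(X,\mathcal{C})$.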
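The first of the two schemes can be exercised on a small ordered set. The sketch below is an illustration under stated assumptions: concepts are taken to be the closed initial segments $\{y : y \le z\}$ of $\{1,\dots,5\}$, a sample is a dictionary from points to labels, and the names `compress`, `reconstruct` and `is_recovered` are hypothetical. The sample is compressed to its largest positively labelled point and reconstructed as the initial segment below that point.

```python
from itertools import combinations

def compress(sample):
    """Keep only the largest positively labelled point (or nothing at all)."""
    positives = [x for x, label in sample.items() if label == 1]
    return frozenset([max(positives)]) if positives else frozenset()

def reconstruct(sigma):
    """Hypothesis for a compressed set: all-zero for {}, else y -> [y <= x]."""
    if not sigma:
        return lambda y: 0
    (x,) = sigma
    return lambda y: 1 if y <= x else 0

def is_recovered(sample):
    """True when the reconstructed hypothesis agrees with every label."""
    hypothesis = reconstruct(compress(sample))
    return all(hypothesis(x) == label for x, label in sample.items())

# Every sample of every initial segment {y : y <= z} on {1, ..., 5}.
X = range(1, 6)
samples = [
    {a: int(a <= z) for a in A}
    for z in X
    for r in range(1, len(X) + 1)
    for A in combinations(X, r)
]
print(all(is_recovered(s) for s in samples))  # True
print(is_recovered({1: 0, 2: 1}))             # False
```

The last line uses a labelling that no initial segment produces, so the scheme is not required to recover it, and it does not.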
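If $(X,\mathcal{C})$ has an unlabelled compression scheme of size $d$, then $(X,\mathcal{C})$ has a labelled compression scheme of size $d$.
Let $(X,\mathcal{C})$ have an unlabelled compression scheme $\mathcal{H}$ of size $d$. For every finite sample $f$ there is $\sigma \subseteq \mathrm{dom}(f)$ with $|\sigma| \le d$ such that $\mathcal{H}(\sigma)$ extends $f$, and so any function $\mathcal{H}'$ where $\mathcal{H}'(g) = \mathcal{H}(\mathrm{dom}(g))$ for every labelled sample $g$ of size at most $d$ will be a labelled compression scheme of size $d$. ∎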
From now on we will only be dealing with unlabelled sample compression schemes unless otherwise mentioned.
[[Ben-david98combinatorialvariability]] If and has a (labelled or unlabelled) sample compression scheme of size , then also has a sample compression scheme of size and of the same type. If has a sample compression scheme of size , then every subspace has a sample compression scheme of size .
2.2 Compactness Theorem
[Compactness Theorem, Ben-David and Litman [Ben-david98combinatorialvariability]] A concept space $(X,\mathcal{C})$ has a sample compression scheme of size $d$ if and only if every finite subspace of $(X,\mathcal{C})$ has a sample compression scheme of size $d$.
The compactness theorem is true for both types of sample compression schemes and similarly for all forms of extended sample compression schemes given in a following section. We will provide the proof of the theorem for unlabelled sample compression schemes. The proof we provide is simpler and more direct than the proof in [Ben-david98combinatorialvariability] which is based on the Compactness Theorem of Predicate Logic. We use an approach with ultralimits, normally used in Analysis. (For preliminary information on filters and ultrafilters, see appendix A.2)
Necessity: By corollary 2.1.7, if $(X,\mathcal{C})$ has a sample compression scheme of size $d$ then every (finite) subspace of $(X,\mathcal{C})$ has a sample compression scheme of size $d$.
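Sufficiency: For every finite $A \subseteq X$, denote the sample compression scheme of size $d$ for the finite subspace $(A, \mathcal{C}|_A)$ as $\mathcal{H}_A$. Let $\mathcal{U}$ be an ultrafilter on the set of finite subsets of $X$ containing the filter base
\[ \big\{ \{ B \subseteq X \text{ finite} : A \subseteq B \} : A \subseteq X \text{ finite} \big\}, \]
and for $\sigma \subseteq X$ with $|\sigma| \le d$ and $x \in X$ define
\[ \mathcal{H}(\sigma)(x) = \lim_{B \to \mathcal{U}} \mathcal{H}_B(\sigma)(x). \]
Note for given $\sigma$ and $x$, $\mathcal{H}(\sigma)(x)$ is defined as the ultralimit of the net of zeros and ones $\big(\mathcal{H}_B(\sigma)(x)\big)_{B \supseteq \sigma \cup \{x\}}$ along $\mathcal{U}$.
We will show $\mathcal{H}$ is a sample compression scheme of size $d$ on $(X,\mathcal{C})$. Let $f$ be a finite sample, and denote $A = \mathrm{dom}(f)$. Note that
\[ \{ B \subseteq X \text{ finite} : A \subseteq B \} \in \mathcal{U}. \tag{1} \]
We have that the set of subsets of $A$ of size at most $d$ is finite so let it be $\{\sigma_1, \dots, \sigma_k\}$. For $1 \le i \le k$ letting
\[ U_i = \{ B \subseteq X \text{ finite} : A \subseteq B \text{ and } \mathcal{H}_B(\sigma_i) \text{ extends } f \}, \]
by (1) we see that
\[ \bigcup_{i=1}^{k} U_i = \{ B \subseteq X \text{ finite} : A \subseteq B \} \in \mathcal{U}, \]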
thus, by a property of ultrafilters, such that . Let and let