## 1. Introduction

The Vapnik-Chervonenkis dimension (VC-dimension for short) is an important combinatorial concept with applications in several fields, such as discrepancy theory (see for example [Matousek1999], [Matousek1993], and [Hinrichs2004]), dispersion (see [Rudolf2018]), measure theory (see [Dudley1984], [Dudley2014], or [vdVaart1996]), and machine learning (see [Mohri2018] or [Shalev2014]). The last of these has become increasingly popular in recent years and is one of the foundation stones of artificial intelligence.

The VC-dimension was introduced in [Vapnik1971] and can be used to measure the ability of a model to classify datasets. (We also mention here the related problem of sample compression, which is treated for example in [Devroye1996], [Doliwa2014], [Moran2017], [Moran2016], and [Vapnik1998].) Consider a dataset of points, each of which can be labeled with one of two labels (usually named *positive* and *negative*), together with a *hypothesis set*. The hypothesis set is said to shatter the point set if, for any labeling of the points, there is a hypothesis that separates the negative points from the positive points. The maximal number of points that can be shattered by a given hypothesis set is called the Vapnik-Chervonenkis dimension of the hypothesis set (a more formal definition can be found in Section 2).

Many classical examples of exactly computed VC-dimensions on the space $\mathbb{R}^d$ are known, whereas for these examples it is equivalent to consider . It was already established in [Vapnik1971] that the VC-dimension of half-spaces in dimension $d$ is $d+1$. As a direct consequence, one finds that the VC-dimension of spheres in dimension $d$ is also $d+1$. Further examples are axis-parallel boxes in $\mathbb{R}^d$, either anchored at the origin or not, with corresponding VC-dimensions $d$ and $2d$, respectively. Furthermore, a recent result by Despres [Despres] shows that the VC-dimension of cubes in $\mathbb{R}^d$ is $\lfloor (3d+1)/2 \rfloor$.

We denote by (resp. the set of -dimensional axis-parallel boxes (resp. cubes) within the -dimensional torus, see Section 2 for precise definitions. With this we can state our main result. We are mainly interested in the case of axis-parallel boxes, but since the exact same method also works for cubes, we treat them as well.

###### Theorem 1.1.

For sufficiently large we have

This shows in particular that .

###### Remark.

For small dimensions, we know the exact values , , , and the lower bound which were determined using computer assistance (personal communication with Manfred Scheucher, TU Berlin).

This result is interesting for a few reasons.
First, the VC-dimension does not grow linearly in the dimension, in contrast to the (very similar looking) examples stated above.
Second, there is an essential difference between the VC-dimensions of boxes and cubes in $\mathbb{R}^d$, which seems to vanish on the $d$-dimensional torus.
Lastly, the example of boxes in can also be considered as the direct product of times the interval . For such products, a few general lower bounds and some upper bounds have been established. For example, [Dudley2014], p. 192–200, contains results which imply the VC-dimension for boxes. Few general lower bounds are known for different hypothesis sets, and they usually require strong assumptions on the sets. Some interesting upper bounds are known besides the ones in the aforementioned sources, such as the one given by van der Vaart and Wellner in [vdVaart2009], which involves the so-called *entropy* of the sets in question.
The shortage of such general bounds makes our result even more interesting, since axis-parallel boxes on the torus do not fulfill any of the assumptions usually used to calculate the VC-dimension of such hypothesis sets.

The paper is structured as follows. In Section 2 we give some basic definitions and discuss the VC-dimension of a very simple subclass of called stripes. We give an upper bound for in Section 3, which relies solely on quite precise counting. The heart of the paper is Section 4, where we establish the lower bound for . This relies on a sophisticated scheme which allows us to build configurations of points (based on the result for stripes) that can be shattered by .

## 2. Basic definitions and stripes on the torus

### 2.1. Definitions

First we give some basic definitions that are needed throughout the paper. For two functions, and , where only takes strictly positive real values such that is bounded, we write or . If we write and if we write .

Furthermore, we give a formal definition of the VC-dimension.

###### Definition 2.1.

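Let $\mathcal{H}$ be a set family (a set of sets) and $P$ a set. Their intersection is defined as the following set family:

$$\mathcal{H} \cap P := \{\, h \cap P : h \in \mathcal{H} \,\}.$$

We say that a set $P$ is *shattered* by $\mathcal{H}$ if $\mathcal{H} \cap P$ contains all the subsets of $P$, i.e.:

$$|\mathcal{H} \cap P| = 2^{|P|}.$$

The *VC-dimension of $\mathcal{H}$*, which we denote by $\operatorname{VC\text{-}dim}(\mathcal{H})$, is the largest integer $n$ such that there exists a set $P$ with cardinality $n$ that is shattered by $\mathcal{H}$.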

We are working on the torus , i.e. the interval where we identify with .
Thus, an interval takes either the form for or for .
We define open intervals analogously.
We can also assign each interval a *length* which is given by and corresponding to the cases described above.
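To make the notions of shattering and of intervals on the torus concrete, the following minimal Python sketch checks by brute force whether a finite point set on the one-dimensional torus is shattered by intervals. It rests on an assumption that is not spelled out above: an interval meets a finite set of distinct points in a run of cyclically consecutive points (possibly empty, possibly everything), and every such run arises from some interval. The function names are purely illustrative.

```python
def interval_traces(points):
    """Return every subset of `points` that an interval of the torus can cut out.

    `points` are distinct numbers in [0, 1); only their cyclic order matters.
    """
    ordered = sorted(points)
    n = len(ordered)
    traces = {frozenset(), frozenset(ordered)}  # the empty set and the whole set
    for start in range(n):
        for length in range(1, n):
            # A run of `length` cyclically consecutive points starting at `start`.
            run = (ordered[(start + k) % n] for k in range(length))
            traces.add(frozenset(run))
    return traces


def is_shattered_by_intervals(points):
    """A set is shattered when all 2^n of its subsets appear among the traces."""
    return len(interval_traces(points)) == 2 ** len(points)


print(is_shattered_by_intervals([0.1, 0.4, 0.7]))       # three points
print(is_shattered_by_intervals([0.1, 0.3, 0.6, 0.8]))  # four points
```

Under these assumptions the first call reports `True` and the second `False`: every subset of three points is a cyclic run, whereas two non-adjacent points out of four cannot be cut out without also catching a third one.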

This allows us to define the set of *-dimensional axis-parallel boxes*, within the -dimensional torus, as the product of intervals on the torus i.e. for all .
Furthermore, we define the subset of *-dimensional axis-parallel cubes* , as axis-parallel boxes where the intervals have the same length.

It is clear that shattering a set is equivalent to shattering the same set , where denotes the element-wise complement of .

Thus, we can also work with the element-wise complement of , whose elements are of the form

We are especially interested in the case where for all .

###### Definition 2.2.

We call a *-dimensional stripe anchored in dimension * if .
Furthermore, we denote by the set of all -dimensional stripes (anchored in some dimension ).
Moreover, we define its *length* to be and denote by the set of all -dimensional stripes of length .

###### Remark.

We note that it is equivalent for stripes to be defined on or as .

Note that in particular the complement of contains the union of -dimensional stripes anchored in different dimensions. Moreover, the complement of contains the union of stripes anchored in different dimensions with equal length.

### 2.2. The special subclass

This part is devoted to estimating .

###### Proposition 2.3.

We have for any and sufficiently large ,

This shows in particular that .

###### Proof.

To show the first inequality, it is sufficient to construct a set of points in dimension that are shattered by . To this end, we pair the subsets of as . There are such pairs. Now we choose the coordinates in dimension such that the points belonging to have coordinates equal to and the points belonging to have coordinates equal to . These coordinates have distance , so that any interval of length can contain at most one of them.

Thus, for any we can construct some such that . Indeed, write or . In either case, we can choose a stripe anchored in dimension with length that contains either or and avoids the other.

It remains to show the upper bound. To this end, we count the number of ways in which points can be separated by -dimensional stripes. Let us now take points in . The coordinates of these points in dimension are cyclically ordered, so a stripe anchored in that dimension can only cut out points that are consecutive in this order. Therefore, there are at most ways that a stripe anchored in dimension can separate points. Thus, in total there are at most ways to separate points. Assuming that shatters points in dimension therefore gives

This gives in particular for ,

which obviously does not hold for large enough . This shows that for large enough we cannot shatter points. ∎
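The counting argument in the proof above can be explored numerically. The sketch below assumes, as an illustration rather than as the exact bound used in the proof, that a stripe anchored in one fixed dimension cuts out only cyclically consecutive runs of $n$ points, which gives at most $n(n-1)+2$ subsets per dimension and hence at most $d\,(n(n-1)+2)$ subsets in total, while shattering $n$ points requires $2^n$ subsets. The function name and the precise form of the count are assumptions made for this example.

```python
def counting_ceiling(d):
    """Largest n with 2**n <= d * (n * (n - 1) + 2).

    Beyond this n, stripes anchored in d dimensions cannot produce all 2**n
    subsets, so no larger point set can be shattered by stripes.
    """
    n = 0
    # Test the inequality for n + 1; (n + 1) * n equals (n + 1) * ((n + 1) - 1).
    while 2 ** (n + 1) <= d * ((n + 1) * n + 2):
        n += 1
    return n


for d in (1, 2, 16, 1024):
    print(d, counting_ceiling(d))
```

The resulting ceiling grows only logarithmically in `d`, which matches the qualitative behaviour that the counting argument is meant to capture.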

## 3. Upper bounds for

Similar computations also give surprisingly accurate upper bounds for . We first give an easy version which is computed almost identically to the upper bound for . Basically the same upper bound was also given by van der Vaart and Wellner in [vdVaart2009].

###### Lemma 3.1.

We have for sufficiently large ,

###### Proof.

We note that, similarly to the upper bound for , there are at most different ways to intersect points when one only looks at one dimension. Therefore, there are in total at most different ways to intersect points in dimensions, i.e.

(1) |

Thus, assuming that shatters points, we have

Now if one puts this gives

which does not hold for large enough . This shows that for large enough we cannot shatter points. ∎

The main over-simplification in the proof above originates from considering all the coordinates independently. We now give a more involved, but also more accurate, estimate.

###### Theorem 3.2.

For sufficiently large we have

###### Remark.

The factor is not optimized and could for example be replaced by for any or equivalently by , i.e. letting depend on .

###### Proof.

We fix a set of points in . Given , we can write . We define restricted versions of via

We have in particular .

This allows us to define via

A simple computation gives .

Now we want to count how many different ways some can separate points of for some given . We start by fixing some where each and (the case where some will be dealt with separately).

We start by proving

To see this inequality we look again at the coordinates of the points in in dimension . These are again cyclically ordered, so that there are at most different ways of separating consecutive points. Furthermore, we denote by

We find by the discussion above that and for any we have .

Next we find as ,

We find by the same reasoning as above that for any we have

This shows in total that

An inductive argument proves (since ),

We find in total

We already have a good upper bound for the case where for all and we compute this one explicitly,

We see that the innermost sum is actually

in reverse order. One can bound this sum from above by an integral

(2) |

Iterating this procedure gives in total

where denotes the double factorial.

Now we discuss the case where at least one of the . Suppose that there are indices such that . One sees that this is equivalent to changing and ignoring the coordinates , which forces the remaining to fulfill . This gives now the total estimate

Assuming , we find

(3) |

Thus, we have improved on (1) by more than a factor . We rewrite this expression and use Stirling’s formula to find

Thus, assuming that shatters points, we have

Now if one puts this gives

or equivalently

which obviously does not hold for large enough . This shows that for large enough we cannot shatter points. ∎

## 4. A lower bound for

To find a lower bound, one naturally aims to construct some configuration of points that can be shattered by . The main idea is to start with a set of points that can be shattered by for some and map them multiple times into such that they can be shattered by . To this end, we will use the following concept.

### 4.1. Extraction Property

###### Definition 4.1.

We say a matrix with entries in has the *-extraction property* if for any there exist which are pairwise different, such that for all .

###### Example.

We show that the following matrix has the -extraction property. is given as follows,

We see that appears in every row exactly once and always in different columns. Let us now take any word . We can choose for each with . The remaining can simply be chosen by picking, for each row that contains , the first column that has not yet been used.
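For small matrices, the extraction property can be tested by brute force. The sketch below assumes the following reading of Definition 4.1: the matrix has one row per letter of the word, and for every word over the alphabet there must be pairwise different column indices, one per row, such that the entry picked in row $i$ equals the $i$-th letter of the word. The function names, the 0-based symbols, and the use of an augmenting-path matching routine are choices made for this illustration only.

```python
from itertools import product


def has_distinct_representatives(candidate_sets):
    """Augmenting-path matching: can each row get its own column from its candidates?"""
    owner = {}  # column index -> row currently using that column

    def try_assign(row, seen):
        for col in candidate_sets[row]:
            if col in seen:
                continue
            seen.add(col)
            # Take a free column, or evict the current owner if it can move elsewhere.
            if col not in owner or try_assign(owner[col], seen):
                owner[col] = row
                return True
        return False

    return all(try_assign(row, set()) for row in range(len(candidate_sets)))


def has_extraction_property(matrix, alphabet):
    """Check every word over `alphabet` with one letter per row of `matrix`."""
    rows = len(matrix)
    for word in product(alphabet, repeat=rows):
        candidates = [
            [j for j, entry in enumerate(matrix[i]) if entry == word[i]]
            for i in range(rows)
        ]
        if not has_distinct_representatives(candidates):
            return False
    return True


print(has_extraction_property([[0, 1], [0, 1]], [0, 1]))
print(has_extraction_property([[0, 1, 0, 1], [0, 1, 0, 1]], [0, 1]))
```

In the first matrix the word with both letters equal to `0` forces both rows to use the first column, so the check fails; in the second matrix every symbol is available in two columns of each row, so the check succeeds. The enumeration is exponential in the number of rows and is only meant for experiments on tiny examples.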

Now we show how the -extraction property can be used to find lower bounds for . In particular, it allows us to transfer lower bounds of to lower bounds for . This will be the main tool for the proof of our lower bounds of .

###### Proposition 4.2.

Suppose there exists a matrix with the -extraction property. Then for any ,

(4) |

###### Remark.

The construction ensures that we only need cubes with length or equivalently stripes with length .

###### Proof.

Let us denote and pick a set of points that is shattered by . We write . Furthermore, we have a matrix with entries in fulfilling the -extraction property. This allows us to define new points , where . We write them as . We set for ,

(5) |

where is the unique integer such that .

We see in particular that for all . Thus, the points are grouped into sets of size , all of which are contained in .

It remains to show that we can shatter these points by . As already mentioned earlier, we work with the complement of instead, which contains the set of unions of stripes of equal length.

We now take an arbitrary subset of . Our goal is to find a union of stripes of length such that its intersection with is .

We define for each a set and . As we assumed that we can shatter by , there exists a -dimensional stripe of length anchored in some dimension such that its intersection with is . We denote it by . Finally, we define for .

As has the -extraction property, we find some pairwise different such that for all . Now for a given we define a stripe anchored in dimension by . It is clear that this stripe has length and we claim that its intersection with is .

Indeed, it cannot contain any points of the form for as they belong to which is disjoint from which contains the stripe. Thus, we only need to consider for . By the definition of the stripe, we are only interested in the -th coordinate. We have by definition

where is given by . By the extraction choice we have that . This shows that . We have in total that if and only if , which is by definition the case if and only if or equivalently .

Thus, as , we can cover with stripes with length anchored in pairwise different dimensions. Now it just remains to choose the stripes for the remaining dimensions which contain no points. The second inequality is simply an application of Theorem 3.2. ∎

### 4.2. An equivalent definition for the extraction property and a random construction

This equivalent definition will be particularly useful for the generation of matrices with the extraction property.

###### Proposition 4.3.

A matrix with entries in does not satisfy the -extraction property if and only if there exist , and of cardinality such that for all there is such that for all if then .

###### Proof.

The matrix does not have the -extraction property if and only if there exists some , such that there exist no pairwise different such that for all .

We now assign to each a set .
With this notation, we have the equivalent statement that there exists no injective function such that .
By Hall’s Marriage Theorem, this is equivalent to a violation of the *marriage condition*, which states that for any we have .

If fails the marriage condition, then with , we have , hence we can pick of cardinality . Then satisfies the required condition of the proposition. Conversely, if satisfies the condition, then fails the marriage condition. ∎
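The characterization in Proposition 4.3 suggests searching directly for a witness of failure. The brute-force sketch below assumes the following reading of the proposition: a witness consists of a set of rows and a set of columns with exactly one element fewer, such that in each chosen row some symbol occurs only at the chosen columns. The function name and the exhaustive search strategy are illustrative and only practical for very small matrices.

```python
from itertools import combinations


def find_failure_witness(matrix, alphabet):
    """Return (rows, cols) witnessing a failure, or None if no witness exists."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            for cols in combinations(range(n_cols), size - 1):
                allowed = set(cols)
                # Every chosen row needs a symbol confined to the chosen columns.
                if all(
                    any(
                        all(j in allowed
                            for j, entry in enumerate(matrix[i]) if entry == s)
                        for s in alphabet
                    )
                    for i in rows
                ):
                    return rows, cols
    return None


print(find_failure_witness([[0, 1], [0, 1]], [0, 1]))
print(find_failure_witness([[0, 1, 0, 1], [0, 1, 0, 1]], [0, 1]))
```

For the first matrix the search returns both rows together with the first column, since the symbol `0` is confined to that column in each row. For the second matrix no witness exists and the function returns `None`.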

###### Remark.

If satisfies the property of Proposition 4.3 then we say that *witness the failure of the -extraction property*.

It remains to find good matrices with the -extraction property, i.e. for given we want to be as large as possible. The next lemma gives a non-constructive argument ensuring the existence of matrices with the -extraction property.

###### Lemma 4.4.

Let with . Let and be integers. Assume that is an integer, and

(6) |

Set , and . There exists a matrix with the -extraction property.

###### Proof.

Set . We say that a word of length is *balanced* if each symbol of appears exactly times. We call a matrix *balanced* if each row corresponds to a balanced word. The number of balanced words of length is

(7) |

Therefore the number of balanced matrices is

(8) |
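The count of balanced words is a multinomial coefficient, and the count of balanced matrices is a power of it because the rows are chosen independently. As a sanity check, the sketch below compares the closed form with a brute-force enumeration for small parameters. The parameter names `num_symbols`, `repeats` (each symbol appears exactly `repeats` times, so the word length is `num_symbols * repeats`), and `num_rows` are introduced here for illustration and are not the notation of the text.

```python
from itertools import product
from math import factorial


def balanced_words_formula(num_symbols, repeats):
    """Multinomial coefficient (num_symbols * repeats)! / (repeats!) ** num_symbols."""
    return factorial(num_symbols * repeats) // factorial(repeats) ** num_symbols


def balanced_words_brute_force(num_symbols, repeats):
    """Enumerate all words and count those using every symbol `repeats` times."""
    length = num_symbols * repeats
    return sum(
        all(word.count(s) == repeats for s in range(num_symbols))
        for word in product(range(num_symbols), repeat=length)
    )


def balanced_matrices(num_rows, num_symbols, repeats):
    """Each of the `num_rows` rows is an independent balanced word."""
    return balanced_words_formula(num_symbols, repeats) ** num_rows


print(balanced_words_formula(3, 2), balanced_words_brute_force(3, 2))
```

Both counts agree for the small cases that can be enumerated, which is a useful check before the closed form is used in the estimates that follow.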

We shall compute an upper bound for the number of balanced matrices not having the -extraction property, by looking at the ones failing the equivalent property seen in Proposition 4.3.

Let , be of cardinality and be of cardinality . Let . Denote by the number of balanced words of length such that the symbol can only appear at positions in . We have

as we can start by picking the positions where appears, among the possible positions, then pick all other symbols (among the positions left).

Denote by the number of balanced words of length such that there is a symbol that can only appear at positions in . We have

(9) |

Indeed the set of words counted by is the union for of the set of words counted by .

The number of balanced matrices, such that witness the failure of the -extraction property is therefore

Indeed, for each row whose index is in , we have to pick a word with at least one symbol appearing only at positions within (this is counted by ). For the other rows we can pick any balanced word (counted by ). Finally, denote by the number of balanced matrices such that there are of cardinality and of cardinality that witness the failure of the extraction property. We have

(10) |
