
On the entropy numbers and the Kolmogorov widths

by Guergana Petrova et al.

Direct estimates between linear or nonlinear Kolmogorov widths and entropy numbers are presented. These estimates are derived using the recently introduced Lipschitz widths. Applications for m-term approximation are obtained.



1. Introduction

We consider a Banach space $X$ (or a Hilbert space $H$) equipped with a norm $\|\cdot\|_X$ and a compact subset $K$ of $X$. Typically, $K$ is a finite ball in a smoothness space such as a Lipschitz, Sobolev, or Besov space.

A well-known classical result, called Carl's inequality, see [2] or [7], compares a certain characteristic of the set $K$, called its entropy numbers $\varepsilon_n(K)_X$, with its approximability by linear spaces, measured by its Kolmogorov width $d_n(K)_X$. Carl's inequality states that for each $\alpha>0$, there is a constant $C_\alpha>0$ such that for all $n\ge 1$,

$$\varepsilon_n(K)_X \le C_\alpha\, n^{-\alpha} \max_{1\le m\le n} m^{\alpha}\, d_{m-1}(K)_X. \qquad (1.1)$$
Inequality (1.1) was generalized in [10], where nonlinear Kolmogorov widths were used instead of the linear Kolmogorov widths. More precisely, it was shown there that for each $\alpha>0$, there is a constant $C_\alpha>0$ such that for all $n\ge 1$,


with a fixed constant. In addition, it was also proven that for each $\alpha>0$, there is a constant $C_\alpha>0$ such that for all $n\ge 1$,


where is a fixed constant and cannot be replaced by a more slowly growing function of .

All these inequalities are primarily useful when the linear or nonlinear Kolmogorov widths decay as a power of $n$. In this paper, we give finer extensions of the (generalized) Carl's inequalities (1.1), (1.2), and (1.3), using the Lipschitz widths recently introduced in [8]. We start with some definitions, presented in §2, and continue in §3 with a comparison between the nonlinear Kolmogorov widths and the Lipschitz widths. Our main results are presented in §4, where we give a direct comparison between the entropy numbers of $K$ and its linear and nonlinear Kolmogorov widths. Finally, in §5, we derive consequences of these estimates for $m$-term approximation in Hilbert spaces.

2. Preliminaries

We start this section with the definition of the Kolmogorov widths. For a fixed value of $n$, the Kolmogorov $n$-width of $K$ is defined as

where the infimum is taken over all linear spaces $X_n\subset X$ of dimension $n$. These are the classical Kolmogorov widths, introduced in [6]; see [7] for a modern exposition. To distinguish them from the nonlinear Kolmogorov widths introduced later, we call them linear Kolmogorov $n$-widths. They describe the best performance possible for the approximation of the model class $K$ using linear spaces of dimension $n$. However, they do not tell us how to select a (near) optimal space of dimension $n$ for this purpose. Let us also note that the definition of the Kolmogorov width does not require the mapping which sends $f\in K$ to an approximation of $f$ to be a linear map.
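As a concrete illustration (our own, not an example from the paper), the Kolmogorov $1$-width of a finite set in the Euclidean plane can be estimated by brute force: the supremum over $K$ is an exact maximum, and the infimum over one-dimensional subspaces is approximated by sampling line directions.

```python
# Illustrative sketch (not from the paper): a brute-force estimate of the
# linear Kolmogorov 1-width d_1(K) of a finite set K in the Euclidean
# plane, i.e. the inf over 1-dimensional subspaces V of max_{f in K} dist(f, V).
import math

def dist_to_line(p, u):
    """Distance from point p to the line through 0 spanned by unit vector u."""
    proj = p[0] * u[0] + p[1] * u[1]
    return math.hypot(p[0] - proj * u[0], p[1] - proj * u[1])

def width_1(K, num_angles=3600):
    best = float("inf")
    for i in range(num_angles):
        theta = math.pi * i / num_angles
        u = (math.cos(theta), math.sin(theta))
        worst = max(dist_to_line(p, u) for p in K)  # sup over the class K
        best = min(best, worst)                     # inf over the subspaces
    return best

# The "cross" {(+-1, 0), (0, +-1)}: every line misses one arm of the cross
# by at least 1/sqrt(2), and the diagonal achieves this, so d_1 = 1/sqrt(2).
K = [(1.0, 0.0), (-1.0, 0.0), (0.0, 1.0), (0.0, -1.0)]
print(width_1(K))  # ≈ 0.7071
```

The example also shows why the widths say nothing about how to *find* a good subspace: the sketch simply searches over all of them.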

A generalization of this concept was introduced in [10], where the so-called nonlinear Kolmogorov $(n,N)$-width $d_n(K,N)_X$ was defined for $N\ge 1$ as

where the last infimum is taken over all collections of at most $N$ linear spaces of dimension $n$. Note that here the choice of the linear subspace from which we choose the best approximation to $f$ depends on $f$. Clearly, $d_n(K,1)_X=d_n(K)_X$, and the bigger $N$ is, the more flexibility we have to approximate $K$. These nonlinear Kolmogorov widths are used in estimating from below the best $m$-term approximation, see e.g. [3, 10]. The cases considered in [10] are the cases when , and , where and are fixed constants, respectively. A useful observation that we are going to utilize is that both Kolmogorov widths are homogeneous. Namely, if $\lambda>0$ and $\lambda K:=\{\lambda f:\ f\in K\}$, we have

$$d_n(\lambda K)_X=\lambda\, d_n(K)_X, \qquad d_n(\lambda K,N)_X=\lambda\, d_n(K,N)_X. \qquad (2.1)$$

Going further, we first introduce the minimal $\varepsilon$-covering number of a compact set $K$. A collection of elements of $X$ is called an $\varepsilon$-covering of $K$ if

An $\varepsilon$-covering of $K$ whose cardinality is minimal is called a minimal $\varepsilon$-covering of $K$. We denote by $N_\varepsilon(K)$ the cardinality of a minimal $\varepsilon$-covering of $K$. The minimal inner $\varepsilon$-covering number $N^{\rm in}_\varepsilon(K)$ of a compact set $K$ is defined exactly as $N_\varepsilon(K)$, but we additionally require that the centers of the covering balls are elements of $K$.

The entropy numbers $\varepsilon_n(K)_X$, $n\ge 0$, of the compact set $K$ are defined as the infimum of all $\varepsilon>0$ for which $2^n$ balls with centers from $X$ and radius $\varepsilon$ cover $K$. If we put the additional restriction that the centers of these balls are from $K$, then we define the so-called inner entropy numbers $\tilde{\varepsilon}_n(K)_X$. Formally, we write

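As a simple sanity check of this definition (our illustration, not an example from the paper), consider $K=[0,1]$ on the real line: $2^n$ intervals of radius $2^{-n-1}$ centered at the dyadic midpoints cover $K$, while the same centers with any smaller radius leave gaps, consistent with $\varepsilon_n([0,1])=2^{-n-1}$.

```python
# Illustration (not from the paper): K = [0, 1] on the real line.
# 2^n intervals of radius 2^-(n+1), centered at the dyadic midpoints,
# cover K; with these centers, any smaller radius leaves gaps.
def dyadic_centers(n):
    """Midpoints of the 2^n dyadic subintervals of [0, 1]."""
    m = 2 ** n
    return [(2 * k + 1) / (2 * m) for k in range(m)]

def covers(centers, radius, samples):
    return all(any(abs(s - c) <= radius + 1e-12 for c in centers) for s in samples)

samples = [k / 1000 for k in range(1001)]   # a fine grid in [0, 1]
for n in range(1, 6):
    r = 2.0 ** -(n + 1)
    assert covers(dyadic_centers(n), r, samples)            # radius 2^-(n+1) suffices
    assert not covers(dyadic_centers(n), 0.9 * r, samples)  # 10% less does not
```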
A collection of elements from is called an -packing of if

An $\varepsilon$-packing of $K$ whose size is maximal is called a maximal $\varepsilon$-packing of $K$. We denote by $P_\varepsilon(K)$ the cardinality of a maximal $\varepsilon$-packing of $K$. For every $\varepsilon>0$ and every compact set $K$ we have the inequalities

$$P_{2\varepsilon}(K)\ \le\ N_\varepsilon(K)\ \le\ N^{\rm in}_\varepsilon(K)\ \le\ P_\varepsilon(K),$$

where $N_\varepsilon(K)$ and $N^{\rm in}_\varepsilon(K)$ denote the minimal and the minimal inner $\varepsilon$-covering numbers of $K$, respectively.
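The relation between packings and coverings can be checked numerically on a finite point set. In the sketch below (our own illustration; the greedy construction is a standard device, not taken from the paper), an inclusion-maximal $\varepsilon$-packing is built greedily; its maximality forces its $\varepsilon$-balls to cover the set, which is the packing-implies-inner-covering step behind these inequalities.

```python
# Illustration (not from the paper): covering/packing relations on a
# finite point set in the plane.  A greedy, inclusion-maximal eps-packing
# is automatically an eps-covering whose centers lie in the set itself.
import math
import random

def dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def greedy_packing(K, eps):
    """Keep a point if it lies > eps from every point kept so far.
    The result is an eps-packing that cannot be extended, hence the
    eps-balls around its points cover all of K."""
    centers = []
    for p in K:
        if all(dist(p, c) > eps for c in centers):
            centers.append(p)
    return centers

def is_covering(K, centers, eps):
    return all(any(dist(p, c) <= eps for c in centers) for p in K)

random.seed(0)
K = [(random.random(), random.random()) for _ in range(200)]
eps = 0.2
P = greedy_packing(K, eps)
assert is_covering(K, P, eps)                      # packing => inner covering
assert len(greedy_packing(K, 2 * eps)) <= len(P)   # coarser packings are smaller
```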
Finally, we introduce the Lipschitz widths $d_n^{\gamma}(K)_X$, $\gamma>0$, $n\ge 1$, of the compact set $K$; see [8]. We denote by $Y_n=(\mathbb{R}^n,\|\cdot\|_{Y_n})$ the $n$-dimensional Banach space with a fixed norm $\|\cdot\|_{Y_n}$, and by $B_{Y_n}$ its unit ball. For $\gamma>0$, we first define the fixed Lipschitz width

$$d^{\gamma}(K,Y_n)_X:=\inf_{\Phi}\ \sup_{f\in K}\ \inf_{y\in B_{Y_n}}\|f-\Phi(y)\|_X,$$

where the infimum is taken over all Lipschitz mappings

$$\Phi:\ B_{Y_n}\to X$$

that satisfy the Lipschitz condition

$$\|\Phi(y)-\Phi(y')\|_X\le \gamma\,\|y-y'\|_{Y_n},\qquad y,y'\in B_{Y_n},$$

with constant $\gamma$. We then define the Lipschitz width

$$d_n^{\gamma}(K)_X:=\inf_{\|\cdot\|_{Y_n}}d^{\gamma}(K,Y_n)_X,$$

where the infimum is taken over all norms $\|\cdot\|_{Y_n}$ in $\mathbb{R}^n$. We observe the following analog to (2.1): for $\lambda>0$,

$$d_n^{\lambda\gamma}(\lambda K)_X=\lambda\, d_n^{\gamma}(K)_X.$$
3. Comparison between nonlinear Kolmogorov widths and Lipschitz widths

In this section, we derive direct inequalities between the nonlinear Kolmogorov widths and the Lipschitz widths. We then use known relations between entropy numbers and Lipschitz widths to derive improvements of the (generalized) Carl’s inequalities.

We first note the following comparison between the linear Kolmogorov widths and the Lipschitz widths, proven in [8], see Corollary 5.2.

Theorem 3.1.

For every and every compact set we have

We next proceed with estimates between the nonlinear Kolmogorov widths and the Lipschitz widths. Clearly, it follows from the definition that

where in the last inequality we have used the above theorem. Better estimates in the case when $K$ is a subset of a Hilbert space or of a general Banach space are given in the following lemmas.

Lemma 3.2.

For every , , and every compact subset $K$ of a Hilbert space such that , we have


Proof: Let us fix , and consider the -dimensional linear spaces , , . We define a norm on ,

whose unit ball is


We want to construct a Lipschitz mapping from to whose image approximates well . We divide the interval into subintervals , ,

with centers and consider the univariate continuous piecewise linear functions , , , whose break points are , and

Let be the unit ball of the space . We fix an orthonormal basis in and consider the isometry map from onto ,

defined as


We use these mappings to construct as

Let us fix and denote by

We want to derive an upper bound for . Note that if and only if . We consider the following two cases:

  • if for some , then , , for all , and therefore

  • if for some , , we obtain that

    We can assume without loss of generality that

    Since , we have

    where we have used arguments similar to the first case.

In both cases we have that

and therefore is an -Lipschitz mapping.

Since , the approximant to from will belong to since is the orthogonal projection of onto . Thus, it follows from the definition of that there is , such that , and therefore

which gives

To show the second part of (3.1), we determine such that

and define a norm on by


The unit ball with respect to this norm is

As before, we have . Next, we consider the disjoint cubes , , of side length such that

We denote by the center of , , and define the functions as

and as

where are the mappings defined in (3.2).

Using the fact that for any two numbers , we have , we obtain that

Moreover, the supports of the ’s are disjoint, with being the support of , and for all . Now, following similar arguments as the ones for , and denoting

we derive that:

  • if for some ,

  • if and , , we consider the line segment

    and fix


    Clearly , ,

    and similarly to the estimate for , one obtains

Therefore, is a -Lipschitz mapping. As before, since , we obtain

where we have used the fact that and , . This completes the proof.
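For readers who want a concrete picture of the continuous piecewise linear functions used in the proof above, here is a minimal sketch (our illustration; the uniform breakpoints and the normalization are assumptions, not the paper's exact construction) of hat functions forming a partition of unity on $[-1,1]$.

```python
# Illustration (our own, not the paper's exact construction): continuous
# piecewise linear "hat" functions on a uniform partition of [-1, 1].
# phi_j equals 1 at the j-th breakpoint, 0 at all other breakpoints, is
# linear in between, and together the phi_j sum to 1 on [-1, 1].
def make_breakpoints(m):
    """m + 1 equally spaced breakpoints on [-1, 1]."""
    return [-1.0 + 2.0 * j / m for j in range(m + 1)]

def hat(j, t, pts):
    """Value at t of the hat function centered at pts[j]."""
    h = pts[1] - pts[0]                          # uniform mesh size
    return max(0.0, 1.0 - abs(t - pts[j]) / h)

m = 8
pts = make_breakpoints(m)
for k in range(101):
    t = -1.0 + 2.0 * k / 100                     # sample point in [-1, 1]
    total = sum(hat(j, t, pts) for j in range(m + 1))
    assert abs(total - 1.0) < 1e-12              # partition of unity
```

Each hat function has slope of magnitude $1/h$, so such constructions are Lipschitz with a constant controlled by the mesh size, which is the property the proof exploits.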

The case of arbitrary Banach space is based on the following lemma.

Lemma 3.3.

Let be an -dimensional subspace of a Banach space and be its unit ball. Let be the unit ball in an -dimensional subspace of a Hilbert space . Then, there exists a linear map

with Lipschitz constant (i.e. norm ) at most such that . In addition, if , then the Lipschitz constant of is at most .

Proof: It follows from a theorem of Fritz John, see Chapter 3 in [9] or [1], that there exists an invertible linear operator onto such that


Let us fix an orthonormal basis for and consider the coordinate mapping defined as

This mapping is an isometry when is equipped with the norm

We now define the linear mapping

and notice that

The first inclusion gives that has a norm (Lipschitz constant) , and thus has a Lipschitz constant . The second inclusion shows that , and therefore is the desired mapping. It follows from [5, Cor. 5] that in the case of , we can replace in (3.3) by .
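As context for the $\sqrt{n}$ factor produced by John's theorem (our illustration, not part of the paper's argument), the cube shows that this factor cannot be improved for general symmetric convex bodies:

```latex
% The unit ball of \ell_\infty^n, i.e. the cube Q = [-1,1]^n, has the
% Euclidean unit ball B_2^n as its ellipsoid of maximal volume, and
\[
  B_2^n \;\subset\; [-1,1]^n \;\subset\; \sqrt{n}\, B_2^n ,
\]
% with the right-hand inclusion sharp, since each vertex
% (\pm 1, \ldots, \pm 1) has Euclidean norm exactly \sqrt{n}.
```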

Remark 3.4.

Note that since is linear, we have that , and for every ,


where we can replace by in the case when .

Lemma 3.5.

For every , , and every compact subset $K$ of a Banach space with , we have


When , we have

Proof: We fix , , and consider the -dimensional linear spaces , , , with being the unit ball of . For a fixed , we apply Lemma 3.3 with and to find an -Lipschitz mapping , where or , depending on whether is a general Banach space or , such that


We show (3.5) by proceeding as in the proof of Lemma 3.2 and defining a mapping as

where and are as in Lemma 3.2. We fix , , denote by

and show in a similar way that

  • if for some ,