
# On the entropy numbers and the Kolmogorov widths

Direct estimates between linear or nonlinear Kolmogorov widths and entropy numbers are presented. These estimates are derived using the recently introduced Lipschitz widths. Applications for m-term approximation are obtained.

## 1. Introduction

We consider a Banach space $X$ (or a Hilbert space $H$) equipped with a norm $\|\cdot\|_X$ and a compact subset $K$ of $X$. Typically, $K$ is a finite ball in smoothness spaces such as the Lipschitz, Sobolev, or Besov spaces.

A well-known classical result, called Carl’s inequality, see [2] or [7], compares a certain characteristic of the set $K$, its entropy numbers $e_n(K)_X$, with its approximability by linear spaces, measured by its Kolmogorov widths $d_n(K)_X$. Carl’s inequality states that for each $r>0$, there is a constant $C(r)$ such that for all $n\in\mathbb N$,

$$ \max_{1\le k\le n} k^r e_k(K)_X \le C(r)\max_{1\le m\le n} m^r d_{m-1}(K)_X. \tag{1.1} $$

Inequality (1.1) has been generalized in [10], where the nonlinear Kolmogorov widths $d_n(K,N)_X$ have been used instead of the linear Kolmogorov widths $d_n(K)_X$. More precisely, it has been shown there that for each $r>0$, there is a constant $C(r,\lambda)$ such that for all $n\in\mathbb N$,

$$ \max_{1\le k\le n} k^r e_k(K)_X \le C(r,\lambda)\max_{1\le m\le n} m^r d_{m-1}(K,\lambda^m)_X, \tag{1.2} $$

with $\lambda>1$ a fixed constant. In addition, it was also proven that for each $r>0$, there is a constant $C(r,a)$ such that for all $n\in\mathbb N$,

$$ \max_{1\le k\le n} k^r e_{(a+r)k\log k}(K)_X \le C(r,a)\max_{1\le m\le n} m^r d_{m-1}(K,m^{am})_X, \tag{1.3} $$

where $a>0$ is a fixed constant and $k\log k$ cannot be replaced by a slower growing function of $k$.

All these inequalities are primarily useful when the linear or nonlinear Kolmogorov widths decay as a power of $n$. In this paper, we give finer extensions of the (generalized) Carl’s inequalities (1.1), (1.2) and (1.3), using the Lipschitz widths recently introduced in [8]. We start with some definitions, presented in §2, and continue in §3 with a comparison between the nonlinear Kolmogorov widths and the Lipschitz widths. Our main results are presented in §4, where we give a direct comparison between the entropy numbers of $K$ and its linear and nonlinear Kolmogorov widths. Finally, in §5, we derive what these estimates mean for the $m$-term approximation in Hilbert spaces.

## 2. Preliminaries

We start this section with the definition of Kolmogorov widths. If we fix the value of $n\ge0$, the Kolmogorov $n$-width of $K$ is defined as

$$ d_0(K)_X:=\sup_{f\in K}\|f\|_X,\qquad d_n(K)_X:=\inf_{\dim(X_n)=n}\,\sup_{f\in K}\operatorname{dist}(f,X_n)_X,\quad n\ge1, $$

where the infimum is taken over all linear spaces $X_n\subset X$ of dimension $n$. These are the classical Kolmogorov widths introduced in [6]; see [7] for a modern exposition. To distinguish them from the nonlinear Kolmogorov widths introduced later, we call them linear Kolmogorov $n$-widths. They describe the optimal performance possible for the approximation of the model class $K$ using linear spaces of dimension $n$. However, they do not tell us how to select a (near) optimal space $X_n$ of dimension $n$ for this purpose. Let us also note that in the definition of the Kolmogorov width, we do not require that the mapping which sends $f\in K$ into an approximation to $f$ is a linear map.

A generalization of this concept was introduced in [10], where the so-called nonlinear Kolmogorov $(n,N)$-width $d_n(K,N)_X$ was defined for $N\ge1$ as

$$ d_0(K,N)_X:=\sup_{f\in K}\|f\|_X, $$
$$ d_n(K,N)_X:=\inf_{\mathcal L_N}\,\sup_{f\in K}\,\inf_{X_n\in\mathcal L_N}\operatorname{dist}(f,X_n)_X,\quad n\ge1, $$

where the last infimum is over the sets $\mathcal L_N$ of at most $N$ linear spaces $X_n$ of dimension $n$. Note that here the choice of the linear subspace $X_n\in\mathcal L_N$ from which we choose the best approximation to $f$ depends on $f$. Clearly, $d_n(K,1)_X=d_n(K)_X$, and the bigger $N$ is, the more flexibility we have to approximate $f$. These nonlinear Kolmogorov widths are used in estimating from below the best $m$-term approximation, see e.g. [3, 10]. The cases considered in [10] are $N=\lambda^n$ and $N=n^{an}$, where $\lambda>1$ and $a>0$ are fixed constants, respectively. A useful observation that we are going to utilize is that both Kolmogorov widths are homogeneous. Namely, for every $t\in\mathbb R$ and every $n\ge0$, we have

$$ d_n(tK,N)_X=|t|\,d_n(K,N)_X\quad\text{and}\quad d_n(tK)_X=|t|\,d_n(K)_X, \tag{2.1} $$

where $tK:=\{tf:\ f\in K\}$.
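The definitions above can be explored numerically. The following is a minimal sketch, assuming a finite point cloud in Euclidean $\mathbb R^d$ as a stand-in for $K$ and a handful of randomly chosen subspaces. All names (`dist_to_subspace`, `worst_case_error`, `random_subspace`) are hypothetical. Since no infimum over collections of subspaces is computed, the quantities below are only upper bounds for the corresponding widths. They do, however, show the two structural facts used in the text: enlarging the collection never increases the error, and the error scales with $|t|$.

```python
import numpy as np

def dist_to_subspace(f, Q):
    """Euclidean distance from f to the span of the orthonormal columns of Q."""
    return np.linalg.norm(f - Q @ (Q.T @ f))

def worst_case_error(K, subspaces):
    """sup over f in K of the distance to the best subspace in the collection."""
    return max(min(dist_to_subspace(f, Q) for Q in subspaces) for f in K)

def random_subspace(rng, d, n):
    """Orthonormal basis (d x n) of a random n-dimensional subspace of R^d."""
    Q, _ = np.linalg.qr(rng.standard_normal((d, n)))
    return Q

rng = np.random.default_rng(0)
d, n = 6, 2
K = rng.standard_normal((40, d))                 # finite stand-in for K (assumption)
spaces = [random_subspace(rng, d, n) for _ in range(8)]

err_one = worst_case_error(K, spaces[:1])        # one space: the linear setting
err_many = worst_case_error(K, spaces)           # N = 8 spaces to choose from
err_scaled = worst_case_error(-3.0 * K, spaces)  # same collection applied to tK, t = -3
```

Here `err_many` cannot exceed `err_one`, and `err_scaled` equals `3 * err_many` up to rounding, mirroring (2.1) for a fixed collection of spaces.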

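Going further, we first introduce the minimal $\epsilon$-covering number $N_\epsilon(K)$ of a compact set $K\subset X$. A collection $\{g_1,\dots,g_m\}$ of elements of $X$ is called an $\epsilon$-covering of $K$ if

$$ K\subset\bigcup_{j=1}^m B(g_j,\epsilon),\quad\text{where}\quad B(g_j,\epsilon):=\{f\in X:\ \|f-g_j\|_X\le\epsilon\}. $$

An $\epsilon$-covering of $K$ whose cardinality is minimal is called a minimal $\epsilon$-covering of $K$. We denote by $N_\epsilon(K)$ the cardinality of the minimal $\epsilon$-covering of $K$. The minimal inner $\epsilon$-covering number $\tilde N_\epsilon(K)$ of a compact set $K\subset X$ is defined exactly as $N_\epsilon(K)$, but we additionally require that the centers of the covering are elements of $K$.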

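The entropy numbers $e_n(K)_X$, $n\ge0$, of the compact set $K\subset X$ are defined as the infimum of all $\epsilon>0$ for which $2^n$ balls with centers from $X$ and radius $\epsilon$ cover $K$. If we put the additional restriction that the centers of these balls are from $K$, then we define the so-called inner entropy numbers $\tilde e_n(K)_X$. Formally, we write

$$ e_n(K)_X=\inf\Big\{\epsilon>0:\ K\subset\bigcup_{j=1}^{2^n}B(g_j,\epsilon),\ g_j\in X,\ j=1,\dots,2^n\Big\}, $$
$$ \tilde e_n(K)_X=\inf\Big\{\epsilon>0:\ K\subset\bigcup_{j=1}^{2^n}B(h_j,\epsilon),\ h_j\in K,\ j=1,\dots,2^n\Big\}. $$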

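A collection $\{f_1,\dots,f_m\}$ of elements from $K$ is called an $\epsilon$-packing of $K$ if

$$ \min_{i\neq j}\|f_i-f_j\|_X>\epsilon. $$

An $\epsilon$-packing of $K$ whose size is maximal is called a maximal $\epsilon$-packing of $K$. We denote by $\tilde P_\epsilon(K)$ the cardinality of the maximal $\epsilon$-packing of $K$. We have the following inequalities for every $\epsilon>0$ and every compact set $K$

$$ \tilde P_\epsilon(K)\ge\tilde N_\epsilon(K)\ge\tilde P_{2\epsilon}(K), \tag{2.2} $$

and

$$ e_n(K)_X\le\tilde e_n(K)_X\le 2e_n(K)_X. \tag{2.3} $$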
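The first inequality in (2.2) rests on the observation that an $\epsilon$-packing that cannot be enlarged is automatically an inner $\epsilon$-covering. The sketch below illustrates this on a finite random point cloud in the Euclidean plane, which is an assumption made purely for illustration. The helper `greedy_packing` is hypothetical: it produces a packing that is maximal with respect to inclusion, not necessarily one of maximal cardinality, which is all the covering argument needs.

```python
import numpy as np

def greedy_packing(points, eps):
    """Indices of a subset whose pairwise distances all exceed eps and which
    cannot be enlarged by any further point of `points`."""
    chosen = []
    for i, p in enumerate(points):
        if all(np.linalg.norm(p - points[j]) > eps for j in chosen):
            chosen.append(i)
    return chosen

rng = np.random.default_rng(1)
K = rng.uniform(-1.0, 1.0, size=(300, 2))  # finite stand-in for a compact set
eps = 0.25

centers = K[greedy_packing(K, eps)]

# distance from every point of K to its nearest chosen center
gaps = np.linalg.norm(K[:, None, :] - centers[None, :, :], axis=2)
cover_radius = gaps.min(axis=1).max()

# smallest distance between two distinct centers
pair = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
min_sep = pair[~np.eye(len(centers), dtype=bool)].min()
```

Every rejected point lies within `eps` of a center chosen earlier, so `cover_radius` does not exceed `eps`, while `min_sep` stays strictly above `eps`. The centers are therefore both an $\epsilon$-packing and an inner $\epsilon$-covering of the sample.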

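Finally, we introduce the Lipschitz widths $d_n^\gamma(K)_X$, $n\ge1$, $\gamma\ge0$, of the compact set $K\subset X$, see [8]. We denote by $(\mathbb R^n,\|\cdot\|_{Y_n})$, $n\ge1$, the $n$-dimensional Banach space with a fixed norm $\|\cdot\|_{Y_n}$. For $\gamma\ge0$, we first define the fixed Lipschitz width $d^\gamma(K,Y_n)_X$,

$$ d^\gamma(K,Y_n)_X:=\inf_{\Phi_n}\,\sup_{f\in K}\,\inf_{y\in B_{Y_n}}\|f-\Phi_n(y)\|_X, $$

where the infimum is taken over all Lipschitz mappings

$$ \Phi_n:(B_{Y_n},\|\cdot\|_{Y_n})\to X,\qquad B_{Y_n}:=\{y\in\mathbb R^n:\ \|y\|_{Y_n}\le1\}, $$

that satisfy the Lipschitz condition

$$ \sup_{y,y'\in B_{Y_n},\ y\neq y'}\frac{\|\Phi_n(y)-\Phi_n(y')\|_X}{\|y-y'\|_{Y_n}}\le\gamma, $$

with constant $\gamma$. We then define the Lipschitz width

$$ d_n^\gamma(K)_X:=\inf_{k\le n}\,\inf_{\|\cdot\|_{Y_k}}d^\gamma(K,Y_k)_X, $$

where the infimum is taken over all norms $\|\cdot\|_{Y_k}$ in $\mathbb R^k$ and all $k\le n$. We observe the following analog to (2.1)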

$$ d_n^{\gamma|t|}(tK)_X=|t|\,d_n^{\gamma}(K)_X,\quad\text{where } tK:=\{tf:\ f\in K\}. \tag{2.4} $$

## 3. Comparison between nonlinear Kolmogorov widths and Lipschitz widths

In this section, we derive direct inequalities between the nonlinear Kolmogorov widths and the Lipschitz widths. We then use known relations between entropy numbers and Lipschitz widths to derive improvements of the (generalized) Carl’s inequalities.

We first note the following comparison between the linear Kolmogorov widths and the Lipschitz widths, proven in [8], see Corollary 5.2.

###### Theorem 3.1.

For every $n\ge1$ and every compact set $K\subset X$ we have

$$ d_n^\gamma(K)_X\le d_n(K)_X,\quad\text{for every } \gamma\ge2\sup_{f\in K}\|f\|_X. $$

We next proceed with estimates between the nonlinear Kolmogorov width and the Lipschitz widths. Clearly, it follows from the definition that

$$ d_n(K,N)_X\ge d_{nN}(K)_X\ge d_{nN}^\gamma(K)_X,\qquad \gamma=2\sup_{f\in K}\|f\|_X, $$

where we have used the above theorem in the last inequality. Better estimates in the case of $K$ being a subset of a Hilbert space or a general Banach space are described in the following lemmas.

###### Lemma 3.2.

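For every $n\ge1$, $N>1$, and every compact $K\subset H$, subset of a Hilbert space $H$ such that $\sup_{f\in K}\|f\|_H=1$, we have

$$ d_{n+1}^{N+1}(K)_H\le d_n(K,N)_H,\quad\text{and}\quad d_{n+\lceil\log_2N\rceil}^{3}(K)_H\le d_n(K,N)_H. \tag{3.1} $$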

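Proof: Let us fix $n$, $N$, and consider the $n$-dimensional linear spaces $X_j\subset H$, $j=0,\dots,N-1$. We define a norm on $\mathbb R^{n+1}$,

$$ \|(x,x_{n+1})\|_{Y_{n+1}}:=\max\{\|x\|_{\ell_2(\mathbb R^n)},|x_{n+1}|\},\qquad x:=(x_1,\dots,x_n), $$

whose unit ball is

$$ B_{Y_{n+1}}:=\{(x,x_{n+1}):\ \|x\|_{\ell_2(\mathbb R^n)}\le1 \text{ and } |x_{n+1}|\le1\}. $$

Clearly

$$ B_{Y_{n+1}}=B_{\ell_2(\mathbb R^n)}\times[-1,1],\quad\text{where}\quad B_{\ell_2(\mathbb R^n)}:=\{x\in\mathbb R^n:\ \|x\|_{\ell_2(\mathbb R^n)}\le1\}. $$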

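We want to construct a Lipschitz mapping $\Phi_{n+1}$ from $B_{Y_{n+1}}$ to $H$ whose image approximates $K$ well. We divide the interval $[-1,1]$ into $N$ subintervals $I_j$, $j=0,\dots,N-1$,

$$ I_j:=[a_j,a_{j+1}],\qquad a_j:=\frac{2j}{N}-1, $$

with centers $c_j:=a_j+1/N$, and consider the univariate continuous piecewise linear functions $\psi_j$ on $[-1,1]$, $j=0,\dots,N-1$, whose break points are among the points $a_k$ and $c_j$, and

$$ \psi_j(c_j)=1,\qquad \psi_j(a_k)=0,\quad k=0,\dots,N. $$

In other words, $\psi_j$ is the hat function supported on $I_j$, with slopes $\pm N$.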

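Let $B_{X_j}$ be the unit ball of the space $X_j$. We fix an orthonormal basis $\{\varphi^j_1,\dots,\varphi^j_n\}$ in $X_j$ and consider the isometry map $\bar\psi_j$ from $B_{\ell_2(\mathbb R^n)}$ onto $B_{X_j}$,

$$ \bar\psi_j:(B_{\ell_2(\mathbb R^n)},\|\cdot\|_{\ell_2(\mathbb R^n)})\to(B_{X_j},\|\cdot\|_H), $$

defined as

$$ \bar\psi_j(x)=\bar\psi_j(x_1,\dots,x_n):=\sum_{i=1}^n x_i\varphi^j_i. \tag{3.2} $$

We use these mappings to construct $\Phi_{n+1}:B_{Y_{n+1}}\to H$ as

$$ \Phi_{n+1}(x,x_{n+1}):=\sum_{j=0}^{N-1}\psi_j(x_{n+1})\cdot\bar\psi_j(x). $$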

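Let us fix $(x,x_{n+1}),(x',x'_{n+1})\in B_{Y_{n+1}}$ and denote by

$$ A:=\|\Phi_{n+1}(x,x_{n+1})-\Phi_{n+1}(x',x'_{n+1})\|_H. $$

We want to derive an upper bound for $A$. Note that $\psi_j(x_{n+1})\neq0$ if and only if $x_{n+1}$ lies in the interior of $I_j$. We consider the following two cases:

• if $x_{n+1},x'_{n+1}\in I_j$ for some $j$, then $\psi_k(x_{n+1})=\psi_k(x'_{n+1})=0$ for all $k\neq j$, and therefore

$$
\begin{aligned}
A &= \|\psi_j(x_{n+1})\bar\psi_j(x)-\psi_j(x'_{n+1})\bar\psi_j(x')\|_H\\
&\le |\psi_j(x_{n+1})|\,\|\bar\psi_j(x)-\bar\psi_j(x')\|_H+|\psi_j(x_{n+1})-\psi_j(x'_{n+1})|\,\|\bar\psi_j(x')\|_H\\
&\le \|x-x'\|_{\ell_2(\mathbb R^n)}+N|x_{n+1}-x'_{n+1}|\\
&\le (N+1)\|(x,x_{n+1})-(x',x'_{n+1})\|_{Y_{n+1}}.
\end{aligned}
$$
• if $x_{n+1}\in I_j$ and $x'_{n+1}\in I_k$ for some $j\neq k$, we obtain that

$$ A=\|\psi_j(x_{n+1})\bar\psi_j(x)-\psi_k(x'_{n+1})\bar\psi_k(x')\|_H. $$

We can assume without loss of generality that

$$ x_{n+1}\le a_{j+1}\le a_k\le x'_{n+1}. $$

Since $\psi_j(a_{j+1})=\psi_k(a_k)=0$, we have

$$
\begin{aligned}
A &\le \|\psi_j(x_{n+1})\bar\psi_j(x)-\psi_j(a_{j+1})\bar\psi_j(x)\|_H+\|\psi_k(a_k)\bar\psi_k(x)-\psi_k(x'_{n+1})\bar\psi_k(x')\|_H\\
&\le |\psi_j(x_{n+1})-\psi_j(a_{j+1})|\,\|\bar\psi_j(x)\|_H+\|\psi_k(a_k)\bar\psi_k(x)-\psi_k(x'_{n+1})\bar\psi_k(x')\|_H\\
&\le N|a_{j+1}-x_{n+1}|+\|x-x'\|_{\ell_2(\mathbb R^n)}+N|x'_{n+1}-a_k|\\
&\le N|x'_{n+1}-x_{n+1}|+\|x-x'\|_{\ell_2(\mathbb R^n)}\\
&\le (N+1)\|(x,x_{n+1})-(x',x'_{n+1})\|_{Y_{n+1}},
\end{aligned}
$$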

where we have used arguments similar to the first case.

In both cases we have that

$$ \|\Phi_{n+1}(x,x_{n+1})-\Phi_{n+1}(x',x'_{n+1})\|_H\le(N+1)\|(x,x_{n+1})-(x',x'_{n+1})\|_{Y_{n+1}}, $$

and therefore $\Phi_{n+1}$ is an $(N+1)$-Lipschitz mapping.
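The Lipschitz bound just obtained can be sanity-checked numerically. The sketch below is an illustration under stated assumptions, not part of the proof: the Hilbert space is taken to be Euclidean $\mathbb R^d$, the $N$ subspaces are random, and the helper names (`hat`, `make_phi`, `sample_ball_point`) are hypothetical. It builds the hat functions on $[-1,1]$ and the glued map, then samples pairs of points in $B_{\ell_2(\mathbb R^n)}\times[-1,1]$ and records the largest observed ratio against the $\max$-norm of $Y_{n+1}$.

```python
import numpy as np

def hat(j, t, N):
    """psi_j: hat function on I_j = [a_j, a_{j+1}], a_j = 2j/N - 1, peak 1 at the center."""
    c = 2.0 * j / N - 1.0 + 1.0 / N
    return max(0.0, 1.0 - N * abs(t - c))

def make_phi(bases):
    """Phi_{n+1}(x, t) = sum_j psi_j(t) * (bases[j] @ x); bases[j] has orthonormal columns."""
    N = len(bases)
    def phi(x, t):
        return sum(hat(j, t, N) * (bases[j] @ x) for j in range(N))
    return phi

def sample_ball_point(rng, n):
    """Random point of B_{l2(R^n)} x [-1, 1]."""
    x = rng.standard_normal(n)
    x *= rng.uniform() / np.linalg.norm(x)
    return x, rng.uniform(-1.0, 1.0)

rng = np.random.default_rng(2)
d, n, N = 7, 3, 5
bases = [np.linalg.qr(rng.standard_normal((d, n)))[0] for _ in range(N)]
phi = make_phi(bases)

worst_ratio = 0.0
for _ in range(5000):
    (x, t), (xp, tp) = sample_ball_point(rng, n), sample_ball_point(rng, n)
    denom = max(np.linalg.norm(x - xp), abs(t - tp))   # the norm of Y_{n+1}
    if denom > 1e-12:
        worst_ratio = max(worst_ratio, np.linalg.norm(phi(x, t) - phi(xp, tp)) / denom)

# at the center c_0 of I_0 only psi_0 is active, so Phi reproduces the 0-th isometry
x0, _ = sample_ball_point(rng, n)
center_value = phi(x0, -1.0 + 1.0 / N)
```

The observed `worst_ratio` stays below $N+1$, and `center_value` coincides with `bases[0] @ x0`, which is the mechanism by which the image of the map contains the unit ball of each $X_j$.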

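Since $\sup_{f\in K}\|f\|_H=1$, the best approximant $f_j$ to $f\in K$ from $X_j$ will belong to $B_{X_j}$ since $f_j$ is the orthogonal projection of $f$ onto $X_j$. Thus, it follows from the definition of $\bar\psi_j$ that there is $x_j\in B_{\ell_2(\mathbb R^n)}$ such that $\bar\psi_j(x_j)=f_j$, and therefore

$$ \Phi_{n+1}(x_j,c_j)=f_j,\quad\text{and}\quad \|f-f_j\|_H=\operatorname{dist}(f,X_j)_H, $$

which gives

$$ d_{n+1}^{N+1}(K)_H\le d_n(K,N)_H. $$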

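To show the second part of (3.1), we determine $\ell$ such that

$$ 2^{\ell-1}<N\le2^{\ell}, $$

and define a norm on $\mathbb R^{n+\ell}$ by

$$ \|(x,y)\|_{Y_{n+\ell}}:=\max\{\|x\|_{\ell_2(\mathbb R^n)},\|y\|_{\ell_\infty(\mathbb R^\ell)}\}, $$

where

$$ x:=(x_1,\dots,x_n),\qquad y:=(y_1,\dots,y_\ell). $$

The unit ball with respect to this norm is

$$ B_{Y_{n+\ell}}:=\{(x,y)\in\mathbb R^{n+\ell}:\ \|x\|_{\ell_2(\mathbb R^n)}\le1 \text{ and } \|y\|_{\ell_\infty(\mathbb R^\ell)}\le1\}. $$

As before, we have $B_{Y_{n+\ell}}=B_{\ell_2(\mathbb R^n)}\times[-1,1]^\ell$. Next, we consider the disjoint cubes $Q_j$, $j=1,\dots,2^\ell$, of side length $1$ such that

$$ [-1,1]^\ell=\bigcup_{j=1}^{2^\ell}Q_j. $$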

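We denote by $c_j$ the center of $Q_j$, $j=1,\dots,2^\ell$, and define the functions $\phi_j:[-1,1]^\ell\to\mathbb R$ as

$$ \phi_j(y):=2\Big(\frac12-\|c_j-y\|_{\ell_\infty(\mathbb R^\ell)}\Big)_+,\qquad j=1,\dots,2^\ell, $$

and $\Psi_{n+\ell}:B_{Y_{n+\ell}}\to H$ as

$$ \Psi_{n+\ell}(x,y):=\sum_{j=1}^{2^\ell}\phi_j(y)\cdot\bar\psi_j(x), $$

where $\bar\psi_j$ are the mappings defined in (3.2).

Using the fact that for any two numbers $a,b\in\mathbb R$, we have $|a_+-b_+|\le|a-b|$, we obtain that

$$ |\phi_j(y)-\phi_j(y')|\le2\,\big|\|c_j-y\|_{\ell_\infty(\mathbb R^\ell)}-\|c_j-y'\|_{\ell_\infty(\mathbb R^\ell)}\big|\le2\|y-y'\|_{\ell_\infty(\mathbb R^\ell)}. $$

Moreover, the supports of the $\phi_j$’s are disjoint, with $Q_j$ being the support of $\phi_j$, and $0\le\phi_j(y)\le1$ for all $y\in[-1,1]^\ell$. Now, following similar arguments as the ones for $\Phi_{n+1}$, and denoting

$$ B:=\|\Psi_{n+\ell}(x,y)-\Psi_{n+\ell}(x',y')\|_H, $$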

we derive that:

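• if $y,y'\in Q_j$ for some $j$,

$$ B=\|\phi_j(y)\bar\psi_j(x)-\phi_j(y')\bar\psi_j(x')\|_H\le3\|(x,y)-(x',y')\|_{Y_{n+\ell}}. $$
• if $y\in Q_j$ and $y'\in Q_k$, $j\neq k$, we consider the line segment

$$ y+t(y'-y),\qquad 0\le t\le1, $$

and fix

$$ d_j:=y+t_0(y'-y)\in\partial Q_j, $$

and

$$ b_k:=y+t_1(y'-y)\in\partial Q_k. $$

Clearly $t_0\le t_1$, $\phi_j(d_j)=\phi_k(b_k)=0$,

$$ \|y-d_j\|_{\ell_\infty(\mathbb R^\ell)}+\|y'-b_k\|_{\ell_\infty(\mathbb R^\ell)}=(t_0+1-t_1)\|y-y'\|_{\ell_\infty(\mathbb R^\ell)}\le\|y-y'\|_{\ell_\infty(\mathbb R^\ell)}, $$

and similarly to the estimate for $A$, one obtains

$$
\begin{aligned}
B &= \|\phi_j(y)\bar\psi_j(x)-\phi_k(y')\bar\psi_k(x')\|_H\\
&\le |\phi_j(y)-\phi_j(d_j)|\,\|\bar\psi_j(x)\|_H+\|\phi_k(b_k)\bar\psi_k(x)-\phi_k(y')\bar\psi_k(x')\|_H\\
&\le 2\|d_j-y\|_{\ell_\infty(\mathbb R^\ell)}+\|x-x'\|_{\ell_2(\mathbb R^n)}+2\|y'-b_k\|_{\ell_\infty(\mathbb R^\ell)}\\
&\le 2\|y-y'\|_{\ell_\infty(\mathbb R^\ell)}+\|x-x'\|_{\ell_2(\mathbb R^n)}\\
&\le 3\|(x,y)-(x',y')\|_{Y_{n+\ell}}.
\end{aligned}
$$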

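Therefore, $\Psi_{n+\ell}$ is a $3$-Lipschitz mapping. As before, since $\sup_{f\in K}\|f\|_H=1$, we obtain

$$ d_{n+\lceil\log_2N\rceil}^{3}(K)_H\le d_n(K,N)_H, $$

where we have used the fact that $\ell=\lceil\log_2N\rceil$ and $\phi_j(c_j)=1$, $j=1,\dots,2^\ell$. The proof is completed.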
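The second construction admits the same kind of numerical sanity check. The following sketch again assumes Euclidean $\mathbb R^d$ as the Hilbert space and, for simplicity, uses one random subspace per cube, so that the number of spaces is exactly $2^\ell$. The helper names (`cube_centers`, `bump`, `make_psi`, `sample`) are hypothetical.

```python
import itertools
import numpy as np

def cube_centers(ell):
    """Centers of the 2^ell cubes of side length 1 that tile [-1, 1]^ell."""
    return [np.array(c) for c in itertools.product((-0.5, 0.5), repeat=ell)]

def bump(c, y):
    """phi_j(y) = 2 * (1/2 - ||c_j - y||_inf)_+ ."""
    return 2.0 * max(0.0, 0.5 - np.max(np.abs(c - y)))

def make_psi(bases, centers):
    """Psi_{n+ell}(x, y) = sum_j phi_j(y) * (bases[j] @ x)."""
    def psi(x, y):
        return sum(bump(c, y) * (Q @ x) for c, Q in zip(centers, bases))
    return psi

rng = np.random.default_rng(3)
d, n, ell = 9, 2, 3
centers = cube_centers(ell)
bases = [np.linalg.qr(rng.standard_normal((d, n)))[0] for _ in centers]
psi = make_psi(bases, centers)

def sample(rng):
    """Random point of B_{l2(R^n)} x [-1, 1]^ell."""
    x = rng.standard_normal(n)
    x *= rng.uniform() / np.linalg.norm(x)
    return x, rng.uniform(-1.0, 1.0, size=ell)

worst_ratio = 0.0
for _ in range(5000):
    (x, y), (xp, yp) = sample(rng), sample(rng)
    denom = max(np.linalg.norm(x - xp), np.max(np.abs(y - yp)))  # the norm of Y_{n+ell}
    if denom > 1e-12:
        worst_ratio = max(worst_ratio, np.linalg.norm(psi(x, y) - psi(xp, yp)) / denom)
```

The observed `worst_ratio` stays below $3$ regardless of the number of subspaces, in contrast with the constant $N+1$ of the first construction. The price is the $\ell=\lceil\log_2N\rceil$ extra parameters used to select the subspace.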

The case of an arbitrary Banach space is based on the following lemma.

###### Lemma 3.3.

Let $Y$ be an $n$-dimensional subspace of a Banach space $X$ and $B_Y$ be its unit ball. Let $B_Z$ be the unit ball in an $n$-dimensional subspace $Z$ of a Hilbert space $H$. Then, there exists a linear map

$$ \bar\psi:(B_Z,\|\cdot\|_H)\to Y, $$

with Lipschitz constant (i.e. norm $\|\bar\psi\|$) at most $\sqrt n$ such that $B_Y\subset\bar\psi(B_Z)$. In addition, if $X=L_p$, then the Lipschitz constant of $\bar\psi$ is at most $n^{|1/2-1/p|}$.

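Proof: It follows from the Fritz John theorem, see Chapter 3 in [9] or [1], that there exists an invertible linear operator $\phi$ from $\mathbb R^n$ onto $Y$ such that

$$ \phi(B_{\ell_2(\mathbb R^n)})\subset B_Y\subset\sqrt n\,\phi(B_{\ell_2(\mathbb R^n)}). \tag{3.3} $$

Let us fix an orthonormal basis $\{\varphi_1,\dots,\varphi_n\}$ for $Z$ and consider the coordinate mapping $\kappa_Z:Z\to\mathbb R^n$ defined as

$$ \kappa_Z(g)=(x_1,\dots,x_n)=x,\quad\text{where}\quad g=\sum_{j=1}^n x_j\varphi_j. $$

This mapping is an isometry when $\mathbb R^n$ is equipped with the norm

$$ \|x\|_{\ell_2(\mathbb R^n)}=\sqrt{\sum_{j=1}^n x_j^2}=\|g\|_Z. $$

We now define the linear mapping

$$ \tilde\psi:=\phi\circ\kappa_Z:(Z,\|\cdot\|_H)\to Y, $$

and notice that

$$ \tilde\psi(B_Z)\subset B_Y\subset\sqrt n\,\tilde\psi(B_Z). $$

The first inclusion gives that $\tilde\psi$ has norm (Lipschitz constant) at most $1$, and thus $\bar\psi:=\sqrt n\,\tilde\psi$ has Lipschitz constant at most $\sqrt n$. The second inclusion shows that $B_Y\subset\bar\psi(B_Z)$, and therefore $\bar\psi$ is the desired mapping. It follows from [5, Cor. 5] that in the case of $X=L_p$, we can replace $\sqrt n$ in (3.3) by $n^{|1/2-1/p|}$.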
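A concrete instance may help fix ideas. As an illustrative assumption (not taken from the text), let $Y$ be $\mathbb R^n$ with the sup norm. Then the identity can serve as the operator in (3.3), because $\|x\|_{\ell_\infty}\le\|x\|_{\ell_2}\le\sqrt n\,\|x\|_{\ell_\infty}$, and the resulting map is $\sqrt n$ times the identity. The sketch below checks both inclusions and the Lipschitz constant on random samples.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5
samples = rng.standard_normal((2000, n))

norm_2 = np.linalg.norm(samples, axis=1)
norm_inf = np.max(np.abs(samples), axis=1)

# B_2 inside B_inf:  ||x||_inf <= ||x||_2
first_inclusion = bool(np.all(norm_inf <= norm_2 + 1e-12))
# B_inf inside sqrt(n) B_2:  ||x||_2 <= sqrt(n) ||x||_inf
second_inclusion = bool(np.all(norm_2 <= np.sqrt(n) * norm_inf + 1e-12))

# psi_bar = sqrt(n) * identity: observed Lipschitz ratio from l2 to l_inf
lipschitz_estimate = np.max(np.sqrt(n) * norm_inf / norm_2)

# the vertex (1, ..., 1) of B_inf is the image of z = (1, ..., 1)/sqrt(n), which has l2 norm 1
vertex_preimage_norm = np.linalg.norm(np.ones(n) / np.sqrt(n))
```

In this example the factor $\sqrt n$ cannot be improved, since the vertex $(1,\dots,1)$ of the sup-norm ball is reached only from the boundary of the Euclidean ball.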

###### Remark 3.4.

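Note that since $\bar\psi$ is linear, we have that $\bar\psi(0)=0$, and for every $z\in B_Z$,

$$ \|\bar\psi(z)\|_Y=\|\bar\psi(z)-\bar\psi(0)\|_Y\le\sqrt n\,\|z\|_H\le\sqrt n, \tag{3.4} $$

where we can replace $\sqrt n$ by $n^{|1/2-1/p|}$ in the case when $X=L_p$.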

###### Lemma 3.5.

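For every $n\ge1$, $N>1$, and every compact set $K\subset X$, subset of a Banach space $X$ with $\sup_{f\in K}\|f\|_X=1$, we have

$$ d_{n+1}^{2(N+1)\sqrt n}(K)_X\le d_n(K,N)_X,\quad\text{and}\quad d_{n+\lceil\log_2N\rceil}^{6\sqrt n}(K)_X\le d_n(K,N)_X. \tag{3.5} $$

When $X=L_p$, we have

$$ d_{n+1}^{2(N+1)n^{|1/2-1/p|}}(K)_{L_p}\le d_n(K,N)_{L_p},\quad\text{and}\quad d_{n+\lceil\log_2N\rceil}^{6n^{|1/2-1/p|}}(K)_{L_p}\le d_n(K,N)_{L_p}. $$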

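Proof: We fix $n$, $N$, and consider the $n$-dimensional linear spaces $X_j\subset X$, $j=0,\dots,N-1$, with $B_{X_j}$ being the unit ball of $X_j$. For a fixed $j$, we apply Lemma 3.3 with $Y=X_j$ and $Z=H=\ell_2(\mathbb R^n)$ to find a Lipschitz mapping $\bar\Psi_j$ with Lipschitz constant at most $\sqrt n$ or $n^{|1/2-1/p|}$, depending on whether $X$ is a general Banach space or $L_p$, such that

$$ \bar\Psi_j:(B_{\ell_2(\mathbb R^n)},\|\cdot\|_{\ell_2(\mathbb R^n)})\to X_j,\quad\text{and}\quad B_{X_j}\subset\bar\Psi_j(B_{\ell_2(\mathbb R^n)}). \tag{3.6} $$

We show (3.5) by proceeding as in the proof of Lemma 3.2 and defining a mapping $\Theta_{n+1}:B_{Y_{n+1}}\to X$ as

$$ \Theta_{n+1}(x,x_{n+1}):=2\sum_{j=0}^{N-1}\psi_j(x_{n+1})\cdot\bar\Psi_j(x), $$

where $B_{Y_{n+1}}$ and $\psi_j$ are as in Lemma 3.2. We fix $(x,x_{n+1})$, $(x',x'_{n+1})\in B_{Y_{n+1}}$, denote by

$$ C:=\|\Theta_{n+1}(x,x_{n+1})-\Theta_{n+1}(x',x'_{n+1})\|_X, $$

and show in a similar way that

• if $x_{n+1},x'_{n+1}\in I_j$ for some $j$,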

 C2 = ∥ψj(xn+1)¯Ψ