    # Inapproximability of Matrix p→ q Norms

We study the problem of computing the p→ q norm of a matrix A ∈ R^m × n, defined as A_p→ q := _x ∈ R^n ∖{0}Ax_q/x_p This problem generalizes the spectral norm of a matrix (p=q=2) and the Grothendieck problem (p=∞, q=1), and has been widely studied in various regimes. When p ≥ q, the problem exhibits a dichotomy: constant factor approximation algorithms are known if 2 ∈ [q,p], and the problem is hard to approximate within almost polynomial factors when 2 ∉ [q,p]. The regime when p < q, known as hypercontractive norms, is particularly significant for various applications but much less well understood. The case with p = 2 and q > 2 was studied by [Barak et al, STOC'12] who gave sub-exponential algorithms for a promise version of the problem (which captures small-set expansion) and also proved hardness of approximation results based on the Exponential Time Hypothesis. However, no NP-hardness of approximation is known for these problems for any p < q. We study the hardness of approximating matrix norms in both the above cases and prove the following results: - We show that for any 1< p < q < ∞ with 2 ∉ [p,q], A_p→ q is hard to approximate within 2^O(^1-ϵn) assuming NP ⊆ BPTIME(2^^O(1)n). This suggests that, similar to the case of p ≥ q, the hypercontractive setting may be qualitatively different when 2 does not lie between p and q. - For all p ≥ q with 2 ∈ [q,p], we show A_p→ q is hard to approximate within any factor than 1/(γ_p^*·γ_q), where for any r, γ_r denotes the r^th norm of a gaussian, and p^* is the dual norm of p.

## Authors

##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

We consider the problem of finding the norm of a given matrix , which is defined as

 ∥A∥p→q := maxx∈Rn∖{0}∥Ax∥q∥x∥p.

The quantity is a natural generalization of the well-studied spectral norm, which corresponds to the case . For general and , this quantity computes the maximum distortion (stretch) of the operator from the normed space to .

The case when and is the well known Grothendieck problem [KN12, Pis12], where the goal is to maximize subject to . In fact, via simple duality arguments (see Section 2), the general problem computing can be seen to be equivalent to the following variant of the Grothendieck problem (and to )

where denote the dual norms of and , satisfying .

#### Hypercontractive norms.

The case when , known as the case of hypercontractive norms, also has a special significance to the analysis of random walks, expansion and related problems in hardness of approximation [Bis11, BBH12]. The problem of computing

is also known to be equivalent to determining the maximum acceptance probability of a quantum protocol with multiple unentangled provers, and is related to several problems in quantum information theory

[HM13, BH15].

Bounds on hypercontractive norms of operators are also used to prove expansion of small sets in graphs. Indeed, if is the indicator function of set of measure in a graph with adjacency matrix , then we have that for any ,

 Φ(S) = 1−⟨f,Af⟩∥f∥22 ≥ 1−∥f∥q∗∥f∥q∗⋅∥Af∥qδ ≥ 1−∥A∥p→q⋅δ1/p−1/q.

It was proved by Barak et al. [BBH12] that the above connection to small-set expansion can in fact be made two-sided for a special case of the norm. They proved by that to resolve the promise version of the small-set expansion (SSE) problem, it suffices to distinguish the cases and , where

is the least non-zero singular value of

and are appropriately chosen constants based on the parameters of the SSE problem. Thus, the approximability of norm is closely related to the small-set expansion problem. In particular, proving the NP-hardness of approximating the norm is (necessarily) an intermediate goal towards proving the Small-Set Expansion Hypothesis of Raghavendra and Steurer [RS10].

However, relatively few results algorithmic and hardness results are known for approximating hypercontractive norms. A result by Steinberg’s [Ste05] gives an upper bound of on the approximation factor, for all . For the case of norm (for any ), Barak et al. [BBH12] give an approximation algorithm for the promise version of the problem described above, running in time . They also provide an additive approximation algorithm for the norm (where the error depends on norm and norm of ), which was extended to the norm by Harrow and Montanaro [HM13]. Barak et al. also prove NP-hardness of approximating within a factor of , and hardness of approximating better than in polynomial time, assuming the Exponential Time Hypothesis (ETH). This reduction was also used by Harrow, Natarajan and Wu [HNW16] to prove that levels of the Sum-of-Squares SDP hierarchy cannot approximate within any constant factor.

It is natural to ask if the bottleneck in proving (constant factor) hardness of approximation for norm arises from the fact from the nature of the domain (the ball) or from hypercontractive nature of the objective. As discussed in Section 1.1, all hypercontractive norms present a barrier for gadget reductions, since if a “true” solution is meant to encode the assignment to a (say) label cover problem with consistency checked via local gadgets, then (for ), a “cheating solution” may make the value of very large by using a sparse which does not carry any meaningful information about the underlying label cover problem.

We show that (somewhat surprisingly, at least for the authors) it is indeed possible to overcome the barrier for gadget reductions for hypercontractive norms, for any (and by duality, for any ). This gives the first NP-hardness result for hypercontractive norms (under randomized reductions). Assuming ETH, this also rules out a constant factor approximation algorithm that runs in for some .

###### Theorem 1.1.

For any such that or and a constant , it is NP-hard to approximate norm within a factor of . The reduction runs in time for , where .

We show that the above hardness can be strengthened to any constant factor via a simple tensoring argument. In fact, this also shows that it is hard to approximate

within almost polynomial factors unless NP is in randomized quasi-polynomial time. This is the content of the following theorem.

###### Theorem 1.2.

For any such that or and , there is no polynomial time algorithm that approximates the norm of an matrix within a factor unless . When is an even integer, the same inapproximability result holds unless

We also note that the operator arising in our reduction in Theorem 1.1 satisfies (and is in fact a product of a carefully chosen projection and a scaled random Gaussian matrix). For such an , we prove the hardness of distinguishing and , for constants . For the corresponding problem in the case of norm, Barak et al. [BBH12] gave a subexponential algorithm running in time (which works for every ). On the other hand, since the running time of our reduction is , we get that assuming ETH, we show that no algorithm can distinguish the above cases for norm in time , for any when .

While the above results give some possible reductions for working with hypercontractive norms, it remains an interesting problem to understand the role of the domain as a barrier to proving hardness results for the norm problems. In fact, no hardness results are available even for the more general problem of polynomial optimization over the ball. We view the above theorem as providing some evidence that while hypercontractive norms have been studied as a single class so far, the case when may be qualitatively different (with respect to techniques) from the case when . This is indeed known to be true in the non-hypercontractive case with . In fact, our results are obtained via new hardness results for the case , as described below.

#### The non-hypercontractive case.

Several results are known in the case when , and we summarize known results for matrix norms in Fig. 1, for the both the hypercontractive and non-hypercontractive cases. While the case of corresponds to the spectral norm, the problem is also easy when (or equivalently ) since this corresponds to selecting the row of with the maximum norm. Note that in general, Fig. 1 is symmetric about the principal diagonal. Also note that if is a hypercontractive norm () then so is the equivalent (the hypercontractive and non-hypercontractive case are separated by the non-principal diagonal). Figure 1: Upper and lower bounds for approximating ∥A∥p→q. Arrows indicate the region to which a boundary belongs and thicker shaded regions represent exact algorithms. Our results are indicated by [∗]. We omit UGC-based hardness results in the figure.

As is apparent from the figure, the problem of approximating for admits good approximations when , and is hard otherwise. For the case when , an upper bound of on the approximation ratio was proved by Steinberg [Ste05]. Bhaskara and Vijayaraghavan [BV11] showed NP-hardness of approximation within any constant factor, and hardness of approximation within an factor for arbitrary assuming .

Determining the right constants in these approximations when has been of considerable interest in the analysis and optimization community. For the case of norm, Grothendieck’s theorem [Gro56] shows that the integrality gap of a semidefinite programming (SDP) relaxation is bounded by a constant, and the (unknown) optimal value is now called the Grothendieck constant . Krivine [Kri77] proved an upper bound of on , and it was later shown by Braverman et al. that is strictly smaller than this bound. The best known lower bound on is about , due to (an unpublished manuscript of) Reeds [Ree91] (see also [KO09] for a proof).

An upper bound of on the approximation factor also follows from the work of Nesterov [Nes98] for any . A later work of Steinberg [Ste05] also gave an upper bound of , where denotes norm of a standard normal random variable (i.e., the -th root of the

-th Gaussian moment). Note that Steinberg’s bound is less than

for some values of , in particular for all values of the form with (and equivalently for ), where it equals (and for ).

On the hardness side, Briët, Regev and Saket [BRS15] showed NP-hardness of for the norm, strengthening a hardness result of Khot and Naor based on the Unique Games Conjecture (UGC) [KN08] (for a special case of the Grothendieck problem when the matrix is positive semidefinite). Assuming UGC, a hardness result matching Reeds’ lower bound was proved by Khot and O’Donnell[KO09], and hardness of approximating within was proved by Raghavendra and Steurer [RS09].

For a related problem known as the -Grothendieck problem, where the goal is to maximize for , results by Steinberg [Ste05] and Kindler, Schechtman and Naor [KNS10] give an upper bound of , and a matching lower bound was proved assuming UGC by [KNS10], which was strengthened to NP-hardness by Guruswami et al. [GRSW16]. However, note that this problem is quadratic and not necessarily bilinear, and is in general much harder than the Grothendieck problems considered here. In particular, the case of only admits an approximation instead of for the bilinear version [AMMN06, ABH05].

We extend the hardness results of [BRS15] for the and norms of a matrix to any . The hardness factors obtained match the performance of known algorithms (due to Steinberg [Ste05]) for the cases of and .

###### Theorem 1.3.

For any such that and , it is NP-hard to approximate the norm within a factor .

In subsequent work [BGG18] motivated by the hardness results herein, we also give an improved approximation for norm when (inspired by the above hardness result) which achieves an approximation factor of , where is a constant comparable to that arising in Krivine’s upper bound on the Grothendieck constant [Kri77].

Both Theorem 1.1 and Theorem 1.3 are consequences of a more technical theorem, which proves hardness of approximating for (and hence for ) while providing additional structure in the matrix produced by the reduction. This is proved in Section 3. We also show our methods can be used to provide a simple proof (albeit via randomized reductions) of the hardness for the non-hypercontractive case when , which was proved by [BV11]. This is presented in Section 4.5.

### 1.1 Proof Overview

#### The hardness of proving hardness for hypercontractive norms.

Reductions for various geometric problems use a “smooth” version of the Label Cover problem, composed with long-code functions for the labels of the variables. In various reductions, including the ones by Guruswami et al. [GRSW16] and Briët et al. [BRS15]

(which we closely follow) the solution vector

to the geometric problem consists of the Fourier coefficients of the various long-code functions, with a “block” for each vertex of the label-cover instance. The relevant geometric operation (transformation by the matrix in our case) consists of projecting to a space which enforces the consistency constraints derived from the label-cover problem, on the Fourier coefficients of the encodings.

However, this strategy presents with two problems when designing reductions for hypercontractive norms. Firstly, while projections maintain the norm of encodings corresponding to consistent labelings and reduce that of inconsistent ones, their behaviour is harder to analyze for norms for . Secondly, the global objective of maximizing is required to enforce different behavior within the blocks , than in the full vector . The block vectors in the solution corresponding to a satisfying assignment of label cover are intended to be highly sparse, since they correspond to “dictator functions” which have only one non-zero Fourier coefficient. This can be enforced in a test using the fact that for a vector , is a convex function of when , and is maximized for vectors with all the mass concentrated in a single coordinate. However, a global objective function which tries to maximize , also achieves a high value from global vectors which concentrate all the mass on coordinates corresponding to few vertices of the label cover instance, and do not carry any meaningful information about assignments to the underlying label cover problem.

Since we can only check for a global objective which is the norm of some vector involving coordinates from blocks across the entire instance, it is not clear how to enforce local Fourier concentration (dictator functions for individual long codes) and global well-distribution (meaningful information regarding assignments of most vertices) using the same objective function. While the projector also enforces a linear relation between the block vectors and for all edges in the label cover instance, using this to ensure well-distribution across blocks seems to require a very high density of constraints in the label cover instance, and no hardness results are available in this regime.

#### Our reduction.

We show that when , it is possible to bypass the above issues using hardness of as an intermediate (for ). Note that since is a concave function of in this case, the test favors vectors in which the mass is well-distributed and thus solves the second issue. For this, we use local tests based on the Berry-Esséen theorem (as in [GRSW16] and [BRS15]). Also, since the starting point now is the norm, the effect of projections is easier to analyze. This reduction is discussed in Section 3.

By duality, we can interpret the above as a hardness result for when (using ). We then convert this to a hardness result for norm in the hypercontractive case by composing with an “approximate isometry” from (i.e., ) since we can replace with . Milman’s version of the Dvoretzky theorem [Ver17] implies random operators to a sufficiently high dimensional () space satisfy this property, which then yields constant factor hardness results for the norm. A similar application of Dvoretzky’s theorem also appears in an independent work of Krishnan et al. [KMW18] on sketching matrix norms.

We also show that the hardness for hypercontractive norms can be amplified via tensoring. This was known previously for the norm using an argument based on parallel repetition for QMA [HM13], and for the case of [BV11]. We give a simple argument based on convexity, which proves this for all , but appears to have gone unnoticed previously. The amplification is then used to prove hardness of approximation within almost polynomial factors.

#### Non-hypercontractive norms.

We also use the hardness of to obtain hardness for the non-hypercontractive case of with , by using an operator that “factorizes” through . In particular, we obtain hardness results for and (of factors and respectively) using the reduction in Section 3. We then combine these hardness results using additional properties of the operator obtained in the reduction, to obtain a hardness of factor for the norm for . The composition, as well as the hardness results for hypercontractive norms, are presented in Section 4.

We also obtain a simple proof of the hardness for the non-hypercontractive case when (already proved by Bhaskara and Vijayaraghavan [BV11]) via an approximate isometry argument as used in the hypercontractive case. In the hypercontractive case, we started from a constant factor hardness of the norm and the same factor for norm using the fact that for a random Gaussian matrix of appropriate dimensions, we have for all . We then amplify the hardness via tensoring. In the non-hypercontractive case, we start with a hardness for norm (obtained via the above isometry), which we first amplify via tensoring. We then apply another approximate isometry result due to Schechtman [Sch87], which gives a samplable distribution over random matrices such that with high probability over , we have for all .

We thus view the above results as showing that combined with a basic hardness for norm, the basic ideas of duality, tensoring, and embedding (which builds on powerful results from functional analysis) can be combined in powerful ways to prove strong results in both the hypercontractive and non-hypercontractive regimes.

## 2 Preliminaries and Notation

### 2.1 Matrix Norms

For a vector , throughout this paper we will use to denote its -th coordinate. For , we define to denote the counting -norm and to denote the expectation -norm; i.e., for a vector ,

 ∥x∥ℓp:=⎛⎝∑i∈[n]|x(i)|p⎞⎠1/p and ∥x∥Lp:=Ei∼[n][|x(i)|p]1/p=⎛⎝1n⋅∑i∈[n]|x(i)|p⎞⎠1/p.

Clearly . For , we define . We will use to denote the ‘dual’ of , i.e. . Unless stated otherwise, we usually work with . We also define inner product to denote the inner product under the counting measure unless stated otherwise; i.e., for two vectors , .

We next record a well-known fact about -norms that is used in establishing many duality statements.

###### Observation 2.1.

For any , .

We next define the primary problems of interest in this paper.

###### Definition 2.2.

For , the norm problem is to maximize

 ∥Ax∥ℓq∥x∥ℓp

given an matrix .

###### Definition 2.3.

For , we define a generalization of the Grothendieck problem, namely -Grothendieck, as the problem of computing

 sup∥y∥ℓp=1sup∥x∥ℓq=1⟨y,Ax⟩

given an matrix .

The original Grothendieck problem is precisely -Grothendieck. We next state the well known equivalence of norm, -Grothendieck, and norm.

###### Observation 2.4.

For any and any matrix ,

 ∥A∥ℓp→ℓq=sup∥y∥ℓq∗=1sup∥x∥ℓp=1⟨y,Ax⟩=∥AT∥ℓq∗→ℓp∗.
###### Proof.

Using ,

 ∥A∥ℓp→ℓq =sup∥x∥ℓp=1∥Ax∥ℓq=sup∥x∥ℓp=1sup∥y∥ℓq∗=1⟨y,Ax⟩=sup∥y∥ℓq∗=1sup∥x∥ℓp=1⟨y,Ax⟩ =sup∥x∥ℓp=1sup∥y∥ℓq∗=1⟨x,ATy⟩=sup∥y∥ℓq∗=1∥ATy∥ℓp∗=∥AT∥ℓq∗→ℓp∗.

The following observation will be useful for composing hardness maps for norm and norm to get norm hardness for when and .

###### Observation 2.5.

For any and any matrices ,

 ∥BC∥ℓp→ℓq=supx∥BCx∥ℓq∥x∥ℓp≤supx∥B∥ℓr→ℓq∥Cx∥ℓr∥Cx∥ℓr∥x∥ℓp≤∥B∥ℓr→ℓq∥C∥ℓp→ℓr.

### 2.2 Fourier Analysis

We introduce some basic facts about Fourier analysis of Boolean functions. Let be a positive integer, and consider a function . For any subset let . Then we can represent as

 f(x1,…,xR)=∑S⊆[R]ˆf(S)⋅χS(x1,…xR), (1)

where

 ˆf(S)=Ex∈{±1}R[f(x)⋅χS(x)] % for all S⊆[R]. (2)

The Fourier transform refers to a linear operator that maps to as defined as (2). We interpret as a -dimensional vector whose coordinates are indexed by . Endow the expectation norm and the expectation norm to and respectively; i.e.,

 ∥f∥Lp:=(Ex∈{±1}R[|f(x)|p])1/p and ∥ˆf∥ℓp:=⎛⎝∑S⊆[R]|ˆf(S)|p⎞⎠1/p.

as well as the corresponding inner products and consistent with their -norms. We also define the inverse Fourier transform to be a linear operator that maps a given to defined as in (1). We state the following well-known facts from Fourier analysis.

For any , .

###### Observation 2.7.

and form an adjoint pair; i.e., for any and ,

 ⟨ˆg,Ff⟩=⟨FTˆg,f⟩.
###### Observation 2.8.

is the identity operator.

In Section 3, we also consider a partial Fourier transform that maps a given function to a vector defined as for all . It is the original Fourier transform where is further projected to coordinates corresponding to linear coefficients. The partial inverse Fourier transform is a transformation that maps a vector to a function as in (1) restricted to for some . These partial transforms satisfy similar observations as above: (1) , (2) , (3) and form an adjoint pair, and (4) if and only if is a linear function.

### 2.3 Smooth Label Cover

An instance of Label Cover is given by a quadruple that consists of a regular connected graph , a label set for some positive integer , and a collection of pairs of maps both from to associated with the endpoints of the edges in . Given a labeling , we say that an edge is satisfied if . Let be the maximum fraction of satisfied edges by any labeling.

The following hardness result for Label Cover, given in [GRSW16], is a slight variant of the original construction due to [Kho02]. The theorem also describes the various structural properties, including smoothness, that are identified by the hard instances.

###### Theorem 2.9.

For any and , there exist positive integers and , and a Label Cover instance as above such that

• (Hardness): It is NP-hard to distinguish between the following two cases:

• (Completeness): .

• (Soundness): .

• (Structural Properties):

• (-Smoothness): For every vertex and distinct , we have

 Pe:v∈e[πe,v(i)=πe,v(j)]≤1/J.
• (-to-): For every vertex , edge incident on , and , we have ; that is at most elements in are mapped to the same element in .

• (Weak Expansion): For any and vertex set such that , the number of edges among the vertices in is at least .

## 3 Hardness of 2→r norm with r<2

This section proves the following theorem that serves as a starting point of our hardness results. The theorem is stated for the expectation norm for consistency with the current literature, but the same statement holds for the counting norm, since if is an matrix, . Note that the matrix used in the reduction below does not depend on .

###### Theorem 3.1.

For any , there is a polynomial time reduction that takes a 3-CNF formula and produces a symmetric matrix with such that

• (Completeness) If is satisfiable, there exists with for all and . In particular, for all .

• (Soundness) for all .

We adapt the proof by Briët, Regev and Saket for the hardness of and norms to prove the above theorem. A small difference is that, unlike their construction which starts with a Fourier encoding of the long-code functions, we start with an evaluation table (to ensure that the resulting matrices are symmetric). We also analyze their dictatorship tests for the case of fractional .

### 3.1 Reduction and Completeness

Let be an instance of Label Cover with . In the rest of this section, and our reduction will construct a self-adjoint linear operator with , which yields a symmetric matrix representing in the standard basis. This section concerns the following four Hilbert spaces based on the standard Fourier analysis composed with .

1. Evaluation space . Each function in this space is denoted by . The inner product is defined as , which induces . We also define in this space.

2. Fourier space . Each function in this space is denoted by . The inner product is defined as , which induces .

3. Combined evaluation space . Each function in this space is denoted by . The inner product is defined as , which induces . We also define in this space.

4. Combined Fourier space . Each function in this space is denoted by . The inner product is defined as , which induces , which is neither a counting nor an expectation norm.

Note that and a vertex induces defined by , and similarly and a vertex induces defined by . As defined in Section 2.2, we use the standard following (partial) Fourier transform that maps to as follows. 111We use only linear Fourier coefficients in this work. was defined as in Section 2.2.

 ˆf(i)=(Ff)(i):=Ex∈{±1}R[xif(x)]. (3)

The (partial) inverse Fourier transform that maps to is defined by

 f(x)=(FTˆf)(x):=∑i∈[R]xiˆf(i). (4)

This Fourier transform can be naturally extended to combined spaces by defining as for all . Then maps to as for all .

Finally, let be the orthogonal projector to the following subspace of the combined Fourier space:

 ˆL:=⎧⎪⎨⎪⎩ˆf∈RV×R:∑j∈π−1e,u(i)ˆfu(i)=∑j∈π−1e,v(i)ˆfv(j) for all (u,v)∈E and i∈[L]⎫⎪⎬⎪⎭. (5)

Our transformation is defined by

 A:=(FT)ˆPF. (6)

In other words, given , we apply the Fourier transform for each , project the combined Fourier coefficients to that checks the Label Cover consistency, and apply the inverse Fourier transform. Since is a projector, is self-adjoint by design.

We also note that a similar reduction that produces was used in Guruswami et al. [GRSW16] and Briët et al. [BRS15] for subspace approximation and Grothendieck-type problems, and indeed this reduction suffices for Theorem 3.1 except the self-adjointness and additional properties in the completeness case.

#### Completeness.

We prove the following lemma for the completeness case. A simple intuition is that if admits a good labeling, we can construct a such that each is a linear function and is already in the subspace . Therefore, each of Fourier transform, projection to , and inverse Fourier transform does not really change .

###### Lemma 3.2 (Completeness).

Let be a labeling that satisfies every edge of . There exists a function such that is either or for all and .

###### Proof.

Let for every . Consider . For each vertex , if and otherwise. Since satisfies every edge of , and . Finally, since each is a linear function, the partial inverse Fourier transform satisfies , which implies that . Therefore, .

### 3.2 Soundness

We prove the following soundness lemma. This finishes the proof of Theorem 3.1 since Theorem 2.9 guarantees NP-hardness of Label Cover for arbitrarily small and arbitrarily large .

###### Lemma 3.3 (Soundness).

For every , there exist (that determines as in Theorem 2.9) and such that if , is -to-, and is -smooth, for every .

###### Proof.

Let be an arbitrary vector such that . Let , , and so that . By Parseval’s theorem, for all and . Since is an orthogonal projection, . Fix and suppose

 ∥g∥rLr=Ev∈V[∥gv∥rLr]≥γrr+4ε2−r. (7)

Use Lemma A.2 to obtain such that implies for all (so that does not depend on ), and consider

 V0:={v∈V:∥ˆgv∥ℓ4>δε and ∥ˆgv∥ℓ2≤1/ε}. (8)

We prove the following lemma that lower bounds the size of .

###### Lemma 3.4.

For defined as in (8), we have .

###### Proof.

The proof closely follows the proof of Lemma 3.4 of [BRS15]. Define the sets

 V1 ={v∈V:∥ˆgv∥ℓ4≤δε and ∥ˆgv∥ℓ2<ε}, V2 ={v∈V:∥ˆgv∥ℓ4≤δε and ∥ˆgv∥ℓ2≥ε}, V3 ={v∈V:∥ˆgv∥ℓ2>1/ε}.

From (7), we have

 ∑v∈V0∥gv∥rLr+∑v∈V1∥gv∥rLr+∑v∈V2∥gv∥rLr+∑v∈V3∥gv∥rLr≥(γrr+4ε2−r)|V|. (9)

We bound the four sums on the left side of (9) individually. Parseval’s theorem and the fact that implies , and since for every , the first sum in (9) can be bounded by