# Improved Concentration Bounds for Gaussian Quadratic Forms

For a wide class of monotonic functions $f$, we develop a Chernoff-style concentration inequality for quadratic forms $Q_f \sim \sum_{i=1}^n f(\eta_i)(Z_i+\delta_i)^2$, where $Z_i \sim N(0,1)$. The inequality is expressed in terms of traces that are fast to compute, making it useful for bounding p-values in high-dimensional screening applications. The resulting bounds are significantly tighter than those previously available, which we illustrate with numerical examples.

## 1 Introduction and Background

We consider the problem of finding an upper bound for the cumulative distribution function (cdf) of random variables of the form $Q_f = \sum_{i=1}^n f(\eta_i)(Z_i+\delta_i)^2$, where the $Z_i \sim N(0,1)$ are independent, and $\eta_i$ and $\delta_i$ are deterministic scalars. Many applications lead to this form with $\eta_1,\dots,\eta_n$ being the eigenvalues of a symmetric matrix $M$; for example, a quadratic form $X^\top f(M) X$ where $X \sim N(\mu, I)$ and $f(M)$ represents $f$ applied to the eigenvalues of $M$. As described in Christ (2017) and Christ et al. (2020), results of this kind can be generalized to cases where $M$ is asymmetric with careful treatment of .

$Q_f$ arises as the limiting distribution of test statistics used in a wide range of applications. These statistics include the Hilbert-Schmidt Independence Criterion used for high-dimensional independence testing (Gretton et al., 2005; Zhang et al., 2018), score statistics for linear and generalized linear mixed models commonly used in genomics (Lin, 1997; Wu et al., 2011), and the goodness-of-fit statistic proposed by Peña and Rodríguez (2002) for ARMA models in time series analysis. It is easy to see that $Q_f$ has mean

$$E(Q_f) = \sum_{i=1}^n f(\eta_i)\,\delta_i^2 + \sum_{i=1}^n f(\eta_i)$$

and variance

$$\mathrm{Var}(Q_f) = 2\left(2\sum_{i=1}^n f(\eta_i)^2\,\delta_i^2 + \sum_{i=1}^n f(\eta_i)^2\right).$$
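The two moments can be checked by simulation. The following is a minimal sketch rather than part of the original derivation: it uses the standard fact that $(Z_i+\delta_i)^2$ is noncentral chi-square with one degree of freedom (mean $1+\delta_i^2$, variance $2+4\delta_i^2$), and the values of `eta` and `delta` and the helper name `qf_moments` are hypothetical.

```python
import numpy as np

def qf_moments(f_eta, delta):
    """Mean and variance of Q_f = sum_i f(eta_i) * (Z_i + delta_i)^2, Z_i iid N(0, 1)."""
    f_eta = np.asarray(f_eta, dtype=float)
    delta = np.asarray(delta, dtype=float)
    # (Z + delta)^2 has mean 1 + delta^2 and variance 2 + 4 * delta^2.
    mean = np.sum(f_eta * (1.0 + delta**2))
    var = np.sum(f_eta**2 * (2.0 + 4.0 * delta**2))
    return mean, var

rng = np.random.default_rng(0)
eta = np.array([1.5, -0.7, 0.3, 0.1])     # hypothetical eta_i
delta = np.array([0.5, 0.0, -1.0, 2.0])   # hypothetical delta_i
f_eta = eta                               # identity f, so f(eta_i) = eta_i

# Monte Carlo draws of Q_f for comparison with the closed-form moments.
z = rng.standard_normal((200_000, eta.size))
samples = ((z + delta) ** 2) @ f_eta

mean, var = qf_moments(f_eta, delta)
print("mean:", mean, "simulated:", samples.mean())
print("variance:", var, "simulated:", samples.var())
```

The simulated and closed-form values should agree up to Monte Carlo error, which shrinks as the number of draws grows.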

Christ et al. (2020) established a concentration inequality that bounds the tails of $Q_f$ and yields a set of bounds for different functions $f$. Their results show that polynomial bounds exist, but do not construct them explicitly. We provide here explicit optimal coefficients for bounds of this form in the single-spectrum case. This earlier work yielded the following bound on $Q$ (by which we designate the base version of $Q_f$, where $f$ is the identity function):

###### Theorem 1 (see p.75 in Christ (2017)).

Let and be a real, symmetric matrix. Let . Let and let , where are the eigenvalues of . Then, for all ,

Similarly, for all ,

 P(Q

The proof of this result relies on a Chernoff-style bound involving the cumulant generating function (cgf) of $Q$, which has two main types of terms:

$$L_1(x) = -\tfrac{1}{2}\log(1-2x) \quad\text{and}\quad L_2(x) = \frac{x}{1-2x}.$$

Each of these is bounded by a quadratic function, leading to an overall bound in terms of easily computable coefficients. We improve on this previous work by constructing a family of quadratics that yield pointwise tighter bounds on $L_1$ and $L_2$. We then show how these can be incorporated into an optimisation step to yield tighter bounds on the tails of $Q_f$.

In Section 2 we present our main results. First we present Lemma 2, which tightens the quadratic bounds above from Christ (2017). From this lemma, we derive the corresponding improved bounds on the tails of $Q_f$ in Theorem 3. Specialisations of these results to particular functions then follow in corollaries. In Section 3, we empirically demonstrate the improvement provided by these bounds with an application to a simulated matrix with an exponentially decaying spectrum. Section 4 concludes with discussion of potential future improvements. Proofs for the main results are presented in Section 5.

## 2 Main Results

Our results depend upon elementary upper bounds on $L_1$ and $L_2$ in the form of parabolas passing through the origin. We describe the coefficients of these parabolas in terms of the width $L$ of the (symmetric) interval on which the bounds are to be applied, and on the parameter $t$ that arises from the cgf. We exploit two openings for improvement: optimising the coefficients of the parabola and optimising the width of the scaled domain over which it bounds $L_1$ and $L_2$.

###### Lemma 2.

Let be a monotonic increasing function such that . Let be a fixed positive real number, and , where

$$t^\star = \min\left\{\left|\frac{1}{2f(L)}\right|,\ \left|\frac{1}{2f(-L)}\right|\right\}.$$

Furthermore, suppose that over the region the following inequalities are satisfied for both and :

$$x\left(\frac{\partial_x L(tf(x))}{2} + t f'(0)\right) \ge L(tf(x)), \tag{1}$$

$$\frac{L(tf(x)) - L(tf(-x))}{2x} \ge t f'(0). \tag{2}$$

For each define

$$\alpha_f(L,t) = \frac{L_1(tf(L))}{L^2} - \frac{t f'(0)}{L}, \qquad \beta_f(L,t) = \frac{L_2(tf(L))}{L^2} - \frac{t f'(0)}{L}, \qquad \gamma_f(t) = t f'(0).$$

Then for each , among all quadratic functions $ax^2+bx$ that maintain $g_{1t}(x) \le 0$ over the whole region , where

$$g_{1t}(x) := L_1(tf(x)) - (ax^2 + bx),$$

the difference is minimised at every point by the choice $a = \alpha_f(L,t)$ and $b = \gamma_f(t)$; and among those that maintain $g_{2t}(x) \le 0$ over the whole region , where

$$g_{2t}(x) := L_2(tf(x)) - (ax^2 + bx),$$

the difference is minimised at every point by the choice $a = \beta_f(L,t)$ and $b = \gamma_f(t)$.
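To make the lemma concrete, the sketch below evaluates the three coefficients for the identity function and checks on a grid that the resulting parabolas lie above $L_1(tf(x))$ and $L_2(tf(x))$ on $[-L, L]$. It is an illustration under assumptions, not the authors' code: the coefficients are taken at the endpoint $x = L$, the region is assumed to be $[-L, L]$, and the values `L = 2.0` and `t = 0.2` and the helper name `lemma2_coefficients` are hypothetical.

```python
import numpy as np

def L1(x):
    # First cgf term: -log(1 - 2x) / 2
    return -0.5 * np.log1p(-2.0 * x)

def L2(x):
    # Second cgf term: x / (1 - 2x)
    return x / (1.0 - 2.0 * x)

def lemma2_coefficients(f, fprime0, L, t):
    """alpha_f(L, t), beta_f(L, t), gamma_f(t), evaluated at the endpoint x = L."""
    gamma = t * fprime0
    alpha = L1(t * f(L)) / L**2 - gamma / L
    beta = L2(t * f(L)) / L**2 - gamma / L
    return alpha, beta, gamma

f = lambda x: x        # identity f, so f'(0) = 1
L, t = 2.0, 0.2        # hypothetical values; requires t < 1 / (2 f(L)) = 0.25
alpha, beta, gamma = lemma2_coefficients(f, 1.0, L, t)

# Slack of each parabola over its target on a grid covering [-L, L].
x = np.linspace(-L, L, 2001)
gap1 = alpha * x**2 + gamma * x - L1(t * f(x))
gap2 = beta * x**2 + gamma * x - L2(t * f(x))
print("alpha, beta, gamma:", alpha, beta, gamma)
print("smallest slack for L1 and L2:", gap1.min(), gap2.min())
```

Both slacks are non-negative up to rounding and vanish at $x = 0$ and $x = L$, which is the sense in which the parabola touches the target at the right endpoint.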

This lemma will allow us to build on the existing result from Christ (2017). In the original form of this theorem was restricted so that , avoiding the asymptote at . We remove this boundary at 1/4 and allow to get arbitrarily close to . We also reinterpret , so that it now defines the domain of rather than that of . It also means that for every endpoint along the interval we can obtain optimal coefficients on our quadratic bounds. This yields a new bound on the tails of as follows.

###### Theorem 3.

Let where , and let be set to . Suppose satisfies the conditions in Lemma 2. Then for all ,

$$P(Q_f > q) \le \min_{t\in(0,\,1/(2d))}\left[\exp\!\left(\nu_f(t)/2 - (q-\xi)t\right)\right], \tag{3}$$

where , and

$$\nu_f(t) = 2\left(\beta_f(L,t)\sum_{i=1}^n \eta_i^2\delta_i^2 + \alpha_f(L,t)\sum_{i=1}^n \eta_i^2\right). \tag{4}$$

Furthermore, for all ,

 P(Qf

In the central use case, where arises as , we can apply Theorem 3, where and

$$\nu_f(t) = 2\left(\beta_f(L,t)\,\|M\mu\|_2^2 + \alpha_f(L,t)\,\|M\|_{HS}^2\right).$$

This allows us to quickly compute tight tail bounds on . In the following corollaries we address special cases of .
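The computation for a matrix quadratic form can be sketched as follows, for the identity function only. Several choices here are assumptions made for illustration because they are not recoverable from the text above: $d$ and $L$ are both taken to be the largest absolute eigenvalue of $M$, the centring term is taken to be $\xi = \mu^\top M\mu + \mathrm{tr}(M)$, and the minimisation over $t$ is done on a plain grid. The function name `upper_tail_bound` and the random test matrix are hypothetical.

```python
import numpy as np

def L1(x):
    return -0.5 * np.log1p(-2.0 * x)

def L2(x):
    return x / (1.0 - 2.0 * x)

def upper_tail_bound(M, mu, q, n_grid=2000):
    """Chernoff-style bound on P(X^T M X > q) for X ~ N(mu, I), identity f."""
    d = np.linalg.norm(M, 2)              # largest |eigenvalue| of symmetric M (assumed d = L)
    xi = mu @ M @ mu + np.trace(M)        # assumed centring term, equal to E[X^T M X]
    m_mu_sq = np.sum((M @ mu) ** 2)       # ||M mu||_2^2
    hs_sq = np.sum(M * M)                 # ||M||_HS^2 = tr(M^2)
    t = np.linspace(0.0, 0.5 / d, n_grid + 2)[1:-1]   # grid on the open interval (0, 1/(2d))
    alpha = L1(t * d) / d**2 - t / d      # Lemma 2 coefficients with f'(0) = 1
    beta = L2(t * d) / d**2 - t / d
    nu = 2.0 * (beta * m_mu_sq + alpha * hs_sq)
    return float(min(1.0, np.min(np.exp(nu / 2.0 - (q - xi) * t))))

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 5))
M = (A + A.T) / 2.0                       # hypothetical symmetric matrix
mu = rng.standard_normal(5)

# Monte Carlo reference: simulate X^T M X and compare at the empirical 99% quantile.
X = mu + rng.standard_normal((200_000, 5))
Q = np.einsum("ij,jk,ik->i", X, M, X)
q = np.quantile(Q, 0.99)
print("bound:", upper_tail_bound(M, mu, q), "simulated tail:", np.mean(Q > q))
```

Only matrix-vector products, a trace, and a spectral norm are needed, so no full eigendecomposition of $M$ is required; a finer grid or a scalar optimiser could replace the grid search.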

###### Corollary 4.

Let . Then the cdf of is bounded as in equations (3) and (5) where in equation (4) we set and .

Proof: Since , the from Lemma 2 is equal to . The conditions (1) and (2) may be written in terms of the variable $z = tx$, and these inequalities then need to hold for . The two conditions for $L_1$ become

$$\frac{z}{1-2z} + 2z \ge -\log(1-2z) \quad\text{and}\quad -\log(1-2z) + \log(1+2z) \ge 4z,$$

while the two conditions for $L_2$ become

$$\frac{z}{(1-2z)^2} + 2z \ge \frac{2z}{1-2z} \quad\text{and}\quad \frac{z}{1-2z} + \frac{z}{1+2z} \ge 2z.$$

All of these inequalities hold for , and so Lemma 2 holds where $f$ is the identity function. The result follows by application of Theorem 3. ∎
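The four inequalities in this proof are easy to check numerically. The sketch below reads them as $z/(1-2z) + 2z \ge -\log(1-2z)$, $-\log(1-2z) + \log(1+2z) \ge 4z$, $z/(1-2z)^2 + 2z \ge 2z/(1-2z)$, and $z/(1-2z) + z/(1+2z) \ge 2z$, and evaluates them on a grid. The range $0 \le z < 1/2$ is an assumption, since the range stated in the proof is not recoverable here.

```python
import numpy as np

z = np.linspace(0.0, 0.499, 500)   # assumed range 0 <= z < 1/2

# Slack (left side minus right side) of each condition; non-negative means it holds.
slack = {
    "(1) for L1": z / (1 - 2 * z) + 2 * z + np.log(1 - 2 * z),
    "(2) for L1": -np.log(1 - 2 * z) + np.log(1 + 2 * z) - 4 * z,
    "(1) for L2": z / (1 - 2 * z) ** 2 + 2 * z - 2 * z / (1 - 2 * z),
    "(2) for L2": z / (1 - 2 * z) + z / (1 + 2 * z) - 2 * z,
}
for name, values in slack.items():
    print(name, "holds on grid:", bool(values.min() >= -1e-12))
```

A grid check is only a sanity test and does not replace the analytic argument.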

###### Corollary 5.

Let for some positive integer . Then the cdf of is bounded as in equations (3) and (5) where in equation (4),

$$\alpha_f(L,t) = \frac{L_1(tL^p)}{L^2} \quad\text{and}\quad \beta_f(L,t) = \frac{tL^{p-2}}{1-2tL^p}.$$

Proof: Since , the from Lemma 2 is equal to . We introduce the variable $z = tx^p$ and note that our original region, and , corresponds to .
Substituting the definitions of $L_1$ and $L_2$ into condition (1) yields

$$\frac{pz}{1-2z} \ge -\log(1-2z), \qquad \frac{pz}{(1-2z)^2} \ge \frac{2z}{1-2z}.$$

Condition (2) is trivial for even $p$, while for odd $p$ it becomes

$$-\log(1-2z) + \log(1+2z) \ge 0, \qquad \frac{z}{1-2z} + \frac{z}{1+2z} \ge 2z.$$

All of these inequalities hold for and , so Lemma 2 holds for . The result follows by application of Theorem 3. ∎
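As a worked substitution, the closed-form coefficients of Corollary 5 can be recovered from the general coefficients of Lemma 2 evaluated at the endpoint $x = L$. The step below assumes $p \ge 2$, so that $f(x) = x^p$ has $f'(0) = 0$ and the linear term vanishes:

$$
\begin{aligned}
\alpha_f(L,t) &= \frac{L_1(tf(L))}{L^2} - \frac{t f'(0)}{L} = \frac{L_1(tL^p)}{L^2},\\
\beta_f(L,t) &= \frac{L_2(tf(L))}{L^2} - \frac{t f'(0)}{L} = \frac{1}{L^2}\cdot\frac{tL^p}{1-2tL^p} = \frac{tL^{p-2}}{1-2tL^p}.
\end{aligned}
$$

For $p = 1$ the linear term does not vanish and the coefficients of Lemma 2 with $f'(0) = 1$ apply instead.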

With essentially the same proof used for Corollary 5, we can formulate the result of Theorem 3 for matrix powers. Note that in the following case, .

###### Corollary 6.

For any positive integer , for each

$$P(X^\top M^p X > q) \le \min_{t\in(0,\,1/(2d))} e^{-qt + \nu_f(t)/2}, \tag{6}$$

and for

 P(X⊤MpX

where is defined in (4) and , .

## 3 Examples

Here we compare the bounds in Corollary 4 and Corollary 6 to the bounds provided in Christ (2017) and Christ et al. (2020) for different matrix powers $p$. For this comparison, we simulated a matrix with an exponentially decaying spectrum of eigenvalues, a case which is relatively common in applications. See Figure 3.1.

Note that we have plotted the logarithm (base 10) of the true probability on the axis, and the error in the bounds on the axis. Thus, using the solid red line in Figure 3.1, if the true tail probability of is (), then our new bound for would be approximately of the order .

Particularly of note is that while our bounds show an improvement for all functions satisfying the assumptions of Lemma 2, the improvement is much greater for even functions. This is because our bounds are quadratic, so they must yield the same error bound on both sides of the real line for even functions; however, when bounding an odd function, our bounds will be tight by construction for but may be much looser for . As expected, our bounds perform worse for higher powers , which is effectively a result of attempting to control the higher-order behavior of the matrix given traces that measure the empirical mean and variance of the matrix elements.
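A comparison of this kind can be sketched in a few lines. The example below is illustrative only and does not reproduce Figure 3.1: the decay rate of the spectrum, the choice $p = 2$, the central case $\delta_i = 0$, the upper limit $1/(2L^p)$ for $t$ with $L = \max_i |\eta_i|$, and the grid search over $t$ are all assumptions, and the "true" probabilities are Monte Carlo estimates. The function name `power_tail_bound` is hypothetical.

```python
import numpy as np

def L1(x):
    return -0.5 * np.log1p(-2.0 * x)

def power_tail_bound(eta, delta, p, q, n_grid=2000):
    """Bound on P(sum_i eta_i^p (Z_i + delta_i)^2 > q) for an integer p >= 2."""
    L = np.max(np.abs(eta))
    t = np.linspace(0.0, 0.5 / L**p, n_grid + 2)[1:-1]   # keeps 1 - 2 t L^p > 0
    alpha = L1(t * L**p) / L**2                          # Corollary 5 coefficients
    beta = t * L ** (p - 2) / (1.0 - 2.0 * t * L**p)
    nu = 2.0 * (beta * np.sum(eta**2 * delta**2) + alpha * np.sum(eta**2))
    return float(min(1.0, np.min(np.exp(nu / 2.0 - q * t))))

rng = np.random.default_rng(2)
n, p = 50, 2
eta = np.exp(-np.arange(n) / 5.0)     # hypothetical exponentially decaying spectrum
delta = np.zeros(n)                   # central case for simplicity

# Monte Carlo draws of the quadratic form with f(eta_i) = eta_i^p.
Z = rng.standard_normal((200_000, n))
Q = ((Z + delta) ** 2) @ eta**p

for level in (0.9, 0.99, 0.999):
    q = np.quantile(Q, level)
    bound = power_tail_bound(eta, delta, p, q)
    print("log10 simulated tail:", np.log10(np.mean(Q > q)), "log10 bound:", np.log10(bound))
```

The gap between the two printed columns is the error in the bound on the log scale at each tail level.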

## 4 Conclusions

We have placed tighter bounds than were previously available on the tails of $Q_f$. Our bounds are not available in explicit form because we optimise over two parameters that previous results set arbitrarily; as a consequence, they are at least as good as the earlier bounds, which is borne out in practice. We further observe that they tend to be significantly tighter, and that the improvement over the old bounds grows further out into the tails.

Although our results do give a significantly tighter bound on the tails of $Q_f$, they only work for a specific class of $f$ satisfying the conditions of Lemma 2, which notably excludes functions such as . Future developments could improve on this; one possible way would be to introduce an intercept into our quadratic bounds for $L_1$ and $L_2$, which would maintain the ease of computability while extending it to a wider range of $f$. A further improvement may be achieved by modifying Lemma 2 to account for the asymmetry on vs. . Treating each side of the real line separately could enable one to use both the smallest and largest eigenvalue, rather than just .

Though outside the scope of this paper, it would be possible to achieve similar bounds for sub-Gaussian random variables. This would provide tighter results than currently exist in those cases if the Hanson–Wright inequality argument of Rudelson et al. (2013) were reworked in terms of explicit constants.

## 5 Proofs of Main Results

Proof of Lemma 2

In the special case we simply have that , , , , and are all , so the Lemma clearly holds. We assume now .

Since , the choice of is fixed by the need to make 0 a critical point for both of these functions. It remains only to consider the choice of .

Consider being either or . Write , where . Since is fixed, the quadratic functions are strictly increasing in at every point. For define

$$a_x := \frac{L(tf(x))}{x^2} - \frac{L(tf)'(0)}{x}.$$

Then is the minimum such that , and the optimum that we are looking for is . We have

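$$\frac{\mathrm{d}a_x}{\mathrm{d}x} = \frac{L'(h(x))}{x^2} - \frac{2L(tf(x))}{x^3} + \frac{2L(tf)'(0)}{x^2} = 2x^{-3}\left(x\left(\frac{\partial_x L(tf(x))}{2} + t f'(0)\right) - L(tf(x))\right) \ge 0$$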

by assumption 1. Thus is non-decreasing in , and so has its maximum at . This shows that taking makes for any , and it is the smallest such . Note that when , and when .

Assumption 2 tells us that for we have . This implies that , so the same choice of provides a bound — that is, — over the whole interval .∎

Proof of Theorem 3

We credit Pollard (2015) for the proof technique used below.

Using Lemma 3.1.3 in (Christ, 2017, p.75), for

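$$E\!\left[e^{tQ_f}\right] = \prod_{i=1}^n \left(1-2tf(\eta_i)\right)^{-1/2}\exp\!\left(\frac{\delta_i^2\, t f(\eta_i)}{1-2tf(\eta_i)}\right) = \exp\!\left(\sum_{i=1}^n\left[\frac{\delta_i^2\, t f(\eta_i)}{1-2tf(\eta_i)} - \frac{\log\!\left(1-2tf(\eta_i)\right)}{2}\right]\right).$$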
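This moment generating function can be sanity-checked by simulation. The sketch below compares the closed form, written as $\sum_i \left[\delta_i^2 L_2(tf(\eta_i)) + L_1(tf(\eta_i))\right]$ on the log scale, with a Monte Carlo average of $e^{tQ_f}$. The values of `f_eta`, `delta`, and `t` are hypothetical, chosen so that $4t\max_i f(\eta_i) < 1$ and the Monte Carlo estimate has finite variance.

```python
import numpy as np

def log_mgf(t, f_eta, delta):
    """log E[exp(t Q_f)]; requires 2 * t * max(f_eta) < 1."""
    x = t * np.asarray(f_eta, dtype=float)
    d2 = np.asarray(delta, dtype=float) ** 2
    # delta_i^2 * L2(x_i) + L1(x_i), summed over i
    return float(np.sum(d2 * x / (1.0 - 2.0 * x) - 0.5 * np.log1p(-2.0 * x)))

rng = np.random.default_rng(3)
f_eta = np.array([1.0, 0.5, -0.8])    # hypothetical values of f(eta_i)
delta = np.array([0.3, -1.0, 0.0])    # hypothetical delta_i
t = 0.1

Z = rng.standard_normal((400_000, f_eta.size))
Q = ((Z + delta) ** 2) @ f_eta
exact = np.exp(log_mgf(t, f_eta, delta))
simulated = np.mean(np.exp(t * Q))
print("closed form:", exact, "simulated:", simulated)
```

Agreement here only confirms the starting identity of the proof; the bounding steps that follow are not exercised by this check.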

By Lemma 2 we know, setting , that for ,

$$L_1(tf(x)) \le \alpha_f(L,t)\,x^2 + t f'(0)\,x, \qquad L_2(tf(x)) \le \beta_f(L,t)\,x^2 + t f'(0)\,x.$$

We claim that this is the optimal choice of . Smaller will void the inequalities for some and so cannot be considered. On the other hand, we know that both and are increasing in so any larger would simultaneously weaken the quadratic bound and shrink the range of values to which it can be applied, since is decreasing in .

Therefore,

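$$E\!\left[e^{tQ_f}\right] \le \exp\!\left(\sum_{i=1}^n\left[\delta_i^2\left(\beta_f(L,t)\eta_i^2 + c\eta_i t\right) + \alpha_f(L,t)\eta_i^2 + c\eta_i t\right]\right) = \exp\!\left(\beta_f(L,t)\sum_{i=1}^n\eta_i^2\delta_i^2 + ct\sum_{i=1}^n\eta_i\delta_i^2 + \alpha_f(L,t)\sum_{i=1}^n\eta_i^2 + ct\sum_{i=1}^n\eta_i\right).$$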

Applying the definitions of and we have

$$E\!\left[e^{t(Q_f-\xi)}\right] \le e^{\nu_f(t)/2}.$$

By Markov’s Inequality, for any ,

$$P(Q_f > q) = P(Q_f - \xi > q - \xi) = P\!\left(e^{t(Q_f-\xi)} > e^{t(q-\xi)}\right) \le e^{-(q-\xi)t + \nu_f(t)/2} \quad\text{for all } t\in(0,\,1/(2d)).$$

For , since is positive we have the trivial bound .

The bound for is derived identically.

## Acknowledgements

The first two authors would like to acknowledge the support of the Engineering and Physical Sciences Research Council [grant number EP/M507854/1]. The last author would like to acknowledge the support of the Summer Opportunities Abroad Program (SOAP) - WUSM Global Health & Medicine and the WUSM Dean’s Fellowship, both from the Washington University School of Medicine in St. Louis.

## References

• Christ (2017) R. Christ, Ancestral trees as weighted networks: scalable screening for genome wide association studies, Ph.D. thesis, University of Oxford, 2017.
• Christ et al. (2020) R. Christ, C. Holmes, D. Steinsaltz, Scalable Screening with Quadratic Statistics, arXiv (forthcoming).
• Gretton et al. (2005) A. Gretton, O. Bousquet, A. Smola, B. Schölkopf, Measuring statistical dependence with Hilbert-Schmidt norms, in: International conference on algorithmic learning theory, Springer, 63–77, 2005.
• Zhang et al. (2018) Q. Zhang, S. Filippi, A. Gretton, D. Sejdinovic, Large-scale kernel methods for independence testing, Statistics and Computing 28 (1) (2018) 113–130, ISSN 1573-1375.
• Lin (1997) X. Lin, Variance component testing in generalised linear models with random effects, Biometrika 84 (2) (1997) 309–326.
• Wu et al. (2011) M. C. Wu, S. Lee, T. Cai, Y. Li, M. Boehnke, X. Lin, Rare-variant association testing for sequencing data with the sequence kernel association test, The American Journal of Human Genetics 89 (1) (2011) 82–93.
• Peña and Rodríguez (2002) D. Peña, J. Rodríguez, A powerful portmanteau test of lack of fit for time series, Journal of the American Statistical Association 97 (458) (2002).
• Rudelson et al. (2013) M. Rudelson, R. Vershynin, et al., Hanson-Wright inequality and sub-gaussian concentration, Electronic Communications in Probability 18 (2013).
• Pollard (2015) D. Pollard, A few good inequalities, http://www.stat.yale.edu/~pollard/Books/Mini/Basic.pdf, 2015.