Large deviations for the largest eigenvalues and eigenvectors of spiked random matrices

04/03/2019
by   Giulio Biroli, et al.

We consider matrices formed by a random N × N matrix drawn from the Gaussian Orthogonal Ensemble (or Gaussian Unitary Ensemble) plus a rank-one perturbation of strength θ, and focus on the largest eigenvalue, x, and the component, u, of the corresponding eigenvector in the direction associated to the rank-one perturbation. We obtain the large deviation principle governing the atypical joint fluctuations of x and u. Interestingly, for θ > 1, in large deviations characterized by a small value of u, i.e. u < 1 − 1/θ², the second-largest eigenvalue pops out from the Wigner semi-circle and the associated eigenvector orients in the direction corresponding to the rank-one perturbation. We generalize these results to the Wishart Ensemble, and we extend them to the first n eigenvalues and the associated eigenvectors.


1 Introduction

The large deviations theory for the spectral properties of random matrix models is a very active domain of research in probability theory and theoretical physics.

Many works have been devoted to the statistics of the eigenvalues. Following Voiculescu's pioneering work on non-commutative entropy [1], G. Ben Arous and one of the authors derived a large deviation principle for the distribution of the empirical measure of the eigenvalues of Gaussian ensembles in the late nineties (known in physics as the Coulomb gas method [2]). The proof is based on the explicit density of the joint law of the eigenvalues, and the speed of the large deviation principle is the square of the linear dimension of the random matrix. More than ten years later, C. Bordenave and P. Caputo [3] obtained a large deviation principle for the same empirical measure but for a Wigner matrix with heavy-tailed entries, in the sense that their tails decay more slowly than that of a Gaussian variable at infinity. Their approach is totally different, as it relies on the idea that deviations are created by a few big entries: the rate then depends on the speed of decay of the tail. The large deviation principle for the spectral measure in the general sub-Gaussian case is still an open problem.
Instead of considering the deviations of the empirical measure, it is also natural to try to understand the probability of deviations of a single eigenvalue. The deviations of an eigenvalue inside the bulk are closely related to those of the empirical measure, but one can also seek the probability of deviations of the extreme eigenvalues. This was achieved for Gaussian ensembles in the Appendix of [4], see also [5], where it was shown that the large deviations are on the scale of the dimension. Again, the proof was based on the explicit joint law of the eigenvalues. The large deviation principle for the largest eigenvalue was derived in [6] for heavy-tailed entries. In the case of sharp sub-Gaussian entries, which include Rademacher (binary) entries, it was recently proved that the large deviations of the extreme eigenvalues are the same as in the Gaussian case [7].

The probability of atypical eigenvectors has been much less studied. Again, the only result that we know of concerns the Gaussian ensembles: in this case, the invariance of the Haar measure under multiplication implies that each eigenvector is uniformly distributed on the sphere. In [4], the large deviations for the empirical measure of the properly rescaled entries of an eigenvector were established. The large deviations for the supremum of the entries could also be easily derived.
In this article, we address a different question. We want to investigate the large deviations of the eigenvector in a given fixed direction. In many solvable random matrix models, eigenvectors are uniformly distributed; hence there are no meaningful atypical fluctuations or special directions to focus on. For a spiked GOE matrix, i.e. a random matrix drawn from the Gaussian Orthogonal Ensemble plus a rank-one perturbation, there is instead a special direction: the one related to the perturbation. In this case an interesting phenomenon, called the BBP transition, takes place when varying the strength of the perturbation (called θ in the following). As shown in [8] and then proved rigorously in [9], the largest eigenvalue, x, pops out of the semi-circle if the perturbation is strong enough. More precisely, x is almost surely equal to two for θ ≤ 1 and to θ + 1/θ for θ > 1. In the latter case, the square of the component of the associated eigenvector in the direction associated to the perturbation, which we shall henceforth denote u, is almost surely equal to 1 − 1/θ². In this context the question we raised before becomes meaningful, and it is natural to focus on the good rate function (GRF) that controls the joint atypical fluctuations of x and u.
This GRF plays an important role for the geometric properties of random high-dimensional energy landscapes, which can exhibit a number of critical points that is exponentially large in the number of dimensions, as obtained in [10, 11, 12] and rigorously proven and extended in [13, 14]. The rigorous method developed to perform those studies is based on a large dimensional version of the Kac-Rice formula [15], and is strongly related to random matrix theory, since the Hessian of the energy function at the critical points, a crucial element in the theoretical analysis, is a random matrix. In order to analyze the dynamics in those rough landscapes it is important to know not only the behavior of typical critical points, but also of atypical ones associated to index-one saddles connecting minima [16]. One has therefore to study large deviations of the Hessian, i.e. one needs to condition the critical points to be of index one and to have the eigenvector associated to the negative eigenvalue oriented in the direction connecting the minima, which leads in fact to the problem discussed above.
Noise dressing and cleaning of empirical correlation matrices is another context in which the kind of large deviations addressed in this paper is relevant. In this case, a model that is often considered to interpret the data is that of spiked Wishart random matrices, whose eigenvalue distribution consists of a Marchenko-Pastur law plus a few eigenvalues that pop out from it. Those few eigenvalues correspond to the signal buried in the noise, and the associated eigenvectors play an important role in assessing the structure of the correlations, with important applications such as portfolio risk management [17]. A natural question in this context is to characterize the joint atypical fluctuations of the largest eigenvalues and associated eigenvectors that carry the signal. In this work we obtain the large deviation function that governs them.
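The BBP transition for the spiked GOE recalled in this introduction is easy to observe numerically. The following is a minimal sketch, not taken from the paper: it assumes the real symmetric case with off-diagonal entries of variance 1/N (so that the bulk fills [−2, 2]) and a perturbation along the first basis vector. The function names `spiked_goe_top` and `bbp_prediction` are hypothetical.

```python
import numpy as np

def spiked_goe_top(n, theta, rng):
    """Largest eigenvalue and squared first component of its eigenvector
    for a GOE matrix plus theta * e1 e1^T (assumed normalization: bulk on [-2, 2])."""
    a = rng.standard_normal((n, n))
    # Symmetrize: off-diagonal variance 1/n, diagonal variance 2/n.
    w = (a + a.T) / np.sqrt(2.0 * n)
    w[0, 0] += theta                      # rank-one spike along e1
    vals, vecs = np.linalg.eigh(w)        # eigenvalues in ascending order
    return vals[-1], vecs[0, -1] ** 2

def bbp_prediction(theta):
    """Typical (almost sure) values of (x, u) in the large-n limit."""
    if theta <= 1.0:
        return 2.0, 0.0
    return theta + 1.0 / theta, 1.0 - 1.0 / theta**2

rng = np.random.default_rng(0)
for theta in (0.5, 2.0):
    x, u = spiked_goe_top(1000, theta, rng)
    print(theta, (x, u), bbp_prediction(theta))
```

Such a simulation only probes typical values; the atypical joint fluctuations of x and u studied in this paper are exponentially rare in N and are not accessible by direct sampling.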

2 Main results

We consider the matrix

X_N^θ = X_N + θ e e*,

where X_N is from the GOE if β = 1 (resp. the GUE if β = 2) and θ is a non-negative real number. Here e is a fixed unit vector and we may assume without loss of generality that e = (1, 0, …, 0). Let λ_1 ≥ λ_2 ≥ ⋯ ≥ λ_N be the eigenvalues of X_N^θ, with respective eigenvectors v_1, …, v_N. The joint large deviations of the largest eigenvalue x = λ_1 and of the component u = |⟨v_1, e⟩|² of the associated eigenvector along e are governed by the following theorem.

Theorem 2.1.

The joint law of satisfies a large deviation principle in the scale and good rate function . In other words, for any closed set of

and for any open set of

Moreover, is a good rate function in the sense that it is non-negative and with compact level sets. More precisely, the function is infinite outside of and otherwise given by where

(1)

where if , then

whereas if ,

Here is the semi-circle distribution and its Cauchy transform.
The second largest eigenvalue converges almost surely to the maximizer, , of the variational problem (1) defined above.

We have more explicit results on the rate function, the behavior of the second largest eigenvalue and the component of the associated eigenvector along :

  • For and : The second eigenvalue pops out of the semicircle for , and is equal to . The minimum on of the large deviation function , for a given , is reached at .

  • For and : The second eigenvalue pops out of the semicircle for , and is equal to , i.e. it increases when decreases until reaching the value . The minimum of the large deviation function is reached at .

  • For and : The second eigenvalue sticks to two. The minimum of the large deviation function is at , if is not positive, or at otherwise. The latter case corresponds to large enough values of .

  • The component of the eigenvector associated to the second largest eigenvalue is different from zero if and only if the associated eigenvalue is larger than two, i.e. the eigenvector orients in the direction of when pops out from the semi-circle.

An example of the large deviation function (GRF) is shown in Fig. 1 for .

Figure 1: Large deviation function plotted for , and . The global minimum is attained at .

The previous results can be extended to the large deviations of the largest eigenvalues and their first components . We denote by the space where these extreme eigenvalues live.

Theorem 2.2.

Let be a fixed integer number. The joint law of and , , satisfies a LDP with speed and GRF which is infinite outside of and otherwise given by where

where the function is the same as in Theorem 2.1. The -th largest eigenvalue is almost surely equal to the maximizer of the variational problem defined above.

Moreover, we can also extend our results to the case of Wishart matrices whose covariance is a finite-dimensional perturbation of the identity. To simplify, let us assume that the perturbation is one-dimensional, and consider the Wishart matrix

where is a random matrix with i.i.d. centered Gaussian entries with variance . is a non-negative definite matrix: with a unit vector. We assume . We recall that when converges towards , the empirical measure of converges towards the so-called Marchenko-Pastur [18] law with support . We can study the joint large deviation of the largest eigenvalue and the strength of the eigenvector in the direction for as well. We find that

Theorem 2.3.

The joint law of satisfies a large deviation principle in the scale and good rate function . The function is infinite outside of and otherwise given by where

where is the rate function for the large deviation of the largest eigenvalue of a Gaussian Wishart matrix with covariance equal to the identity.
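As an illustration of the spiked Wishart setting, the sketch below samples a sample covariance matrix and compares its spectrum with the Marchenko-Pastur edges and with the standard prediction for the outlier. The conventions are assumptions of this sketch and may differ from the normalization used in the paper: S = (1/M) Σ^{1/2} G Gᵀ Σ^{1/2} with G of size N × M, aspect ratio c = N/M, and Σ = I + γ e1 e1ᵀ. All function names are hypothetical.

```python
import numpy as np

def spiked_wishart_spectrum(n, m, gamma, rng):
    """Eigenvalues (ascending) of S = (1/m) Sigma^{1/2} G G^T Sigma^{1/2},
    with Sigma = I + gamma * e1 e1^T (assumed convention)."""
    g = rng.standard_normal((n, m))
    g[0, :] *= np.sqrt(1.0 + gamma)       # applies Sigma^{1/2} to G
    return np.linalg.eigvalsh(g @ g.T / m)

def mp_edges(c):
    """Edges of the Marchenko-Pastur support for aspect ratio c = n/m <= 1."""
    return (1.0 - np.sqrt(c)) ** 2, (1.0 + np.sqrt(c)) ** 2

def spiked_top_prediction(gamma, c):
    """Typical largest eigenvalue: an outlier appears only if gamma > sqrt(c)."""
    if gamma <= np.sqrt(c):
        return mp_edges(c)[1]
    return (1.0 + gamma) * (1.0 + c / gamma)

rng = np.random.default_rng(0)
ev = spiked_wishart_spectrum(400, 1600, 2.0, rng)
print(ev[0], ev[-2], ev[-1], mp_edges(0.25), spiked_top_prediction(2.0, 0.25))
```

With these conventions, the second-largest eigenvalue typically sits at the upper Marchenko-Pastur edge while the largest one separates from the bulk, which is the Wishart analogue of the BBP transition.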

3 Strategy of the proof

We next focus on the perturbed Wigner matrix . The law of is given by

Therefore, since when , the joint law of is given by

where

if

is the expectation on conditionally to . is the distribution of .

Our main goal is to estimate the density of

when is large and to apply Laplace’s method. We infer from concentration inequalities [19] that with

(2)

We deduce that for (the singularity of the log can be overcome as in [4])

(3)

i.e. we can replace the empirical distribution with its average. To estimate the other terms, we first observe that since is uniformly distributed on the sphere, we can represent as

(4)

with independent standard Gaussian variables , which are real when and complex when . As a consequence, we see that the distribution of

is the Beta-distribution

(5)
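The Gaussian representation of a uniform vector on the sphere, and the resulting Beta law for a squared component, can be checked by simulation. The sketch below treats only the real case (β = 1), where the squared first component of a uniform unit vector in dimension N follows a Beta(1/2, (N − 1)/2) law. This parametrization is the standard one and is stated here as an assumption about what (5) contains; the function names are hypothetical.

```python
import numpy as np

def squared_first_component(n, samples, rng):
    """Sample g_1^2 / (g_1^2 + ... + g_n^2) for i.i.d. real standard Gaussians,
    i.e. the squared first coordinate of a uniform point on the unit sphere."""
    g = rng.standard_normal((samples, n))
    return g[:, 0] ** 2 / np.sum(g ** 2, axis=1)

def beta_mean_var(a, b):
    """Mean and variance of the Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var

rng = np.random.default_rng(0)
n = 20
u = squared_first_component(n, 100_000, rng)
print(u.mean(), u.var(), beta_mean_var(0.5, (n - 1) / 2.0))
```

In the complex case the same construction with complex Gaussians would give Beta(1, N − 1), under the same assumption.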

Hence the main point of the proof is to estimate

(6)

where the expectation is over with given . Thanks to (4), if we fix and denote , we have

so that follows the uniform law on the sphere and

Observe that is independent of (as the computation of the joint law reveals). Hence

(7)

4 Asymptotic of spherical integrals

Recall the definition of spherical integrals:

where is uniformly sampled on the sphere with radius one. The asymptotics of

were studied in [20] where the following result was proved.

Theorem 4.1.

[20, Theorem 6] Let be a sequence of real symmetric matrices such that :

  • The sequence of empirical measures converges weakly to a compactly supported measure .

  • There are two real numbers such that

For any ,

The limit is defined as follows. For a compactly supported probability measure we define its Stieltjes transform by,

where denotes the support of . In the sequel, for any compactly supported probability measure we denote by the right edge of the support of Then is a bijection from to , with

We denote by its inverse on and let be the -transform of as defined by Voiculescu in [21] (defined on ).
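For the semi-circle law these objects are explicit, which gives a convenient sanity check. The sketch below assumes the convention G(z) = ∫ dσ(y)/(z − y) for z to the right of the support [−2, 2], for which G(z) = (z − √(z² − 4))/2, the inverse is K(g) = g + 1/g on (0, 1], and the R-transform is R(g) = K(g) − 1/g = g. The function names are hypothetical, and the numerical integral uses the substitution y = 2 cos φ.

```python
import numpy as np

def stieltjes_closed(z):
    """Closed form of G(z) for the semi-circle law on [-2, 2], valid for z >= 2."""
    return (z - np.sqrt(z * z - 4.0)) / 2.0

def stieltjes_numeric(z, m=2000):
    """Midpoint-rule evaluation of G(z) = (2/pi) * int_0^pi sin^2(phi) / (z - 2 cos(phi)) dphi."""
    phi = (np.arange(m) + 0.5) * np.pi / m
    return (2.0 / m) * np.sum(np.sin(phi) ** 2 / (z - 2.0 * np.cos(phi)))

def k_inverse(g):
    """Inverse of G on (0, 1]: K(g) = g + 1/g."""
    return g + 1.0 / g

def r_transform(g):
    """R-transform of the semi-circle law: R(g) = K(g) - 1/g."""
    return k_inverse(g) - 1.0 / g

theta = 2.0
print(stieltjes_closed(theta + 1.0 / theta), stieltjes_numeric(theta + 1.0 / theta), 1.0 / theta)
```

Under this convention G(θ + 1/θ) = 1/θ for θ ≥ 1, which is how the typical position of the outlier in the BBP transition relates to the inverse of G.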

In order to define the rate function, we now introduce, for any and ,

with

In the case where , the semi-circular law, then,

Therefore,

Lemma 4.2.

If , then

whereas if ,

5 Proof of Theorem 2.1

Note that Theorem 2.1 implies the weak large deviation principle, which states that for small enough,

Indeed, the weak large deviation principle is simply the restriction of the full large deviation principle to small balls. To recover the full large deviation principle from its weak version, it is enough to show that the probability is exponentially tight in the sense that deviations mostly occur in a compact set. The latter is easy to check since lives in a compact set and

where it is known that with probability greater than [4]. We refer the reader to [22] for more details. Hence, we only need to prove the weak large deviation principle, that is, to estimate the probability that is close to some .

By Theorem 4.1, if we assume additionally that is close to y, and close to , we deduce using (2) and (7) that

But satisfies a LDP under [4] with good rate function which is infinite above and below , and otherwise given by

where . Hence we deduce by continuity of the limiting spherical integrals [23] that

and therefore, plugging (3),(5) in the above estimate, we deduce that the joint law of is approximately given by

The final result follows by Laplace’s method.
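Laplace's method, used in this last step, states that (1/N) log ∫ exp(−N f(t)) dt converges to −min f as N grows. The following sketch illustrates this on a one-dimensional toy function; the function f, the interval, and the name `laplace_exponent` are illustrative choices and are not the rate function of the paper.

```python
import numpy as np

def laplace_exponent(f, a, b, n, grid=20001):
    """Return (1/n) * log of a Riemann-sum approximation of int_a^b exp(-n f(t)) dt,
    computed with a log-sum-exp shift to avoid underflow."""
    t = np.linspace(a, b, grid)
    vals = -n * f(t)
    shift = vals.max()
    dt = (b - a) / (grid - 1)
    log_integral = shift + np.log(np.sum(np.exp(vals - shift)) * dt)
    return log_integral / n

def toy_f(t):
    """Toy function with minimum value 0.3 attained at t = 1."""
    return (t - 1.0) ** 2 + 0.3

for n in (10, 100, 1000):
    print(n, laplace_exponent(toy_f, 0.0, 3.0, n))
```

The printed values approach −0.3, with a correction of order log(N)/N coming from the Gaussian fluctuations around the minimizer.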

6 Proof of Theorem 2.2

The law of is given by

and therefore, since , the joint law of for is given by

where equals

Here we have denoted

if

is the expectation on conditionally to . is the distribution of .
The analysis of the expressions above allows one to extend Theorem 2.1 to Theorem 2.2 straightforwardly. This follows from four remarks:

  1. The term does not lead to any contribution to the GRF as long as the s are distinct.

  2. The spherical integral is performed on that are uniform on the sphere and such that

    As a consequence, one can write

    where is uniform on the sphere . Therefore the spherical integral is the same as the one evaluated for Theorem 2.1 with replaced by .

  3. Because of rotational symmetry depends only on . Moreover, calling the distribution of , we remark that and have the same GRF, since the integral over the s at fixed does not lead to a term exponential in . Furthermore, and the distribution , introduced for Theorem 2.1, also have the same GRF. In conclusion, the contribution to the total GRF due to is the same as in Theorem 2.1 with replaced by .

  4. The above implies by Laplace's method the weak large deviation principle at any strictly ordered sequence of points , that is

    and the same holds when the limsup is replaced by a liminf. We deduce the same for ordered and possibly equal by taking approximating sequences and which are strictly ordered and such that

    By the previous bounds we deduce that

    where . The continuity of in the allows to conclude by letting going to zero.

7 Study of the rate function

We can give a more explicit formula of the rate function by noticing that the supremum

was already studied in [23]. In the notations of [23], we are maximizing on . According to [23, Section 3.2], we find that

  • If , the maximum is achieved at

    or at , if is smaller than , and

  • If , we are optimizing a decreasing function and therefore the maximum is taken at and

    with

Note that these two cases correspond to different asymptotic behaviors of when goes to and goes to : if is larger than , goes to , and otherwise to .

We can therefore study the optimizer in of for a given .

  • For , the contribution to the GRF that depends on and that we have to minimize reads:

    which is independent of if . If