1 Introduction
The large deviations theory for the spectral properties of random matrix models is a very active domain of research in probability theory and theoretical physics.
Many works have been devoted to the statistics of the eigenvalues. Following Voiculescu's pioneering work on noncommutative entropy [1], G. Ben Arous and one of the authors derived a large deviation principle for the distribution of the empirical measure of the eigenvalues of Gaussian ensembles in the late nineties (known in physics as the Coulomb gas method [2]). The proof is based on the explicit density of the joint law of the eigenvalues, and the speed of the large deviation principle is the square of the linear dimension of the random matrix. More than ten years later, C. Bordenave and P. Caputo [3] obtained a large deviation principle for the same empirical measure but for a Wigner matrix with heavy-tailed entries, in the sense that their tail decays more slowly than that of a Gaussian variable at infinity. Their approach is totally different, as it follows from the idea that deviations are created by a few big entries: the rate then depends on the speed of decay of the tail. The large deviation principle for the spectral measure in the general sub-Gaussian case is still an open problem.
Instead of considering the deviations of the empirical measure, it is also natural to try to understand the probability of deviations of a single eigenvalue. The deviations of an eigenvalue inside the bulk are closely related to those of the empirical measure, but one can seek the probability of deviations of the extreme eigenvalues. This was achieved for Gaussian ensembles in the Appendix of [4], see also [5], where it was shown that the large deviations are on the scale of the dimension. Again, the proof was based on the explicit joint law of the eigenvalues. The large deviation principle for the largest eigenvalue was derived in [6] for heavy tails. In the case of sharp sub-Gaussian entries, which include Rademacher (binary) entries, it was recently proved that the large deviations of the extreme eigenvalues are the same as in the Gaussian case [7].
The probability of atypical eigenvectors has been much less studied. Again, the only result that we know of concerns the Gaussian ensembles: in this case, the invariance of the Haar measure under multiplication implies that each eigenvector is uniformly distributed on the sphere. In [4], the large deviations for the empirical measure of the properly rescaled entries of an eigenvector were established. The large deviations for the supremum of the entries could also be easily derived.
In this article, we address a different question. We want to investigate the large deviations of the eigenvector in a given fixed direction. In many solvable random matrix models, eigenvectors are uniformly distributed; hence there are no meaningful atypical fluctuations or special directions to focus on. For a spiked GOE matrix, i.e. a random matrix drawn from the Gaussian Orthogonal Ensemble plus a rank-one perturbation, there is instead a special direction: the one related to the perturbation. In this case an interesting phenomenon, called the BBP transition, takes place when varying the strength of the perturbation (called $\theta$ in the following). As shown in [8] and then proved rigorously in [9], the largest eigenvalue pops out of the semicircle if the perturbation is strong enough. More precisely, it is almost surely equal to two for $\theta \le 1$ and to $\theta + 1/\theta$ for $\theta > 1$. In the latter case, the square of the component of the associated eigenvector in the direction associated to the perturbation is almost surely equal to $1 - 1/\theta^2$. In this context the question we raised before becomes meaningful, and it is natural to focus on the good rate function (GRF) that controls the joint atypical fluctuations of the largest eigenvalue and of this squared component.
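The typical behavior described above can be observed numerically. The following is a minimal sketch, not taken from the paper: it assumes the GOE normalization in which the bulk fills [-2, 2], writes theta for the strength of the rank-one perturbation, takes the perturbation direction to be the first basis vector, and uses the standard BBP limits (2 and 0 below the threshold, theta + 1/theta and 1 - 1/theta^2 above it). The function names are hypothetical.

```python
import numpy as np


def spiked_goe_top(n, theta, rng):
    """Return (largest eigenvalue, squared overlap with e) for G + theta * e e^T."""
    a = rng.standard_normal((n, n))
    # GOE normalisation: off-diagonal variance 1/n, so the bulk fills [-2, 2].
    g = (a + a.T) / np.sqrt(2.0 * n)
    # Rank-one perturbation along the first basis vector e = (1, 0, ..., 0).
    g[0, 0] += theta
    eigvals, eigvecs = np.linalg.eigh(g)  # eigenvalues in ascending order
    return eigvals[-1], eigvecs[0, -1] ** 2


def bbp_prediction(theta):
    """Assumed almost-sure limits of the top eigenvalue and squared overlap."""
    if theta <= 1.0:
        return 2.0, 0.0
    return theta + 1.0 / theta, 1.0 - 1.0 / theta**2


rng = np.random.default_rng(0)
for theta in (0.5, 2.0):
    lam, overlap = spiked_goe_top(600, theta, rng)
    print(theta, lam, overlap, bbp_prediction(theta))
```

A single sample at moderate size only shows the typical values; the large deviations studied in this paper concern events whose probability is exponentially small in the dimension and are not visible in such a direct simulation.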
This GRF plays an important role for the geometric properties of random high-dimensional energy landscapes, which can exhibit a number of critical points that is exponentially large in the number of dimensions, as obtained in [10, 11, 12] and rigorously proven and extended in [13, 14]. The rigorous method developed to perform those studies is based on a large-dimensional version of the Kac-Rice formula [15], and is strongly related to random matrix theory, since the Hessian of the energy function at the critical points, a crucial element in the theoretical analysis, is a random matrix. In order to analyze the dynamics in those rough landscapes it is important to know not only the behavior of typical critical points, but also that of atypical ones associated to index-one saddles connecting minima [16]. One has therefore to study large deviations of the Hessian, i.e. one needs to condition the critical points to be of index one and to have the eigenvector associated to the negative eigenvalue oriented in the direction connecting the minima, which leads in fact to the problem discussed above.
Noise dressing and cleaning of empirical correlation matrices is another context in which the kind of large deviations addressed in this paper is relevant. In this case, a model that is often considered to interpret the data is that of spiked Wishart random matrices, whose eigenvalue distribution consists of a Marchenko-Pastur law plus a few eigenvalues that pop out from it. Those few eigenvalues correspond to the signal buried in the noise, and the associated eigenvectors play an important role in assessing the structure of the correlations, with important applications such as portfolio risk management [17]. A natural question in this context is to characterize the joint atypical fluctuations of the largest eigenvalues and associated eigenvectors that carry the signal. In this work we obtain the large deviation function that governs them.
2 Main results
We consider the matrix
where is from the GOE if (resp. the GUE if ) and is a nonnegative real number.
is a fixed unit vector and we may assume without loss of generality that
. Let be the eigenvalues of , with respective eigenvectors . The joint large deviations of the largest eigenvalue and the component of the associated eigenvector along are governed by the following theorem.
Theorem 2.1.
The joint law of satisfies a large deviation principle in the scale and good rate function . In other words, for any closed set of
and for any open set of
Moreover, is a good rate function in the sense that it is nonnegative and with compact level sets. More precisely, the function is infinite outside of and otherwise given by where
(1) 
where if , then
whereas if ,
Here is the semicircle distribution and its Cauchy transform.
The second largest eigenvalue converges almost surely to the maximizer, , of the variational problem (1) defined above.
We have more explicit results on the rate function, the behavior of the second largest eigenvalue and the component of the associated eigenvector along :

For and : The second eigenvalue pops out of the semicircle for , and is equal to . The minimum on of the large deviation function , for a given , is reached at .

For and : The second eigenvalue pops out of the semicircle for , and is equal to , i.e. it increases when decreases until reaching the value . The minimum of the large deviation function is reached at .

For and : The second eigenvalue sticks to two. The minimum of the large deviation function is at , if is not positive, or at otherwise. The latter case corresponds to large enough values of .

The component of the eigenvector associated to the second largest eigenvalue is different from zero if and only if the associated eigenvalue is larger than two, i.e. the eigenvector orients in the direction of when pops out from the semicircle.
An example of the large deviation function (GRF) is shown in Fig. 1 for .
The previous results can be extended to the large deviations of the largest eigenvalues and their first components . We denote by the space where these extreme eigenvalues live.
Theorem 2.2.
Let be a fixed integer number. The joint law of and , , satisfies a LDP with speed and GRF which is infinite outside of and otherwise given by where
where the function is the same as in Theorem 2.1. The th largest eigenvalue is almost surely equal to the maximizer of the variational problem defined above.
Moreover, we can also extend our results to the case of Wishart matrices whose covariance is a finite-dimensional perturbation of the identity. To simplify, let us assume that it is one-dimensional, and consider the Wishart matrix
where is a
random matrix with i.i.d. centered Gaussian entries with variance
. is a nonnegative definite matrix: with a unit vector. We assume . We recall that when converges towards , the empirical measure of converges towards the so-called Marchenko-Pastur [18] law with support . We can study the joint large deviation of the largest eigenvalue and the strength of the eigenvector in the direction for as well. We find the following.
Theorem 2.3.
The joint law of satisfies a large deviation principle in the scale and good rate function . The function is infinite outside of and otherwise given by where
where is the rate function for the large deviation of the largest eigenvalue of a Gaussian Wishart matrix with covariance equal to the identity.
3 Strategy of the proof
We next focus on the perturbed Wigner matrix . The law of is given by
Therefore, since when , the joint law of is given by
where
if
is the expectation on conditionally on . is the distribution of .
Our main goal is to estimate the density of
when is large and to apply Laplace’s method. We infer from concentration inequalities [19] that with
(2)
We deduce that for (the singularity of the log can be overcome as in [4])
(3) 
i.e. we can replace the empirical distribution with its average. To estimate the other terms, we first observe that since is uniformly distributed on the sphere, we can represent as
(4) 
with independent standard Gaussian variables , which are real when and complex when . As a consequence, we see that the distribution of
is the Beta distribution
(5) 
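As a sanity check of the Gaussian representation (4) and of the resulting Beta law, here is a minimal sketch in the real case. It assumes the standard fact that, for a vector uniform on the unit sphere of R^n, the squared first coordinate follows a Beta(1/2, (n - 1)/2) law; the parameters of (5) in the paper's own notation are not reproduced here, and the function names are hypothetical.

```python
import numpy as np


def squared_first_component(n, samples, rng):
    """Sample u_1^2 for u uniform on the unit sphere of R^n via u = g / |g|."""
    g = rng.standard_normal((samples, n))
    return g[:, 0] ** 2 / np.sum(g**2, axis=1)


def beta_moments(a, b):
    """Mean and variance of a Beta(a, b) random variable."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1.0))
    return mean, var


n = 10
rng = np.random.default_rng(0)
v = squared_first_component(n, 200_000, rng)
# Real case: v is assumed to follow Beta(1/2, (n - 1)/2).
print(v.mean(), v.var(), beta_moments(0.5, (n - 1) / 2.0))
```

Comparing the first two empirical moments with those of the Beta law is only a partial check; a full comparison of distributions would need a goodness-of-fit test.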
Hence the main point of the proof is to estimate
(6) 
where the expectation is over with given . Thanks to (4), if we fix and denote , we have
so that follows the uniform law on the sphere and
Observe that is independent of (as the computation of the joint law reveals). Hence
(7) 
4 Asymptotic of spherical integrals
Recall the definition of spherical integrals:
where is uniformly sampled on the sphere with radius one. The asymptotics of
were studied in [20] where the following result was proved.
Theorem 4.1.
[20, Theorem 6] Let be a sequence of real symmetric matrices such that :

The sequence of empirical measures converges weakly to a compactly supported measure .

There are two real numbers such that
For any ,
The limit is defined as follows. For a compactly supported probability measure we define its Stieltjes transform by,
where denotes the support of . In the sequel, for any compactly supported probability measure we denote by the right edge of the support of Then is a bijection from to , with
We denote by its inverse on and let be the transform of as defined by Voiculescu in [21] (defined on ).
In order to define the rate function, we now introduce, for any and ,
with
In the case where , the semicircular law, then,
Therefore,
Lemma 4.2.
If , then
whereas if ,
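For the semicircular law, the transforms recalled in this section have closed forms that are easy to check numerically. The sketch below is an illustration under stated assumptions, not the paper's computation: it assumes the convention G(z) = int sigma(dx) / (z - x) for the Stieltjes transform, the semicircle density sqrt(4 - x^2) / (2 pi) on [-2, 2], the inverse K(w) = w + 1/w on (0, 1], and Voiculescu's R-transform R(w) = K(w) - 1/w. The function names are hypothetical.

```python
import numpy as np


def stieltjes_numeric(z, n=4000):
    """Midpoint-rule value of G(z) = int sigma(dx) / (z - x) for z > 2.

    With x = 2 cos(t), the semicircle density sqrt(4 - x^2) / (2 pi) dx
    becomes (2 / pi) sin(t)^2 dt on [0, pi].
    """
    t = (np.arange(n) + 0.5) * np.pi / n
    return 2.0 * np.mean(np.sin(t) ** 2 / (z - 2.0 * np.cos(t)))


def stieltjes_closed(z):
    """Closed form of G(z) for the semicircle law, valid for z >= 2."""
    return (z - np.sqrt(z * z - 4.0)) / 2.0


def stieltjes_inverse(w):
    """Inverse of G on (0, 1]: K(w) = w + 1/w."""
    return w + 1.0 / w


def r_transform(w):
    """R-transform R(w) = K(w) - 1/w, which reduces to w for the semicircle."""
    return stieltjes_inverse(w) - 1.0 / w


for z in (2.5, 3.0, 5.0):
    print(z, stieltjes_numeric(z), stieltjes_closed(z),
          stieltjes_inverse(stieltjes_closed(z)))
```

Under these conventions G maps the interval to the right of the support onto (0, 1), with G(2) = 1, which is the threshold value that separates the two regimes of the spherical integral.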
5 Proof of Theorem 2.1
Note that Theorem 2.1 implies the weak large deviation principle, which states that for small enough,
Indeed, the weak large deviation principle is simply the restriction of the full large deviation principle to small balls. To recover the full large deviation principle from its weak version, it is enough to show that the probability is exponentially tight in the sense that deviations mostly occur in a compact set. The latter is easy to check since lives in a compact set and
where it is known that with probability greater than [4]. We refer the reader to [22] for more details. Hence, we only need to prove the weak large deviation principle, that is estimate the probability that is close to some .
By Theorem 4.1, if we assume additionally that is close to y, and close to , we deduce using (2) and (7) that
But satisfies a LDP under [4] with good rate function which is infinite above and below , and otherwise given by
where . Hence we deduce by continuity of the limiting spherical integrals [23] that
and therefore, plugging (3),(5) in the above estimate, we deduce that the joint law of is approximately given by
The final result follows by Laplace’s method.
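The role of Laplace's method in this last step can be illustrated on a toy example: the exponential rate of an integral of exp(-N f) is governed by the minimum of f. The sketch below uses a hypothetical one-dimensional function toy_rate, which is not the rate function of the paper, and a plain Riemann sum with a log-sum-exp shift.

```python
import numpy as np


def log_integral_rate(f, n, lo, hi, points=200_001):
    """Return (1/n) * log of int_lo^hi exp(-n f(x)) dx via a Riemann sum."""
    x = np.linspace(lo, hi, points)
    dx = x[1] - x[0]
    exponent = -n * f(x)
    shift = exponent.max()  # log-sum-exp shift to avoid underflow
    return (shift + np.log(np.sum(np.exp(exponent - shift)) * dx)) / n


def toy_rate(x):
    """Hypothetical rate function with minimum value 0.5 at x = 1."""
    return (x - 1.0) ** 2 + 0.5


for n in (10, 100, 1000):
    print(n, log_integral_rate(toy_rate, n, -3.0, 5.0))
```

As n grows the printed values approach minus the minimum of toy_rate, with a correction of order log(n) / n that does not contribute at the exponential scale.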
6 Proof of Theorem 2.2
The law of is given by
and therefore, since , the joint law of for is given by
where equals
Here we have denoted
if
is the expectation on conditionally on .
is the distribution of .
The analysis of the expressions above allows us to extend Theorem 2.1 straightforwardly to Theorem 2.2.
This follows from four remarks:

The term does not lead to any contribution to the GRF as long as the s are distinct.

The spherical integral is performed on that are uniform on the sphere and such that
As a consequence, one can write
where is uniform on the sphere . Therefore the spherical integral is the same as the one evaluated for Theorem 2.1 with replaced by .

Because of rotational symmetry depends only on . Moreover, calling the distribution of , we remark that and have the same GRF, since the integral over the s at fixed does not lead to a term exponential in . Furthermore, and the distribution , introduced for Theorem 2.1, also have the same GRF. In conclusion, the contribution to the total GRF due to is the same as in Theorem 2.1 with replaced by .

The above implies by Laplace's method the weak large deviation principle at any strictly ordered sequence of points , that is
and the same holds when the limsup is replaced by a liminf. We deduce the same result for ordered and possibly equal by taking approximating sequences and which are strictly ordered and such that
By the previous bounds we deduce that
where . The continuity of in the allows us to conclude by letting go to zero.
7 Study of the rate function
We can give a more explicit formula for the rate function by noticing that the supremum
was already studied in [23]. In the notation of [23], we are maximizing on . According to [23, Section 3.2], we find that

If , the maximum is achieved at
or at , if is smaller than , and

If , we are optimizing a decreasing function and therefore the maximum is taken at and
with
Note that these two cases correspond to different asymptotic behaviors of when goes to and goes to : if is larger than , goes to , and otherwise to .
We can therefore study the optimizer in of for a given .

For , the contribution to the GRF that depends on and that we have to minimize reads:
which is independent of if . If