1 Introduction
Extracting low-rank information from a data matrix corrupted with noise is a fundamental statistical task. Spiked random matrix models have attracted considerable attention in statistics, probability and machine learning as rich testbeds for theoretical investigation of this problem
[Joh01, Péc14, Péc06, BAP05]. A basic such model is the spiked Wigner model, in which one observes a rank-one deformation of a Wigner matrix:
(1) $$Y = \sqrt{\frac{\lambda}{N}}\, x x^\top + W,$$
where $W_{ij} = W_{ji} \sim \mathcal{N}(0,1)$ and are independent for all $i \le j$. The spike vector $x = (x_1, \dots, x_N)^\top$ represents the signal to be recovered, and $\lambda \ge 0$ plays the role of a Signal-to-Noise Ratio (SNR) parameter. The entries of $x$ come i.i.d. from a (Borel) prior $P_X$ on $\mathbb{R}$ with bounded support, so that the $1/\sqrt{N}$ scaling in the above model puts the problem in a high-noise regime where only partial recovery of the spike is possible. A basic statistical question about this model is: for what values of the SNR $\lambda$ is it possible to estimate the spike $x$ with nontrivial accuracy?
Spectral methods, or more precisely, estimation using the top eigenvector of
$Y$, are known to succeed above a spectral threshold and fail below it [BGN11]. Since the posterior mean $\mathbb{E}[x \mid Y]$ is the estimator with minimal mean squared error, this question boils down to the study of the posterior distribution of $x$ given $Y$, which by Bayes' rule can be written as
(2) $$dP(x \mid Y) = \frac{e^{H(x)} \prod_{i=1}^N dP_X(x_i)}{\int e^{H(x)} \prod_{i=1}^N dP_X(x_i)},$$
where $H$ is the (random) Hamiltonian
(3) $$H(x) = \sum_{i \le j} \sqrt{\frac{\lambda}{N}}\, Y_{ij} x_i x_j - \frac{\lambda}{2N}\, x_i^2 x_j^2.$$
Let us define the free entropy of the model (the term "free energy" is also used, although the physics convention then requires putting a minus sign in front of the expression below) as the expected log-partition function (i.e., normalizing constant) of the posterior:
(4) $$F_N = \frac{1}{N}\, \mathbb{E} \log \int e^{H(x)} \prod_{i=1}^N dP_X(x_i).$$
By heuristically analyzing an approximate message passing (AMP) algorithm for this problem, Lesieur et al.
[LKZ15] derived an asymptotic, so-called replica-symmetric (RS), formula for the above quantity. This formula is defined as follows: for $r \ge 0$, let
$$\psi(r) = \mathbb{E} \log \int \exp\Big( \sqrt{r}\, z x + r\, x x^* - \frac{r}{2} x^2 \Big)\, dP_X(x),$$
where $x^* \sim P_X$ and $z \sim \mathcal{N}(0,1)$ are mutually independent. Define the potential
$$F(\lambda, q) = \psi(\lambda q) - \frac{\lambda q^2}{4}.$$
The conjectured limit of (4) is the RS formula
$$\phi_{\mathrm{RS}}(\lambda) := \sup_{q \ge 0} F(\lambda, q).$$
This conjecture was then proved shortly after in a series of papers [KXZ16, BDM16, DAM16, LM16] (see also [KM09]):
Theorem 1.
For all $\lambda \ge 0$,
$$\lim_{N \to \infty} F_N = \sup_{q \ge 0} F(\lambda, q).$$
The above statement contains precious statistical information. It can be written in at least two other equivalent ways. In terms of the mutual information between $x$ and $Y$,
$$\lim_{N \to \infty} \frac{1}{N}\, I(x; Y) = \frac{\lambda}{4}\, \mathbb{E}_{P_X}[X^2]^2 - \sup_{q \ge 0} F(\lambda, q);$$
or, denoting by $P_\lambda$
the probability distribution of the matrix $Y$
as per (1), in terms of the Kullback–Leibler divergence between $P_\lambda$ and $P_0$:
$$\lim_{N \to \infty} \frac{1}{N}\, D_{\mathrm{KL}}(P_\lambda \,\|\, P_0) = \sup_{q \ge 0} F(\lambda, q).$$
Furthermore, the point $q^*(\lambda)$ achieving the maximum in the RS formula (which can be shown to be unique and finite for almost every $\lambda$) can be interpreted as the best overlap any estimator can have with the spike $x$. Indeed, the overlap of a draw from the posterior with $x$ concentrates about $q^*(\lambda)$. See [BDM16, LM16, EKJ17] for various forms of this statement.
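For concreteness, the RS formula can be evaluated numerically. The sketch below is our own illustration, assuming a Rademacher prior, for which $\psi(r) = \mathbb{E} \log \cosh(r + \sqrt{r}\, Z) - r/2$ in the notation above; it maximizes the RS potential over a grid in $q$, using Gauss–Hermite quadrature for the Gaussian expectation, and the maximizer plays the role of $q^*(\lambda)$.

```python
import numpy as np

# Gauss-Hermite quadrature for E[f(Z)], Z ~ N(0, 1).
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes
p = weights / np.sqrt(np.pi)

def F(lam, q):
    """RS potential F(lam, q) = psi(lam*q) - lam*q^2/4 for the Rademacher prior,
    where psi(r) = E log cosh(r + sqrt(r) Z) - r/2."""
    r = lam * q
    psi = p @ np.log(np.cosh(r + np.sqrt(r) * z)) - r / 2.0
    return psi - lam * q * q / 4.0

def rs_formula(lam, qs=np.linspace(0.0, 1.0, 2001)):
    """Return (maximizer q*, value of the RS formula) on a grid in q."""
    vals = np.array([F(lam, q) for q in qs])
    i = int(np.argmax(vals))
    return qs[i], vals[i]

print(rs_formula(0.5))  # below lam = 1: q* = 0 and the formula evaluates to 0
print(rs_formula(2.0))  # above lam = 1: q* > 0 and the formula is strictly positive
```

The transition of the maximizer from $q^* = 0$ to $q^* > 0$ as $\lambda$ crosses $1$ reflects the estimation threshold for this (symmetric) prior.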
2 Comment on the existing proofs
The proof of the lower bound relies on an application of Guerra's interpolation method [Gue01, GT02], and is fairly short and transparent. (See [KXZ16].) Available proofs of the converse bound (as well as of overlap concentration) are on the other hand highly involved. Barbier et al. [BDM16] and Deshpande et al. [DAM16] adopt an algorithmic approach: they analyze an approximate message passing procedure and show that the produced estimator asymptotically achieves an overlap of $q^*(\lambda)$ with the spike. Thus the posterior mean, being the optimal estimator, must also achieve the same overlap. This makes it possible to prove overlap convergence and thus show the converse bound. A difficulty one has to overcome with this method is that AMP (and supposedly any other algorithm) may fail to achieve the optimal overlap in the presence of first-order phase transitions, which trap the algorithm in a bad local optimum of the RS potential. Spatial coupling, an idea from coding theory, is used in [BDM16] to overcome this problem. Lelarge and Miolane [LM16] on the other hand use the Aizenman–Sims–Starr scheme [ASS03], a relative of the cavity method developed within spin-glass theory, to prove the upper bound. Barbier and Macris [BM17] prove the upper bound via an adaptive version of the interpolation method that proceeds via a sequence of intermediate interpolation steps. Recently, the optimal rate of convergence and constant-order corrections to the RS formula were proved in [EKJ17] using a rigorous incarnation of the cavity method due to Talagrand [Tal11]. However, all the current approaches (perhaps to a lesser extent [BM17]) require the execution of long and technical arguments.

In this note, we show that the upper bound in Theorem 1 admits a fairly simple proof based on the same interpolation idea that yielded the lower bound, combined with an application of the Laplace method and concentration of measure. The main idea is to consider a version of the free entropy (4) of a subsystem of configurations having a fixed overlap with the spike $x$. We then proceed by applying the Guerra bound and optimizing over this fixed overlap (which is a free parameter) to obtain an upper bound in the form of a saddle (max-min) formula. A small extra effort is needed to show that this last formula is another representation of the RS formula. The idea of restricting the overlap dates back to Franz and Parisi [FP95, FP98], who introduced it in order to study the relaxation properties of dynamics in spin-glass models. The free entropy at fixed overlap bears the name of the Franz–Parisi potential. Our proof thus hinges on an upper bound on this potential, which may be of independent interest. We first start by presenting the proof of the lower bound, which is a starting point for our argument. We present the proof in the case $i < j$, i.e., we omit the diagonal terms of $Y$.
This is only done to keep the displays concise; recovering the general case is straightforward since the diagonal has a vanishing contribution to the overall free entropy. Finally, the method presented here can be easily generalized to all spiked tensor models of even order [RM14], thus recovering the main results of [LML17].
3 Proof of Theorem 1
Let $t \in [0,1]$, $r \ge 0$, and consider an interpolating Hamiltonian
(5) $$H_t(x) = \sum_{i<j} \sqrt{\frac{t\lambda}{N}}\, W_{ij} x_i x_j + \frac{t\lambda}{N}\, x_i x_j x^*_i x^*_j - \frac{t\lambda}{2N}\, x_i^2 x_j^2 \;+\; \sum_{i=1}^N \sqrt{(1-t)\, r}\, z_i x_i + (1-t)\, r\, x_i x^*_i - \frac{(1-t)\, r}{2}\, x_i^2,$$
where the $z_i$'s are i.i.d. standard Gaussian r.v.'s independent of everything else. For $n \ge 1$ and $f : (\mathbb{R}^N)^n \to \mathbb{R}$, we define the Gibbs average of $f$ as
(6) $$\langle f \rangle_t = \frac{\int f\big(x^{(1)}, \dots, x^{(n)}\big) \prod_{l=1}^n e^{H_t(x^{(l)})}\, dP_X^{\otimes N}(x^{(l)})}{\Big( \int e^{H_t(x)}\, dP_X^{\otimes N}(x) \Big)^{n}}.$$
This is the average of $f$ with respect to the posterior distribution of $n$ copies $x^{(1)}, \dots, x^{(n)}$ of $x$ given the augmented set of observations
(7) $$Y = \sqrt{\frac{t\lambda}{N}}\, x x^\top + W \qquad \text{and} \qquad y_i = \sqrt{(1-t)\, r}\, x_i + z_i, \quad 1 \le i \le N.$$
The variables $x^{(1)}, \dots, x^{(n)}$ are called replicas, and are interpreted as random variables independently drawn from the posterior. When $n = 1$
we simply write $\langle f(x) \rangle_t$ instead of $\langle f(x^{(1)}) \rangle_t$. We shall denote the overlaps between two replicas, and between a replica and the spike, as follows: for $l, l' \ge 1$, we let
$$R_{l,l'} = \frac{1}{N} \sum_{i=1}^N x^{(l)}_i x^{(l')}_i \qquad \text{and} \qquad R_{l,*} = \frac{1}{N} \sum_{i=1}^N x^{(l)}_i x^*_i.$$
A simple consequence of Bayes' rule is that the $(n+1)$-tuples $(x^{(1)}, \dots, x^{(n)}, x^*)$ and $(x^{(1)}, \dots, x^{(n+1)})$ have the same law under $\mathbb{E}\langle \cdot \rangle_t$ (see Proposition 16 in [LM16]). This bears the name of the Nishimori property in the spin-glass literature [Nis01].
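The Nishimori property can be sanity-checked numerically on a small instance, where the posterior can be computed exactly by enumeration. The sketch below is our own illustration with assumed parameters (Rademacher prior, $N = 6$, $\lambda = 2$, $t = 1$, diagonal omitted); it compares $\mathbb{E}\langle R_{1,2} \rangle$ with $\mathbb{E}\langle R_{1,*} \rangle$, which the property asserts are equal, so their difference should vanish up to Monte Carlo error over the disorder.

```python
import itertools
import numpy as np

rng = np.random.default_rng(1)
N, lam, n_disorder = 6, 2.0, 4000

# All 2^N spin configurations (the Rademacher prior is uniform on them).
configs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
iu = np.triu_indices(N, 1)
pair_prods = configs[:, iu[0]] * configs[:, iu[1]]   # x_i x_j for i < j

diffs = []
for _ in range(n_disorder):
    xs = rng.choice([-1.0, 1.0], size=N)             # planted spike x*
    W = np.triu(rng.standard_normal((N, N)), 1)
    W = W + W.T
    Y = np.sqrt(lam / N) * np.outer(xs, xs) + W
    # Posterior weights: for +/-1 spins the (lam/2N) x_i^2 x_j^2 term is constant.
    energies = np.sqrt(lam / N) * (pair_prods @ Y[iu])
    w = np.exp(energies - energies.max())
    w /= w.sum()
    m = w @ configs                                  # exact posterior means <x_i>
    # <R_{1,2}> = |<x>|^2 / N (replicas are independent), <R_{1,*}> = <x>.x* / N
    diffs.append((m @ xs - m @ m) / N)

print(np.mean(diffs))  # should be close to 0 by the Nishimori property
```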
3.1 The lower bound
Reproducing the argument of [KXZ16], we prove using Guerra's interpolation [Gue01] and the Nishimori property that
$$\liminf_{N \to \infty} F_N \ge \sup_{q \ge 0} F(\lambda, q).$$
We let $r = \lambda q$ in the definition of $H_t$ and let
$$\varphi(t) = \frac{1}{N}\, \mathbb{E} \log \int e^{H_t(x)}\, dP_X^{\otimes N}(x).$$
A short calculation based on Gaussian integration by parts shows that
$$\varphi'(t) = \frac{\lambda}{4}\, \mathbb{E}\big\langle 2 R_{1,*}^2 - R_{1,2}^2 \big\rangle_t - \lambda q\, \mathbb{E}\langle R_{1,*} \rangle_t + \frac{\lambda q}{2}\, \mathbb{E}\langle R_{1,2} \rangle_t + O(1/N).$$
By the Nishimori property, the expressions involving the pairs $(x^{(1)}, x^{(2)})$ on the one hand and $(x^{(1)}, x^*)$ on the other in the brackets are equal. We then obtain
$$\varphi'(t) = \frac{\lambda}{4}\, \mathbb{E}\big\langle (R_{1,*} - q)^2 \big\rangle_t - \frac{\lambda q^2}{4} + O(1/N).$$
Observe that the remainder term is $O(1/N)$ since the variables $x_i$ are bounded. Moreover, the first term is always nonnegative, so we obtain
$$\varphi'(t) \ge -\frac{\lambda q^2}{4} + O(1/N).$$
Since $\varphi(1) = F_N$ and $\varphi(0) = \psi(\lambda q)$, integrating over $t \in [0,1]$, we obtain $F_N \ge \psi(\lambda q) - \frac{\lambda q^2}{4} + O(1/N)$ for all $q \ge 0$, and this yields the lower bound.
3.2 The upper bound
We prove the converse bound
$$\limsup_{N \to \infty} F_N \le \sup_{q \ge 0} F(\lambda, q).$$
We introduce the Franz–Parisi potential [FP95, FP98]. For $m \in \mathbb{R}$, $\epsilon > 0$ fixed, and $x^* \in \mathbb{R}^N$, we define
$$\Phi_\epsilon(m) = \frac{1}{N}\, \mathbb{E} \log \int_{R_{1,*} \in [m, m+\epsilon)} e^{H(x)}\, dP_X^{\otimes N}(x),$$
where the expectation is over $W$. This is the free entropy of a subsystem of configurations having an overlap close to a fixed value $m$ with a planted signal $x^*$. It is clear that $\Phi_\epsilon(m)$ is bounded by the free entropy of the unrestricted system. We will argue via the Laplace method and concentration of measure that $F_N$ is conversely bounded by a maximum of $\Phi_\epsilon$ over $m$, then use Guerra's interpolation to upper bound $\Phi_\epsilon$ (notice that this method yielded a lower bound on $F_N$ due to the Nishimori property). Let us define a bit more notation. For $(r, s) \in \mathbb{R}_{\ge 0} \times \mathbb{R}$, let
$$\psi(r, s) = \mathbb{E} \log \int \exp\Big( \sqrt{r}\, z x + s\, x x^* - \frac{r}{2} x^2 \Big)\, dP_X(x),$$
where $z \sim \mathcal{N}(0,1)$ and $x^* \sim P_X$ are independent, and where $\psi(r) = \psi(r, r)$ recovers the function of the RS formula. Moreover, let
$$\bar{F}(\lambda, m, q) = \psi(\lambda q, \lambda m) - \frac{\lambda m^2}{2} + \frac{\lambda q^2}{4},$$
and similarly define $\bar{F}_N(\lambda, m, q; x^*)$, in which the expectation over $x^*$ inside $\psi$ is replaced by the empirical average over the coordinates of the fixed vector $x^*$.
Proposition 2.
There exist constants $C, K > 0$ such that for all $\epsilon \in (0, 1]$ and all $N$, we have
$$F_N \le \mathbb{E}_* \max_{m \in M_\epsilon} \Phi_\epsilon(m) + C \sqrt{\frac{\log(2K/\epsilon)}{N}} + \frac{\log(2K/\epsilon)}{N},$$
where $M_\epsilon = \epsilon \mathbb{Z} \cap [-K, K]$ and $\mathbb{E}_*$ denotes expectation with respect to $x^*$.
Now we upper bound $\Phi_\epsilon$ in terms of $\bar{F}_N$:
Proposition 3 (Interpolation upper bound).
There exist constants $C, K > 0$ depending on $\lambda$ and the support of $P_X$ such that for all $\epsilon \in (0,1]$, $m \in [-K, K]$, $q \ge 0$ and $x^* \in \mathbb{R}^N$, we have
$$\Phi_\epsilon(m) \le \bar{F}_N(\lambda, m, q; x^*) + C\Big( \epsilon + \frac{1}{N} \Big).$$
Remark: This simple upper bound on the Franz–Parisi potential, which may be of independent interest, can be straightforwardly generalized to spiked tensor models of even order. Indeed, as will be apparent from the proof in the present matrix case, a crucial step in obtaining the inequality is the positivity of a certain hard-to-control remainder term. (We note that the adaptive interpolation method of Barbier and Macris [BM17] is able to bypass this issue of positivity of the remainder term along the interpolation path, as long as the interpolation "stays on the Nishimori line", i.e., corresponds to an inference problem for every $t$; this is however not true in the case of the FP potential. They are thus able to compute the free entropy of (asymmetric) spiked tensor models of odd order. See [BMM17, BKM17].) Tensor models of even order enjoy a convexity property that ensures the positivity of this remainder.

From Propositions 2 and 3, an upper bound on $F_N$ in the form of a saddle formula begins to emerge. For a fixed $m$, let $q_*(m)$ be any minimizer of $q \mapsto \bar{F}(\lambda, m, q)$ on $\mathbb{R}_{\ge 0}$. (By differentiating in $q$, we can check that $q_*(m)$ is bounded uniformly in $m$.) Then we have
(8) $$F_N \le \mathbb{E}_* \max_{m \in M_\epsilon} \bar{F}_N(\lambda, m, q_*(m); x^*) + C\Big( \epsilon + \sqrt{\frac{\log(2K/\epsilon)}{N}} + \frac{1}{N} \Big).$$
At this point we need to push the expectation inside the supremum. This will be done using a concentration argument.
Lemma 4.
There exists $C > 0$ such that for all $m \in [-K, K]$, $q \in [0, K]$, and $u \ge 0$,
$$\Pr\Big( \big| \bar{F}_N(\lambda, m, q; x^*) - \bar{F}(\lambda, m, q) \big| \ge u \Big) \le 2\, e^{-N u^2 / C}.$$
It is a routine computation to deduce from Lemma 4 (and boundedness of both $m$ and $q_*(m)$) that the expected supremum is bounded by the supremum of the expectation plus a small term (a similar argument is given in the proof of Proposition 2):
$$\mathbb{E}_* \max_{m \in M_\epsilon} \bar{F}_N(\lambda, m, q_*(m); x^*) \le \max_{m \in M_\epsilon} \bar{F}(\lambda, m, q_*(m)) + \delta_N,$$
where $\delta_N \to 0$ as $N \to \infty$. Since $q_*(m)$ is a minimizer of $\bar{F}(\lambda, m, \cdot)$, it follows from (8) that
(9) $$\limsup_{N \to \infty} F_N \le \sup_{m \in \mathbb{R}}\, \inf_{q \ge 0} \bar{F}(\lambda, m, q) + C\epsilon.$$
We now let $\epsilon \to 0$, and conclude by noticing that the above saddle formula is another expression for the RS formula:
Proposition 5.
$$\sup_{m \in \mathbb{R}}\, \inf_{q \ge 0} \bar{F}(\lambda, m, q) = \sup_{q \ge 0} F(\lambda, q).$$
Proof.
One inequality follows from (9) and the lower bound $\liminf_N F_N \ge \sup_{q \ge 0} F(\lambda, q)$. For the converse inequality, we notice that for all $m \in \mathbb{R}$,
$$\inf_{q \ge 0} \bar{F}(\lambda, m, q) \le \bar{F}(\lambda, m, |m|) = \psi(\lambda |m|, \lambda m) - \frac{\lambda m^2}{4}.$$
Now we use the fact that the function $\psi$ is largest when its second argument is positive:
Lemma 6.
For all $r \ge 0$ we have $\psi(r, -r) \le \psi(r, r)$.
This implies $\psi(\lambda |m|, \lambda m) \le \psi(\lambda |m|, \lambda |m|)$, whence $\inf_{q \ge 0} \bar{F}(\lambda, m, q) \le F(\lambda, |m|)$. Taking the supremum over $m \in \mathbb{R}$ yields the converse bound. ∎
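As a numerical sanity check of the saddle identity (our own illustration, assuming a Rademacher prior, for which $\psi(r, s) = \mathbb{E} \log \cosh(s + \sqrt{r}\, Z) - r/2$ in the notation above), one can compare $\sup_m \inf_q \bar{F}(\lambda, m, q)$ on a grid with the RS formula $\sup_q F(\lambda, q)$:

```python
import numpy as np

# Gauss-Hermite quadrature for E[f(Z)], Z ~ N(0, 1).
nodes, weights = np.polynomial.hermite.hermgauss(80)
z = np.sqrt(2.0) * nodes
p = weights / np.sqrt(np.pi)

def psi(r, s):
    """psi(r, s) = E log cosh(s + sqrt(r) Z) - r/2 for the Rademacher prior."""
    return p @ np.log(np.cosh(s + np.sqrt(r) * z)) - r / 2.0

def F_bar(lam, m, q):
    # Saddle potential: psi(lam q, lam m) - lam m^2 / 2 + lam q^2 / 4
    return psi(lam * q, lam * m) - lam * m * m / 2.0 + lam * q * q / 4.0

def F(lam, q):
    # RS potential; note F(lam, q) = F_bar(lam, q, q)
    return psi(lam * q, lam * q) - lam * q * q / 4.0

lam = 2.0
ms = np.linspace(-1.0, 1.0, 201)
qs = np.linspace(0.0, 1.5, 301)
saddle = max(min(F_bar(lam, m, q) for q in qs) for m in ms)
rs = max(F(lam, q) for q in qs)
print(saddle, rs)  # the two values agree up to grid and quadrature error
```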
Proof of Proposition 2. Since the prior $P_X$ has bounded support, the overlap $R_{1,*}$ takes values in an interval $[-K, K]$, which we can grid by $2K/\epsilon$ many intervals of size $\epsilon$, for some $\epsilon > 0$. This allows the following discretization, where $m$ runs over the finite range $M_\epsilon = \epsilon \mathbb{Z} \cap [-K, K]$:
(10) $$F_N \le \frac{1}{N}\, \mathbb{E} \log \sum_{m \in M_\epsilon} \int_{R_{1,*} \in [m, m+\epsilon)} e^{H(x)}\, dP_X^{\otimes N}(x) \le \mathbb{E} \max_{m \in M_\epsilon} \hat{\Phi}_\epsilon(m) + \frac{\log(2K/\epsilon)}{N}.$$
In the above, $\mathbb{E}$ is w.r.t. both $W$ and $x^*$, and we have set
$$\hat{\Phi}_\epsilon(m) = \frac{1}{N} \log \int_{R_{1,*} \in [m, m+\epsilon)} e^{H(x)}\, dP_X^{\otimes N}(x).$$
We use concentration of measure to push the expectation over $W$ to the left of the maximum. We show that each term concentrates about its expectation $\Phi_\epsilon(m)$ (in the randomness of $W$). Let $\mathbb{E}_*$ denote the expectation w.r.t. $x^*$.
Lemma 7.
There exists a constant $C > 0$ such that for all $m$, $\epsilon$ and all $u \ge 0$,
$$\Pr\Big( \big| \hat{\Phi}_\epsilon(m) - \Phi_\epsilon(m) \big| \ge u \Big) \le 2\, e^{-N u^2 / C}.$$
Therefore, the expectation of the maximum concentrates as well:
$$\mathbb{E} \max_{m \in M_\epsilon} \hat{\Phi}_\epsilon(m) \le \mathbb{E}_* \max_{m \in M_\epsilon} \Phi_\epsilon(m) + \mathbb{E} \max_{m \in M_\epsilon} \big| \hat{\Phi}_\epsilon(m) - \Phi_\epsilon(m) \big|.$$
We set $u = \sqrt{C \log(2K/\epsilon)/N} + v$, take a union bound over $m \in M_\epsilon$, and integrate the tail in $v$ to obtain
$$\mathbb{E} \max_{m \in M_\epsilon} \big| \hat{\Phi}_\epsilon(m) - \Phi_\epsilon(m) \big| \le C' \sqrt{\frac{\log(2K/\epsilon)}{N}}.$$
Therefore, plugging the above estimates into (10), we obtain the claim. ∎
Proof of Proposition 3. Let $t \in [0,1]$ and consider a slightly modified interpolating Hamiltonian that has two parameters $q \ge 0$ and $m \in \mathbb{R}$:
(11) $$H_t(x) = \sum_{i<j} \sqrt{\frac{t\lambda}{N}}\, W_{ij} x_i x_j + \frac{t\lambda}{N}\, x_i x_j x^*_i x^*_j - \frac{t\lambda}{2N}\, x_i^2 x_j^2 \;+\; \sum_{i=1}^N \sqrt{(1-t)\lambda q}\, z_i x_i + (1-t)\lambda m\, x_i x^*_i - \frac{(1-t)\lambda q}{2}\, x_i^2,$$
where the $z_i$'s are i.i.d. standard Gaussian r.v.'s independent of everything else. Let
$$\varphi_\epsilon(t) = \frac{1}{N}\, \mathbb{E} \log \int_{R_{1,*} \in [m, m+\epsilon)} e^{H_t(x)}\, dP_X^{\otimes N}(x),$$
where $\mathbb{E}$ is over the Gaussian disorder $(W, z)$ ($x^*$ is fixed). Let $\langle \cdot \rangle_t$ be the corresponding Gibbs average, defined similarly to (6). By differentiation and Gaussian integration by parts,
$$\varphi_\epsilon'(t) = \frac{\lambda}{2}\, \mathbb{E}\big\langle (R_{1,*} - m)^2 \big\rangle_t - \frac{\lambda}{4}\, \mathbb{E}\big\langle (R_{1,2} - q)^2 \big\rangle_t - \frac{\lambda m^2}{2} + \frac{\lambda q^2}{4} + O(1/N).$$
Notice that by the overlap restriction, $\mathbb{E}\langle (R_{1,*} - m)^2 \rangle_t \le \epsilon^2$. Moreover, the remaining error terms are of order $1/N$ since the variables $x_i$ are bounded. Next, since $\mathbb{E}\langle (R_{1,2} - q)^2 \rangle_t$ has nonnegative sign (this is a crucial fact), we can ignore it and obtain an upper bound:
$$\varphi_\epsilon'(t) \le -\frac{\lambda m^2}{2} + \frac{\lambda q^2}{4} + \frac{\lambda \epsilon^2}{2} + O(1/N).$$
Integrating over $t \in [0,1]$, we obtain
$$\Phi_\epsilon(m) = \varphi_\epsilon(1) \le \varphi_\epsilon(0) - \frac{\lambda m^2}{2} + \frac{\lambda q^2}{4} + \frac{\lambda \epsilon^2}{2} + O(1/N).$$
Now we use a trivial upper bound on $\varphi_\epsilon(0)$: dropping the overlap restriction,
$$\varphi_\epsilon(0) \le \frac{1}{N} \sum_{i=1}^N \mathbb{E} \log \int \exp\Big( \sqrt{\lambda q}\, z x + \lambda m\, x x^*_i - \frac{\lambda q}{2} x^2 \Big)\, dP_X(x).$$
Hence,
$$\Phi_\epsilon(m) \le \bar{F}_N(\lambda, m, q; x^*) + \frac{\lambda \epsilon^2}{2} + O(1/N). \qquad ∎$$
Proof of Lemma 4. The random part of $\bar{F}_N(\lambda, m, q; x^*)$ is the average of the $N$ i.i.d. terms $\psi(\lambda q, \lambda m; x^*_i)$, where $\psi(r, s; x^*_i)$ denotes the version of $\psi$ conditional on $x^* = x^*_i$. Since $|x|, |x^*_i| \le \sqrt{K}$ on the support of $P_X$, each such term is bounded by a constant depending on $\lambda$, $K$, $m$ and $q$. For bounded $m$ and $q$, the claim follows from concentration of the average of i.i.d. bounded r.v.'s (Hoeffding's inequality).
Proof of Lemma 7. We notice that $\hat{\Phi}_\epsilon(m)$, seen as a function of the Gaussian disorder $W$, is Lipschitz with constant $C/\sqrt{N}$. By Gaussian concentration of Lipschitz functions (the Tsirelson–Ibragimov–Sudakov inequality [BLM13]), there exists a constant $C' > 0$ depending only on $\lambda$ and the support of $P_X$ such that for all $u \ge 0$,
$$\Pr\Big( \big| \hat{\Phi}_\epsilon(m) - \Phi_\epsilon(m) \big| \ge u \Big) \le 2\, e^{-N u^2 / C'}.$$
Then we conclude by means of the identity $\mathbb{E}[Z] = \int_0^\infty \Pr(Z \ge u)\, du$, valid for $Z \ge 0$,
and integrate the tail.
Proof of Lemma 6. Let $\nu = P_X$, and let $\mu$ be the symmetric part of $\nu$, i.e., $\mu(A) = \frac{1}{2}\big( \nu(A) + \nu(-A) \big)$ for all Borel sets $A \subseteq \mathbb{R}$. Observe that $\nu$ is absolutely continuous with respect to $\mu$. The argument relies on linearly interpolating between the two measures $\mu$ and $\nu$. Let $t \in [0,1]$ and let $\nu_t = (1-t)\mu + t\nu$. Further, let $r \ge 0$ be fixed, and
$$\Delta(t) = \psi_{\nu_t}(r, r) - \psi_{\nu_t}(r, -r),$$
where $\psi_{\nu_t}$ is defined as $\psi$ with both $x$ and $x^*$ drawn from $\nu_t$. Now let $Z_t(x^*, s) = \int \exp\big( \sqrt{r}\, z x + s\, x x^* - \frac{r}{2} x^2 \big)\, d\nu_t(x)$. We have $\Delta(1) = \psi(r, r) - \psi(r, -r)$ on the one hand, and, since $\mu$ is a symmetric distribution, $\Delta(0) = 0$ on the other. We will show that $\Delta$ is a convex increasing function on the interval $[0,1]$. Then we deduce that $\Delta(1) \ge \Delta(0) = 0$. First, we have
$$\frac{d}{dt}\, \psi_{\nu_t}(r, r) = \mathbb{E} \int \log Z_t(x^*, r)\, d(\nu - \mu)(x^*) + \mathbb{E} \int \frac{\int e^{\sqrt{r} z x + r x x^* - \frac{r}{2} x^2}\, d(\nu - \mu)(x)}{Z_t(x^*, r)}\, d\nu_t(x^*),$$
and
$$\frac{d^2}{dt^2}\, \psi_{\nu_t}(r, r) = 2\, \mathbb{E} \int \frac{\int e^{\sqrt{r} z x + r x x^* - \frac{r}{2} x^2}\, d(\nu - \mu)(x)}{Z_t(x^*, r)}\, d(\nu - \mu)(x^*) - \mathbb{E} \int \bigg( \frac{\int e^{\sqrt{r} z x + r x x^* - \frac{r}{2} x^2}\, d(\nu - \mu)(x)}{Z_t(x^*, r)} \bigg)^{\!2}\, d\nu_t(x^*).$$
Similar expressions hold for $\psi_{\nu_t}(r, -r)$, where $r x x^*$ is replaced by $-r x x^*$ inside the exponentials. We see from the expression of the first derivative at $t = 0$ that $\Delta'(0) = 0$. This is because $\mu$ is symmetric about the origin, so a sign change (of $x^*$ for the first term, and of $x$ for the second term) does not affect the value of the integrals. Hence $\Delta'(0) = 0$. Now, we focus on the second derivative. Observe that since $\mu$ is the symmetric part of $\nu$, the signed measure $\nu - \mu$ is antisymmetric. This implies that the first term in the expression of the second derivative changes sign under a sign change of the coupling and keeps the same modulus. As for the second term, a sign change of the coupling induces integration against $d\nu_t(-x^*)$. For any Borel set $A$, we have $\nu_t(-A) = \nu_t(A) - 2t\, (\nu - \mu)(A)$. Therefore the second term in the above expression becomes, after the sign change, an integral against $d\nu_t(x^*) - 2t\, d(\nu - \mu)(x^*)$, and the difference is
$$\Delta''(t) = 4\, \mathbb{E} \int \frac{\int e^{\sqrt{r} z x + r x x^* - \frac{r}{2} x^2}\, d(\nu - \mu)(x)}{Z_t(x^*, r)}\, d(\nu - \mu)(x^*) - 2t\, \mathbb{E} \int \bigg( \frac{\int e^{\sqrt{r} z x + r x x^* - \frac{r}{2} x^2}\, d(\nu - \mu)(x)}{Z_t(x^*, r)} \bigg)^{\!2}\, d(\nu - \mu)(x^*).$$
Since both $\mu$ and $\nu$ are absolutely continuous with respect to $\nu_t$ for all $t \in (0, 1]$, we can write, with $h := \frac{d(\nu - \mu)}{d\nu_t}$,
$$\Delta''(t) = 4\, \mathbb{E}\big[ \langle h(x) \rangle\, h(x^*) \big] - 2t\, \mathbb{E}\big[ \langle h(x) \rangle^2\, h(x^*) \big],$$
where the Gibbs average $\langle \cdot \rangle$ is with respect to the posterior of $x \sim \nu_t$ given $y$ under the Gaussian channel $y = \sqrt{r}\, x + z$, and the expectation is under $x^* \sim \nu_t$ and $z \sim \mathcal{N}(0,1)$. By the Nishimori property, we simplify the above expression to
$$\Delta''(t) = 4\, \mathbb{E}\big[ \langle h(x) \rangle^2 \big] - 2t\, \mathbb{E}\big[ \langle h(x) \rangle^3 \big] = 2\, \mathbb{E}\Big[ \langle h(x) \rangle^2 \big( 2 - t\, \langle h(x) \rangle \big) \Big] \ge 0,$$
since $t\, h \le 1$ pointwise ($t\, d(\nu - \mu) = d\nu_t - d\mu \le d\nu_t$), where the expression is valid for all $t \in (0, 1]$. From here we see that the function $\Delta$ is convex on $[0,1]$. Since $\Delta'(0) = 0$, $\Delta$ is also increasing on $[0,1]$. ∎
Acknowledgments. Florent Krzakala acknowledges funding from the ERC under the European Union's 7th Framework Programme Grant Agreement 307087-SPARCS.
References
 [ASS03] Michael Aizenman, Robert Sims, and Shannon L Starr. Extended variational principle for the Sherrington–Kirkpatrick spin-glass model. Physical Review B, 68(21):214403, 2003.

 [BAP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché. Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Annals of Probability, 33(5):1643–1697, 2005.
 [BDM16] Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, and Lenka Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems (NIPS), pages 424–432, 2016.
 [BGN11] Florent Benaych-Georges and Raj Rao Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
 [BKM17] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Phase transitions, optimal errors and optimality of message-passing in generalized linear models. arXiv preprint arXiv:1708.03395, 2017.
 [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration Inequalities: A Nonasymptotic Theory of Independence. Oxford University Press, 2013.
 [BM17] Jean Barbier and Nicolas Macris. The stochastic interpolation method: A simple scheme to prove replica formulas in Bayesian inference. arXiv preprint arXiv:1705.02780, 2017.
 [BMM17] Jean Barbier, Nicolas Macris, and Léo Miolane. The layered structure of tensor estimation and its mutual information. arXiv preprint arXiv:1709.10368, 2017.
 [DAM16] Yash Deshpande, Emmanuel Abbé, and Andrea Montanari. Asymptotic mutual information for the binary stochastic block model. In IEEE International Symposium on Information Theory (ISIT), pages 185–189. IEEE, 2016.
 [EKJ17] Ahmed El Alaoui, Florent Krzakala, and Michael I Jordan. Finite size corrections and likelihood ratio fluctuations in the spiked Wigner model. arXiv preprint arXiv:1710.02903, 2017.
 [FP95] Silvio Franz and Giorgio Parisi. Recipes for metastable states in spin glasses. Journal de Physique I, 5(11):1401–1415, 1995.
 [FP98] Silvio Franz and Giorgio Parisi. Effective potential in glassy systems: theory and simulations. Physica A: Statistical Mechanics and its Applications, 261(3-4):317–339, 1998.
 [GT02] Francesco Guerra and Fabio Lucio Toninelli. The thermodynamic limit in mean field spin glass models. Communications in Mathematical Physics, 230(1):71–79, 2002.
 [Gue01] Francesco Guerra. Sum rules for the free energy in the mean field spin glass model. Fields Institute Communications, 30:161–170, 2001.
 [Joh01] Iain M Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pages 295–327, 2001.
 [KM09] Satish Babu Korada and Nicolas Macris. Exact solution of the gauge symmetric p-spin glass model on a complete graph. Journal of Statistical Physics, 136(2):205–230, 2009.
 [KXZ16] Florent Krzakala, Jiaming Xu, and Lenka Zdeborová. Mutual information in rankone matrix estimation. In Information Theory Workshop (ITW), pages 71–75. IEEE, 2016.
 [LKZ15] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Phase transitions in sparse PCA. In IEEE International Symposium on Information Theory (ISIT), pages 1635–1639. IEEE, 2015.
 [LM16] Marc Lelarge and Léo Miolane. Fundamental limits of symmetric low-rank matrix estimation. arXiv preprint arXiv:1611.03888, 2016.
 [LML17] Thibault Lesieur, Léo Miolane, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Statistical and computational phase transitions in spiked tensor estimation. In IEEE International Symposium on Information Theory (ISIT), pages 511–515, June 2017.
 [Nis01] Hidetoshi Nishimori. Statistical physics of spin glasses and information processing: an introduction, volume 111. Clarendon Press, 2001.
 [Péc06] Sandrine Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):127–173, 2006.
 [Péc14] Sandrine Péché. Deformed ensembles of random matrices. In Proceedings of the International Congress of Mathematicians, Seoul, pages 1059–1174. ICM, 2014.
 [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. In Advances in Neural Information Processing Systems (NIPS), pages 2897–2905, 2014.
 [Tal11] Michel Talagrand. Mean field models for spin glasses. Volume I: Basic examples, volume 54. Springer Science & Business Media, 2011.