# Estimation in the Spiked Wigner Model: A Short Proof of the Replica Formula

We consider the problem of estimating the rank-one perturbation of a Wigner matrix in a setting of low signal-to-noise ratio. This serves as a simple model for principal component analysis in high dimensions. The mutual information per variable between the spike and the observed matrix, or equivalently, the normalized Kullback-Leibler divergence between the planted and null models are known to converge to the so-called replica-symmetric formula, the properties of which determine the fundamental limits of estimation in this model. We provide in this note a short and transparent proof of this formula, based on simple executions of Gaussian interpolations and standard concentration-of-measure arguments. The Franz-Parisi potential, that is, the free entropy at a fixed overlap, plays an important role in our proof. Our proof can be generalized straightforwardly to spiked tensor models of even order.

## Authors

• 14 publications
• 64 publications
11/04/2014

### A statistical model for tensor PCA

We consider the Principal Component Analysis problem for large tensors o...
06/26/2020

### Tensor estimation with structured priors

We consider rank-one symmetric tensor estimation when the tensor is corr...
05/30/2020

### An Analytical Formula for Spectrum Reconstruction

We study the spectrum reconstruction technique. As is known to all, eige...
08/02/2018

### Algorithmic thresholds for tensor PCA

We study the algorithmic thresholds for principal component analysis of ...
05/16/2020

### Information-theoretic limits of a multiview low-rank symmetric spiked matrix model

We consider a generalization of an important class of high-dimensional i...
06/25/2018

### Fundamental limits of detection in the spiked Wigner model

We study the fundamental limits of detecting the presence of an additive...
06/25/2019

### A note on Bianchi-Donà's proof to the variance formula of von Neumann entropy

Bianchi and Donà [1] have recently reported a proof to the variance form...
##### This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

## 1 Introduction

Extracting low-rank information from a data matrix corrupted with noise is a fundamental statistical task. Spiked random matrix models have attracted considerable attention in statistics, probability and machine learning as rich testbeds for theoretical investigation on this problem

[Joh01, Péc14, Péc06, BAP05]. A basic such model is the spiked Wigner model in which one observes a rank-one deformation of a Wigner matrix :

 Y=√λNx∗x∗⊤+W, (1)

where and are independent for all . The spikevector represents the signal to be recovered, and plays the role of a Signal-to-Noise Ratio (SNR) parameter. The entries of come i.i.d. from a (Borel) prior on with bounded support, so that the scaling in the above model puts the problem in a high-noise regime where only partial recovery of the spike is possible. A basic statistical question about this model is for what values of the SNR is it possible to estimate the spike with non-trivial accuracy?

Spectral methods, or more precisely, estimation using the top eigenvector of

, are know to succeed above a spectral threshold and fail below [BGN11]. Since the posterior mean is the estimator with minimal mean squared error, this question boils down to the study of the posterior distribution of given , which by Bayes’ rule, can be written as

 dPλ(x|Y)=e−H(x)dP⊗Nx(x)∫e−H(x)dP⊗Nx(x), (2)

where is the (random) Hamiltonian

 −H(x) :=∑i

Let us define the free entropy111The term “free energy” is also used, although the physics convention requires to put a minus sign in front of the expression in this case. of the model as the expected log-partition function (i.e., normalizing constant) of the posterior :

 FN=1NElog ∫e−H(x)dP⊗Nx(x), (4)

By heuristically analyzing an approximate message passing (AMP) algorithm for this problem, Lesieur et al.

[LKZ15] derived an asymptotic—so-called replica-symmetric ()—formula for the above quantity. This formula is defined as follows: for , let

 ψ(r):=Ex∗,zlog∫exp(√rzx+rxx∗−r2x2)dPx(x),

where and are mutually independent. Define the potential

 F(λ,q):=ψ(λq)−λq24.

The conjectured limit of (4) is the formula

 ϕRS(λ):=supq≥0F(λ,q).

This conjecture was then proved shortly after in a series of papers [KXZ16, BDM16, DAM16, LM16] (see also [KM09]):

###### Theorem 1.

For all ,

 limN→∞FN=ϕRS(λ).

The above statement contains precious statistical information. It can be written in at least two other equivalent ways, in terms of the mutual information between and :

 limN→∞ 1NI(Y,x∗) = λ4(EPx[X2])2−ϕRS(λ),

or, denoting by

the probability distribution of the matrix

as per (1), in terms of the Kullback-Liebler divergence between and :

 limN→∞ 1NDKL(Pλ,P0) = ϕRS(λ).

Furthermore, the point achieving the maximum in the formula (which can be shown to be unique and finite for almost every ) can be interpreted as the best overlap any estimator can have with the spike . Indeed, the overlap of a draw from the posterior with concentrates about . See [BDM16, LM16, EKJ17] for various forms of this statement.

## 2 Comment on the existing proofs

The proof of the lower bound relies on an application of Guerra’s interpolation method [Gue01, GT02], and is fairly short and transparent. (See [KXZ16].) Available proofs of the converse bound (as well as overlap concentration) are on the other hand highly involved. Barbier et al. [BDM16] and Deshpande et al. [DAM16] adopt an algorithmic approach: they analyze an approximate message passing procedure and show that the produced estimator asymptotically achieves an overlap of

with the spike. Thus the posterior mean, being the optimal estimator, must also achieve the same overlap. This allows to prove overlap convergence and thus show the converse bound. A difficulty one has to overcome with this method is that AMP (and supposedly any other algorithm) may fail to achieve the optimal overlap in the presence of first-order phase transitions, which traps the algorithm in a bad local optimum of the

potential. Spatial coupling, an idea from coding theory, is used in [BDM16] to overcome this problem. Lelarge and Miolane [LM16] on the other hand use the Aizenman-Sims-Starr scheme [ASS03], a relative of the cavity method developed within spin-glass theory, to prove the upper bound. Barbier and Macris [BM17] prove the upper bound via a adaptive version of the interpolation method that proceeds via a sequence of intermediate interpolation steps. Recently, the optimal rate of convergence and constant order corrections to the formula were proved in [EKJ17] using a rigorous incarnation of the cavity method due to Talagrand [Tal11]. However, all the current approaches (perhaps to a lesser extent for [BM17]) require the execution of long and technical arguments.

In this note, we show that the upper bound in Theorem 1 admits a fairly simple proof based on the same interpolation idea that yielded the lower bound, combined with an application of the Laplace method and concentration of measure. The main idea is to consider a version of the free entropy (4) of a subsystem of configurations having a fixed overlap with the spike . We then proceed by applying the Guerra bound and optimize over this fixed overlap (which is a free parameter) to obtain an upper bound in the form of a saddle (max-min) formula. A small extra effort is needed to show that this last formula is another representation of the formula. The idea of restricting the overlap dates back to Franz and Parisi [FP95, FP98] who introduced it in order to study the relaxation properties of dynamics in spin-glass models. The free entropy at fixed overlap bears the name of the Franz-Parisi potential. Our proof thus hinges on a upper bound on this potential, which is may be of independent interest. We first start by presenting the proof of the lower bound, which is a starting point for our argument. We present the proof in the case , i.e., we omit the diagonal terms of . This is only done to keep the displays concise; recovering the general case is straightforward since the diagonal has vanishing contribution to the overall free entropy. Finally, the method presented here can be easily generalized to all spiked tensor models of even order [RM14], thus recovering the main results of [LML17].

## 3 Proof of Theorem 1

Let and consider an interpolating Hamiltonian

 −Ht(x) :=∑i

where the ’s are i.i.d. standard Gaussian r.v.’s independent of everything else. For , we define the Gibbs average of as

 (6)

This is the average of with respect to the posterior distribution of copies of given the augmented set of observations

 ⎧⎨⎩Yij=√tλNx∗ix∗j+Wij,1≤i≤j≤N,yi=√(1−t)rx∗i+zi,1≤i≤N. (7)

The variables are called replicas

, and are interpreted as random variables independently drawn from the posterior. When

we simply write instead of . We shall denote the overlaps between two replicas as follows: for , we let

 Rl,l′:=x(l)⋅x(l′)=1NN∑i=1x(l)ix(l′)i.

A simple consequence of Bayes’ rule is that the -tuples and have the same law under (see Proposition 16 in [LM16]). This bears the name of the Nishimori property in the spin glass literature [Nis01].

### 3.1 The lower bound

Reproducing the argument of [KXZ16], we prove using Guerra’s interpolation [Gue01] and the Nishimori property that

 FN≥ϕRS(λ)−KN.

We let in the definition of and let

 φ(t):=1NElog∫e−Ht(x)dP⊗Nx(x).

A short calculation based on Gaussian integration by parts shows that

 φ′(t)= −λ4E⟨(R1,2−q)2⟩t+λ4q2+λ4N2N∑i=1E⟨x(1)i2x(2)i2⟩t

By the Nishimori property, the expressions involving the pairs on the one hand and on the other in the brackets are equal. We then obtain

 φ′(t)=λ4E⟨(R1,∗−q)2⟩t−λ4q2−λ4NE⟨xN2x∗N2⟩t.

Observe that the last term is since the variables are bounded. Moreover, the first term is always non-negative so we obtain

 φ′(t)≥−λ4q2−KN.

Since and , integrating over , we obtain for all , and this yields the lower bound.

### 3.2 The upper bound

We prove the converse bound

 FN≤ϕRS(λ)+O(logN√N).

We introduce the Franz-Parisi potential [FP95, FP98]. For fixed, and we define

 Φϵ(m,x∗):=1NElog∫1{R1,∗∈[m,m+ϵ)}e−H(x)dP⊗Nx(x),

where the expectation is over . This is the free entropy of a subsystem of configurations having an overlap close to a fixed value with a planted signal . It is clear that . We will argue via the Laplace method and concentration of measure that , then use Guerra’s interpolation to upper bound (notice that this method yielded a lower bound on due to the Nishimori property). Let us define a bit of more notation. For , let

 ˆψ(r,s):=Ezlog∫exp(√rzx+sx−r2x2)dPx(x),

where , and where . Moreover, let

 ˆF(λ,m,q,x∗):=1NN∑i=1ˆψ(λq,λmx∗i)−λm22+λq24,

and similarly define .

###### Proposition 2.

There exist such that for all , we have

 FN≤Ex∗[maxl∈Z,|l|≤K/ϵΦϵ(lϵ,x∗)]+log(K/ϵ)√N.

Now we upper bound in terms of :

###### Proposition 3 (Interpolation upper bound).

There exist depending on such that for all and we have

 Φϵ(m,x∗)≤infq≥0ˆF(λ,m,q,x∗)+λ2ϵ2+KN.

Remark: This simple upper bound on the Franz-Parisi potential—which may be of independent interest—can be straightforwardly generalized to spiked tensor models of even order. Indeed, as will be apparent from the proof in the present matrix case, a crucial step in obtaining the inequality is the positivity of a certain hard-to-control remainder term222We note that the adaptive interpolation method of Barbier and Macris [BM17] is able to bypass this issue of positivity of the remainder term along the interpolation path, as long as this interpolation “stays on the Nishimori line”, i.e., corresponds to an inference problem for every

(this is however not true in the case of the FP potential.) They are thus able to compute the free entropy of (asymmetric) spiked tensor models of odd order. See

[BMM17, BKM17].. Tensor models of even order enjoy a convexity property that ensures the positivity of this remainder.

From Propositions 2 and 3, an upper bound on in the form of a saddle formula begins to emerge. For a fixed let be any minimizer of on . (By differentiating , we can check that is bounded uniformly in .) Then we have

 FN≤Ex∗[maxm=lϵ|l|≤K/ϵˆF(λ,m,¯q(λ,m),x∗)]+λ2ϵ2+log(K/ϵ)√N. (8)

At this point we need to push the expectation inside the supremum. This will be done using a concentration argument.

###### Lemma 4.

There exists such that for all , , and ,

 Prx∗(∣∣ˆF(λ,m,q,x∗)−\makebox[0.0pt][l]F(λ,m,q)∣∣≥t)≤2e−Nt2λ2K(|m|+q)2.

It is a routine computation to deduce from Lemma 4 (and boundedness of both and ) that the expected supremum is bounded by the supremum of the expectation plus a small term (a similar argument is given in the proof of Proposition 2):

 Esupm=lϵ|l|≤K/ϵˆF(λ,m,¯q(λ,m),x∗)

where . Since is a minimizer of , it follows from (8) that

 FN≤supm∈Rinfq≥0\makebox[0.0pt][l]F(λ,m,q)+λ2ϵ2+Klog(K/ϵ)√N. (9)

We now let , and conclude by noticing that the above saddle formula is another expression for :

###### Proof.

One inequality follows from (9) and the lower bound . For the converse inequality, we notice that for all

Now we use the fact that the function is largest when its second argument is positive:

###### Lemma 6.

For all we have

This implies Taking the supremum over yields the converse bound.

Proof of Proposition 2. Let . Since the prior has bounded support, we can grid the set of the overlap values by many intervals of size for some . This allows the following discretization, where runs over the finite range :

 FN =1NElog∑l∫1{R1,∗∈[lϵ,(l+1)ϵ)}e−H(x)dP⊗Nx(x) ≤1NElog2Kϵmaxl∫1{R1,∗∈[lϵ,(l+1)ϵ)}e−H(x)dP⊗Nx(x) =1NEmaxllog∫1{R1,∗∈[lϵ,(l+1)ϵ)}e−H(x)dP⊗Nx(x)+log(2K/ϵ)N. (10)

In the above, is w.r.t. both and . We use concentration of measure to push the expectation over to the left of the maximum. Let

 Zl:=∫1{R1,∗∈[lϵ,(l+1)ϵ)}e−H(x)dP⊗Nx(x).

We show that each term concentrates about its expectation (in the randomness of ). Let denote the expectation w.r.t. .

###### Lemma 7.

There exists a constant such that for all and all ,

 E′eγ(Xl−E′[Xl])≤Kγ√NeKγ2/N.

Therefore, the expectation of the maximum concentrates as well:

 E′maxl(Xl−E′[Xl]) ≤1γlogE′exp(γmaxl(Xl−E′[Xl])) =1γlogE′maxleγ(Xl−E′[Xl]) ≤1γlogE′∑leγ(Xl−E′[Xl]) ≤1γlog(2KϵγK√Neγ2K/N) =log(2K/ϵ)γ+1γlogγK√N+γKN.

We set and obtain

 E′maxl(Xl−E′[Xl])≤log(K/ϵ)√N.

Therefore, plugging the above estimates into (10), we obtain

 FN ≤Ex∗maxlE′Xl+log(K/ϵ)√N+log(K/ϵ)N ≤Ex∗maxlΦϵ(lϵ,x∗)+2log(K/ϵ)√N.

Proof of Proposition 3. Let and consider a slightly modified interpolating Hamiltonian that has two parameters and :

 −Ht(x) :=∑i

where the ’s are i.i.d. standard Gaussian r.v.’s independent of everything else. Let

 φ(t):=1NElog∫1{R1,∗∈[m,m+ϵ)}e−Ht(x)dP⊗Nx(x),

where is over the Gaussian disorder and ( is fixed). Let be the corresponding Gibbs average, similarly to (6). By differentiation and Gaussian integration by parts,

 φ′(t)= +λ2E⟨(R1,∗−m)2⟩t−λ2m2−λ2N2N∑i=1E⟨(xix∗i)2⟩t,

Notice that by the overlap restriction, . Moreover, the last terms in the first and second lines in the above are of order since the variables are bounded. Next, since has non-negative sign (this is a crucial fact), we can ignore it and obtain an upper bound:

 φ′(t)≤−λ2m2+λ4q2+λ2ϵ2+KN.

Integrating over , we obtain

 Φϵ(m,x∗)≤−λ2m2+λ4q2+λ2ϵ2+φ(0)+KN.

Now we use a trivial upper bound on :

 φ(0) =1NElog∫1{R1,∗∈[m,m+ϵ)}e−H0(x)dP⊗Nx(x) ≤1NElog∫e−H0(x)dP⊗Nx(x) =1NN∑i=1ˆψ(λq,λmx∗i).

Hence,

 Φϵ(m,x∗)≤ˆF(λ,m,q,x∗)+λ2ϵ2+KN.

Proof of lemma 4. The random part of is the average of i.i.d. terms . Since , and , where is a bound on the support of , we have . For bounded and , the claim follows from concentration of the average of i.i.d. bounded r.v.’s.

Proof of Lemma 7. We notice that seen as a function of is Lipschitz with constant . By Gaussian concentration of Lipschitz functions (the Tsirelson-Ibragimov-Sudakov inequality [BLM13]), there exist a constant depending only on such that for all ,

 Pr(Xl−E′Xl≥t)≤e−Nt2/K.

Then we conclude by means of the identity

and integrate the tail.

Proof of Lemma 6. Let , and let be the symmetric part of , i.e., for all Borel . Observe that is absolutely continuous with respect to . The argument relies on a linearly interpolating between the two measures and . Let and let . Further, let be fixed, and

 φ±(t):=Ez∫(log∫exp(√rzx±rxx∗−r2x2)dρt(x))dρt(x∗),

where . Now let . We have on the one hand, and since is a symmetric distribution, on the other. We will show that is a convex increasing function on the interval . Then we deduce that . First, we have

 ddtφ+(t) =Ez∫log∫e√rzx+rxx∗−r2x2dρt(x) d(ν−μ)(x∗) +Ez∫∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x) dρt(x∗),

and

 d2dt2φ+(t) =2Ez∫∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x)d(ν−μ)(x∗) −2Ez∫⎛⎝∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x)⎞⎠2dρt(x∗).

Similar expressions holds for where is replaced by inside the exponentials. We see from the expression of the first derivative at that . This is because is symmetric about the origin, so a sign change (of for the first term, and for the second term) does not affect the value of the integrals. Hence . Now, we focus on the second derivative. Observe that since is the symmetric part of , is anti-symmetric. This implies that the first term in the expression of the second derivative changes sign under a sign change in and keeps the same modulus. As for the second term, a sign change in induces integration against . Hence we can write the difference as

 ϕ′′(t) =4Ez∫∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x) d(ν−μ)(x∗) −2Ez∫⎛⎝∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x)⎞⎠2(dρt(x∗)−dρt(−x∗)).

For any Borel , we have . Therefore the second term in the above expression becomes

 −4tEz∫⎛⎝∫e√rzx+rxx∗−r2x2d(ν−μ)(x)∫e√rzx+rxx∗−r2x2dρt(x)⎞⎠2d(ν−μ)(x∗).

Since both and are absolutely continuous with respect to for all we can write

where the Gibbs average is with respect to the posterior of given under the Gaussian channel , and the expectation is under and . By the Nishimori property, we simplify the above expression to

 ϕ′′(t)=4(1−t)E⎡⎣⟨d(ν−μ)dρt(x)⟩2⎤⎦≥0,

where the expression is valid for all . From here we see that the function is convex on . Since , is also increasing on .

Acknowledgments. Florent Krzakala acknowledges funding from the ERC under the European Union 7th Framework Programme Grant Agreement 307087-SPARCS.

## References

• [ASS03] Michael Aizenman, Robert Sims, and Shannon L Starr. Extended variational principle for the Sherrington-Kirkpatrick spin-glass model. Physical Review B, 68(21):214403, 2003.
• [BAP05] Jinho Baik, Gérard Ben Arous, and Sandrine Péché.

Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices.

Annals of Probability, 33(5):1643–1697, 2005.
• [BDM16] Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur, and Lenka Zdeborová. Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula. In Advances in Neural Information Processing Systems (NIPS), pages 424–432, 2016.
• [BGN11] Florent Benaych-Georges and Raj Rao Nadakuditi. The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Advances in Mathematics, 227(1):494–521, 2011.
• [BKM17] Jean Barbier, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. Phase transitions, optimal errors and optimality of message-passing in generalized linear models. arXiv preprint arXiv:1708.03395, 2017.
• [BLM13] Stéphane Boucheron, Gábor Lugosi, and Pascal Massart. Concentration inequalities: A nonasymptotic theory of independence. Oxford university press, 2013.
• [BM17] Jean Barbier and Nicolas Macris. The stochastic interpolation method: A simple scheme to prove replica formulas in bayesian inference. arXiv preprint arXiv:1705.02780, 2017.
• [BMM17] Jean Barbier, Nicolas Macris, and Léo Miolane. The layered structure of tensor estimation and its mutual information. arXiv preprint arXiv:1709.10368, 2017.
• [DAM16] Yash Deshpande, Emmanuel Abbé, and Andrea Montanari. Asymptotic mutual information for the binary stochastic block model. In IEEE International Symposium on Information Theory (ISIT), pages 185–189. IEEE, 2016.
• [EKJ17] Ahmed El Alaoui, Florent Krzakala, and Michael I Jordan. Finite size corrections and likelihood ratio fluctuations in the spiked Wigner model. arXiv preprint arXiv:1710.02903, 2017.
• [FP95] Silvio Franz and Giorgio Parisi. Recipes for metastable states in spin glasses. Journal de Physique I, 5(11):1401–1415, 1995.
• [FP98] Silvio Franz and Giorgio Parisi. Effective potential in glassy systems: theory and simulations. Physica A: Statistical Mechanics and its Applications, 261(3-4):317–339, 1998.
• [GT02] Francesco Guerra and Fabio Lucio Toninelli. The thermodynamic limit in mean field spin glass models. Communications in Mathematical Physics, 230(1):71–79, 2002.
• [Gue01] Francesco Guerra. Sum rules for the free energy in the mean field spin glass model. Fields Institute Communications, 30:161–170, 2001.
• [Joh01] Iain M Johnstone. On the distribution of the largest eigenvalue in principal components analysis. Annals of Statistics, pages 295–327, 2001.
• [KM09] Satish Babu Korada and Nicolas Macris. Exact solution of the gauge symmetric p-spin glass model on a complete graph. Journal of Statistical Physics, 136(2):205–230, 2009.
• [KXZ16] Florent Krzakala, Jiaming Xu, and Lenka Zdeborová. Mutual information in rank-one matrix estimation. In Information Theory Workshop (ITW), pages 71–75. IEEE, 2016.
• [LKZ15] Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová. Phase transitions in sparse PCA. In IEEE International Symposium on Information Theory (ISIT), pages 1635–1639. IEEE, 2015.
• [LM16] Marc Lelarge and Léo Miolane. Fundamental limits of symmetric low-rank matrix estimation. arXiv preprint arXiv:1611.03888, 2016.
• [LML17] Thibault Lesieur, Léo Miolane, Marc Lelarge, Florent Krzakala, and Lenka Zdeborová. Statistical and computational phase transitions in spiked tensor estimation. In IEEE International Symposium on Information Theory (ISIT), pages 511–515, June 2017.
• [Nis01] Hidetoshi Nishimori. Statistical physics of spin glasses and information processing: an introduction, volume 111. Clarendon Press, 2001.
• [Péc06] Sandrine Péché. The largest eigenvalue of small rank perturbations of Hermitian random matrices. Probability Theory and Related Fields, 134(1):127–173, 2006.
• [Péc14] Sandrine Péché. Deformed ensembles of random matrices. In Proceedings of the International Congress of Mathematicians, Seoul, pages 1059–1174. ICM, 2014.
• [RM14] Emile Richard and Andrea Montanari. A statistical model for tensor PCA. In Advances in Neural Information Processing Systems (NIPS), pages 2897–2905, 2014.
• [Tal11] Michel Talagrand. Mean field models for spin glasses. Volume I: Basic examples, volume 54. Springer Science & Business Media, 2011.