# High-temperature Expansions and Message Passing Algorithms

Improved mean-field techniques are a central theme of statistical physics methods applied to inference and learning. We revisit here some of these methods using high-temperature expansions for disordered systems initiated by Plefka, Georges and Yedidia. We derive the Gibbs free entropy and the subsequent self-consistent equations for a generic class of statistical models with correlated matrices and show in particular that many classical approximation schemes, such as adaptive TAP, Expectation-Consistency, or the approximations behind the Vector Approximate Message Passing algorithm, all rely on the same assumptions, which are also at the heart of high-temperature expansions. We focus on the case of rotationally invariant random coupling matrices in the `high-dimensional' limit in which the number of samples and the dimension are both large, but with a fixed ratio. This encapsulates many widely studied models, such as Restricted Boltzmann Machines or Generalized Linear Models with correlated data matrices. In this general setting, we show that all the approximation schemes described before are equivalent, and we conjecture that they are exact in the thermodynamic limit in the replica symmetric phases. We reach this conclusion by resummation of the infinite perturbation series, which generalizes a seminal result of Parisi and Potters. A rigorous derivation of this conjecture is an interesting mathematical challenge. On the way to these conclusions, we uncover several diagrammatic results in connection with free probability and random matrix theory, which are interesting independently of the rest of our work.


## 1 Introduction

### 1.1 Background and overview of related works

Many inference and learning tasks can be formulated as a statistical physics problem, where one needs to compute or approximate the marginal distributions of single variables in an interacting model. This is, for instance, the basis behind the popular variational mean-field approach [WJ08]. Going beyond the naive mean-field theory has been a constant goal in both physics and machine learning. One approach, for instance, has been very effective on tree-like structures: the Bethe approximation, or Belief Propagation. Its development in the statistical physics of disordered systems can be traced back to Thouless-Anderson-Palmer (TAP) [TAP77], and it has seen many developments since then [MPV87, YFW03, MM09, ZK16]. Over the last decades, in particular, there have been many works on densely connected models, leading to a myriad of different approximation schemes. In many disordered problems with i.i.d. couplings, a classical approach has been to write the TAP equations as an iterative scheme. Iterative algorithms based on this scheme are often called Approximate Message Passing (AMP) [DMM09, KMS12] in this context.

AMP, or TAP, is an especially powerful approach when the coupling constants in the underlying statistical model are distributed as i.i.d. variables. This is, of course, a strong limitation, and many inference schemes have been designed to improve on it: the adaptive TAP (adaTAP) method [OW01a, OW01b, HK13], approximation schemes such as Expectation-Consistency (EC) [Min01, OW05a], and the recent improvements of AMP such as Vector Approximate Message Passing (VAMP) and its variants [MP17, RSF17, SRF16, OCW16, ÇOFW16]. Given all these approaches, one may wonder how different they are, and when they actually lead to asymptotically exact inference. In this paper, we address this question using two main tools: high-temperature expansions and random matrix theory.

High-temperature expansions at fixed order parameters (denoted in this paper as “Plefka expansions”) are an important tool in the study of disordered systems. In the context of spin glass models, they were introduced by Plefka [Ple82] for the Sherrington-Kirkpatrick (SK) model, and have been subsequently generalized, in particular by Georges-Yedidia [GY91]. This latter paper provides a systematic way to compute high-temperature (or high-dimension) expansions of the Gibbs free entropy for a fixed value of the order parameters (that is, Plefka expansions).

One aim of the present paper is to apply this method to a general class of inference problems with pairwise interactions, in which the coupling constants are not i.i.d., but they can have strong correlations, while keeping a rotational invariance that will be made explicit below. In particular, we generalize earlier and inspirational work by Parisi and Potters [PP95], who computed the self-consistent equations for the marginals in Ising models with orthogonal couplings via a resummation of the infinite series given by the high-temperature expansion. We shall show that a similar resummation yields the EC, adaTAP and VAMP formalisms.

### 1.2 Structure of the paper, and summary of our contributions

In this paper, we perform Plefka expansions for a generic class of models of pairwise interactions with correlated matrices. We provide a detailed derivation of the method, inspired by the work of Georges-Yedidia [GY91] for Ising models, and we include new results on the diagrammatics of the expansions, leveraging rigorous results of random matrix theory. This yields a general framework that encapsulates many known properties of systems sharing this pairwise structure.

The main message of this work is that the three successful approximation schemes that have been developed in the last two decades, Expectation-Consistency, adaTAP, and Vector Approximate Message Passing, are equivalent and rely on the same hidden hypothesis. A careful analysis of the Plefka expansion reveals this hypothesis, as it identifies the class of high-temperature expansion diagrams that are effectively kept in these three schemes. A careful diagrammatic analysis leads us to conjecture that all these methods are asymptotically exact for rotationally-invariant models, in the high-temperature phase. It is also worth noting that although all four methods (Expectation-Consistency, adaTAP, Vector Approximate Message Passing, Plefka expansion) lead to the same mean-field equations, the (most recent) VAMP approach presents the advantage of generating a “natural” way to iterate these equations in order to find a fixed point, which turns them into efficient algorithms.

We now turn to a more precise description of the content of the paper. Throughout the paper, we will use two random matrix ensembles that we will both refer to as being rotationally invariant. The first one is defined as a measure over the set of symmetric matrices:

###### Model S (Symmetric rotationally invariant matrix).

Let $N \geq 1$. $J$ is generated as $J = O^\intercal D O$, in which $O$ is drawn uniformly from the (compact) orthogonal group $\mathcal{O}(N)$, and $D$ is a random diagonal matrix, such that its empirical spectral distribution converges (almost surely) as $N \to \infty$ to a probability distribution $\rho_D$ with compact support. The smallest and largest eigenvalues of $D$ are assumed to converge almost surely to the infimum and supremum of the support of $\rho_D$.

In a similar way, we define an ensemble of rectangular rotationally invariant matrices:

###### Model R (Rectangular rotationally invariant matrix).

Let $M, N \geq 1$, and $\alpha > 0$ such that $M/N \to \alpha$ as $N \to \infty$. $L$ is generated via its SVD decomposition $L = U S V^\intercal$, in which $U$ and $V$ are drawn uniformly from their respective orthogonal groups $\mathcal{O}(M)$ and $\mathcal{O}(N)$. $S$ is a (rectangular) diagonal matrix of singular values, such that the empirical spectral distribution of $S^\intercal S$ converges (almost surely) as $N \to \infty$ to a probability distribution $\rho_D$, which has compact support. The smallest and largest eigenvalues of $S^\intercal S$ are assumed to converge almost surely to the infimum and supremum of the support of $\rho_D$.

##### Examples

Examples of such random matrix ensembles include matrices generated via a potential $V$: one can generate $J$ with a probability density proportional to $e^{-\frac{N}{2}\mathrm{Tr}\,V(J)}$, and this kind of matrix satisfies the hypotheses of Model S. These ensembles also include the following well-known examples:

• The Gaussian Orthogonal Ensemble (GOE), in the case of Model S with a quadratic potential $V(x) = x^2/2$.

• The Wishart ensemble with a ratio $\alpha = M/N$. This corresponds to a random matrix $J = H^\intercal H / N$, with $H$ an $M \times N$ i.i.d. standard Gaussian matrix, and with $M/N \to \alpha$. This ensemble satisfies Model S, with a potential that involves a logarithmic term in addition to a linear one.

• Standard Gaussian i.i.d. rectangular matrices, for Model R. One can also think of them as generated via a potential, as the probability density of such a matrix $L$ is proportional to $e^{-\frac{N}{2}\mathrm{Tr}\,L^\intercal L}$.

• Generically, consider a random matrix $L$ from Model R. Then, both $L^\intercal L$ and $L L^\intercal$ satisfy the hypotheses of Model S.
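As an illustration, a Model S matrix can be sampled directly from its definition: draw a Haar-distributed orthogonal matrix (e.g. via the sign-corrected QR decomposition of a Gaussian matrix) and conjugate a diagonal matrix of eigenvalues drawn from any compactly supported law. The sketch below is our own minimal example (the uniform choice for $\rho_D$ and all sizes are arbitrary, not from the text); it checks that the spectrum of the resulting $J$ is exactly the chosen diagonal:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_orthogonal(n, rng):
    # QR of a Gaussian matrix, with sign correction, is Haar-distributed
    a = rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    return q * np.sign(np.diag(r))

N = 500
# any compactly supported spectral law rho_D works; here uniform on [-1, 1]
d = rng.uniform(-1.0, 1.0, size=N)
O = haar_orthogonal(N, rng)
J = O @ np.diag(d) @ O.T        # Model S: rotationally invariant by construction

# sanity check: the spectrum of J is exactly the entries of D
eig = np.linalg.eigvalsh(J)
print(np.allclose(np.sort(eig), np.sort(d)))   # True
```

The same recipe with two independent Haar matrices and a rectangular diagonal of singular values produces a Model R sample.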

The structure of our work is as follows:


• **Spherical models with rotationally invariant couplings.** In Sec. 2, we focus on spherical models and we generalize the seminal works of [MPR94a, MPR94b, PP95]. While they studied Ising models with orthogonal couplings, we consider spherical models, just assuming the coupling matrix to be rotationally invariant. We consider two types of models: “symmetric” models with an interaction of the type $x^\intercal J x$, in which $J$ follows Model S, and “bipartite” models with interactions of the type $h^\intercal L x$, in which $L$ follows Model R. This encapsulates orthogonal couplings, but can also be applied to other random matrix ensembles such as the Gaussian Orthogonal Ensemble (GOE), the Wishart ensemble, and many others. Using diagrammatic results that we derive with random matrix theory, we conjecture a resummation of the Plefka expansion giving the Gibbs free entropy in these models. Our results are in particular consistent with the findings of classical works for Gaussian couplings [Ple82] and orthogonal couplings [PP95].

• **Plefka expansion for statistical models with correlated couplings.** Sec. 3 is devoted to the description of the Plefka expansion for different statistical models and inference problems which possess a coupling or data matrix that has rotation invariance properties. We consider models similar to the spherical models of Sec. 2, but with generic prior distributions on the underlying variables. In Sec. 3.1, we recall the Expectation-Consistency (EC), adaTAP and VAMP approximations and comment briefly on their respective history, before showing that they are equivalent. As a consequence, we will generically refer to these approximations as the Expectation-Consistency approximations (EC). We hope that our paper will help provide a unifying presentation of these works, generalizing them by leveraging random matrix theory. Our main conjecture for this part can be stated as follows:

###### Conjecture 1.

[Informal] For statistical models of symmetric or bipartite interactions with coupling matrices that satisfy respectively Model S or Model R, the three equivalent approximations, Expectation-Consistency, adaTAP and VAMP (generically denoted EC approximations), are exact in the large size limit in the high temperature phase.

We believe that the validity of the above conjecture extends beyond the high temperature phase. In particular, we believe it is correct for inference problems in the Bayes-optimal setting, and more generally any time the system is in a replica symmetric phase as defined in [MPV87].

The approximation behind the EC approximations can be checked order by order using our high-temperature Plefka expansion technique and its resummation. We then derive Plefka expansions for these generic models, and we apply them to different situations, namely:

• In Sec. 3.2.1 we perform a Plefka expansion for a generic symmetric rotationally invariant model with pairwise interactions. Using this method and our diagrammatic results, we then show in Sec. 3.2.2 that the EC approximations are exact for these models in the large size limit.

• In Sec. 3.2.3 we apply our general result to the TAP free energy of the Hopfield model [Hop82], an Ising spin model with a correlated matrix of the Wishart ensemble, used as a basic model of neural network. In particular, we straightforwardly recover the results of [NT97] and [Méz17].

• In Sec. 3.3 we extend our Plefka expansion and the corresponding diagrammatic techniques to the study of a replicated system, in which we constrain the overlap between different replicas. Interest in such systems stems from the celebrated replica method of theoretical physics [MPV87].

• Finally, we show in Sec. 3.4 how we can use these results to derive the Plefka-expanded free entropy for a very broad class of bipartite models, which includes the Generalized Linear Models (GLMs) with correlated data matrices, and the Compressed Sensing problem.

We emphasize that we were able to derive the free entropy of all these models using very generic arguments relying only on the rotational invariance of the problem.

• **The TAP equations and message passing algorithms.** Finally, we show in Sec. 4 that the TAP (or EC) equations that we derived by maximizing the Gibbs free entropy of rotationally invariant models can strikingly be understood as the fixed point equations of message passing algorithms. Conversely, many message-passing algorithms can be seen as an iteration scheme of the TAP equations. This was known in many models in which the underlying data matrix was assumed to be i.i.d. For instance, the Generalized Approximate Message Passing (GAMP) algorithm [Ran11] was shown in [KMS12] to be equivalent to the TAP equations, a result that we recover in Sec. 4.1, while TAP equations were already iterated for Restricted Boltzmann Machines, see [TGM18]. In the Plefka expansion language, these results relied on the early stopping of the expansion at second order (in powers of the couplings) as a consequence of the i.i.d. hypothesis. Using our resummation results to deal with the series at infinite orders, we were able to generalize these correspondences to correlated models. We argue that the stationary limit of the Vector Approximate Message Passing (VAMP) algorithm [RSF17] for compressed sensing with correlated matrices gives back our TAP equations derived via Plefka expansion, see Sec. 4.2. Even more generally, the Generalized Vector Approximate Message Passing (G-VAMP) algorithm [SRF16], defined for the very broad class of Generalized Linear Models with correlated matrices, can be derived as an iteration of our Plefka-expanded TAP equations, see Sec. 4.3. Combined with the results of Sec. 3, this shows that the VAMP algorithm is an example of an approximation scheme that follows Conjecture 1.

• **Diagrammatics of the expansion and random matrix theory.** Our results are largely based on a better control of the diagrammatics of the Plefka expansions for rotationally invariant random matrices, which are presented in Sec. 5. We leverage mathematically rigorous results on Harish-Chandra-Itzykson-Zuber (HCIZ) integrals [HC57, IZ80, GM05, CŚ07], involving transforms of the asymptotic spectrum of the coupling matrix, to argue that only a very specific class of diagrams contributes to the high-temperature expansion of a system with rotationally invariant couplings. These results are used throughout our study, and are detailed in Sec. 5. Some generalizations are postponed to Appendix D.

## 2 Symmetric and bipartite spherical models with rotationally-invariant couplings

In this section we consider two spherical models that will serve both as guidelines and building blocks for our subsequent analysis. We show in detail how to perform the Plefka-Georges-Yedidia high-temperature expansion in this context, and the precise diagrammatic results that allow us to resum the Plefka series for rotationally invariant couplings. These results will be useful to clarify our subsequent derivation of the TAP equations in more involved models, and are also interesting by themselves from a random matrix theory point of view.

### 2.1 Symmetric spherical model

In this section $N \geq 1$, $\sigma > 0$, and we define the following pairwise interaction Hamiltonian on $\mathcal{S}^{N-1}(\sigma\sqrt{N})$, the $(N-1)$-dimensional sphere of radius $\sigma\sqrt{N}$:

$$
H_J(x) = -\frac{1}{2}x^\intercal J x = -\frac{1}{2}\sum_{1\leq i,j\leq N} J_{ij}x_i x_j, \qquad x \in \mathcal{S}^{N-1}(\sigma\sqrt{N}). \tag{1}
$$

The coupling matrix $J$ is a symmetric random matrix drawn from Model S.

#### 2.1.1 Direct free entropy computation

The Gibbs measure for our model at inverse temperature $\beta$ is defined as:

$$
P_{\beta,J}(\mathrm{d}x) \equiv \frac{1}{Z_{\beta,J}}\, e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_i x_j}\,\mathrm{d}x, \tag{2}
$$

in which $\mathrm{d}x$ is the usual surface measure on the sphere $\mathcal{S}^{N-1}(\sigma\sqrt{N})$. We write the partition function of the model introducing a Lagrange multiplier $\gamma$ to enforce the condition $\|x\|_2^2 = N\sigma^2$. We will write $A \simeq B$ to denote equality at leading exponential order, i.e. $\lim_{N\to\infty}N^{-1}\log(A/B) = 0$. At leading exponential order, one has:

$$
Z_{\beta,J} \equiv \int_{\mathcal{S}^{N-1}(\sigma\sqrt{N})} \mathrm{d}x\; e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_i x_j} \tag{3}
$$
$$
\simeq \int \mathrm{d}\gamma \prod_{i=1}^N \int_{\mathbb{R}} \mathrm{d}x_i\; e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_i x_j + \frac{\gamma}{2}\left(N\sigma^2 - \sum_i x_i^2\right)} \simeq \exp\left[\sup_\gamma\left\{\log\left[\prod_{i=1}^N \int_{\mathbb{R}} \mathrm{d}x_i\; e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_i x_j + \frac{\gamma}{2}\left(N\sigma^2 - \sum_i x_i^2\right)}\right]\right\}\right]. \tag{4}
$$

Denoting $\gamma(\beta)$ the solution to the saddle-point equation in eq. (4), we have effectively defined a new Gibbs measure:

$$
P_{\beta,J}(\mathrm{d}x) \equiv \frac{1}{Z_{\beta,J}(\gamma)}\, e^{\frac{\beta}{2}\sum_{i,j}J_{ij}x_i x_j}\, e^{-\frac{\gamma(\beta)}{2}\|x\|_2^2}\,\mathrm{d}x, \tag{5}
$$

where now $\mathrm{d}x$ is the usual Euclidean measure on $\mathbb{R}^N$. Following [KTJ76] we diagonalize the Hamiltonian and we integrate over the spins in this new basis, which yields:

$$
Z_{\beta,J} \simeq \exp\left[\sup_\gamma\left\{\frac{N}{2}\left(\log 2\pi + \gamma\sigma^2 - \frac{1}{N}\sum_\lambda \log(\gamma - \beta\lambda)\right)\right\}\right], \tag{6}
$$

in which the sum over $\lambda$ runs over the set of eigenvalues of $J$. Taking the $N \to \infty$ limit, the saddle point equation reads:

$$
\lim_{N\to\infty}\frac{1}{N}\sum_\lambda \frac{1}{\gamma - \beta\lambda} = \sigma^2, \tag{7}
$$

which we can write as a function of the limiting spectral law $\rho_D$ of the matrix $J$ (defined in Model S):

$$
\int \frac{\rho_D(\mathrm{d}\lambda)}{\gamma - \beta\lambda} = \sigma^2. \tag{8}
$$

We assumed (see Model S) that the support of $\rho_D$ is compact, so that we can define its maximum $\lambda_{\max}$. Under these assumptions, eq. (8) has the solution:

$$
\gamma = \beta R_{\rho_D}(\beta\sigma^2) + \frac{1}{\sigma^2} = \beta\, S_{\rho_D}^{-1}(-\beta\sigma^2), \tag{9}
$$

as long as $\beta \leq \beta_c$, the critical value at which the solution reaches $\gamma = \beta\lambda_{\max}$; here $R_{\rho_D}$ is the $\mathcal{R}$-transform of $\rho_D$ and $S_{\rho_D}$ its Stieltjes transform (see Appendix C for their definitions). In the opposite case ($\beta > \beta_c$), $\gamma$ ‘sticks’ to the solution $\gamma = \beta\lambda_{\max}$. The intensive free entropy is defined as:

$$
\Phi_J(\beta) \equiv \lim_{N\to\infty}\frac{1}{N}\log Z_{\beta,J}. \tag{10}
$$

In the end, we can compute the free entropy in the high-temperature phase (for $\beta \leq \beta_c$):

$$
\Phi_J(\beta) = \frac{1}{2}\left(1+\log 2\pi\sigma^2\right) + \frac{\beta\sigma^2}{2}R_{\rho_D}(\beta\sigma^2) - \frac{1}{2}\int \rho_D(\mathrm{d}\lambda)\,\log\left[\beta\sigma^2 R_{\rho_D}(\beta\sigma^2) - \beta\sigma^2\lambda + 1\right]. \tag{11}
$$

By taking the derivative of this expression with respect to $\beta$, it is easy to show that it simplifies to:

$$
\Phi_J(\beta) = \frac{1}{2}\left(1+\log 2\pi\sigma^2\right) + \frac{1}{2}\int_0^{\beta\sigma^2} R_{\rho_D}(x)\,\mathrm{d}x. \tag{12}
$$

In the low temperature phase (for $\beta > \beta_c$) one has:

$$
\Phi_J(\beta) = \frac{1}{2}\left(\log 2\pi + \lambda_{\max}\beta\sigma^2 - \log\beta - \int \rho_D(\mathrm{d}\lambda)\,\log(\lambda_{\max}-\lambda)\right). \tag{13}
$$

Note that both in the high and low temperature phases the free entropy can formally be expressed as:

$$
\Phi_J(\beta) = \frac{1}{2}\log 2\pi + \frac{1}{2}\sup_\gamma\left[\gamma\sigma^2 - \int \rho_D(\mathrm{d}\lambda)\log(\gamma - \beta\lambda)\right], \tag{14}
$$

a formulation which is both more compact and easier to implement algorithmically for generic matrices $J$.
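Eq. (14) can indeed be implemented directly: solve the saddle-point equation (8) for $\gamma$ on an empirical spectrum, then evaluate the bracket. The sketch below is our own illustration (the GOE choice and parameter values are arbitrary): for a GOE spectrum $R_{\rho_D}(x) = x$, so eq. (12) gives the closed form $\Phi = \frac{1}{2}(1+\log 2\pi\sigma^2) + \beta^2\sigma^4/4$, against which we compare.

```python
import numpy as np

rng = np.random.default_rng(1)
N, beta, sigma2 = 2000, 0.3, 1.0

# GOE sample: symmetric, off-diagonal variance 1/N -> semicircle law on [-2, 2]
A = rng.standard_normal((N, N)) / np.sqrt(N)
lam = np.linalg.eigvalsh((A + A.T) / np.sqrt(2))

# solve the saddle point (1/N) sum 1/(gamma - beta*lam) = sigma2 by bisection
lo, hi = beta * lam.max() + 1e-9, beta * lam.max() + 100.0
for _ in range(200):
    mid = 0.5 * (lo + hi)
    if np.mean(1.0 / (mid - beta * lam)) > sigma2:
        lo = mid      # gamma too small, move right
    else:
        hi = mid
gamma = 0.5 * (lo + hi)

# eq. (14) evaluated on the empirical spectrum
phi = 0.5 * np.log(2 * np.pi) + 0.5 * (gamma * sigma2 - np.mean(np.log(gamma - beta * lam)))

# eq. (12) for the semicircle law: R(x) = x
phi_exact = 0.5 * (1 + np.log(2 * np.pi * sigma2)) + beta**2 * sigma2**2 / 4
print(abs(phi - phi_exact))   # small finite-N error
```

The agreement improves as $N$ grows, as expected from the self-averaging remark below.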

##### Remark

The free entropy is usually defined as an average over the quenched disorder $J$, but here it is clear that the free entropy is self-averaging as a function of $J$, so that taking this average is trivial. Moreover, $\Phi_J$ only depends on $J$ via $\rho_D$, its asymptotic eigenvalue distribution.

##### Remark

The derivation of the free entropy both in the high and low temperature phase has been made rigorous in [GM05], and the method of proof also essentially consists in fixing a Lagrange multiplier to enforce the spherical condition $\|x\|_2^2 = N\sigma^2$.

#### 2.1.2 Plefka expansion and the Georges-Yedidia formalism

A more generic way to compute the free entropy is to follow the formalism of [GY91] to perform a high-temperature Plefka expansion [Ple82]. The goal is to expand the free entropy at low $\beta$, in the high-temperature phase. In order to do so, we introduce the very useful operator $U$ defined in Appendix A of [GY91]. We will compute the free entropy given the constraints on the means $m_i = \langle x_i \rangle$ and on the variances $v_i = \langle x_i^2 \rangle - m_i^2$. The notation $\langle \cdot \rangle$ indicates an average over the Gibbs measure of our system at inverse temperature $\beta$, see eq. (5). A set of parameters $\{m_i, v_i\}$ will thus determine a free entropy value, and the comparison with the direct calculation of Sec. 2.1.1 will be made by maximizing the free entropy with respect to $\{m_i, v_i\}$. We can enforce the spherical constraint by constraining our choice of parameters to satisfy the identity:

$$
\sigma^2 = \frac{1}{N}\sum_{i=1}^N\left[v_i + m_i^2\right]. \tag{15}
$$

The Lagrange parameters introduced to fix the magnetizations $m_i$ are denoted $\lambda_i$, and the ones used to fix the variances $v_i$ are denoted $\gamma_i$. For clarity we will keep their dependency on $\beta$ explicit only when needed. For a given $\beta$ and a given $J$ one defines the operator $U$ of Georges-Yedidia:

 (16)

The derivation of $U$ as well as its (many) useful properties are briefly recalled in Appendix A. We are now ready to compute the first orders of the expansion of the free entropy in powers of $\beta$. In this expansion the Lagrange parameters are always considered at $\beta = 0$, so we drop their $\beta$-dependency. We detail the first orders of the expansion, following Appendix A (cf. Appendix A of [GY91]).

##### Order 0

First of all, taking $\beta = 0$ one easily obtains:

$$
\Phi_J(\beta=0) = \frac{1}{2N}\sum_{i=1}^N \gamma_i\left(v_i + m_i^2\right) + \frac{1}{N}\sum_{i=1}^N \lambda_i m_i + \frac{1}{N}\log\int_{\mathbb{R}^N} e^{-\frac{1}{2}\sum_i \gamma_i x_i^2 - \sum_i \lambda_i x_i}\,\mathrm{d}x
$$
$$
= \frac{1}{2}\log 2\pi + \frac{1}{N}\sum_{i=1}^N\left[\frac{\gamma_i}{2}\left(v_i + m_i^2\right) - \frac{1}{2}\log\gamma_i + \lambda_i m_i + \frac{\lambda_i^2}{2\gamma_i}\right].
$$

This yields, after extremization over the parameters $\{\gamma_i, \lambda_i\}$:

$$
\Phi_J(\beta=0) = \frac{1}{2}\left[1+\log 2\pi\right] + \frac{1}{2N}\sum_{i=1}^N \log v_i. \tag{17}
$$
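Eq. (17) is just the average differential entropy of independent Gaussians with variances $v_i$, since $\frac{1}{2}\log(2\pi e v) = \frac{1}{2}(1+\log 2\pi) + \frac{1}{2}\log v$. A one-line numerical check (the variance values are our own arbitrary choice):

```python
import numpy as np

v = np.array([0.5, 1.0, 2.0])   # arbitrary variances v_i
phi0 = 0.5 * (1 + np.log(2 * np.pi)) + 0.5 * np.mean(np.log(v))

# mean differential entropy of independent Gaussians N(m_i, v_i)
ent = np.mean(0.5 * np.log(2 * np.pi * np.e * v))
print(np.isclose(phi0, ent))   # True
```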
##### Order 1

At first order in $\beta$, one easily derives:

 (18)

We can now make use of the Maxwell-type relations, which are valid at any $\beta$:

$$
\gamma_i(\beta) = 2N\,\frac{\partial \Phi_J(\beta)}{\partial v_i}, \tag{19}
$$
$$
m_i\,\gamma_i(\beta) + \lambda_i(\beta) = N\,\frac{\partial \Phi_J(\beta)}{\partial m_i}. \tag{20}
$$

These relations plugged in eq. (18) lead to $\gamma_i(\beta=0) = 1/v_i$ and $\lambda_i(\beta=0) = -m_i/v_i$. We then obtain the operator at $\beta = 0$ from eq. (16):

$$
U(\beta=0, J) = -\frac{1}{2}\sum_{i\neq j} J_{ij}(x_i - m_i)(x_j - m_j). \tag{21}
$$
##### Order 2

Following eq. (190) in Appendix A, we have the relation:

$$
\frac{1}{2}\left(\frac{\partial^2 \Phi_J}{\partial\beta^2}\right)_{\beta=0} = \frac{1}{4N}\sum_{i\neq j} J_{ij}^2\, v_i v_j + \mathcal{O}_N(1). \tag{22}
$$
##### Order 3 and 4

For the order $3$, we obtain:

$$
\frac{1}{3!}\left(\frac{\partial^3 \Phi_J}{\partial\beta^3}\right)_{\beta=0} = \frac{1}{6N}\sum_{i,j,k} J_{ij}J_{jk}J_{ki}\, v_i v_j v_k + \mathcal{O}_N(1), \tag{23}
$$

in which the sum is made over pairwise distinct indices. Applying eq. (A) we reach:

$$
\frac{1}{4!}\left(\frac{\partial^4 \Phi_J}{\partial\beta^4}\right)_{\beta=0} = \frac{1}{8N}\sum_{i,j,k,l} J_{ij}J_{jk}J_{kl}J_{li}\, v_i v_j v_k v_l + \mathcal{O}_N(1), \tag{24}
$$

where again, $i, j, k, l$ are pairwise distinct indices. For pedagogical purposes (and since it will be useful for the following sections), we detail this calculation in Appendix B.

##### Larger orders

By its very nature, the perturbative expansion of Georges-Yedidia [GY91] cannot (somewhat disappointingly) give an analytic result at an arbitrary perturbation order $p$. However, the results up to order $4$ of eqs. (17), (18), (22), (23), (24) lead to the following natural conjecture for the free entropy at a given realization of the disorder:

$$
\Phi_J(\beta) = \frac{1}{2}\left[1+\log 2\pi\right] + \frac{1}{2N}\sum_{i=1}^N \log v_i + \frac{\beta}{2N}\sum_{i\neq j} J_{ij} m_i m_j + \frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^p}{2p}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}J_{i_2 i_3}\cdots J_{i_{p-1} i_p}J_{i_p i_1}\prod_{\alpha=1}^p v_{i_\alpha} + \mathcal{O}_N(1). \tag{25}
$$

Note that in order to obtain this formula, we took the limit $N \to \infty$ at every perturbation order in $\beta$, which is part of the implicit assumptions of the Plefka expansion. The terms of this perturbative expansion can be represented diagrammatically as simple cycles of length $p$, see Fig. 1(a).

In general, at any order in the expansion one can construct a diagrammatic representation of the contributing terms, and one expects that only strongly irreducible diagrams contribute to the free entropy. Strongly irreducible diagrams are those that cannot be split into two pieces by removing a vertex [GY91] (examples are given in Figs. 1(a) and 1(b)). However, we retain only simple cycles like the one depicted in Fig. 1(a), because other diagrams, like the one in Fig. 1(b), are negligible as $N \to \infty$ for rotationally invariant models, as we argue in Sec. 5.2. For the case of orthogonal couplings, this dominance of simple cycles was already noted in [PP95]. On the other hand, generic cactus diagrams like the one pictured in Fig. 1(c) are not negligible, but they cancel out and do not appear in the final form of the expansion (at order $4$, this is shown in Appendix B).

We shall now prove the dominance of simple cycles, and the correctness of eq. (25), in the high-temperature phase. In this phase, the solution to the maximization of eq. (25) under the constraint of eq. (15) is the paramagnetic solution $m_i = 0$. Furthermore, we expect that the variances $v_i$ that maximize the free entropy of eq. (25) are homogeneous, that is $v_i = v$. The constraint of eq. (15) thus gives $v = \sigma^2$.

We can compare the result of the resummation of simple cycles, eq. (25), with the exact result of eq. (12) in the paramagnetic phase. For these two results to agree, we need the generating function of simple cycles to be related to the $\mathcal{R}$-transform of $\rho_D$ by:

$$
\mathbb{E}\left[\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^p\sigma^{2p}}{2p}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}J_{i_2 i_3}\cdots J_{i_{p-1} i_p}J_{i_p i_1}\right] = \frac{1}{2}\int_0^{\beta\sigma^2} R_{\rho_D}(x)\,\mathrm{d}x, \tag{26}
$$

in which the outer expectation is with respect to the distribution of $J$. In particular, an order-by-order comparison yields that the free cumulants $c_p(\rho_D)$ (see Appendix C for their definition) must satisfy:

$$
\forall p \in \mathbb{N}^\star, \quad c_p(\rho_D) = \lim_{N\to\infty}\mathbb{E}\left[\frac{1}{N}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}J_{i_2 i_3}\cdots J_{i_{p-1} i_p}J_{i_p i_1}\right]. \tag{27}
$$

Using rigorous results of [GM05], we were able to prove a stronger version of eq. (27), namely convergence in $L^2$ norm, so we state it as a theorem:

###### Theorem 1.

For a matrix $J$ generated by Model S, one has for every $p \in \mathbb{N}^\star$:

$$
\lim_{N\to\infty}\mathbb{E}\left|\frac{1}{N}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}J_{i_2 i_3}\cdots J_{i_{p-1} i_p}J_{i_p i_1} - c_p(\rho_D)\right|^2 = 0.
$$

We postpone the proof to Sec. 5. We assume that we can invert the summation over $p$ and the $N \to \infty$ limit in eq. (25), so Theorem 1 implies that eq. (26) is true not only in expectation, but that we can write:

$$
\lim_{N\to\infty}\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^p\sigma^{2p}}{2p}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}J_{i_2 i_3}\cdots J_{i_{p-1} i_p}J_{i_p i_1} = \frac{1}{2}\int_0^{\beta\sigma^2} R_{\rho_D}(x)\,\mathrm{d}x, \tag{28}
$$

in which the limit here means convergence in $L^2$ norm as $N \to \infty$. This is important, as it allows us to “resum” the free entropy of eq. (25), which is valid for a given instance of $J$. As a final note, we can use the results of Sec. 2.1.1 to write this result in an alternative form (dropping $\mathcal{O}_N(1)$ terms):

$$
\frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^p\sigma^{2p}}{2p}\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} J_{i_1 i_2}\cdots J_{i_p i_1} = \frac{1}{2}\sup_\gamma\left[\gamma\sigma^2 - \int \rho_D(\mathrm{d}\lambda)\log(\gamma - \beta\lambda)\right] - \frac{1+\log\sigma^2}{2}. \tag{29}
$$

#### 2.1.3 Stability of the paramagnetic phase

If Theorem 1 implies that our Plefka expansion is exact up to $\beta_c$, we can actually check that the paramagnetic solution is stable exactly up to the same critical temperature. Recall that in this model we do not optimize the free entropy simultaneously over the $m_i$ and the $v_i$, because the norm is fixed, yielding the constraint $v = \sigma^2 - \frac{1}{N}\sum_i m_i^2$. Solely as a function of the $m_i$, the free entropy therefore reads, up to $\mathcal{O}_N(1)$ terms:

$$
\Phi_J(\beta) = \frac{1+\log 2\pi}{2} + \frac{1}{2}\log\left[\sigma^2 - \frac{1}{N}\sum_{i=1}^N m_i^2\right] + \frac{\beta}{2N}\sum_{i\neq j} J_{ij} m_i m_j + G_{\rho_D}\left(\beta\left[\sigma^2 - \frac{1}{N}\sum_{i=1}^N m_i^2\right]\right), \tag{30}
$$

in which $G_{\rho_D}$ is the integrated $\mathcal{R}$-transform of $\rho_D$, see Appendix C for its definition. The Hessian of the extensive free entropy at the paramagnetic solution is:

$$
N\left(\frac{\partial^2 \Phi_J}{\partial m_i \partial m_j}\right)_{m=0} = -\frac{\delta_{ij}}{\sigma^2}\left[1 + \beta\sigma^2 R_{\rho_D}(\beta\sigma^2)\right] + \beta J_{ij} + \mathcal{O}_N(1). \tag{31}
$$

The paramagnetic solution is stable as long as the Hessian is a negative matrix. This is true as long as $\beta \leq \beta_c$, because at $\beta_c$ the spectrum of the Hessian touches zero. For $\beta > \beta_c$ the Hessian is again negative, giving the impression that the paramagnetic phase is stable; however, $R_{\rho_D}(\beta\sigma^2)$ is then evaluated at a non-physical solution, so this solution has to be discarded. Our Plefka expansion thus allows us to compute the free entropy in the whole paramagnetic phase, consistently with the results of Sec. 2.1.1.
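For a GOE coupling matrix (our choice for illustration; then $R_{\rho_D}(x) = x$, $\lambda_{\max} \to 2$, and $\beta_c = 1$ when $\sigma = 1$), the top eigenvalue of the Hessian in eq. (31) is $\beta\lambda_{\max} - (1+\beta^2)$, which stays negative throughout the paramagnetic phase. A quick numerical check with arbitrary $\beta$ values below $\beta_c$:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
A = rng.standard_normal((N, N)) / np.sqrt(N)
J = (A + A.T) / np.sqrt(2)              # GOE sample: R(x) = x, lambda_max -> 2
lam_max = np.linalg.eigvalsh(J)[-1]

# top Hessian eigenvalue at m = 0 (sigma = 1): beta*lambda_max - (1 + beta^2)
tops = {beta: beta * lam_max - (1 + beta**2) for beta in (0.5, 0.8)}
print(all(t < 0 for t in tops.values()))   # True: stable below beta_c = 1
```

At $\beta = \beta_c = 1$ the top eigenvalue is $\lambda_{\max} - 2 \to 0$: the spectrum touches zero exactly at the critical temperature, as stated above.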

### 2.2 Bipartite spherical model

In this section we consider $M, N \geq 1$. We let $\alpha > 0$, and we will take the limit (sometimes referred to as the thermodynamic limit) in which $M, N \to \infty$ with a fixed ratio $M/N \to \alpha$. We let $\sigma_h, \sigma_x > 0$. Let us consider the following Hamiltonian, which is a function of two fields $h \in \mathbb{R}^M$ and $x \in \mathbb{R}^N$:

$$
H_L(h, x) = -h^\intercal L x = -\sum_{\mu=1}^M\sum_{i=1}^N L_{\mu i} h_\mu x_i, \qquad h \in \mathcal{S}^{M-1}(\sigma_h\sqrt{M}),\; x \in \mathcal{S}^{N-1}(\sigma_x\sqrt{N}). \tag{32}
$$

The coupling matrix $L$ is assumed to be drawn from Model R.

#### 2.2.1 Direct free entropy computation

The calculation for this bipartite case is very similar to the one performed in Sec. 2.1.1, although one cannot always express the result as a well-known transform of the measure $\rho_D$. For all values of $\beta$, the result can be expressed as:

$$
\Phi_L(\beta) \equiv \lim_{N\to\infty}\frac{1}{N}\log\int \mathrm{d}h \int \mathrm{d}x\; e^{\beta h^\intercal L x} = \frac{1+\alpha}{2}\log 2\pi + \frac{1}{2}\sup_{\gamma_h,\gamma_x}\left[\alpha\gamma_h\sigma_h^2 + \gamma_x\sigma_x^2 - (\alpha-1)\log\gamma_h - \int \rho_D(\mathrm{d}\lambda)\log\left(\gamma_x\gamma_h - \beta^2\lambda\right)\right], \tag{33}
$$

where $\rho_D$ is the asymptotic eigenvalue distribution of $L^\intercal L$ (see the definition of Model R).

#### 2.2.2 Plefka expansion

The Plefka expansion for this model is a straightforward generalization of Sec. 2.1.2. We will fix the averages $m^h_\mu = \langle h_\mu \rangle$ and $m^x_i = \langle x_i \rangle$, and the variances $v^h_\mu = \langle h_\mu^2 \rangle - (m^h_\mu)^2$ and $v^x_i = \langle x_i^2 \rangle - (m^x_i)^2$, again with the constraints $\sigma_h^2 = \frac{1}{M}\sum_\mu [v^h_\mu + (m^h_\mu)^2]$ and $\sigma_x^2 = \frac{1}{N}\sum_i [v^x_i + (m^x_i)^2]$. In this problem, the operator $U$ of [GY91] at $\beta = 0$ is given by:

$$
U(\beta=0, L) = -\sum_{\mu,i} L_{\mu i}\left(h_\mu - m^h_\mu\right)\left(x_i - m^x_i\right). \tag{34}
$$

Once again, as in Sec. 2.1.2, one can study all the diagrams that appear in the Plefka expansion. We show again the concentration of the simple cycles, and the negligibility of the other strongly irreducible diagrams that can be constructed from the rectangular matrix $L$. We state these results in more detail for the bipartite case in Sec. 5.5.1. We obtain the following result, a counterpart to eq. (25) for this bipartite model:

$$
\Phi_L(\beta) = \frac{1+\alpha}{2}\left[1+\log 2\pi\right] + \frac{\alpha}{2M}\sum_{\mu=1}^M \log v^h_\mu + \frac{1}{2N}\sum_{i=1}^N \log v^x_i + \frac{\beta}{N}\sum_{\mu=1}^M\sum_{i=1}^N L_{\mu i} m^h_\mu m^x_i + \frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^{2p}}{2p}\sum_{\substack{\mu_1,\cdots,\mu_p \\ \text{pairwise distinct}}}\;\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} L_{\mu_1 i_1}L_{\mu_1 i_2}L_{\mu_2 i_2}\cdots L_{\mu_p i_p}L_{\mu_p i_1}\prod_{\alpha=1}^p v^h_{\mu_\alpha} v^x_{i_\alpha} + \mathcal{O}_N(1), \tag{35}
$$

in which $\mu$ indices run from $1$ to $M$ and $i$ indices run from $1$ to $N$. We again make an assumption of uniform variances at the maximum: in the paramagnetic phase, $v^h_\mu = \sigma_h^2$ and $v^x_i = \sigma_x^2$. Comparing to eq. (33) in the paramagnetic phase, we obtain the following correspondence, similar to eq. (28) and valid a priori for any given realization of $L$, in the high temperature phase:

$$
\frac{\alpha}{2}\log\sigma_h^2 + \frac{1}{2}\log\sigma_x^2 + \frac{1}{N}\sum_{p=1}^{\infty}\frac{\beta^{2p}\sigma_h^{2p}\sigma_x^{2p}}{2p}\sum_{\substack{\mu_1,\cdots,\mu_p \\ \text{pairwise distinct}}}\;\sum_{\substack{i_1,\cdots,i_p \\ \text{pairwise distinct}}} L_{\mu_1 i_1}L_{\mu_1 i_2}L_{\mu_2 i_2}\cdots L_{\mu_p i_p}L_{\mu_p i_1} = \frac{1}{2}\sup_{\gamma_h,\gamma_x}\left[\alpha\gamma_h\sigma_h^2 + \gamma_x\sigma_x^2 - (\alpha-1)\log\gamma_h - \int \rho_D(\mathrm{d}\lambda)\log\left(\gamma_x\gamma_h - \beta^2\lambda\right)\right] - \frac{1+\alpha}{2}. \tag{36}
$$

## 3 Plefka expansion and Expectation Consistency approximations

In this section we perform Plefka expansions for generic models of pairwise interactions, both symmetric and bipartite. In Sec. 3.1 we recall some known facts on the Expectation Consistency (also called Expectation Propagation [Min01]), adaTAP and VAMP approximations to compute the free entropy of such models. In Sec. 3.2 and Sec. 3.4 we generalize the results of the Plefka expansions of Sec. 2 to these models and highlight the main differences and assumptions of our method. This yields a very precise and systematic justification of the TAP equations for rotationally invariant models. We apply these results to retrieve the TAP free entropy of the Hopfield model, Compressed Sensing, as well as different variations of high-dimensional inference models called Generalized Linear Models (GLMs). Sec. 3.3 is devoted to the study of a generic replicated system using these approximations, and the Plefka expansion. We show that they can be used in the celebrated replica method [MPV87] of theoretical physics, to compute the Gibbs free entropy of a generic pairwise inference model.

### 3.1 Expectation Consistency, adaptive TAP, and Vector Approximate Message Passing approximations

Expectation Consistency (EC) [OW05a, OW05b] is an approximation scheme for a generic class of disordered systems that can also be applied to many inference problems. In this section we show how this scheme is derived and how it is closely related to the adaTAP approximation [OW01a, OW01b] and the VAMP approximation [RSF17]. Let us briefly comment on the history of these methods. The adaTAP scheme was developed and presented in [OW01b, OW01a], and was discussed in detail in the review [OS01] for systems close to the SK model. The same year, Thomas Minka's Expectation Propagation (EP) approach was presented [Min01]. Opper and Winther used an alternative view of local-consistency approximations of the EP type, which they call Expectation Consistent (EC) approximations, in [OW05a, OW05b], effectively rederiving their adaTAP scheme from this new point of view. The VAMP approach is more recent [SRF16], and is again another take on EP for a different problem (compressed sensing), but it has the advantage that, compared with other EP-like approaches [ÇOFW16], it leads to a practical converging algorithm and a rigorous treatment of its time evolution. The connection between these approaches and the Parisi-Potters formulation for inference problems [JR16] was hinted at several times for SK-like problems, see e.g. [OCW16, ÇO19]. We hope that our paper will help provide a unifying presentation of these works, generalizing them well beyond the SK model alone by leveraging random matrix theory. We recall briefly the main arguments of these papers which are useful for our discussion.

#### 3.1.1 Expectation Consistency approximation

Consider a model in which the density of a vector is given by a probability distribution of the form:

$$P(x) = \frac{1}{Z} P_0(x) P_J(x). \qquad (37)$$

Such distributions typically appear in Bayesian approaches to inference problems. We will use the Bayesian language and denote $P_0$ a prior distribution on $x$, which will typically be factorized (all the components of $x$ are assumed to be independent under $P_0$); the distribution $P_J$ is responsible for the interactions between the variables $x_i$. In this paper we are interested in pairwise interactions, which means that the logarithm of $P_J$ is a quadratic form in the variables. An example of such a model is the infinite-range Ising model of statistical physics at inverse temperature $\beta$, with a binary prior $P_0$ and a quadratic interaction $P_J$ governed by a coupling matrix $J$. In this specific model, we have:

$$P_0(x) = \prod_{i=1}^N \left[\frac{1}{2\cosh(\beta h_i)}\left(\delta(x_i - 1)\, e^{-\beta h_i} + \delta(x_i + 1)\, e^{\beta h_i}\right)\right], \qquad (38)$$

$$P_J(x) = \exp\Big\{\frac{\beta}{2}\sum_{i,j} J_{ij}\, x_i x_j\Big\}, \qquad (39)$$

for some field $h \in \mathbb{R}^N$. Our goal is to compute the large-$N$ limit of the free entropy in the model of eq. (37). Each of the two distributions $P_0$ and $P_J$ allows for tractable computations of physical quantities (like averages), but the difficulty arises when considering their product. The idea behind EC is to simultaneously approximate $P_0$ and $P_J$ by a tractable family of distributions. For the sake of the presentation we will consider the family of Gaussian probability distributions, although this can be generalized to different families; see the general framework of [OW05a]. We define the first approximation as:

$$\mu_0(x) \equiv \frac{1}{Z_0(\Gamma_0, \lambda_0)}\, P_0(x)\, e^{-\frac{1}{2} x^\intercal \Gamma_0 x + \lambda_0^\intercal x}. \qquad (40)$$

Here, the parameter $\Gamma_0$ is a symmetric positive matrix and $\lambda_0$ is a vector. We will denote $\langle \cdot \rangle_0$ the averages with respect to $\mu_0$. We can write the trivial identity:

$$Z = Z_0(\Gamma_0, \lambda_0)\, \Big\langle P_J(x)\, e^{\frac{1}{2} x^\intercal \Gamma_0 x - \lambda_0^\intercal x} \Big\rangle_0.$$
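This identity can be checked numerically by brute force. The sketch below (variable and function names are ours, and the check is only feasible for small $N$) enumerates the $2^N$ binary configurations of the Ising example of eqs. (38)-(39) and compares the two sides of the identity for arbitrary $(\Gamma_0, \lambda_0)$:

```python
import itertools
import numpy as np

def identity_sides(J, h, beta, Gamma0, lam0):
    """Return (Z, Z0 * <P_J(x) e^{x'Gamma0 x / 2 - lam0'x}>_0) for the binary
    model of eqs. (38)-(39), by enumerating x in {-1, +1}^N.
    Sign convention of eq. (38): x_i = +1 carries weight e^{-beta h_i}."""
    N = len(h)
    xs = np.array(list(itertools.product([-1.0, 1.0], repeat=N)))
    # log P0(x) (normalized product measure) and log P_J(x)
    log_p0 = xs @ (-beta * h) - np.sum(np.log(2.0 * np.cosh(beta * h)))
    log_pJ = 0.5 * beta * np.einsum('ki,ij,kj->k', xs, J, xs)
    Z = np.sum(np.exp(log_p0 + log_pJ))            # direct computation of Z
    # tilted measure mu_0 of eq. (40) and its normalization Z0
    log_tilt = -0.5 * np.einsum('ki,ij,kj->k', xs, Gamma0, xs) + xs @ lam0
    Z0 = np.sum(np.exp(log_p0 + log_tilt))
    mu0 = np.exp(log_p0 + log_tilt) / Z0
    # average of P_J(x) e^{x'Gamma0 x / 2 - lam0'x} under mu_0
    avg = np.sum(mu0 * np.exp(log_pJ - log_tilt))
    return Z, Z0 * avg
```

The identity holds for any choice of $(\Gamma_0, \lambda_0)$; the approximation only enters in the next step, when the average is computed under a Gaussian surrogate.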

The idea of EC is to replace, when one computes the average $\langle P_J(x)\, e^{\frac{1}{2} x^\intercal \Gamma_0 x - \lambda_0^\intercal x}\rangle_0$, the distribution $\mu_0(x)$ by an approximate Gaussian distribution, that we can write as:

$$\mu_S(x) \equiv \frac{1}{Z_S}\, e^{-\frac{1}{2} x^\intercal (\Gamma_J + \Gamma_0) x + (\lambda_0 + \lambda_J)^\intercal x}. \qquad (41)$$

Performing this replacement yields the expectation-consistency approximation to the free entropy:

$$\log Z_{\mathrm{EC}}(\Gamma_0, \Gamma_J, \lambda_0, \lambda_J) = \log\Big[\int \mathrm{d}x\, P_0(x)\, e^{-\frac{1}{2} x^\intercal \Gamma_0 x + \lambda_0^\intercal x}\Big] + \log\Big[\int \mathrm{d}x\, P_J(x)\, e^{-\frac{1}{2} x^\intercal \Gamma_J x + \lambda_J^\intercal x}\Big] - \log\Big[\int \mathrm{d}x\, e^{-\frac{1}{2} x^\intercal (\Gamma_0 + \Gamma_J) x + (\lambda_0 + \lambda_J)^\intercal x}\Big]. \qquad (42)$$

Note that all three parts of this free entropy are tractable. In order to symmetrize the result we can define a third measure:

$$\mu_J(x) \equiv \frac{1}{Z_J(\Gamma_J, \lambda_J)}\, P_J(x)\, e^{-\frac{1}{2} x^\intercal \Gamma_J x + \lambda_J^\intercal x}. \qquad (43)$$

The final free entropy should not depend on the values of the parameters, so we expect that the best values of $(\Gamma_0, \Gamma_J, \lambda_0, \lambda_J)$ make $\log Z_{\mathrm{EC}}$ stationary. This is a strong hypothesis, and the reader can refer to [OW05a] for more details and justifications. This yields the Expectation Consistency conditions, which give the procedure its name:

$$\langle x \rangle_{\mu_0} = \langle x \rangle_{\mu_J} = \langle x \rangle_{\mu_S}, \qquad \langle x x^\intercal \rangle_{\mu_0} = \langle x x^\intercal \rangle_{\mu_J} = \langle x x^\intercal \rangle_{\mu_S}. \qquad (44)$$
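A simple sanity check (our own toy example, not from the original references): when $P_0$ and $P_J$ are themselves Gaussian, say $P_0(x) = e^{-\frac{1}{2} x^\intercal A x}$ and $P_J(x) = e^{-\frac{1}{2} x^\intercal B x}$ with $\lambda$'s set to zero, every term of eq. (42) is a Gaussian integral, the stationary point is $\Gamma_0 = B$, $\Gamma_J = A$ (so that $\mu_0$, $\mu_J$ and $\mu_S$ all equal $\mathcal{N}(0, (A+B)^{-1})$ and the conditions (44) hold trivially), and $\log Z_{\mathrm{EC}}$ coincides with the exact $\log Z$:

```python
import numpy as np

def log_gauss_integral(M):
    """log of the Gaussian integral ∫ dx e^{-x'Mx/2}, M positive definite."""
    N = M.shape[0]
    return 0.5 * N * np.log(2.0 * np.pi) - 0.5 * np.linalg.slogdet(M)[1]

def log_Z_EC(A, B, Gamma0, GammaJ):
    """Eq. (42) in the all-Gaussian toy model P0 = e^{-x'Ax/2},
    P_J = e^{-x'Bx/2}, with all lambda parameters set to zero."""
    return (log_gauss_integral(A + Gamma0)
            + log_gauss_integral(B + GammaJ)
            - log_gauss_integral(Gamma0 + GammaJ))

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 4)); A = X @ X.T + np.eye(4)
Y = rng.standard_normal((4, 4)); B = Y @ Y.T + np.eye(4)
# At the stationary point Gamma0 = B, GammaJ = A, EC is exact:
exact = log_gauss_integral(A + B)
print(np.isclose(log_Z_EC(A, B, Gamma0=B, GammaJ=A), exact))
```

Of course, the interest of EC lies in the non-Gaussian case, where the stationarity conditions (44) must be solved self-consistently.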

#### 3.1.2 Adaptive TAP approximation

The adaptive TAP approximation (or adaTAP) [OW01a, OW01b] provides an equivalent way to derive the free entropy of eq. (42) for models with pairwise interactions. Let us briefly sketch its derivation and the main arguments behind it. We follow the formulation of [HK13] and we consider again the infinite-range Ising model of eq. (38). The extensive Gibbs free entropy at fixed values of the magnetizations $m_i = \langle x_i \rangle$ and correlations $v_{ij} = \langle x_i x_j \rangle - m_i m_j$ can be written using Lagrange parameters: a vector $\lambda$ and a symmetric matrix $\Gamma$.

$$\Phi(\beta, m, v) = \underset{\lambda, \Gamma}{\mathrm{extr}}\Big[-\lambda^\intercal m + \frac{1}{2}\sum_{i,j} \Gamma_{ij}\,(v_{ij} + m_i m_j) + \log \int \mathrm{d}x\, P_0(x)\, e^{\frac{\beta}{2} x^\intercal J x - \frac{1}{2} x^\intercal \Gamma x + \lambda^\intercal x}\Big]. \qquad (45)$$
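For completeness, we spell out the extremum conditions in eq. (45), a standard Legendre-transform step: they enforce the constraints on the first two moments.

```latex
% Stationarity of eq. (45) with respect to the Lagrange parameters,
% where <.> denotes the average under the tilted measure proportional to
%   P_0(x) exp{ (beta/2) x'Jx - (1/2) x'Gamma x + lambda'x }:
\frac{\partial \Phi}{\partial \lambda_i} = 0
  \;\Longrightarrow\; m_i = \langle x_i \rangle,
\qquad
\frac{\partial \Phi}{\partial \Gamma_{ij}} = 0
  \;\Longrightarrow\; v_{ij} + m_i m_j = \langle x_i x_j \rangle.
```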

The adaTAP approximation consists in writing:

$$\Phi(\beta, m, v) = \Phi(0, m, v) + \int_0^\beta \mathrm{d}l\, \frac{\partial \Phi(l, m, v)}{\partial l} \simeq \Phi(0, m, v) + \Phi_g(\beta, m, v) - \Phi_g(0, m, v). \qquad (46)$$

In this expression, $\Phi_g$ denotes the free entropy of the same system, but where the spins have Gaussian statistics. The idea behind the adaTAP approximation is as follows. The derivative $\partial \Phi / \partial l$ is an expectation of a sum over a large number of terms; therefore it is reasonable to assume that this expectation is the same as if the underlying variables were Gaussian. This assumption of adaTAP, although reasonable, is a priori hard to justify more rigorously and systematically. It is important to notice that the free entropy (46) of adaTAP is equivalent to the one derived using Expectation Consistency in eq. (42). Indeed, using Lagrange parameters we can write the three terms of eq. (46) as: