Generalized Approximate Survey Propagation for High-Dimensional Estimation

05/13/2019
by Luca Saglietti, et al.

In Generalized Linear Estimation (GLE) problems, we seek to estimate a signal that is observed through a linear transform followed by a component-wise, possibly nonlinear and noisy, channel. In the Bayesian optimal setting, Generalized Approximate Message Passing (GAMP) is known to achieve optimal performance for GLE. However, its performance can significantly degrade whenever there is a mismatch between the assumed and the true generative model, a situation frequently encountered in practice. In this paper, we propose a new algorithm, named Generalized Approximate Survey Propagation (GASP), for solving GLE in the presence of prior or model mis-specifications. As a prototypical example, we consider the phase retrieval problem, where we show that GASP outperforms the corresponding GAMP, reducing the reconstruction threshold and, for certain choices of its parameters, approaching Bayesian optimal performance. Furthermore, we present a set of State Evolution equations that exactly characterize the dynamics of GASP in the high-dimensional limit.



1 Introduction

Approximate message passing (AMP) algorithms have become a well-established tool in the study of inference problems (Donoho et al., 2009; Donoho & Montanari, 2016; Advani & Ganguli, 2016) that can be represented by dense graphical models. An important feature of AMP is that its dynamical behavior in the large system limit can be exactly predicted through a dynamical system involving only scalar quantities, called State Evolution (SE) (Bayati & Montanari, 2011). This relationship paved the way for a series of rigorous results (Rangan & Fletcher, 2012; Deshpande & Montanari, 2014; Deshpande et al., 2016). It also helps clarify the connection to several fascinating predictions obtained through the replica analysis in statistical physics (Mézard et al., 1987). In the optimal Bayesian setting, where one has perfect information on the process underlying data generation, AMP has been empirically shown to achieve optimal performance among polynomial-time algorithms for many different problems. However, in the more realistic case of a mismatch between the assumed and the true generative model, i.e. when AMP is not derived from the true posterior distribution, it may become sub-optimal. A possible source of problems for the AMP class of algorithms is the onset of Replica Symmetry Breaking (Mézard et al., 1987), a scenario where an exponential number of fixed points and algorithmic barriers dominate the free energy landscape explored by AMP. This phenomenon can be accentuated in case of model mismatch: a notable example is maximum likelihood estimation (as opposed to estimation by the posterior mean), which corresponds to the low-temperature limit of a statistical physics model.

These considerations are well known within the physics community of disordered systems (Krzakala et al., 2016), where the problem of signal estimation is informally referred to as “crystal hunting”. Estimation problems in high dimensions are characterized by a complex energy-entropy competition, where the true signal is hidden in a vast and potentially rough landscape. In a wide class of problems, one observes an algorithmically “hard” phase for some range of values of the parameters defining the problem (e.g. the signal-to-noise ratio). In this regime, all known polynomial-complexity algorithms fail to saturate the information theoretic bound (Ricci-Tersenghi et al., 2019). While reconstruction is possible in principle, algorithms are trapped in a region of the configuration space with low overlap with the signal and many local minima (Antenucci et al., 2019a; Ros et al., 2019).

In a recent work (Antenucci et al., 2019b), a novel message-passing algorithm, Approximate Survey Propagation (ASP), was introduced in the context of low-rank matrix estimation. The algorithm is based on the 1-step Replica Symmetry Breaking (1RSB) ansatz from spin glass theory (Mézard et al., 1987), which was specifically developed to deal with landscapes populated by exponentially many local minima. It was shown that ASP on the mismatched model could reach the performance of (but not improve on) matched AMP, and do far better than mismatched AMP (Antenucci et al., 2019a, b). In the present paper, we build upon these previous works and derive the ASP algorithm for Generalized Linear Estimation (GLE) models. Since the extension of AMP to GLE problems is commonly known as GAMP, we call our extension of ASP Generalized Approximate Survey Propagation (GASP). We will show that also in this case, in the presence of model mismatch, (G)ASP improves over the corresponding (G)AMP.

2 Model specification

An instance of the general class of models to which GASP can be applied is defined, for some integers $N$ and $M$, by an observed signal $\mathbf{y} \in \mathbb{R}^M$ and an observation matrix $W \in \mathbb{R}^{M \times N}$. Clearly, this scenario also encompasses GLE. We denote with $\mathbf{w}_\mu$, $\mu = 1, \dots, M$, the rows of $W$, and refer to the ratio $\alpha = M/N$ as the sampling ratio of the problem. We consider a probability density distribution $P(\mathbf{x})$ on a (possibly discrete) space $\mathcal{X}^N$, $\mathcal{X} \subseteq \mathbb{R}$, defined as:

$$P(\mathbf{x}) = \frac{1}{Z(\beta)}\, e^{-\beta \mathcal{H}(\mathbf{x})} \qquad (1)$$

where, following statistical physics jargon, $\beta$ plays the role of an inverse temperature, $Z(\beta)$ is a normalization factor called the partition function (both $P$ and $Z$ implicitly depend on $\mathbf{y}$ and $W$), and $\mathcal{H}(\mathbf{x})$ is the Hamiltonian of the model, which in our setting takes the form:

$$\mathcal{H}(\mathbf{x}) = \sum_{\mu=1}^{M} \ell(y_\mu, \mathbf{w}_\mu \cdot \mathbf{x}) + \sum_{i=1}^{N} r(x_i) \qquad (2)$$

Here $\cdot$ denotes the scalar product, and we call $\ell$ and $r$ the loss function and the regularizer of the problem, respectively.

In this quite general context, the purpose of GASP is to approximately compute the marginal distributions $P(x_i)$, along with some expected quantities such as, e.g., the means $\langle x_i \rangle$. The approximation entailed in GASP turns out to be exact under some assumptions in the large-$N$ limit, as we shall later see. A crucial assumption in the derivation of the GASP algorithm (and of GAMP as well) is that the entries of $W$ are independently generated according to some zero-mean and finite-variance distribution.

Although the general formulation of GASP, presented in Sec. 2 of the SM, is able to deal with any model of the form (1), we here restrict the discussion to Generalized Linear Estimation (Rangan, 2011).

In GLE problems, the Hamiltonian is sensibly chosen in order to infer a true signal $\mathbf{x}^0$, whose components are assumed to be independently extracted from some prior $P_0$. The observations are independently produced by a (probabilistic) scalar channel $P_{\text{out}}$: $y_\mu \sim P_{\text{out}}(\,\cdot\,|\,\mathbf{w}_\mu \cdot \mathbf{x}^0)$.

It is then reasonable to choose $\beta = 1$, $\ell(y, z) = -\log P_{\text{out}}(y|z)$ and $r(x) = -\log P_0(x)$, so that the probability density (1) corresponds to the true posterior, $P(\mathbf{x}) \propto P(\mathbf{x}\,|\,\mathbf{y}, W)$, where $\propto$ denotes equality up to a normalization factor. We refer to this setting as the Bayes-optimal or matched setting (Barbier et al., 2018). Notice that in the limit $\beta \to \infty$ the measure concentrates around the maximum-a-posteriori (MAP) estimate. If $\beta \neq 1$, or if the Hamiltonian does not correspond to minus the log posterior (e.g., when the $P_0$ and $P_{\text{out}}$ used in the Hamiltonian do not correspond to the true ones), we talk about model mismatch.

As a testing ground for GASP and the corresponding State Evolution, we here consider the phase retrieval problem, which has undergone intense investigation in recent years (Candes et al., 2015; Dhifallah & Lu, 2017; Chen et al., 2018; Goldstein & Studer, 2018; Mondelli & Montanari, 2018; Sun et al., 2018; Mukherjee & Seelamantula, 2018). We examine its noiseless and real-valued formulation, where observations are generated according to the process

$$x_i^0 \sim \mathcal{N}(0, 1), \quad i = 1, \dots, N \qquad (3)$$
$$W_{\mu i} \sim \mathcal{N}(0, 1/N) \qquad (4)$$
$$y_\mu = |\mathbf{w}_\mu \cdot \mathbf{x}^0| \qquad (5)$$

For such a generative model, we will focus on the problem of recovering $\mathbf{x}^0$ by minimizing the energy function of Eq. (2), in the case

$$\ell(y, z) = \frac{1}{2}\,(y - |z|)^2 \qquad (6)$$
$$r(x) = \frac{\lambda}{2}\, x^2 \qquad (7)$$

Since the setting assumed for inference corresponds to MAP estimation in the presence of a noisy channel, while the observations are in fact noiseless, we are dealing with a case of model mismatch. The effect of the parameter $\lambda$ on the estimation shall be explored in Sec. 7, but we assume $\lambda = 0$ until then. The optimization procedure will be performed using the zero-temperature (i.e. $\beta \to \infty$) version of the GASP algorithm.
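To make the setup concrete, the following minimal sketch (our code; function names are ours) samples an instance according to Eqs. (3)-(5) and evaluates the MAP objective defined by Eqs. (2), (6) and (7):

```python
import numpy as np

def generate_instance(n, alpha, seed=0):
    """Sample a noiseless, real-valued phase retrieval instance
    following Eqs. (3)-(5): Gaussian signal, Gaussian matrix, y = |W x0|."""
    rng = np.random.default_rng(seed)
    m = int(alpha * n)
    x0 = rng.standard_normal(n)                   # signal prior, Eq. (3)
    W = rng.standard_normal((m, n)) / np.sqrt(n)  # i.i.d. N(0, 1/N) entries, Eq. (4)
    y = np.abs(W @ x0)                            # absolute-value channel, Eq. (5)
    return x0, W, y

def map_energy(x, W, y, lam=0.0):
    """MAP objective of Eq. (2) with loss (6) and l2 regularizer (7)."""
    z = W @ x
    return 0.5 * np.sum((y - np.abs(z)) ** 2) + 0.5 * lam * np.sum(x ** 2)

x0, W, y = generate_instance(n=1000, alpha=2.0)
print(map_energy(x0, W, y))  # the true signal attains zero loss at lam = 0
```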

3 Previous work on Approximate Message Passing for Phase Retrieval

Generalized approximate message passing (GAMP) was developed and rigorously analyzed in Refs. (Rangan, 2011) and (Javanmard & Montanari, 2013). It was then applied for the first time to the (complex-valued) phase retrieval problem in Ref. (Schniter & Rangan, 2015). In Ref. (Barbier et al., 2018) the authors report an algorithmic threshold $\alpha_{\text{alg}} \approx 1.13$ for the perfect recovery of the signal when using matched AMP on the real-valued version of the problem. This is to be compared to the information theoretic bound $\alpha_{\text{IT}} = 1$.

The performance of GAMP in the MAP estimation setting, instead, was investigated in Refs. (Ma et al., 2018, 2019). A “vanilla” implementation of the zero-temperature GAMP equations for the absolute value channel was reported to achieve perfect recovery of real-valued signals only above a rather high sampling ratio. The authors showed that the algorithmic threshold of GAMP in the mismatched case can however be drastically lowered by introducing an ℓ2 regularization term, ultimately continued to zero. The AMP.A algorithm proposed in (Ma et al., 2018, 2019) uses an adaptive regularization that improves the estimation threshold and also makes the algorithm more numerically robust, compensating for a problematic divergence that appears in the message-passing equations (see Sec. 1.3 in the SM for further details).

Another important ingredient for AMP.A’s performance is the initialization: in order to achieve perfect recovery, one has to start from a configuration that falls within the basin of attraction of the true signal, which rapidly shrinks as the sampling ratio decreases. A well-studied method for obtaining a configuration correlated with the signal is spectral initialization, introduced and studied in Refs. (Jain et al., 2013; Candes et al., 2015; Chen & Candes, 2015): in this case, the starting condition is given by the principal eigenvector of a matrix obtained from the data matrix and the labels, passed through a nonlinear processing function. The asymptotic performance of this method was analyzed in (Lu & Li, 2017), while the form of the optimal processing function was described in (Mondelli & Montanari, 2018; Luo et al., 2019). However, since the SE description is based on the assumption that the initial condition is uncorrelated with the data, in AMP.A the authors revisited the method, proposing a modification that guarantees “enough independence” while still providing a high overlap between the starting point and the signal.

With the combination of these two heuristics, AMP.A is able to reconstruct the signal down to considerably lower sampling ratios. In the present paper we will show that, with a basic continuation scheme, the 1RSB version of the zero-temperature GAMP can reach the Bayes-optimal threshold also in the mismatched case, without the need of spectral initialization.

3.1 GAMP equations at zero temperature

Here we provide a brief summary of the GAMP equations for the general graphical model of Eq. (1), in the $\beta \to \infty$ limit. This is both to allow an easy comparison with our novel GASP algorithm and to introduce some notation that will be useful in the following discussion. There is some degree of model dependence in the scaling of the messages when taking the zero-temperature limit: here we adopt the one appropriate for over-constrained models in continuous space. Details of the derivation can be found in Sec. 1 of the SM, along with the specialization of the equations to phase retrieval.

First, we introduce the two free entropy functions associated to the input and output channels (Rangan, 2011):

$$\phi_{\text{in}}(A, B) = \max_x \Big[ -r(x) - \frac{A}{2}x^2 + Bx \Big] \qquad (8)$$
$$\phi_{\text{out}}(y, \omega, V) = \max_z \Big[ -\ell(y, z) - \frac{(z-\omega)^2}{2V} \Big] \qquad (9)$$

We define for convenience the scalar denoising functions $f_{\text{in}} = \partial_B \phi_{\text{in}}$ and $f_{\text{out}} = \partial_\omega \phi_{\text{out}}$. In our notation, the GAMP message-passing equations read:

(10)
(11)
(12)
(13)
(14)
(15)
(16)
(17)

where $t$ denotes the iteration index. It is clear from the equations that the two free entropy functions are supposed to be twice differentiable. This is not the case for phase retrieval, where GAMP consequently encounters some non-trivial numerical stability issues: during the message-passing iterations, one would have to approximately evaluate an empirical average of the second derivative of $\phi_{\text{out}}$, which contains a Dirac $\delta$-function. This is the problem encountered in AMP.A of Ref. (Ma et al., 2018). We will see that this problem is not present in GASP, thanks to a Gaussian smoothing of the denoising function.
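To make the divergence concrete, consider the absolute-value loss of Eq. (6) with $\lambda = 0$; a short computation (ours, following the zero-temperature definitions above) gives

```latex
% Zero-temperature output channel for the absolute-value loss of Eq. (6):
\phi_{\mathrm{out}}(y,\omega,V)
  = \max_z \Big[ -\tfrac{1}{2}(y-|z|)^2 - \tfrac{(z-\omega)^2}{2V} \Big]
  = -\frac{(y-|\omega|)^2}{2(1+V)},
\qquad
z^*(y,\omega,V) = \frac{\omega + V\,y\,\mathrm{sign}(\omega)}{1+V}.
% Differentiating z^* with respect to \omega produces
% \mathrm{sign}'(\omega) = 2\,\delta(\omega): this Dirac delta is what makes
% a naive empirical average of \partial^2_\omega \phi_{\mathrm{out}}
% ill-behaved inside the message passing.
```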

4 Generalized Approximate Survey Propagation

The (G)ASP algorithm builds on decades of progress within the statistical physics community in understanding and dealing with rough high-dimensional landscapes. The starting point for the derivation of the algorithm is the partition function of $s$ replicas (or clones) of the system:

$$Z(s, \beta) = \int \prod_{a=1}^{s} d\mathbf{x}^a \; e^{-\beta \sum_{a=1}^{s} \mathcal{H}(\mathbf{x}^a)} = Z(\beta)^s \qquad (18)$$

Note that, while this probability measure factorizes trivially, setting $s \neq 1$ can introduce many important differences with respect to the standard case, both from the algorithmic and from the physics standpoints (Monasson, 1995; Antenucci et al., 2019b).

We write down the Belief Propagation (BP) equations associated to the replicated factor graph, where messages are probability distributions, associated to each edge, over the single-site replicated variables $\{x_i^a\}_{a=1}^{s}$. We make the assumption that the messages are symmetric under permutations of the replica indices. This allows for a parametrization of the message passing that can be continued analytically to any real value of $s$. The resulting algorithm goes under the name of 1RSB Cavity Method or, more loosely speaking, of Survey Propagation (with reference in particular to a zero-temperature version of the 1RSB cavity method for discrete constraint satisfaction problems), and led to many algorithmic breakthroughs in combinatorial optimization on sparse graphical models (Mézard et al., 2002; Braunstein et al., 2005; Krzakała et al., 2007).

One possible derivation of the (G)ASP algorithm is as the dense-graph limit of the Survey Propagation equations, in the same way as AMP is obtained starting from BP. The derivation requires two steps. First, the BP messages are projected by moment-matching onto (replica-symmetric) multivariate Gaussian distributions over the replicated variables, which we express in the form

(19)

Then, the messages on the edges are conveniently expressed in terms of single-site quantities. We note that some statistical independence assumptions on the entries of the measurement matrix are crucial for the derivation, as is the case for AMP as well. While the starting point of the derivation assumes integer $s$, the resulting message passing can be analytically continued to any real $s$. Applying this procedure to the GLE graphical model of Eq. (1), we obtain the GASP equations. Here we consider the $\beta \to \infty$ limit, to deal with the MAP estimation problem. Details of the GASP derivation and the finite-$\beta$ GASP equations are given in Sec. 2 of the SM. Particular care has to be taken in the limit procedure, as a proper rescaling with $\beta$ is needed for each parameter. For instance, as the range of sensible choices for $s$ shrinks towards zero for increasing $\beta$, we rescale it so that it remains finite in the $\beta \to \infty$ limit.

Relying on the definitions given in Eqs. (8) and (9), we introduce the two 1RSB free entropies:

(20)
(21)

Here $\int Dz$ denotes standard Gaussian integration, $Dz = \frac{dz}{\sqrt{2\pi}}\, e^{-z^2/2}$. Using some shorthand notations (notice the shift in the time indices), the GASP equations read:

(22)
(23)
(24)
(25)
(26)
(27)
(28)
(29)
(30)
(31)
(32)
(33)
  initialize the messages
  initialize the estimates to some values
  for t = 1 to T do
     compute the output-channel parameters using Eqs. (22)-(25)
     compute the output messages using Eqs. (26), (27)
     compute the input-channel parameters using Eqs. (28)-(31)
     compute the new estimates using Eqs. (32), (33)
  end for
Algorithm 1 GASP(s) for MAP

The computational time and memory complexity per iteration of the algorithm is the same as that of GAMP, and is determined by the linear operations in Eqs. (22) and (28). With respect to GAMP, we have an additional (but sub-leading) cost due to the integrals in the input and output channels. In some special cases, the integrals in Eqs. (20) and (21) can be carried out analytically (e.g. in the phase retrieval problem).

Notice that the GASP iteration reduces to the standard GAMP iteration if the extra 1RSB parameters are initialized to (or shrink to) zero, but it can produce non-trivial fixed points depending on the initial condition and on the value of $s$.

We remark on the importance of setting the time indices correctly in order to allow convergence (Caltagirone et al., 2014). The full algorithm is detailed in Alg. 1.
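Since the update equations (22)-(33) are model-specific and deferred to the SM, we give only a structural sketch of Alg. 1 (our code; the four callables are hypothetical stand-ins for the corresponding groups of equations):

```python
from typing import Any, Callable, Tuple

def gasp(state: Any,
         output_pre: Callable[[Any], Any],
         output_msg: Callable[[Any], Any],
         input_pre: Callable[[Any], Any],
         input_est: Callable[[Any], Tuple[Any, float]],
         n_iter: int = 200, tol: float = 1e-8) -> Any:
    """Structural skeleton of Alg. 1. The four callables stand in for the
    update equations (22)-(25), (26)-(27), (28)-(31) and (32)-(33) of the
    paper/SM; their bodies are model-specific and not reproduced here.
    `input_est` returns the new state and a convergence measure `delta`."""
    for t in range(n_iter):
        state = output_pre(state)            # Eqs. (22)-(25)
        state = output_msg(state)            # Eqs. (26), (27)
        state = input_pre(state)             # Eqs. (28)-(31)
        state, delta = input_est(state)      # Eqs. (32), (33)
        if delta < tol:                      # stop once estimates stabilize
            break
    return state
```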

The expressions for the special case of the absolute value channel (6) and ℓ2 regularization (7) can be found in Sec. 2.4 of the SM. An important comment is that the divergence issue arising in AMP.A in the same setting does not affect GASP: the discontinuity in the expression for the minimizer of Eq. (9) is smoothed out in the 1RSB version by the Gaussian integral in Eq. (20). We also note that, in phase retrieval, a problematic initialization can be obtained by choosing configurations that are exactly orthogonal to the signal, since the message passing will then remain trapped in the uninformative fixed point (due to the symmetry of the problem). However, for finite-size instances, a random Gaussian initial condition will have an overlap of order $1/\sqrt{N}$ with the signal, which allows escaping the uninformative fixed point whenever it is unstable (i.e. for high enough $\alpha$).

Figure 1: (Top) Probability of perfect recovery of the true signal using GASP (Alg. 1), as a function of the sampling ratio $\alpha$. (Middle) GASP and SE convergence times, with SE and GASP initialized consistently (GASP results averaged over samples). (Bottom) Overlap with the true signal predicted by the SE dynamics (black lines), compared to 10 GASP trajectories for each value of $s$.

In Fig. 1 (Top and Middle), we show the probability of perfect recovery and the convergence times of GASP for the real-valued phase retrieval problem, for different sampling ratios $\alpha$ and values of the symmetry-breaking parameter $s$. The initial condition is random and uninformed. Notice that a standard Gaussian initialization is able to break the symmetry of the channel and, at large $\alpha$, GASP matches the fixed points predicted by SE (see next Section) even with a small initial overlap with the true signal. In order to achieve signal recovery at low $\alpha$, the symmetry-breaking parameter has to be increased. For suitable values of $s$, we find an algorithmic threshold comparable to that of AMP.A, without exploiting the adaptive regularization and spectral initialization employed by AMP.A (both of which could be used for GASP as well).

We report that, at fixed $\alpha$, when $s$ is increased above a certain value the message passing stops converging. The oscillating/diverging behavior of the messages can however be exploited for hand-tuning $s$, in the absence of a replica analysis supporting the selection of its most appropriate value. More details can be found in Sec. 3 of the SM.

We presented here the zero-temperature limit of the GASP message passing, used to solve the MAP problem. Refer to Sec. 2 of the SM for a more general formulation dealing with the full class of graphical models of the form of Eq. (1).

5 State Evolution for GASP

State Evolution (SE) is a set of iterative equations involving a few scalar quantities, which were rigorously proved to track the (G)AMP dynamics, in the sense of almost sure convergence of empirical averages (Javanmard & Montanari, 2013), in the large-$N$ limit and with fixed sampling ratio $\alpha$. Following the analysis of Ref. (Rangan, 2011) for GAMP, in order to present the SE equations for GASP we assume that the observation model is such that the labels can be expressed in the form $y_\mu = \varphi(\mathbf{w}_\mu \cdot \mathbf{x}^0, \eta_\mu)$ for some function $\varphi$, with $\eta_\mu$ a scalar- or vector-valued random variable modeling the noise, sampled according to some distribution $P_\eta$. We also take the components of $\mathbf{x}^0$ to be i.i.d. The recursion is a closed set of equations over a few scalar overlap parameters. Initializing these variables at time $t = 0$, the SE equations read:

(34)
(35)
(36)
(37)

where the expectation is taken over the generative process of the data. We also have a second set of equations that read:

(38)
(39)
(40)
(41)

where the expectation is taken over the associated Markov chain.

The trajectories of the GASP overlap parameters concentrate, for large $N$, on the expected values given by the SE dynamics. In order to frame the GASP State Evolution in the rigorous setting of Ref. (Javanmard & Montanari, 2013), we define a slightly different message passing, obtained by replacing some sample-dependent quantities of GASP, for a given realization of the problem, with the corresponding sample-independent SE values. Let us define the denoising functions:

(42)
(43)

and their vectorized (component-wise) extensions. The modified GASP message passing then reads

(44)
(45)

where the divergence terms are given by

(46)

The message passing (44, 45) falls within the class of AMP algorithms analyzed in Ref. (Javanmard & Montanari, 2013) (under some further technical assumptions, see Proposition 5 there). Therefore, it can be rigorously tracked by the SE Eqs. (34)-(41), in the sense specified in that work. In particular, we have almost sure convergence, in the large system limit, of the overlap with the true signal and of the norm of the estimator to their SE estimates:

(47)
(48)

In Fig. 1 (Bottom), we compare the SE dynamics to the original GASP dynamics (Alg. 1). We compare the SE prediction for the evolution of the overlap to that observed in sample trajectories of GASP, at a fixed sampling ratio and for different values of $s$. The initial estimate in GASP was set to be a mixture of the true signal and independent Gaussian noise, and SE was initialized consistently, with the same small initial overlap in both cases. As expected, we observe good agreement between the two dynamics.
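For concreteness, a standard way to realize such a mixture initialization (our convention; the exact one used for Fig. 1 follows the same logic) is:

```python
import numpy as np

def correlated_init(x0, eps, rng):
    """Initial estimate with overlap ~ eps with the signal: a mixture of
    the true signal and independent Gaussian noise (our convention)."""
    xi = rng.standard_normal(x0.shape)
    return eps * x0 + np.sqrt(1.0 - eps**2) * xi

rng = np.random.default_rng(0)
x0 = rng.standard_normal(100_000)
xhat = correlated_init(x0, eps=0.1, rng=rng)
print(xhat @ x0 / len(x0))  # empirical overlap, concentrates near eps
```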

6 Effective Landscape and Message-Passing Algorithms

The posterior distribution of statistical models in the hard phase is known to be riddled with glassy states (Antenucci et al., 2019a) preventing the retrieval of the true signal, a situation which is exacerbated in the low temperature limit corresponding to MAP estimation.

Within the replica formalism, the 1RSB free energy provides a description of this complex landscape. The Parisi parameter $s$ allows one to select the contributions of different families of states. More specifically, $s$ acts as an inverse temperature coupled to the internal free energy of the states: increasing $s$ selects families of states with lower complexity (i.e., states that are less numerous) and lower free energy.
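This statement can be made concrete through the standard construction of Ref. (Monasson, 1995); schematically, denoting by $\Sigma(f)$ the complexity (log-number) of states with internal free energy $f$:

```latex
% Replicated free energy as a Legendre transform of the complexity \Sigma(f):
e^{\,N s\,\phi(s)} \;=\; \sum_{\text{states}\;\gamma} e^{-\beta s N f_\gamma}
 \;=\; \int df\; e^{\,N\left[\Sigma(f) - \beta s f\right]},
\qquad
s\,\phi(s) \;\simeq\; \max_f\,\big[\Sigma(f) - \beta s f\big].
% Increasing s shifts the dominant saddle towards states of lower free
% energy f and lower complexity \Sigma, as stated in the text above.
```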

The fixed points of the State Evolution of GASP are in one-to-one correspondence with the stationary points of the 1RSB free energy. While the role of $s$ in the dynamics of SE is difficult to analyze, some insight can be gained from the static description given by the free energy.

For phase retrieval in the MAP setting without regularization, a stable fixed point of GAMP can be found in the subspace orthogonal to the signal (i.e. at zero overlap) for values of the sampling ratio below the algorithmic threshold of GAMP (Ma et al., 2018). For GASP, instead, it is possible to see that the uninformative fixed point is stable only below a smaller sampling ratio, a noticeable improvement of the threshold with respect to GAMP. This is obtained by choosing the value of $s$ corresponding to the lowest-complexity states according to the 1RSB free energy (see Sec. 3 of the SM for further details). As we will see in the following, both these thresholds can be lowered by employing a continuation strategy for the regularizer.

A thorough description of the results of the replica analysis and of the landscape properties for GLE models will be presented in a more technical future work.

7 MAP estimation with an ℓ2 regularizer

The objective function introduced in Eq. (2) contains an ℓ2 regularization term, weighted by an intensity parameter $\lambda$.

Regularization plays an important role in reducing the variance of the inferred estimator, and can be crucial when the observations are affected by noise, since it lowers the sensitivity of the learned model to deviations in the training set. However, as observed in (Ma et al., 2018, 2019; Balan, 2016), regularization is also useful for its smoothing effect, and can be exploited in non-convex optimization problems even in the noiseless setting. When the regularization term is turned up, the optimization landscape gradually simplifies and it becomes easier to reach a global optimizer. However, the problem of getting stuck in bad local minima is avoided at the cost of introducing a bias. The continuation strategy is based on the fact that such a biased estimator might be closer than the random initial configuration to the global optimizer of the unregularized objective: in a multi-stage approach, the regularization is decreased (down to zero) after each warm restart.

Among the many possible continuation schedules for $\lambda$ (a small decrease after each minimization or, as in AMP.A, at the end of each iteration), in this paper we choose a simple two-stage approach: first we run GASP until convergence with a given value of $\lambda$, then we set $\lambda = 0$ in the successive iterations.
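In code, the two-stage continuation is simply a warm restart (our sketch; `run_gasp` stands in for a full GASP solver such as Alg. 1):

```python
def two_stage_gasp(run_gasp, x_init, lam):
    """Two-stage l2 continuation described above: run GASP to convergence
    at regularization strength lam, then warm-restart with lam = 0.
    `run_gasp(x_init, lam)` is a hypothetical stand-in for a GASP solver
    returning its estimate at convergence."""
    x_stage1 = run_gasp(x_init, lam)    # stage 1: biased but easier landscape
    x_stage2 = run_gasp(x_stage1, 0.0)  # stage 2: remove the bias
    return x_stage2
```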

Figure 2: Phase diagrams corresponding to the SE asymptotic analysis of GAMP (top) and GASP (bottom). The color maps indicate the overlap with the signal reached at convergence in the presence of an ℓ2 regularizer of intensity $\lambda$.

In Fig. 2, we compare the asymptotic performance (tracked by SE) of GAMP and GASP for the phase retrieval problem with an ℓ2 regularization. The color map indicates the overlap with the signal reached at the end of the first stage of our continuation strategy (with $\lambda > 0$), while the black curves delimit the perfect-retrieval regions, where the overlap reached at the end of stage two (with $\lambda = 0$) equals 1.

In both cases we set the initial variances as in the previous experiments, and consider an initial condition with a small positive overlap with the signal. An assumption of this kind is indeed needed to ensure that we avoid lingering on the uninformative fixed point at zero overlap; however, the specific value of the initial overlap can be chosen arbitrarily (e.g., it could be taken much smaller without affecting the phase diagrams). Even in real-world applications, the non-orthogonality requirement is often easily met; for example, in many imaging applications the signal is known to be real and non-negative. The self-overlap parameter is initialized as explained in the previous section.

In the GASP phase diagram, for each $\alpha$ and $\lambda$, the value of $s$ was set to its thermodynamically optimal value and kept fixed throughout the two stages of our continuation strategy. This value is obtained by optimizing the 1RSB free energy over the symmetry-breaking parameter; the numerical values of $s$ corresponding to the points in the plot can be found in Sec. 3 of the SM, Fig. 1. It is not strictly necessary to fix $s$ to this specific value, as any value in a broad range around it is still effective (see for example Fig. 2 in the SM). As expected from the numerical experiments without regularization, we can see from Fig. 2 that when the regularizer becomes too small an uninformative fixed point (at zero overlap) becomes attractive for the dynamics of GASP, and signal recovery becomes impossible at low sampling ratios (we also expect the remaining recovery region at small $\lambda$ to shrink and close when the regularizer is further decreased).

It is clear that the introduction of an ℓ2-norm is crucial for reducing the algorithmic gap of both GAMP and GASP (the information theoretic threshold being $\alpha_{\text{IT}} = 1$), as previously observed in (Ma et al., 2018, 2019). In this work we find that, also in GLE problems, when the mismatched setting is considered (and inference happens off the Nishimori line (Nishimori, 2001; Antenucci et al., 2019b)), the more fitting geometrical picture provided by the 1RSB ansatz can be exploited algorithmically: with a simple continuation strategy, it is possible to lower the algorithmic threshold of GASP down to the Bayes-optimal value $\alpha \approx 1.13$.

8 Discussion

We presented Generalized Approximate Survey Propagation, a novel algorithm designed to improve over AMP in the context of GLE inference problems when faced with a mismatch between the assumed and the true generative model. The algorithm, parametrized by the symmetry-breaking parameter $s$, allows one to go beyond some symmetry assumptions at the heart of previous algorithms, and proves to be better suited for the MAP estimation task considered in this work.

In the prototypical case of real-valued phase retrieval, we have shown that, with little tuning of $s$, it is possible to modify the effective landscape explored during the message-passing dynamics and avoid getting stuck in otherwise attractive uninformative fixed points. Furthermore, we have seen that, even in the noiseless case, a simple continuation strategy based on the introduction of an ℓ2 regularizer can guide GASP close enough to the signal to allow its recovery, extending the region of parameters where GASP is more effective than GAMP. In some cases we observed that GASP can achieve perfect retrieval down to the Bayes-optimal threshold, at the sampling ratio $\alpha \approx 1.13$. We also derived the 1RSB State Evolution equations, and showed that they can be used as a simple tool for tracking the asymptotic behaviour of GASP.

We defer a comprehensive analysis of the landscape associated to GLE models to a more technical publication, where we will also deal with the case of noisy observation channels. A straightforward follow-up of the present work could focus on the search for an adaptation scheme for the regularizer, possibly extending the work of Refs. (Ma et al., 2018, 2019), and, more importantly, for a criterion to identify the best setting of the symmetry-breaking parameter. Another possible line of future work could go in the direction of relaxing some of the assumptions on the observation matrix made in deriving the GASP algorithm. This could motivate the derivation of a 1RSB version of the Vector Approximate Message Passing equations (Schniter et al., 2016). Also, the extension of GASP to deep non-linear inference models, along the lines of Refs. (Manoel et al., 2017; Fletcher et al., 2018), seems promising and technically feasible.

CL thanks Junjie Ma for sharing and explaining the code of their AMP.A algorithm.

References

  • Advani & Ganguli (2016) Advani, M. and Ganguli, S. Statistical mechanics of optimal convex inference in high dimensions. Physical Review X, 6(3):031034, 2016.
  • Antenucci et al. (2019a) Antenucci, F., Franz, S., Urbani, P., and Zdeborová, L. Glassy nature of the hard phase in inference problems. Physical Review X, 9(1):011020, 2019a.
  • Antenucci et al. (2019b) Antenucci, F., Krzakala, F., Urbani, P., and Zdeborová, L. Approximate survey propagation for statistical inference. Journal of Statistical Mechanics: Theory and Experiment, 2019(2):023401, 2019b.
  • Balan (2016) Balan, R. Reconstruction of signals from magnitudes of redundant representations: The complex case. Foundations of Computational Mathematics, 16(3):677–721, 2016.
  • Barbier et al. (2018) Barbier, J., Krzakala, F., Macris, N., Miolane, L., and Zdeborová, L. Optimal errors and phase transitions in high-dimensional generalized linear models. In Conference On Learning Theory, pp. 728–731, 2018.
  • Bayati & Montanari (2011) Bayati, M. and Montanari, A. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2):764–785, 2011.
  • Braunstein et al. (2005) Braunstein, A., Mézard, M., and Zecchina, R. Survey propagation: An algorithm for satisfiability. Random Structures & Algorithms, 27(2):201–226, 2005.
  • Caltagirone et al. (2014) Caltagirone, F., Zdeborová, L., and Krzakala, F. On convergence of approximate message passing. In Information Theory (ISIT), 2014 IEEE International Symposium on, pp. 1812–1816. IEEE, 2014.
  • Candes et al. (2015) Candes, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval via wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
  • Charbonneau et al. (2017) Charbonneau, P., Kurchan, J., Parisi, G., Urbani, P., and Zamponi, F. Glass and jamming transitions: From exact results to finite-dimensional descriptions. Annual Review of Condensed Matter Physics, 8:265–288, 2017.
  • Chen & Candes (2015) Chen, Y. and Candes, E. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems, pp. 739–747, 2015.
  • Chen et al. (2018) Chen, Y., Chi, Y., Fan, J., and Ma, C. Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Mathematical Programming, pp. 1–33, 2018.
  • Deshpande & Montanari (2014) Deshpande, Y. and Montanari, A. Information-theoretically optimal sparse pca. In 2014 IEEE International Symposium on Information Theory, pp. 2197–2201. IEEE, 2014.
  • Deshpande et al. (2016) Deshpande, Y., Abbe, E., and Montanari, A. Asymptotic mutual information for the binary stochastic block model. In Information Theory (ISIT), 2016 IEEE International Symposium on, pp. 185–189. IEEE, 2016.
  • Dhifallah & Lu (2017) Dhifallah, O. and Lu, Y. M. Fundamental limits of phasemax for phase retrieval: A replica analysis. In Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2017 IEEE 7th International Workshop on, pp. 1–5. IEEE, 2017.
  • Donoho & Montanari (2016) Donoho, D. and Montanari, A. High dimensional robust m-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields, 166(3-4):935–969, 2016.
  • Donoho et al. (2009) Donoho, D. L., Maleki, A., and Montanari, A. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
  • Fletcher et al. (2018) Fletcher, A. K., Rangan, S., and Schniter, P. Inference in deep networks in high dimensions. In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1884–1888. IEEE, 2018.
  • Goldstein & Studer (2018) Goldstein, T. and Studer, C. Phasemax: Convex phase retrieval via basis pursuit. IEEE Transactions on Information Theory, 2018.
  • Guo & Wang (2006) Guo, D. and Wang, C.-C. Asymptotic mean-square optimality of belief propagation for sparse linear systems. In Information Theory Workshop, 2006. ITW’06 Chengdu. IEEE, pp. 194–198. IEEE, 2006.
  • Jain et al. (2013) Jain, P., Netrapalli, P., and Sanghavi, S. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 665–674. ACM, 2013.
  • Javanmard & Montanari (2013) Javanmard, A. and Montanari, A. State evolution for general approximate message passing algorithms, with applications to spatial coupling. Information and Inference: A Journal of the IMA, 2(2):115–144, 2013.
  • Kabashima et al. (2016) Kabashima, Y., Krzakala, F., Mézard, M., Sakata, A., and Zdeborová, L. Phase transitions and sample complexity in bayes-optimal matrix factorization. IEEE Transactions on Information Theory, 62(7):4228–4265, 2016.
  • Krzakała et al. (2007) Krzakała, F., Montanari, A., Ricci-Tersenghi, F., Semerjian, G., and Zdeborová, L. Gibbs states and the set of solutions of random constraint satisfaction problems. Proceedings of the National Academy of Sciences, 104(25):10318–10323, 2007.
  • Krzakala et al. (2016) Krzakala, F., Ricci-Tersenghi, F., Zdeborova, L., Zecchina, R., Tramel, E. W., and Cugliandolo, L. F. Statistical Physics, Optimization, Inference, and Message-Passing Algorithms: Lecture Notes of the Les Houches School of Physics-Special Issue, October 2013. Oxford University Press, 2016.
  • Lu & Li (2017) Lu, Y. M. and Li, G. Phase transitions of spectral initialization for high-dimensional nonconvex estimation. arXiv preprint arXiv:1702.06435, 2017.
  • Luo et al. (2019) Luo, W., Alghamdi, W., and Lu, Y. M. Optimal spectral initialization for signal recovery with applications to phase retrieval. IEEE Transactions on Signal Processing, 2019.
  • Ma et al. (2018) Ma, J., Xu, J., and Maleki, A. Approximate message passing for amplitude based optimization. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 3365–3374, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/ma18e.html.
  • Ma et al. (2019) Ma, J., Xu, J., and Maleki, A. Optimization-based amp for phase retrieval: The impact of initialization and l2-regularization. IEEE Transactions on Information Theory, 2019.
  • Manoel et al. (2017) Manoel, A., Krzakala, F., Mézard, M., and Zdeborová, L. Multi-layer generalized linear estimation. In Information Theory (ISIT), 2017 IEEE International Symposium on, pp. 2098–2102. IEEE, 2017.
  • Mézard (2017) Mézard, M. Mean-field message-passing equations in the hopfield model and its generalizations. Physical Review E, 95(2):022117, 2017.
  • Mezard & Montanari (2009) Mezard, M. and Montanari, A. Information, physics, and computation. Oxford University Press, 2009.
  • Mézard et al. (1987) Mézard, M., Parisi, G., and Virasoro, M. Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987.
  • Mézard et al. (2002) Mézard, M., Parisi, G., and Zecchina, R. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
  • Monasson (1995) Monasson, R. Structural glass transition and the entropy of the metastable states. Physical review letters, 75(15):2847, 1995.
  • Mondelli & Montanari (2018) Mondelli, M. and Montanari, A. Fundamental limits of weak recovery with applications to phase retrieval. Foundations of Computational Mathematics, pp. 1–71, 2018.
  • Mukherjee & Seelamantula (2018) Mukherjee, S. and Seelamantula, C. S. Phase retrieval from binary measurements. In IEEE Signal Processing Letters, volume 25, pp. 348–352. IEEE, 2018.
  • Nishimori (2001) Nishimori, H. Statistical physics of spin glasses and information processing: an introduction, volume 111. Clarendon Press, 2001.
  • Rangan (2010) Rangan, S. Estimation with random linear mixing, belief propagation and compressed sensing. In Information Sciences and Systems (CISS), 2010 44th Annual Conference on, pp. 1–6. IEEE, 2010.
  • Rangan (2011) Rangan, S. Generalized approximate message passing for estimation with random linear mixing. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pp. 2168–2172. IEEE, 2011.
  • Rangan & Fletcher (2012) Rangan, S. and Fletcher, A. K. Iterative estimation of constrained rank-one matrices in noise. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pp. 1246–1250. IEEE, 2012.
  • Ricci-Tersenghi et al. (2019) Ricci-Tersenghi, F., Semerjian, G., and Zdeborová, L. Typology of phase transitions in bayesian inference problems. Physical Review E, 99(4):042109, 2019.
  • Ros et al. (2019) Ros, V., Arous, G. B., Biroli, G., and Cammarota, C. Complex energy landscapes in spiked-tensor and simple glassy models: Ruggedness, arrangements of local minima, and phase transitions. Physical Review X, 9(1):011003, 2019.
  • Schniter & Rangan (2015) Schniter, P. and Rangan, S. Compressive phase retrieval via generalized approximate message passing. IEEE Transactions on Signal Processing, 63(4):1043–1055, 2015.
  • Schniter et al. (2016) Schniter, P., Rangan, S., and Fletcher, A. K. Vector approximate message passing for the generalized linear model. In Signals, Systems and Computers, 2016 50th Asilomar Conference on, pp. 1525–1529. IEEE, 2016.
  • Sun et al. (2018) Sun, J., Qu, Q., and Wright, J. A geometric analysis of phase retrieval. Foundations of Computational Mathematics, 18(5):1131–1198, 2018.

Appendix A: A recap on Generalized Approximate Message Passing

A.1 Derivation of GAMP

For the reader’s convenience, and to familiarize the reader with the notation adopted throughout this work, we sketch the derivation of the Generalized Approximate Message Passing (GAMP) equations for Generalized Linear Estimation (GLE) models. For a longer discussion, we refer the reader to Refs. (Rangan, 2011; Ma et al., 2018; Kabashima et al., 2016). We assume the setting of Eq. (1) of the Main Text, that is, a graphical model defined by the Hamiltonian:

$$\mathcal{H}(\mathbf{x}) = \sum_{\mu=1}^{M} \ell(y_\mu, \mathbf{w}_\mu \cdot \mathbf{x}) + \sum_{i=1}^{N} r(x_i) \qquad (49)$$

with the further assumption that the entries of $W$ are i.i.d. zero-mean Gaussian variables with variance $1/N$, i.e. $W_{\mu i} \sim \mathcal{N}(0, 1/N)$ (but the derivation also applies to non-Gaussian variables with the same mean and variance). The configuration space is assumed to be some subset of $\mathbb{R}^N$. For discrete spaces, integrals should be replaced with summations. Also, we consider the regime of large $N$ and $M$, with finite ratio $\alpha = M/N$. The starting point for the derivation of the GAMP equations is the Belief Propagation (BP) algorithm (Mezard & Montanari, 2009), characterized by the exchange of two sets of messages:

(50)
(51)

For the dense graphical model we are considering, by virtue of central limit arguments, we can relax the resulting identities among probability densities to relations among their first and second moments. The resulting approximated version of BP goes under the name of relaxed Belief Propagation (rBP) (Guo & Wang, 2006; Rangan, 2010; Mézard, 2017).

We define the expectations over the measure in Eq. (50), together with its first and second moments. In high dimensions, by central-limit arguments, the scalar product appearing in Eq. (51) becomes Gaussian distributed.

In order to obtain the relationship between the moments of the two sets of distributions, it is useful to introduce two scalar estimation functions, the input and output channels, that fully characterize the problem. The associated free entropies (Barbier et al., 2018) (i.e., log-normalization factors) can be expressed as:

$$\phi_{\text{in}}(A, B) = \log \int dx \; e^{-\beta r(x) - \frac{A}{2}x^2 + Bx} \qquad (52)$$
$$\phi_{\text{out}}(y, \omega, V) = \log \int \frac{dz}{\sqrt{2\pi V}} \; e^{-\beta \ell(y, z) - \frac{(z-\omega)^2}{2V}} \qquad (53)$$

Then, defining the first and second derivatives of these free entropies, evaluated at the cavity parameters, we can express through them the approximate message passing, obtained at the second order of the Taylor expansion of the messages:

(54)

Next, we close the equations on single-site quantities, discarding terms which are sub-leading for large $N$ and assuming zero-mean, $1/N$-variance i.i.d. entries in $W$. Thus, we can remove the cavities and approximate the parameters of the (non-cavity) estimation channels as follows:

(55)
(56)
(57)
(58)

Finally, the expectations introduced above can be obtained via the derivatives:

(59)
(60)
(61)
(62)

where we used some convenient shorthand notations for the derivatives of the free entropies.
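As a concrete instance of this derivative structure, consider the zero-temperature ℓ2 input channel derived below in Eq. (80); a small sanity check (our code) verifies that the estimator is the $B$-derivative of the free entropy:

```python
import numpy as np

def phi_in(A, B, lam):
    """Zero-temperature input free entropy for the l2 regularizer (7):
    phi_in(A, B) = max_x [ -lam*x^2/2 - A*x^2/2 + B*x ] = B^2 / (2*(A+lam))."""
    return B**2 / (2.0 * (A + lam))

def x_hat(A, B, lam):
    """MAP estimator obtained as the B-derivative of phi_in."""
    return B / (A + lam)

# sanity check: the analytic derivative matches a central finite difference
A, B, lam, h = 1.3, 0.7, 0.5, 1e-6
num = (phi_in(A, B + h, lam) - phi_in(A, B - h, lam)) / (2 * h)
print(np.isclose(num, x_hat(A, B, lam)))  # True
```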

A slight simplification of the message passing (which involves $O(MN)$ operations per iteration) relies on the observation that, due to the statistical properties of $W$, some of the above quantities do not depend on their indices (Rangan, 2011), so we can define their scalar counterparts:

(63)
(64)

Therefore we obtain:

(65)
(66)
(67)
(68)
(69)
(70)
(71)
(72)

Eqs. (65)-(72) are known as the GAMP iterations, and are valid for $t \geq 1$, given some initial conditions at $t = 0$.

A.2 Zero-temperature limit of GAMP

In order to apply the GAMP algorithm to MAP estimation (possibly with a regularizer), we have to consider the zero-temperature limit $\beta \to \infty$. The limiting form of the equations depends on the model and on the regime (e.g. low or high $\alpha$). Here we consider models defined on continuous spaces and in the high-$\alpha$ regime (as appropriate, e.g., for phase retrieval). In this case, while taking the limit, the messages have to be rescaled appropriately in order for them to stay finite. Therefore we rescale the messages through the substitutions:

(73)
(74)
(75)
(76)
(77)

With these rescalings, the GAMP equations (65)-(72) are left unaltered, but the expressions for the free entropies of the scalar channels become

$$\phi_{\text{in}}(A, B) = \max_x \Big[ -r(x) - \frac{A}{2}x^2 + Bx \Big] \qquad (78)$$
$$\phi_{\text{out}}(y, \omega, V) = \max_z \Big[ -\ell(y, z) - \frac{(z-\omega)^2}{2V} \Big] \qquad (79)$$

as is easy to verify.

A.3 GAMP equations for real-valued phase retrieval and AMP.A equations

In the special case of the phase retrieval problem, with the absolute-value loss (6) and ℓ2-norm regularizer (7), at zero temperature, the two scalar estimation channels of Eqs. (78) and (79) become:

$$\phi_{\text{in}}(A, B) = \frac{B^2}{2(A + \lambda)} \qquad (80)$$
$$\phi_{\text{out}}(y, \omega, V) = -\frac{(y - |\omega|)^2}{2(1 + V)} \qquad (81)$$
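A direct transcription of these two channels (our code; included to make the upcoming singularity explicit):

```python
import numpy as np

def phi_in(A, B, lam):
    """Input channel of Eq. (80) (l2 regularizer)."""
    return B**2 / (2.0 * (A + lam))

def phi_out(y, w, V):
    """Output channel of Eq. (81) (absolute-value loss)."""
    return -(y - np.abs(w))**2 / (2.0 * (1.0 + V))

def dphi_out_dw(y, w, V):
    """First omega-derivative of phi_out; note the sign(w) factor, whose
    derivative 2*delta(w) is the singularity discussed below."""
    return np.sign(w) * (y - np.abs(w)) / (1.0 + V)
```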

Thus, Eqs. (66), (70), (71), (72) simply yield:

(82)
(83)
(84)
(85)

Eq. (67) is instead singular, since it involves the derivative of the sign function. Since we have