1 Introduction
Approximate message passing (AMP) algorithms have become a well-established tool in the study of inference problems (Donoho et al., 2009; Donoho & Montanari, 2016; Advani & Ganguli, 2016) that can be represented by dense graphical models. An important feature of AMP is that its dynamical behavior in the large system limit can be exactly predicted through a dynamical system involving only scalar quantities, called State Evolution (SE) (Bayati & Montanari, 2011). This relationship paved the way for a series of rigorous results (Rangan & Fletcher, 2012; Deshpande & Montanari, 2014; Deshpande et al., 2016). It also helps clarify the connection to several fascinating predictions obtained through the replica analysis in statistical physics (Mézard et al., 1987). In the optimal Bayesian setting, where one has perfect information on the process underlying data generation, AMP has been empirically shown to achieve optimal performance among polynomial algorithms for many different problems. However, in the more realistic case of mismatch between the assumed and the true generative model, i.e. when AMP is not derived from the true posterior distribution, it may become suboptimal. A possible source of problems for the AMP class of algorithms is the outbreak of Replica Symmetry Breaking (Mézard et al., 1987), a scenario where an exponential number of fixed points and algorithmic barriers dominate the free energy landscape explored by AMP. This phenomenon can be accentuated in case of model mismatch: a notable example is maximum likelihood estimation (as opposed to estimation by the posterior mean), which corresponds to the low temperature limit of a statistical physics model.
These considerations are well known within the physics community of disordered systems (Krzakala et al., 2016), where the problem of signal estimation is informally referred to as "crystal hunting". Estimation problems in high dimensions are characterized by a complex energy-entropy competition where the true signal is hidden in a vast and potentially rough landscape. In a wide class of problems, one observes the presence of an algorithmically "hard" phase for some range of values of the parameters defining the problem (e.g. the signal-to-noise ratio). In this regime, all known polynomial complexity algorithms fail to saturate the information theoretic bound (Ricci-Tersenghi et al., 2019). While reconstruction is possible in principle, algorithms are trapped in a region of the configuration space with low overlap with the signal and many local minima (Antenucci et al., 2019a; Ros et al., 2019).
In a recent work (Antenucci et al., 2019b), a novel message-passing algorithm, Approximate Survey Propagation (ASP), was introduced in the context of low-rank estimation. The algorithm is based on the 1-step Replica Symmetry Breaking (1RSB) ansatz from spin glass theory (Mézard et al., 1987), which was specifically developed to deal with landscapes populated by exponentially many local minima. It was shown that ASP on the mismatched model could reach the performance of (but not improve on) matched AMP, and do far better than mismatched AMP (Antenucci et al., 2019a, b). In the present paper, we build upon these previous works and derive the ASP algorithm for Generalized Linear Estimation (GLE) models. Since the extension of AMP to GLE problems is commonly known as GAMP, we call our extension of ASP Generalized Approximate Survey Propagation (GASP). We will show that also in this case, in the presence of model mismatch, (G)ASP improves over the corresponding (G)AMP.
2 Model specification
An instance of the general class of models to which GASP can be applied is defined, for some integers $N$ and $M$, by an observed signal $y \in \mathbb{R}^M$ and an observation matrix $F \in \mathbb{R}^{M \times N}$. Clearly, this scenario encompasses also GLE. We denote with $F_\mu \in \mathbb{R}^N$, $\mu = 1, \dots, M$, the rows of $F$ and refer to the ratio $\alpha = M/N$ as the sampling ratio of the problem. We consider a probability density distribution $P(x)$ on a (possibly discrete) space $\mathcal{X}^N$, $x \in \mathcal{X}^N$, defined as:

(1)  $P(x) = \frac{1}{Z} \, e^{-\beta H(x)}$
where, following statistical physics jargon, $\beta$ plays the role of an inverse temperature, $Z$ is a normalization factor called the partition function (both $P$ and $Z$ implicitly depend on $y$ and $F$), and $H(x)$ is the Hamiltonian of the model, that in our setting takes the form:
(2)  $H(x) = \sum_{\mu=1}^{M} g(F_\mu \cdot x, \, y_\mu) + \sum_{i=1}^{N} f(x_i)$
Here $\cdot$ denotes the scalar product, and we call $g$ and $f$ the loss function and the regularizer of the problem respectively.
In this quite general context, the purpose of GASP is to approximately compute the marginal distributions $P(x_i)$, along with some expected quantities such as e.g. the mean $\langle x \rangle$. The approximation entailed in GASP turns out to be exact under some assumptions in the large $N$ limit, as we shall later see. A crucial assumption in the derivation of the GASP algorithm (and of GAMP as well) is that the entries of $F$ are independently generated according to some zero mean and finite variance distribution.
Although the general formulation of GASP, presented in Sec. 2 of the SM, is able to deal with any model of the form (1), we will here restrict the setting to discuss Generalized Linear Estimation (Rangan, 2011).
In GLE problems, the Hamiltonian (2) is sensibly chosen in order to infer a true signal $x^0 \in \mathbb{R}^N$, whose components are assumed to be independently extracted from some prior $P_0$, $x^0_i \sim P_0$. The observations are independently produced by a (probabilistic) scalar channel $P_{\mathrm{out}}$: $y_\mu \sim P_{\mathrm{out}}(\cdot \mid F_\mu \cdot x^0)$.
It is then reasonable to choose $\beta = 1$, $g(z, y) = -\log P_{\mathrm{out}}(y \mid z)$, and $f(x) = -\log P_0(x)$, so that the probability density corresponds to the true posterior, $P(x) \propto P(x \mid y)$, where $\propto$ denotes equality up to a normalization factor. We refer to this setting as the Bayes-optimal or matched setting (Barbier et al., 2018). Notice that in the $\beta \to +\infty$ limit $P(x)$ concentrates around the maximum-a-posteriori (MAP) estimate. If $\beta \neq 1$, or if the Hamiltonian doesn't correspond to the minus log posterior (e.g., when the $P_0$ and $P_{\mathrm{out}}$ used in the Hamiltonian do not correspond to the true ones), we talk about model mismatch.
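As a concrete illustration of the matched choice above, the following sketch (ours, with hypothetical helper names and an assumed Gaussian prior and Gaussian output channel) builds the loss and regularizer as minus log-likelihoods, so that at $\beta = 1$ the Gibbs weight $e^{-H(x)}$ is proportional to the posterior:

```python
import numpy as np

# Matched setting (a sketch; function names are ours, not from the paper):
# choose g(z, y) = -log P_out(y | z) and f(x) = -log P_0(x), so that at
# beta = 1 the Gibbs measure exp(-H(x)) is proportional to P(x | y).

def f_reg(x, sigma0=1.0):
    """Minus log of a Gaussian prior P_0 = N(0, sigma0^2), up to constants."""
    return 0.5 * x**2 / sigma0**2

def g_loss(z, y, delta=0.1):
    """Minus log of a Gaussian channel P_out(y|z) = N(y; z, delta), up to constants."""
    return 0.5 * (y - z) ** 2 / delta

def hamiltonian(x, F, y, delta=0.1, sigma0=1.0):
    """H(x) = sum_mu g(F_mu . x, y_mu) + sum_i f(x_i)."""
    z = F @ x
    return g_loss(z, y, delta).sum() + f_reg(x, sigma0).sum()

rng = np.random.default_rng(0)
N, M = 20, 40
F = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
x0 = rng.normal(size=N)
y = F @ x0 + rng.normal(0.0, np.sqrt(0.1), size=M)

# At beta = 1, H equals the minus log posterior up to an x-independent constant.
H = hamiltonian(x0, F, y)
```

With this construction, comparing $H$ at two configurations directly compares their posterior weights.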
As a testing ground for GASP, and the corresponding State Evolution, we here consider the phase retrieval problem, which has undergone intense investigation in recent years (Candes et al., 2015; Dhifallah & Lu, 2017; Chen et al., 2018; Goldstein & Studer, 2018; Mondelli & Montanari, 2018; Sun et al., 2018; Mukherjee & Seelamantula, 2018). We examine its noiseless and real-valued formulation, where observations are generated according to the process

(3)  $x^0_i \sim \mathcal{N}(0, 1)$
(4)  $F_{\mu i} \sim \mathcal{N}(0, 1/N)$
(5)  $y_\mu = \lvert F_\mu \cdot x^0 \rvert$
for $i = 1, \dots, N$ and $\mu = 1, \dots, M$. For such a generative model, we will focus on the problem of recovering $x^0$ by minimizing the energy function of Eq. (2), in the case

(6)  $g(z, y) = \frac{1}{2} \left( \lvert z \rvert - y \right)^2$
(7)  $f(x) = \frac{\lambda}{2} \, x^2$
Since the setting assumed for inference corresponds to MAP estimation in the presence of a noisy channel, we are dealing with a case of model mismatch. The effect of the regularization parameter $\lambda$ on the estimation shall be explored in Sec. 7, but we assume $\lambda = 0$ until then. The optimization procedure will be performed using the zero-temperature (i.e. $\beta \to +\infty$) version of the GASP algorithm.
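The generative process and the mismatched MAP objective can be sketched as follows (our reading of Eqs. (3)-(7): squared loss on the absolute value plus an $\ell_2$ regularizer; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 50
alpha = 3.0
M = int(alpha * N)

# Generative process: Gaussian signal, Gaussian i.i.d. matrix with variance 1/N,
# noiseless absolute-value channel.
x0 = rng.normal(size=N)
F = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
y = np.abs(F @ x0)

def energy(x, F, y, lam=0.0):
    """Mismatched MAP objective: squared loss on |F x| plus l2 regularization."""
    z = F @ x
    return 0.5 * np.sum((np.abs(z) - y) ** 2) + 0.5 * lam * np.sum(x ** 2)

# With lam = 0, both x0 and -x0 are global minimizers with zero energy,
# reflecting the sign symmetry of the absolute-value channel.
```

The degeneracy between $x^0$ and $-x^0$ is the channel symmetry referred to later when discussing initialization.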
3 Previous work on Approximate Message Passing for Phase Retrieval
Generalized approximate message passing (GAMP) was developed and rigorously analyzed in Refs. (Rangan, 2011) and (Javanmard & Montanari, 2013). It was then applied for the first time to the (complex-valued) phase retrieval problem in Ref. (Schniter & Rangan, 2015). In Ref. (Barbier et al., 2018) the authors report an algorithmic threshold for the perfect recovery of $x^0$ when using matched AMP on the real-valued version of the problem, to be compared to the information theoretic bound.
The performance of GAMP in the MAP estimation setting, instead, was investigated in Refs. (Ma et al., 2018, 2019). A "vanilla" implementation of the zero-temperature GAMP equations for the absolute value channel was reported to achieve perfect recovery for real-valued signals only above a certain sampling ratio. The authors were able to show that the algorithmic threshold of GAMP in the mismatched case can however be drastically lowered by introducing an $\ell_2$ regularization term, ultimately continued to zero. The AMP.A algorithm proposed in (Ma et al., 2018, 2019) uses an adaptive regularization that improves the estimation threshold and also makes the algorithm more numerically robust, compensating for a problematic divergence that appears in the message-passing equations (see Sec. 1.3 in the SM for further details).
Another important ingredient for AMP.A's performance is initialization: in order to achieve perfect recovery one has to start from a configuration that falls within the basin of attraction of the true signal, which rapidly shrinks as the sampling ratio decreases. A well-studied method for obtaining a configuration correlated with the signal is spectral initialization, introduced and studied in Refs. (Jain et al., 2013; Candes et al., 2015; Chen & Candes, 2015): in this case the starting condition is given by the principal eigenvector of a matrix obtained from the data matrix and the labels, passed through a nonlinear processing function. The asymptotic performance of this method was analyzed in (Lu & Li, 2017), while the form of the optimal processing function was described in (Mondelli & Montanari, 2018; Luo et al., 2019). However, since the SE description is based on the assumption of the initial condition being uncorrelated with the data, in AMP.A the authors revisited the method, proposing a modification that guarantees "enough independency" while still providing high overlap between the starting point and the signal. With the combination of these two heuristics, AMP.A is able to reconstruct the signal down to low sampling ratios. In the present paper we will show that, with a basic continuation scheme, the 1RSB version of zero-temperature GAMP can reach the Bayes-optimal threshold also in the mismatched case, without the need of spectral initialization.

3.1 GAMP equations at zero temperature
Here we provide a brief summary of the AMP equations for the general graphical model of Eq. (1), in the $\beta \to +\infty$ limit. This is both to allow an easy comparison with our novel GASP algorithm and to introduce some notation that will be useful in the following discussion. There is some degree of model dependence in the scaling of the messages when taking the zero-temperature limit: here we adopt the one appropriate for over-constrained models in continuous space. Details of the derivation can be found in Sec. 1 of the SM, along with the specialization of the equations for phase retrieval.
First, we introduce two free entropy functions associated to the input and output channels (Rangan, 2011); in the zero-temperature limit they reduce to the extremization problems

(8)  $\phi_{\mathrm{in}}(A, B) = \max_x \left[ B x - \frac{A}{2} x^2 - f(x) \right]$
(9)  $\phi_{\mathrm{out}}(\omega, V; y) = \max_z \left[ -\frac{(z - \omega)^2}{2V} - g(z, y) \right]$
We define for convenience $g_{\mathrm{out}}(\omega, V; y) := \partial_\omega \phi_{\mathrm{out}}(\omega, V; y)$ and $f_{\mathrm{in}}(A, B) := \partial_B \phi_{\mathrm{in}}(A, B)$. In our notation the GAMP message passing equations read:
(10)  $V^t_\mu = \sum_i F_{\mu i}^2 \, \sigma^t_i$
(11)  $\omega^t_\mu = \sum_i F_{\mu i} \, \hat{x}^t_i - V^t_\mu \, g^{t-1}_\mu$
(12)  $g^t_\mu = g_{\mathrm{out}}(\omega^t_\mu, V^t_\mu; y_\mu)$
(13)  $\Gamma^t_\mu = -\partial_\omega g_{\mathrm{out}}(\omega^t_\mu, V^t_\mu; y_\mu)$
(14)  $A^t_i = \sum_\mu F_{\mu i}^2 \, \Gamma^t_\mu$
(15)  $B^t_i = \sum_\mu F_{\mu i} \, g^t_\mu + A^t_i \, \hat{x}^t_i$
(16)  $\hat{x}^{t+1}_i = f_{\mathrm{in}}(A^t_i, B^t_i)$
(17)  $\sigma^{t+1}_i = \partial_B f_{\mathrm{in}}(A^t_i, B^t_i)$
where $t$ denotes the iteration index. It is clear from the equations that the two free entropy functions are supposed to be twice differentiable. This is not the case for phase retrieval, where GAMP encounters some nontrivial numerical stability issues: during the message-passing iterations one would have to approximately evaluate an empirical average of $\partial_\omega g_{\mathrm{out}}$, containing Dirac $\delta$ functions. This is the problem encountered in AMP.A of Ref. (Ma et al., 2018). We will see that this problem is not present in GASP, thanks to a Gaussian smoothing of the denoising function.
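For the phase retrieval loss (6) and the $\ell_2$ regularizer (7), the zero-temperature input and output functions above admit simple closed forms. The sketch below is our own derivation (the helper names are not from the paper) and checks the output proximal map against a brute-force grid minimization:

```python
import numpy as np

# Zero-temperature denoising functions for phase retrieval, assuming
# g(z, y) = (|z| - y)^2 / 2 and f(x) = lam * x^2 / 2 (our closed forms).

def prox_out(y, w, V):
    """Minimizer of (|z|-y)^2/2 + (z-w)^2/(2V): the branch matching sign(w)."""
    s = np.sign(w) if w != 0 else 1.0
    return (w + s * V * y) / (1.0 + V)

def g_out(y, w, V):
    """Output score g_out = (prox - w)/V, the omega-derivative of phi_out."""
    return (prox_out(y, w, V) - w) / V

def input_denoiser(B, A, lam):
    """Maximizer of B*x - A*x^2/2 - lam*x^2/2, i.e. the l2-regularized estimate."""
    return B / (A + lam)

# Sanity check of the proximal map against a brute-force grid search.
y, w, V = 2.0, 1.3, 0.5
zs = np.linspace(-5, 5, 200001)
h = 0.5 * (np.abs(zs) - y) ** 2 + (zs - w) ** 2 / (2 * V)
z_grid = zs[np.argmin(h)]
```

Note that `prox_out` is discontinuous at $\omega = 0$: this is precisely the non-differentiability responsible for the Dirac $\delta$ terms mentioned above.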
4 Generalized Approximate Survey Propagation
The (G)ASP algorithm builds on decades of progress within the statistical physics community in understanding and dealing with rough high-dimensional landscapes. The starting point for the derivation of the algorithm is the partition function of $s$ replicas (or clones) of the system:

(18)  $Z_s = \int \prod_{a=1}^{s} dx^a \; e^{-\beta \sum_{a=1}^{s} H(x^a)} = Z^s$
Note that, while this probability measure factorizes trivially, setting $s \neq 1$ can introduce many important differences with respect to the standard case, both from the algorithmic and from the physics standpoints (Monasson, 1995; Antenucci et al., 2019b).
We write down the Belief Propagation (BP) equations associated to the replicated factor graph, where messages are probability distributions, associated to each edge, over the single-site replicated variables $(x_i^1, \dots, x_i^s)$. We make the assumption that the messages are symmetric under the group of replica index permutations. This allows for a parametrization of the message passing that can be continued analytically to any real value of $s$. The resulting algorithm goes under the name of 1RSB Cavity Method or, more loosely speaking, of Survey Propagation (with reference in particular to a zero-temperature version of the 1RSB cavity method in discrete constraint satisfaction problems), and led to many algorithmic breakthroughs in combinatorial optimization on sparse graphical models (Mézard et al., 2002; Braunstein et al., 2005; Krzakała et al., 2007). One possible derivation of the (G)ASP algorithm is as the dense graph limit of the Survey Propagation equations, in the same way as AMP is obtained starting from BP. The derivation requires two steps. First, BP messages are projected by moment-matching onto (replica-symmetric) multivariate Gaussian distributions on the replicated variables
, which we express in the form

(19)  $\rho(x^1, \dots, x^s) = \mathcal{N}\big(x; \, m \mathbf{1}, \, \Sigma\big), \qquad \Sigma_{ab} = \delta_{ab} \, \Delta_0 + (1 - \delta_{ab}) \, \Delta_1$
Then, messages on the edges are conveniently expressed in terms of single-site quantities. We note that some statistical independence assumptions on the entries of the measurement matrix are crucial for the derivation, as is the case for AMP as well. While the starting point of the derivation assumed integer $s$, the resulting message passing can be analytically continued to any real $s$. Applying this procedure to the GLE graphical model of Eq. (1) we obtain the GASP equations. Here we consider the $\beta \to +\infty$ limit to deal with the MAP estimation problem. Details of the GASP derivation and the finite $\beta$ GASP equations are given in Sec. 2 of the SM. Particular care has to be taken in the limit procedure, as a proper rescaling with $\beta$ is needed for each parameter. For instance, as the range of sensible choices for $s$ shrinks towards zero for increasing $\beta$, we rescale $s$ through the substitution $s \to s/\beta$.
Relying on the definitions given in Eqs. (8) and (9), we introduce the two 1RSB free entropies:

(20)  $\Phi_{\mathrm{out}}(\omega, \Delta_0, \Delta_1; y) = \frac{1}{s} \log \int Dz \; e^{\, s \, \phi_{\mathrm{out}}(\omega + \sqrt{\Delta_1} \, z, \, \Delta_0 - \Delta_1; \, y)}$
(21)  $\Phi_{\mathrm{in}}(A_0, A_1, B) = \frac{1}{s} \log \int Dz \; e^{\, s \, \phi_{\mathrm{in}}(A_0, \, B + \sqrt{A_1} \, z)}$
Here $Dz$ denotes the standard Gaussian measure, $Dz = \frac{dz}{\sqrt{2\pi}} e^{-z^2/2}$. Using shorthand notations for the single-site quantities at iteration $t$ (notice the shift in the time indexes), and using again the definitions $g_{\mathrm{out}} := \partial_\omega \Phi_{\mathrm{out}}$ and $f_{\mathrm{in}} := \partial_B \Phi_{\mathrm{in}}$, now applied to the 1RSB free entropies, the GASP equations read:
(22)  
(23)  
(24)  
(25)  
(26)  
(27)  
(28)  
(29)  
(30)  
(31)  
(32)  
(33) 
The computational time and memory complexity per iteration of the algorithm is the same as GAMP's, and is determined by the linear operations in Eqs. (22) and (28). With respect to GAMP, we have the additional (but subleading) complexity due to the integrals in the input and output channels. In some special cases, the integrals in Eqs. (20) and (21) can be carried out analytically (e.g. in the phase retrieval problem).
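When the 1RSB integrals cannot be carried out analytically, a one-dimensional Gaussian quadrature suffices. The sketch below (ours; the precise arguments of the RS free entropy in GASP are given in the SM, so the stand-in `phi_example` is only illustrative) evaluates a 1RSB free entropy of the form $\Phi = \frac{1}{s} \log \mathbb{E}_z \, e^{s \phi(\omega + \sqrt{\Delta_1} z)}$ by Gauss-Hermite quadrature:

```python
import numpy as np

# Generic numerical evaluation of a 1RSB free entropy
#   Phi(w) = (1/s) * log E_z exp(s * phi(w + sqrt(d1) * z)),  z ~ N(0, 1),
# via Gauss-Hermite quadrature. Note how the Gaussian integral smooths any
# kink of phi, which is the mechanism behind GASP's numerical stability.

def phi_example(w):
    # Stand-in RS free entropy with a kink at w = 0 (illustrative only).
    return -0.5 * (np.abs(w) - 1.0) ** 2

def phi_1rsb(phi, w, d1, s, n=80):
    x, wts = np.polynomial.hermite.hermgauss(n)  # physicists' Hermite nodes
    z = np.sqrt(2.0) * x                         # change of variables for N(0,1)
    vals = s * phi(w + np.sqrt(d1) * z)
    vmax = vals.max()                            # log-sum-exp for stability
    return (vmax + np.log(np.sum(wts * np.exp(vals - vmax)) / np.sqrt(np.pi))) / s
```

In the limit $\Delta_1 \to 0$ the smoothing disappears and $\Phi$ reduces to $\phi$, while for $s \to 0$ it reduces to the annealed average $\mathbb{E}_z \, \phi$.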
Notice that the GASP iteration reduces to standard GAMP iterations if the additional 1RSB variance parameters are initialized (or shrink) to zero, but it can produce nontrivial fixed points depending on the initialization condition and on the value of $s$.
We remark the importance of setting the time indices correctly in order to allow convergence (Caltagirone et al., 2014). The full algorithm is detailed in Alg. 1.
The expressions for the special case of the absolute value channel (6) and $\ell_2$ regularization (7) can be found in Sec. 2.4 of the SM. An important comment is that the divergence issue arising in AMP.A in the same setting does not affect GASP: the discontinuity in the expression for the minimizer of Eq. (9) is smoothed out in the 1RSB version by the Gaussian integral in Eq. (20). We also note that, in phase retrieval, a problematic initialization can be obtained by choosing configurations that are exactly orthogonal to the signal, since the message passing will always be trapped in the uninformative fixed point (due to the symmetry of the problem). However, for finite size instances, a random Gaussian initial condition will have an overlap of order $1/\sqrt{N}$ with the signal, which makes it possible to escape the uninformative fixed point whenever it is unstable (i.e. for high $\alpha$).
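The $1/\sqrt{N}$ scaling of the initial overlap can be verified directly; the following quick numerical illustration (ours) estimates the typical overlap of a random Gaussian initialization at two system sizes:

```python
import numpy as np

# A random Gaussian initialization has overlap of order 1/sqrt(N) with the
# signal: small but nonzero, which is enough to escape an unstable
# uninformative fixed point at finite size.

rng = np.random.default_rng(42)

def typical_overlap(N, trials=200):
    """Average |x0 . x_init| / N over random signal/initialization pairs."""
    q = []
    for _ in range(trials):
        x0 = rng.normal(size=N)
        x_init = rng.normal(size=N)
        q.append(abs(np.dot(x0, x_init)) / N)
    return np.mean(q)

q_small, q_large = typical_overlap(100), typical_overlap(10000)
# Increasing N by a factor 100 shrinks the overlap by roughly a factor 10,
# consistent with the 1/sqrt(N) scaling.
```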
In Fig. 1 (Top and Middle), we show the probability of perfect recovery and the convergence times of GASP for the real-valued phase retrieval problem, for different sampling ratios and values of the symmetry-breaking parameter $s$. The initial condition is a random Gaussian configuration. Notice that standard Gaussian initialization is able to break the symmetry of the channel and, at large $\alpha$, GASP matches the fixed points predicted by SE (see next section) starting from a small initial overlap with the true signal. In order to achieve signal recovery at low $\alpha$, the symmetry-breaking parameter has to be increased. For appropriate values of $s$, we report an algorithmic threshold comparable to the one of AMP.A, without exploiting the adaptive regularization and spectral initialization of AMP.A (which could be employed also for GASP).
We report that, at fixed $\alpha$, when $s$ is increased above a certain value the message passing stops converging. The oscillating/diverging behavior of the messages can however be exploited for hand-tuning $s$, in the absence of a replica analysis to support the selection of its most appropriate value. More details can be found in Sec. 3 of the SM.
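One simple way to exploit this behavior is a bisection on $s$; the sketch below (ours) assumes, as a simplification, that convergence is monotone in $s$, and `run_converges` is a hypothetical stand-in for running GASP at a given $s$ and reporting whether the messages converged:

```python
# Hand-tuning of the symmetry-breaking parameter s, exploiting the fact that
# the message passing stops converging when s exceeds a problem-dependent
# value. `run_converges(s)` is a hypothetical stand-in: it should run GASP
# at the given s and return True if the messages converged. We assume here
# that convergence is monotone in s, which need not hold in general.

def tune_s(run_converges, s_lo=0.0, s_hi=10.0, tol=1e-2):
    """Bisection for the largest s in [s_lo, s_hi] at which GASP still converges."""
    if not run_converges(s_lo):
        raise ValueError("message passing should converge at s_lo")
    while s_hi - s_lo > tol:
        mid = 0.5 * (s_lo + s_hi)
        if run_converges(mid):
            s_lo = mid
        else:
            s_hi = mid
    return s_lo
```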
We presented here the zero-temperature limit of the GASP message passing, used to solve the MAP problem. Refer to Sec. 2 of the SM for a more general formulation dealing with the class of graphical models of the form of Eq. (1).
5 State Evolution for GASP
State Evolution (SE) is a set of iterative equations involving a few scalar quantities that were rigorously proved to track the (G)AMP dynamics, in the sense of almost sure convergence of empirical averages (Javanmard & Montanari, 2013), in the large $N$ limit and with fixed sampling ratio $\alpha$. Following the analysis of Ref. (Rangan, 2011) for GAMP, in order to present the SE equations for GASP we assume that the observation model is such that $y_\mu$ can be expressed in the form $y_\mu = \varphi(F_\mu \cdot x^0, \varepsilon_\mu)$ for some function $\varphi$, with $\varepsilon_\mu$ a scalar or vector-valued random variable modeling the noise, sampled according to some distribution $P_\varepsilon$. We also set $x^0_i \sim P_0$ i.i.d.. The recursion is a closed set of equations over a few scalar order parameters. Initializing these variables at time $t = 0$, the SE equations for $t \geq 0$ read:
(34)  
(35)  
(36)  
(37) 
where the expectation is taken over the random variables entering the observation model. Also, we have a second set of equations that read:
(38)  
(39)  
(40)  
(41) 
where the expectation is over the Markov chain defined by the recursion above. The trajectories of the single-site quantities in GASP concentrate, for large $N$, on their expected values given by the SE dynamics. In order to frame the GASP State Evolution in the rigorous setting of Ref. (Javanmard & Montanari, 2013), we define a slightly different message passing, replacing some quantities computed on a given realization of the problem with the corresponding sample-independent SE values, and replacing certain sample averages with their expected values. Let us define the denoising functions:
(42)  
(43) 
and their vectorized (componentwise) extensions. The modified GASP message passing then reads
(44)  
(45) 
where the divergence terms are given by
(46)  
The message passing of Eqs. (44, 45) falls within the class of AMP algorithms analyzed in Ref. (Javanmard & Montanari, 2013) (under some further technical assumptions, see Proposition 5 there). Therefore, it can be rigorously tracked by the SE Eqs. (34-41) in the sense specified in that work. In particular, denoting by $\hat{x}^t$ the estimate at iteration $t$, we have almost sure convergence in the large system limit of the overlap with the true signal and of the norm of $\hat{x}^t$ to their SE estimates:
(47)  $\lim_{N \to \infty} \frac{1}{N} \, x^0 \cdot \hat{x}^t = m^t$ (a.s.)
(48)  $\lim_{N \to \infty} \frac{1}{N} \, \lVert \hat{x}^t \rVert^2 = q^t$ (a.s.)

where $m^t$ and $q^t$ denote the corresponding SE order parameters.
In Fig. 1 (Bottom), we compare the SE dynamics to the original GASP one (Alg. 1). We compare the SE prediction for the evolution of the overlap to that observed in sample trajectories of GASP at finite size, for a fixed sampling ratio and different values of $s$. The initial estimate in GASP was set to be a mixture of the true signal and Gaussian noise, and SE was initialized accordingly. As expected, we observe a good agreement between the two dynamics.
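To make the mechanics of an SE recursion concrete, here is a minimal, self-contained example; note that this is the textbook scalar recursion for Bayes-optimal AMP on a Gaussian linear model, NOT the GASP State Evolution, which involves more order parameters:

```python
# Minimal State Evolution example (illustrative only; not the GASP SE):
# the textbook recursion for Bayes-optimal AMP on the Gaussian linear model
# y = F x + noise, with unit Gaussian prior, noise variance sigma2, and
# sampling ratio alpha. The effective noise variance tau2 is iterated to a
# fixed point, exactly as the (richer) GASP SE recursion is iterated.

def se_step(tau2, alpha, sigma2):
    mmse = tau2 / (1.0 + tau2)   # scalar MMSE for a unit Gaussian prior
    return sigma2 + mmse / alpha

def run_se(alpha, sigma2, tau2=1e3, iters=200):
    for _ in range(iters):
        tau2 = se_step(tau2, alpha, sigma2)
    return tau2

tau2_star = run_se(alpha=2.0, sigma2=0.1)
```

The quantity tracked at the fixed point (here an effective noise variance) plays the same role as the overlap $m^t$ tracked for GASP above.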
6 Effective Landscape and Message-Passing Algorithms
The posterior distribution of statistical models in the hard phase is known to be riddled with glassy states (Antenucci et al., 2019a) preventing the retrieval of the true signal, a situation which is exacerbated in the low temperature limit corresponding to MAP estimation.
Within the replica formalism, the 1RSB free energy provides a description of this complex landscape. The Parisi parameter $s$ allows one to select the contributions of different families of states. More specifically, $s$ acts as an inverse temperature coupled to the internal free energy of the states: increasing $s$ selects families of states with lower complexity (i.e., states that are less numerous) and lower free energy.
The fixed points of the State Evolution of GASP are in one-to-one correspondence with the stationary points of the 1RSB free energy, and while the role of $s$ in the dynamics of SE is difficult to analyze, some insights can be gained from the static description given by the free energy.
For phase retrieval in the MAP setting without regularization, a stable fixed point of GAMP can be found in the space orthogonal to the signal (i.e. at zero overlap) for all values of the sampling ratio below a certain threshold (Ma et al., 2018), which is the algorithmic threshold for GAMP. For GASP instead, it is possible to see that the uninformative fixed point is stable only below a smaller sampling ratio, a noticeable improvement of the threshold with respect to GAMP. This is obtained by choosing the $s$ corresponding to the lowest complexity states according to the 1RSB free energy (see Sec. 3 of the SM for further details). As we will see in the following, both these thresholds can be lowered by employing a continuation strategy for the regularizer.
A thorough description of the results of the replica analysis and of the landscape properties for GLE models will be presented in a more technical future work.
7 MAP estimation with an $\ell_2$ regularizer
The objective function introduced in Eq. (2) contains a regularization term weighted by an intensity parameter $\lambda$.
Regularization plays an important role in reducing the variance of the inferred estimator, and can be crucial when the observations are noise-affected, since it lowers the sensitivity of the learned model to deviations in the training set. However, as observed in (Ma et al., 2018, 2019; Balan, 2016), regularization is also useful for its smoothing effect, and can be exploited in nonconvex optimization problems even in the noiseless setting. When the regularization term is turned up, the optimization landscape gradually simplifies and it becomes easier to reach a global optimizer. However, the problem of getting stuck in bad local minima is avoided at the cost of introducing a bias. The continuation strategy is based on the fact that such a biased estimator might be closer than the random initial configuration to the global optimizer of the unregularized objective: in a multi-stage approach, the regularization is decreased (down to zero) after each warm restart.
Among the many possible continuation schedules for $\lambda$ (a little decrease after each minimization or, as in AMP.A, at the end of each iteration), in this paper we choose a simple two-stage approach: first we run GASP till convergence with a given value of $\lambda$, then we set $\lambda = 0$ in the successive iterations.
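The two-stage approach can be written schematically as follows (our sketch; `solve` is a hypothetical stand-in for running GASP to convergence at a given regularization intensity, and the toy gradient-descent check below is only an illustration of the warm-restart mechanism, not the actual algorithm):

```python
# Two-stage continuation strategy (schematic). `solve(x_init, lam)` is a
# hypothetical stand-in for running GASP to convergence at regularization
# intensity lam, returning the estimate.

def two_stage(solve, x_init, lam):
    x_stage1 = solve(x_init, lam)    # stage 1: regularized problem
    x_stage2 = solve(x_stage1, 0.0)  # stage 2: warm restart, lam = 0
    return x_stage2

# Toy illustration on the 1-d nonconvex energy e(x) = (|x| - 1)^2 / 2
# + lam * x^2 / 2, minimized by gradient descent: the regularized stage
# pulls the iterate toward the origin, and the unregularized stage then
# settles in the nearest well at x = +1 or x = -1.
def make_gd_solver(lr=0.1, steps=2000):
    def solve(x, lam):
        for _ in range(steps):
            grad = (abs(x) - 1.0) * (1.0 if x >= 0 else -1.0) + lam * x
            x = x - lr * grad
        return x
    return solve
```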
Figure 2. Phase diagrams corresponding to the SE asymptotic analysis of GAMP (top) and GASP (bottom). The color maps indicate the overlap reached at convergence in the presence of an $\ell_2$ regularizer of intensity $\lambda$.

In Fig. 2, we compare the asymptotic performance (tracked by SE) of GAMP and GASP for the phase retrieval problem with an $\ell_2$ regularization. The color map indicates the overlap with the signal reached at the end of the first stage of our continuation strategy (with $\lambda > 0$), while the black curves delimit the perfect retrieval regions, where the overlap reached at the end of stage two (with $\lambda = 0$) is $1$.
In both cases we fix the initial variances, and consider an initial condition with a small positive overlap with the signal. An assumption of this kind is indeed needed to ensure that we avoid lingering on the fixed point at zero overlap; however, the specific value of the initial overlap can be chosen arbitrarily (e.g., it could be taken much smaller without affecting the phase diagrams). Even in real-world applications, it is often the case that the non-orthogonality requirement is easily met; for example, in many imaging applications the signal is known to be real and non-negative. As explained in the previous section, we also set the initialization of the self-overlap parameter accordingly.
In the GASP phase diagram, for each $\alpha$ and $\lambda$, the value of $s$ was set to the thermodynamic optimum, and was kept fixed throughout the two stages of our continuation strategy. This value can be obtained by optimizing the 1RSB free energy over the symmetry-breaking parameter; the numerical values of $s$ corresponding to the points in the plot can be found in Sec. 3 of the SM, in Fig. 1. It is not strictly necessary to fix $s$ to this specific value, as any value in a broad range around the optimum will still be effective (see for example Fig. 2 in the SM). As expected from the numerical experiments, we can see from Fig. 2 that when the regularizer becomes too small an uninformative fixed point (at zero overlap) becomes attractive for the dynamics of GASP, and signal recovery becomes impossible below a certain sampling ratio (we expect also the recovery region at larger $\alpha$ to shrink and close when the regularizer is further decreased).
It is clear that the introduction of an $\ell_2$ norm is crucial for reducing the algorithmic gap of both GAMP and GASP with respect to the information theoretic threshold, as previously observed in (Ma et al., 2018, 2019). In this work we find that also in GLE problems, when the mismatched setting is considered (and inference happens off the Nishimori line (Nishimori, 2001; Antenucci et al., 2019b)), the more fitting geometrical picture provided by the 1RSB ansatz can be exploited algorithmically: with a simple continuation strategy it is possible to lower the algorithmic threshold of GASP down to the Bayes-optimal value.
8 Discussion
We presented Generalized Approximate Survey Propagation, a novel algorithm designed to improve over AMP in the context of GLE inference problems, when faced with a mismatch between the assumed and true generative model. The algorithm, parametrized by the symmetry-breaking parameter $s$, allows one to go beyond some symmetry assumptions at the heart of the previous algorithms, and proves to be more suited for the MAP estimation task considered in this work.
In the prototypical case of real-valued phase retrieval, we have shown that with little tuning of $s$ it is possible to modify the effective landscape explored during the message-passing dynamics and avoid getting stuck in otherwise attractive uninformative fixed points. Furthermore, we have seen that, even in the noiseless case, a simple continuation strategy, based on the introduction of an $\ell_2$ regularizer, can guide GASP close enough to the signal and allow its recovery, extending the region of parameters where GASP is more effective than GAMP. In some cases we observed that GASP can achieve perfect retrieval down to the Bayes-optimal threshold. We also derived the 1RSB State Evolution equations, and showed that they can be used as a simple tool for tracking the asymptotic behavior of GASP.
We postpone a comprehensive analysis of the landscape associated to GLE models to a more technical publication, where we will also deal with the case of noisy observation channels. A straightforward follow-up of the present work could focus on the search for an adaptation scheme for the regularizer, possibly extending the work of Refs. (Ma et al., 2018, 2019), and, more importantly, for a criterion to identify the best setting for the symmetry-breaking parameter. Another possible line of future work could go in the direction of relaxing some of the assumptions on the observation matrix made in deriving the GASP algorithm. This could motivate the derivation of a 1RSB version of the Vector Approximate Message Passing equations (Schniter et al., 2016). Also, the extension of GASP to deep nonlinear inference models, along the lines of Refs. (Manoel et al., 2017; Fletcher et al., 2018), seems promising and technically feasible.
CL thanks Junjie Ma for sharing and explaining the code of their AMP.A algorithm.
References
 Advani & Ganguli (2016) Advani, M. and Ganguli, S. Statistical mechanics of optimal convex inference in high dimensions. Physical Review X, 6(3):031034, 2016.
 Antenucci et al. (2019a) Antenucci, F., Franz, S., Urbani, P., and Zdeborová, L. Glassy nature of the hard phase in inference problems. Physical Review X, 9(1):011020, 2019a.
 Antenucci et al. (2019b) Antenucci, F., Krzakala, F., Urbani, P., and Zdeborová, L. Approximate survey propagation for statistical inference. Journal of Statistical Mechanics: Theory and Experiment, 2019(2):023401, 2019b.
 Balan (2016) Balan, R. Reconstruction of signals from magnitudes of redundant representations: The complex case. Foundations of Computational Mathematics, 16(3):677–721, 2016.
 Barbier et al. (2018) Barbier, J., Krzakala, F., Macris, N., Miolane, L., and Zdeborová, L. Optimal errors and phase transitions in high-dimensional generalized linear models. In Conference On Learning Theory, pp. 728–731, 2018.
 Bayati & Montanari (2011) Bayati, M. and Montanari, A. The dynamics of message passing on dense graphs, with applications to compressed sensing. IEEE Transactions on Information Theory, 57(2):764–785, 2011.
 Braunstein et al. (2005) Braunstein, A., Mézard, M., and Zecchina, R. Survey propagation: An algorithm for satisfiability. Random Structures & Algorithms, 27(2):201–226, 2005.
 Caltagirone et al. (2014) Caltagirone, F., Zdeborová, L., and Krzakala, F. On convergence of approximate message passing. In Information Theory (ISIT), 2014 IEEE International Symposium on, pp. 1812–1816. IEEE, 2014.
 Candes et al. (2015) Candes, E. J., Li, X., and Soltanolkotabi, M. Phase retrieval via wirtinger flow: Theory and algorithms. IEEE Transactions on Information Theory, 61(4):1985–2007, 2015.
 Charbonneau et al. (2017) Charbonneau, P., Kurchan, J., Parisi, G., Urbani, P., and Zamponi, F. Glass and jamming transitions: From exact results to finitedimensional descriptions. Annual Review of Condensed Matter Physics, 8:265–288, 2017.
 Chen & Candes (2015) Chen, Y. and Candes, E. Solving random quadratic systems of equations is nearly as easy as solving linear systems. In Advances in Neural Information Processing Systems, pp. 739–747, 2015.
 Chen et al. (2018) Chen, Y., Chi, Y., Fan, J., and Ma, C. Gradient descent with random initialization: Fast global convergence for nonconvex phase retrieval. Mathematical Programming, pp. 1–33, 2018.
 Deshpande & Montanari (2014) Deshpande, Y. and Montanari, A. Information-theoretically optimal sparse PCA. In 2014 IEEE International Symposium on Information Theory, pp. 2197–2201. IEEE, 2014.
 Deshpande et al. (2016) Deshpande, Y., Abbe, E., and Montanari, A. Asymptotic mutual information for the binary stochastic block model. In Information Theory (ISIT), 2016 IEEE International Symposium on, pp. 185–189. IEEE, 2016.
 Dhifallah & Lu (2017) Dhifallah, O. and Lu, Y. M. Fundamental limits of phasemax for phase retrieval: A replica analysis. In Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP), 2017 IEEE 7th International Workshop on, pp. 1–5. IEEE, 2017.
 Donoho & Montanari (2016) Donoho, D. and Montanari, A. High dimensional robust M-estimation: Asymptotic variance via approximate message passing. Probability Theory and Related Fields, 166(3-4):935–969, 2016.
 Donoho et al. (2009) Donoho, D. L., Maleki, A., and Montanari, A. Message-passing algorithms for compressed sensing. Proceedings of the National Academy of Sciences, 106(45):18914–18919, 2009.
 Fletcher et al. (2018) Fletcher, A. K., Rangan, S., and Schniter, P. Inference in deep networks in high dimensions. In 2018 IEEE International Symposium on Information Theory (ISIT), pp. 1884–1888. IEEE, 2018.
 Goldstein & Studer (2018) Goldstein, T. and Studer, C. Phasemax: Convex phase retrieval via basis pursuit. IEEE Transactions on Information Theory, 2018.
 Guo & Wang (2006) Guo, D. and Wang, C.C. Asymptotic meansquare optimality of belief propagation for sparse linear systems. In Information Theory Workshop, 2006. ITW’06 Chengdu. IEEE, pp. 194–198. IEEE, 2006.

 Jain et al. (2013) Jain, P., Netrapalli, P., and Sanghavi, S. Low-rank matrix completion using alternating minimization. In Proceedings of the forty-fifth annual ACM symposium on Theory of computing, pp. 665–674. ACM, 2013.
 Javanmard & Montanari (2013) Javanmard, A. and Montanari, A. State evolution for general approximate message passing algorithms, with applications to spatial coupling. Information and Inference: A Journal of the IMA, 2(2):115–144, 2013.
 Kabashima et al. (2016) Kabashima, Y., Krzakala, F., Mézard, M., Sakata, A., and Zdeborová, L. Phase transitions and sample complexity in Bayes-optimal matrix factorization. IEEE Transactions on Information Theory, 62(7):4228–4265, 2016.
 Krzakała et al. (2007) Krzakała, F., Montanari, A., Ricci-Tersenghi, F., Semerjian, G., and Zdeborová, L. Gibbs states and the set of solutions of random constraint satisfaction problems. Proceedings of the National Academy of Sciences, 104(25):10318–10323, 2007.
 Krzakala et al. (2016) Krzakala, F., Ricci-Tersenghi, F., Zdeborova, L., Zecchina, R., Tramel, E. W., and Cugliandolo, L. F. Statistical Physics, Optimization, Inference, and Message-Passing Algorithms: Lecture Notes of the Les Houches School of Physics, Special Issue, October 2013. Oxford University Press, 2016.
 Lu & Li (2017) Lu, Y. M. and Li, G. Phase transitions of spectral initialization for high-dimensional nonconvex estimation. arXiv preprint arXiv:1702.06435, 2017.
 Luo et al. (2019) Luo, W., Alghamdi, W., and Lu, Y. M. Optimal spectral initialization for signal recovery with applications to phase retrieval. IEEE Transactions on Signal Processing, 2019.
 Ma et al. (2018) Ma, J., Xu, J., and Maleki, A. Approximate message passing for amplitude based optimization. In Dy, J. and Krause, A. (eds.), Proceedings of the 35th International Conference on Machine Learning, volume 80 of Proceedings of Machine Learning Research, pp. 3365–3374, Stockholmsmässan, Stockholm Sweden, 10–15 Jul 2018. PMLR. URL http://proceedings.mlr.press/v80/ma18e.html.
 Ma et al. (2019) Ma, J., Xu, J., and Maleki, A. Optimization-based AMP for phase retrieval: The impact of initialization and ℓ2-regularization. IEEE Transactions on Information Theory, 2019.
 Manoel et al. (2017) Manoel, A., Krzakala, F., Mézard, M., and Zdeborová, L. Multilayer generalized linear estimation. In Information Theory (ISIT), 2017 IEEE International Symposium on, pp. 2098–2102. IEEE, 2017.
 Mézard (2017) Mézard, M. Mean-field message-passing equations in the Hopfield model and its generalizations. Physical Review E, 95(2):022117, 2017.
 Mezard & Montanari (2009) Mezard, M. and Montanari, A. Information, physics, and computation. Oxford University Press, 2009.
 Mézard et al. (1987) Mézard, M., Parisi, G., and Virasoro, M. Spin glass theory and beyond: An Introduction to the Replica Method and Its Applications, volume 9. World Scientific Publishing Company, 1987.
 Mézard et al. (2002) Mézard, M., Parisi, G., and Zecchina, R. Analytic and algorithmic solution of random satisfiability problems. Science, 297(5582):812–815, 2002.
 Monasson (1995) Monasson, R. Structural glass transition and the entropy of the metastable states. Physical review letters, 75(15):2847, 1995.
 Mondelli & Montanari (2018) Mondelli, M. and Montanari, A. Fundamental limits of weak recovery with applications to phase retrieval. Foundations of Computational Mathematics, pp. 1–71, 2018.
 Mukherjee & Seelamantula (2018) Mukherjee, S. and Seelamantula, C. S. Phase retrieval from binary measurements. In IEEE Signal Processing Letters, volume 25, pp. 348–352. IEEE, 2018.
 Nishimori (2001) Nishimori, H. Statistical physics of spin glasses and information processing: an introduction, volume 111. Clarendon Press, 2001.
 Rangan (2010) Rangan, S. Estimation with random linear mixing, belief propagation and compressed sensing. In Information Sciences and Systems (CISS), 2010 44th Annual Conference on, pp. 1–6. IEEE, 2010.
 Rangan (2011) Rangan, S. Generalized approximate message passing for estimation with random linear mixing. In Information Theory Proceedings (ISIT), 2011 IEEE International Symposium on, pp. 2168–2172. IEEE, 2011.
 Rangan & Fletcher (2012) Rangan, S. and Fletcher, A. K. Iterative estimation of constrained rankone matrices in noise. In Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on, pp. 1246–1250. IEEE, 2012.

 Ricci-Tersenghi et al. (2019) Ricci-Tersenghi, F., Semerjian, G., and Zdeborová, L. Typology of phase transitions in Bayesian inference problems. Physical Review E, 99(4):042109, 2019.
 Ros et al. (2019) Ros, V., Arous, G. B., Biroli, G., and Cammarota, C. Complex energy landscapes in spiked-tensor and simple glassy models: Ruggedness, arrangements of local minima, and phase transitions. Physical Review X, 9(1):011003, 2019.
 Schniter & Rangan (2015) Schniter, P. and Rangan, S. Compressive phase retrieval via generalized approximate message passing. IEEE Transactions on Signal Processing, 63(4):1043–1055, 2015.
 Schniter et al. (2016) Schniter, P., Rangan, S., and Fletcher, A. K. Vector approximate message passing for the generalized linear model. In Signals, Systems and Computers, 2016 50th Asilomar Conference on, pp. 1525–1529. IEEE, 2016.
 Sun et al. (2018) Sun, J., Qu, Q., and Wright, J. A geometric analysis of phase retrieval. Foundations of Computational Mathematics, 18(5):1131–1198, 2018.
Appendix A: A recap on Generalized Approximate Message Passing
A.1 Derivation of GAMP
For the reader’s convenience, and to familiarize the reader with the notation adopted throughout this work, we sketch the derivation of the Generalized Approximate Message Passing (GAMP) equations for Generalized Linear Estimation (GLE) models. For a longer discussion, we refer the reader to Refs. (Rangan, 2011; Ma et al., 2018; Kabashima et al., 2016). We assume the setting of Eq. (1) of the Main Text, that is, a graphical model defined by the Hamiltonian:
(49) \mathcal{H}(\mathbf{x}) = \sum_{\mu=1}^{M} \ell(y_\mu, \mathbf{w}_\mu^\top \mathbf{x}) + \sum_{i=1}^{N} r(x_i)
with the further assumption that the entries of W are i.i.d. zero-mean Gaussian variables with variance 1/N, i.e. W_{\mu i} \sim \mathcal{N}(0, 1/N) (but the derivation also applies to non-Gaussian variables with the same mean and variance). The configuration space is assumed to be some subset of \mathbb{R}^N. For discrete spaces, integrals should be replaced with summations. Also, we consider the regime of large N and M, with finite ratio \alpha = M/N. The starting point for the derivation of the GAMP equations is the Belief Propagation (BP) algorithm (Mezard & Montanari, 2009), characterized by the exchange of two sets of messages:
(50) \nu^{t+1}_{i\to\mu}(x_i) \propto e^{-\beta r(x_i)} \prod_{\nu\neq\mu} \hat\nu^{t}_{\nu\to i}(x_i)
(51) \hat\nu^{t}_{\mu\to i}(x_i) \propto \int \prod_{j\neq i} \mathrm{d}x_j\, \nu^{t}_{j\to\mu}(x_j)\; e^{-\beta \ell(y_\mu, \mathbf{w}_\mu^\top \mathbf{x})}
For the dense graphical model we are considering, by virtue of central limit arguments, we can relax the resulting identities among probability densities to relations among their first and second moments. The resulting approximated version of BP goes under the name of relaxed Belief Propagation (rBP) (Guo & Wang, 2006; Rangan, 2010; Mézard, 2017).
We denote the expectations over the measure in Eq. (50) by \langle\cdot\rangle_{i\to\mu}, and its moments by \hat{x}^{t}_{i\to\mu} = \langle x_i \rangle_{i\to\mu} and \sigma^{t}_{i\to\mu} = \langle x_i^2 \rangle_{i\to\mu} - (\hat{x}^{t}_{i\to\mu})^2. In high dimensions, the scalar product z_\mu = \mathbf{w}_\mu^\top \mathbf{x} appearing in Eq. (51) becomes Gaussian distributed according to z_\mu \sim \mathcal{N}(\omega^{t}_{\mu\to i}, V^{t}_{\mu\to i}), with \omega^{t}_{\mu\to i} = \sum_{j\neq i} W_{\mu j} \hat{x}^{t}_{j\to\mu} and V^{t}_{\mu\to i} = \sum_{j\neq i} W^2_{\mu j} \sigma^{t}_{j\to\mu}.
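This Gaussian relaxation is easy to verify numerically. The sketch below (our illustration, not part of the derivation; the message moments and sizes are arbitrary choices) draws each x_j from a deliberately non-Gaussian factorized distribution with prescribed moments and checks that the scalar product concentrates around the predicted mean and variance:

```python
import numpy as np

rng = np.random.default_rng(0)
N, samples = 1000, 5000

# One fixed row of W, with i.i.d. N(0, 1/N) entries.
w = rng.normal(0.0, 1.0 / np.sqrt(N), size=N)

# Means and variances of the factorized incoming messages (arbitrary values).
xhat = rng.uniform(-1.0, 1.0, size=N)
sigma = rng.uniform(0.1, 0.5, size=N)

# Sample each x_j from a (deliberately non-Gaussian) two-point distribution
# with the prescribed first two moments, and form the scalar product z = w . x.
x = xhat + np.sqrt(sigma) * rng.choice([-1.0, 1.0], size=(samples, N))
z = x @ w

omega = w @ xhat       # predicted mean of z
V = (w**2) @ sigma     # predicted variance of z
print(z.mean(), omega, z.var(), V)
```

Despite the binary fluctuations of each coordinate, the histogram of z is Gaussian to high accuracy, with the cavity-style mean and variance given above.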
In order to obtain the relationship between the moments of the two sets of distributions, it is useful to introduce two scalar estimation functions, the input and output channels, that fully characterize the problem. The associated free entropies (Barbier et al., 2018) (i.e., log-normalization factors) can be expressed as:
(52) \phi_{\mathrm{in}}(B, A) = \log \int \mathrm{d}x\; e^{-\beta r(x)}\, e^{-\frac{A}{2}x^2 + Bx}
(53) \phi_{\mathrm{out}}(\omega, V; y) = \log \int \frac{\mathrm{d}z}{\sqrt{2\pi V}}\; e^{-\beta \ell(y, z)}\, e^{-\frac{(z-\omega)^2}{2V}}
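Since the input free entropy is just a log-normalization, it can be checked by brute-force quadrature. The sketch below (ours, with an arbitrary quadratic regularizer r(x) = λx²/2 at β = 1, for which the integral is Gaussian and has a closed form) also verifies that the B-derivative of the free entropy yields the channel mean:

```python
import numpy as np

beta, lam = 1.0, 0.7   # inverse temperature and strength of an assumed L2 regularizer
A, B = 1.3, 0.4        # arbitrary input-channel parameters

# phi_in(B, A) = log of the integral of exp(-beta*r(x)) * exp(-A x^2/2 + B x),
# here with r(x) = lam*x^2/2, evaluated by quadrature on a wide grid.
dx = 1e-3
xs = np.arange(-12.0, 12.0, dx)
integrand = np.exp(-beta * lam * xs**2 / 2 - A * xs**2 / 2 + B * xs)
phi_num = np.log(integrand.sum() * dx)

# Closed form of the Gaussian integral for this choice of r(x).
phi_exact = 0.5 * np.log(2 * np.pi / (A + beta * lam)) + B**2 / (2 * (A + beta * lam))

# The channel mean d(phi_in)/dB, by finite differences, against the exact B/(A + beta*lam).
eps = 1e-5
phi_plus = np.log((integrand * np.exp(eps * xs)).sum() * dx)
f_num = (phi_plus - phi_num) / eps
print(phi_num, phi_exact, f_num, B / (A + beta * lam))
```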
Then, defining f_{\mathrm{in}} = \partial_B \phi_{\mathrm{in}} and g_{\mathrm{out}} = \partial_\omega \phi_{\mathrm{out}}, evaluated respectively in (B^{t}_{i\to\mu}, A^{t}_{i\to\mu}) and (\omega^{t}_{\mu\to i}, V^{t}_{\mu\to i}), we can express through them the approximate message-passing, obtained at the second order of the Taylor expansion of the messages:
(54) B^{t}_{i\to\mu} = \sum_{\nu\neq\mu} W_{\nu i}\, g_{\mathrm{out}}(\omega^{t}_{\nu\to i}, V^{t}_{\nu\to i}; y_\nu), \quad A^{t}_{i\to\mu} = -\sum_{\nu\neq\mu} W^2_{\nu i}\, \partial_\omega g_{\mathrm{out}}(\omega^{t}_{\nu\to i}, V^{t}_{\nu\to i}; y_\nu), \quad \hat{x}^{t+1}_{i\to\mu} = f_{\mathrm{in}}(B^{t}_{i\to\mu}, A^{t}_{i\to\mu}), \quad \sigma^{t+1}_{i\to\mu} = \partial_B f_{\mathrm{in}}(B^{t}_{i\to\mu}, A^{t}_{i\to\mu})
Next, we close the equations on single-site quantities, discarding terms which are subleading for large N and assuming zero-mean, 1/N-variance i.i.d. entries in W. Thus, we can remove the cavities and approximate the parameters of the (non-cavity) estimation channels as follows:
(55) \omega^{t}_{\mu} = \sum_{i} W_{\mu i}\, \hat{x}^{t}_{i} - V^{t}_{\mu}\, g^{t-1}_{\mu}
(56) V^{t}_{\mu} = \sum_{i} W^2_{\mu i}\, \sigma^{t}_{i}
(57) B^{t}_{i} = \sum_{\mu} W_{\mu i}\, g^{t}_{\mu} + A^{t}_{i}\, \hat{x}^{t}_{i}
(58) A^{t}_{i} = -\sum_{\mu} W^2_{\mu i}\, \partial_\omega g^{t}_{\mu}
Finally, the expectations introduced above can be obtained via the derivatives:
(59) \hat{x}^{t+1}_{i} = f_{\mathrm{in}}(B^{t}_{i}, A^{t}_{i}) = \partial_B \phi_{\mathrm{in}}(B^{t}_{i}, A^{t}_{i})
(60) \sigma^{t+1}_{i} = \partial_B f_{\mathrm{in}}(B^{t}_{i}, A^{t}_{i}) = \partial^2_B \phi_{\mathrm{in}}(B^{t}_{i}, A^{t}_{i})
(61) g^{t}_{\mu} = \partial_\omega \phi_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}_{\mu}; y_\mu)
(62) \partial_\omega g^{t}_{\mu} = \partial^2_\omega \phi_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}_{\mu}; y_\mu)
where we used the shorthand notation g^{t}_{\mu} \equiv g_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}_{\mu}; y_\mu) and \partial_\omega g^{t}_{\mu} \equiv \partial_\omega g_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}_{\mu}; y_\mu).
A slight simplification of the message passing (which involves O(MN) operations per iteration) relies on the observation that, due to the statistical properties of W, the quantities V^{t}_{\mu} and A^{t}_{i} do not depend on their indexes (Rangan, 2011), so we can define their scalar counterparts:
(63) V^{t} = \frac{1}{N} \sum_{i} \sigma^{t}_{i}
(64) A^{t} = -\frac{1}{N} \sum_{\mu} \partial_\omega g^{t}_{\mu}
where we used W^2_{\mu i} \approx 1/N. Therefore we obtain:
(65) \omega^{t}_{\mu} = \sum_{i} W_{\mu i}\, \hat{x}^{t}_{i} - V^{t}\, g^{t-1}_{\mu}
(66) g^{t}_{\mu} = \partial_\omega \phi_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}; y_\mu)
(67) \partial_\omega g^{t}_{\mu} = \partial^2_\omega \phi_{\mathrm{out}}(\omega^{t}_{\mu}, V^{t}; y_\mu)
(68) A^{t} = -\frac{1}{N} \sum_{\mu} \partial_\omega g^{t}_{\mu}
(69) B^{t}_{i} = \sum_{\mu} W_{\mu i}\, g^{t}_{\mu} + A^{t}\, \hat{x}^{t}_{i}
(70) \hat{x}^{t+1}_{i} = f_{\mathrm{in}}(B^{t}_{i}, A^{t})
(71) \sigma^{t+1}_{i} = \partial_B f_{\mathrm{in}}(B^{t}_{i}, A^{t})
(72) V^{t+1} = \frac{1}{N} \sum_{i} \sigma^{t+1}_{i}
Eqs. (65-72) are known as the GAMP iterations, and are valid for t \geq 0, given some initial condition \hat{x}^{0}_{i} and \sigma^{0}_{i} (from which V^{0} follows), along with g^{-1}_{\mu} = 0.
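To make the iteration concrete, here is a minimal sketch (ours, not from the paper) of the GAMP loop for the simplest instantiation: a Gaussian likelihood ℓ(y, z) = (y - z)²/2 and a Gaussian prior r(x) = x²/2 at β = 1, for which both channel functions are linear and GAMP reduces to AMP for ridge-like linear estimation:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
alpha = 2.0
M = int(alpha * N)

# Synthetic GLE instance: y = W x* + noise, with W entries ~ N(0, 1/N).
x_star = rng.normal(size=N)
W = rng.normal(0.0, 1.0 / np.sqrt(N), size=(M, N))
y = W @ x_star + 0.1 * rng.normal(size=M)

# Scalar channels for l(y,z) = (y-z)^2/2 and r(x) = x^2/2 at beta = 1:
# the free entropies are Gaussian integrals, so f_in and g_out are linear.
def g_out(omega, V):
    return (y - omega) / (1.0 + V)

def dg_out(omega, V):
    return -np.ones_like(omega) / (1.0 + V)

def f_in(B, A):
    return B / (A + 1.0)

def df_in(B, A):
    return np.ones_like(B) / (A + 1.0)

# GAMP loop: initialize xhat^0, sigma^0 (hence V^0), and g^{-1} = 0.
xhat, sigma = np.zeros(N), np.ones(N)
g = np.zeros(M)
V = sigma.mean()
for t in range(50):
    omega = W @ xhat - V * g         # output mean with Onsager correction
    g = g_out(omega, V)              # output channel
    A = -dg_out(omega, V).sum() / N  # scalar A^t
    B = W.T @ g + A * xhat           # input-channel field
    xhat = f_in(B, A)                # posterior mean estimate
    sigma = df_in(B, A)              # posterior variance estimate
    V = sigma.mean()                 # scalar V^{t+1}

mse = np.mean((xhat - x_star) ** 2)
print(mse)
```

For this linear-Gaussian case the loop converges in a few iterations; for nonconvex channels (e.g. phase retrieval) the same skeleton applies but the initialization matters, as discussed in the Main Text.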
A.2 Zero-temperature limit of GAMP
In order to apply the GAMP algorithm to MAP estimation (possibly with a regularizer), we have to consider the zero-temperature limit \beta \to \infty. The limiting form of the equations depends on the model and on the regime (e.g. low or high \alpha). Here we consider models defined on continuous spaces and in the high-\alpha regime (e.g. for phase retrieval). In this case, while taking the limit, the messages have to be rescaled appropriately in order for them to stay finite. Therefore we rescale the messages through the substitutions:
(73) A^{t} \to \beta A^{t}
(74) B^{t}_{i} \to \beta B^{t}_{i}
(75) g^{t}_{\mu} \to \beta g^{t}_{\mu}
(76) V^{t} \to V^{t}/\beta
(77) \sigma^{t}_{i} \to \sigma^{t}_{i}/\beta
With these rescalings, the GAMP equations (65-72) are left unaltered, but the expressions for the free entropies of the scalar channels become:
(78) \phi_{\mathrm{in}}(B, A) = -\min_{x} \left[ \frac{A}{2}x^2 - Bx + r(x) \right]
(79) \phi_{\mathrm{out}}(\omega, V; y) = -\min_{z} \left[ \ell(y, z) + \frac{(z-\omega)^2}{2V} \right]
as it is easy to verify.
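At zero temperature the channels are thus one-dimensional optimization problems (Moreau envelopes). The sketch below (our illustration, with arbitrary choices r(x) = |x| and ℓ(y, z) = (y - z)²/2, both of which have known minimizers: soft-thresholding and a linear shrinkage, respectively) checks Eqs. (78) and (79) against a brute-force grid search:

```python
import numpy as np

# Eq. (78): phi_in(B, A) = -min_x [ A x^2/2 - B x + r(x) ], here with the (assumed)
# sparsity-inducing choice r(x) = |x|, whose minimizer is soft-thresholding.
def phi_in_zeroT(B, A, xs):
    return -np.min(A * xs**2 / 2 - B * xs + np.abs(xs))

# Eq. (79): phi_out(omega, V; y) = -min_z [ l(y, z) + (z - omega)^2/(2 V) ], here with
# the quadratic loss l(y, z) = (y - z)^2/2, minimized at z* = (omega + V y)/(1 + V).
def phi_out_zeroT(omega, V, y, zs):
    return -np.min((y - zs) ** 2 / 2 + (zs - omega) ** 2 / (2 * V))

xs = np.linspace(-10.0, 10.0, 200001)

B, A = 1.7, 0.9
x_star = np.sign(B) * max(abs(B) - 1.0, 0.0) / A       # soft threshold
phi_in_exact = -(A * x_star**2 / 2 - B * x_star + abs(x_star))
print(phi_in_zeroT(B, A, xs), phi_in_exact)

omega, V, y = 0.3, 0.5, 1.2
z_star = (omega + V * y) / (1.0 + V)
phi_out_exact = -((y - z_star) ** 2 / 2 + (z_star - omega) ** 2 / (2 * V))
print(phi_out_zeroT(omega, V, y, xs), phi_out_exact)
```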
A.3 GAMP equations for real-valued phase retrieval and AMP.A equations
In the special case of the phase retrieval problem, with an \ell_2 loss \ell(y, z) = \frac{1}{2}(y - |z|)^2 and L2 norm regularization r(x) = \frac{\lambda}{2}x^2, and at zero temperature, the two scalar estimation channels of Eqs. (78) and (79) become:
(80) \phi_{\mathrm{in}}(B, A) = \frac{B^2}{2(A + \lambda)}
(81) \phi_{\mathrm{out}}(\omega, V; y) = -\frac{(y - |\omega|)^2}{2(1 + V)}
Thus, Eqs. (66, 70, 71, 72) simply yield:
(82) g^{t}_{\mu} = \mathrm{sign}(\omega^{t}_{\mu})\, \frac{y_\mu - |\omega^{t}_{\mu}|}{1 + V^{t}}
(83) \hat{x}^{t+1}_{i} = \frac{B^{t}_{i}}{A^{t} + \lambda}
(84) \sigma^{t+1}_{i} = \frac{1}{A^{t} + \lambda}
(85) V^{t+1} = \frac{1}{A^{t} + \lambda}
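For the amplitude loss ℓ(y, z) = (y - |z|)²/2 assumed here, the zero-temperature output channel has the closed form g = sign(ω)(y - |ω|)/(1 + V). The sketch below (ours) cross-checks this expression against a finite-difference derivative of the variational definition of the channel:

```python
import numpy as np

# Zero-temperature output channel of real-valued phase retrieval, minimized on a grid:
# phi_out(omega, V; y) = -min_z [ (y - |z|)^2/2 + (z - omega)^2/(2 V) ].
def phi_out_pr(omega, V, y, zs):
    return -np.min((y - np.abs(zs)) ** 2 / 2 + (zs - omega) ** 2 / (2 * V))

zs = np.linspace(-8.0, 8.0, 400001)
V, y = 0.7, 1.5
eps = 1e-4

errs = []
for omega in [0.4, -1.1, 2.3]:
    # g = d(phi_out)/d(omega) by central finite differences...
    g_num = (phi_out_pr(omega + eps, V, y, zs)
             - phi_out_pr(omega - eps, V, y, zs)) / (2 * eps)
    # ...against the closed form sign(omega) * (y - |omega|) / (1 + V).
    g_closed = np.sign(omega) * (y - abs(omega)) / (1.0 + V)
    errs.append(abs(g_num - g_closed))
print(max(errs))
```

Note that the derivative is checked away from ω = 0, where the sign function makes the channel non-smooth; that singularity is exactly the issue discussed next.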
Eq. (67) is instead singular, since it involves the derivative of the sign function. Since we have