On the glassy nature of the hard phase in inference problems

05/15/2018
by   Fabrizio Antenucci, et al.

An algorithmically hard phase was described in a range of inference problems: even if the signal can be reconstructed with a small error from an information theoretic point of view, known algorithms fail unless the noise-to-signal ratio is sufficiently small. This hard phase is typically understood as a metastable branch of the dynamical evolution of message passing algorithms. In this work we study the metastable branch for a prototypical inference problem, the low-rank matrix factorization, that presents a hard phase. We show that for noise-to-signal ratios that are below the information theoretic threshold, the posterior measure is composed of an exponential number of metastable glassy states and we compute their entropy, called the complexity. We show that this glassiness extends even slightly below the algorithmic threshold below which the well-known approximate message passing (AMP) algorithm is able to closely reconstruct the signal. Counter-intuitively, we find that the performance of the AMP algorithm is not improved by taking into account the glassy nature of the hard phase.

I Introduction

Inference problems are ubiquitous in many scientific areas involving data. They can be summarized as follows: a signal is measured or observed in some way and the inference task is to reconstruct the signal from the set of observations. Many practical applications involving data rely on our ability to solve inference problems fast and efficiently. While from the point of view of computational complexity theory many of the practically important inference problems are algorithmically hard in the worst case, practitioners are solving them every day in many cases of interest. It is hence an important research question to know which types of inference problems can be solved efficiently and which cannot. A formally satisfying answer to this question would lead to an entirely new theory of typical computational complexity, and would likely shed new light on the way we develop algorithms.

For a range of inference problems, Bayesian inference naturally leads to the statistical physics of systems with disorder, see e.g. Grassberger and Nadal (1992). This connection was explored in a range of recent works and produced a class of models of inference problems in which Bayes-optimal inference can be analyzed and presents a first-order phase transition. As is common in physics in high dimension, the first-order phase transition is associated with the existence of a metastable region in which known efficient algorithms fail to reach the theoretically optimal performance. This metastable region has been termed the hard phase, see e.g. Zdeborová and Krzakala (2016). It has been located in error correcting codes Richardson and Urbanke (2008); Mézard and Montanari (2009), compressed sensing Krzakala et al. (2012), community detection Decelle et al. (2011), the hidden-dense submatrix problem Deshpande and Montanari (2015); Montanari (2015), low-rank estimation problems including data clustering, sparse PCA or tensor factorization Richard and Montanari (2014); Lesieur et al. (2017a), and learning in neural networks Györgyi (1990). The hard phase has the same origin in all these problems, and it is therefore expected that an algorithmic improvement in any of them would lead to improvements in all the others as well.

In the current state of the art (including the references above) the hard phase is located as a performance barrier of a class of message passing algorithms. Message passing algorithms can be seen as spin-offs of the cavity method of spin glasses Mézard et al. (1987). In the context of inference on dense graphical models the algorithm is called approximate message passing (AMP), known from the context of compressed sensing Donoho et al. (2009). In the limit of large system size, the dynamical evolution of AMP can be tracked by the so-called state evolution (SE) Donoho et al. (2009); Bayati and Montanari (2011), whose fixed point equations coincide with the saddle point equations describing the thermodynamics of the system under the replica symmetric assumption. The analysis of SE and its comparison to the analysis of the Bayes-optimal performance reveals that there is an interval of noise-to-signal ratio where the signal could be reconstructed by sampling the posterior measure, while AMP is not able to converge to the optimal error. This interval marks the presence of the hard phase.

In this paper we want to draw further attention of the physics community to the existence of this hard phase, which is related to a first-order phase transition in the optimal performance in inference problems. The following open questions might benefit from a physics-like approach and insights: Could there be a physics-inspired algorithm that is able to overcome the algorithmic barrier the AMP algorithm encounters? Note that in problems where the corresponding graphical model can be designed, such as compressed sensing or error correcting codes, such a strategy related to nucleation indeed exists Kudekar et al. (2011); Krzakala et al. (2012). But what about the more ubiquitous problems where the graphical model is fixed? Are there some physical principles or laws that can provide further evidence towards the impenetrability of the algorithmic barrier?

The motivation of the present work was to investigate the above questions. We analyze the following physics-motivated strategy: It is known that the metastable part of the posterior measure in the hard phase is glassy Sompolinsky et al. (1990); Franz et al. (2001); Krzakala and Zdeborová (2009). Yet, the AMP algorithm fails to describe this glassiness properly. In some other contexts where message passing algorithms are successfully used, a correct account of glassiness leads to algorithms that improve over simpler ones. Notably this is the case of random constraint satisfaction problems, where the influential work Mézard et al. (2002) has shown that survey propagation, which correctly takes glassiness into account, beats the performance of belief propagation.

We therefore ask whether, in inference tasks, the reconstruction of the signal becomes easier when one uses algorithms in which the glassiness is correctly taken into account. We investigate this strategy thoroughly in the present work. We confirm that the hard phase is glassy in the sense that it consists of an exponential number of local optima at higher free energy than the equilibrium one. However, when it comes to the reconstruction of the signal, our analysis leads us to the remarkable conclusion that, in contrast to constraint satisfaction and optimization problems, in inference problems taking into account the glassiness of the hard phase does not improve upon the performance of the simplest AMP algorithm. We thus provide additional evidence towards the bold conjecture that in the corresponding inference problems AMP is the best of low-computational-complexity inference algorithms.

Note that such a negative result is very interesting from both the physics and the computer science point of view. In physics, a common intuitive narrative tells us that the properties of the energy landscape control the algorithmic difficulty of the problem. Yet a solid and physically intuitive explanation of why inference algorithms cannot penetrate the hard phase remains open. Our results invite researchers to make progress on this question, eventually leading to a precise understanding of the interplay between dynamics and landscape. In computer science, developments that go beyond the traditional worst-case computational complexity results are rare, and the hard phase provides a unique and sharply delimited case that might be computationally hard even for a typical instance. Building a theory that would explain the nature of the hard phase might be the next pillar of our understanding of computational complexity.

Our analysis of the glassiness of the hard phase provides new insights on the performance of Monte Carlo or Langevin dynamics. The presence of glassiness suggests that these sampling-based algorithms are slowed down and thus their commonly used versions may not be able to match the performance of AMP. While this aligns with some of the early literature Sompolinsky et al. (1990), more recent literature Decelle et al. (2011) suggested, based on numerical evidence, that Monte Carlo sampling is as good as the message passing algorithm. Based on the conclusions of our work, this question of performance barriers of sampling-based algorithms should be re-opened and investigated more thoroughly. A good understanding of the performance of these algorithms is especially important in view of the fact that some of the best performing systems currently use stochastic gradient descent, which can be seen as a variant of the Langevin dynamics.

This paper is organized as follows. In Section II we introduce the model on which we illustrate the main findings of this paper; we expect this picture to be generic and to apply to all models in which a hard phase related to a first-order phase transition in the performance of Bayesian inference has been identified. In Section III we recall the basic setting of Bayesian inference. In Section IV we give a summary of the main algorithmic consequences of our work. In Section V we then recall the replica approach to the study of the corresponding posterior measure. Section V.1 then summarizes the known replica symmetric diagram and the resulting phase transitions. Section V.2 contains the main technical results of the paper, where we quantitatively analyze the glassiness of the hard phase, leading to our conclusions in Section VI.

II Model

In order to be concrete we concentrate on a prototypical example of an inference problem with a hard phase - the constrained rank-one matrix estimation. This problem is representative of the whole class of inference problems where a hard phase related to a first-order phase transition was identified Deshpande and Montanari (2015); Lesieur et al. (2015a, 2017b). We choose this example because it is very close to the Sherrington-Kirkpatrick model, for which the study of glassy states is the most advanced Mézard et al. (1987). Glassiness was also studied in detail in the spherical or Ising p-spin model, corresponding to spiked tensor estimation Richard and Montanari (2014). However, in that model the hard phase spans the full low-noise phase, and the transition towards the easy phase, on which we aim to focus here, happens at a noise-to-signal ratio too low to be straightforwardly investigated within the replica method.

In the rank-one matrix estimation problem the signal, denoted by $x^{(0)} \in \mathbb{R}^N$, is extracted from some separable prior probability distribution given by $P(x^{(0)}) = \prod_{i=1}^{N} P_X(x^{(0)}_i)$. This signal is subjected to noisy measurements of the following form

$$Y_{ij} = \frac{1}{\sqrt{N}}\, x^{(0)}_i x^{(0)}_j + \xi_{ij} \qquad (1)$$

where $\xi_{ij}$ are Gaussian random variables with zero mean and variance $\Delta$. Therefore one observes the signal through the matrix $Y$. The inference problem is to reconstruct the signal $x^{(0)}$ given the observation of the matrix $Y$. The information-theoretically optimal performance in this problem was analyzed in detail in Lesieur et al. (2017b) and this analysis was proven rigorously to be correct in Deshpande and Montanari (2014); Krzakala et al. (2016); Barbier et al. (2016); Lelarge and Miolane (2016). Refs. Rangan and Fletcher (2012); Deshpande and Montanari (2014); Lesieur et al. (2017b) also analyzed the performance of the AMP algorithm.

While the theoretical part of this paper is for a generic prior $P_X$, the results section focuses on the Rademacher-Bernoulli prior

$$P_X(x) = \frac{\rho}{2}\left[\delta(x-1) + \delta(x+1)\right] + (1-\rho)\,\delta(x) \qquad (2)$$

as this is a prototypical yet simple example in which the hard phase appears for sufficiently low $\rho$ Lesieur et al. (2015a, 2017b). Let us mention that the rank-one matrix estimation with the Rademacher-Bernoulli prior has a very natural interpretation in terms of a community detection problem. Keeping this interpretation in mind can help the reader to get intuition about the problem. Nodes are of three types: nodes with $x_i = +1$ belong to one community, nodes with $x_i = -1$ to a second community, and nodes with $x_i = 0$ do not belong to any community. The observations (1) can be interpreted as weights on edges of a graph that are on average larger for nodes that are either both in community one or both in community two, on average smaller if one of the nodes is in community one and the other in community two, and independent and unbiased when one of the nodes does not belong to any community. Thanks to the output universality result of Lesieur et al. (2015b); Krzakala et al. (2016) the results presented in this paper also hold for a model where the observations correspond to the adjacency matrix of an unweighted graph with Fisher information corresponding to the inverse of the variance $\Delta$.
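To make the setup concrete, here is a minimal Python sketch that draws one instance of the problem. It assumes the standard scaling in which the rank-one term is divided by sqrt(N), a symmetric observation matrix with an unobserved diagonal, and illustrative parameter values; the function name `sample_instance` and the values of `n`, `rho` and `delta` are hypothetical choices made for this example, not taken from the paper.

```python
import numpy as np

def sample_instance(n, rho, delta, rng):
    """Draw a Rademacher-Bernoulli signal and its noisy rank-one observation."""
    # x0_i = +1 or -1 with probability rho/2 each, 0 with probability 1 - rho
    x0 = rng.choice([-1.0, 0.0, 1.0], size=n, p=[rho / 2, 1 - rho, rho / 2])
    # Symmetric Gaussian noise with variance delta on the off-diagonal entries
    noise = np.triu(rng.normal(0.0, np.sqrt(delta), size=(n, n)), k=1)
    noise = noise + noise.T
    # Rank-one signal term (assumed 1/sqrt(N) scaling) plus noise
    y = np.outer(x0, x0) / np.sqrt(n) + noise
    np.fill_diagonal(y, 0.0)  # assumption: diagonal entries are not observed
    return x0, y

rng = np.random.default_rng(0)
x0, y = sample_instance(n=1000, rho=0.1, delta=0.01, rng=rng)
print("fraction of nodes in a community:", np.mean(x0 != 0))
```

In the community detection reading, the nonzero entries of `x0` label the two communities and `y` holds the edge weights.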

III Bayesian inference and approximate message passing

We study the so-called Bayes-optimal setting, which means that we know both the prior $P_X$ and the variance $\Delta$ of the noise. The probability distribution of $x$ given $Y$ is given by Bayes formula

$$P(x|Y) \propto \left[\prod_{i=1}^{N} P_X(x_i)\right] P(Y|x) \qquad (3)$$

Since the noise is Gaussian we have

$$P(Y|x) \propto \exp\left[-\frac{1}{2\Delta}\sum_{i<j}\left(Y_{ij} - \frac{x_i x_j}{\sqrt{N}}\right)^2\right] \qquad (4)$$

Both in Eq. (3) and (4) we have omitted the normalization constants. An estimate of the components of the signal that minimizes the mean-squared error with the ground truth signal is computed as

$$\hat{x}_i = \langle x_i \rangle \qquad (5)$$

where the brackets stand for the average over the posterior measure Eq. (3). Therefore in order to solve the inference problem we need to compute the local magnetizations $\langle x_i \rangle$. The AMP algorithm aims to do precisely that; its derivation can be found e.g. in Lesieur et al. (2017b). AMP boils down to a set of recursion relations of the form

(6)

whose iterative fixed point is taken as an estimate of the signal. It is known that the fixed points of the state evolution of the AMP algorithm are, in the thermodynamic limit, described by the replica symmetric (RS) solution of the model Donoho et al. (2009); Bayati and Montanari (2011). AMP follows the RS solution irrespective of whether RS is the physically correct description of the posterior measure or not.
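The recursion itself is not reproduced in this text, so the following is only a hedged sketch of what such an iteration can look like. It assumes the standard Bayes-optimal AMP recursion for a rank-one matrix observed through Gaussian noise, as commonly written in the low-rank estimation literature, specialized to the Rademacher-Bernoulli prior. The names `f_in` and `amp`, the initialization, and all parameter values are illustrative assumptions, not the paper's Eq. (6).

```python
import numpy as np

def f_in(a, b, rho):
    """Mean and variance of x under the tilted prior P_X(x) * exp(b*x - a*x**2/2)."""
    # Log-weights of x = +1, -1 and 0, shifted by their maximum for stability
    lp = np.log(rho / 2) + b - a / 2
    lm = np.log(rho / 2) - b - a / 2
    l0 = np.full_like(b, np.log(1 - rho))
    shift = np.maximum(np.maximum(lp, lm), l0)
    wp, wm, w0 = np.exp(lp - shift), np.exp(lm - shift), np.exp(l0 - shift)
    norm = wp + wm + w0
    mean = (wp - wm) / norm
    var = (wp + wm) / norm - mean**2
    return mean, var

def amp(y, rho, delta, a_init, n_iter=100):
    """Assumed AMP recursion: local field with Onsager correction, then denoising."""
    n = y.shape[0]
    a_hat = a_init.copy()
    a_old = np.zeros(n)
    v = np.full(n, rho)  # start from the prior variance scale
    for _ in range(n_iter):
        # Local field: matrix-vector product minus the Onsager reaction term
        b_field = y @ a_hat / (delta * np.sqrt(n)) - v.mean() / delta * a_old
        a_coef = np.mean(a_hat**2) / delta
        a_old = a_hat
        a_hat, v = f_in(a_coef, b_field, rho)
    return a_hat

# Illustrative run on a synthetic instance (all values are hypothetical)
rng = np.random.default_rng(0)
n, rho, delta = 500, 0.5, 0.02
x0 = rng.choice([-1.0, 0.0, 1.0], size=n, p=[rho / 2, 1 - rho, rho / 2])
noise = np.triu(rng.normal(0.0, np.sqrt(delta), size=(n, n)), k=1)
y = np.outer(x0, x0) / np.sqrt(n) + noise + noise.T
np.fill_diagonal(y, 0.0)
a_hat = amp(y, rho, delta, a_init=1e-3 * rng.normal(size=n))
print("overlap with the signal (up to a global sign):", abs(a_hat @ x0) / n)
```

Because the posterior is symmetric under a global sign flip of the signal, a run started from an uninformed initialization can only recover the signal up to that sign, which is why the sketch reports the absolute overlap.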

As shown in Antenucci et al. (2018), it is possible to derive a generalized AMP, that we call Approximate Survey Propagation (ASP) algorithm, whose state evolution fixed points coincide with the replica equations in the one-step replica symmetry breaking (1RSB) ansatz. Just as AMP, the ASP algorithm can be also written in a form Antenucci et al. (2018)

(7)

depending on one additional free parameter s, corresponding to the Parisi parameter from the spin glass literature. The special case s = 1 reduces the ASP algorithm back to AMP. The 1RSB solution is known to provide a better description - in many cases exact - of glassy states. In Section V we hence study the thermodynamics of the above model in the RS and 1RSB ansatz, focusing on its properties in the hard phase.

IV Summary of main algorithmic result

Before going to the technical part of the replica analysis in Sec. V, we briefly summarize the corresponding main algorithmic result. In Section V we then investigate in detail the 1RSB solution of the low-rank matrix estimation model (1), focusing on the glassy properties of the hard phase. Our main interest, however, is in the relation between the 1RSB solution and the associated algorithmic performance. The main question we ask is whether ASP can (for a suitable choice of the Parisi parameter s) improve on AMP. The experience with the survey propagation algorithm applied to constraint satisfaction problems Braunstein et al. (2005) suggests that this should be possible.

In Fig. 1 we plot the magnetization achieved by the ASP algorithm as a function of the noise for several values of the Parisi parameter s. We observe that as the noise decreases the equilibrium value (yellow) is reached first by the s = 1 curve, corresponding to the performance of AMP. In Fig. 3 we then plot the mean-squared error as a function of the Parisi parameter s for several values of the noise. Again we see that in all cases the best error is achieved with s = 1. Algorithmically this means that in the present setting, ASP never obtains better accuracy than the canonical AMP algorithm.

The fact that, among all the values of s, the lowest MSE is reached by the s = 1 states for all noise values is unexpected from the physics point of view. It implies that AMP, which neglects glassiness and wrongly describes the hard region, works better as an inference algorithm than an algorithm that correctly describes the metastable states in this region. At the same time, the above result could be anticipated based on the mathematical theorem of Deshpande and Montanari (2015) that implies that AMP is optimal among all local algorithms. This theorem applies as long as an iterative algorithm only uses information from nearest neighbours and (nearly) reaches a fixed point after iterations.

V The replica approach to the posterior measure

In order to study the posterior measure, we define the corresponding free energy as

(8)

This is a random object since it depends on the matrix . Furthermore it depends on through the function . Indeed, we want to study the typical behavior of this sample-dependent free energy. Therefore we define

(9)

where is obtained as in Eq. (1), so that is given by

(10)

In order to perform the average defined in Eq. (9) we use the replica method Mézard et al. (1987). Introducing

(11)

we get

(12)

For integer we can represent as an -dimensional integral over replicas with . Stated in this way the problem is obviously symmetric under the exchange of the replicas among themselves. Moreover since we need to integrate over the signal distribution we end up with a system of replicas, that, in the Bayes optimal case, is symmetric under the permutation among all the replicas. Performing standard manipulations, see e.g. Mézard et al. (1987), we arrive at a closed expression for that is

(13)

where is a function that can be computed explicitly and and are overlap matrices. In the large limit, the integral in Eq. (13) can be evaluated using the saddle point method. At the saddle point level the physical meaning of the overlap matrix is given in terms of

(14)

while the matrix is just a Lagrange multiplier. We denote the magnetization of the system, meaning

(15)

The saddle point equations for and can be written in complete generality for any but then one needs to take the analytic continuation down to . One needs an appropriate scheme from which one can take the replica limit. Here we consider two schemes: the replica symmetric (RS) and the 1-step replica symmetry breaking (1RSB) one. We refer here to symmetry under permutations of the replicas with index .

V.1 Reminder of the replica symmetric solution

The RS scheme amounts to considering

(16)

From the point of view of the inference, the relevant quantity to look at is the Mean Square Error (MSE)

(17)

where . Replica symmetry among all the replicas is obtained for . It is well known that, as a direct consequence of Bayes optimality (also called Nishimori condition Zdeborová and Krzakala (2016)), this fully replica symmetric solution is the one that describes thermodynamically dominant states. The more general ansatz is, however, important as it allows one to describe metastable states where the Nishimori identities might not hold. Plugging this ansatz into the expression for and taking the saddle point equations w.r.t. all these parameters one gets the replica symmetric solution as reported in Lesieur et al. (2017b), and proven to give the equilibrium solution in Barbier et al. (2016); Lelarge and Miolane (2016). The RS free energy can be expressed as

(18)

with

(19)

where

(20)

and and are random variables distributed according

and a standard normal distribution, respectively. The values of

for which is stationary are the solution of

(21)

Equilibrium properties of the inference problem are given by the global minima of the free energy Eq. (19). Local minima of the free energy that do not correspond to the equilibrium solution are called metastable.
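The coexistence of several stationary points can be explored numerically by iterating a scalar fixed-point map from different starting points. The sketch below is hedged: it assumes the standard Bayes-optimal state evolution for the rank-one model with Gaussian noise and the Rademacher-Bernoulli prior, in which the magnetization m is updated as the average of the denoiser output times the signal, with effective field m*x0/Delta + sqrt(m/Delta)*z. The names `posterior_mean` and `se_fixed_point`, the quadrature order and the values of `rho` and `delta` are illustrative assumptions rather than the paper's Eq. (21).

```python
import numpy as np

# Gauss-Hermite nodes and normalized weights for averages over z ~ N(0, 1)
Z_NODES, Z_WEIGHTS = np.polynomial.hermite_e.hermegauss(101)
Z_WEIGHTS = Z_WEIGHTS / Z_WEIGHTS.sum()

def posterior_mean(a, b, rho):
    """Mean of x under the tilted prior P_X(x) * exp(b*x - a*x**2/2)."""
    lp = np.log(rho / 2) + b - a / 2
    lm = np.log(rho / 2) - b - a / 2
    l0 = np.log(1 - rho)
    shift = np.maximum(np.maximum(lp, lm), l0)  # avoids overflow at low noise
    wp, wm, w0 = np.exp(lp - shift), np.exp(lm - shift), np.exp(l0 - shift)
    return (wp - wm) / (wp + wm + w0)

def se_fixed_point(rho, delta, m_init, n_iter=2000, tol=1e-12):
    """Iterate the assumed state evolution map until the magnetization settles."""
    m = m_init
    for _ in range(n_iter):
        a = m / delta
        # x0 = 0 contributes nothing and x0 = -1 mirrors x0 = +1, so the
        # average over the signal reduces to rho times the x0 = +1 term.
        field = a + np.sqrt(a) * Z_NODES
        m_new = rho * np.sum(Z_WEIGHTS * posterior_mean(a, field, rho))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

rho = 0.08  # illustrative sparsity
for delta in (1.0, 1e-3, 1e-5):
    m_uninformed = se_fixed_point(rho, delta, m_init=1e-8)
    m_informed = se_fixed_point(rho, delta, m_init=rho)
    print(f"delta={delta:g}: uninformed m={m_uninformed:.4g}, informed m={m_informed:.4g}")
```

Noise values at which the uninformed and informed runs end at different magnetizations are those where a metastable branch coexists with a more magnetized one; scanning `delta` on a fine grid is one way to locate such a window under these assumptions.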

For illustration, we consider the case of the Rademacher-Bernoulli prior (2) and we set so that the inference problem has a hard phase Lesieur et al. (2017b). The replica symmetric phase diagram is represented in Fig. 1 (yellow curve).

Figure 1: The magnetization, aka the overlap, between the signal and the states described by the 1RSB solution at Parisi parameter s, as a function of the noise strength , and sparsity . The curve that shows a spinodal transition towards the strongly magnetized solution at the largest values of the noise is the one for s = 1. The same curve also represents the performance of the AMP algorithm. Taking the glassiness of the metastable branch into account does not improve upon AMP.

At high noise the signal cannot be recovered and therefore the magnetization vanishes. Upon decreasing the noise the signal becomes relatively stronger, and below a dynamical threshold the system undergoes a dynamical transition. On the one hand, the free energy (19) develops a local metastable minimum with nonzero magnetization. On the other hand, the zero-magnetization state undergoes a clustering transition according to the pattern familiar in the physics of spin glasses Franz and Parisi (1995); Castellani and Cavagna (2005). The corresponding RS free energy ceases to describe a paramagnetic state and instead describes a non-ergodic phase with an exponential number of metastable states - aka clusters - with zero overlap among each other and identical energy and internal entropy. Both the zero-magnetization dominating branch and the metastable branch have identical energy and internal entropy. Their free energy difference is the complexity. Moreover, as we will see in the next section, the typical overlap between configurations in these states coincides with the magnetization of the magnetized solution. For that reason the magnetized state corresponds to just one cluster among the exponential multiplicity dominating the thermodynamics. The complexity (i.e. the log of their number) of the thermodynamic states decreases with the noise until it vanishes at the information-theoretic phase transition. There the signal is strong enough for a first-order phase transition to happen, in which the minimum with positive magnetization becomes the global minimum of the free energy. The complexity of the solution becomes negative, the solution is non-physical and consequently RSB is necessary to describe the metastable branch. Despite this fact, this RS metastable branch cannot be just dismissed as unphysical: it continues to be relevant algorithmically as the dynamical attractor of the AMP algorithm. Decreasing the intensity of the noise further, another phase transition happens in this RS branch: the metastable minimum develops a small magnetization. Decreasing the noise even further, this metastable minimum disappears with a spinodal transition at the algorithmic threshold. In the interval between the algorithmic and the information-theoretic thresholds one finds the hard phase, defined by the property that the AMP algorithm is sub-optimal (the shaded yellow region in Fig. 1): the global minimum of the free energy has a high magnetization (low MSE), but the low-magnetization non-physical local minimum continues to describe the attractor of AMP. The state evolution describing the AMP algorithm starting from random conditions converges to the local minimum of lowest magnetization.

V.2 Glassy phase and complexity

The low-magnetization RS branch is non-physical below the information-theoretic threshold; its existence, however, suggests that metastable states exist that should be described with RSB. We therefore consider the 1RSB ansatz. We divide the replicas into blocks of s replicas each, where s is the so-called Parisi parameter Mézard et al. (1987). The overlap matrix becomes

(22)

and analogously for . For s strictly equal to one we get back the replica symmetric ansatz Eq. (16). Note that for , and are in general different in the solution: this is crucial when evaluating the MSE Eq. (17) as the minimum of the MSE does not correspond in general to the maximum of .

The 1RSB free energy takes the form

(23)

with

(24)

where

(25)

The stationary points of the 1RSB free energy are now obtained by the fixed points of

(26)

where , and and the extremum is a minimum in and a maximum in the other parameters.

We would like to reiterate here the observation that, in the same way that the stationary points of the RS free energy correspond to state evolution fixed points of the AMP algorithm, the stationary points of the 1RSB free energy correspond to the fixed points of the state evolution of an approximate survey propagation algorithm that depends on the Parisi parameter s Antenucci et al. (2018). In particular, the expression (17) exactly gives the MSE of such an algorithm with and being the solution of (26).

For high enough the 1RSB solution collapses to the RS one, meaning that . At the saddle point equations for admit a solution with , . The value of in this solution coincides with the value of in the high magnetization RS branch discussed in the previous section. At the metastable states undergo an entropy crisis transition. Although the thermodynamically dominant state becomes the state with high correlation with the ground truth signal, glassy states continue to exist. In fact as far as these states are concerned - if we neglect the high magnetization state - the system undergoes there a Kauzmann transition where the dominant glassy states have zero complexity and a value of the Parisi parameter is determined by the condition that complexity (defined below) is equal to zero. (Footnote 1: Notice the analogy of the high-magnetization state here with the crystal state in the physics of glasses.)

Let us now discuss solutions. It is well known that the Parisi parameter can be interpreted as an effective temperature that enables to select families of metastable states of given (internal) free energy Monasson (1995). Their corresponding complexity (defined as the log of their number) is obtained by deriving (24) w.r.t Monasson (1995), and multiplying the result by , i.e.

(27)

As expected this complexity for s = 1 coincides with the free energy difference between the two RS branches discussed in the previous section.

In Fig. 2 we plot the complexity as a function of both the Parisi parameter s and the noise variance. For each value of the noise we find two regions: a physical region where the complexity is positive, and a non-physical one where it is negative. Note that the physical region with positive complexity continues not only below the information-theoretic threshold, but even below the algorithmic threshold of AMP.

The 1RSB solution is not guaranteed to give the exact description of the glassy states. It is well known that replica solutions should be stable against (further) breaking of the replica symmetry. This requires that all the eigenvalues of the Hessian of the free energy be positive in the solution. The 1RSB solutions can lose stability in two possible ways, associated with negative values of the following eigenvalues Gardner (1985); Gross et al. (1985); Montanari et al. (2004):

(28)

where (, and )

(29)
(30)

A negative (type I instability) signals the appearance of new scales of distance between states. A negative on the other hand is met when the glassy states are unstable against a Gardner transition to further RSB Gardner (1985); Gross et al. (1985): each metastable state splits into a hierarchy of new states (type II instability) Montanari et al. (2004). In Fig. 2 we mark with full lines the stable region, with dashed lines the unstable ones. Type I instability is found for large in the non-physical region of negative complexity. Type II instability is found in the physical region at small values of and it has been found also in spin glass models Montanari and Ricci-Tersenghi (2003); Montanari et al. (2004); Crisanti et al. (2005).

Figure 2: The complexity of metastable states as a function of the Parisi parameter and the noise , for prior (2) with sparsity . Upper panel, complexity at fixed in the whole domain of existence of a non-trivial fixed point. Lower panel, the physical region of positive as function of . We draw the stable solutions with a solid line and the unstable ones, w.r.t. the eigenvalues (28), with a dashed line. For each value of the value of represents the complexity of the family of thermodynamically dominating states. Below the solution is non-physical and . The algorithmic threshold of AMP occurs when the ghost-glassy states at have a spinodal transition towards the signal.

Let us now discuss in detail the glassy solutions that one finds for representing metastable states with higher free energy than the high-magnetization solution. These solutions have zero or low magnetization (overlap with the signal). As already remarked, for a given , among all the glassy states the ones with lowest total free energy turn out to be the ones with zero complexity . For different fixed values of the parameter , the complexity curves reach zero value at different values of . Remarkably, as illustrated in Fig. 2 a stable (towards higher levels of RSB) zero-complexity solution is found down to a value of noise . Stable solutions of positive complexity exist down to , and solutions with positive complexity (irrespective of the stability) down to . Examples of specific values for in Fig. 2 are , , , . This notably means that for , namely in the easy phase where AMP converges close to the signal, families of metastable states continue to exist, some of them being stable with extensive complexity.

One can discuss how these states influence Monte-Carlo dynamics, which explores the space of configurations according to principles of physical dynamics. On the one hand, one could conjecture that Monte-Carlo dynamics gets trapped by glassy states even below the algorithmic threshold of AMP. On the other hand, the dynamics is expected to fall out of equilibrium for all noise values below the dynamical transition, and it is not a priori clear in which states it should get trapped. While AMP clearly works below its algorithmic threshold and does not work above it, our analysis does not provide any reason why this threshold should be relevant for Monte Carlo or other sampling-based algorithms. For such physical dynamics, numerical simulations and analytic studies in suitable models are necessary to clarify the question of what the corresponding algorithmic threshold is.
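As a rough illustration of what a sampling-based dynamics looks like for this posterior, the following Python sketch performs heat-bath (Gibbs) sweeps over the three allowed values of each component. It assumes the Rademacher-Bernoulli prior and a Gaussian likelihood with a 1/sqrt(N) scaling of the rank-one term; the function name `heat_bath_sweep` and all parameter values are hypothetical, and the tiny system size is for demonstration only, not a study of the algorithmic threshold.

```python
import numpy as np

def heat_bath_sweep(x, y, rho, delta, rng):
    """One sweep of single-site heat-bath updates on the posterior (in place)."""
    n = x.size
    values = np.array([-1.0, 0.0, 1.0])
    log_prior = np.log(np.array([rho / 2, 1 - rho, rho / 2]))
    for i in rng.permutation(n):
        x[i] = 0.0  # remove site i from its own local field
        # Terms of the log-posterior that depend on x_i, given all other sites
        field = y[i] @ x / (delta * np.sqrt(n))
        penalty = np.sum(x**2) / (2 * delta * n)
        logw = log_prior + values * field - values**2 * penalty
        w = np.exp(logw - logw.max())
        x[i] = rng.choice(values, p=w / w.sum())
    return x

# Illustrative run from a random start (all values are hypothetical)
rng = np.random.default_rng(0)
n, rho, delta = 200, 0.5, 0.01
x0 = rng.choice([-1.0, 0.0, 1.0], size=n, p=[rho / 2, 1 - rho, rho / 2])
noise = np.triu(rng.normal(0.0, np.sqrt(delta), size=(n, n)), k=1)
y = np.outer(x0, x0) / np.sqrt(n) + noise + noise.T
np.fill_diagonal(y, 0.0)

x = rng.choice([-1.0, 0.0, 1.0], size=n, p=[rho / 2, 1 - rho, rho / 2])
for sweep in range(20):
    heat_bath_sweep(x, y, rho, delta, rng)
print("overlap with the signal (up to a global sign):", abs(x @ x0) / n)
```

Comparing runs started from a random configuration with runs started at the signal is one simple way to probe numerically whether such dynamics equilibrates or remains trapped.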

So far we focused on glassy states of positive complexity (i.e. existing with probability one for a typical instance). There are also solutions of the 1RSB equations having negative complexity. We will call the negative-complexity solutions the ghost-glassy states. From the physics point of view those solutions do not correspond to physical states for typical instances. Yet, from the algorithmic point of view they do correspond to the fixed points of the ASP algorithm Antenucci et al. (2018) run for a given value of the Parisi parameter s; as such they can be reached algorithmically. At this point it becomes relevant to understand at which value of the noise the ghost-glassy states disappear, developing a spinodal instability towards the high-magnetization state. In particular we can ask the natural question whether, with a suitable choice of the Parisi parameter s, ASP improves over the algorithmic threshold of the usual AMP (s = 1). With this question in mind, in Fig. 3 we plot the mean-squared error (MSE) with the ground truth signal given by Eq. (17) as a function of s for various values of the noise. We initialize the 1RSB fixed point equations at infinitesimal magnetization and iterate them until a fixed point is reached. We observe that for all values of the noise the MSE is minimized for s = 1, i.e. by the canonical AMP algorithm.

Figure 3: The MSE as a function of the Parisi parameter for different values of the noise strength . The smallest MSE is always reached for , corresponding to the performance of the AMP algorithm, with a threshold at .
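The numerical procedure behind Fig. 3, starting a set of fixed-point equations at infinitesimal magnetization and iterating until convergence, can be illustrated with a minimal sketch. The 1RSB equations themselves are not reproduced here. As a stand-in, the sketch uses the standard replica-symmetric state evolution of AMP for rank-one matrix estimation with a ±1 (Rademacher) prior, m ← E_z[tanh(m/Δ + sqrt(m/Δ) z)]. That model has a continuous transition at Δ = 1 and no hard phase, so it reproduces only the iteration scheme, not the results of this paper. The names `se_update` and `fixed_point`, the quadrature order, the tolerance and the identification MSE = 1 - m are assumptions made for this illustration.

```python
import numpy as np

# Gauss-Hermite nodes/weights for the weight exp(-z^2/2); dividing by
# sqrt(2*pi) turns the quadrature into an average over a standard Gaussian z.
_Z, _W = np.polynomial.hermite_e.hermegauss(64)
_W = _W / np.sqrt(2.0 * np.pi)


def se_update(m, delta):
    """One step of the (assumed) RS state evolution for a +-1 prior.

    m     : current magnetization (overlap with the ground truth)
    delta : noise strength
    """
    a = max(m, 0.0) / delta  # effective signal-to-noise ratio
    return float(np.sum(_W * np.tanh(a + np.sqrt(a) * _Z)))


def fixed_point(delta, m0=1e-6, tol=1e-12, max_iter=100_000):
    """Iterate the update from a tiny magnetization m0 until it stops moving.

    m0 must be much larger than tol, otherwise the loop may stop at once
    near the uninformative point m = 0 even when that point is unstable.
    """
    m = m0
    for _ in range(max_iter):
        m_new = se_update(m, delta)
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m


if __name__ == "__main__":
    # Scan a few noise strengths; MSE = 1 - m is an assumption of this sketch.
    for delta in (0.5, 0.8, 1.2, 2.0):
        m = fixed_point(delta)
        print(f"Delta={delta:.1f}  m={m:.6f}  MSE={1.0 - m:.6f}")
```

In a scan of the kind shown in Fig. 3, the same loop would be repeated for each value of the Parisi parameter with the 1RSB update in place of `se_update`, recording the MSE reached at each fixed point.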

VI Conclusion

In conclusion, we studied the glassy nature of the hard phase in inference problems. Our results imply that the corresponding metastable state is indeed glassy, i.e. composed of exponentially many states. We evaluated their number (complexity) as a function of their internal free energy and concluded that this glassiness extends over a range of the noise parameter even larger than the extent of the hard phase. This finding re-opens the natural question of the performance limits of Monte Carlo based sampling. While some recent works Decelle et al. (2011) anticipated numerically that Monte Carlo and message passing share the same algorithmic threshold, our results do not provide any evidence of this. Instead, they suggest that, since glassiness is present also below the algorithmic threshold of AMP, the performance of sampling-based algorithms will in general be different. Validating this proposition requires studying a model different from the present one: the present model is dense and thus not suitable for large-scale simulations, and an analytically tractable description of sampling-based dynamics for it is a major open question. One possibility is to perform a large-scale numerical study with Monte Carlo based dynamics in diluted models such as those studied in Ricci-Tersenghi et al. (2018). Another possibility is to aim at an analytical description of the Langevin dynamics, which so far is known in a tractable form only for mixtures of spherical p-spin models.

While we anticipate that the performance of the usual sampling-based algorithms will be hampered by the glassiness, it is an interesting open question to investigate whether other algorithms are able to match the performance of AMP. We have in mind for instance the algorithms based on the robust ensemble as introduced in Baldassi et al. (2016).

Concerning the AMP algorithm, we conclude that, although it assumes the hard phase not to be glassy, the improved description in terms of one-step replica symmetry breaking, which takes glassiness into account, does not provide any algorithmic improvement. This is at variance with the situation in random constraint satisfaction problems, where knowledge of the organization of the space of solutions provided by 1RSB leads to algorithmic improvement Braunstein et al. (2005). We note that this observation is surprising, and we are missing a physically intuitive explanation for why taking glassiness into account improves performance in optimization problems but not in Bayes-optimal inference. We stress that our results provide strong evidence towards the conjecture that the hard phase is impenetrable for some computationally fundamental reasons. Further investigation of this is an exciting direction for both physics and theoretical computer science.

In this paper we use low-rank matrix estimation with spins and as a prototypical example in which the hard phase exists. We checked that the resulting picture applies in a range of parameters and also for some other models (such as the planted mixed p-spin model) where the hard phase was identified. We expect the picture presented here to be generic in all problems where a hard phase related to a first-order phase transition was identified.

We also note that the above conclusions apply to the case of Bayes-optimal inference, where the generative model is matched to the inference model. When the hyper-parameters are unknown or mismatched, the message passing algorithm that takes glassiness into account can provide better error and robustness; this is investigated in detail in Antenucci et al. (2018).

Finally, we mention that the results shown here may be of interest also beyond inference problems. In particular, the instabilities of the RS solution at and can be related to a similar phenomenon occurring in the mean-field theory of liquids and glasses Parisi and Zamponi (2010); Charbonneau et al. (2017). A phase structure similar to the one presented in this paper is found in that case, if we identify as the analogue of an (inverse) density parameter and the reconstruction phase as the crystal. Also in that case, the RS solution representing the liquid at low density describes a non-ergodic phase of extensive complexity at higher density. As is the case here, there is a density where the complexity vanishes, but the solution can be continued beyond this point. Finally, there is a maximum density where the solution undergoes an instability, called the Kirkwood instability, and ceases to exist Frisch and Percus (1999); Mari and Kurchan (2011). Our analysis suggests that within inference models not only the non-physical negative-complexity RS solution could undergo this instability, but also the glassy ones. Whether this phenomenon could be relevant for other glassy systems is an intriguing question.

Acknowledgments

We would like to thank Giulio Biroli, Florent Krzakala, and Guilhem Semerjian for fruitful discussions. This work is supported by "Investissements d’Avenir" LabEx PALM (ANR-10-LABX-0039-PALM) (SaMURai and StatPhysDisSys projects), and by the ERC under the European Union's Horizon 2020 Research and Innovation Programme, Grant Agreement 714608-SMiLe. The work of SF was supported by a grant from the Simons Foundation (No. 454941, Silvio Franz).

References

  • Grassberger and Nadal (1992) Peter Grassberger and Jean-Pierre Nadal, “From statistical physics to statistical inference and back (Cargèse, August 31 - September 1, 1992),” NATO ASI Series. Series C: Mathematical and Physical Sciences (1992).
  • Zdeborová and Krzakala (2016) Lenka Zdeborová and Florent Krzakala, “Statistical physics of inference: Thresholds and algorithms,” Advances in Physics 65, 453–552 (2016).
  • Richardson and Urbanke (2008) Tom Richardson and Ruediger Urbanke, Modern coding theory (Cambridge University Press, 2008).
  • Mézard and Montanari (2009) Marc Mézard and Andrea Montanari, Information, physics, and computation (Oxford University Press, 2009).
  • Krzakala et al. (2012) Florent Krzakala, Marc Mézard, Francois Sausset, Y F Sun,  and Lenka Zdeborová, “Statistical-physics-based reconstruction in compressed sensing,” Physical Review X 2, 021005 (2012).
  • Decelle et al. (2011) Aurelien Decelle, Florent Krzakala, Cristopher Moore, and Lenka Zdeborová, “Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications,” Physical Review E 84, 066106 (2011).
  • Deshpande and Montanari (2015) Yash Deshpande and Andrea Montanari, “Finding hidden cliques of size $\sqrt{N/e}$ in nearly linear time,” Foundations of Computational Mathematics 15, 1069–1128 (2015).
  • Montanari (2015) Andrea Montanari, “Finding one community in a sparse graph,” Journal of Statistical Physics 161, 273–299 (2015).
  • Richard and Montanari (2014) Emile Richard and Andrea Montanari, “A statistical model for tensor PCA,” in Advances in Neural Information Processing Systems (2014) pp. 2897–2905.
  • Lesieur et al. (2017a) Thibault Lesieur, Léo Miolane, Marc Lelarge, Florent Krzakala,  and Lenka Zdeborová, “Statistical and computational phase transitions in spiked tensor estimation,” in Information Theory (ISIT), 2017 IEEE International Symposium on (IEEE, 2017) pp. 511–515.
  • Györgyi (1990) Géza Györgyi, “First-order transition to perfect generalization in a neural network with binary synapses,” Physical Review A 41, 7097 (1990).
  • Mézard et al. (1987) Marc Mézard, Giorgio Parisi,  and Miguel Ángel Virasoro, Spin glass theory and beyond (World Scientific, Singapore, 1987).
  • Donoho et al. (2009) David L Donoho, Arian Maleki,  and Andrea Montanari, “Message-passing algorithms for compressed sensing,” Proceedings of the National Academy of Sciences 106, 18914–18919 (2009).
  • Bayati and Montanari (2011) Mohsen Bayati and Andrea Montanari, “The dynamics of message passing on dense graphs, with applications to compressed sensing,” IEEE Transactions on Information Theory 57, 764–785 (2011).
  • Kudekar et al. (2011) Shrinivas Kudekar, Thomas J. Richardson, and Rüdiger L. Urbanke, “Threshold saturation via spatial coupling: Why convolutional LDPC ensembles perform so well over the BEC,” IEEE Transactions on Information Theory 57, 803–834 (2011).
  • Sompolinsky et al. (1990) Haim Sompolinsky, Naftali Tishby,  and H Sebastian Seung, “Learning from examples in large neural networks,” Physical Review Letters 65, 1683 (1990).
  • Franz et al. (2001) Silvio Franz, Marc Mézard, Federico Ricci-Tersenghi, Martin Weigt,  and Riccardo Zecchina, “A ferromagnet with a glass transition,” EPL (Europhysics Letters) 55, 465 (2001).
  • Krzakala and Zdeborová (2009) Florent Krzakala and Lenka Zdeborová, “Hiding quiet solutions in random constraint satisfaction problems,” Physical Review Letters 102, 238701 (2009).
  • Mézard et al. (2002) Marc Mézard, Giorgio Parisi,  and Riccardo Zecchina, “Analytic and algorithmic solution of random satisfiability problems,” Science 297, 812–815 (2002).
  • Lesieur et al. (2015a) Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová, “Phase transitions in sparse PCA,” in Information Theory (ISIT), 2015 IEEE International Symposium on (IEEE, 2015) pp. 1635–1639.
  • Lesieur et al. (2017b) Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová, “Constrained low-rank matrix estimation: Phase transitions, approximate message passing and applications,” Journal of Statistical Mechanics: Theory and Experiment 2017, 073403 (2017).
  • Deshpande and Montanari (2014) Yash Deshpande and Andrea Montanari, “Information-theoretically optimal sparse PCA,” in Information Theory (ISIT), 2014 IEEE International Symposium on (IEEE, 2014) pp. 2197–2201.
  • Krzakala et al. (2016) Florent Krzakala, Jiaming Xu,  and Lenka Zdeborová, “Mutual information in rank-one matrix estimation,” in Information Theory Workshop (ITW), 2016 IEEE (IEEE, 2016) pp. 71–75.
  • Barbier et al. (2016) Jean Barbier, Mohamad Dia, Nicolas Macris, Florent Krzakala, Thibault Lesieur,  and Lenka Zdeborová, “Mutual information for symmetric rank-one matrix estimation: A proof of the replica formula,” in Advances in Neural Information Processing Systems (2016) pp. 424–432.
  • Lelarge and Miolane (2016) Marc Lelarge and Léo Miolane, “Fundamental limits of symmetric low-rank matrix estimation,” Probability Theory and Related Fields , 1–71 (2016).
  • Rangan and Fletcher (2012) Sundeep Rangan and Alyson K Fletcher, “Iterative estimation of constrained rank-one matrices in noise,” in Information Theory Proceedings (ISIT), 2012 IEEE International Symposium on (IEEE, 2012) pp. 1246–1250.
  • Lesieur et al. (2015b) Thibault Lesieur, Florent Krzakala, and Lenka Zdeborová, “MMSE of probabilistic low-rank matrix estimation: Universality with respect to the output channel,” in Communication, Control, and Computing (Allerton), 2015 53rd Annual Allerton Conference on (IEEE, 2015) pp. 680–687.
  • Antenucci et al. (2018) Fabrizio Antenucci, Florent Krzakala, Pierfrancesco Urbani,  and Lenka Zdeborová, “Approximate survey propagation for statistical inference,”  (2018), arXiv:1807.01296.
  • Braunstein et al. (2005) Alfredo Braunstein, Marc Mézard,  and Riccardo Zecchina, “Survey propagation: An algorithm for satisfiability,” Random Structures & Algorithms 27, 201–226 (2005).
  • Franz and Parisi (1995) Silvio Franz and Giorgio Parisi, “Recipes for metastable states in spin glasses,” Journal de Physique I 5, 1401–1415 (1995).
  • Castellani and Cavagna (2005) Tommaso Castellani and Andrea Cavagna, “Spin glass theory for pedestrians,” Journal of Statistical Mechanics: Theory and Experiment 2005, P05012 (2005).
  • (32) Notice the analogy of the high-magnetization state here with the crystal state in the physics of glasses.
  • Monasson (1995) Rémi Monasson, “Structural glass transition and the entropy of the metastable states,” Phys. Rev. Lett. 75, 2847–2850 (1995).
  • Gardner (1985) Ed Gardner, “Spin glasses with p-spin interactions,” Nuclear Physics B 257, 747–765 (1985).
  • Gross et al. (1985) D. J. Gross, Ido Kanter, and Haim Sompolinsky, “Mean-field theory of the Potts glass,” Physical Review Letters 55, 304 (1985).
  • Montanari et al. (2004) Andrea Montanari, Giorgio Parisi,  and Federico Ricci-Tersenghi, “Instability of one-step replica-symmetry-broken phase in satisfiability problems,” Journal of Physics A: Mathematical and General 37, 2073 (2004).
  • Montanari and Ricci-Tersenghi (2003) Andrea Montanari and Federico Ricci-Tersenghi, “On the nature of the low-temperature phase in discontinuous mean-field spin glasses,” The European Physical Journal B-Condensed Matter and Complex Systems 33, 339–346 (2003).
  • Crisanti et al. (2005) Andrea Crisanti, Luca Leuzzi,  and Tommaso Rizzo, “Complexity in mean-field spin-glass models: Ising p-spin,” Physical Review B 71, 094202 (2005).
  • Ricci-Tersenghi et al. (2018) Federico Ricci-Tersenghi, Guilhem Semerjian, and Lenka Zdeborová, “Typology of phase transitions in Bayesian inference problems,” arXiv preprint arXiv:1806.11013 (2018).
  • Baldassi et al. (2016) Carlo Baldassi, Christian Borgs, Jennifer T Chayes, Alessandro Ingrosso, Carlo Lucibello, Luca Saglietti,  and Riccardo Zecchina, “Unreasonable effectiveness of learning neural networks: From accessible states and robust ensembles to basic algorithmic schemes,” Proceedings of the National Academy of Sciences 113, E7655–E7662 (2016).
  • Parisi and Zamponi (2010) Giorgio Parisi and Francesco Zamponi, “Mean-field theory of hard sphere glasses and jamming,” Rev. Mod. Phys. 82, 789–845 (2010).
  • Charbonneau et al. (2017) Patrick Charbonneau, Jorge Kurchan, Giorgio Parisi, Pierfrancesco Urbani,  and Francesco Zamponi, “Glass and jamming transitions: From exact results to finite-dimensional descriptions,” Annual Review of Condensed Matter Physics 8, 265–288 (2017).
  • Frisch and Percus (1999) H. L. Frisch and J. K. Percus, “High dimensionality as an organizing device for classical fluids,” Physical Review E 60, 2942 (1999).
  • Mari and Kurchan (2011) Romain Mari and Jorge Kurchan, “Dynamical transition of glasses: From exact to approximate,” The Journal of Chemical Physics 135, 124504 (2011).