# The closest vector problem and the zero-temperature p-spin landscape for lossy compression

We consider a high-dimensional random constrained optimization problem in which a set of binary variables is subjected to a linear system of equations. The cost function is a simple linear cost, measuring the Hamming distance with respect to a reference configuration. Despite its apparent simplicity, this problem exhibits a rich phenomenology. We show that different situations arise depending on the random ensemble of linear systems. When each variable is involved in at most two linear constraints, we show that the problem can be partially solved analytically; in particular, we show that upon convergence, the zero-temperature limit of the cavity equations returns the optimal solution. We then study the geometrical properties of more general random ensembles. In particular, we observe a range of densities of constraints at which the system enters a glassy phase where the cost function has many minima. Interestingly, the algorithmic performances are only sensitive to another phase transition affecting the structure of configurations allowed by the linear constraints. We also extend our results to variables belonging to GF(q), the Galois field of order q. We show that increasing the value of q allows one to achieve a better optimum, which is confirmed by the Replica Symmetric cavity method predictions.

02/20/2019


## I Introduction

The Closest Vector Problem (CVP) is a constrained optimization problem in which the objective is to find the vector in a high-dimensional lattice that is closest to a given reference vector (in principle, external to the lattice). In this work we will study lattices defined by linear subspaces of $\mathrm{GF}(q)^n$, the $n$-dimensional vector space over the Galois field of order $q$. In particular for the case $q=2$ (which we will mainly study), the lattice is defined as the solution set of a XORSAT instance, and the CVP can also be recast as the problem of finding the ground state of an Ising spin glass model with external fields.

CVP is a fundamental problem in theoretical computer science: some versions of CVP were among the first problems for which an equivalence between worst-case and average-case complexity was shown. Such an equivalence has been exploited to propose a robust cryptosystem [1]. Nonetheless the CVP has been proven to be computationally NP-hard [2] and hard to approximate, even when a potentially slow pre-processing step is allowed [23, 15].

The main motivation for studying this model is that it is an ideal framework to understand, via statistical mechanics computations, the mechanisms underlying the hardness of approximating optimal solutions. As we will see, it shows a wide range of non-trivial properties in the geometry of the solution landscape, while (partially) conserving some analytical tractability coming from the fact that its configurations are solutions of a linear system in $\mathrm{GF}(2)$.

Surprisingly, we will show that approximating optimal solutions is hard even in a region of parameters where the space of solutions is “well connected”. In this region the model corresponding to the uniform measure over solutions looks nice enough (e.g. it is in a paramagnetic phase, correlations decay fast, and a replica-symmetric solution describes the Gibbs measure well). Nonetheless, when we add the external field to search for the closest solution to a reference vector, the scenario changes dramatically: ergodicity-breaking phase transitions take place and the problem becomes very hard.

Given that the search space is always the same (solutions to a XORSAT constraint satisfaction problem, CSP), the addition of the external field can be seen as a reweighting of the CSP solution space. It is well known that the reweighting of the solution space can induce ergodicity-breaking phase transitions [27] and change the location of the phase transitions [9, 10]. In the present model we are going to show how important the effects of the reweighting can be, and how they can affect algorithms searching for optimal solutions, which are relevant in several common applications.

### I.1 Compression

The CVP has a straightforward application to the lossy compression of a symmetric binary source (source coding). In this context, the compression task is to take an input $\underline{y} \in \{0,1\}^n$ and to reduce it into a compressed version $\underline{x} \in \{0,1\}^m$ with $m < n$. The decompression task transforms $\underline{x}$ into an estimate $\hat{\underline{y}} \in \{0,1\}^n$. The distortion is defined as the (normalized) Hamming distance $d_H(\underline{y}, \hat{\underline{y}})$, i.e. the fraction of components in which the two vectors differ. A good compression scheme is designed to result in the smallest possible distortions, and the performance of the scheme can be measured in terms of the average distortion on random binary inputs.

As an example, one trivial compression scheme consists in truncating the input to its first $m$ components (compression) and reconstructing the last $n-m$ components at random (decompression). Of course, it is possible to do much better. In fact, the best possible performance for a given input distribution has been characterized by Shannon [32, 30, 31] thanks to a duality with the channel coding problem. The smallest achievable average distortion $D$ at a given rate on a binary symmetric source is given by the equation

$$D = H^{-1}(1-R) \tag{1}$$

where $R = m/n$ is called the compression rate, and where

$$H(p) = -p\log_2(p) - (1-p)\log_2(1-p)$$

is the binary entropy function and $H^{-1}$ is its inverse (restricted to $[0,1/2]$). Interestingly, a theoretically asymptotically optimal (but computationally inefficient) scheme is formed by random codes. Random codes are constructed as follows: choose $2^{nR}$ vectors uniformly at random in $\{0,1\}^n$. The compression scheme consists in finding the codebook vector that is closest to the input $\underline{y}$; the binary representation of its index will be the compressed vector. When $n \to \infty$ with fixed compression rate $R$, the average distortion falls on the optimal line (1) [32]. As it happens with the dual channel coding problem, a computationally more efficient alternative can be constructed by replacing the random vectors by the solutions of a random linear system (in a discrete space), and this is the approach that has been taken in [5] and that we will take here. In particular for $q=2$, the codewords correspond to the solution set of instances of a random XORSAT ensemble.
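The rate-distortion bound (1) and the random-code scheme above are easy to check numerically. The following is a minimal illustrative sketch (not from the paper; function names and the small sizes are ours): it inverts the binary entropy by bisection and measures the average distortion of a small random codebook.

```python
import numpy as np

def binary_entropy(p):
    """H(p) = -p log2 p - (1-p) log2(1-p), with H(0) = H(1) = 0."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * np.log2(p) - (1 - p) * np.log2(1 - p)

def shannon_distortion(R, tol=1e-10):
    """Invert D = H^{-1}(1 - R) by bisection on [0, 1/2], where H is increasing."""
    lo, hi = 0.0, 0.5
    target = 1.0 - R
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if binary_entropy(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def random_code_distortion(n, R, n_sources=200, seed=0):
    """Average distortion of a random codebook with 2^{nR} codewords."""
    rng = np.random.default_rng(seed)
    codebook = rng.integers(0, 2, size=(2 ** int(n * R), n))
    total = 0.0
    for _ in range(n_sources):
        y = rng.integers(0, 2, size=n)
        total += np.min(np.sum(codebook != y, axis=1)) / n
    return total / n_sources
```

For $R=1/2$ the bound gives $D \approx 0.110$; small random codebooks stay above it, since the optimality of random codes only sets in as $n \to \infty$.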

### I.2 Relation with previous works

Let $H$ be an $m \times n$ matrix with binary entries, and let $\underline{b}$ be an $m$-component binary vector. An instance of the XORSAT problem is given by a pair $(H, \underline{b})$, the solution set of this instance being the set of vectors $\underline{x} \in \{0,1\}^n$ satisfying the equation $H\underline{x} = \underline{b}$ modulo 2. The random XORSAT ensemble is defined by taking $\underline{b}$ uniformly at random in $\{0,1\}^m$, and by taking $H$ from some random matrix ensemble. In this paper, we will study the ensemble in which $H$ is sampled uniformly over the set of matrices having a prescribed distribution of the number of non-zero entries per row and per column. The random XORSAT problem has been studied extensively in the past [21, 11]. A striking feature is the appearance of phase transitions, or threshold phenomena, in the thermodynamic limit, when the number of variables $n$ and the number of linear constraints $m$ go to infinity at a fixed value of the ratio $\alpha = m/n$, the density of constraints per variable. For instance the satisfiability threshold $\alpha_{\rm sat}$ separates a satisfiable regime, where random XORSAT instances typically admit solutions, from an unsatisfiable phase where no solution typically exists. Another transition occurs in the satisfiable phase at $\alpha_{\rm clust} < \alpha_{\rm sat}$, called the clustering transition. Below $\alpha_{\rm clust}$ the solution set of typical instances is rather well connected: any solution can be reached from any other through a path of nearby solutions. Above $\alpha_{\rm clust}$ the solution set splits into an exponential number of distinct groups of solutions, called clusters, which are internally well connected, but well separated from one another. This transition also manifests itself in the appearance of a specific type of correlations between variables, known as point-to-set correlations, under the uniform probability measure over the set of solutions. These correlations forbid the rapid equilibration of stochastic processes that respect the detailed balance condition [24], which justifies the alternative ‘dynamic’ name of the clustering transition.

In [5], a Belief Propagation (BP) scheme was employed on an ensemble of linear codes called cycle codes. These cycle codes correspond to systems of linear equations (in $\mathrm{GF}(2)$) in which each variable participates in at most two equations. It has been observed that the performance of BP improves upon adding some leaves (variables of degree one) to the linear system, but not too many of them. In this work, we compute the analytically achievable performance of such codes through the cavity method, and provide a rigorous proof of the exactness of the zero-temperature version of the cavity equations on cycle codes. We then extend our results to codes with higher degrees, and show that this allows one to improve the performance of lossy compression. We study the clustering transition for the constrained optimization problem CVP, and show that its clustering threshold arises at a density of constraints smaller than the clustering threshold associated to the random XORSAT problem defining the set of constraints. We also study the performances of message-passing algorithms designed to solve this constrained optimization problem. Interestingly, we observe that these algorithms are not affected by the clustering transition associated to the constrained optimization problem, but instead are only affected by the XORSAT clustering transition.

### I.3 Relation between sparse basis and clustering

Given a basis of the solution space, we define the weight of the basis as the maximum Hamming weight of its elements. We can now establish a fundamental relation between a geometrical property of the solution space and the weight of the sparsest basis: for a linear system in $\mathrm{GF}(2)$, having a basis with weight $w$ is equivalent to having all solutions of the system connected to each other by ‘paths’ of solutions whose consecutive points are at Hamming distance at most $w$. This can be proven easily. Suppose indeed that we have a basis $(\underline{b}^1, \dots, \underline{b}^K)$ in which each element has Hamming weight at most $w$. Take any two solution vectors $\underline{x}, \underline{x}'$, and write their difference in the basis: $\underline{x} \oplus \underline{x}' = \underline{b}^{j_1} \oplus \dots \oplus \underline{b}^{j_k}$. Construct the sequence $\underline{z}^{(0)} = \underline{x}$, $\underline{z}^{(t)} = \underline{z}^{(t-1)} \oplus \underline{b}^{j_t}$, for $t = 1, \dots, k$. The difference between two consecutive points in the sequence is a basis element, so it has weight at most $w$, and the sequence forms a path from $\underline{x}$ to $\underline{x}'$. Conversely, suppose that all solutions can be connected to the null vector with paths whose jumps have Hamming distance at most $w$. The set of all the jump vectors thus spans the full set of solutions. It is then possible to extract a solution basis from this set, and all elements of the basis will have Hamming weight at most $w$. In summary, the weight of the sparsest basis determines the largest separation between “clusters” or groups of solutions: if the weight of the sparsest basis is $w$, then there will be at least two subsets of solutions separated by Hamming distance $w$.
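The constructive direction of the argument above can be checked directly in a few lines. This is a sketch of ours (the brute-force search over basis coefficients is only sensible for small bases): given two solutions and a basis, it builds a path whose jumps are single basis elements, so each jump has Hamming weight bounded by the basis weight.

```python
import numpy as np
from itertools import product

def path_between_solutions(x, x_prime, basis):
    """Return a path x = z_0, z_1, ..., z_k = x_prime in which consecutive
    points differ by exactly one basis element (all arithmetic over GF(2))."""
    basis = np.array(basis) % 2
    diff = np.array(x) ^ np.array(x_prime)
    k = len(basis)
    # Find coefficients c with x XOR x' = sum_j c_j b_j (mod 2) by brute
    # force over the 2^k combinations.
    for c in product([0, 1], repeat=k):
        if np.array_equal(np.mod(np.dot(c, basis), 2), diff):
            break
    else:
        raise ValueError("x and x' do not differ by an element of the span")
    # Walk: flip one basis element at a time.
    path = [np.array(x)]
    for j in range(k):
        if c[j]:
            path.append(path[-1] ^ basis[j])
    return path
```

For instance, with the basis $\{(1,1,1,0), (0,0,0,1)\}$ (weight 3), the two solutions $(0,0,0,0)$ and $(1,1,1,1)$ are connected by a path with jumps of Hamming weight at most 3.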

### I.4 Organization

The paper is organized as follows. In section II we define more precisely the constrained optimization problem under study and the statistical physics model associated to it. We present the equations that describe its behavior in the framework of the cavity method from statistical mechanics, and the algorithms that we used to solve its instances. In section III we study the case of cycle codes, i.e. when the binary variables are involved in at most two linear constraints. In this particular case we prove that, when it converges, the Max-Sum algorithm finds the optimal solution. Moreover, we design an exact greedy algorithm, that we call GO for ‘greedy optimal’, that is guaranteed to converge to the optimal solution. In section IV, we study a random ensemble in which the binary variables can be involved in more than two linear constraints, and we show that in this ensemble the average minimal distortion is smaller than the one for cycle codes, thus providing a better code for lossy compression. We show that the CVP undergoes a clustering transition before the clustering transition associated to the XORSAT problem representing the constraints. We study the behavior of three algorithms: Belief Propagation with decimation, Max-Sum with reinforcement, and Survey Propagation with decimation, and show that their performances are only affected by the clustering transition associated to the XORSAT problem. In section V, we provide a more detailed picture of the phase diagram obtained in section IV. In particular we perform a finite-temperature study to relate the two clustering transitions, which correspond respectively to the cases of zero and infinite temperature. We also argue in favor of a full-RSB transition occurring at higher densities of constraints, which prevented us from obtaining a reliable prediction of the minimal distortion in this regime. In appendix A we present the results obtained with variables in $\mathrm{GF}(q)$, the Galois field of order $q$.
We show that cycle codes with higher values of $q$ allow one to achieve a smaller average minimal distortion, thus providing better codes for lossy compression. This trend is confirmed by the zero-temperature Replica Symmetric prediction. We argue however in favor of an RSB transition as $q$ increases, which could prevent an efficient compression scheme when $q$ becomes large. The RS and 1RSB formalisms specialized to the CVP are given in appendix B. In appendix C we show that for cycle codes with binary variables, it is possible to build a basis whose weight is upper bounded by the size of the minimal rearrangements computed in [26] by A. Montanari and G. Semerjian. When this upper bound remains finite in the thermodynamic limit, this allows us to conclude from the discussion in I.3 that the solution set of XORSAT is well connected.

## II Definition of the model and statistical mechanics formalism

### II.1 Definition of the model

The constrained optimization problem can be formulated as follows. Given a reference vector $\underline{y} \in \{0,1\}^n$ and a linear subspace $C \subseteq \{0,1\}^n$, find a vector $\hat{\underline{x}} \in C$ that is the closest to $\underline{y}$:

$$\hat{\underline{x}} = \operatorname*{argmin}_{\underline{x} \in C} d_H(\underline{x}, \underline{y}) \tag{2}$$

where

$$d_H(\underline{x}, \underline{y}) = \frac{1}{n} \sum_{i=1}^{n} x_i \oplus y_i \tag{3}$$

is the (normalized) Hamming distance. The linear subspace $C$ is defined as the solution set of a homogeneous XORSAT instance: let $H$ be an $m \times n$ matrix with Boolean entries $H_{ai} \in \{0,1\}$, $a = 1, \dots, m$, $i = 1, \dots, n$; then:

$$C = \{\underline{x} \in \{0,1\}^n : H\underline{x} = \underline{0}\}\,. \tag{4}$$

Here homogeneous means that the r.h.s. of the linear system is equal to the null vector $\underline{0}$. One can encode the topology of a XORSAT instance into a bipartite graph $G = (V, F, E)$. The set $V$ of variable nodes represents the $n$ binary variables. The set $F$ of factor nodes represents the $m$ constraints encoded in the rows of $H$. An edge $(i,a)$ is drawn if variable $i$ is involved in the $a$-th constraint: $H_{ai} = 1$. By means of a simple change of variables,

$$\sigma_i = (-1)^{x_i}\,, \qquad s_i = (-1)^{y_i} \tag{5}$$

the problem can be re-written as a statistical physics model. We define the probability law:

$$\mu(\underline{\sigma}) = \frac{1}{Z(\beta)} \left( \prod_{a=1}^{m} \mathbb{I}\Big[\prod_{i \in \partial a} \sigma_i = 1\Big] \right) e^{\beta \sum_{i=1}^{n} s_i \sigma_i} \tag{6}$$

with $\mathbb{I}[A]$ being the indicator function of the event $A$. The problem (2) is then equivalent to finding the configuration maximizing the probability law $\mu$, or equivalently minimizing the energy function

$$E(\underline{\sigma}) = -\sum_{i=1}^{n} \sigma_i s_i \tag{7}$$

under the set of constraints $\prod_{i \in \partial a} \sigma_i = 1$. Note that $E$ can be related to the distortion as follows: $E(\underline{\sigma}) = n\,(2\, d_H(\underline{x}, \underline{y}) - 1)$.
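The energy-distortion relation follows in one line from the change of variables (5), with $d_H$ the normalized distance defined in (3):

```latex
\sigma_i s_i = (-1)^{x_i}(-1)^{y_i} = (-1)^{x_i \oplus y_i} = 1 - 2\,(x_i \oplus y_i)
\;\Longrightarrow\;
E(\underline{\sigma}) = -\sum_{i=1}^{n} \sigma_i s_i
= 2 \sum_{i=1}^{n} (x_i \oplus y_i) - n
= n \bigl( 2\, d_H(\underline{x}, \underline{y}) - 1 \bigr).
```

Minimizing the energy under the constraints is therefore equivalent to minimizing the distortion, with $d_H = (1 + E/n)/2$.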

It will be convenient to define a softened version of the probability law $\mu$, obtained by replacing the hard constraints on the factors by a soft constraint of strength $J$, with $J$ a real parameter:

$$\mu_J(\underline{\sigma}) = \frac{1}{Z(\beta, J)}\, e^{-\beta E_J(\underline{\sigma})} \tag{8}$$

where:

$$E_J(\underline{\sigma}) = -J \sum_{a=1}^{m} \Big( \prod_{i \in \partial a} \sigma_i - 1 \Big) - \sum_{i=1}^{n} \sigma_i s_i\,. \tag{9}$$

The first term brings an energetic cost $2J$ to each unsatisfied clause, while the second term is the original energy function, which favors configurations close to the source. In statistical physics, this model is known as a spin glass model in the presence of heterogeneous external fields $s_i$. Sending $J \to \infty$ allows one to recover the probability law defined in (6).

### II.2 Random ensemble of instances

We will be interested in the characterization of the ‘typical’ properties of this constrained optimization problem, where typical is defined with respect to a random ensemble of instances, a property being considered typical if it occurs with a probability going to one in the thermodynamic (large size) limit. In particular, we will consider random external fields in which each $s_i$ is drawn i.i.d. uniformly in $\{-1,+1\}$. As we have seen in the previous subsection, the set of constraints can be represented by a bipartite graph $G$. We will consider random graph ensembles with fixed degree profiles. Let $\lambda = (\lambda_1, \dots, \lambda_{d_{\max}})$ be the degree profile of the variable nodes, with $d_{\max}$ the maximal degree and $\lambda_i$ the fraction of variable nodes of degree $i$. Respectively, let $p = (p_1, \dots, p_{k_{\max}})$ be the degree profile of the factor nodes, with $k_{\max}$ the maximal degree and $p_i$ the fraction of factor nodes of degree $i$. The degree profiles are normalized, $\sum_i \lambda_i = 1$ and $\sum_i p_i = 1$, and they satisfy the following relation

$$m \sum_{i=1}^{k_{\max}} i\, p_i = n \sum_{i=1}^{d_{\max}} i\, \lambda_i = |E|\,.$$

We will be interested in the thermodynamic limit $n, m \to \infty$, with a fixed ratio $\alpha = m/n$, and fixed fractions $\lambda_i$, $p_i$ independent of $n$. The ratio $\alpha$ is called the density of constraints per variable and is related to the degree profiles as follows: $\alpha = \sum_i i \lambda_i / \sum_i i p_i$. In the thermodynamic limit, random graphs extracted from this ensemble are locally tree-like: the neighborhood of a uniformly chosen vertex within any finite distance is acyclic, with probability going to $1$. Note that in the formalism of lossy compression, the compression rate can be expressed in terms of the degree profiles:

$$R = 1 - \alpha = 1 - \frac{\sum_{i=1}^{d_{\max}} i\, \lambda_i}{\sum_{i=1}^{k_{\max}} i\, p_i} \tag{10}$$

The equivalence between the CVP and the spin glass model with external fields allows us to apply the cavity method. This method was first developed in the context of the statistical physics of disordered systems, and has later been applied to random Constraint Satisfaction Problems. The aim of the cavity method is to characterize the properties of the probability measure (6), for typical random graphs and realizations of the external fields, in the thermodynamic limit. In particular, we will be interested in the zero-temperature limit of the cavity method, at which the probability measure (6) concentrates on the configurations satisfying the constraints and achieving the minimal energy. A simplified version of the cavity method, especially of the 1RSB formalism, first derived in [20, 22], can be obtained in this limit. We will also be interested in the finite-temperature (or finite-$\beta$) version of the cavity method (see V.2). We give the details of the cavity method applied to the CVP in appendix B.

### II.3 BP equations and Bethe free-energy

Belief Propagation (BP) is a method that allows one to study the properties of the measure defined in (6) on a single instance, at finite inverse temperature $\beta$. When the bipartite graph representing the constraints is a tree, this method is exact, and allows one to compute the partition function $Z(\beta)$, as well as the marginal probabilities $\mu_i(\sigma_i)$ of any variable $i$. In practice, the BP method is also used as a heuristic on random sparse instances. For each variable node $i$, we denote by $\partial i$ the set of factor nodes connected to $i$, and similarly for each factor node $a$ we denote by $\partial a$ the set of variable nodes connected to $a$. We introduce the Belief-Propagation (BP) messages $m_{i \to a}$ and $\hat{m}_{a \to i}$ on each edge $(i,a)$ as marginal probability laws of $\sigma_i$ in amputated graphs where some interactions are discarded: $m_{i \to a}$ is the marginal of $\sigma_i$ when the hyperedge $a$ is removed, and $\hat{m}_{a \to i}$ is the marginal of $\sigma_i$ when one removes all the hyperedges in $\partial i \setminus a$. The BP messages obey the following recursive equations:

$$m_{i \to a}(\sigma_i) = \frac{1}{z_{i \to a}}\, e^{\beta \sigma_i s_i} \prod_{b \in \partial i \setminus a} \hat{m}_{b \to i}(\sigma_i)\,, \qquad \hat{m}_{a \to i}(\sigma_i) = \frac{1}{\hat{z}_{a \to i}} \sum_{\underline{\sigma}_{\partial a \setminus i}} \mathbb{I}\Big[\prod_{j \in \partial a} \sigma_j = 1\Big] \prod_{j \in \partial a \setminus i} m_{j \to a}(\sigma_j) \tag{11}$$

where $z_{i \to a}, \hat{z}_{a \to i}$ are normalization factors. One can compute the marginal probability of $\sigma_i$ from the solution of the above set of equations:

$$\mu_i(\sigma_i) = \frac{1}{z_i}\, e^{\beta s_i \sigma_i} \prod_{a \in \partial i} \hat{m}_{a \to i}(\sigma_i) \tag{12}$$

The free energy can be expressed in terms of the BP messages using the Bethe formula:

$$F_{\mathrm{Bethe}}(\underline{m}, \underline{\hat{m}}) = \sum_{(i,a) \in E} \frac{1}{\beta} \log Z^{ia}(m_{i \to a}, \hat{m}_{a \to i}) - \sum_{a=1}^{m} \frac{1}{\beta} \log Z^{a}(\{m_{i \to a}\}_{i \in \partial a}) - \sum_{i=1}^{n} \frac{1}{\beta} \log Z^{i}(\{\hat{m}_{a \to i}\}_{a \in \partial i}) \tag{13}$$

where $Z^{a}$, $Z^{i}$, $Z^{ia}$ are defined as follows:

$$Z^{a} = \sum_{\underline{\sigma}_{\partial a}} \mathbb{I}\Big[\prod_{i \in \partial a} \sigma_i = 1\Big] \prod_{i \in \partial a} m_{i \to a}(\sigma_i) \tag{14}$$
$$Z^{i} = \sum_{\sigma_i} e^{\beta s_i \sigma_i} \prod_{a \in \partial i} \hat{m}_{a \to i}(\sigma_i) \tag{15}$$
$$Z^{ia} = \sum_{\sigma_i} m_{i \to a}(\sigma_i)\, \hat{m}_{a \to i}(\sigma_i) \tag{16}$$

Finally, one can also compute the average energy in terms of BP beliefs:

$$\langle E(\underline{\sigma}) \rangle_{\mu} = -\sum_{i=1}^{n} \sum_{\sigma_i} s_i \sigma_i\, \mu_i(\sigma_i) \tag{17}$$
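For parity (XOR) factors the BP updates (11) take a particularly compact form when messages are stored as ‘magnetizations’ $m(+1) - m(-1)$: the factor-to-variable update reduces to a product of incoming magnetizations (the standard tanh rule for parity checks). The following sweep is a minimal sketch of ours, not the authors' implementation; the graph encoding and function names are assumptions:

```python
import numpy as np

def bp_sweep(factors, s, beta, h_msg):
    """One parallel sweep of the BP equations (11) for XOR factors.
    factors: dict factor -> list of variable indices.
    s: dict of external fields s_i in {-1, +1}.
    h_msg: variable-to-factor messages as magnetizations m(+1) - m(-1),
    keyed by (i, a).  Returns updated (h_msg, u_msg)."""
    # Factor-to-variable: product of incoming magnetizations.
    u_msg = {}
    for a, vs in factors.items():
        for i in vs:
            prod = 1.0
            for j in vs:
                if j != i:
                    prod *= h_msg[(j, a)]
            u_msg[(a, i)] = prod
    # Variable-to-factor: m(sigma) ∝ exp(beta*sigma*s_i) * prod (1 + sigma*u)/2.
    neighbors = {}
    for a, vs in factors.items():
        for i in vs:
            neighbors.setdefault(i, []).append(a)
    new_h = {}
    for i, facts in neighbors.items():
        for a in facts:
            up, dn = np.exp(beta * s[i]), np.exp(-beta * s[i])
            for b in facts:
                if b != a:
                    up *= (1.0 + u_msg[(b, i)]) / 2.0
                    dn *= (1.0 - u_msg[(b, i)]) / 2.0
            new_h[(i, a)] = (up - dn) / (up + dn)
    return new_h, u_msg
```

On a single parity factor over two variables with fields $s_i = +1$, the variable-to-factor magnetizations converge to $\tanh(\beta)$, as expected for an isolated spin in a field.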

### II.4 Zero-temperature limit: Max-Sum equations and Bethe energy

The Max-Sum (MS) equations can be seen as the zero-temperature limit of the BP equations. The goal here is to describe the set of configurations that maximize the probability (6), i.e. that solve the constrained optimization problem (2). We define the Max-Sum messages as:

$$h_{i \to a} = \lim_{\beta \to \infty} \frac{1}{2\beta} \big( \log m_{i \to a}(+1) - \log m_{i \to a}(-1) \big)\,, \qquad u_{a \to i} = \lim_{\beta \to \infty} \frac{1}{2\beta} \big( \log \hat{m}_{a \to i}(+1) - \log \hat{m}_{a \to i}(-1) \big) \tag{18}$$

Using this definition and the BP equations we get the following MS equations (associated to the probability law defined in (6) for hard constraints):

$$h_{i \to a} = s_i + \sum_{b \in \partial i \setminus a} u_{b \to i}\,, \qquad u_{a \to i} = \mathrm{sign}\Big( \prod_{j \in \partial a \setminus i} h_{j \to a} \Big) \min_{j \in \partial a \setminus i} \big( |h_{j \to a}| \big) \tag{19}$$

Once a solution to the MS equations is found, one can compute the Max-Sum belief $h_i$:

$$h_i = \lim_{\beta \to \infty} \frac{1}{2\beta} \big( \log b_i(+1) - \log b_i(-1) \big) = s_i + \sum_{a \in \partial i} u_{a \to i} \tag{20}$$

which corresponds, in the case of a tree, to the difference in energy $(E_i(-1) - E_i(+1))/2$, where $E_i(\sigma)$ is the ground-state energy when $\sigma_i$ is fixed to the value $\sigma$. Since the energy function takes only integer values, one can deduce that the Max-Sum messages satisfying equations (19), and the Max-Sum beliefs (corresponding to differences in energy), also take integer values. Replacing the hard constraints by soft constraints is equivalent to introducing a cut-off $J$ on the absolute values of the factor-to-variable messages $u_{a \to i}$. The variable-to-factor messages then take values in a bounded range. Note that the MS equation (19) for $u_{a \to i}$ is replaced by

$$u_{a \to i} = \mathrm{sign}\Big( \prod_{j \in \partial a \setminus i} h_{j \to a} \Big) \min\Big( \min_{j \in \partial a \setminus i} |h_{j \to a}|\,,\ J \Big) \tag{21}$$

One can compute the minimal energy as the large-$\beta$ limit of the free energy (13):

$$E_{\mathrm{Bethe}}(\underline{h}, \underline{u}) = \sum_{a=1}^{m} E_a(\{h_{i \to a}\}_{i \in \partial a}) + \sum_{i=1}^{n} E_i(\{u_{a \to i}\}_{a \in \partial i}) + \sum_{(i,a) \in E} E_{ia}(h_{i \to a}, u_{a \to i})\,, \tag{22}$$

which is exact when the factor graph is a tree. In the above expression, $E_a$, $E_i$, $E_{ia}$ are defined as follows:

$$E_a(\{h_{i \to a}\}_{i \in \partial a}) = 2 \min_{i \in \partial a} \big( |h_{i \to a}| \big)\, \Theta\Big( \prod_{i \in \partial a} h_{i \to a} \Big)$$
$$E_i(\{u_{a \to i}\}_{a \in \partial i}) = -\Big| s_i + \sum_{a \in \partial i} u_{a \to i} \Big| + \sum_{a \in \partial i} |u_{a \to i}|$$
$$E_{ia}(h_{i \to a}, u_{a \to i}) = -|u_{a \to i} + h_{i \to a}| + |u_{a \to i}| + |h_{i \to a}| \tag{23}$$

With soft constraints, i.e. finite $J$, the factor contribution to the Bethe minimal energy (22) is replaced by

$$E_a(\{h_{i \to a}\}_{i \in \partial a}) = 2 \min\Big( J,\ \min_{i \in \partial a} |h_{i \to a}| \Big)\, \Theta\Big( \prod_{i \in \partial a} h_{i \to a} \Big)\,. \tag{24}$$
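One parallel sweep of the MS updates (19), with the optional cut-off of (21), can be sketched as follows (illustrative names and graph encoding, not the authors' code):

```python
import numpy as np

def ms_sweep(factors, s, h_msg, J=np.inf):
    """One parallel sweep of the Max-Sum equations (19); J < inf applies
    the soft-constraint cut-off of equation (21)."""
    # Factor-to-variable messages, eq. (21).
    new_u = {}
    for a, vs in factors.items():
        for i in vs:
            others = [h_msg[(j, a)] for j in vs if j != i]
            sign = np.prod(np.sign(others)) if others else 1.0
            mag = min(min(abs(h) for h in others), J) if others else 0.0
            new_u[(a, i)] = sign * mag
    # Variable-to-factor messages, eq. (19).
    neighbors = {}
    for a, vs in factors.items():
        for i in vs:
            neighbors.setdefault(i, []).append(a)
    new_h = {}
    for i, facts in neighbors.items():
        for a in facts:
            new_h[(i, a)] = s[i] + sum(new_u[(b, i)] for b in facts if b != a)
    return new_h, new_u
```

On a single parity factor over two spins with opposite fields, the factor message transmits the (signed) bias of the other spin, and the resulting beliefs are degenerate, as expected: satisfying both fields is impossible and either spin can be flipped at the same cost.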

### II.5 Decimation

The output of the BP algorithm is just an estimate of the single-site marginals; to find a solution to the optimization problem, one needs to convert these marginals into a specific spin configuration. However, note that picking $\sigma_i^* = \operatorname{argmax}_{\sigma_i} \mu_i(\sigma_i)$ does not lead to a good result in general, as it disregards existing correlations between variables (e.g. in problems with hard constraints this strategy can lead to inconsistencies, since in general the configuration $\underline{\sigma}^*$ does not satisfy the constraints). To overcome this issue one typically resorts to decimation, i.e. a sequential assignment of the variables according to their beliefs.

In practice, we use the fact that the set of XORSAT constraints is a linear system of equations to improve our algorithm. We first build a basis for this linear system (e.g. by means of Gaussian elimination), thereby identifying a subset of independent variables. The decimation procedure is then applied only to these independent variables. Once all independent variables are fixed, the remaining variables are determined by the linear constraints, thus ensuring that we obtain a solution to the linear system. At each time step, the algorithm solves iteratively the Belief Propagation equations (11) and computes the marginal probabilities of each variable. Then the algorithm picks the most biased variable among the independent variables that are not yet decimated, samples a value according to its marginal, and switches on a strong external field in the corresponding direction, in such a way that the variable is now fixed in the direction of its belief. We worked with a sufficiently large value of $\beta$, such that the measure (6) is reasonably concentrated around the configurations achieving the minimal energy. In the limit $\beta \to \infty$ (Max-Sum equations), the measure is fully concentrated on the configurations of minimal energy. If the minimum is not unique, decimation is still needed to break the symmetry between equivalent ground states. Alternatively, one can add a small symmetry-breaking random external field so that the ground state becomes unique. This is the strategy we adopted for Max-Sum.
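The split into independent (free) and dependent (pivot) variables used by the decimation procedure amounts to Gaussian elimination over GF(2). A minimal helper of ours (not the authors' code) illustrating the idea:

```python
import numpy as np

def independent_variables(H):
    """Row-reduce H over GF(2); return (pivot_cols, free_cols).
    The free columns are the independent variables to be decimated; the
    pivot (dependent) variables then follow from the linear constraints.
    The solution set has size 2 ** len(free_cols)."""
    H = np.array(H, dtype=np.uint8) % 2
    m, n = H.shape
    pivots = []
    r = 0
    for c in range(n):
        rows = [k for k in range(r, m) if H[k, c]]
        if not rows:
            continue
        H[[r, rows[0]]] = H[[rows[0], r]]   # move a pivot row up
        for k in range(m):                  # eliminate column c elsewhere
            if k != r and H[k, c]:
                H[k] ^= H[r]
        pivots.append(c)
        r += 1
        if r == m:
            break
    free = [c for c in range(n) if c not in pivots]
    return pivots, free
```

For instance, for the two constraints $x_1 \oplus x_2 = 0$ and $x_2 \oplus x_3 = 0$, variable $x_3$ is free and fixing it determines $x_1$ and $x_2$.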

### II.6 Reinforcement

An alternative to decimation is reinforcement, which consists in updating the external field on each variable according to its belief, thus guiding the system towards full polarization. Reinforcement is also sometimes called soft decimation, as at each iteration it applies a soft external field to all variables instead of a strong field on only one variable.

The reinforcement procedure can also be employed to help the convergence of the MS equations: the small external fields accumulated over time (before convergence) drive the system towards a model with strong external fields, for which convergence is easier to achieve.

At each iteration $t$ of the Max-Sum algorithm (19), the external field on each variable is updated according to its belief:

$$s_i^{(t+1)} = s_i^{(t)} + \gamma(t)\, h_i^{(t)} \tag{25}$$

with $h_i^{(t)}$ the Max-Sum belief (20) computed at time $t$, and $\gamma(t)$ a user-defined reinforcement schedule. In practice we used the same value of $\gamma$ at each iteration.
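The update (25) itself is a one-liner; this sketch uses a constant schedule $\gamma$, in line with the remark above (names are ours):

```python
def reinforce_fields(s, h, gamma):
    """Reinforcement step (25): move each external field towards the
    current Max-Sum belief h_i at rate gamma.  Repeated applications
    accumulate and eventually polarize every variable."""
    return {i: s[i] + gamma * h[i] for i in s}
```

After many iterations the accumulated terms dominate the original fields, effectively freezing each variable in the direction of its belief.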

### II.7 Survey Propagation

In regions of the parameter space in which the 1RSB formalism is more appropriate, one can try to employ an iterative algorithm based on this description. One possibility is Survey Propagation, which, in a way completely analogous to the Belief Propagation algorithm, iterates the 1RSB equations (described later in (73)). Survey Propagation is complemented with a decimation procedure.

## III A simple case: cycle codes

We start our analysis with a family of linear systems called cycle codes. They correspond to systems of linear equations (in $\mathrm{GF}(2)$) in which each variable participates in at most two equations. In the graphical representation, a cycle code is a bipartite graph in which each variable node has degree at most two. This particular ensemble has a simple structure that allows us to provide exact results. In particular, we provide in III.2 a rigorous proof of the exactness of the Max-Sum solution. We also design a greedy optimal (GO) algorithm that is guaranteed to find the optimal solution (see III.3).

### III.1 Comparison of cavity predictions and algorithmic performances on single instances

We focus on a family of random graph ensembles with factor degree profile $p_k = 1 - p_{k+1}$, with $k$ a positive integer and $p_{k+1} \in [0,1]$. The variable degree profile is $\lambda_2 = 1$: each variable node has fixed degree $2$. The rate for this family of instances can be expressed as a function of $k$ and $p_{k+1}$:

$$R(k, p_{k+1}) = 1 - \frac{2}{k + p_{k+1}} \tag{26}$$
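Equation (26) follows from (10) with $\sum_i i\lambda_i = 2$ and mean factor degree $k + p_{k+1}$; a tiny helper (name is ours) makes it concrete:

```python
def rate(k, p_k1):
    """Compression rate (26) for variable degree 2 and factor degrees
    k (fraction 1 - p_k1) and k + 1 (fraction p_k1)."""
    return 1.0 - 2.0 / (k + p_k1)
```

Sweeping $p_{k+1}$ from 0 to 1 interpolates continuously between $R = 1 - 2/k$ and $R = 1 - 2/(k+1)$; e.g. $k = 4$ gives rates from $1/2$ to $3/5$.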

Varying $k$ and $p_{k+1}$ allows one to span a continuous range of rates. Fig. 1 shows the performance of the algorithms GO (red circles) and Max-Sum with reinforcement (green squares) for graphs with variables of degree 2, compared to the zero-temperature Replica Symmetric prediction (gray dashed line).

As in the rest of the paper, the results in Fig. 1 are presented in the rate-distortion plane, using the framework of lossy compression. The blue line corresponds to the exact rate-distortion bound given in equation (1), i.e. the minimal distortion achievable at a given rate $R$. The red line corresponds to the distortion achieved with the trivial compression strategy described in the introduction (see I.1). Note that the points relative to Max-Sum have a slightly larger distortion than the results of GO. This is due to the fact that Max-Sum does not converge on all instances, contrary to the algorithm GO, which provides the exact solution on all instances. In case of non-convergence, the strategy adopted was to split the variables into a set of independent and dependent variables, as explained in II.5. After some running time, although the Max-Sum algorithm solving (19) has not converged, one fixes the values of the independent variables according to their Max-Sum beliefs (20), and fixes the dependent variables in order to satisfy the set of linear constraints.

For rates up to a threshold value, we observe a good agreement between the zero-temperature Replica-Symmetric cavity prediction and the results of the algorithms (GO and Max-Sum). Indeed, we found numerically that the unique solution of the zero-temperature 1RSB equations (74) is the trivial RS solution (75) in this regime, which confirms that we are in the Replica Symmetric phase. For larger rates, there are several signs indicating that the zero-temperature RS solution is not correct: first, there is a discrepancy between its predicted average distortion and the results obtained with GO on large instances (see Fig. 2). At even larger rates, the distortion computed with the RS ansatz goes below the exact rate-distortion lower bound. We were able to obtain a physical solution by considering the finite-temperature cavity method described in appendix B.1 (see Fig. 2; the black points are a large-$\beta$ extrapolation of the RS prediction at finite $\beta$). We leave for future work a further exploration of this regime, in particular to explain the discrepancy between the finite- and zero-temperature cavity methods. It might be that in this regime one cannot exchange the thermodynamic and zero-temperature limits, and therefore that the zero-temperature cavity method is not correct. It is also possible that an RSB transition occurs as the rate increases; however, we could not confirm this hypothesis, since we encountered convergence issues when trying to solve numerically the 1RSB equations at zero temperature (74).

### III.2 Exactness of the Max-Sum fixed points

For cycle codes, Max-Sum fixed points correspond to optimal solutions. This can be seen as a consequence of [33] (an extension of Thm 3 mentioned in its concluding part), plus a property that guarantees that local optimality ensures global optimality for cycle codes. More explicitly, any sub-optimal configuration can be improved by modifying it along a cycle, or along an open path ending on leaves. We provide however a separate proof for the sake of completeness.

A key hypothesis is that the degeneracy of the ground state is removed (e.g. by adding a small random external field on each variable), so that the minimum is unique.

The main tool used in the proof is the computation tree, whose features are reviewed here before stating the proof.

#### III.2.1 Computation trees

A computation tree is a loop-free graph obtained from a loopy one. Here we will follow the notation in [4]. In principle, computation trees can be built for any graph, although here we will focus on factor graphs.

Given a bipartite graph $G$, the idea is to pick a variable node to be the root and then unroll $G$ around it, respecting the local structure of each node. Starting from the root, the first level of the computation tree is formed by the root’s neighbors. The second level is made of the first-level nodes’ neighbors, except the root. The third level is made of the second-level nodes’ neighbors, except the ones already connected from above. Proceeding in this way for $r$ levels produces the level-$r$ computation tree rooted at the chosen node.

The concept is best explained through the example shown in Fig. 3. Nodes in the original graph have multiple counterparts in the computation tree. It is sometimes convenient to refer to those counterparts by the same name as the corresponding nodes in the original graph.
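As an illustration, the unrolling procedure described above can be sketched in a few lines of Python. The data structures (an adjacency dictionary, integer copy identifiers) are our own choices for this sketch, not notation from the text.

```python
from collections import deque

def computation_tree(adj, root, depth):
    """Unroll a (possibly loopy) graph around `root` for `depth` levels.

    adj   : dict mapping each node to the list of its neighbours
    root  : the node chosen as the root of the computation tree
    depth : number of levels to unroll

    Returns (edges, origin): `edges` is a list of (parent_copy, child_copy,
    original_node) triples, and `origin` maps each integer copy id back to
    the node of the original graph it is a counterpart of.
    """
    edges = []
    origin = {0: root}                      # copy id -> original node
    next_id = 1
    # queue entries: (copy id, original node, original parent, level)
    queue = deque([(0, root, None, 0)])
    while queue:
        cid, node, parent, level = queue.popleft()
        if level == depth:
            continue
        for nb in adj[node]:
            if nb == parent:                # never step back towards the root
                continue
            origin[next_id] = nb
            edges.append((cid, next_id, nb))
            queue.append((next_id, nb, node, level + 1))
            next_id += 1
    return edges, origin
```

On a 4-cycle (two variables, two factors), unrolling around a variable for two levels produces one copy of each factor at level 1 and two copies of the other variable at level 2, as expected from the construction above.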

#### iii.2.2 Statement and proof

The problem is to find the minimum of the distortion, or energy, expressed as

 \min_{\underline{x}:H\underline{x}=\underline{0}} E(\underline{x}) = \min_{\underline{x}:H\underline{x}=\underline{0}} \sum_{i=1}^{n} x_i \oplus y_i, \qquad (27)

i.e. the Hamming distance between and the source vector . Consider an MS fixed point on the factor graph . Call and , respectively, the messages and the beliefs, and let

 g_i = \begin{cases} 0 & \text{if } h_i > 0 \\ 1 & \text{if } h_i < 0 \end{cases} \qquad (28)

be the decision variables.

##### Claim.

If there is a unique solution, and if the Max-Sum algorithm has converged, then the set of decision variables corresponds to the optimal solution.

##### Proof.

Pick a node and build the computation tree obtained by unrolling the original graph around until there are at least counterparts of each vertex in and so that all leaves are variable nodes; will be the root.

Now place on the edges of the messages that correspond to the original edges on . To all nodes that are leaves in the computation tree but were not leaves in the original graph, attach a fictitious factor sending a message equal to the one flowing out of the leaf. By construction, decision variables on the computation tree are now equal to the ones in the original graph.

Consider now the optimization problem with the same structure as the original one, but defined on the computation tree. The messages on constitute a solution of the MS equations: they are naturally fulfilled in the inner part of the tree and imposed on the leaves by the fictitious factors. Since MS is exact on trees, the assignment (replicated for each of the counterparts in of each variable in ) is an exact solution of the problem defined on the computation tree.

Now call the optimum for the original problem

 \underline{x}^{*} = \operatorname*{argmin}_{\underline{x}:H\underline{x}=\underline{0}} E(\underline{x}). \qquad (29)

Suppose, by contradiction, that the decision variable for the root is different from its value in the optimal assignment . Namely, , where denotes the complement of under . We show that it is always possible to find a path on such that complementing every variable along this path improves the objective function of the problem defined on the tree. This contradicts the fact that is an optimum.

The key idea is the following: suppose two vectors are both solutions of but differ in the root variable . If is disconnected from the rest of the graph, then and can agree on all the other bits and still both be solutions. If has degree 1, then the single factor attached to it must have at least one other incident variable taking different values in and , in order for the parity check to be satisfied. If has degree 2, then the same must be true for both factor neighbors. With these observations in mind, let us move to the explicit construction of the path .

To construct path , start from the root , pick any of its factor neighbors, which by construction are at most two, and do the following: look at all the variable nodes incident on the factor and pick one for which the decision variable disagrees with the optimal solution. There will always be at least one in order for the parity-check to be satisfied, as explained before. Include that variable in the path and move on to its other factor if there is any. The process is halted when either a leaf is encountered and the path ends, or the root is found again.

In case the path ends on a leaf, go back to the root and repeat the process in the other direction, or end the path on the root if the root is a leaf. In case the path ends in a cycle, carry on extending the path repeating the same choice of variables to be included, until the leaves of the computation tree are reached. Let us stress that the resulting path only touches variables which have different values in the solutions on and .
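The path construction described above can be sketched as follows. The data structures and the helper name are hypothetical, and we assume, as the proof guarantees, that every parity check touched by a disagreeing variable contains at least one other disagreeing variable; the sketch follows one direction from the root only.

```python
def disagreement_path(var_factors, factor_vars, g, x_opt, root):
    """Greedy construction (sketch) of the path P used in the proof.

    var_factors : dict variable -> list of neighbouring factors (degree <= 2)
    factor_vars : dict factor -> list of neighbouring variables
    g           : dict variable -> Max-Sum decision bit
    x_opt       : dict variable -> bit of the optimal codeword
    root        : a variable on which g and x_opt disagree

    Walks from the root through factors, always stepping onto a variable
    whose decision disagrees with the optimum, until a leaf variable is
    reached or the root is met again (closing a cycle).
    """
    path = [root]
    prev_factor = None
    current = root
    while True:
        factors = [a for a in var_factors[current] if a != prev_factor]
        if not factors:            # leaf variable: this branch of P ends here
            return path
        a = factors[0]
        # a satisfied check containing a disagreeing variable must contain
        # at least one other disagreeing variable (see the argument above)
        candidates = [j for j in factor_vars[a]
                      if j != current and g[j] != x_opt[j]]
        nxt = candidates[0]
        if nxt == root:            # the path closed into a cycle
            return path
        path.append(nxt)
        prev_factor, current = a, nxt
```

On a small open chain of parity checks where every decision disagrees with the optimum, the walk simply follows the chain until it hits a leaf.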

At this point, the path stemming from the root on both sides can either end on a “true” leaf of (one that corresponds to a leaf also on ) or continue to the bottom of the tree, where the fictitious factors are. We prove our claim in the worst case, where both branches of the path go all the way down; the other cases follow easily.

Call the projection of on : it may contain cycles. Again thanks to the fact that no parity check can be left unsatisfied, if is an open path, then its endpoints must be leaves of . Call the energy of the optimal configuration and the energy of the first non-optimal one. Further, call and the indicator functions of paths on and on respectively.

Since gives a minimum, a transformation that complements the variables touched by gives a positive shift in energy:

 E(\underline{x}^{*} \oplus \underline{x}_{P'}) \geq E_{0} + \epsilon \qquad (30)

Because the energy function (27) is a sum of single-node terms, the shift in energy is only due to the flip of the variables in . After the flip, all the touched variables take the value they have on . But this means that complementing them on would reverse the shift, thus lowering the energy of the problem defined there by at least for each repetition of , at least in the bulk of the tree.

If is a path, then the same negative shift in energy occurs along on , yielding a better optimum than and thus contradicting the starting assumption. If instead goes down all the way to the fictitious factors, the improvement gets multiplied by the number of repetitions of , although it might in principle be outbalanced by the change in energy due to the interaction of the one or two leaf ends of with their fictitious factors. An upper bound for this change is given by the maximum absolute value of the messages along the path:

 u_{\max} = \max_{(i,a) \in P'} |u_{a \to i}|. \qquad (31)

Since there is no limit to the tree’s depth, it suffices to repeat the path enough times to be sure that the energy will decrease. This amounts to choosing so that

 k\epsilon > u_{\max} \qquad (32)

Namely,

 k > \frac{u_{\max}}{\epsilon} \qquad (33)

The same argument can be repeated for all ’s for which the solution on the computation tree differs from the presumed optimal one.

This completes the proof.

### iii.3 An optimal greedy algorithm

In this section we present the greedy optimal algorithm GO, which performs a local search in the energy landscape: at each time step it decreases the energy by flipping variables in the current codeword , obtaining a codeword with lower energy .

For this algorithm all variable nodes should have fixed degree (if not, one can add to the system an equation involving all leaf variables; this equation is clearly linearly dependent on the other ones, as it is the sum of all rows). One considers a slightly simplified graph in which the set of vertices is the set of factor nodes , and an edge links two vertices through the variable node such that . For any codeword and source vector , one defines a weight function on the graph as follows. The edge passing through the variable node associated with the boolean component has weight if the components of the source vector and of the current codeword coincide, , and weight otherwise. One then looks for a negative-cost cycle , using the algorithm presented in [17]. Let be the indicator function of the negative-cost cycle . Flipping the variables belonging to results in a new codeword with strictly smaller energy.

The algorithm starts from a given codeword , which may not be the optimal one. At each time step it finds a negative-cost cycle for the weight function , and flips the variables in the cycle to get a lower-energy state: . One repeats this procedure until convergence, i.e. until the difference between energies becomes zero (up to numerical precision). Finding a negative-cost cycle can be done efficiently [18].
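The core subroutine of GO is the negative-cost cycle search. The paper relies on the dedicated algorithms of [17, 18]; purely as an illustration of the idea, the following sketch shows classical Bellman-Ford negative-cycle detection on a directed graph with the ±1 edge weights defined above (the undirected case treated in [17] requires a more careful algorithm, which we do not reproduce here).

```python
def find_negative_cycle(n, edges):
    """Bellman-Ford negative-cycle detection (illustrative sketch).

    n     : number of vertices (the factor nodes of the reduced graph)
    edges : list of directed (u, v, weight) triples, one per variable node,
            with weight +1 if the current bit agrees with the source bit
            and -1 otherwise, as in the weight function defined above.

    Returns a list of vertices forming a negative-cost cycle, or None.
    """
    dist = [0.0] * n                 # virtual source at distance 0 to all
    pred = [-1] * n
    for _ in range(n):
        marked = -1
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
                pred[v] = u
                marked = v
        if marked == -1:             # no relaxation: no negative cycle
            return None
    # a relaxation on the n-th pass proves a negative cycle exists;
    # walk back n steps from the last relaxed vertex to land inside it
    v = marked
    for _ in range(n):
        v = pred[v]
    cycle, u = [v], pred[v]
    while u != v:
        cycle.append(u)
        u = pred[u]
    return cycle
```

For instance, a directed triangle with weights (-1, -1, +1) has total cost -1 and is detected, while an all-positive triangle is not.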

## Iv Moving to higher degrees

We have shown in the previous section that the constrained optimization problem CVP on cycle codes (i.e. codes in which variable nodes have degree at most ) can be solved exactly. In this section we study random graph ensembles in which variable nodes have higher degree. We show that moving to these ensembles allows one to reach a smaller minimal energy. We focus on random graph ensembles with variable degree profile , i.e. with a fraction of variable nodes of degree and a fraction of variable nodes of degree . The factor degree profile is , i.e. factor nodes have fixed degree . This is the simplest choice providing a non-trivial phase diagram. In this ensemble the compression rate can be expressed in terms of the fraction of degree- variables:

 R(\lambda_{3}) = \frac{1-\lambda_{3}}{3} \in [0, 1/3] \qquad (34)

We will compare this random graph ensemble to cycle codes, where the variable degree profile is (i.e. there is a fraction of degree- variables and a fraction of degree- variables). In this ensemble we express the rate in terms of the fraction of degree- variables:

 R(\lambda_{1}) = \frac{1+\lambda_{1}}{3} \in [1/3, 2/3] \qquad (35)
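Both rate formulas follow from edge counting: with factor nodes of fixed degree 3, the number of checks per variable is m/n = (Σ_d d λ_d)/3, and R = 1 − m/n. A minimal sketch of this bookkeeping (the function name and dictionary representation are ours):

```python
def rate(degree_profile, factor_degree=3):
    """Compression rate R = 1 - m/n from a variable degree profile.

    degree_profile : dict {degree: fraction of variable nodes with that degree}
    Edge counting: each side of the bipartite graph counts the same edges,
    so m/n = (sum_d d * lambda_d) / factor_degree.
    """
    edges_per_var = sum(d * frac for d, frac in degree_profile.items())
    return 1.0 - edges_per_var / factor_degree

# (lambda_2, lambda_3) ensemble: R = (1 - lambda_3) / 3   -- equation (34)
# (lambda_1, lambda_2) ensemble: R = (1 + lambda_1) / 3   -- equation (35)
```

Plugging in any mixture reproduces (34) and (35): e.g. a profile with 40% degree-3 variables gives R = 0.6/3 = 0.2.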

### iv.1 Results from the cavity method

Fig. 4, left panel, shows the results of the zero-temperature cavity method, under the RS formalism (solid lines) and the 1RSB formalism (circles). For the ensemble with variable nodes of degrees and , corresponding to rate (35) (in green), we see that the RS and 1RSB predictions are the same. Indeed, for this ensemble the unique 1RSB solution that we found was the trivial RS solution. For this graph ensemble, the minimal achievable distortion is larger than the one for the graph ensemble studied in the previous section with rate (26), which is also reported in Fig. 4 (gray dashed line). It is more interesting to look at the graph ensemble in which variable nodes have degrees and , corresponding to (34) (in pink). The RS prediction (pink line) is clearly unphysical, because at small rates it goes below the rate-distortion bound (blue line). The 1RSB formalism is needed to give a reliable prediction of the minimal distortion for this ensemble: pink circles correspond to the 1RSB solution obtained at Parisi parameter ; see the discussion in V.1. We see that the system enters a 1RSB phase as soon as there is a positive fraction of degree- variables (see in particular the details close to in the right panel of Fig. 4). The clustering transition therefore occurs at , and is represented by the vertical dashed line in Fig. 4, left panel. This 1RSB prediction is confirmed by the finite-size analysis presented in subsection IV.3. We could follow the physical solution down to rate (left-most pink circle); for smaller rates, we could not find a physical solution. We give more details on the numerical resolution of the 1RSB equations in the next section (see V.1), in particular on the difficulties encountered at small rates.

The solid vertical line indicates the rate at which the dynamical transition occurs for the XORSAT problem: , which corresponds to the clustering transition associated with the measure (6) at . In V.2, we compute the clustering transition for finite values of , thus interpolating between the clustering threshold at and the one at (see Fig. 8). There is therefore a range of rates for which the constrained optimization problem is in a 1RSB phase, while the underlying XORSAT problem defining the set of constraints is Replica-Symmetric. This situation is interesting because it means that the structure of the set of constraints is not enough to describe the complexity of the constrained optimization problem. In the range , the set of allowed configurations (defined as the solution set of the XORSAT instance) is rather well connected; yet, despite the simplicity of the optimization function (7), the optimization problem is in a glassy phase: the energy landscape presents many local minima separated by free-energy barriers. A similar situation has recently been encountered in other high-dimensional constrained optimization problems; see [29] for a study of the optimization of a quadratic function where the constraints are modeled by a perceptron constraint satisfaction problem. Moreover, we observe that the RS/1RSB transition is continuous (see Fig. 4, right panel). In contrast, the RS/1RSB transition for the underlying constraint satisfaction problem (the XORSAT problem) is a random first-order transition.

The physical scenario we have found in this constraint satisfaction problem can be summarized as follows: (i) the set of allowed solutions is well connected until it undergoes a random first-order transition (RFOT) and becomes clustered for ; (ii) however, already for , the application of an external field in a random direction induces a continuous transition. This is reminiscent of what happens in the -spin model [12], where the discontinuous phase transition becomes continuous under the application of an external field. Let us give a simple intuition about the structure of solutions in this problem. The most abundant solutions are well connected for any . However, this strong connectedness no longer holds as soon as we put a bias in any direction and concentrate the measure on a subset of solutions: leaving the region where the most abundant solutions live, for any , the space of solutions acquires a non-trivial structure that in turn induces a continuous phase transition, and replica symmetry must be broken in order to describe this non-trivial structure correctly. Eventually, at , the most abundant solutions also acquire a non-trivial structure (they actually undergo a clustering transition), and this is likely to have important consequences for algorithms. Indeed, while a problem undergoing a continuous transition can be well approximated by polynomial algorithms, we expect an RFOT to induce a much more serious algorithmic barrier. We explore the algorithmic consequences in the next subsection.

### iv.2 Algorithmic results

In this subsection, we report the results obtained with the algorithms described in section II. Fig. 5 shows the results of Max-Sum with reinforcement (red stars), Belief-Propagation with decimation at finite inverse temperature (blue squares), and Survey-Propagation with decimation (green diamonds) at the value maximizing the 1RSB free-energy; see the discussion in V.1. The result of the zero-temperature cavity method within the 1RSB ansatz is also reported (pink circles). For rates in the range there is good agreement between the cavity prediction and the algorithmic results. As the rate decreases, one observes a jump toward higher-distortion solutions found by the three algorithms, while the cavity method predicts a smaller distortion. This decrease in performance arises around the XORSAT dynamical transition occurring at . In the clustered regime , none of the three algorithms is able to find the optimal solution.

This result is interesting, because the algorithmic transition does not match the phase transition associated with the constrained optimization problem, which we found at ; instead, it matches the clustering transition of the XORSAT problem that models the constraints (and which is not related to the optimization function). A possible explanation for the fact that algorithms perform well in the range and undergo a dramatic algorithmic transition only when approaching is the following. As long as the most abundant solutions are well connected (they undergo a clustering transition only at ) and the phase transition induced by the external field (the linear function to be optimized) is continuous (we stress that the problem in this region probably undergoes a full replica symmetry breaking (FRSB) transition, but we are able to perform only a 1RSB computation, which should closely approximate the actual solution), we expect optimizing algorithms to perform efficiently: on the one hand, the space of solutions is well connected by passing through the most abundant (although not very optimized) solutions; on the other hand, a continuous phase transition induces correlations that can be well approximated by polynomial algorithms. For these reasons, we expect smart optimizing algorithms, like the ones we have used, to perform reasonably well above . Obviously, the performance of these algorithms starts degrading already above , because, approaching the clustering transition at , the structure of solutions acquires a sponge-like topology that will eventually break up into separate clusters at . When the topology is sponge-like, with tiny corridors connecting the regions that will soon become clusters, the application of the external field can have a dramatic effect, effectively anticipating the clustering transition.

Eventually, for , the most abundant solutions are clustered and moving between solutions becomes very difficult, effectively inducing large algorithmic barriers. In this regime the effects of the RFOT are manifest, and all the algorithms get stuck in solutions of very large distortion. This is reminiscent of the threshold energy phenomenon, well known in glassy models and hard optimization problems [28].

A last important comment, about the connection between the dynamical behavior of these smart algorithms and the thermodynamic phase diagram of the problem, concerns the possibility that smart algorithms do not converge on the solutions that dominate the equilibrium measure. This has already been observed in constraint satisfaction problems [13] and in the binary perceptron problem [8]. The finite-temperature study reported in section V.2 predicts that the clusters in this problem are point-like, i.e. they are made of a single solution, but these solutions should be very hard to find algorithmically. It is therefore more likely that the solutions found by the message-passing algorithms in Fig. 5 belong to atypical, large clusters. We conjecture that these clusters are sub-dominant, and thus not described by the cavity method, yet are the relevant clusters from an algorithmic point of view, since they are made of solutions that are more accessible to algorithms.

### iv.3 Exact enumeration

Given that no linear-time message-passing algorithm can approach the optimal distortion predicted by the RSB cavity method at zero temperature, we need a different numerical approach to convince the reader that the analytical prediction based on the cavity method is actually meaningful. We have performed an exact enumeration at small sizes and a finite-size study of the random graph ensemble with degree profile , to compare with the results of the zero-temperature 1RSB cavity method. Fig. 6 shows the exact results for sizes . The results are averaged over several instances drawn at random from the random graph ensemble. For each instance, the solution set of the associated XORSAT instance is computed exactly, and the solution with minimal distortion is extracted. A linear extrapolation to the large-size limit is in good agreement with the 1RSB cavity prediction. This exact enumeration procedure allows us to give predictions for rates smaller than , below which the 1RSB cavity method does not provide a physical solution. We recall that the physically correct solution probably requires breaking the replica symmetry infinitely many times (FRSB), so instabilities in the 1RSB solution at very small rates do not come as a surprise.
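At the small sizes considered here, the minimal distortion can even be found by brute force. The following is a minimal sketch of that idea (the paper instead computes the XORSAT solution set exactly, e.g. from a basis of the nullspace of over GF(2), which is far more efficient); the function name is ours.

```python
import itertools

def min_distortion(H, y):
    """Exhaustive search for the codeword closest to the source vector y.

    H : list of parity-check rows (lists of 0/1), y : list of 0/1 bits.
    Scans all 2^n binary vectors, keeps those satisfying H x = 0 (mod 2),
    and returns the minimal Hamming distance to y, normalized by n.
    Feasible only for small n, as in the finite-size study above.
    """
    n = len(y)
    best = None
    for x in itertools.product((0, 1), repeat=n):
        # keep x only if every parity check is satisfied
        if all(sum(h * xi for h, xi in zip(row, x)) % 2 == 0 for row in H):
            d = sum(xi != yi for xi, yi in zip(x, y))
            if best is None or d < best:
                best = d
    return best / n          # distortion per variable
```

For example, with checks x1+x2=0 and x2+x3=0 the only codewords are 000 and 111, so a source 111 has zero distortion while a source 100 has distortion 1/3.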

## V A more detailed picture of the phase diagram

In this section, we give more details on the results obtained with the cavity method, in particular for the random graph ensemble with degree profiles , .

### v.1 Instability in the zero-temperature solution

The results of the zero-temperature cavity method (see Fig. 4) have been obtained by taking simultaneously the limit for the inverse temperature and for the Parisi parameter, with a finite value for . As explained in appendix B.2, we used the softened measure (8) for the numerical resolution of the zero-temperature cavity equations, as it allowed us to represent populations of Max-Sum messages as finite vectors of size . In the large- limit, the softened measure (8) concentrates on the configurations minimizing the energy (9). In the 1RSB phase, the set of these configurations splits into an exponential number of clusters separated by free-energy barriers. Since , all the clusters are weighted identically (independently of their size). The choice of the value of that correctly describes the cluster decomposition is delicate. Following the seminal work [20], one should compute the 1RSB free-energy defined in (71) and maximize it over . In practice, for each value of the rate plotted in Fig. 4, we studied the -dependence of the solution of the zero-temperature 1RSB equations (74). Fig. 7 shows this -dependence for rate (left panel); the vertical red line indicates the optimal value . The right panel shows the optimal value of as a function of the rate. For each value of the rate, from the solution of the 1RSB equations we computed the 1RSB free-energy and the zero-temperature complexity (see equation (80)). We also computed the internal distortion from the internal energy as (see equation (78)). For the appropriate choice of , gives a prediction for the minimal distortion. Finally, we also computed the averaged distribution of cavity fields

 Pavg(h)=∫d