1 Local optima of the Hamiltonian
Let be a symmetric matrix with zero diagonal such that the
are independent standard normal random variables. The Sherrington-Kirkpatrick model of spin glasses is defined by a random Hamiltonian, that is, a random function . For a configuration , is defined as follows.
We follow the usual convention of calling a spin configuration, the coordinates of spins, and the value the energy of configuration .
Given and as above, we let denote a new configuration obtained from by flipping the -th spin and leaving other coordinates unchanged. That is,
We say that is a local minimum or a local optimum of if
That is, is a local minimum if flipping the sign of any individual spin does not decrease the value of the energy.
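Since the displayed formulas are elided here, the sketch below assumes the common normalization in which the energy is the sum of the coupling terms over pairs of spins; under that assumption, flipping spin changes the energy by twice the spin times its local field, so local optimality can be checked from the matrix-vector product alone. The function name and normalization are our own illustration.

```python
import numpy as np

def is_local_min(sigma, G):
    """Check local optimality for H(sigma) = sum_{i<j} g_ij sigma_i sigma_j
    (an assumed normalization): flipping any single spin must not decrease H."""
    # With zero diagonal, (G @ sigma)_i is the local field sum_{j != i} g_ij sigma_j,
    # and flipping spin i changes the energy by -2 * sigma_i * (G @ sigma)_i.
    deltas = -2.0 * sigma * (G @ sigma)
    return bool(np.all(deltas >= 0))
```

A configuration is locally optimal exactly when every entry of `deltas` is nonnegative.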
The global optimum, called the “ground-state energy”, has been extensively studied. The problem was introduced by Sherrington and Kirkpatrick  as a mean-field model for spin glasses. The value of the optimum was determined non-rigorously in the seminal work of Parisi , as a consequence of the so-called “Parisi formula”. Parisi’s formula was proved by Talagrand  in a breakthrough paper; see also Panchenko  for an overview. It follows from Talagrand’s result that
is a constant whose value is numerically estimated to be about (Crisanti and Rizzo ) and known to be bounded by (Guerra ).
In this paper we are interested in locally optimal solutions. An important reason why local optima are worth considering is that they may be computed quickly by simple greedy algorithms; see  and subsection 1.2 below. We show that the expected number of local optima grows exponentially and we establish the rate of growth. We also examine the conditional distribution of given that is locally optimal. We prove that this distribution is concentrated on an interval of exponentially small width and determine its location.
In order to state the main result of the paper, we need a few definitions.
Let be the distribution function of a standard normal random variable and introduce . For , we let denote the following Fenchel-Legendre transform:
is strictly concave and achieves its global maximum at . We let denote the maximum value of .
We have that, as , for any choice of ,
Moreover, there exist constants , and such that, for and ,
The values of the constants are numerically evaluated to be and . Since the global minimum of is about , the typical value of a local optimum comes fairly close.
Also note that Proposition 1 below implies that is between and .
1.2 Local minima, greedy algorithms and MaxCut
Our problem is related to finding a local optimum of weighted MaxCut on the complete graph, which was recently studied in Angel, Bubeck, Peres, and Wei . Given , we denote the value of the cut as
Note that there is a correspondence between cuts and spin configurations with:
In particular, what Angel et al. call locally optimal cuts correspond exactly to our notion of local minimum.
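The elided correspondence is presumably the standard identity expressing the indicator that two spins disagree as half of one minus their product. The snippet below, with illustrative Gaussian weights of our own choosing, checks numerically that the cut value and the spin interaction sum determine each other affinely, so maximizing the cut is equivalent to minimizing the interaction sum.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 6
A = rng.normal(size=(n, n))
J = np.triu(A, 1) + np.triu(A, 1).T     # symmetric weights, zero diagonal
sigma = rng.choice([-1.0, 1.0], size=n)

# Cut value: total weight of edges whose endpoints receive opposite spins.
cut_direct = sum(J[i, j] for i in range(n) for j in range(i + 1, n)
                 if sigma[i] != sigma[j])

# Same value via 1[sigma_i != sigma_j] = (1 - sigma_i * sigma_j) / 2.
interaction = 0.5 * sigma @ J @ sigma   # = sum_{i<j} J_ij sigma_i sigma_j
total = J[np.triu_indices(n, 1)].sum()
cut_spin = 0.5 * (total - interaction)

assert np.isclose(cut_direct, cut_spin)
```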
Starting from a given , one performs a sequence of local “greedy moves” – i.e. single spin flips that decrease the energy – until no more such moves are available. The main result of  is that this process ends at a local minimum after a polynomial number of moves. Unfortunately, it is not clear that the distribution of the value of this local minimum is similar to the one we study in Theorem 1.
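This greedy dynamics can be sketched as follows, again assuming the normalization in which the energy is the sum of coupling terms over pairs; the incremental update of the local fields is our own implementation detail, not the algorithm analyzed in the cited work. Termination is guaranteed because each accepted flip strictly decreases the energy over a finite state space.

```python
import numpy as np

def greedy_descent(sigma, G):
    """Flip single spins while some flip strictly decreases the (assumed)
    energy H(sigma) = sum_{i<j} g_ij sigma_i sigma_j; stop at a local minimum."""
    sigma = sigma.copy()
    field = G @ sigma                      # local fields, maintained incrementally
    while True:
        deltas = -2.0 * sigma * field      # energy change of each single flip
        i = int(np.argmin(deltas))
        if deltas[i] >= 0:                 # no flip decreases the energy
            return sigma
        sigma[i] = -sigma[i]
        field += 2.0 * sigma[i] * G[:, i]  # sigma[i] now holds the new spin value
```

The returned configuration always satisfies the local-optimality condition: no single flip has negative energy change.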
2 The probability of local optimality
In this section we take the first and crucial step to prove Theorem 1. For any fixed spin configuration , we establish an integral formula for the probability that is locally optimal.
Since is fixed, we will write instead of most of the time.
A key point in our calculations is that the random vector
is a multivariate normal vector with zero mean and covariance matrix such that for all and for all . In other words,
where is the identity matrix and is the column vector with in each component.
Clearly, the eigenvalues of are with multiplicity and with multiplicity , and therefore .
One may use the Sherman-Morrison formula to invert and obtain
We may rewrite this as:
where is a vector of independent standard normal random variables.
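The constants in the covariance matrix are elided above, but for any matrix of the stated form, a multiple of the identity plus a multiple of the all-ones rank-one matrix, the Sherman-Morrison formula gives a closed-form inverse and the eigenvalue structure noted earlier. The values of `a` and `b` below are illustrative only, not the paper's constants.

```python
import numpy as np

n, a, b = 5, 2.0, 0.3                       # illustrative values only
ones = np.ones((n, 1))
Sigma = a * np.eye(n) + b * (ones @ ones.T)

# Sherman-Morrison: (aI + b 11^T)^{-1} = (1/a) I - (b / (a(a+nb))) 11^T
Sigma_inv = np.eye(n) / a - (b / (a * (a + n * b))) * (ones @ ones.T)
assert np.allclose(Sigma_inv, np.linalg.inv(Sigma))

# Eigenvalues: a with multiplicity n-1, and a + n*b with multiplicity 1.
eigs = np.sort(np.linalg.eigvalsh(Sigma))
assert np.allclose(eigs[:-1], a) and np.isclose(eigs[-1], a + n * b)
```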
In what follows, we derive some simple upper and lower bounds for the integral above.
If is a vector of independent standard normal random variables, then for all ,
Proof. The inequality on the left-hand side is obvious from Jensen’s inequality. To prove the right-hand side, we use the Gaussian logarithmic Sobolev inequality. In particular, writing and , the inequality on page 126 of Boucheron, Lugosi, and Massart  asserts that
Since , we obtain the differential inequality
This inequality has the same form as the one at the top of page 191 of , with and , and Theorem 6.19 there implies the result above.
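As a generic numerical illustration of the kind of bound the Herbst argument extracts from the Gaussian logarithmic Sobolev inequality (not the lemma's exact statement, whose quantities are elided above): for any 1-Lipschitz function of a standard normal vector, the centered exponential moment is bounded by the Gaussian one. We take the maximum coordinate, which is 1-Lipschitz in the Euclidean norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 10, 0.5
Z = rng.standard_normal((200_000, n))
f = Z.max(axis=1)                       # max is 1-Lipschitz in the Euclidean norm

# Herbst-argument consequence: log E exp(lam * (f(Z) - E f(Z))) <= lam^2 / 2.
lhs = np.log(np.mean(np.exp(lam * (f - f.mean()))))
assert lhs <= lam**2 / 2                # holds with room to spare here
```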
Summarizing, we obtain the following bounds
For all spin configurations ,
In the next section we take a closer look at the integral expression of the probability of local optimality. In fact, we prove that converges to defined in the introduction.
3 The value of local optima
In this section we study, for any fixed and , the joint probability
Therefore, we may follow the calculations in the previous section and obtain:
Thus, by a change of variables, we get
where is a vector of independent standard normal random variables.
We deduce the following proposition.
We have that, for all ,
4 Approximating the integral
In order to establish convergence of the exponent and also the “typical” value of the energy, we need to understand the behavior of the numerator and the denominator of the key equation (3.1).
The main idea is to obtain a Laplace-type approximation to the integral. Make the approximation
is an average of i.i.d. random variables with expectation and light tails. Therefore, it satisfies a large deviations principle with a rate function :
Readers familiar with Varadhan’s Lemma (see e.g. [5, page 32]) should expect that, as ,
In fact, the intuition behind the Lemma is that most of the “mass” of the expectation concentrates around , where achieves the above supremum. This means that the conditional measure described in Proposition 2 should concentrate around .
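A toy instance of this Laplace/Varadhan heuristic with everything explicit (our own stand-in, not the paper's integrand): for i.i.d. standard normal averages the rate function is the square halved, and for a linear test function the normalized log of the exponential moment equals the Legendre supremum exactly for every, so the limit and the concentration point can be read off a grid.

```python
import numpy as np

# F(x) = theta * x, rate function I(x) = x^2 / 2. Then
#   (1/n) log E exp(n * F(Xbar_n)) = sup_x (F(x) - I(x)) = theta^2 / 2,
# with the mass concentrating at the maximizer x* = theta.
theta = 0.4
x = np.linspace(-5, 5, 100_001)
vals = theta * x - x**2 / 2
sup_val = np.max(vals)
x_star = x[np.argmax(vals)]

assert np.isclose(sup_val, theta**2 / 2, atol=1e-6)
assert np.isclose(x_star, theta, atol=1e-3)
```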
Our calculations confirm this reasoning. The usual statement of Varadhan’s Lemma does not apply directly because is an unbounded function of . Another minor technicality is that the function is divided by instead of . In what follows we have opted for a self-contained approach to our estimates, which gives quantitative bounds. This section collects the corresponding technical estimates. We finish the proof of Theorem 1 in the next section.
The next Lemma is a quantitative version of the large deviations principle (or Cramér’s Theorem) for .
For , define as in the introduction. Let be a vector of i.i.d. standard normal coordinates. Then:
for some independent of and . Moreover, is smooth and .
We will use this Lemma to estimate expectations of the form:
The function defined below naturally shows up in our estimates.
where is as in Lemma 2. For ,
Proof. Let . Note that:
We may compute the expectation of this expression as follows.
We split the above integral in two parts.
For part (I), we bound the probability in the integral by , and obtain:
because . Term (II) may be evaluated using the estimate from Lemma 2.
which has the desired form because
and we finish the proof via the identity
and using the bounds in Lemma 2 (which are valid for all ).
5 Proof of the main Theorem
The previous section shows that, in order to estimate the expectations in Lemma 3, we need to understand the function . The case of interest for us is when , which is when we recover the expectations in (3.1). Since varies with , we will consider instead:
and note that
The next Lemma contains some information on .
Proof. See subsection 6.2.
We can now obtain good upper and lower estimates on the integral expressions in Lemma 3 and finish the proof of the main Theorem.
Proof. [of Theorem 1] In this proof we assume for simplicity. We will use the notation to denote the value of a constant independent of that may change from line to line. Finally, we set
We will now apply this to estimate expectations to the left of . That is, we consider:
In this range is uniformly bounded, so and
At the same time,
To bound the expectation for , we cannot simply use and . However, note that
Also, recalling the expression for in Lemma 2,
This allows us to obtain, for ,
This leads to our main results. Indeed, if we apply the above bounds with , we obtain that, as
This implies the first statement in the Theorem via Proposition 1.
Secondly, we apply Proposition 2 and obtain:
and (for small enough, so that below is ):
6 Auxiliary results
6.1 Lemmas on large deviations of
The goal of this subsection is to prove a series of Lemmas that together imply Lemma 2. We first find an expression for the Laplace transform of
Let be a standard normal random variable. For all ,
where , with .
We will need to compute the large deviations rate function for , with i.i.d. standard normal. As usual, this is given by the Fenchel-Legendre transform of :
The next lemma collects technical facts on and the value that achieves the minimum.
For each , there exists a unique such that