 # Local optima of the Sherrington-Kirkpatrick Hamiltonian

We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.


## 1 Local optima of the Hamiltonian

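Let $W=(W_{i,j})_{i,j\in[n]}$ be a symmetric matrix with zero diagonal such that the entries $(W_{i,j})_{1\le i<j\le n}$ are independent standard normal random variables. The Sherrington-Kirkpatrick model of spin glasses is defined by a random Hamiltonian, that is, a random function $H:\{-1,+1\}^n\to\mathbb{R}$. For a configuration $\sigma=(\sigma_1,\dots,\sigma_n)\in\{-1,+1\}^n$, $H(\sigma)$ is defined as follows.

$$H(\sigma):=\sum_{1\le i<j\le n}\sigma_i\sigma_jW_{i,j}.$$

We follow the usual convention of calling $\sigma$ a spin configuration, the coordinates $\sigma_i$ of $\sigma$ spins, and the value $H(\sigma)$ the energy of configuration $\sigma$.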

Given $\sigma$ as above and $i\in[n]$, we let $\sigma^{(i)}$ denote the new configuration obtained from $\sigma$ by flipping the $i$-th spin and leaving the other coordinates unchanged. That is,

$$\sigma^{(i)}_j:=\begin{cases}-\sigma_i, & j=i;\\ \sigma_j, & j\in[n]\setminus\{i\}.\end{cases}$$

We say that $\sigma$ is a local minimum or a local optimum of $H$ if

$$\forall i\in[n]:\ H(\sigma^{(i)})\ge H(\sigma).$$

That is, $\sigma$ is a local minimum if flipping the sign of any individual spin does not decrease the value of the energy.
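The definitions above can be illustrated by a small brute-force experiment. This is only a sketch, assuming the Hamiltonian $H(\sigma)=\sum_{i<j}\sigma_i\sigma_jW_{i,j}$ with standard normal couplings; the helper names (`sample_couplings`, `energy`, `flip`, `is_local_min`, `count_local_minima`) are illustrative and do not come from the paper.

```python
import itertools
import random


def sample_couplings(n, rng):
    """Symmetric n x n matrix with zero diagonal and N(0,1) entries above it."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)
    return W


def energy(W, sigma):
    """H(sigma) = sum over i < j of sigma_i * sigma_j * W[i][j]."""
    n = len(sigma)
    return sum(sigma[i] * sigma[j] * W[i][j]
               for i in range(n) for j in range(i + 1, n))


def flip(sigma, i):
    """Configuration sigma^(i): flip spin i and keep the other spins."""
    flipped = list(sigma)
    flipped[i] = -flipped[i]
    return flipped


def is_local_min(W, sigma):
    """True if no single spin flip lowers the energy."""
    h = energy(W, sigma)
    return all(energy(W, flip(sigma, i)) >= h for i in range(len(sigma)))


def count_local_minima(W):
    """Brute-force count over all 2^n configurations (small n only)."""
    n = len(W)
    return sum(is_local_min(W, s)
               for s in itertools.product((-1, 1), repeat=n))


W = sample_couplings(8, random.Random(0))
num_local_minima = count_local_minima(W)
print(num_local_minima)
```

Since $H(\sigma)=H(-\sigma)$, local minima come in pairs, and the global minimizer is always one of them, so the count is an even number that is at least 2. The enumeration costs $2^n$ energy evaluations and is meant for small $n$ only.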

The global optimum $\min_{\sigma}H(\sigma)$, called the “ground-state energy”, has been extensively studied. The problem was introduced by Sherrington and Kirkpatrick  as a mean-field model for spin glasses. The value of the optimum was determined non-rigorously in the seminal work of Parisi , as a consequence of the so-called “Parisi formula”. Parisi’s formula was proved by Talagrand  in a breakthrough paper; see also Panchenko  for an overview. It follows from Talagrand’s result that

$$n^{-3/2}\min_{\sigma\in\{-1,+1\}^n}H(\sigma)\to-c\quad\text{in probability},$$

where $c$ is a constant whose value is numerically estimated to be about (Crisanti and Rizzo ) and known to be bounded by (Guerra ).

In this paper we are interested in locally optimal solutions. An important reason to consider local optima is that they may be computed quickly by simple greedy algorithms; see subsection 1.2 below. We show that the expected number of local optima grows exponentially and we establish the rate of growth. Also, we examine the conditional distribution of $H(\sigma)$ given that $\sigma$ is locally optimal. We prove that the distribution is concentrated on an interval of exponentially small width and determine the location.

### 1.1 Results

In order to state the main result of the paper, we need a few definitions.

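Let $\Phi$ be the distribution function of a standard normal random variable and introduce $\phi(\lambda):=\log(2\Phi(\lambda))$. For $x\ge\sqrt{2/\pi}$, we let $\mu^*(x)$ denote the following Fenchel-Legendre transform:

$$\mu^*(x):=\sup_{\lambda\ge0}\left(\lambda x-\frac{\lambda^2}{2}-\phi(\lambda)\right).$$

Lemma 2 below shows that $\mu^*$ is well defined. Lemma 4 shows that the mapping

$$x\ge\sqrt{2/\pi}\ \mapsto\ R(x):=\frac{x^2}{4}-\mu^*(x)$$

is strictly concave and achieves its global maximum at a unique point $v^*$. We let $\alpha^*$ denote the maximum value of $R$.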

###### Theorem 1.

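We have that, as $n\to\infty$, for any choice of $\sigma\in\{-1,+1\}^n$,

$$\lim_{n\to+\infty}\frac1n\log\mathbb{P}\{\sigma\text{ is locally optimal}\}=\alpha^*-\log2.$$

Moreover, there exist constants $L$, $\epsilon_0>0$ and $n_0$ such that, for $n\ge n_0$ and $0<\epsilon\le\epsilon_0$,

$$\mathbb{P}\left\{-\frac{v^*}{2}-\epsilon\le n^{-3/2}H(\sigma)\le-\frac{v^*}{2}+\epsilon\ \middle|\ \sigma\text{ is locally optimal}\right\}\ge1-\exp\left(L\sqrt n-\epsilon^2n\right).$$

The values of the constants are numerically evaluated to be and . Since the global minimum of is about , the typical value of a local optimum comes fairly close.

Also note that Proposition 1 below implies that $\alpha^*$ is between $1/(2\pi)$ and $2/(3\pi)$.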
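The constants $v^*$ and $\alpha^*$ can be evaluated numerically from the definitions of $\phi$, $\mu^*$ and $R$. The following is a rough sketch, not the authors' procedure: it assumes $\phi(\lambda)=\log(2\Phi(\lambda))$, uses the fact that the supremum defining $\mu^*(x)$ is attained where $\lambda+\phi'(\lambda)=x$, and relies on the concavity of $R$ to justify a ternary search. The search interval $[\sqrt{2/\pi},4]$ and all function names are assumptions made for illustration.

```python
import math

SQRT_2_OVER_PI = math.sqrt(2.0 / math.pi)


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def phi(lam):
    """phi(lambda) = log(2 * Phi(lambda))."""
    return math.log(2.0 * Phi(lam))


def phi_prime(lam):
    """Derivative of phi: normal density divided by Phi."""
    density = math.exp(-0.5 * lam * lam) / math.sqrt(2.0 * math.pi)
    return density / Phi(lam)


def lambda_star(x):
    """Solve lam + phi'(lam) = x on [0, x] by bisection (x >= sqrt(2/pi))."""
    lo, hi = 0.0, x
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mid + phi_prime(mid) < x:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def mu_star(x):
    """Fenchel-Legendre transform evaluated at the maximizing lambda."""
    lam = lambda_star(x)
    return lam * x - 0.5 * lam * lam - phi(lam)


def R(x):
    return 0.25 * x * x - mu_star(x)


def maximize_R(lo=SQRT_2_OVER_PI, hi=4.0, iterations=100):
    """Ternary search for the maximizer of the concave function R."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if R(m1) < R(m2):
            lo = m1
        else:
            hi = m2
    v = 0.5 * (lo + hi)
    return v, R(v)


v_star, alpha_star = maximize_R()
print(v_star, alpha_star)
```

The printed value of `alpha_star` can be compared with the interval $[1/(2\pi),\,2/(3\pi)]$ implied by Proposition 1.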

### 1.2 Local minima, greedy algorithms and MaxCut

Our problem is related to finding a local optimum of weighted MaxCut on the complete graph, which was recently studied in Angel, Bubeck, Peres, and Wei . Given $S\subset[n]$, we denote the value of the cut $(S,[n]\setminus S)$ as

$$\mathrm{Cut}(S,[n]\setminus S):=\sum_{i\in S}\sum_{j\in[n]\setminus S}W_{i,j}.$$

Note that there is a correspondence between cuts and spin configurations with:

$$\sigma_{S,i}:=2\cdot\mathbf{1}_{i\in S}-1\quad(i\in[n]),$$

$$\mathrm{Cut}(S,[n]\setminus S)=\frac{-H(\sigma_S)+\sum_{1\le i<j\le n}W_{i,j}}{2}.$$

In particular, what Angel et al. call locally optimal cuts correspond exactly to our notion of local minimum.

Starting from a given $\sigma$, do a sequence of local “greedy moves”, i.e. single spin flips that decrease the energy, until no more such moves are available. The main result of Angel et al. is that this process ends at a local minimum after a polynomial number of moves. Unfortunately, it is not clear that the distribution of the value of this local minimum is similar to the one we study in Theorem 1.
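The greedy procedure and the cut correspondence can be sketched as follows. The text only requires that each move decrease the energy; the steepest-descent rule used here (always flip the spin with the largest energy decrease) is one possible choice and an assumption of this sketch, as are all function names. Flipping spin $i$ changes the energy by $-2\sigma_i\sum_{j\ne i}\sigma_jW_{i,j}$.

```python
import random


def sample_couplings(n, rng):
    """Symmetric n x n matrix with zero diagonal and N(0,1) entries above it."""
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            W[i][j] = W[j][i] = rng.gauss(0.0, 1.0)
    return W


def energy(W, sigma):
    n = len(sigma)
    return sum(sigma[i] * sigma[j] * W[i][j]
               for i in range(n) for j in range(i + 1, n))


def half_flip_changes(W, sigma):
    """Entry i is -sigma_i * sum_{j != i} sigma_j W_ij, half the change in H if spin i flips."""
    n = len(sigma)
    return [-sigma[i] * sum(sigma[j] * W[i][j] for j in range(n) if j != i)
            for i in range(n)]


def greedy_descent(W, sigma):
    """Flip the most energy-decreasing spin until no flip decreases H."""
    sigma = list(sigma)
    moves = 0
    while True:
        changes = half_flip_changes(W, sigma)
        i = min(range(len(sigma)), key=changes.__getitem__)
        if changes[i] >= 0:
            return sigma, moves
        sigma[i] = -sigma[i]
        moves += 1


def cut_value(W, sigma):
    """Weight between S = {i : sigma_i = +1} and its complement."""
    n = len(sigma)
    return sum(W[i][j] for i in range(n) for j in range(n)
               if sigma[i] == 1 and sigma[j] == -1)


rng = random.Random(1)
n = 30
W = sample_couplings(n, rng)
start = [rng.choice((-1, 1)) for _ in range(n)]
final, moves = greedy_descent(W, start)
total_weight = sum(W[i][j] for i in range(n) for j in range(i + 1, n))
print(moves, energy(W, start), energy(W, final))
print(cut_value(W, final), (-energy(W, final) + total_weight) / 2)
```

The two numbers on the last line agree up to rounding, which is the identity between the cut value and the energy. The output configuration is a local minimum by construction, but nothing here says how its energy is distributed.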

## 2 The probability of local optimality

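In this section we take the first and crucial step to prove Theorem 1. For any fixed spin configuration $\sigma$, we establish an integral formula for the probability that $\sigma$ is locally optimal.

Define:

$$Z_i(\sigma):=\frac{H(\sigma^{(i)})-H(\sigma)}{2}=-\sum_{j\in[n]\setminus\{i\}}\sigma_i\sigma_jW_{i,j}\quad(i\in[n]).\tag{2.1}$$

Note that

$$\sigma\text{ is a local minimum}\iff\forall i\in[n],\ Z_i(\sigma)\ge0.\tag{2.2}$$

Moreover,

$$-H(\sigma)=\frac12\sum_{i=1}^n\sum_{j\in[n]\setminus\{i\}}-\sigma_i\sigma_jW_{i,j}=\frac{\sum_{i=1}^nZ_i(\sigma)}{2}.\tag{2.3}$$

Since $\sigma$ is fixed, we will write $Z_i$ instead of $Z_i(\sigma)$ most of the time.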

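A key point in our calculations is that the random vector

$$Z=(Z_1,Z_2,\dots,Z_n)^T$$

is a multivariate normal vector with zero mean and covariance matrix $C$ such that $C_{i,i}=n-1$ for all $i\in[n]$ and $C_{i,j}=1$ for all $i\ne j$. In other words,

$$C=(n-2)\,\mathrm{Id}_n+\mathbf{1}_n\mathbf{1}_n^T,$$

where $\mathrm{Id}_n$ is the $n\times n$ identity matrix and $\mathbf{1}_n$ is the column vector with $1$ in each component.

Clearly, the eigenvalues of $C$ are $2n-2$ with multiplicity $1$ and $n-2$ with multiplicity $n-1$, and therefore $\det(C)=(2n-2)(n-2)^{n-1}$.

One may use the Sherman-Morrison formula to invert $C$ and obtain

$$C^{-1}=\frac{1}{n-2}\left(\mathrm{Id}_n-\frac{1}{2n-2}\mathbf{1}_n\mathbf{1}_n^T\right),$$

and therefore

$$\begin{aligned}
\mathbb{P}\{\sigma\text{ is locally optimal}\}
&=\frac{1}{(2\pi)^{n/2}\det(C)^{1/2}}\int_{[0,\infty)^n}\exp\left(-\frac{x^TC^{-1}x}{2}\right)dx\\
&=\frac{1}{(2\pi)^{n/2}(2n-2)^{1/2}(n-2)^{(n-1)/2}}\int_{[0,\infty)^n}\exp\left(-\frac{\|x\|_2^2}{2(n-2)}+\frac{\|x\|_1^2}{2(n-2)(2n-2)}\right)dx\\
&=\frac{2^{-n}}{(2\pi)^{n/2}(2n-2)^{1/2}(n-2)^{(n-1)/2}}\int_{\mathbb{R}^n}\exp\left(-\frac{\|x\|_2^2}{2(n-2)}+\frac{\|x\|_1^2}{2(n-2)(2n-2)}\right)dx,
\end{aligned}$$

where the last equality holds because the integrand is invariant under sign changes of the coordinates of $x$.

We may rewrite this as:

$$\mathbb{P}\{\sigma\text{ is locally optimal}\}=2^{-n}\sqrt{\frac{n-2}{2n-2}}\,\mathbb{E}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right),$$

where $N$ is a vector of $n$ independent standard normal random variables.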
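The linear algebra behind this formula can be checked exactly for small $n$. The sketch below assumes the covariance matrix has $n-1$ on the diagonal and $1$ off the diagonal, and verifies in rational arithmetic both the Sherman-Morrison inverse and the determinant $(2n-2)(n-2)^{n-1}$. The function names are illustrative only.

```python
from fractions import Fraction


def covariance(n):
    """C = (n-2) Id + 1 1^T: n-1 on the diagonal, 1 elsewhere."""
    return [[Fraction(n - 1) if i == j else Fraction(1) for j in range(n)]
            for i in range(n)]


def claimed_inverse(n):
    """(1/(n-2)) * (Id - (1/(2n-2)) 1 1^T)."""
    scale = Fraction(1, n - 2)
    shift = Fraction(1, 2 * n - 2)
    return [[scale * ((1 if i == j else 0) - shift) for j in range(n)]
            for i in range(n)]


def identity(n):
    return [[Fraction(1 if i == j else 0) for j in range(n)] for i in range(n)]


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def determinant(A):
    """Exact determinant by Gaussian elimination over the rationals."""
    A = [row[:] for row in A]
    n = len(A)
    det = Fraction(1)
    for k in range(n):
        pivot = next((r for r in range(k, n) if A[r][k] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != k:
            A[k], A[pivot] = A[pivot], A[k]
            det = -det
        det *= A[k][k]
        for r in range(k + 1, n):
            factor = A[r][k] / A[k][k]
            for c in range(k, n):
                A[r][c] -= factor * A[k][c]
    return det


for n in range(3, 8):
    C = covariance(n)
    inverse_ok = matmul(C, claimed_inverse(n)) == identity(n)
    det_ok = determinant(C) == (2 * n - 2) * (n - 2) ** (n - 1)
    print(n, inverse_ok, det_ok)
```

Exact fractions avoid any floating-point tolerance, so each check is a true equality. The case $n=2$ is excluded because $C$ is then singular.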

In what follows, we derive some simple upper and lower bounds for the integral above.

###### Lemma 1.

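If $N$ is a vector of $n$ independent standard normal random variables, then for all $\lambda\in(0,1/n)$,

$$\lambda\,\mathbb{E}\|N\|_1^2\le\log\mathbb{E}\exp\left(\lambda\|N\|_1^2\right)\le\lambda\,\mathbb{E}\|N\|_1^2\left(1+\frac{n\lambda}{1-n\lambda}\right).$$

Proof.   The inequality on the left-hand side is obvious from Jensen’s inequality. To prove the right-hand side, we use the Gaussian logarithmic Sobolev inequality. In particular, writing $f(N):=\|N\|_1^2$ and $F(\lambda):=\mathbb{E}e^{\lambda f(N)}$, the inequality on page 126 of Boucheron, Lugosi, and Massart  asserts that

$$\lambda F'(\lambda)-F(\lambda)\log F(\lambda)\le\frac{\lambda^2}{2}\mathbb{E}\left[e^{\lambda f(N)}\|\nabla f(N)\|^2\right].$$

Since $\|\nabla f(N)\|^2=4n\|N\|_1^2=4nf(N)$, we obtain the differential inequality

$$\lambda F'(\lambda)-F(\lambda)\log F(\lambda)\le2n\lambda^2F'(\lambda).$$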

This inequality has the same form as the one at the top of page 191 of  with and and Theorem 6.19 implies the result above.

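Since

$$\mathbb{E}\|N\|_1^2=n+n(n-1)\frac{2}{\pi},$$

applying Lemma 1 with $\lambda=1/(4(n-1))$ we get

$$\mathbb{P}\{\sigma\text{ is locally optimal}\}\ge2^{-n}\sqrt{\frac{n-2}{2n-2}}\exp\left(\frac{n}{4(n-1)}+\frac{n}{2\pi}\right)$$

and

$$\mathbb{P}\{\sigma\text{ is locally optimal}\}\le2^{-n}\sqrt{\frac{n-2}{2n-2}}\exp\left(\left(\frac{n}{4(n-1)}+\frac{n}{2\pi}\right)\frac{4(n-1)}{3n-4}\right).$$

Summarizing, we obtain the following bounds.

###### Proposition 1.

For all spin configurations $\sigma\in\{-1,+1\}^n$,

$$\frac{1}{2\pi}-\log2-O(1/n)\le\frac1n\log\mathbb{P}\{\sigma\text{ is locally optimal}\}\le\frac{2}{3\pi}-\log2+O(1/n).$$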
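The moment formula and the resulting exponents can be sanity-checked numerically. The sketch below is an illustration under stated assumptions: it estimates $\mathbb{E}\|N\|_1^2$ by Monte Carlo for $n=5$, and evaluates the two bounds by plugging $\lambda=1/(4(n-1))$ into the two sides of Lemma 1 as stated, together with the prefactor $2^{-n}\sqrt{(n-2)/(2n-2)}$. The function names and the sample size are arbitrary choices.

```python
import math
import random


def expected_l1_squared(n):
    """E ||N||_1^2 = n + n(n-1) * 2/pi for N standard normal in R^n."""
    return n + n * (n - 1) * 2.0 / math.pi


def monte_carlo_l1_squared(n, samples, rng):
    """Plain Monte Carlo estimate of E ||N||_1^2."""
    total = 0.0
    for _ in range(samples):
        l1 = sum(abs(rng.gauss(0.0, 1.0)) for _ in range(n))
        total += l1 * l1
    return total / samples


def exponent_bounds(n):
    """Lower and upper bounds on (1/n) log P{sigma is locally optimal}."""
    lam = 1.0 / (4.0 * (n - 1))
    log_prefactor = -n * math.log(2.0) + 0.5 * math.log((n - 2) / (2.0 * n - 2.0))
    lower_exponent = lam * expected_l1_squared(n)          # Jensen side
    upper_exponent = lower_exponent * (1.0 + n * lam / (1.0 - n * lam))
    return ((log_prefactor + lower_exponent) / n,
            (log_prefactor + upper_exponent) / n)


estimate = monte_carlo_l1_squared(5, 100_000, random.Random(1))
print(estimate, expected_l1_squared(5))
for n in (10, 100, 10_000):
    print(n, exponent_bounds(n))
```

As $n$ grows, the two printed bounds approach $1/(2\pi)-\log2$ and $2/(3\pi)-\log2$ respectively, matching the statement of Proposition 1.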

In the next section we take a closer look at the integral expression of the probability of local optimality. In fact, we prove that $\frac1n\log\mathbb{P}\{\sigma\text{ is locally optimal}\}$ converges to $\alpha^*-\log2$, with $\alpha^*$ as defined in the introduction.

## 3 The value of local optima

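In this section we study, for any fixed $\sigma\in\{-1,+1\}^n$ and $\Delta\in\mathbb{R}$, the joint probability

$$\mathbb{P}\{\sigma\text{ is locally optimal},\ n^{-3/2}H(\sigma)\le-\Delta\}.$$

We let $Z=(Z_1,\dots,Z_n)^T$ with $Z_i=Z_i(\sigma)$ as in the previous section. Recall from equations (2.2) and (2.3) that

$$\sigma\text{ is locally optimal}\iff\forall i\in[n],\ Z_i\ge0$$

and

$$-\frac{H(\sigma)}{n^{3/2}}=\frac{1}{2n^{3/2}}\sum_{i=1}^nZ_i.$$

Therefore, we may follow the calculations in the previous section and obtain:

$$\begin{aligned}
&\mathbb{P}\{\sigma\text{ is locally optimal},\ n^{-3/2}H(\sigma)\le-\Delta\}\\
&\quad=\mathbb{P}\left\{\left(\bigcap_{i=1}^n\{Z_i\ge0\}\right)\cap\left\{\sum_{i=1}^nZ_i\ge2\Delta n^{3/2}\right\}\right\}\\
&\quad=\frac{1}{(2\pi)^{n/2}\det(C)^{1/2}}\int_{[0,\infty)^n\cap\{x:\sum_ix_i\ge2\Delta n^{3/2}\}}\exp\left(-\frac{x^TC^{-1}x}{2}\right)dx\\
&\quad=\frac{1}{(2\pi)^{n/2}(2n-2)^{1/2}(n-2)^{(n-1)/2}}\int_{[0,\infty)^n\cap\{x:\sum_ix_i\ge2\Delta n^{3/2}\}}\exp\left(-\frac{\|x\|_2^2}{2(n-2)}+\frac{\|x\|_1^2}{2(n-2)(2n-2)}\right)dx\\
&\quad=\frac{2^{-n}}{(2\pi)^{n/2}(2n-2)^{1/2}(n-2)^{(n-1)/2}}\int_{\{x:\|x\|_1\ge2\Delta n^{3/2}\}}\exp\left(-\frac{\|x\|_2^2}{2(n-2)}+\frac{\|x\|_1^2}{2(n-2)(2n-2)}\right)dx.
\end{aligned}$$

Thus, by a change of variables, we get

$$\mathbb{P}\{\sigma\text{ is locally optimal},\ n^{-3/2}H(\sigma)\le-\Delta\}=2^{-n}\sqrt{\frac{n-2}{2n-2}}\,\mathbb{E}\left[\mathbf{1}\left\{\|N\|_1\ge\frac{2\Delta n^{3/2}}{\sqrt{n-2}}\right\}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right],$$

where $N$ is a vector of $n$ independent standard normal random variables.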

We deduce the following proposition.

###### Proposition 2.

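We have that, for all $\Delta\in\mathbb{R}$,

$$\mathbb{P}\left\{n^{-3/2}H(\sigma)\le-\Delta\ \middle|\ \sigma\text{ is locally optimal}\right\}=\frac{\mathbb{E}\left[\mathbf{1}\left\{\|N\|_1\ge\frac{2\Delta n^{3/2}}{\sqrt{n-2}}\right\}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right]}{\mathbb{E}\left[\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right]}.\tag{3.1}$$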

## 4 Approximating the integral

In order to establish convergence of the exponent and also the “typical” value of the energy, we need to understand the behavior of the numerator and the denominator of the key equation (3.1).

The main idea is to obtain a Laplace-type approximation to the integral. Make the approximation

$$\mathbb{E}\left[\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right]\approx\mathbb{E}\left[\exp\left(\frac{\|N\|_1^2}{4n}\right)\right].$$

Observe that

$$\frac{\|N\|_1}{n}=\frac1n\sum_{i=1}^n|N_i|$$

is an average of i.i.d. random variables with expectation $\sqrt{2/\pi}$ and light tails. Therefore, it satisfies a Large Deviations Principle with rate function $\mu^*$:

$$\mathbb{P}\{\|N\|_1\ge nx\}\approx e^{-\mu^*(x)n}.$$

Readers familiar with Varadhan’s Lemma (see e.g. [5, page 32]) should expect that, as $n\to\infty$,

$$\frac1n\log\mathbb{E}\left[\exp\left(\frac{\|N\|_1^2}{4n}\right)\right]=\frac1n\log\mathbb{E}\left[\exp\left(\frac{n(\|N\|_1/n)^2}{4}\right)\right]\to\sup_v\left(\frac{v^2}{4}-\mu^*(v)\right).$$

In fact, the intuition behind the Lemma is that most of the “mass” of the expectation concentrates around $\|N\|_1\approx v^*n$, where $v^*$ achieves the above supremum. This means that the conditional measure described in Proposition 2 should concentrate around $-v^*/2$.

Our calculations confirm this reasoning. The usual statement of Varadhan’s Lemma does not apply directly because $x^2/4$ is an unbounded function of $x$. Another minor technicality is that $\|N\|_1^2$ is divided by $4(n-1)$ instead of $4n$. In what follows we have opted for a self-contained approach to our estimates, which gives quantitative bounds. This section collects the corresponding technical estimates. We finish the proof of Theorem 1 in the next section.

The next Lemma is a quantitative version of the large deviations principle (or Cramér’s Theorem) for $\|N\|_1$.

###### Lemma 2.

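For $x\ge\sqrt{2/\pi}$, define $\mu^*(x)$ as in the introduction. Let $N$ be a vector of $n$ i.i.d. standard normal coordinates. Then:

$$\mathbb{P}\{\|N\|_1\ge nx\}=e^{-(\mu^*(x)+r_n(x))n}$$

with

$$0\le r_n(x)\le\kappa\left(\frac{x-\sqrt{2/\pi}}{\sqrt n}+\frac1n\right)$$

for some $\kappa>0$ independent of $x$ and $n$. Moreover, $\mu^*$ is smooth and $\mu^*(\sqrt{2/\pi})=0$.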

Proof.  This follows directly from Lemmas 5, 6 and 7 in subsection 6.1.

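We will use this Lemma to estimate expectations of the form:

$$\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le an\}\right]\quad\text{and}\quad\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\ge bn\}\right].$$

The function defined below naturally shows up in our estimates.

$$R_c(x):=\frac{cx^2}{2}-\mu^*(x)\quad(x\ge\sqrt{2/\pi}).\tag{4.1}$$
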
###### Lemma 3.

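For $a\ge\sqrt{2/\pi}$,

$$\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le an\}\right]=(I)+(II),$$

where

$$1\le(I)\le\exp\left(nR_c(\sqrt{2/\pi})\right)$$

and

$$(II)=cn\int_{\sqrt{2/\pi}}^ax\exp\left(n(R_c(x)-r_n(x))\right)dx,$$

with $r_n$ as in Lemma 2. For $b\ge\sqrt{2/\pi}$,

$$\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\ge bn\}\right]=\exp\{n(R_c(b)-r_n(b))\}+cn\int_b^{+\infty}x\exp\{n(R_c(x)-r_n(x))\}dx.$$

Proof.  Let $\phi_{c,n}(x):=\exp(cnx^2/2)$. Note that:

$$\mathbf{1}\{\|N\|_1\le an\}\exp\left(\frac{c\|N\|_1^2}{2n}\right)=\phi_{c,n}\left(\frac{\|N\|_1}{n}\right)\mathbf{1}\left\{\frac{\|N\|_1}{n}\le a\right\}.$$

We may compute the expectation of this expression as follows.

$$\begin{aligned}
\mathbb{E}\left[\mathbf{1}\{\|N\|_1\le an\}\exp\left(\frac{c\|N\|_1^2}{2n}\right)\right]
&=1+\int_0^a\phi_{c,n}'(x)\,\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge x\right\}dx\\
&=1+cn\int_0^ax\exp\left(\frac{cnx^2}{2}\right)\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge x\right\}dx.
\end{aligned}$$

We split the above integral in two parts.

$$\begin{aligned}
(I)&=1+cn\int_0^{\sqrt{2/\pi}}x\exp\left(\frac{cnx^2}{2}\right)\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge x\right\}dx,\\
(II)&=cn\int_{\sqrt{2/\pi}}^ax\exp\left(\frac{cnx^2}{2}\right)\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge x\right\}dx.
\end{aligned}$$

For part (I), we bound the probability in the integral by $1$, and obtain:

$$1\le(I)\le1+cn\int_0^{\sqrt{2/\pi}}x\exp\left(\frac{cnx^2}{2}\right)dx\le\exp\left(\frac{cnx^2}{2}\right)\bigg|_{x=\sqrt{2/\pi}}=\exp\left(nR_c(\sqrt{2/\pi})\right)$$

because $\mu^*(\sqrt{2/\pi})=0$. Term (II) may be evaluated using the estimate from Lemma 2.

$$(II)=cn\int_{\sqrt{2/\pi}}^ax\exp\left(\frac{cnx^2}{2}-n\mu^*(x)-nr_n(x)\right)dx,$$

which has the desired form because

$$\frac{cnx^2}{2}-n\mu^*(x)=nR_c(x).$$

Similarly,

$$\mathbf{1}\{\|N\|_1\ge bn\}\exp\left(\frac{c\|N\|_1^2}{2n}\right)=\phi_{c,n}\left(\frac{\|N\|_1}{n}\right)\mathbf{1}\left\{\frac{\|N\|_1}{n}\ge b\right\},$$

and we finish the proof via the identity

$$\mathbb{E}\left[\mathbf{1}\{\|N\|_1\ge bn\}\exp\left(\frac{c\|N\|_1^2}{2n}\right)\right]=\phi_{c,n}(b)\,\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge b\right\}+\int_b^{+\infty}\phi_{c,n}'(x)\,\mathbb{P}\left\{\frac{\|N\|_1}{n}\ge x\right\}dx$$

and using the bounds in Lemma 2 (which are valid for all $x\ge\sqrt{2/\pi}$).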

## 5 Proof of main Theorem

The previous section shows that, in order to estimate the expectations in Lemma 3, we need to understand the function $R_c$. The case of interest for us is when $c=n/(2(n-1))$, which is when we recover the expectations in (3.1). Since this value of $c$ varies with $n$, we will consider instead:

$$R(x)=R_{1/2}(x):=\frac{x^2}{4}-\mu^*(x)\quad(x\ge\sqrt{2/\pi}),\tag{5.1}$$

and note that

$$R(x)\le R_c(x)\le R(x)+\frac{(2c-1)x^2}{4}.\tag{5.2}$$

The next Lemma contains some information on $R$.

###### Lemma 4.

Define $R$ as in equation (5.1) and $\mu^*$ as in Lemma 2. Then there exists a unique $v^*$ that maximizes $R$ over $[\sqrt{2/\pi},+\infty)$. Letting $\alpha^*$ denote the value of the maximum, for any $x\ge\sqrt{2/\pi}$, there exists $\theta(x)\in[1/6,10]$ with:

$$R(x)-\alpha^*=-\theta(x)(x-v^*)^2.$$

Proof.  See subsection 6.2.

We can now obtain good upper and lower estimates on the integral expressions in Lemma 3 and finish the proof of the main Theorem.

Proof.  [of Theorem 1] In this proof we assume for simplicity. We will use the notation $L$ to denote the value of a constant independent of $n$ that may change from line to line. Finally, we set

$$c=c_n:=\frac{n}{2(n-1)}=\frac12+\frac{1}{2(n-1)}.$$

Lemma 4 and (5.2) give:

$$\forall x\ge\sqrt{2/\pi}:\quad R_c(x)-\alpha^*\in\left[-10(x-v^*)^2,\ -\frac16(x-v^*)^2+\frac{x^2}{4(n-1)}\right].\tag{5.3}$$

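We will now apply this to estimate expectations to the left of $v^*$. That is, we consider:

$$\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le an\}\right],\quad\sqrt{2/\pi}\le a\le v^*.$$

In this range $x$ is uniformly bounded, so $x^2/(4(n-1))\le L/n$ and

$$\forall\,\sqrt{2/\pi}\le x\le v^*:\quad0\le r_n(x)\le\frac{L}{\sqrt n}.$$

Combining Lemma 3 with $r_n(x)\ge0$ and (5.3), we obtain:

$$\begin{aligned}
\frac{\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le an\}\right]}{\exp(n\alpha^*)}
&\le\exp\left(n(R_c(\sqrt{2/\pi})-\alpha^*)\right)+cn\int_{\sqrt{2/\pi}}^ax\exp\left(n(R_c(x)-\alpha^*)\right)dx\\
&\le\exp\left(L-\frac{(v^*-\sqrt{2/\pi})^2}{4}n\right)+n\int_{\sqrt{2/\pi}}^ax\exp\left(L-\frac{n(v^*-x)^2}{6}\right)dx\\
&\le L(1+cn)\exp\left(L-\frac{(a-v^*)^2n}{4}\right)\\
&\le\exp\left(L\log n-\frac{(a-v^*)^2n}{4}\right).
\end{aligned}$$

At the same time,

$$\begin{aligned}
\frac{\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le v^*n\}\right]}{\exp(n\alpha^*)}
&\ge\exp(-L\sqrt n)\int_{v^*-\frac1n}^{v^*}x\exp\left(n(R_c(x)-\alpha^*)\right)dx\\
&\ge\frac1n\left(v^*-\frac1n\right)\exp\left(-L\sqrt n-10\left(\frac1n\right)^2n\right)\\
&\ge\exp(-L\sqrt n).
\end{aligned}$$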

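For bounding the expectation for $b\ge v^*$, we cannot simply use $x^2/(4(n-1))\le L/n$ and $r_n(x)\le L/\sqrt n$. However, note that

$$-\frac16(x-v^*)^2+\frac{x^2}{4(n-1)}\le\begin{cases}-\dfrac15(x-v^*)^2+\dfrac{L}{\sqrt n} & \text{for }x\le(n-1)^{1/4};\\[2ex]-\dfrac16(x-v^*)^2+\dfrac{2(x-v^*)^2+2(v^*)^2}{n-1}\le-\dfrac15(x-v^*)^2+\dfrac{L}{n} & \text{for larger }x.\end{cases}$$

Also, recalling the expression for $r_n$ in Lemma 2,

$$0\le r_n(x)\le\kappa\left(\frac{x-\sqrt{2/\pi}}{\sqrt n}+\frac1n\right)\le\frac{L}{\sqrt n}+\frac{L(x-v^*)}{\sqrt n}.$$

This allows us to obtain, for $b\ge v^*$,

$$\frac{\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\ge bn\}\right]}{\exp(n\alpha^*)}\le\exp\left(L\sqrt n-\frac{(b-v^*)^2n}{4}\right);\qquad\frac{\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\ge v^*n\}\right]}{\exp(n\alpha^*)}\ge\exp(-L\sqrt n).$$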

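This leads to our main results. Indeed, if we apply the above bounds with $a=b=v^*$, we obtain that, as $n\to\infty$,

$$\begin{aligned}
\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\right]
&=\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\le v^*n\}\right]+\mathbb{E}\left[\exp\left(\frac{c\|N\|_1^2}{2n}\right)\mathbf{1}\{\|N\|_1\ge v^*n\}\right]\\
&=\exp(n\alpha^*\pm L\sqrt n).
\end{aligned}$$

This implies the first statement in the Theorem via Proposition 1.

Secondly, we apply Proposition 2 and obtain:

$$\begin{aligned}
\mathbb{P}\left\{n^{-3/2}H(\sigma)\le-\frac{v^*}{2}-\epsilon\ \middle|\ \sigma\text{ local optimum}\right\}
&=\frac{\mathbb{E}\left[\mathbf{1}\{\|N\|_1\ge bn\}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right]}{\mathbb{E}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)}\quad\left(\text{with }b=(v^*+2\epsilon)\sqrt{\frac{n}{n-2}}\right)\\
&\le\exp\left(L\sqrt n-\epsilon^2n\right),
\end{aligned}$$

and (for $\epsilon$ small enough, so that $a$ below is $\ge\sqrt{2/\pi}$):

$$\begin{aligned}
\mathbb{P}\left\{n^{-3/2}H(\sigma)\ge-\frac{v^*}{2}+\epsilon\ \middle|\ \sigma\text{ local optimum}\right\}
&=\frac{\mathbb{E}\left[\mathbf{1}\{\|N\|_1\le an\}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)\right]}{\mathbb{E}\exp\left(\frac{\|N\|_1^2}{4(n-1)}\right)}\quad\left(\text{with }a=(v^*-2\epsilon)\sqrt{\frac{n}{n-2}}\right)\\
&\le\exp\left(L\sqrt n-\epsilon^2n\right).
\end{aligned}$$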

## 6 Auxiliary results

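### 6.1 Lemmas on large deviations of $\|N\|_1$

The goal of this subsection is to prove a series of Lemmas that together imply Lemma 2. We first find an expression for the Laplace transform of $|N(0,1)|$.

###### Lemma 5.

Let $N(0,1)$ be a standard normal random variable. For all $\lambda\in\mathbb{R}$,

$$\mathbb{E}e^{\lambda|N(0,1)|}=e^{\lambda^2/2+\phi(\lambda)},$$

where $\phi(\lambda):=\log(2\Phi(\lambda))$, with $\Phi(\lambda):=\mathbb{P}\{N(0,1)\le\lambda\}$.

Proof.

$$\begin{aligned}
\mathbb{E}e^{\lambda|N(0,1)|}
&=\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{\lambda x-x^2/2}dx\\
&=2e^{\lambda^2/2}\frac{1}{\sqrt{2\pi}}\int_0^\infty e^{-(x-\lambda)^2/2}dx\\
&=2e^{\lambda^2/2}\,\mathbb{P}\{N(0,1)>-\lambda\}.
\end{aligned}$$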
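The closed form in Lemma 5 can be checked against direct numerical integration. This is only a sketch: it approximates $\frac{2}{\sqrt{2\pi}}\int_0^\infty e^{\lambda x-x^2/2}dx$ by a midpoint rule on a truncated interval and compares it with $2e^{\lambda^2/2}\Phi(\lambda)$. The truncation point, the step count and the function names are arbitrary choices made for illustration.

```python
import math


def Phi(t):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))


def closed_form(lam):
    """2 * exp(lam^2 / 2) * Phi(lam), i.e. exp(lam^2/2 + phi(lam))."""
    return 2.0 * math.exp(0.5 * lam * lam) * Phi(lam)


def quadrature(lam, upper=30.0, steps=60_000):
    """Midpoint rule for (2/sqrt(2 pi)) * integral_0^upper exp(lam x - x^2/2) dx."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        x = (k + 0.5) * h
        total += math.exp(lam * x - 0.5 * x * x)
    return 2.0 / math.sqrt(2.0 * math.pi) * total * h


for lam in (0.0, 0.5, 1.0, 2.0):
    print(lam, quadrature(lam), closed_form(lam))
```

For moderate $\lambda$ the integrand is negligible beyond the truncation point, so the two columns agree to many digits. For large $\lambda$ the truncation point would have to grow with $\lambda$.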

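We will need to compute the large deviations rate function for $\|N\|_1=\sum_{i=1}^n|N_i|$, with $N_1,\dots,N_n$ i.i.d. standard normal. As usual, this is given by the Fenchel-Legendre transform of $\lambda\mapsto\log\mathbb{E}e^{\lambda|N(0,1)|}$:

$$\mu^*(x):=\sup_{\lambda\ge0}\left(\lambda x-\log\mathbb{E}e^{\lambda|N(0,1)|}\right).$$

The next lemma collects technical facts on $\mu^*$ and the value $\lambda^*(x)$ that achieves the supremum.

###### Lemma 6.

For each $x\ge\sqrt{2/\pi}$, there exists a unique $\lambda=\lambda^*(x)\ge0$ such that

$$\lambda+\phi'(\lambda)=x.$$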

Defining:

 x≥√2π↦μ∗(x):=λ∗(x)