Local optima of the Sherrington-Kirkpatrick Hamiltonian

12/21/2017 · by Louigi Addario-Berry, et al.

We study local optima of the Hamiltonian of the Sherrington-Kirkpatrick model. We compute the exponent of the expected number of local optima and determine the "typical" value of the Hamiltonian.


1 Local optima of the Hamiltonian

Let $g = (g_{i,j})_{1 \le i,j \le n}$ be a symmetric matrix with zero diagonal such that the $g_{i,j}$, $1 \le i < j \le n$, are independent standard normal random variables. The Sherrington-Kirkpatrick model of spin glasses is defined by a random Hamiltonian, that is, a random function $H : \{-1,1\}^n \to \mathbb{R}$. For a configuration $\sigma = (\sigma_1,\dots,\sigma_n) \in \{-1,1\}^n$, $H(\sigma)$ is defined as follows:

$$H(\sigma) = \frac{1}{\sqrt{n}} \sum_{1 \le i < j \le n} g_{i,j}\,\sigma_i \sigma_j.$$

We follow the usual convention of calling $\sigma$ a spin configuration, the coordinates $\sigma_i$ spins, and the value $H(\sigma)$ the energy of configuration $\sigma$.

Given $\sigma$ and $i \in \{1,\dots,n\}$ as above, we let $\sigma^{(i)}$ denote a new configuration obtained from $\sigma$ by flipping the $i$-th spin and leaving the other coordinates unchanged. That is,

$$\sigma^{(i)}_j = \sigma_j \quad \text{for } j \ne i, \qquad \sigma^{(i)}_i = -\sigma_i.$$

We say that $\sigma$ is a local minimum (or a local optimum) of $H$ if

$$H(\sigma) \le H\big(\sigma^{(i)}\big) \qquad \text{for all } i \in \{1,\dots,n\}.$$

That is, $\sigma$ is a local minimum if flipping the sign of any individual spin does not decrease the value of the energy.
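As a concrete illustration, here is a minimal Python sketch of our own (not code from the paper); it assumes the $n^{-1/2}$ normalization above, and the names `sk_matrix`, `energy`, and `is_local_min` are ours. It samples the disorder and tests local optimality by examining all $n$ single-spin flips.

```python
import numpy as np

def sk_matrix(n, rng):
    """Symmetric Gaussian disorder matrix with zero diagonal."""
    g = np.triu(rng.standard_normal((n, n)), 1)  # keep entries with i < j
    return g + g.T                               # symmetrize; diagonal stays zero

def energy(g, sigma):
    """H(sigma) = n^{-1/2} * sum_{i<j} g_ij sigma_i sigma_j."""
    n = len(sigma)
    return (sigma @ g @ sigma) / (2 * np.sqrt(n))  # quadratic form counts each pair twice

def is_local_min(g, sigma):
    """Flipping spin i changes H by -2 sigma_i (g sigma)_i / sqrt(n), so sigma is
    a local minimum iff sigma_i * (g @ sigma)_i <= 0 for every i."""
    return bool(np.all(sigma * (g @ sigma) <= 0))

rng = np.random.default_rng(0)
n = 100
g = sk_matrix(n, rng)
sigma = rng.choice([-1, 1], size=n)
print(energy(g, sigma), is_local_min(g, sigma))
```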

The global optimum $\min_{\sigma \in \{-1,1\}^n} H(\sigma)$, called the “ground-state energy,” has been extensively studied. The problem was introduced by Sherrington and Kirkpatrick [9] as a mean-field model for spin glasses. The value of the optimum was determined non-rigorously in the seminal work of Parisi [8], as a consequence of the so-called “Parisi formula.” The Parisi formula was proved by Talagrand [10] in a breakthrough paper; see also Panchenko [7] for an overview. It follows from Talagrand’s result that

$$\frac{1}{n} \min_{\sigma \in \{-1,1\}^n} H(\sigma) \to -P^* \qquad \text{in probability as } n \to \infty,$$

where $P^*$ is a constant whose value is numerically estimated to be about $0.763$ (Crisanti and Rizzo [4]); rigorous bounds on $P^*$ are also known (Guerra [6]).

In this paper we are interested in locally optimal solutions. An important reason why local optima are worth considering is that they may be computed quickly by simple greedy algorithms; see [2] and subsection 1.2 below. We show that the expected number of local optima grows exponentially and we establish the rate of growth. Also, we examine the conditional distribution of $H(\sigma)$ given that $\sigma$ is locally optimal. We prove that the distribution is concentrated on an interval of exponentially small width and determine its location.

1.1 Results

In order to state the main result of the paper, we need a few definitions.

Let $\Phi$ denote the distribution function of a standard normal random variable and introduce the function $\psi(t) = \log\Phi(t) - t^2/2$. For $t \in \mathbb{R}$, we let $I_t$ denote the following Fenchel–Legendre transform:

$$I_t(x) = \sup_{\lambda \in \mathbb{R}} \Big( \lambda x - \log \mathbb{E}\big[ e^{\lambda Z} \,\big|\, Z \le t \big] \Big), \qquad x \in \mathbb{R},$$

where $Z$ is a standard normal random variable. Lemma 2 below shows that $I_t$ is well defined. Lemma 4 shows that the mapping

$$t \mapsto \psi(t) = \log\Phi(t) - \frac{t^2}{2}$$

is strictly concave and achieves its global maximum at a unique point $t^*$. We let $\psi^* = \psi(t^*)$ denote the maximum value of $\psi$.

Theorem 1.

We have that, as $n \to \infty$, for any choice of $\sigma \in \{-1,1\}^n$,

$$\frac{1}{n} \log \mathbb{P}\{\sigma \text{ is a local minimum}\} \to \psi^*,$$

and consequently $\frac{1}{n}\log \mathbb{E}[N_n] \to \log 2 + \psi^*$, where $N_n$ denotes the number of local minima of $H$. Moreover, there exist constants $c_1$, $c_2$ and $\varepsilon_0 > 0$ such that, for all $n$ and $\varepsilon \in (0, \varepsilon_0)$,

$$\mathbb{P}\left\{ \left| \frac{H(\sigma)}{n} + t^* \right| > \varepsilon \;\middle|\; \sigma \text{ is a local minimum} \right\} \le c_1 e^{-c_2 n \varepsilon^2}.$$

The values of the constants are numerically evaluated to be $\log 2 + \psi^* \approx 0.199$ and $t^* \approx 0.506$. Since the global minimum of $H(\sigma)/n$ is about $-0.763$, the typical value $-t^* \approx -0.506$ of a local optimum comes fairly close.

Also note that Proposition 1 below implies that $\psi^*$ is between $-\log 2$ and $0$.

1.2 Local minima, greedy algorithms and MaxCut

Our problem is related to finding a local optimum of weighted MaxCut on the complete graph, which was recently studied in Angel, Bubeck, Peres, and Wei [2]. Given a set $S \subseteq \{1,\dots,n\}$, we denote the value of the cut $(S, S^c)$ as

$$\mathrm{cut}(S) = \sum_{i \in S,\; j \notin S} g_{i,j}.$$

Note that there is a correspondence between cuts and spin configurations: setting $\sigma_i = 1$ if $i \in S$ and $\sigma_i = -1$ otherwise, we have

$$\mathrm{cut}(S) = \frac{1}{2} \sum_{1 \le i < j \le n} g_{i,j}\,(1 - \sigma_i \sigma_j) = \frac{1}{2} \sum_{1 \le i < j \le n} g_{i,j} \;-\; \frac{\sqrt{n}}{2}\, H(\sigma).$$

In particular, maximizing the cut amounts to minimizing the energy, and what Angel et al. call locally optimal cuts correspond exactly to our notion of local minimum.
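As a quick numerical sanity check of this identity (our own illustration, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 8
g = np.triu(rng.standard_normal((n, n)), 1)
g = g + g.T                                   # symmetric, zero diagonal

S = {0, 2, 5}
sigma = np.array([1 if i in S else -1 for i in range(n)])

cut = sum(g[i, j] for i in S for j in range(n) if j not in S)
total = g[np.triu_indices(n, 1)].sum()        # sum over pairs i < j
H = (sigma @ g @ sigma) / (2 * np.sqrt(n))    # Hamiltonian as defined in Section 1
assert np.isclose(cut, 0.5 * total - 0.5 * np.sqrt(n) * H)
```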

Starting from a given $\sigma$, one may perform a sequence of local “greedy moves” (i.e., single spin flips that decrease the energy) until no more such moves are available; a minimal sketch is given below. The main result of [2] is that this process ends at a local minimum after a polynomial number of moves. Unfortunately, it is not clear that the distribution of the value of this local minimum is similar to the one we study in Theorem 1.
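For concreteness, here is a sketch of such a greedy descent (our own illustration; [2] analyzes dynamics of this type, but the code is not taken from there):

```python
import numpy as np

def greedy_descent(g, sigma):
    """Repeatedly flip a spin whose flip strictly decreases the energy.
    Flipping spin i changes H by -2 sigma_i (g sigma)_i / sqrt(n), so a
    flip is improving exactly when sigma_i * (g @ sigma)_i > 0."""
    sigma = sigma.copy()
    while True:
        improving = np.flatnonzero(sigma * (g @ sigma) > 0)
        if improving.size == 0:
            return sigma                 # no improving flip: local minimum
        sigma[improving[0]] *= -1        # perform one greedy move
```

Since each move strictly decreases the energy and the state space is finite, the procedure always terminates at a local minimum; the content of [2] is that, with high probability, polynomially many moves suffice.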

2 The probability of local optimality

In this section we take the first, crucial step towards the proof of Theorem 1. For any fixed spin configuration $\sigma$, we establish an integral formula for the probability that $\sigma$ is locally optimal.

Define:

$$X_i(\sigma) = \sigma_i \sum_{j \ne i} g_{i,j}\,\sigma_j, \qquad i = 1,\dots,n. \tag{2.1}$$

Note that

$$H\big(\sigma^{(i)}\big) - H(\sigma) = -\frac{2}{\sqrt{n}}\, X_i(\sigma), \tag{2.2}$$

so that $\sigma$ is a local minimum if and only if $X_i(\sigma) \le 0$ for all $i$. Moreover,

$$\sum_{i=1}^n X_i(\sigma) = 2\sqrt{n}\, H(\sigma). \tag{2.3}$$

Since $\sigma$ is fixed, we will write $X_i$ instead of $X_i(\sigma)$ most of the time.

A key point in our calculations is that the random vector

$$X = (X_1,\dots,X_n)$$

is a multivariate normal vector with zero mean and covariance matrix $\Sigma$ such that $\Sigma_{i,i} = n-1$ for all $i$ and $\Sigma_{i,j} = 1$ for all $i \ne j$. In other words,

$$\Sigma = (n-2)\, I + \mathbf{1}\mathbf{1}^T,$$

where $I$ is the identity matrix and $\mathbf{1}$ is the column vector with $1$ in each component.

Clearly, the eigenvalues of $\Sigma$ are $n-2$, with multiplicity $n-1$, and $2(n-1)$, with multiplicity $1$; therefore $\det \Sigma = 2(n-1)(n-2)^{n-1}$.
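These spectral facts are easy to confirm numerically (a small sanity check of our own):

```python
import numpy as np

n = 6
Sigma = (n - 2) * np.eye(n) + np.ones((n, n))
print(np.sort(np.linalg.eigvalsh(Sigma)))        # n-2 (n-1 times), then 2(n-1)
print(np.linalg.det(Sigma), 2 * (n - 1) * (n - 2) ** (n - 1))
```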

One may use the Sherman-Morrison formula to invert $\Sigma$ and obtain

$$\Sigma^{-1} = \frac{1}{n-2} \left( I - \frac{\mathbf{1}\mathbf{1}^T}{2(n-1)} \right),$$

and therefore

$$q_n := \mathbb{P}\{\sigma \text{ is a local minimum}\} = \mathbb{P}\{X_i \le 0 \text{ for all } i\} = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}} \int_{(-\infty,0]^n} e^{-\frac{1}{2} x^T \Sigma^{-1} x}\, dx.$$

We may rewrite this as:

$$q_n = \mathbb{E}\left[ \Phi\left( \frac{W}{\sqrt{n-2}} \right)^{\! n} \right] = \sqrt{\frac{n-2}{2\pi}} \int_{-\infty}^{\infty} e^{-(n-2)t^2/2}\, \Phi(t)^n\, dt,$$

which follows by noting that, in distribution, $X = \sqrt{n-2}\,Z + W\mathbf{1}$, where $Z$ is a vector of independent standard normal random variables and $W$ is a standard normal random variable independent of $Z$.
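The one-dimensional representation is convenient for numerics. As a sanity check (our own, under the reconstruction above), one can compare a direct Monte Carlo estimate of $q_n$ with a numerical evaluation of the integral for small $n$:

```python
import numpy as np
from scipy import integrate
from scipy.stats import norm

n = 8
rng = np.random.default_rng(3)

# Direct Monte Carlo: P(X_i <= 0 for all i) with sigma = (1,...,1),
# for which X_i is simply the i-th row sum of g.
trials, hits = 200_000, 0
for _ in range(trials):
    g = np.triu(rng.standard_normal((n, n)), 1)
    g = g + g.T
    hits += np.all(g.sum(axis=1) <= 0)
print("Monte Carlo:", hits / trials)

# Integral formula derived above.
val, _ = integrate.quad(
    lambda t: np.exp(-(n - 2) * t**2 / 2 + n * norm.logcdf(t)), -8, 8)
print("Integral:   ", np.sqrt((n - 2) / (2 * np.pi)) * val)
```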

In what follows, we derive some simple upper and lower bounds for the integral above.

Lemma 1.

If $Z = (Z_1,\dots,Z_n)$ is a vector of independent standard normal random variables and $f : \mathbb{R}^n \to \mathbb{R}$ is a smooth function for which the expectations below are finite, then

$$\mathbb{E}\, e^{f(Z)} \;\ge\; e^{\mathbb{E} f(Z)},$$

and, when $\|\nabla f\|^2$ is bounded by an affine function of $-f$, a matching upper bound with an explicit correction term holds.

Proof.   The inequality on the left-hand side is obvious from Jensen’s inequality. To prove the right-hand side, we use the Gaussian logarithmic Sobolev inequality. In particular, writing $F(\lambda) = \mathbb{E}\, e^{\lambda f(Z)}$ for $\lambda \in [0,1]$, the inequality on page 126 of Boucheron, Lugosi, and Massart [3] asserts that

$$\lambda F'(\lambda) - F(\lambda) \log F(\lambda) \;\le\; \frac{\lambda^2}{2}\, \mathbb{E}\Big[ \|\nabla f(Z)\|^2\, e^{\lambda f(Z)} \Big].$$

Since $\|\nabla f\|^2$ is bounded by an affine function of $-f$, we obtain a differential inequality for $F$. This inequality has the same form as the one at the top of page 191 of [3], and Theorem 6.19 of [3] implies the result above.  
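For orientation, here is the standard Herbst computation in the simplest setting, where $\|\nabla f\| \le L$ is bounded (a textbook argument, included for the reader's convenience; the lemma itself needs the refined version in [3, Theorem 6.19] because the gradient here is unbounded):

$$\frac{d}{d\lambda} \left( \frac{\log F(\lambda)}{\lambda} \right) = \frac{\lambda F'(\lambda) - F(\lambda)\log F(\lambda)}{\lambda^2 F(\lambda)} \;\le\; \frac{L^2}{2}.$$

Since $\log F(\lambda)/\lambda \to \mathbb{E} f(Z)$ as $\lambda \downarrow 0$, integrating this bound over $\lambda \in (0,1]$ yields $\log \mathbb{E}\, e^{f(Z)} \le \mathbb{E} f(Z) + L^2/2$.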

Since

$$q_n = \mathbb{E}\left[ \Phi\left( \frac{W}{\sqrt{n-2}} \right)^{\! n} \right] = \mathbb{E}\, e^{f(Z)} \qquad \text{with } f(z) = n \log \Phi\left( \frac{z_1 + \cdots + z_n}{\sqrt{n(n-2)}} \right),$$

we get, from the lower bound of Lemma 1,

$$q_n \;\ge\; e^{\, n\, \mathbb{E} \log \Phi(W/\sqrt{n-2})} \;\ge\; c\, 2^{-n}$$

for a universal constant $c > 0$ (using $\mathbb{E}\log\Phi(W/\sqrt{n-2}) \ge -\log 2 - C/n$), and, from the upper bound of Lemma 1,

$$q_n \;\le\; e^{-\beta n}$$

for a universal constant $\beta > 0$. Summarizing, we obtain the following bounds.

Proposition 1.

For all spin configurations $\sigma \in \{-1,1\}^n$ and all $n$ large enough,

$$c\, 2^{-n} \;\le\; \mathbb{P}\{\sigma \text{ is a local minimum}\} \;\le\; e^{-\beta n},$$

where $c > 0$ and $\beta > 0$ are universal constants.

In the next section we take a closer look at the integral expression for the probability of local optimality. In fact, we prove that $\frac{1}{n}\log \mathbb{P}\{\sigma \text{ is a local minimum}\}$ converges to the constant $\psi^*$ defined in the introduction.

3 The value of local optima

In this section we study, for any fixed $\sigma \in \{-1,1\}^n$ and $u \in \mathbb{R}$, the joint probability

$$\mathbb{P}\left\{ \sigma \text{ is a local minimum and } \frac{H(\sigma)}{n} \le u \right\}.$$

We let $X_i = X_i(\sigma)$ with $\sigma$ fixed, as in the previous section. Recall from equations (2.2) and (2.3) that $\sigma$ is a local minimum if and only if

$$X_i \le 0 \qquad \text{for all } i = 1,\dots,n,$$

and

$$\sum_{i=1}^n X_i = 2\sqrt{n}\, H(\sigma).$$

Therefore, we may follow the calculations in the previous section and obtain:

$$\mathbb{P}\left\{ \sigma \text{ loc. min., } \frac{H(\sigma)}{n} \le u \right\} = \frac{1}{(2\pi)^{n/2}\sqrt{\det\Sigma}} \int_{(-\infty,0]^n \cap \{\mathbf{1}^T x \le 2un^{3/2}\}} e^{-\frac{1}{2} x^T \Sigma^{-1} x}\, dx.$$

Thus, by a change of variables (conditioning on $W$ in the decomposition $X = \sqrt{n-2}\,Z + W\mathbf{1}$ and substituting $t = -W/\sqrt{n-2}$), we get

$$\mathbb{P}\left\{ \sigma \text{ loc. min., } \frac{H(\sigma)}{n} \le u \right\} = \sqrt{\frac{n-2}{2\pi}} \int_{-\infty}^{\infty} e^{-(n-2)t^2/2}\, \Phi(t)^n\; \mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le t + \frac{2u\sqrt{n}}{\sqrt{n-2}} \;\middle|\; Z_i \le t \text{ for all } i \right\} dt,$$

where $Z = (Z_1,\dots,Z_n)$ is a vector of independent standard normal random variables.

We deduce the following proposition.

Proposition 2.

We have that, for all $u \in \mathbb{R}$,

$$\mathbb{P}\left\{ \frac{H(\sigma)}{n} \le u \;\middle|\; \sigma \text{ is a local minimum} \right\} = \frac{\int_{-\infty}^{\infty} e^{-(n-2)t^2/2}\, \Phi(t)^n\; \mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le t + \frac{2u\sqrt{n}}{\sqrt{n-2}} \,\middle|\, Z_i \le t \text{ for all } i \right\} dt}{\int_{-\infty}^{\infty} e^{-(n-2)t^2/2}\, \Phi(t)^n\, dt}. \tag{3.1}$$

4 Approximating the integral

In order to establish convergence of the exponent and also the “typical” value of the energy, we need to understand the behavior of the numerator and the denominator of the key equation (3.1).

The main idea is to obtain a Laplace-type approximation to the integral. Make the approximation

$$t + \frac{2u\sqrt{n}}{\sqrt{n-2}} \;\approx\; t + 2u,$$

and write $e^{-(n-2)t^2/2}\,\Phi(t)^n = e^{t^2}\, e^{n\psi(t)}$ with $\psi(t) = \log\Phi(t) - t^2/2$, as in the introduction.

Observe that, conditionally on the event $\{Z_i \le t \text{ for all } i\}$,

$$\frac{1}{n}\sum_{i=1}^n Z_i$$

is an average of i.i.d. random variables with expectation $-\phi(t)/\Phi(t)$ (where $\phi$ denotes the standard normal density) and light tails. Therefore, it satisfies a Large Deviations Principle with a rate function $I_t$:

$$\mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le x \;\middle|\; Z_i \le t \text{ for all } i \right\} = e^{-n I_t(x) + o(n)} \qquad \text{for } x \le -\phi(t)/\Phi(t).$$

Readers familiar with Varadhan’s Lemma (see e.g. [5, page 32]) should expect that, as $n \to \infty$,

$$\frac{1}{n} \log \int_{-\infty}^{\infty} e^{n\psi(t)}\; \mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le t + 2u \;\middle|\; Z_i \le t \text{ for all } i \right\} dt \;\to\; \sup_{t} \big( \psi(t) - I_t(t+2u) \big),$$

with the convention that $I_t(x) = 0$ for $x \ge -\phi(t)/\Phi(t)$. In fact, the intuition behind the Lemma is that most of the “mass” of the expectation concentrates around the point $t_u$ where the above supremum is achieved. This means that the conditional measure described in Proposition 2 should concentrate around the corresponding energy value.

Our calculations confirm this reasoning. The usual statement of Varadhan’s Lemma does not apply directly because the exponent is an unbounded function of $t$. Another minor technicality is that the argument of the conditional probability is divided by $\sqrt{n-2}$ instead of $\sqrt{n}$. In what follows we have opted for a self-contained approach to our estimates, which gives quantitative bounds. This section collects the corresponding technical estimates. We finish the proof of Theorem 1 in the next section.
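The Laplace picture is easy to explore numerically. The following sketch (our own illustration, written against SciPy) locates the maximizer $t^*$ of $\psi(t) = \log\Phi(t) - t^2/2$, evaluates the predicted growth exponent $\log 2 + \psi^*$ of $\mathbb{E}[N_n]$, and checks the one-dimensional integral representation of $q_n$ for a moderate $n$:

```python
import numpy as np
from scipy import integrate, optimize
from scipy.stats import norm

psi = lambda t: norm.logcdf(t) - 0.5 * t**2

# First-order condition for the maximum: phi(t) = t * Phi(t).
res = optimize.minimize_scalar(lambda t: -psi(t), bounds=(0.0, 2.0), method="bounded")
t_star = res.x
print("t*           ≈", t_star)                    # ≈ 0.506
print("log 2 + psi* ≈", np.log(2) + psi(t_star))   # ≈ 0.199

# Finite-n check: (1/n) log(2^n q_n) slowly approaches log 2 + psi*.
n = 200
val, _ = integrate.quad(
    lambda t: np.exp(-(n - 2) * t**2 / 2 + n * norm.logcdf(t)),
    -5, 5, points=[t_star])
q_n = np.sqrt((n - 2) / (2 * np.pi)) * val
print("finite-n     ≈", np.log(2) + np.log(q_n) / n)
```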

The next Lemma is a quantitative version of the large deviations principle (or Cramér’s Theorem) for the conditional averages appearing above.

Lemma 2.

For $t \in \mathbb{R}$, define $I_t$ as in the introduction. Let $Z = (Z_1,\dots,Z_n)$ be a vector of i.i.d. standard normal coordinates. Then, for all $x \le -\phi(t)/\Phi(t)$,

$$\mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le x \;\middle|\; Z_i \le t \text{ for all } i \right\} = e^{-n I_t(x) + \varepsilon_n},$$

with

$$|\varepsilon_n| \le C \log n$$

for some $C$ independent of $n$ and $x$. Moreover, $I_t$ is smooth and $I_t(-\phi(t)/\Phi(t)) = 0$.

Proof.  This follows directly from Lemmas 5, 6 and 7 in subsection 6.1.  

We will use this Lemma to estimate expectations of the form:

$$J_n(u) = \sqrt{\frac{n-2}{2\pi}} \int_{-\infty}^{\infty} e^{-(n-2)t^2/2}\, \Phi(t)^n\; \mathbb{P}\left\{ \frac{1}{n}\sum_{i=1}^n Z_i \le t + 2u \;\middle|\; Z_i \le t \text{ for all } i \right\} dt.$$

The function defined below naturally shows up in our estimates:

$$\varphi(u) = \sup_{t \in \mathbb{R}} \big( \psi(t) - I_t(t+2u) \big), \tag{4.1}$$

with the convention that $I_t(x) = 0$ for $x \ge -\phi(t)/\Phi(t)$.
Lemma 3.

For $u \le -t^*$,

$$c_n\, e^{n\varphi(u)} \;\le\; J_n(u) \;\le\; C_n\, e^{n\varphi(u)},$$

where

$$C_n \le e^{C' \log n}$$

and

$$c_n \ge e^{-C' \log n},$$

with $C'$ depending only on the constant $C$ of Lemma 2. For $u > -t^*$, one has $\varphi(u) = \psi^*$ and the same bounds hold.

Proof.   Let $F_{n,t}(x) = \mathbb{P}\big\{ \frac{1}{n}\sum_{i=1}^n Z_i \le x \mid Z_i \le t \text{ for all } i \big\}$. Note that:

$$J_n(u) = \sqrt{\frac{n-2}{2\pi}} \int_{-\infty}^{\infty} e^{t^2}\, e^{n\psi(t)}\, F_{n,t}(t+2u)\, dt.$$

We may compute the expectation of this expression as follows. We split the above integral in two parts: part (I), over those $t$ for which $t + 2u \ge -\phi(t)/\Phi(t)$, and part (II), over the remaining $t$.

For part (I), we bound the probability in the integral by $1$, and obtain:

$$\text{(I)} \;\le\; e^{\, n \sup\{\psi(t) \,:\, t + 2u \ge -\phi(t)/\Phi(t)\} + C \log n},$$

because on this range $I_t(t+2u) = 0$ by our convention. Term (II) may be evaluated using the estimate from Lemma 2,

$$\text{(II)} \;\le\; e^{C\log n} \int e^{t^2}\, e^{n(\psi(t) - I_t(t+2u))}\, dt,$$

which has the desired form because

$$\varphi(u) = \sup_{t \in \mathbb{R}} \big( \psi(t) - I_t(t+2u) \big).$$

Similarly, a matching lower bound follows by restricting the integral to a small neighborhood of the maximizer $t_u$, and we finish the proof via the identity

$$\varphi(u) = \psi(t_u) - I_{t_u}(t_u + 2u)$$

and using the bounds in Lemma 2 (which are valid for all $x$).  

5 Proof of main Theorem

The previous section shows that, in order to estimate the expectations in Lemma 3, we need to understand the function $\varphi$. The case of interest for us is $x = t + 2u\sqrt{n}/\sqrt{n-2}$, which is when we recover the expectations in (3.1). Since this argument varies with $n$, we will consider instead:

$$\psi_u(t) = \psi(t) - I_t(t + 2u), \tag{5.1}$$

and note that

$$\varphi(u) = \sup_{t \in \mathbb{R}} \psi_u(t). \tag{5.2}$$

The next Lemma contains some information on $\psi_u$.

Lemma 4.

Let $u \in \mathbb{R}$. Define $\psi_u$ as in equation (5.1) and $I_t$ as in Lemma 2. Then there exists a unique $t_u$ that maximizes $\psi_u$ over $\mathbb{R}$. Letting $\varphi(u)$ denote the value of the maximum, for any $\varepsilon > 0$, there exists $\delta > 0$ with:

$$\sup_{t :\, |t - t_u| \ge \varepsilon} \psi_u(t) \;\le\; \varphi(u) - \delta.$$

Proof.  See subsection 6.2.  

We can now obtain good upper and lower estimates on the integral expressions in Lemma 3 and finish the proof of the main Theorem.

Proof.   [of Theorem 1] In this proof we assume $\sigma = (1,\dots,1)$ for simplicity; recall that the probability of local optimality is the same for every $\sigma$. We will use the notation $C$ to denote the value of a constant independent of $n$ that may change from line to line. Finally, we set $t^* = t_0$ and $\psi^* = \varphi(0)$, consistently with the introduction.

Lemma 4 and (5.2) give:

$$\sup_{t :\, |t - t_u| \ge \varepsilon} \psi_u(t) \;\le\; \varphi(u) - \delta. \tag{5.3}$$

We will now apply this to estimate expectations to the left of the typical value $-t^*$. That is, we consider:

$$J_n(u) \qquad \text{with } u = -t^* - \varepsilon.$$

In this range the discrepancy $2u(\sqrt{n}/\sqrt{n-2} - 1)$ between $t + 2u\sqrt{n}/\sqrt{n-2}$ and $t + 2u$ is uniformly bounded by $C/n$, so it can be absorbed into the error terms, and

$$\frac{1}{n} \log J_n(-t^* - \varepsilon) \;\le\; \varphi(-t^* - \varepsilon) + \frac{C \log n}{n}.$$

Combining Lemma 3 with (5.3), we obtain:

$$\varphi(-t^* - \varepsilon) \;\le\; \psi^* - \delta \qquad \text{for some } \delta = \delta(\varepsilon) > 0.$$

At the same time,

$$\frac{1}{n} \log J_n(0) \;\ge\; \psi^* - \frac{C \log n}{n},$$

by the lower bound in Lemma 3; recall that $u = 0$ recovers the denominator of (3.1), because $H(\sigma) \le 0$ whenever $\sigma$ is a local minimum.

For bounding the expectation for $u \ge -t^* + \varepsilon$, we cannot simply use Lemma 3 and (5.3) in the same way, since we now need an upper bound on the probability that the energy is unusually high. However, note that

$$\mathbb{P}\left\{ \frac{H(\sigma)}{n} \ge u \;\middle|\; \sigma \text{ loc. min.} \right\} = 1 - \frac{J_n(u)}{J_n(0)}.$$

Also, recalling the expression for the error term $\varepsilon_n$ in Lemma 2, a similar Laplace argument applies to the complementary event. This allows us to obtain, for $u = -t^* + \varepsilon$,

$$\frac{1}{n} \log \big( J_n(0) - J_n(-t^* + \varepsilon) \big) \;\le\; \psi^* - \delta' + \frac{C \log n}{n}$$

for some $\delta' = \delta'(\varepsilon) > 0$.

This leads to our main results. Indeed, if we apply the above bounds with $u = 0$, we obtain that, as $n \to \infty$,

$$\frac{1}{n} \log \mathbb{P}\{\sigma \text{ is a local minimum}\} \;\to\; \psi^*.$$

This implies the first statement in the Theorem via Proposition 1.

Secondly, we apply Proposition 2 and obtain:

$$\mathbb{P}\left\{ \frac{H(\sigma)}{n} \le -t^* - \varepsilon \;\middle|\; \sigma \text{ is a local minimum} \right\} = \frac{J_n(-t^* - \varepsilon)}{J_n(0)} \;\le\; e^{-n\delta + C\log n},$$

and (for $\varepsilon$ small enough, so that $\delta'$ below is positive):

$$\mathbb{P}\left\{ \frac{H(\sigma)}{n} \ge -t^* + \varepsilon \;\middle|\; \sigma \text{ is a local minimum} \right\} \;\le\; e^{-n\delta' + C\log n}.  
$$

6 Auxiliary results

6.1 Lemmas on large deviations of $\frac{1}{n}\sum_{i=1}^n Z_i$

The goal of this subsection is to prove a series of Lemmas that together imply Lemma 2. We first find an expression for the Laplace transform of a standard normal random variable conditioned to be at most $t$.

Lemma 5.

Let $Z$ be a standard normal random variable. For all $\lambda \in \mathbb{R}$ and $t \in \mathbb{R}$,

$$\mathbb{E}\big[ e^{\lambda Z} \;\big|\; Z \le t \big] = e^{\lambda^2/2}\, \frac{\Phi(t - \lambda)}{\Phi(t)},$$

where $\Phi$ is the standard normal distribution function, with density $\phi$.

Proof.   Completing the square, $e^{\lambda z}\phi(z) = e^{\lambda^2/2}\,\phi(z - \lambda)$; integrating over $z \in (-\infty, t]$ gives $\mathbb{E}\big[e^{\lambda Z}\mathbf{1}\{Z \le t\}\big] = e^{\lambda^2/2}\,\Phi(t - \lambda)$, and dividing by $\Phi(t)$ yields the claim.  
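A quick Monte Carlo check of this identity (our own illustration):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
lam, t = 0.7, 0.5
z = rng.standard_normal(2_000_000)
mc = np.exp(lam * z[z <= t]).mean()          # E[exp(lam Z) | Z <= t]
exact = np.exp(lam**2 / 2) * norm.cdf(t - lam) / norm.cdf(t)
print(mc, exact)                             # agree to ~3 decimal places
```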

We will need to compute the large deviations rate function for $\frac{1}{n}\sum_{i=1}^n Z_i$ conditioned on $\{Z_i \le t \text{ for all } i\}$, with $Z_1,\dots,Z_n$ i.i.d. standard normal. As usual, this is given by the Fenchel–Legendre transform of the logarithmic moment generating function computed in Lemma 5:

$$I_t(x) = \sup_{\lambda \in \mathbb{R}} \left( \lambda x - \frac{\lambda^2}{2} - \log \frac{\Phi(t - \lambda)}{\Phi(t)} \right).$$

The next lemma collects technical facts on $\lambda(x)$, the value of $\lambda$ that achieves the supremum above.

Lemma 6.

For each $x < -\phi(t)/\Phi(t)$, there exists a unique $\lambda(x) < 0$ such that

$$x = \lambda(x) - \frac{\phi(t - \lambda(x))}{\Phi(t - \lambda(x))}.$$

Defining: