Computing the partition function of the Sherrington-Kirkpatrick model is hard on average

David Gamarnik, et al. (MIT), October 13, 2018

We consider the algorithmic problem of computing the partition function of the Sherrington-Kirkpatrick model of spin glasses with Gaussian couplings. We show that there is no polynomial time algorithm for computing the partition function exactly (in a sense to be made precise), unless P=#P. Our proof uses Lipton's reducibility trick of computation modulo large primes [Lip91] and the near-uniformity of the log-normal distribution in small intervals. To the best of our knowledge, this is the first statistical physics model with random parameters for which such average-case hardness is established.


1 Introduction

We consider the algorithmic complexity of the problem of computing the partition function of the Sherrington-Kirkpatrick model with Gaussian couplings. The model is defined as follows. Fix a positive integer n. Given a sequence J = (J_{i,j}, 1 ≤ i < j ≤ n) of real values and a parameter β > 0, called the inverse temperature, the associated partition function is defined as

Z(β, J) = Σ_{σ ∈ {−1,+1}^n} exp( β n^{−1/2} Σ_{1 ≤ i < j ≤ n} J_{i,j} σ_i σ_j ),

where σ = (σ_1, …, σ_n) ∈ {−1,+1}^n denotes any spin configuration. The Sherrington-Kirkpatrick (SK for short) model corresponds to the case when the couplings J_{i,j} are i.i.d. with standard normal distribution. Throughout the paper β is assumed to be a constant independent of n. The normalization n^{−1/2} is needed to have a non-trivial free energy limit lim_{n→∞} n^{−1} log Z(β, J), the computation of which was a long journey starting with Parisi's replica method; see the recent book by Panchenko [Pan13] on this topic. The algorithmic problem, which is the focus of this paper, is the problem of computing Z(β, J) when J is given as a (random) input. The (worst-case) algorithmic problem of computing Z(β, J) for an arbitrary input J is known to be #P-hard for a much broader class of statistical physics models and the associated partition functions. The case of random inputs is a totally different story, however, as the classical method of reduction used for establishing formal hardness in the worst-case sense does not seem applicable to the case of random inputs. The case of random (Gaussian) inputs is the subject of this paper.

Since the input to the model is a real-valued vector J, the notion of algorithmic input needs to be discussed. We consider the model where the values J_{i,j} (or, more relevantly, the weights exp(β n^{−1/2} J_{i,j}) introduced below) are computed up to a certain level of precision, and the resulting approximations are defined as the model input. Specifically, each value is truncated after the first m binary digits following the binary point, for a precision parameter m chosen by the algorithm designer. The algorithm designer is required to construct a polynomial time algorithm for computing exactly the partition function associated with the truncated input instead of the original input J.
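
For concreteness, the following minimal Python sketch evaluates the partition function by brute-force enumeration for a small n; the function and variable names (Z_partition, beta, J) are ours and the snippet is only meant to illustrate the definition, since the sum runs over 2^n configurations.

import itertools, math, random

def Z_partition(J, beta, n):
    """Brute-force partition function: sum over all 2^n spin configurations
    of exp(beta * n^(-1/2) * sum_{i<j} J[i][j] * sigma_i * sigma_j)."""
    total = 0.0
    for sigma in itertools.product([-1, 1], repeat=n):
        energy = sum(J[i][j] * sigma[i] * sigma[j]
                     for i in range(n) for j in range(i + 1, n))
        total += math.exp(beta * energy / math.sqrt(n))
    return total

n, beta = 10, 1.0
random.seed(0)
J = [[random.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]  # only the entries with i < j are used
print(Z_partition(J, beta, n))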

Our main result shows that if such an algorithm succeeds with probability at least 1 − c n^{−2}, for a sufficiently small constant c > 0, then P=#P. Here the randomness is with respect to the randomness of the input J. For our result to hold we assume that the number m of precision bits is at least logarithmic and at most polynomial in n. The logarithmic lower bound is required for technical reasons, to ensure near-uniformity of the density function of the log-normal distribution. The polynomial upper bound on m is required so that the problem instance itself has polynomial size, as otherwise the polynomiality of the algorithm's running time is not a meaningful requirement.

Our proof uses Lipton's trick [Lip91] for establishing the average-case hardness of computing the permanent of a matrix with entries chosen i.i.d. uniformly at random from a finite interval of integers. He observed that when the computation is done modulo an appropriately chosen prime number p, larger than the size of the matrix, the permanent can be written as a polynomial of an integer variable, corresponding to integer multiples of a random uniform input; see Proposition 3.3 below. Solving for the coefficients of this polynomial then allows one to compute the permanent of an arbitrary matrix, using the algorithm for random inputs as a subroutine. Thus the average-case hardness of the problem is equivalent to the worst-case hardness of computing the permanent of a matrix, which is known to be #P-hard. Lipton's result assumes the existence of an algorithm which succeeds with probability sufficiently close to one, and a similar requirement on the success probability of the algorithm is adopted for our result. Subsequently, Lipton's result was strengthened to assume only a constant probability of success [FL92], and then an inverse-polynomial probability of success [CPS99]. Our assumption is similar to Lipton's. The exponent 2 in our required failure probability of order n^{−2} appears due to the degree n(n−1)/2 of the associated polynomial, vs. n for the permanent. Weakening this assumption on the algorithmic strength is left for future work.
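
The following Python sketch illustrates the interpolation step behind Lipton's trick, with a random low-degree polynomial modulo a prime standing in for the partition-function polynomial of Proposition 3.3; the prime p, the degree D and the helper names are illustrative choices of ours, not the paper's.

import random

p = 10007            # a prime larger than the degree, as in Lipton's argument
D = 30               # degree of the hidden polynomial (n(n-1)/2 in the paper's setting)
random.seed(1)
coeffs = [random.randrange(p) for _ in range(D + 1)]   # hidden polynomial, playing the role of Z(B + tU)

def oracle(t):
    """Stand-in for the average-case algorithm: returns the polynomial value mod p."""
    return sum(c * pow(t, k, p) for k, c in enumerate(coeffs)) % p

# Query the oracle at t = 1, ..., D+1 (points where the input "looks random"),
# then recover the value at t = 0 by Lagrange interpolation over the integers mod p.
points = list(range(1, D + 2))
values = [oracle(t) for t in points]

def value_at_zero(points, values, p):
    total = 0
    for i, (xi, yi) in enumerate(zip(points, values)):
        num, den = 1, 1
        for j, xj in enumerate(points):
            if j != i:
                num = (num * (-xj)) % p          # factor (0 - xj)
                den = (den * (xi - xj)) % p
        total = (total + yi * num * pow(den, -1, p)) % p
    return total

print(value_at_zero(points, values, p) == coeffs[0])   # True: recovers the value at t = 0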

The method we use relies on an idea similar to Lipton's, but several additional steps are needed. First, we reduce the problem to the problem of computing the partition function associated with "cuts", appropriately defined. This is to avoid the necessity of dealing with two correlated random inputs, namely exp(β n^{−1/2} J_{i,j}) and exp(−β n^{−1/2} J_{i,j}), depending on the sign of the product σ_i σ_j. Second, we need to establish that the density of the log-normal distribution is "sufficiently" Lipschitz continuous in small intervals, and for this purpose we need to appropriately bound the tails of the log-normal distribution at the relevant scale.
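
As a sanity check of this first reduction, the short Python sketch below verifies numerically that Σ_{i<j} J_{i,j} σ_i σ_j = Σ_{i<j} J_{i,j} − 2 Σ_{i<j: σ_i ≠ σ_j} J_{i,j} for every σ, so that the partition function factors into a trivially computable term and a sum over cuts; the normalization used in the code is our reconstruction and the function names are ours.

import itertools, math, random

random.seed(2)
n, beta = 8, 1.0
J = {(i, j): random.gauss(0.0, 1.0) for i in range(n) for j in range(i + 1, n)}

def Z_direct():
    # Partition function computed straight from the definition.
    return sum(math.exp(beta / math.sqrt(n) *
                        sum(J[i, j] * s[i] * s[j] for (i, j) in J))
               for s in itertools.product([-1, 1], repeat=n))

def Z_via_cuts():
    # Same quantity, written as a trivial prefactor times a sum over cut values.
    total_J = sum(J.values())
    cut_sum = sum(math.exp(-2 * beta / math.sqrt(n) *
                           sum(J[i, j] for (i, j) in J if s[i] != s[j]))
                  for s in itertools.product([-1, 1], repeat=n))
    return math.exp(beta / math.sqrt(n) * total_J) * cut_sum

print(abs(Z_direct() - Z_via_cuts()) < 1e-6 * Z_direct())   # True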

The proof technique utilized in this paper, as well as the ones appearing in earlier similar results, including [Lip91] and [CPS99], is unfortunately too fragile to establish hardness results for algorithms which only aim at some approximation guarantee. Thus, for example, the average-case hardness of computing Z(β, J) to within a multiplicative approximation factor, which is a common benchmark in the field of approximation algorithms, remains open. Another potential direction is to establish formal average-case hardness under a different computation model, namely the Blum-Shub-Smale machine operating over real-valued inputs [BSS89], [BCSS97]. It is conceivable that techniques similar to the ones used here are extendable to this model of computation as well. The challenge there is that it is not clear what the appropriate real-valued analogue of #P is.

There are some immediate extensions of our result, which we do not pursue for the sake of simplicity. The SK model can be replaced by the p-spin interaction model, defined by the Hamiltonian Σ_{i_1 < ⋯ < i_p} J_{i_1, …, i_p} σ_{i_1} ⋯ σ_{i_p} with an appropriate normalization. The same trick of reducing this Hamiltonian to a Hamiltonian on cuts works here as well. The assumption of Gaussianity can also be replaced by random variables with a sufficiently well-behaved density. At the same time, the current method cannot treat the Bernoulli case J_{i,j} = ±1 with probability 1/2 each, and the average-case hardness of this problem is a very interesting open question. It is worth noting that the average-case hardness of the problem of computing the permanent of a matrix with i.i.d. Bernoulli entries is open as well. Another interesting open problem is the average-case hardness of the problem of computing the ground state of the SK model, namely the problem of computing min_σ Σ_{1 ≤ i < j ≤ n} J_{i,j} σ_i σ_j. Again, the current method based on modulo-prime computation is not applicable to this problem, as the algebraic structure of the partition function is lost when one switches to the minimization. The approach of approximating the ground state by using a very low temperature (a high value of β) again seems useless, as one would need to establish the hardness of the problem of computing the partition function approximately, which, as we mentioned above, we are not yet capable of doing. The problem of computing the ground state of the SK model was raised recently as one of the open problems at the American Institute of Mathematics workshop "Phase transitions in randomized computational problems" in June 2017 [Aim].

We close this section with a list of notational conventions. ℤ, ℤ_+, and ℤ_p are, respectively, the set of all integers, the set of all non-negative integers, and the set {0, 1, …, p − 1} of integers modulo p. For any x ∈ ℝ, ⌊x⌋ denotes the largest integer which does not exceed x. The function log is assumed to have base 2.

2 Algorithmic formalism and the main result

The problem of computing Z(β, J) is reducible to the problem of computing the partition function associated with the cut problem, defined below. For each spin configuration σ let

C(σ) = Σ_{1 ≤ i < j ≤ n: σ_i ≠ σ_j} J_{i,j}

be the associated cut value. Observe that

Σ_{1 ≤ i < j ≤ n} J_{i,j} σ_i σ_j = Σ_{1 ≤ i < j ≤ n} J_{i,j} − 2 C(σ),

implying that

Z(β, J) = exp( β n^{−1/2} Σ_{1 ≤ i < j ≤ n} J_{i,j} ) Σ_σ exp( −2 β n^{−1/2} C(σ) ).

Since the first factor is trivially computable, it suffices to compute the second one. For convenience we use the same notation Z and define

Z(β, J) = Σ_σ exp( β n^{−1/2} Σ_{1 ≤ i < j ≤ n: σ_i ≠ σ_j} J_{i,j} ),

dropping the factor 2, which is absorbed by β, and flipping the sign, which is irrelevant due to the symmetry of the normal distribution.

Denoting further Y_{i,j} = exp( β n^{−1/2} J_{i,j} ), we have

Z(β, J) = Σ_σ Π_{1 ≤ i < j ≤ n: σ_i ≠ σ_j} Y_{i,j}.

For an arbitrary vector Y = (Y_{i,j}, 1 ≤ i < j ≤ n) of positive reals, let

Z(Y) = Σ_σ Π_{1 ≤ i < j ≤ n: σ_i ≠ σ_j} Y_{i,j}.

Our focus is on algorithms which compute Z(Y) exactly, in the sense to be made precise now, and which succeed with high probability with respect to the randomness of J. Since both the J_{i,j} and the Y_{i,j} can take irrational values, they cannot be used as a formal algorithmic input. We adopt here a model of computation where the values Y_{i,j} are approximated up to a certain level of digital precision, specified by the algorithm designer, and these approximations are used as the input to the algorithm. Specifically, given the real values Y_{i,j}, the designer fixes a positive integer m and obtains first the truncations Ŷ_{i,j} = ⌊2^m Y_{i,j}⌋ 2^{−m}, namely the values Y_{i,j} rounded down to m binary digits after the binary point. The computational goal of the algorithm designer is to compute Z(Ŷ) exactly, where Ŷ = (Ŷ_{i,j}, 1 ≤ i < j ≤ n). Namely, the algorithmic goal is to compute the exact value of the partition function defined in terms of the approximations Ŷ of Y.

For convenience, we now switch to a model with integer inputs. Let

A_{i,j} = 2^m Ŷ_{i,j} = ⌊2^m Y_{i,j}⌋ ∈ ℤ_+, 1 ≤ i < j ≤ n. (1)

For each σ let c(σ) denote the cardinality of the set {(i, j) : 1 ≤ i < j ≤ n, σ_i ≠ σ_j}, and let N = n(n−1)/2 denote the total number of pairs. Let

Ẑ(A) = 2^{mN} Z(Ŷ) = Σ_σ 2^{m(N − c(σ))} Π_{1 ≤ i < j ≤ n: σ_i ≠ σ_j} A_{i,j}, (2)

and thus we focus on computing Ẑ(A) instead. We note that Ẑ(A) is integer valued, since the A_{i,j} are integers and c(σ) ≤ N. For every integer valued vector B = (B_{i,j}, 1 ≤ i < j ≤ n) we define Ẑ(B) as in (2). Fixing any such B, a key observation is that

Ẑ(A + t B), t ∈ ℤ, (3)

is a polynomial in t with degree at most

N = n(n−1)/2. (4)
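
The key observation can be checked directly on a toy instance: with exact integer arithmetic, the (N+1)-st finite difference of the sequence t ↦ Ẑ(A + tB), t = 0, 1, …, N+1, vanishes precisely when the map is a polynomial of degree at most N. The Python sketch below does this for n = 4; the definition of Ẑ used in the code, with the 2^{m(N − c(σ))} weights, follows our reconstruction of (2) and should be read as an assumption.

import itertools, random

n, m = 4, 3
pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
N = len(pairs)                      # n(n-1)/2

def Z_hat(A):
    """Integer-valued cut partition function, following our reconstruction of (2)."""
    total = 0
    for s in itertools.product([-1, 1], repeat=n):
        cut = [e for e in pairs if s[e[0]] != s[e[1]]]
        term = 2 ** (m * (N - len(cut)))
        for e in cut:
            term *= A[e]
        total += term
    return total

random.seed(3)
A = {e: random.randrange(1, 2 ** m) for e in pairs}   # integer inputs, as in (1)
B = {e: random.randrange(1, 2 ** m) for e in pairs}   # an arbitrary fixed integer direction

values = [Z_hat({e: A[e] + t * B[e] for e in pairs}) for t in range(N + 2)]
for _ in range(N + 1):              # take N+1 successive finite differences
    values = [b - a for a, b in zip(values, values[1:])]
print(values == [0])                # True: the degree of t -> Z_hat(A + tB) is at most N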

We now state our main result, which establishes the average-case hardness of computing Ẑ(A).

Theorem 2.1.

Suppose the precision value m satisfies C_1 log n ≤ m ≤ n^{C_2} for constants C_1, C_2 > 0. Namely, the number of bits in the precision is at least logarithmic and at most polynomial in n. If there exists a polynomial (in n) time algorithm which on input A = (A_{i,j}, 1 ≤ i < j ≤ n) produces the value Ẑ(A) with probability at least 1 − c n^{−2}, for a sufficiently small constant c > 0 and all sufficiently large n, then P=#P.

In the above, the probability is with respect to the randomness of A, which in turn is derived from the randomness of J.

Let us comment on the significance of the assumption that m is at most polynomial in n. It basically says that the inputs to the algorithm are integers whose binary description has at most polynomial length. This assumption is adopted purely for the purpose of staying within the realm of polynomial time algorithms, which require inputs of at most polynomial length. The logarithmic lower bound on m is a technical requirement needed for the proof details.

3 Proof of Theorem 2.1

We begin by verifying that the density of the log-normal distribution is Lipschitz continuous within a finite interval, and we obtain a bound on the Lipschitz constant. Recall that the couplings J_{i,j} are i.i.d. standard normal and Y_{i,j} = exp( β n^{−1/2} J_{i,j} ).

Let f denote the common density of the Y_{i,j}.

Lemma 3.1.

For every satisfying and every , the following bound holds.

(5)
Proof.

The density of Y_{i,j} is given by

f(y) = ( n^{1/2} / ( β y √(2π) ) ) exp( − n (ln y)^2 / (2 β^2) ), y > 0.

Here we used that Y_{i,j} = exp( β n^{−1/2} X ), where X denotes the standard normal random variable. It is easy to see that

(6)

as since diverges faster than for every constant as . Also

(7)

as . Both bounds are very crude of course, but suffice for our purposes.

We have for every

Now since for , we obtain that in the range

Applying these bounds, exponentiating, and using the assumption on the lower bound on and , we obtain

(8)

We use the following simple result regarding the number of primes between and

Lemma 3.2.

The number of primes between and is at least for all sufficiently large .

Proof.

We use the Prime Number Theorem (PNT), which says that the number of primes at most x, denoted by π(x), satisfies π(x) ∼ x / ln x as x → ∞.

If the number of primes between and is at most then . But since then implying

contradicting PNT when is large enough. ∎
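
A quick numerical illustration in the spirit of Lemma 3.2, where for the sake of illustration we take the interval of interest to be [N, 2N]: the Python sketch below sieve-counts the primes in (N, 2N] and prints the count next to the Prime Number Theorem prediction of roughly N / ln N primes at that scale.

import math

def primes_up_to(limit):
    """Boolean sieve of Eratosthenes up to `limit` (inclusive)."""
    sieve = [True] * (limit + 1)
    sieve[0] = sieve[1] = False
    for q in range(2, int(limit ** 0.5) + 1):
        if sieve[q]:
            sieve[q * q :: q] = [False] * len(sieve[q * q :: q])
    return sieve

N = 100000
sieve = primes_up_to(2 * N)
count = sum(sieve[N + 1 : 2 * N + 1])        # number of primes in (N, 2N]
print(count, N / math.log(N))                # sieve count vs. PNT-scale prediction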

We now fix any prime p satisfying

(9)

and consider computation modulo p. Specifically, for any integer z we let z mod p denote the remainder of z upon division by p, so that z mod p takes values in {0, 1, …, p − 1}. We may assume that the input A is reduced modulo p component-wise as well, since Ẑ(A) is a sum of products of the entries of A. Similarly, for an algorithm, its output is also interpreted modulo p.

Let U = (U_{i,j}, 1 ≤ i < j ≤ n) be i.i.d. chosen uniformly at random in {0, 1, …, p − 1}. The key to our hardness result is the following average-case hardness result for computing the partition function Ẑ modulo p, when the input is distributed i.i.d. uniformly at random in {0, 1, …, p − 1}.

Proposition 3.3.

Suppose there exists a polynomial (in n) time algorithm which on any input B = (B_{i,j}, 1 ≤ i < j ≤ n) with entries in {0, 1, …, p − 1} produces some output in {0, 1, …, p − 1}. Suppose the algorithm satisfies

ℙ( the output on input U equals Ẑ(U) mod p ) ≥ 1 − c n^{−2}, (10)

for a sufficiently small constant c > 0. Then P=#P.

The probability above is with respect to the randomness of U.

Proof.

We will use as a basis the #P-hardness of computing Ẑ(B) for arbitrary inputs B. Namely, if there exists a polynomial time algorithm which computes Ẑ(B) for every input B with probability bounded away from zero as n → ∞, then P=#P. Fix an arbitrary input B with entries in {0, 1, …, p − 1}. A key observation is that for each positive integer t which is not a multiple of p, and each pair (i, j), the value (B_{i,j} + t U_{i,j}) mod p is distributed uniformly at random in {0, 1, …, p − 1}. We generate a sample U and for each t = 1, …, N + 1 we feed (B + t U) mod p to the algorithm. We recall that the degree bound N is defined by (4). The independence across the pairs (i, j) is maintained as well. Then, by this observation and (10), the union bound over the N + 1 evaluation points gives that, with probability at least 3/4 (say), the algorithm correctly computes the polynomial defined by (3), reduced modulo p, at all of the values t = 1, …, N + 1. Here we use that (N + 1) c n^{−2} is at most a small constant.

We now use the fact that, given the polynomial Q(t) := Ẑ(B + tU) mod p = Σ_{k=0}^{N} a_k t^k of degree at most N, if we have access to its values at the N + 1 distinct points t = 1, …, N + 1, then we can compute the coefficients a_0, …, a_N efficiently by solving the following system of N + 1 linear equations with N + 1 unknowns:

Σ_{k=0}^{N} a_k t^k = Ẑ(B + tU) mod p, t = 1, …, N + 1.

The non-degeneracy of the matrix of this (Vandermonde) system is guaranteed, since if there were another vector of coefficients solving the same system, then the polynomial of degree at most N defined by the difference of the two coefficient vectors would have more than N roots.

As a result we can compute efficiently Ẑ(B) mod p, namely the value Q(0) = a_0, with probability at least 3/4. If now the procedure is repeated k times, with fresh independent samples of U, and the value which appears the majority of times is output, the probability that a wrong answer appears as the majority is at most the probability that, in a Binomial distribution with success probability at least 3/4, failure occurs at least half of the time, which by a standard Chernoff bound is at most exponentially small in k. Taking k to be polynomial in n, we obtain that our procedure, repeated polynomially many times, succeeds in computing the partition function modulo p with probability exponentially close to one.
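
The amplification step can be illustrated with a small Python simulation: an oracle that returns the correct answer with probability only 3/4 is queried k times and the majority answer is output; the empirical failure rate drops rapidly in k, in line with the Chernoff bound invoked above. The success probability 3/4 and the values of k below are illustrative choices of ours.

import random
from collections import Counter

random.seed(4)
CORRECT = 42           # stands in for the true value of the partition function mod p

def noisy_oracle():
    """Returns the correct value with probability 3/4, garbage otherwise."""
    return CORRECT if random.random() < 0.75 else random.randrange(1000)

def majority_of(k):
    answers = Counter(noisy_oracle() for _ in range(k))
    return answers.most_common(1)[0][0]

for k in (1, 5, 15, 45):
    trials = 2000
    failures = sum(majority_of(k) != CORRECT for _ in range(trials))
    print(k, failures / trials)     # empirical failure rate shrinks rapidly as k grows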

Now we use a standard method which shows that efficient computation modulo primes can be converted into an efficient computation of the quantity of interest. The details can be found in [Lip91] and also in Jerrum [Jer06]. We repeat the argument here for completeness. Given an integer z, suppose we have an algorithm for computing it modulo a prime. Fix two primes p_1 and p_2. As they are co-prime, using the Euclidean algorithm we can find integers a and b such that a p_1 + b p_2 = 1. Using our algorithm we find the residues r_1 and r_2 solving z = r_1 mod p_1 and z = r_2 mod p_2. Letting y = r_2 a p_1 + r_1 b p_2, we have y = r_1 mod p_1 and y = r_2 mod p_2. Since p_1 and p_2 are co-prime, this means z = y mod p_1 p_2.
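
A minimal Python sketch of the two-prime step just described: the extended Euclidean algorithm produces a and b with a*p1 + b*p2 = 1, and the combination y = r2*a*p1 + r1*b*p2 reproduces z modulo p1*p2. The variable names mirror the argument above rather than any code from the paper.

def extended_gcd(x, y):
    """Returns (g, a, b) with a*x + b*y = g = gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    g, a, b = extended_gcd(y, x % y)
    return g, b, a - (x // y) * b

def combine(r1, p1, r2, p2):
    """Given z mod p1 = r1 and z mod p2 = r2 for coprime p1, p2, recover z mod p1*p2."""
    g, a, b = extended_gcd(p1, p2)
    assert g == 1
    y = r2 * a * p1 + r1 * b * p2
    return y % (p1 * p2)

z, p1, p2 = 123456789, 10007, 10009
print(combine(z % p1, p1, z % p2, p2) == z % (p1 * p2))   # True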

We now use this method inductively. Applying Lemma 3.2, we can find a sufficiently long sequence of distinct primes p_1 < p_2 < ⋯ < p_r, each satisfying (9). This can be done by brute force search in polynomial time since, by the assumption of the theorem, each such prime has description length at most polynomial in the size of the input; here m is the precision value assumed in the statement of the theorem. Using the method above we find Ẑ(A) modulo the product p_1 p_2 ⋯ p_r. We claim that with high probability this product is larger than the partition function Ẑ(A) itself, and thus the algorithm finds not just a remainder, but the value of the partition function itself. To show the claim, note that by a standard tail bound for the standard normal distribution, combined with the union bound over the n(n−1)/2 pairs, the largest of the |J_{i,j}| is at most of order √(log n) with probability approaching one. Thus w.h.p. the partition function Z(Ŷ) is at most a sum of 2^n terms, each of which is a product of at most n(n−1)/2 terms, each bounded by exp( O(√(log n / n)) ). The value of Ẑ(A) = 2^{m n(n−1)/2} Z(Ŷ) is thus at most 2^{m n(n−1)/2} exp( O(n^{3/2} √(log n)) ) w.h.p. The lower bound assumption on m stated in the theorem implies that this bound is dominated by the product p_1 ⋯ p_r, as claimed.

Thus, using this algorithm, we find the value of the partition function itself with probability approaching one, implying P=#P. ∎
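
For concreteness, here is one way the crude high-probability bound used in the last step can be written out in LaTeX; the specific tail level 3\sqrt{\ln n} and the constants are our illustrative choices, and the bound on Ẑ(A) uses our reconstruction of (2).

% A sketch of the crude w.h.p. bound, with our (hypothetical) choice of the tail level.
\[
\mathbb{P}\Big(\max_{1\le i<j\le n}|J_{i,j}| > 3\sqrt{\ln n}\Big)
\;\le\; \binom{n}{2}\cdot 2e^{-\tfrac{9}{2}\ln n} \;\le\; n^{-5/2} \;\longrightarrow\; 0,
\]
and on the complementary event each weight satisfies
$Y_{i,j} = e^{\beta n^{-1/2} J_{i,j}} \le e^{3\beta \sqrt{\ln n}/\sqrt{n}}$, so that
\[
Z(\beta, J) \;\le\; 2^{n} \exp\Big(\beta n^{-1/2}\cdot\tfrac{n(n-1)}{2}\cdot 3\sqrt{\ln n}\Big)
\;=\; \exp\big(O(n^{3/2}\sqrt{\ln n})\big),
\qquad
\hat Z(A) \;\le\; 2^{m\,n(n-1)/2}\, \exp\big(O(n^{3/2}\sqrt{\ln n})\big).
\]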

Next we establish that the distribution of the input A_{i,j}, reduced modulo p, is nearly uniform.

Lemma 3.4.

The following bound holds for every

Proof.

Recall that the values A_{i,j} are defined by (1). We have for every

We let

and consider separately the case and . For applying Lemma 3.1 with

we have for every and such that

Since , we obtain

Applying the value of and we have . Given an upper bound implied by (9), we have that the exponent is

It is easy to check that

which can be checked by verifying that implies . Thus the term above is

We obtain a bound

Similarly, we obtain for the same range of

Thus

as the sum above is at most the integral of the density function, and thus at most .

We now consider the case . We have applying (6)

which applying the value of is .

Finally, suppose . Applying (7)

We conclude that

Thus

completing the proof of the lemma. A lower bound is shown similarly. ∎
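
A small Monte Carlo check of the near-uniformity statement, for an illustrative choice of n, β, m and p, and using our reconstruction A_{i,j} = ⌊2^m exp(β n^{−1/2} J_{i,j})⌋: the Python sketch below compares the empirical distribution of A mod p with the uniform distribution on {0, …, p − 1} in total variation.

import math, random

random.seed(5)
n, beta, m, p = 100, 1.0, 20, 101      # illustrative parameters
samples = 200000

counts = [0] * p
for _ in range(samples):
    J = random.gauss(0.0, 1.0)                                   # a single coupling
    A = math.floor(2 ** m * math.exp(beta * J / math.sqrt(n)))   # truncated weight, as an integer
    counts[A % p] += 1

tv = 0.5 * sum(abs(c / samples - 1.0 / p) for c in counts)
print(tv)    # small total variation distance from the uniform distribution on {0, ..., p-1}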

We now complete the proof of Theorem 2.1. By Lemma 3.4, the total variation distance between the distribution of each A_{i,j} mod p and the uniform distribution on {0, 1, …, p − 1} is small, where the upper bound of (9) was used. Thus we can couple each A_{i,j} mod p with an i.i.d. uniform random variable on the integers in {0, 1, …, p − 1} so that the two coincide with probability close to one. In particular, using the union bound, we can couple the vector A mod p with a vector U with i.i.d. components distributed uniformly on the integers in {0, 1, …, p − 1}, so that the two vectors coincide with probability close to one. The coupling implies that the success probability of the algorithm on the input U and its success probability on the input A mod p differ by a negligible amount. Then, if the algorithm satisfies the assumption of Theorem 2.1, we obtain that it also satisfies (10) for all sufficiently large n, contradicting Proposition 3.3. This completes the proof of the theorem.

References

  • [Aim] AimPL, Phase transitions in randomized computational problems. Available at http://aimpl.org/phaserandom, 2017.
  • [BCSS97] L. Blum, F. Cucker, M. Shub, and S. Smale, Complexity and real computation, Springer-Verlag. New York, 1997.
  • [BSS89] Lenore Blum, Mike Shub, and Steve Smale, On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines, Bulletin (New Series) of the American Mathematical Society 21 (1989), no. 1, 1–46.
  • [CPS99] Jin-Yi Cai, Aduri Pavan, and D. Sivakumar, On the hardness of permanent, Annual Symposium on Theoretical Aspects of Computer Science, Springer, 1999, pp. 90–99.
  • [FL92] Uriel Feige and Carsten Lund, On the hardness of computing the permanent of random matrices, Proceedings of the twenty-fourth annual ACM symposium on Theory of computing, ACM, 1992, pp. 643–654.
  • [Jer06] M. Jerrum, Counting, sampling and integrating: algorithms and complexity. Lecture notes, Chapter 2., 2006.
  • [Lip91] Richard Lipton, New directions in testing, Distributed Computing and Cryptography 2 (1991), 191–202.
  • [Pan13] Dmitry Panchenko, The Sherrington-Kirkpatrick model, Springer Science & Business Media, 2013.