Quantum-inspired annealers as Boltzmann generators for machine learning and statistical physics

by   Alexander E. Ulanov, et al.
University of Oxford

Quantum simulators and processors are rapidly improving nowadays, but they are still not able to solve complex and multidimensional tasks of practical value. However, certain numerical algorithms inspired by the physics of real quantum devices prove to be efficient in application to specific problems, related, for example, to combinatorial optimization. Here we implement a numerical annealer based on simulating the coherent Ising machine as a tool to sample from a high-dimensional Boltzmann probability distribution with the energy functional defined by the classical Ising Hamiltonian. Samples provided by such a generator are then utilized for the partition function estimation of this distribution and for the training of a general Boltzmann machine. Our study opens up a door to practical application of numerical quantum-inspired annealers.



There are no comments yet.


page 4

page 5

page 6

page 7

page 8


Boltzmann machines as two-dimensional tensor networks

Restricted Boltzmann machines (RBM) and deep Boltzmann machines (DBM) ar...

Information Perspective to Probabilistic Modeling: Boltzmann Machines versus Born Machines

We compare and contrast the statistical physics and quantum physics insp...

Quantum-enhanced Markov chain Monte Carlo

Sampling from complicated probability distributions is a hard computatio...

Bayesian machine learning for Boltzmann machine in quantum-enhanced feature spaces

Bayesian learning is ubiquitous for implementing classification and regr...

Sample-efficient learning of quantum many-body systems

We study the problem of learning the Hamiltonian of a quantum many-body ...

Boltzmann Generators - Sampling Equilibrium States of Many-Body Systems with Deep Learning

Computing equilibrium states in condensed-matter many-body systems, such...

On exploring practical potentials of quantum auto-encoder with advantages

Quantum auto-encoder (QAE) is a powerful tool to relieve the curse of di...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The Ising model is a cornerstone of various fields of science ranging from magnetism Baxter2014 and quantum field theory Zuber1977 to combinatorial optimization McMahon2016 and finance Sornette2014 . This model analyzes magnetic interactions in a set of spin- particles. The energy of each spin configuration with external fields is given by



is the vector of spins values,

the coupling matrix, b

the bias vector and

is the number of interacting spins. The Ising problem consists in finding the ground state or low-energy spin configurations of the energy functional (1). This is known to be an NP-hard combinatorial problem Barahone1982 and multiple classical numerical algorithms Kirkpatrick1983 ; Bilbro1989 ; Benlic2013

, neural-network based methods

Smith1999 ; Barrett2019 , quantum hardware devices McMahon2016 ; Harris2018 ; Inagaki2016 and quantum-inspired numerical algorithms King2018 ; Kalinin2018 ; Tiunov2019 have been developed to approximately solve it.

The joint probability corresponding to each spin configuration is described by the Boltzmann distribution


where is the inverse temperature of the system and the normalizing constant is referred to as the partition function. Knowledge of the partition function is important in multiple tasks ranging from statistical physics Wu2019 to machine learning Ma2013 . While the unnormalized probability is easy to compute for any spin configutation, the exact calculation of the partition function is intractable for a large and fully connected spin system. Therefore, approximate methods are usually employed. One of the most popular approaches is the iterative solution of a set of mean-field equations, such as the naive mean-field approach, Bethe Bethe1935 and Thouless-Anderson-Palmer Thouless1977 approximations. While relatively easy to implement, mean-field-based methods Jordan1999 usually perform poorly when dealing with systems having strong, long-range correlations between spins.

An alternative approach is by generating unbiased samples from the true Boltzmann distribution, which can then be used to estimate the partition function either by direct summation or using more advanced algorithms such as annealed importance sampling (AIS) Neal1998 . Boltzmann sampling is an important problem in its own right, relevant to multiple tasks in machine learning Hinton2006 , many-body physics Carleo2017 , quantum-state tomography Torlai2018 ; TiunovQST2019 , chemistry Zhu2002 and protein folding Rizzuti2013 .

Software-based sampling algorithms include Markov chain Monte Carlo (MCMC)

Newman1999 or simulated annealing Kirkpatrick1984 ; however, these methods are prone to get trapped in local optima. Recently, this family was joined by machine-learning (ML) based algorithms, namely Variational Autoregressive Networks (VAN) Wu2019 and Boltzmann Generator Networks (BGN) Noe2019 , which have been demonstrated to be efficient for both approximate Boltzmann sampling and partition function estimation.

In 2014, Dumoulin et al. Dumoulin2014 proposed to use quantum hardware — specifically, the quantum annealer — as a fast source of samples from a given Boltzmann distribution. This study analyzed the main limitations imposed by existing quantum hardware (such as D-Wave) among which are noise in parameters, limited parameter range, and restrictions in available architectures. The proposed method was experimentally implemented using D-Wave 2X quantum annealer Benedetti2016 ; Benedetti2017 for the training of a Boltzmann machine Hinton2006 ; Teh2006 . However, the above mentioned limitations in existing quantum annealers limited the study to low-dimensional datasets.

A novel device that appears promising in this context is the optical coherent Ising machine (CIM), which was shown McMahon2016 ; Inagaki2016 to find good ground state approximations for classical, fully connected Ising systems of sizes up to 2000 spins. In a separate study, CIM was shown to significantly outperform the D-Wave annealer in application to densely connected Ising systems CIMvsDwave . This technology inspired a range of classical algorithms, some of which — the noisy mean-field annealing (NMFA) King2018 and the CIM simulation (SimCIM) Tiunov2019 — were shown to outperform the CIM in terms of both the computational speed and mean sample energy. A further advantage of these algorithms is the ability to operate with Ising Hamiltonian with the coupling and bias matrix elements described by arbitrary real numbers. However, to the best of our knowledge, neither CIM nor any of the quantum-inspired algorithms (NMFA or SimCIM) have been applied to Boltzmann sampling yet.

An important application of the Boltzmann sampling and the partition function estimation is the setting known as the inverse Ising problem Nguyen2017 in statistical physics or Boltzmann machine (BM) Hinton1983 ; Hinton2002 in ML. It consists in finding the coupling matrix and bias vector b entering the Boltzmann probability law (2), which maximizes the likelihood of a specific set of spin configuration samples. The samples are utilized to estimate the intractable terms in the gradients arising in the training.

In this paper, we find that the quantum-inspired numerical annealer SimCIM Tiunov2019 is capable of drawing high-quality unbiased samples from a Boltzmann probability distribution associated with the Ising model. We demonstrate the training of a fully-visible and fully-connected BM using samples provided by SimCIM on two training sets. The first dataset (Bars&Stripes, BAS) is of low dimension, which permits exhaustive search over all possible spin configuration, thereby enabling direct quality estimation and comparison of various sampling methods. Second, we train the BM on the high-dimensional dataset of handwritten digit images MNIST. The corresponding Ising graph is fully connected and has a dimension of 794. In addition to character recognition, we numerically estimate the partition function of this system. In the latter task, we use AIS with SimCIM as the sampler for all intermediate distributions.

SimCIM implementation for Boltzmann sampling is beneficial compared to VAN or BGN because, in order to draw samples, SimCIM requires no knowledge about the system of interest apart from the spin interaction strength. Unlike VAN or BGN, SimCIM does not require specific neural network training for each update of the coupling matrix and bias vector and each run of the partition function estimation, which could be very demanding for high dimensional systems. On the other hand, in comparison with analog annealers such as D-Wave or CIM, SimCIM is advantageous in that it supports real-valued and fully-connected high-dimensional coupling matrices and could be executed on a classical computer.

Ii Review of previous results and methods

In this section, we give a brief recap of previous results and techniques that are relevant to our work.

ii.1 General Boltzmann Machines

The Boltzmann machine (BM) is a stochastic energy-based data model which was introduced at the dawn of ML by Hinton and Sejnowski Hinton1983

. BM is the primary component of deep belief networks

Hinton2006 and deep Boltzmann machines Salakhutdinov2009 . Mathematically, the BM can be represented as a complete graph whose nodes represent binary units that can take on the values of . The nodes are connected by edges, each of which is associated with an arbitrary real value. The graph is assigned the “energy” described by Eq. (1).

To apply the BM for machine learning — for example, character recognition — we associate every pixel of the bitmap containing the character with a node of the BM. The BM parameters (coupling matrix and biases) are “trained”, i.e. assigned such values that the energy (1) associated with the bitmaps within the training set is significantly lower than that for an arbitrary bitmap. Then, when posed with a task of recognizing whether a particular unknown bitmap resembles those from the training set, the energy (1) for that bitmap is calculated. A low value would indicate a high likelihood of the affirmative answer.

In order to facilitate the training, we further associate the energy with the Boltzmann-like probability given by Eq. (2) with . The optimization objective during the BM training is the maximization of the total log-likelihood of all bitmaps of the training set S. We then have HintonPracticalGuide


which gives rise to a conceptually straightforward gradient descent update rule


where is the learning rate. In Eq. (4), the first term is the expectation with respect to the training set and often called the positive phase of the update, while the second term is the expectation over the model’s probability distribution and called the negative phase.

Practical implementation of this gradient descent is complicated by the fact that calculating the negative phase requires exhaustive search over all possible configurations of s, and is hence intractable for a classical computer. Therefore it is usually replaced with a numerical estimation using a set of configuration samples drawn from the distribution (2). In this case, Eqs. (4) become


where is the number of elements in the training set, the number of drawn samples.

Traditionally, in the context of BM training, the sampling has been approached using MCMC Salakhutdinov2009 . This approach is however too slow to be practical because sometimes it requires an extremely large number of steps to give unbiased equilibrium samples. This motivates the interest to quantum annealers and their simulators in the context of BM training.

ii.2 Annealed importance sampling

It is not required to know the exact value of partition function for the approximate BM training procedure introduced above. However, this knowledge of is necessary for the evaluation of the training set likelihood , and hence permits the estimation of the training quality. While it appears straightforward that the partition function can be evaluated directly by performing summation over a large number of samples, this approach is less efficient than the procedure known as annealed importance sampling Neal1998 .

Consider two probability distributions and defined as and , where is the corresponding unnormalized probability which is easily computable for both distributions. Assume that is uniform while is the distribution of our interest. Then we can estimate

with samples from the uniform distribution using importance sampling


where denotes the samples drawn from the uniform distribution . This method, while being relatively simple to implement, gives estimates of

with high variance especially when the target distribution

differs significantly from uniform. To reduce the variance of this estimate, a sequence of intermediate distributions , where and is chosen to gradually transition from the uniform to target distribution. For each pair of consecutive distributions, the importance weight is calculated using Eq. (II.2) and then the partition function of the target distribution is estimated by multiplying


To estimate using AIS, it is necessary to draw multiple unbiased samples from all intermediate distributions. Given that can be very large, a fast sampling method is essential.

ii.3 SimCIM numerical annealer

SimCIM is an iterative algorithm for sampling low-energy spin configurations in the classical Ising model. It treats each spin value as a continuous variable from the range . Each iteration begins with calculating the mean field acting on each spin by all other spins. Then the gradients for the spin values are calculated according to , where are the annealing control parameters and is Gaussian noise. Then the spin values are updated according to , where

is the activation function


After multiple updates, the spins will tend to either or and the final discrete spin configuration is obtained by taking the sign of each . The typical evolution of spin values and the corresponding energy is shown in Fig. 1.

Figure 1: Application of a digital annealler to the G6 graph from the G-set Gset . a) Trajectories of individual spins. b) Evolution of the spin configuration energy for SimCIM.

SimCIM has been tested on a variety of problems, including graphs from the G-set Gset . Implemented on a consumer graphic processor, this algorithm runs faster and generates higher quality samples than many analog and digital annealing processes.

ii.4 Routine for effective temperature estimation

In 2016, Benedetti et al. Benedetti2016 pointed out that quantum annealers that have strong interaction with the environment, such as D-Wave 2000Q, freeze out the dynamics of a spin system before the termination of the annealing process. As a result, such annealers sample from a Boltzmann distribution with some finite temperature. As we demonstrate in the next section, the samples generated by SimCIM have the same property.

As we see from Eqs. (1) and (2), scaling the coupling matrix and bias vector b is equivalent to scaling the effective temperature by the same factor. This enables us to control the temperature of the distribution from which the samples are drawn, which is necessary for many applications, including BM training. To take advantage of this capability, however, we also need a method that would allow to be measured. We describe this method below Benedetti2016 .

Consider a sample set corresponding to some inverse temperature . The Boltzmann probability of a spin configuration with energy is , where is the degeneracy of level . For two energy levels and , separated by , the logarithmic ratio of the probabilities is given by


Because of the unknown degeneracies, we cannot use the above equation to evaluate directly from a set of samples. However, one can obtain an additional set of samples with a different inverse temperature, , by scaling the coupling matrix and bias vector by . The difference of the log ratios for the two sets is then


Now and can be estimated from the slope of the linear dependence between and and the ratio .

For this method to work well, it is necessary that the two sets of samples have significant overlap, i.e. contain multiple samples within small energy intervals around both and . This can be achieved by judicious choice of .

Iii Results

iii.1 Temperature evaluation for SimCIM

The goal of this subsection is to demonstrate SimCIM’s capability to draw unbiased samples from a desired Boltzmann distribution, and that the methods of temperature evaluation and control described in section II.4 apply to SimCIM. We work with an Ising Hamiltonian with and being a symmetric matrix whose off-diagonal elements are uniformly sampled from and the diagonal elements are set to . This relatively simple Ising problem allows exhaustive search over spin combinations, thereby enabling us to draw a sample set from the exact Boltzmann distribution, which can be used as a benchmark.

Figure 2: Temperature evaluation and control for SimCIM. (a) Histograms of 50000 vectors sampled either by SimCIM with different parameters or from the true Boltzmann distribution. (b) Dependencies (10) for pairwise combinations of the sample sets from (a).

We sampled two batches of 50000 vectors via SimCIM with different inverse temperatures and , with , by scaling the coupling matrix . An additional set of 50000 samples was drawn from the true Boltzmann distribution with . The histograms of these samples are presented in the Fig. 2 (a) and the pairwise dependencies (), defined by Eq. (10), in Fig. 2 (b). The linear shape of these dependences shows that the samples produced by SimCIM follow the Boltzmann distribution.

Our next goal is to use SimCIM to sample from the Boltzmann distribution with . To this end, we determine the slope of the dependencies () for the two sample sets drawn from SimCIM [red dots in Fig. 2 (b)] and find , from which we find and . Using this information, we rescale the coupling matrix for and sample 50000 additional vectors using SimCIM. The dependence () for this set with respect to the true Boltzmann distribution is close to constant, as expected [Fig. 2(b)].

iii.2 BM training on Bars & Stipes dataset

To benchmark the SimCIM performance as a sampling mechanism for machine learning, we start with BM training on the simple synthetic dataset from the Bars & Stripes (BAS) family [Fig. 3]. It consists of 30 instances of 4 4 bitmaps, with each pixel taking a value from . All instances occur with the same probabilities except the first and last one in Fig. 3, whose probability of occurrence is twice as high. We refer to this as the “ground truth probability distribution” of the training set. Similarly to the previous section, this simple dataset allows to do a full exhaustive search over all spin configurations and is thus handy for benchmarking.

Figure 3: Bitmaps from the BAS dataset. Each picture is 4 4 pixels. Pixels are either -1 (blue) or +1 (red).

The BM was trained using the method described in Sec. II.1. Each bitmap was unrolled into a vector of size 16. The gradients’ positive phase was always calculated from the full training set. The negative phase of gradients from Eqs. (II.1) was calculated using several different approaches. As a baseline method for comparison, we exactly calculated the gradients by exhaustive search of all possible spin configurations. The second method estimated gradients using samples drawn by MCMC with the chain length of 1000. Third, we utilized vectors sampled by SimCIM with and without temperature correction. All approximate methods of gradients’ negative phase estimation used sample sets of equal size 250. To obtain temperature corrected sample sets, two additional sample sets of size 250 were produced for each update (see Sec. II.4).

To monitor the training process, we selected two numerical metrics. First, we calculated the mean log-likelihood of the training set with respect to the current state of the BM parameters. Second, we monitored the Kullback-Leibler (KL) divergence between the ground truth probability distribution defined by the training set and the distribution defined by the current BM parameters, which can be calculated explicitly thanks to the small size of the problem. Both metrics were evaluated after each update of the BM parameters. The learning curves are presented in Fig. 4.

Figure 4:

BM learning curves for the BAS dataset. (a) Average log-likelihood of the training set computed after each update of the BM parameters. The BM was trained by full exhaustive search (orange), vectors sampled by MCMC (blue), SimCIM with temperature compensation (green), and SimCIM without temperature compensation (red). The black dashed line corresponds to the ground truth average log-likelihood of the training set. (b) Kullback-Leibler divergence between the ground truth probability distribution defined by the training set and probability distribution defined by the current BM parameter set.

From these learning curves, we can see that gradients calculated using samples drawn by SimCIM with temperature correction follow the true gradients better than all other methods do. This is to be expected, because in order for the training method defined by Eqs. (II.1) to work well, the samples must be drawn from the Boltzmann distribution with temperature 1. The vectors sampled by SimCIM satisfy this requirement better than samples provided by other sampling mechanisms.

iii.3 Partition function estimation

In this section, we implement AIS with SimCIM as the sampling tool to estimate the partition function

of a Boltzmann distribution and compare its performance with other approximate methods. The energy functionals for this test have been obtained from the coupling and weight matrices from consecutive epochs of the BM training process described in the previous subsection. The BM was trained using true gradient updates for 2000 epochs. The partition function was estimated every 50 epochs.

The methods included in the comparison are as follows.

  • AIS equipped with SimCIM and MCMC samplers for importance weights calculations (7). We used 10 intermediate distributions and sample sets of size 250 to compute each importance weight. Consecutive intermediate distributions differed from each other only by the inverse temperature . To gradually transition from the uniform () to target distribution () we increase in steps. At each step, we used two additional sample batches of size 250 to estimate SimCIM’s effective temperature. In total, AIS with SimCIM employed 7500 samples for the computation of each partition function, where only 2500 samples were used for the algorithm and rest are auxiliary samples for the temperature correction.

  • VAN — an artificial neural network-based method for partition function estimation of discrete distributions, which was sown to outperform mean-field family algorithms Wu2019 . VAN was trained either for 2000 epochs with 250 samples in each batch (which gives samples in total) or for 1500 epochs with batch-size of 5 (which is 7500 samples total).

  • Direct sampling, with the sample sets of size either 7500 or drawn by SimCIM.

Figure 5: (a) Estimation of the logartithm of the partition function LogZ for the probability distributions defined by the BM at different training epochs, implemented using several algorithms: AIS with MCMC (green) and SimCIM (orange) samplers; VAN with a total or 7500 (blue) and (cyan, dashed) samples; direct sampling from SimCIM with 7500 (black) or (gray, dashed) samples. The exact value (red) is shown for comparison. Inset: difference between the exact value and various approximations of . (b) KL divergence between the true Boltzmann distribution and its approximations derived from vectors sampled by different algorithms.

The results are sown in Fig. 5. We see that the most competitive methods are AIS equipped with the SimCIM sampler and VAN; however, the total sample consumption with AIS+SimCIM was 7500 while that with VAN was .

Fig. 5 (b) shows the KL divergence between the true Boltzmann probability distribution and approximate distributions derived using samples drawn by SimCIM, MCMC and trained VAN. We can see that VAN (which consumed more samples for training) approximates the true Boltzmann probability distribution better compared to all other methods. However, VAN requires training for every new BM parameter set, which could be quite time-consuming. For example, it takes around 15 seconds on a desktop with an NVidia GTX 1080 Ti GPU for VAN, while SimCIM samples 7500 vectors in 1.5 seconds.

Figure 6: Bitmaps from the MNIST dataset.

iii.4 BM training on MNIST dataset

To further test the capabilities of SimCIM as a sampling mechanism, we trained a BM on bitmaps from the MNIST dataset (Fig. 6). This dataset contains 60000 instances of

pixel grayscale bitmaps of handwritten digits in the training set and 10000 similar bitmaps in the test set. Each bitmap was unrolled into a vector and the vector elements after normalization were binarized to the values

with a threshold of 0.3. In order to enable the recognition of different digits, each training bitmap was augmented with a 10-bit one-hot vector such that for and otherwise, where is the label of (i.e. the actual digit corresponding to) the bitmap. In this way, we obtained a set consisting of 60000 binary vectors of length , which we split into training and cross-validation sets containing 50000 and 10000 images respectively.

During the training, we used minibatches of size 500 for the calculation of the positive phase of the gradient. The negative phase was calculated using vector sets of size 250 sampled by SimCIM with and without effective temperature adjustment. The model’s parameters were updated after each minibatch. The BM initial weights were initialized as small random numbers. The training procedure was executed for 50 epochs, each epoch containing updates.

The training process has been evaluated using two benchmarks. The first one was the average log-likelihood of the test set bitmaps. Because the exact value of this parameter is intractable to compute in this case, we utilized AIS equipped with the SimCIM sampler to approximate the partition function and subsequently estimate the average log-likelihood. The second metric was the classification accuracy of the test set digits. Each test bitmap was augmented with 10 variants of the one-hot vectors corresponding to 10 possible labels, and the corresponding 10 energy values were calculated. The label corresponding to the lowest energy yielded the model’s prediction for the digit contained in the bitmap. The choice of energy, rather than log-likelihood, as the classification criterion, is justified in this case, because, for a given coupling matrix, the partition function is constant and hence the likelihood (2) is a monotonic function of the energy.

Figure 7: Learning curves for the approximate log-likelihood (a) and classification accuracy (b).

The learning curves are shown in Fig. 7

. We see that the BM can be trained successfully using the SimCIM sampler in view of both evaluation metrics. Consistently with our previous observations, the models trained using SimCIM with effective temperature adjustment perform better in terms of both the average log-likelihood and classification accuracy.

The classification accuracy reaches 86.9% [Fig. 7

(b)]. This result compares poorly to state-of-the-art supervised learning techniques based e.g. on convolutional deep neural networks

Ciresan2012 but is comparable to those obtained by other neural network architectures without hidden units. We believe that adding hidden units to our model would significantly enhance its accuracy.

In order to test the generative capabilities of the trained model, we used it to generate bitmaps corresponding to different digits. To that end, a one-hot vector corresponding to a particular digit was chosen and the remaining part of the spin vector was sampled using the coupling matrix obtained through the training. A fixed gives rise to effective biases that ensure that bitmaps depicting the digit have lower energies, and are thus more likely to be produced by the annealer.

The results of this numerical experiment are shown in Fig. 8. We can see that the quality of generated digits improves with training.

Figure 8: Digit reconstruction by the Boltzmann machine trained using the SimCIM sampler for (a) 4, (b) 9, (c) 14, (d) 29, and (e) 44 epochs respectively.

Iv Summary

We have demonstrated the quantum-inspired digital annealer SimCIM as a mechanism that provides high-quality one-shot samples from the Boltzmann distribution for a classical Ising energy functional. We used the samples to train a fully-visible and fully-connected general Boltzmann machine and to estimate the partition function of a high-dimensional probability distribution. The model trained on the MNIST dataset is able to achieve a reasonable classification accuracy and meaningful picture reconstruction. Approximate values of the partition function, as well as gradients computed using samples provided by SimCIM, agree well with the exact values computed using exhaustive search in a setting where such search is possible.

Implementation of SimCIM as a Boltzmann generator is beneficial compared to deep neural network-based methods because SimCIM does not require time-consuming training and could work in a plug-and-play manner. On the other hand, SimCIM is beneficial compared to quantum annealers such as D-wave because existing quantum annealing hardware suffers from multiple limitations, such as the noise in parameters, limited parameter range, restrictions in available architectures, decoherence, etc. Dumoulin2014 . Our algorithm enables simulation of various settings in which these limitations are curtailed or even completely eliminated, and therefore provides insights into the potential that near-term noisy intermediate-scale quantum (NISQ) devices Preskill2018 may have in application to various tasks such as ML or statistical physics.

This study motivates several open questions that could be of interest both for machine learning and quantum physics communities. ML researchers may wish to investigate whether or not quantum annealers and their simulators are applicable to the training of multilayer deep Boltzmann machines Salakhutdinov2009 , deep belief networks Hinton2006 or restricted BM based convolutonal neural networks Lee2011 and make rigorous comparison with existing methods of their training. For physicists, the approach to BM training and partition function estimation presented here may be handy in the tasks of finding neural-network based solutions to quantum optimization problems, such as the ground states of molecules Xia2018 and quantum many-body systems Carleo2017 as well as quantum state tomography Torlai2018 ; TiunovQST2019 of high dimensional systems. A further interesting direction of future research is the development of analog annealers with the aim to outperform existing digital annealing algorithms. These annealers need not be limited to the quantum domain; indeed, analog annealers that successfully employ classical hardware have been proposed and tested Bello2019 ; Chou2019 .


We are grateful to Jacob D. Biamonte for a fruitful discussion. This work is supported by Russian Science Foundation (19-71-10092).


  • (1) R. J. Baxter. Exactly solved models in statistical mechanics (Dover Publications, 2014).
  • (2) J. B. Zuber, and C. Itzykson. Quantum field theory and the two-dimensional ising model, Phys. Rev. D 15, 2875 (1977).
  • (3) P. L. McMahon, A. Marandi, Y. Haribara, R. Hamerly, C. Langrock, et. al.. A fully programmable 100-spin coherent Ising machine with all-to-all connections, Science 354, 614 (2016).
  • (4) D. Sornette. Physics and financial economics (1776-2014): puzzles, ising and agent-based models, Reports on Prog. Phys. 77, 062001 (2014).
  • (5) F. Barahona. On the computational complexity of ising spin glass models, J. Phys. A: Math. Gen. 15, 3241 (1982).
  • (6) S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi. Optimization by simulated annealing, Science 220, 671 (1983).
  • (7) G. Bilbro, R. Mann, T. K. Miller, W. E. Snyder, D. E. Van den Bout, and M. White. Optimization by mean field annealing, in Advances in neural information processing systems, pp. 91–98 (1989).
  • (8) U. Benlic and J.-K. Hao. Breakout local search for the max-cut problem, Eng. Appl. Artif. Intell. 26, 1162 (2013).
  • (9) K. A. Smith. Neural networks for combinatorial optimization: a review of more than a decade of research, INFORMS J. on Comput. 11, 15 (1999).
  • (10) T. D. Barrett, W. R. Clements, J. N. Foerster, and A. I. Lvovsky. Exploratory Combinatorial Optimization with Reinforcement Learning, arXiv:1909.04063 (2019).
  • (11) R. Harris, Y. Sato, A. J. Berkley, M. Reis, F. Altomare, et. al.Phase transitions in a programmable quantum spin glass simulator, Science 361, 162 (2018).
  • (12) T. Inagaki, Y. Haribara, K. Igarashi, T. Sonobe, S. Tamate, et. al. A coherent ising machine for 2000-node optimization problems, Science 354, 603 (2016).
  • (13) A. D. King, W. Bernoudy, J. King, A. J. Berkley, and T. Lanting. Emulating the coherent Ising machine with a mean-field algorithm. arXiv: 1806.08422 (2018).
  • (14) K. P. Kalinin, and N. G. Berloff. Global optimization of spin Hamiltonians with gain-dissipative systems, Scientific Reports 8, 17791 (2018).
  • (15) E. S. Tiunov, A. E. Ulanov, and A. I. Lvovsky. Annealing by simulating the coherent Ising machine, Opt. Express 27, 10288 (2019).
  • (16) D. Wu, L. Wang, and P. Zhang. Solving Statistical Mechanics Using Variational Autoregressive Networks, Phys. Rev. Lett. 122, 080602 (2019).
  • (17) J. Ma, J. Peng, S. Wang, and J. Xu. Estimating the partition function of graphical models using Langevin importance sampling, Journal of Machine Learning Research 31, 433 (2013).
  • (18) H. A. Bethe. Statistical theory of superlattices, Proc. R. Soc. Lond. A 150, 552 (1935).
  • (19) D. J. Thouless, P. W. Anderson, and R. G. Palmer. Solution of ’Solvable model of a spin glass’, Philos. Mag. 35, 593 (1977).
  • (20) M. I. Jordan, Z. Ghahramani, T. S. Jaakkola, and L. K. Saul. An Introduction to Variational Methods for Graphical Models, Mach. Learn. 37, 183 (1999).
  • (21) M. E. J. Newman, and G. T. Barkema. Monte Carlo Methods in Statistical Physics, Clarendon Press, 1 edition (1999).
  • (22) S. Kirkpatrick. Optimization by simulated annealing: Quantitative studies, Journal of statistical physics 34, 975 (1984).
  • (23) G. E. Hinton. Reducing the Dimensionality of Data with Neural Networks, Science 313, 504 (2006).
  • (24) G. Carleo, and M. Troyer. Solving the quantum many-body problem with artificial neural networks, Science, 355, 602 (2017).
  • (25) G. Torlai, G. Mazzola, J. Carrasquilla, M. Troyer, R. Melko, and G.Carleo. Neural-network quantum state tomography, Nature Physics, 14, 447 (2018).
  • (26) E. S. Tiunov, V. V. Tiunova, A. E. Ulanov, A. I. Lvovsky, and A. K. Fedorov. Experimental quantum homodyne tomography via machine learning, arXiv: 1907.06589 (2019).
  • (27) Z. Zhu, M. E. Tuckerman, S. O. Samuelson, and G. J. Martyna. Using Novel Variable Transformations to Enhance Conformational Sampling in Molecular Dynamics, Phys. Rev. Lett. 88, 100201 (2002).
  • (28) B. Rizzuti, and V. Daggett. Using simulations to provide the framework for experimental protein folding studies, Archives of Biochemistry and Biophysics, 531 128 (2013).
  • (29)

    V. Dumoulin, I. J. Goodfellow, A. Courville, and Y. Bengio. On the Challenges of Physical Implementations of RBMs. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence (2014).

  • (30)

    M. Benedetti, J. Realpe-Gómez, R. Biswas, and A. Perdomo-Ortiz. Estimation of effective temperatures in quantum annealers for sampling applications: A case study with possible applications in deep learning, Physical Review A, 022308 (2016).

  • (31) M. Benedetti, J. Realpe-Gómez, R. Biswas, and A. Perdomo-Ortiz. Quantum-Assisted Learning of Hardware-Embedded Probabilistic Graphical Models, Phys. Rev. X, 7, 041052 (2017).
  • (32) G. E. Hinton, S. Osindero, and Y.-W. Teh. A Fast Learning Algorithm for Deep Belief Nets, Neural Computation 18, 1527–1554 (2006).
  • (33) R. Hamerly, T. Inagaki, P. L. McMahon, D. Venturelli, A. Marandi, et. al. Experimental investigation of performance differences between Coherent Ising Machines and a quantum annealer, Science Advances 5, eaau0823 (2019).
  • (34) F. Noé, S. Olsson, J. Köhler, and H. Wu. Boltzmann generators: Sampling equilibrium states of many-body systems with deep learning, Science 365, eaaw1147 (2019).
  • (35)

    H. C. Nguyen, R. Zecchina, and J. Berg. Inverse statistical problems: from the inverse Ising problem to data science, Advances in Physics

    66, 197 (2017).
  • (36)

    G. E. Hinton, and T. J. Sejnowski. Optimal perceptual inference, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 448 (1983).

  • (37)

    G. E. Hinton. Training Products of Experts by Minimizing Contrastive Divergence, Neural Computation

    14, 1771–1800 (2002).
  • (38) R. M. Neal. Annealed Importance Sampling, Journal Statistics and Computing 11, 125 (1998).
  • (39) R. Salakhutdinov, and G. Hinton. Deep Boltzmann Machines, In Proceedings of the 12th International Conference on Artificial Intelligence and Statistics (AISTATS) (2009).
  • (40)

    G. Hinton. A Practical Guide to Training Restricted Boltzmann Machines,in Neural Networks: Tricks of the Trade, 2nd ed., Lecture Notes in Computer Science, edited by G. Montavon, G. B. Orr, and K.-R. Miller (Springer, New York, 2012), Vol. 7700, pp. 599–619.

  • (41) G-set graphs are freely available for download at https://web.stanford.edu/yyye/yyye/Gset/
  • (42)

    Lee, H., Grosse, R., Ranganath, R., & Ng, A. Y. Unsupervised learning of hierarchical representations with convolutional deep belief networks, Communications of the ACM, 54(10), 95 (2011).

  • (43) Cireşan, D., Meier, U., Schmidhuber, J. Multi-column deep neural networks for image classification, IEEE Conference on Computer Vision and Pattern Recognition. pp. 3642–3649 (2012)
  • (44) Preskill, J. Quantum Computing in the NISQ era and beyond, Quantum 2, 79 (2018).
  • (45) Xia, R. and Kais, S. Quantum machine learning for electronic structure calculations, Nature Communications 9, 4195 (2018).
  • (46) L. Bello, M. Calvanese Strinati, E. G. Dalla Torre, and A. Pe’er. Persistent Coherent Beating in Coupled Parametric Oscillators, Phys. Rev. Lett. 123 083901 (2019).
  • (47) J. Chou, S. Bramhavar, S. Ghosh, and W. Herzog. Analog Coupled Oscillator Based Weighted Ising Machine, Scientific Reports 9, 14786 (2019).