 # Comment on "Solving Statistical Mechanics Using VANs": Introducing saVANt - VANs Enhanced by Importance and MCMC Sampling

In this comment on "Solving Statistical Mechanics Using Variational Autoregressive Networks" by Wu et al., we propose a subtle yet powerful modification of their approach. We show that the inherent sampling error of their method can be corrected by neural network-based MCMC or importance sampling, which leads to asymptotically unbiased estimators for physical quantities. This modification is possible due to a distinctive property of VANs, namely that they provide the exact probability of each sample. With these modifications, we believe that their method could have a substantially greater impact on various important fields of physics, including strongly interacting field theories and statistical physics.


## Appendix A Additional Details on Algorithm

### A.1 Lightning Review of VAN

Wu et al. approximate Boltzmann distributions $p(s) = e^{-\beta H(s)}/Z$ with an autoregressive generative model $q(s)$ by minimizing the KL divergence

$$\mathrm{KL}(q\,\|\,p) = \sum_s q(s)\,\ln\frac{q(s)}{p(s)} = \sum_s q(s)\bigl(\ln q(s) + \beta H(s) + \ln Z\bigr). \tag{1}$$

A PixelCNN is used, which allows exact evaluation of the probability and relatively efficient sampling. Also note that the partition function is a constant, so the last summand contributes nothing to the gradient. As a result, VAN can be trained by sampling from the model $q$, evaluating the probability and the Hamiltonian for these samples, and applying gradient descent.
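The training loop above can be sketched in a toy setting. The sketch below is not the authors' PixelCNN: it uses a fully factorized Bernoulli model (the simplest autoregressive factorization, so $q(s)$ is still exact) and the REINFORCE gradient of the variational free energy on a 4-spin Ising chain with an external field. All hyperparameters (temperature, field, learning rate, batch size) are illustrative choices, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
beta, h, n = 0.5, 1.0, 4   # illustrative inverse temperature, field, chain length

def energy(s):
    # Open 1D Ising chain with an external field, spins in {-1, +1}
    return -np.sum(s[:, :-1] * s[:, 1:], axis=1) - h * np.sum(s, axis=1)

def sample(theta, batch):
    # Fully factorized Bernoulli model standing in for the PixelCNN;
    # crucially, log q(s) is available exactly for every sample.
    p = 1.0 / (1.0 + np.exp(-theta))           # P(s_i = +1)
    s = np.where(rng.random((batch, n)) < p, 1.0, -1.0)
    logq = np.sum(np.where(s > 0, np.log(p), np.log(1.0 - p)), axis=1)
    return s, logq

theta = np.zeros(n)
for _ in range(2000):
    s, logq = sample(theta, 256)
    f = logq + beta * energy(s)                # per-sample variational free energy
    f = f - f.mean()                           # baseline lowers gradient variance
    p = 1.0 / (1.0 + np.exp(-theta))
    grad_logq = (s > 0).astype(float) - p      # d(log q)/d(theta_i)
    theta -= 0.05 * (f[:, None] * grad_logq).mean(axis=0)

s, logq = sample(theta, 4096)
f_final = np.mean(logq + beta * energy(s))     # estimates KL(q||p) - ln Z
```

The final estimate `f_final` upper-bounds $-\ln Z$, and training drives it down from its value at initialization; the gap that remains is the KL divergence of the factorized family to the Boltzmann distribution.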

### A.2 Bounding the Output Probabilities of VAN

We can simply interpret the original network output $q'$ as the probability via the affine map

$$q = \left(q' - \tfrac{1}{2}\right)(1 - 2\epsilon) + \tfrac{1}{2},$$

which sends $[0, 1]$ onto $[\epsilon, 1-\epsilon]$, so the probability is bounded away from 0 and 1.
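The map is a one-liner; in this sketch $\epsilon$ is left as a free parameter (the value used by the authors is not reproduced here):

```python
def bound_prob(q_raw, eps):
    """Affine map sending a raw network output in [0, 1] onto [eps, 1 - eps]."""
    return (q_raw - 0.5) * (1.0 - 2.0 * eps) + 0.5
```

For example, `bound_prob(0.0, 0.01)` gives `0.01` and `bound_prob(1.0, 0.01)` gives `0.99`, while the midpoint `0.5` is left fixed.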

### A.3 Proof: Estimators are Asymptotically Unbiased

Assume that the support of the sampling distribution $q$ contains the support of the target distribution $p$. This property is ensured by the mapping above, which guarantees that the probability takes values in $[\epsilon, 1-\epsilon]$.

#### A.3.1 Neural Importance Sampling

Then, importance sampling with respect to $q$, i.e.

$$\langle O(s)\rangle \approx \sum_{i=1}^{N} w_i\,O(s_i), \qquad s_i \sim q(s), \qquad w_i = \frac{\hat w_i}{\sum_i \hat w_i}, \qquad \hat w_i = \frac{e^{-\beta H(s_i)}}{q(s_i)}, \tag{2}$$

is an asymptotically unbiased estimator of the expectation value because

$$\langle O(s)\rangle_p = \sum_s p(s)\,O(s) = \sum_s q(s)\,\frac{p(s)}{q(s)}\,O(s) = \frac{1}{Z}\sum_s q(s)\,\hat w(s)\,O(s) \approx \frac{1}{ZN}\sum_{i=1}^{N} \hat w(s_i)\,O(s_i), \tag{3}$$

where $\hat w(s) = e^{-\beta H(s)}/q(s)$. The partition function can be similarly determined:

$$Z = \sum_s e^{-\beta H(s)} = \sum_s q(s)\,\frac{e^{-\beta H(s)}}{q(s)} \approx \frac{1}{N}\sum_{i=1}^{N} \hat w(s_i). \tag{4}$$

Combining the previous equations, we obtain

$$\langle O(s)\rangle_p \approx \sum_i w_i\,O(s_i) \qquad \text{with} \qquad w_i = \frac{\hat w_i}{\sum_i \hat w_i}. \tag{5}$$
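Equations (2)–(5) can be checked numerically on a system small enough to enumerate. The sketch below uses a fixed, deliberately non-optimal factorized proposal in place of a trained VAN (the spin-up probability `p_up` and the 3-spin chain are illustrative assumptions), estimates $Z$ and a magnetization via the self-normalized weights, and compares against exact enumeration.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
beta, n = 0.7, 3                               # illustrative toy system

def energy(s):
    # Open 1D Ising chain, spins in {-1, +1}
    return -np.sum(s[..., :-1] * s[..., 1:], axis=-1)

# Fixed stand-in proposal q: independent spins with P(s_i = +1) = p_up,
# so q(s) is known exactly for every sample, as required by Eq. (2).
p_up, N = 0.7, 200_000
s = np.where(rng.random((N, n)) < p_up, 1.0, -1.0)
q = np.prod(np.where(s > 0, p_up, 1.0 - p_up), axis=1)

w_hat = np.exp(-beta * energy(s)) / q          # unnormalized weights, Eq. (2)
Z_est = w_hat.mean()                           # Eq. (4)
w = w_hat / w_hat.sum()                        # self-normalized weights
obs = s.sum(axis=1)                            # observable: magnetization
O_est = np.sum(w * obs)                        # Eq. (5)

# Exact reference by enumeration of the 2^n states
states = np.array(list(product([-1.0, 1.0], repeat=n)))
bw = np.exp(-beta * energy(states))
Z_exact = bw.sum()
O_exact = (bw * states.sum(axis=1)).sum() / Z_exact
```

Even though the proposal is biased toward spin-up, the reweighting recovers the exact (zero-field) magnetization and partition function up to sampling noise, which is the content of the asymptotic-unbiasedness claim.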

#### A.3.2 Neural MCMC Sampling

The sampler $q$ can be used as the trial distribution for a Markov chain which uses the following acceptance probability in its Metropolis step:

$$p_a(s'|s) = \min\left(1,\,\frac{p_0(s|s')\,p(s')}{p_0(s'|s)\,p(s)}\right) = \min\left(1,\,\frac{q(s)\,e^{-\beta H(s')}}{q(s')\,e^{-\beta H(s)}}\right), \tag{6}$$

where the proposal is independent of the current state, $p_0(s'|s) = q(s')$.

This fulfills the detailed balance condition

$$p_t(s'|s)\,e^{-\beta H(s)} = p_t(s|s')\,e^{-\beta H(s')} \tag{7}$$

because the total transition probability is given by $p_t(s'|s) = q(s')\,p_a(s'|s)$ and therefore

$$\begin{aligned} p_t(s'|s)\,e^{-\beta H(s)} &= q(s')\,\min\left(1,\,\frac{q(s)\,e^{-\beta H(s')}}{q(s')\,e^{-\beta H(s)}}\right)e^{-\beta H(s)} \\ &= \min\bigl\{q(s')\,e^{-\beta H(s)},\; q(s)\,e^{-\beta H(s')}\bigr\} \\ &= p_t(s|s')\,e^{-\beta H(s')}, \end{aligned}$$

where we have used the fact that the min operator is symmetric and that all factors are strictly positive. The latter property is ensured by the fact that $q(s) \in [\epsilon, 1-\epsilon]$.
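A minimal sketch of this independence Metropolis sampler, again with a fixed factorized stand-in for a trained VAN (the toy 3-spin chain, `p_up`, and the chain length are illustrative assumptions, not values from the paper):

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
beta, n = 0.7, 3                               # illustrative toy system

def energy(s):
    # Open 1D Ising chain, spins in {-1, +1}
    return -np.sum(s[..., :-1] * s[..., 1:], axis=-1)

p_up = 0.6                                     # stand-in for a trained VAN's marginals
def propose(size):
    # Proposals drawn independently of the current state, with exact log q
    s = np.where(rng.random((size, n)) < p_up, 1.0, -1.0)
    logq = np.sum(np.where(s > 0, np.log(p_up), np.log(1.0 - p_up)), axis=1)
    return s, logq

steps = 100_000
cand, logq_cand = propose(steps)
s, logq = cand[0], logq_cand[0]
samples = np.empty(steps)
for t in range(steps):
    sp, lqp = cand[t], logq_cand[t]
    # Metropolis acceptance for the independence sampler, Eq. (6), in log form
    log_a = (logq - lqp) + beta * (energy(s) - energy(sp))
    if np.log(rng.random()) < log_a:
        s, logq = sp, lqp
    samples[t] = energy(s)

E_est = samples.mean()

# Exact reference by enumeration of the 2^n states
states = np.array(list(product([-1.0, 1.0], repeat=n)))
bw = np.exp(-beta * energy(states))
E_exact = (bw * energy(states)).sum() / bw.sum()
```

Since the initial state is itself drawn from $q$, no warm-up phase is needed here, matching the observation in the experiments section; the chain's mean energy converges to the exact Boltzmann average.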

## Appendix B Additional Details on Experiments

### B.1 Setup

We use a Tesla P100 GPU with 16 GB of memory for both training and sampling. A lattice is considered. The reference values are generated using the Wolff algorithm with 2M steps, including 100k warm-up steps. We use the ResNet version of VAN with the following hyperparameter choices (chosen to match the ones used by Wu et al.):

The reference implementation of Wu et al. is used to train the VANs. For estimating the results of VAN and saVANt-NIS, we use 1000 iterations, sampling 500 configurations each. For saVANt-NMCMC, we use 100k steps, sampling 500 candidate configurations per batch. No warm-up steps are required because candidates are sampled from a pre-trained VAN. As demonstrated in Table 4, all algorithms have roughly the same sampling runtime, but saVANt leads to a significant reduction in training time, as explained in the main text.