GNisi: A graph network for reconstructing Ising models from multivariate binarized data

09/09/2021
by Emma Slade, et al.
GSK

Ising models are a simple generative approach to describing interacting binary variables. They have proven useful in a number of biological settings because they enable one to represent observed many-body correlations as the separable consequence of many direct, pairwise statistical interactions. The inference of Ising models from data can be computationally very challenging, and often one must be satisfied with numerical approximations or limited precision. In this paper we present a novel method for the determination of Ising parameters from data, called GNisi, which uses a graph neural network trained on known Ising models in order to construct the parameters for unseen data. We show that GNisi is more accurate than the existing state-of-the-art software, and we illustrate our method by applying GNisi to gene expression data.


1 Introduction

The ability to infer graphical models which effectively model interactions in a dataset is a powerful numerical tool across many scientific disciplines. The Ising model, as it is known in the language of statistical physics, is a particularly well-known approach which can be used to understand the underlying correlations in any form of binary data (“spins”) via pairwise and single-site interactions. Inferring the parameters of an Ising model from data, however, is computationally demanding and often intractable, as it is an inverse problem that typically relies on sampling from the Boltzmann distribution the Ising parameters imply.

Existing techniques to solve the inverse Ising model approximately include mean-field inference (10.1162/089976698300017386; coniii), pseudo-likelihood (PhysRevLett.108.090201; 10.1214/09-AOS691), and perturbative expansion methods (Nguyen_2012), which can capture the general form of the graph. More accurate methods include Monte Carlo simulation (mc1; 10.1371journal.pcbi.1003776; ACKLEY1985147) and the adaptive cluster expansion method (ace), which, while more accurate, may be slow to converge or numerically unstable.

Given the utility of inverse Ising models for deriving direct interactions from observed covariations in large numbers of variables, it is natural that there have been many successful applications in the biological domain. Such approaches have been used for determining protein structure (Obermayer2014; 10.1371/journal.pone.0028766; Qin690; Shakhnovich7195), viral fitness in HIV (10.1371journal.pcbi.1003776; Barton1965), correlated states in neural populations (Schneidman2006), correlations in nucleotide sequences (10.1371/journal.pcbi.0030231; 10.1093/bioinformatics/btn313), as well as modelling a statistical description of gene expression levels (Lezon19033).

Graph networks are a deep learning architecture designed to learn from graph-structured data. They were originally proposed in (gori2005new; scarselli2008graph); see e.g. (battaglia2018relational; hamilton2020graph; wu2020comprehensive) for an overview. The application of graph networks to predict the behavior of complex physical systems has succeeded in a wide range of settings (pmlr-v119-sanchez-gonzalez20a; pfaff2021learning), including the modelling of spin glasses (spinglass). In this paper we present a novel method for solving inverse Ising problems using graph networks, which we call GNisi, and which alleviates the slow convergence or inaccuracy of previous methods. In our model, the graph network is trained on known Ising models, generated using Monte Carlo methods, and the trained network is then used to determine the Ising parameters for unseen binary data.

We will first compare our method to existing methods for solving inverse Ising problems on a generated dataset for which ground truth is known. We then apply our method to genomic data from the Cancer Cell Line Encyclopaedia project (CCLE) (Ghandi2019; Li2019; Barretina2012; Stransky2015), released under a CC BY 4.0 licence (https://creativecommons.org/licenses/by/4.0/legalcode), in order to model the covariation of gene expression in the anonymized (depmap_2020) dataset, for two subsets of genes that are known to be highly correlated.

2 Theoretical outline

2.1 The Ising model

For a dataset of $N$ bits, $\mathbf{s} = (s_1, \ldots, s_N)$ with $s_i \in \{0, 1\}$, the Ising model for that data is given by the probability distribution

p(\mathbf{s}) = \frac{1}{Z} e^{-\beta H(\mathbf{s})},    (1)

where the partition function $Z$ is the normalisation such that the probabilities sum to unity,

Z = \sum_{\{\mathbf{s}\}} e^{-\beta H(\mathbf{s})},    (2)

where $\beta$ is the inverse temperature, and the Hamiltonian is defined as

H(\mathbf{s}) = -\sum_i h_i s_i - \frac{1}{2} \sum_{i \neq j} J_{ij} s_i s_j.    (3)
One can show that the probability distribution of Eq. (1) maximizes the Shannon entropy

S = -\sum_{\{\mathbf{s}\}} p(\mathbf{s}) \log p(\mathbf{s})    (4)

under the constraint that the first and second moments of $p(\mathbf{s})$ are appropriately defined. In particular, we require the sample means and covariances to match the connected first and second moments,

\langle s_i \rangle_p = \langle s_i \rangle_{\rm data},    (5)
\langle s_i s_j \rangle_p - \langle s_i \rangle_p \langle s_j \rangle_p = \langle s_i s_j \rangle_{\rm data} - \langle s_i \rangle_{\rm data} \langle s_j \rangle_{\rm data}.    (6)

The problem is then to find the parameters $h_i$ and $J_{ij}$ such that the first and second moments of the model match the mean values of $s_i$ and $s_i s_j$ observed in the data. A much simpler task is to approximate the pairwise term by inverting the covariance matrix of the dataset, $C$, when it is invertible for the system. One may show analytically that, in the limit of high temperature, the two matrices are equivalent up to a factor (see Appendix D); however, there are many situations in which the data distribution is not well described by such a Gaussian form.
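For reference, this Gaussian baseline amounts to nothing more than inverting the sample covariance matrix; a minimal sketch, using the relation $J \approx -C^{-1}/\beta$ derived in Appendix D:

```python
import numpy as np

def gaussian_couplings(samples, beta=1.0):
    """High-temperature estimate of the pairwise couplings.

    samples : (n_samples, N) array of binary observations.
    Uses J ~ -C^{-1} / beta (Appendix D); fails whenever the sample
    covariance C is singular, e.g. fewer samples than variables.
    """
    C = np.cov(samples, rowvar=False)
    J = -np.linalg.inv(C) / beta
    np.fill_diagonal(J, 0.0)  # keep only the pairwise terms
    return J
```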

2.2 Graph neural networks

Let a graph be defined as $G = (V, E)$, where $V$ is the set of nodes and $E$ is the set of edges connecting nodes in $V$. We denote by $\mathcal{N}(i)$ the set of neighbours of node $i$. Using similar notation to (battaglia2018relational), we let $\mathbf{v}_i$ represent the attributes of node $i$ for all $i \in V$ and $\mathbf{e}_{ij}$ represent the attributes of edge $(i,j)$ for all $(i,j) \in E$. We illustrate the architecture in Fig. 1. A graph network layer can then be characterized in terms of edge and node updates as

\mathbf{e}'_{ij} = \phi^e\left(\mathbf{e}_{ij}, \mathbf{v}_i, \mathbf{v}_j\right),    (7a)
\mathbf{v}'_i = \phi^v\Big(\mathbf{v}_i, \rho\big(\{\mathbf{e}'_{ji}\}_{j \in \mathcal{N}(i)}\big)\Big),    (7b)

where $\phi^e$ and $\phi^v$ are the update functions to be learned and $\rho$ is an aggregation function reducing a set of elements to a single one via a permutation-invariant transformation of its inputs. In the particular case of Ising models, we are working with undirected graphs. If, as we do here, one wishes to infer the existence of the edges (i.e. determine whether two bits are correlated), then we must start with a fully connected graph, where $E = \{(i,j) : i, j \in V,\ i \neq j\}$.

(a) Edge update
(b) Node update
Figure 1: Illustration of graph network updates. In red, the object being updated; in blue, the quantities used to perform the update.
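In pseudocode, one round of the edge and node updates of Eqs. (7a) and (7b) over a fully connected graph looks as follows (phi_e and phi_v stand in for the learned functions $\phi^e$ and $\phi^v$, and summation plays the role of the aggregator $\rho$; this sketches the message-passing scheme, not GNisi's actual implementation):

```python
import numpy as np

def graph_layer(v, e, phi_e, phi_v):
    """One round of edge then node updates, Eqs. (7a)-(7b).

    v : (N, d_v) node attributes
    e : (N, N, d_e) edge attributes of a fully connected graph
    phi_e, phi_v : callables standing in for the learned updates
    """
    N = v.shape[0]
    e_new = np.zeros_like(e)
    for i in range(N):
        for j in range(N):
            if i != j:
                # Eq. (7a): update edge (i, j) from its attribute and endpoints.
                e_new[i, j] = phi_e(e[i, j], v[i], v[j])
    v_new = np.zeros_like(v)
    for i in range(N):
        # Permutation-invariant aggregation of incoming updated edges.
        incoming = sum(e_new[j, i] for j in range(N) if j != i)
        # Eq. (7b): update node i from its attribute and the aggregate.
        v_new[i] = phi_v(v[i], incoming)
    return v_new, e_new
```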

3 Methods

3.1 Metropolis Monte Carlo simulations

In this paper we aim to learn the Ising model parameters $h_i$ and $J_{ij}$ of an unseen system by training a graph network on a wide range of known Ising models. To generate the Ising models, we use the Metropolis Monte Carlo algorithm (metropolismc), which generates valid bit-strings for an Ising model using single-spin-flip dynamics. In each Monte Carlo iteration of the algorithm, a single spin is selected at random and the change in the energy of the system, $\Delta E$, is calculated as the difference between the Hamiltonian (3) evaluated for the two configurations. If $\Delta E \leq 0$, the change is favourable and the spin is flipped. Otherwise, the spin is flipped only if $r < e^{-\beta \Delta E}$, where $r$ is a random number in the range $[0, 1)$. This process is repeated for every site in the lattice until the model has converged and the energy no longer decreases.
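A minimal sketch of such a single-spin-flip Metropolis sweep, for 0/1 bits and the Hamiltonian of Eq. (3) (illustrative only, not the exact generator used for the training data):

```python
import numpy as np

def metropolis_sweep(s, h, J, beta, rng):
    """One Metropolis sweep with single-spin-flip dynamics.

    A flip is accepted if it lowers the energy, or with probability
    exp(-beta * dE) otherwise.
    """
    N = s.size
    for _ in range(N):
        i = rng.integers(N)             # pick a random site
        local_field = h[i] + J[i] @ s   # J has zero diagonal
        ds = (1 - s[i]) - s[i]          # proposed flip 0 <-> 1
        dE = -ds * local_field          # energy change of the flip
        if dE <= 0 or rng.random() < np.exp(-beta * dE):
            s[i] = 1 - s[i]
    return s

# Example usage with illustrative sizes: random couplings, zero field.
rng = np.random.default_rng(0)
N = 50
J = rng.normal(size=(N, N)); J = (J + J.T) / 2; np.fill_diagonal(J, 0.0)
h = np.zeros(N)
s = rng.integers(0, 2, size=N)
for _ in range(1000):  # sweeps until (approximate) convergence
    s = metropolis_sweep(s, h, J, beta=1.0, rng=rng)
```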

In Fig. 2 we show representative training runs for two Ising models at low and high temperatures. We observe that the energy of the high-temperature model (left) does not converge, as one would expect, whereas the low-temperature system (right) converges rapidly.

Figure 2: Convergence of the energy for Ising models at high (left) and low (right) temperatures.

In order to allow the network to learn a wide range of Ising models, we generate the training data over a wide range of temperatures, lattice sizes and sparsities. The Monte Carlo data is converted to fully connected graphs for the purposes of training; the initial node features of each graph are the states of the spins $s_i$, whilst the edge attributes are the values of the couplings $J_{ij}$ determined from the Ising model parameters.
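Schematically, the conversion of a sampled model into a training graph looks as follows (a sketch; the exact feature layout in GNisi is not specified beyond spins on nodes and couplings on edges):

```python
import numpy as np

def ising_to_graph(samples, J):
    """Build a fully connected training graph from Monte Carlo data.

    samples : (n_samples, N) spin states used as node features
    J       : (N, N) coupling matrix; entries become edge targets
    """
    N = J.shape[0]
    node_features = samples.astype(np.float32)
    senders, receivers, edge_attrs = [], [], []
    for i in range(N):
        for j in range(N):
            if i != j:  # fully connected, no self-loops
                senders.append(i)
                receivers.append(j)
                edge_attrs.append(J[i, j])
    return (node_features, np.array(senders), np.array(receivers),
            np.array(edge_attrs, dtype=np.float32))
```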

3.2 Training procedure

For each Ising model we present the network with 1000 samples of converged spins generated using the Metropolis Monte Carlo algorithm, split into 800 training and 200 validation samples. We minimize the loss between the predicted and true Ising parameters, using early stopping to select the model with the highest accuracy on the validation set. The test set consists of entirely unseen models, each with 1000 samples.

The full graph network is composed of 6 graph layers, with the node and edge encodings each composed of one hidden layer with dimensionality [123, 119, 28, 126, 126, 126]. The ReLU activation function is used, and we use the Adam optimizer (kingma2014method) with a fixed learning rate. The number of nodes in the hidden layers, the number of graph layers, the learning rate and the optimizer are determined by optimising the validation loss of a subset of models with 50% sparsity using the Optuna (optuna_2019) framework. Within each graph layer, each node is updated based on its previous embedding and the sum of the incoming edges, whilst the edges are updated based on their previous embeddings and the embeddings of the connected nodes.
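Schematically, training pairs each graph with its ground-truth couplings and regresses the predicted edge attributes onto them with early stopping. The outline below is written under our own assumptions; model, optimiser and parameters() are placeholders, not the released code:

```python
import numpy as np

def train(train_set, val_set, model, optimiser, n_epochs, patience):
    """Early-stopped training on known Ising models (schematic).

    train_set / val_set : lists of (graph, true_J) pairs
    model               : callable mapping a graph to predicted couplings
    optimiser           : object updating model parameters from a loss
    """
    best_val, best_params, stale = np.inf, None, 0
    for _ in range(n_epochs):
        for graph, true_J in train_set:
            loss = np.mean((model(graph) - true_J) ** 2)  # MSE on predicted edges
            optimiser.step(loss)                          # framework-specific update
        val = np.mean([np.mean((model(g) - J) ** 2) for g, J in val_set])
        if val < best_val:                                # keep best validation model
            best_val, best_params, stale = val, model.parameters(), 0
        else:
            stale += 1
            if stale >= patience:                         # early stopping
                break
    return best_params
```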

4 Experimental results

In this section we will show the results of the GNisi framework on two separate datasets. Firstly, we will discuss how GNisi performs on a test set composed of unseen Ising models with differing physical properties where ground truth is known. We will additionally compare with existing codes which solve the inverse Ising task using adaptive cluster expansion (ace; coniii), which are the current state of the art in solving this class of problems. Secondly we apply GNisi to data from the Cancer Cell Line Encyclopaedia project (CCLE) (Ghandi2019; Li2019; Barretina2012; Stransky2015), which consists of 1376 cell lines, in order to model the covariation of expression of a set of highly correlated genes.

4.1 Results on unseen Ising models

Quantifying how well we are able to generate an accurate Ising model for a set of bit-strings can be done in several ways. Matching the ground truth matrix element by element can of course only be assessed when one has ground truth to compare with, which is not the case in general. As discussed in Sec. 2.1, in the maximum entropy approach the aim is to match the first and second moments of the data such that the Shannon entropy of the system is maximized. Here we aim to show that with GNisi we are able to additionally match the sampled third-order connected moments,

\langle s_i s_j s_k \rangle_c = \big\langle (s_i - \langle s_i \rangle)(s_j - \langle s_j \rangle)(s_k - \langle s_k \rangle) \big\rangle,    (8)

which provides a much more precise measure of how closely the generated Ising model matches the ground truth data. Furthermore, we will show that we are able to closely match the underlying Boltzmann distribution of the data, Eq. (1). We note that, in order to prevent any additional overfitting on the test set, the bit-string samples used to construct the Ising model, which are generated from the Metropolis Monte Carlo algorithm, are not used in quantifying the goodness of fit. In other words, we generate a distinct set of bit-string samples when constructing the Boltzmann distributions.
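The connected moments used in these comparisons can be estimated directly from bit-string samples; a minimal sketch (Appendix E additionally normalises the third moment by the standard deviations):

```python
import numpy as np

def connected_moments(samples):
    """Connected moments of binary samples of shape (n_samples, N).

    Returns the means <s_i>, the covariances <s_i s_j>_c, and the
    third central moments <(s_i - mu_i)(s_j - mu_j)(s_k - mu_k)>.
    """
    n = samples.shape[0]
    mu = samples.mean(axis=0)
    d = samples - mu
    second = d.T @ d / n
    third = np.einsum("ni,nj,nk->ijk", d, d, d) / n
    return mu, second, third
```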

We compare our results for a representative model from the test set, namely an Ising model of size 50 at low temperature with 25% sparsity in the matrix elements. We compare the Ising model obtained by GNisi with those obtained with the two current open source packages for solving inverse Ising models, ACE v1 (ace) and coniii v2.3.0 (coniii), both released under an MIT licence. In order to provide a fair comparison, we run the packages with default settings where possible, to prevent fine-tuning the results on a model-by-model basis, as there is no equivalent to fine-tuning in GNisi. Fine-tuning of the two codes does not, at least for the data we present here, improve the results, at the cost of vastly increased runtime. As the coniii package offers several sampling methods for solving the inverse Ising problem, we choose a method distinct from ACE, the Monte Carlo histogram method (broderick2007faster).

In Fig. 3 we show reconstructed Ising models for the system using the differing methods, along with the ground truth matrix in panel (b).

(a) GNisi
(b) Ground truth
(c) ACE
(d) coniii
Figure 3: Reconstructed Ising models for a system of size 50 at low temperature with 25% sparsity in the matrix elements.

The differences between the matrices in Fig. 3 are hard to quantify by eye; however, one can clearly tell that ACE and coniii do not capture the correct overall behaviour of the system. Indeed, the mean squared error between the ground truth and the different predictions quantifies this, as shown in Table 1, showing that GNisi is better than the other methods at estimating the collective behavior of the system. This is an advantage of our methodology: because we use the data directly rather than matching means and covariances, we retain much more information about the system than can be contained in the covariance matrix over the samples.

Method    MSE       Pearson $r$
GNisi     15.38      0.04
ACE       101.53    -0.02
coniii    179.60     0.02
Table 1: MSE and Pearson correlation coefficient between the Ising parameters computed using the different methods and ground truth.

It is also useful to compute the Pearson correlation coefficient between the ground truth and reconstructed matrices, shown in Table 1, whose low values suggest that GNisi is not getting the individual matrix elements correct. In Fig. 4 we show a scatter plot of the ground truth and predicted Boltzmann probability distributions, Eq. (1), for a sample of possible bit-strings, distinct from the bit-strings used to generate the Ising model for each method; in general, we can evaluate the Boltzmann distribution for any possible bit-string. This can be quantified by computing the partition function, Eq. (2), as well as the Pearson correlation coefficient, which we present in Table 2. We observe that GNisi far outperforms the other methods and captures the overall distribution well. We can conclude that, despite the fact that GNisi does not reproduce individual matrix elements, it matches the Boltzmann distribution of the ground truth very well. It therefore produces a valid Ising model for the data it is given, and the most accurate one compared to the other methods.

Method         $Z$      Pearson $r$
Ground truth   55.37
GNisi          52.39
ACE            70.52
coniii         271.41
Table 2: Partition functions and correlation coefficients computed for an Ising model with differing methods.
(a) GNisi
(b) ACE
(c) coniii
Figure 4: Scatter plots of probability distributions, Eq. (1), computed using the different methods for a sample of possible bit-strings.

A final point of comparison is between the first, second and third connected moments of the data (Eqs. (5), (6) and (8) respectively) and those of the ground truth. ACE and coniii are constructed such that they produce an Ising model which matches the first and second moments; GNisi does not have this requirement imposed in its architecture, and it is therefore a non-trivial check of our methodology that we can do so. In addition, we consider the third moment as a more powerful comparison of our methodology with the ground truth. For the second moments, for an $N$-dimensional lattice there are $\mathcal{O}(N^2)$ possible combinations, which is not prohibitive to compute for our system. At third order, however, we would have to consider $\mathcal{O}(N^3)$ combinations, and we instead consider only a sample of the full distribution. In Table 3 we show the mean squared error between the moments computed using the ground truth and with the various methods. In all cases we show the $n$-th connected moment with either all bits equal to 1 or all equal to 0. Unsurprisingly, both ACE and coniii match the moments very well, as they construct Ising models under the condition that the maximum entropy condition is met. It is therefore impressive that GNisi, which does not attempt to match moments, also matches the ground truth to very high precision. We show in Appendix F the results obtained using a random matrix with the same size couplings as GNisi, to explicitly show that GNisi is correctly learning features of the Ising model. Interestingly, ACE and coniii match extremely well (to the precision shown here), suggesting the maximum entropy approach strongly constrains higher order moments.

In order to explain the behavior we observe, namely that all the methods used in this paper can match the moments of the data distribution well but, to a greater or lesser extent, not match the individual parameters of the ground truth Ising model, we note that we have a sloppy model (BrownS03; Brown_2004). A sloppy model is one which is poorly constrained because a very large number of parameter choices match the data distribution equally well. As the number of parameters to match grows quadratically with the size of the Ising model, this behaviour can be expected to worsen as we increase the size of the system. It is therefore not surprising that the three different methods match the moments of the ground truth matrix with three distinct predictions for the Ising model. It is for this reason that we argue that the ability to correctly reconstruct the Boltzmann distribution is the fundamental test of whether a generated model is accurate; in Fig. 4 we observe that the predictions from GNisi correlate with the ground truth bit-string by bit-string, which strongly suggests that GNisi has correctly learned underlying features of the Ising model.

MSE
Moment    GNisi   ACE     coniii
          0.03    0.11    0.11
          0.0     0.0     0.0
Table 3: Mean squared error between the $n$-th moment computed using ground truth and the differing methods.

4.2 Results on the CCLE dataset

In this section we apply the pre-trained GNisi model to two distinct subsets of genes from the CCLE dataset, each of which is known to be highly mutually correlated, in order to provide an estimate of the ground truth. We binarize the continuous data at the $\alpha$-level quantile, setting the gene expression to 0 below the $\alpha$-th quantile and to 1 above it. In order to prevent the binarized data from being highly skewed, we fix the value of $\alpha$ throughout. As above, where we compare GNisi and ACE against ground truth, we show results run using ACE default settings. Due to the poor quality of the results with coniii on the synthetic data, we do not include a comparison with that package here.
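A minimal sketch of this quantile binarisation ($\alpha$ is the quantile level; we do not reproduce the specific value used):

```python
import numpy as np

def binarize(expression, alpha):
    """Binarize continuous gene expression at the alpha-level quantile.

    expression : (n_cell_lines, n_genes) continuous values
    Returns 0 below the per-gene alpha-quantile and 1 above it.
    """
    thresholds = np.quantile(expression, alpha, axis=0)
    return (expression > thresholds).astype(int)
```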

4.2.1 Highly correlated genes

In Fig. 5 we show reconstructed Ising models for the first system using the two differing methods. We can see that both GNisi and ACE correctly produce a highly correlated matrix; however, the couplings produced by ACE are an order of magnitude larger than those produced by GNisi.

(a) GNisi
(b) ACE
Figure 5: Reconstructed Ising models for a group of genes from the CCLE dataset known to be highly mutually correlated.

We argued in the previous section that the fundamental test of the accuracy of our reconstructed Ising model is to compare the Boltzmann distribution constructed from the Ising model against the data. Unlike in Sec. 4.1, where we compared samples of the partition function against ground truth in Fig. 4, in this example (and all real-world applications) we do not know the true Ising model which describes the data. In Fig. 6 we show a sample of the log Boltzmann probability distributions computed using the observed data samples and the reconstructed Ising models from GNisi and ACE. As the observed dataset contains a limited number of samples, we sample possible bit-strings from the total number of combinations and compute the probability of each, given the underlying Ising model for GNisi and ACE. It is clear that, while GNisi does not perfectly match the observed distribution, as its distribution has a smaller tail, it matches very closely. On the other hand, ACE is strongly peaked at the correct location but has no other features.

We now turn to the first, second and third moments predicted by the different models. In Table 4 we show the mean squared error between the sampled moments and the observed moments from the data distribution. We see that, on the whole, both methods agree well with the data, with GNisi having larger errors for most moments. Interestingly, for this subset of genes we find that ACE outperforms GNisi with respect to matching moments, as shown in Table 4. It should be noted, however, that matching the low-order moments of the training data does not necessarily capture the data distribution better by all measures, and there is no ground truth of a known underlying Ising model to compare to in this case of real experimental data. Indeed, in this case it is clear that GNisi produces the more accurate Ising model when judged by how close the predicted log Boltzmann probability distribution is to the observed distribution.

MSE
Moment    GNisi   ACE
          0.36    0.43
Table 4: Mean squared error between the $n$-th moment computed from the CCLE data and each method.
Figure 6: Log Boltzmann probability distribution for the different methods using CCLE data.

4.2.2 Genes with known mutual relation to each other and to BRCA1

Finally, we turn to the second distinct set of genes. As before, some biological ground truth is known about the 21 genes in this dataset; here we pick the subset of genes with known functional relation to BRCA1 and to each other. In Fig. 7 we show the reconstructed Ising models using GNisi and ACE; as before, we see that both methods find a high level of mutual correlation in the matrix, although GNisi finds much stronger couplings than ACE.

(a) GNisi
(b) ACE
Figure 7: Reconstructed Ising models for a group of genes from the CCLE dataset known to be highly mutually correlated with BRCA1.

To again determine the accuracy of the reconstructions, we plot the observed and sampled log Boltzmann probability distributions in Fig. 8. As with the previous dataset, GNisi matches the shape of the log Boltzmann probability distribution of the data somewhat better than ACE, although ACE performs much better on this dataset than on the previous one. We show the MSE between moments for this dataset in Table 5; again both GNisi and ACE perform well, with ACE somewhat worse than GNisi in this case.

Figure 8: Log Boltzmann probability distribution for the different methods using CCLE data for genes known to be highly mutually correlated with BRCA1.
MSE
Moment    GNisi   ACE
          0.43    0.27
Table 5: Mean squared error between the $n$-th moment computed from the CCLE data and each method, for genes known to be highly mutually correlated with BRCA1.

5 Conclusion

In this paper we have presented a novel method for solving the inverse Ising problem. Inverse Ising methods have been successfully applied to a wide range of biological systems; however, the computational difficulty of solving the inverse problem often prevents their use, or necessitates approximate methods such as Gaussian approximations or pseudo-likelihood. With GNisi, we have presented a fast, accurate and easy-to-use method for solving inverse Ising models using graph neural networks.

We have presented results applying GNisi to both synthetic and real data, and compared our results with the current state-of-the-art codes. We find that, across the board, GNisi outperforms the other methods. Additionally, GNisi requires no fine-tuning; once the data has been collected, it is sufficient to provide it to the trained model, and an Ising model that accurately describes the data is rapidly produced. It is important to note that the results presented here for GNisi are obtained with 1000 Monte Carlo samples per model; we expect that results will improve with the generation of more data.

It should be noted that the choice of random distribution of Ising couplings sampled in the Monte Carlo simulation data used to train GNisi will affect the kinds of Ising models it is capable of reconstructing: if the best Ising model of the data would be a highly rare event in the space of models considered in the training data, GNisi may be expected to fail at constructing a useful model, whether for protein structure or neuronal firing.


Appendix A Comparison between GNisi and ACE for a high temperature system

In the main paper we compared our results for a representative model from the test set at low temperature. Here we present the same comparisons for an Ising model from the test set at a temperature an order of magnitude higher, namely an Ising model of size 50 with 75% sparsity in the matrix elements. As before, we run the ACE and coniii packages with default settings, in order to prevent fine-tuning the results on a model-by-model basis. The MSE between the ground truth and the differing methods is shown in Table 6.

In Fig. 9 we show reconstructed Ising models for the system using the differing methods, along with the ground truth matrix in panel (b).

(a) GNisi
(b) Ground truth
(c) ACE
(d) coniii
Figure 9: Reconstructed Ising models for a system of size 50 at high temperature with 75% sparsity in the matrix elements.
Method    MSE      Pearson $r$
GNisi     14.78     0.03
ACE       69.98     0.03
coniii    408.06    0.01
Table 6: MSE and Pearson correlation coefficient between the Ising parameters computed using the different methods and ground truth.

In Fig. 10 we plot the scatter of predicted versus ground truth probability distributions for a sample of possible bit-strings, again distinct from the bit-strings used to generate the Ising model for each method. For this model, ACE and GNisi perform very similarly, as can also be observed in Table 7, where we also show the Pearson correlation coefficient $r$.

Method         $Z$       Pearson $r$
Ground truth   42.86
GNisi          46.48      0.11
ACE            46.73      0.10
coniii         252.78     0.02
Table 7: Partition functions and Pearson correlation coefficients computed for an Ising model with differing methods.
(a) GNisi
(b) ACE
(c) coniii
Figure 10: Scatter plots of probability distributions computed using the different methods for a sample of possible bit-strings.

Finally, we compare the first, second and third moments. In Table 8 we show the mean squared error between the moments computed using the ground truth and with the various methods. Again, GNisi does very well at matching moments, despite this not being a restriction when constructing the Ising model.

MSE
Moment    GNisi   ACE     coniii
          0.07    0.12    0.13
          1.32
          0.0     0.0     0.0
Table 8: Mean squared error between the $n$-th moment computed using ground truth and the differing methods.

Appendix B Accuracy of inverse covariance matrix approximation

In this section we compare the accuracy of inverting the covariance matrix over samples against ground truth and GNisi. For a larger system of size 50, there are $2^{50}$ equally likely bit-string combinations at high temperature, which is computationally intractable to enumerate, and one therefore cannot expect the inverse covariance matrix computed with 1000 samples to be accurate. We instead focus here on a much smaller system, of lattice size 5, as the $2^5 = 32$ possible combinations of bit-strings enable us to perform a fair comparison between GNisi and the inverse covariance matrix. The model shown here has 55% sparsity in the matrix elements.
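For a lattice of size 5 the Boltzmann distribution can be enumerated exactly over all 32 bit-strings, which is what makes this comparison fair; a sketch of the exact partition function:

```python
import itertools
import numpy as np

def exact_partition_function(h, J, beta):
    """Exact Z by enumerating all 2^N bit-strings (tractable for N = 5)."""
    N = h.size
    Z = 0.0
    for bits in itertools.product([0, 1], repeat=N):
        s = np.array(bits)
        E = -h @ s - 0.5 * s @ J @ s  # Hamiltonian of Eq. (3)
        Z += np.exp(-beta * E)
    return Z
```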

(a) GNisi
(b) Ground truth
(c) Inverse covariance matrix
Figure 11: Reconstructed Ising models for a system of size 5 with 55% sparsity in the matrix elements.

We can see in Fig. 11 that GNisi matches the ground truth very well, whilst the inverse of the covariance matrix performs very poorly. This is shown more explicitly in Table 9, where we compare the Boltzmann distributions computed using the two methods. We see that GNisi performs well overall, whilst the inverse of the covariance matrix captures nothing of the true distribution.

Method               $Z$
Ground truth         19.64
GNisi                12.84
Inverse covariance   0.00
Table 9: Partition functions computed for an Ising model with GNisi and the inverse of the covariance matrix.

Appendix C Derivation of maximum entropy approach

The maximum entropy approach states that, under the condition that the first and second moments of the samples match the true connected moments, the Ising model maximises the Shannon entropy of the system. In this section we derive this result.

Our constraints, in the maximum entropy approximation, are

\sum_{\{\mathbf{s}\}} p(\mathbf{s}) = 1,    (9)
\langle s_i \rangle_p = \langle s_i \rangle_{\rm data},    (10)
\langle s_i s_j \rangle_p = \langle s_i s_j \rangle_{\rm data}.    (11)

The constrained optimisation problem can be formulated as extremising the Lagrangian of the system,

\mathcal{L} = -\sum_{\{\mathbf{s}\}} p(\mathbf{s}) \log p(\mathbf{s}) - \lambda \Big( \sum_{\{\mathbf{s}\}} p(\mathbf{s}) - 1 \Big) - \sum_i \alpha_i \big( \langle s_i \rangle_p - \langle s_i \rangle_{\rm data} \big) - \sum_{i<j} \gamma_{ij} \big( \langle s_i s_j \rangle_p - \langle s_i s_j \rangle_{\rm data} \big).

Setting the derivative with respect to $p(\mathbf{s})$ to zero, we have

\frac{\partial \mathcal{L}}{\partial p(\mathbf{s})} = -\log p(\mathbf{s}) - 1 - \lambda - \sum_i \alpha_i s_i - \sum_{i<j} \gamma_{ij} s_i s_j = 0,    (12)

and so

p(\mathbf{s}) = e^{-1-\lambda} \exp\Big( -\sum_i \alpha_i s_i - \sum_{i<j} \gamma_{ij} s_i s_j \Big)    (13)
= \frac{1}{Z} e^{-\beta H(\mathbf{s})},    (14)

which is the Ising model of Eq. (1), where we identify $\alpha_i = -\beta h_i$ as the external field and $\gamma_{ij} = -\beta J_{ij}$ as the pairwise terms respectively, with the normalisation $e^{-1-\lambda} = 1/Z$.

Appendix D Derivation of inverse covariance matrix approximation

In this appendix we derive the approximation in which the inverse of the covariance matrix over the nodes is approximately equal to the pairwise coupling term, using field-theoretic methods.

The partition function for a discrete number of bit-strings is given by the usual

Z = \sum_{\{\mathbf{s}\}} e^{-\beta H(\mathbf{s})}.    (15)

In the infinite temperature limit, all possible combinations of spins are equally likely. We can therefore take the continuum limit, $s_i \in \mathbb{R}$, and then

Z = \int \mathcal{D}s \, e^{-\beta H(\mathbf{s})},    (16)

where $\int \mathcal{D}s$ signifies that we take path integrals.

The constraints on the first and second moments are now given by

\langle s_i \rangle = \frac{1}{Z} \int \mathcal{D}s \, s_i \, e^{-\beta H(\mathbf{s})},    (17)
\langle s_i s_j \rangle = \frac{1}{Z} \int \mathcal{D}s \, s_i s_j \, e^{-\beta H(\mathbf{s})},    (18)

where

H(\mathbf{s}) = -\sum_i h_i s_i - \frac{1}{2} \sum_{ij} J_{ij} s_i s_j    (19)

and $h_i$ and $J_{ij}$ are the external field and pairwise terms respectively. Via the change of coordinates $\mathbf{t} = \mathbf{s} + J^{-1} \mathbf{h}$, we can write (19) as

H = -\frac{1}{2} \mathbf{t}^\top J \mathbf{t} + \frac{1}{2} \mathbf{h}^\top J^{-1} \mathbf{h},    (20)

and so (17) becomes

\langle s_i \rangle = \frac{1}{Z} \int \mathcal{D}t \, \big( t_i - (J^{-1} \mathbf{h})_i \big) \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t} - \frac{\beta}{2} \mathbf{h}^\top J^{-1} \mathbf{h}}    (21)
= \frac{1}{Z} \int \mathcal{D}t \, t_i \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t} - \frac{\beta}{2} \mathbf{h}^\top J^{-1} \mathbf{h}} - (J^{-1} \mathbf{h})_i \, \frac{1}{Z} \int \mathcal{D}t \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t} - \frac{\beta}{2} \mathbf{h}^\top J^{-1} \mathbf{h}}    (22)
= -(J^{-1} \mathbf{h})_i,    (23)

using the fact that

\int \mathcal{D}t \, t_i \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t}} = 0    (24)

and properties of multi-dimensional Gaussians of size $N$, as we assume in this approximation that our data is Gaussian. We therefore have

\langle s_i \rangle = -(J^{-1} \mathbf{h})_i, \quad \text{equivalently} \quad \mathbf{h} = -J \langle \mathbf{s} \rangle.    (25)

Eq. (18) is then

\langle s_i s_j \rangle = \frac{1}{Z} \int \mathcal{D}t \, \big( t_i - (J^{-1} \mathbf{h})_i \big) \big( t_j - (J^{-1} \mathbf{h})_j \big) \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t} - \frac{\beta}{2} \mathbf{h}^\top J^{-1} \mathbf{h}}    (26)
= \langle t_i t_j \rangle + (J^{-1} \mathbf{h})_i (J^{-1} \mathbf{h})_j    (27)
= \langle t_i t_j \rangle + \langle s_i \rangle \langle s_j \rangle,    (28)

using Eq. (25). Again using the properties of multi-dimensional Gaussians, we find

\langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle = \langle t_i t_j \rangle = \frac{1}{Z} \int \mathcal{D}t \, t_i t_j \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t} - \frac{\beta}{2} \mathbf{h}^\top J^{-1} \mathbf{h}}.    (29)

The remaining integral is a 2-point correlation function which we can solve analytically. The partition function Eq. (16) is, dropping all constant factors,

Z \propto \int \mathcal{D}t \, e^{\frac{\beta}{2} \mathbf{t}^\top J \mathbf{t}}    (30)
= \int \mathcal{D}u \, e^{\frac{\beta}{2} \sum_k \lambda_k u_k^2}    (31)
= \prod_k \int du_k \, e^{\frac{\beta}{2} \lambda_k u_k^2}    (32)
= \prod_k \sqrt{\frac{2\pi}{-\beta \lambda_k}}    (33)
= (2\pi)^{N/2} \det(-\beta J)^{-1/2},    (34)

where we have diagonalised $J$ with eigenvalues $\lambda_k$ (all negative, as required for convergence).

Now, using the definition of functional derivatives from statistical mechanics,

\langle t_i t_j \rangle = -\frac{2}{Z} \frac{\partial Z}{\partial A_{ij}},    (35)

where the action can be identified with $\frac{1}{2} \mathbf{t}^\top A \mathbf{t}$ with $A = -\beta J$, and so, using Eq. (34) in the form $Z \propto \det(A)^{-1/2}$,

\langle t_i t_j \rangle = (A^{-1})_{ij} = -\frac{1}{\beta} (J^{-1})_{ij},    (36)

where the integral above is equivalent to the integral in (29). Bringing it all together, we finally have

C_{ij} \equiv \langle s_i s_j \rangle - \langle s_i \rangle \langle s_j \rangle    (37)
= \langle t_i t_j \rangle    (38)
= -\frac{1}{\beta} (J^{-1})_{ij}.    (39)

Therefore, the covariance matrix can be identified as being proportional to the inverse of the pairwise term in the Hamiltonian,

J_{ij} = -\frac{1}{\beta} (C^{-1})_{ij}.    (40)

Appendix E Definition of the 3rd connected moment

In this section we state the equation used to compute the 3rd connected moments in the main text. We use the coskewness for three random variables,

S_{ijk} = \frac{\mathbb{E}\big[ (s_i - \mu_i)(s_j - \mu_j)(s_k - \mu_k) \big]}{\sigma_i \sigma_j \sigma_k},    (41)

where $\mu_i$ is the expected value of $s_i$ and $\sigma_i$ its standard deviation.
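In code, Eq. (41) for a sample of three variables reads (a direct transcription using sample estimates for the expectations):

```python
import numpy as np

def coskewness(x, y, z):
    """Sample coskewness of Eq. (41) for three random variables."""
    dx, dy, dz = x - x.mean(), y - y.mean(), z - z.mean()
    return np.mean(dx * dy * dz) / (x.std() * y.std() * z.std())
```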

Appendix F Performance of a random Ising model

In the main text, we state that GNisi, for the Ising model against which we have ground truth, outperforms ACE and coniii. It is reasonable to ask whether a random matrix with parameters of the same size as GNisi's would perform as well. In Fig. 12 we show the scatter plot of the Boltzmann probability distribution for a matrix generated at random with the same size couplings as GNisi. We observe that the probability distribution is captured very poorly; indeed, the partition function we find differs substantially from the ground truth value, and the Pearson coefficient on the Boltzmann probability distribution is correspondingly poor. In Table 10 we show the corresponding MSE between the moments computed using ground truth and the random matrix, again observing that GNisi is better. We can therefore conclude that GNisi outperforms a random matrix and is correctly learning the Ising models.

Figure 12: Scatter plot for the Boltzmann probability distribution for a random matrix against ground truth.
MSE
Moment    Random matrix
          0.10
          0.0
Table 10: Mean squared error between the $n$-th moment computed using ground truth and the random matrix.