Message passing for probabilistic models on networks with loops

September 23, 2020 · Alec Kirkley et al.

In this paper, we extend a recently proposed framework for message passing on "loopy" networks to the solution of probabilistic models. We derive a self-consistent set of message passing equations that allow for fast computation of probability distributions in systems that contain short loops, potentially with high density, as well as expressions for the entropy and partition function of such systems, which are notoriously difficult quantities to compute. Using the Ising model as an example, we show that our solutions are asymptotically exact on certain classes of networks with short loops and offer a good approximation on more general networks, improving significantly on results derived from standard belief propagation. We also discuss potential applications of our method to a variety of other problems.


I Introduction

Many complex phenomena can be modeled using networks, which provide powerful abstract representations of systems in terms of their components and interactions newman2018networks . The phenomena are often modeled using probabilistic formulations that capture the probabilities of states of network nodes. Examples include the spread of epidemics through networks of social contacts kiss2017mathematics , cascading failures in power grids carreras2004dynamical , and the equilibrium behavior of spin models such as the Ising model DGM02b . Networks are also used to represent pairwise dependencies between variables in statistical models that do not otherwise have a network component, as a convenient tool for bookkeeping and visualization of model structure jordan1998learning . Such “graphical models,” which allow us to represent the conditional dependencies between variables in a non-parametric manner, form the foundation for many modern machine learning techniques koller2009probabilistic .

In this paper we consider probabilistic models on networks, models whose key features are that the state variables of the model live on the nodes of the network and the only explicit interactions between variables are pairwise dependencies between nodes directly connected by network edges. The Hamiltonian of a spin model might contain only pairwise interactions between spins and their neighbors, for example, while the individuals in an epidemic model can catch a disease only from those they have direct contact with.

In probabilistic models we are typically interested in computing expectation values or marginal probabilities over the ensemble of possible states of the system. For instance, in Ising spin systems the one-point marginal probabilities of spin states—the average probability of a particular spin being up or down—can be used to compute the average magnetization. Unfortunately, there are few analytic results for expectations or marginal probabilities in such models mussardo2010statistical and those that do exist tend to have narrow applicability and can be complicated PhysRev.85.808 . Even computational exploration of these models can be challenging, as state spaces grow exponentially with the size of the system.

In this paper we focus on message passing, also called the “cavity method” or “belief propagation,” an efficient method for the solution of probabilistic models on networks that straddles the line between analytic and numerical approaches bethe1935statistical ; mezard2009information . Message passing methods involve deriving a set of self-consistent equations satisfied by the variables or probabilities in a model and then solving those equations by numerical iteration. The name “message passing” comes from the fact that the equations can be thought of as representing messages passed along network edges describing the probability that the variable on a given node takes a certain value, given the values of its neighbors.

Standard formulations of message passing, however, have a crucial weakness: they rely on the assumption that the states of the neighbors are uncorrelated with one another, or equivalently that the network of pairwise dependencies is a tree, i.e., a network that contains no loops. In practice, very few networks are perfect trees, though standard message passing methods are known to give good approximate results on networks that satisfy the weaker condition of being “locally tree-like,” meaning that local regions of the network take the form of trees even though the network as a whole is not a tree. In effect, this means that the network can contain long loops, but not short ones newman2018networks . Unfortunately, even this weaker condition is not true for many real-world networks, which often contain a high density of short loops watts1998collective , so standard message passing can give quite poor results in practical situations.

This limitation of traditional message passing has been widely noted and a number of attempts have been made in the past to remedy it. Message passing has been successfully extended to certain classes of random graphs that incorporate short loops in a limited fashion, such as Husimi graphs EKBV05 ; MNB11 ; BMN12 and other tree-like agglomerations of small loopy subgraphs newman2009random ; yoon2011belief , although these techniques are not generally applicable to real-world networks. Another approach incorporates the effect of loops using a perturbative expansion around the loopless case montanari2005compute ; chertkov2006loop , although this approach becomes progressively less accurate as the number of loops increases and is therefore best suited to networks with a low loop density, which rules out a large fraction of real networks whose loop density is often high watts1998collective ; newman2003social . Perhaps the best known extension of belief propagation, and the one most similar to our own approach, is the method known as “generalized belief propagation” yedidia2001generalized , which is based on the idea of passing messages not just between pairs of nodes but between larger groups. This method is quite complicated to implement in practice, however, and requires explicit construction of the groups, which involves nontrivial pre-processing steps welling2004choice . The method we propose requires no such steps.

The foundational concepts necessary for applying message passing to loopy networks are described in cantwell2019message , with specific applications to percolation models and spectral calculations. In this paper we extend those approaches to the solution of general probabilistic models. Our approach centers on the development of self-consistent message passing equations for the marginal probabilities of states of sets of nodes in a neighborhood around a given reference node, which allow for the fast computation of joint distributions within the neighborhood. We can then average over these joint distributions either exactly (for networks with small local neighborhoods) or approximately using Monte Carlo sampling (for larger neighborhoods) to calculate properties of interest such as single-site marginals.

To ground our discussion we use the Ising model as an example of our approach, showing how our improved message passing methods can produce better estimates for this model than regular message passing. We show that our methods are asymptotically exact on networks whose loop structure satisfies certain general conditions and give good approximations for networks that deviate from these conditions. We give example results for the Ising model on both real and artificial networks and also discuss applications of our method to a range of other problems.

II Message passing on networks with loops

Our first step is to develop the general theory of message passing for probabilistic models on loopy networks. With an eye on the Ising model, our discussion will be in the language of spin models, although the methods we describe can be applied to any probabilistic model with pairwise dependencies between variables.

II.1 Model description

Consider a general undirected, unweighted network G composed of a set V of n nodes or vertices and a set E of m pairwise edges. The network can be represented mathematically by its adjacency matrix A, with elements A_ij = 1 when nodes i and j are connected by an edge and A_ij = 0 otherwise. On each node i of the network there is a variable or spin s_i, which is restricted to some discrete set of values. In a compartmental model of disease propagation, for instance, s_i could be the infection state of a node kermack1927contribution ; durrett1995spatial . In a spatial model of segregation s_i could represent land occupation stauffer2007ising .

Spins s_i and s_j interact if and only if there is an edge between nodes i and j, a formulation sufficiently general to describe a large number of models in fields as diverse as statistical physics, machine learning, economics, psychology, and sociology pelizzola2005cluster ; krzakala2016statistical ; geman1986markov ; yasuda2006triangular ; decelle2011asymptotic ; zhou2007self ; galam1997rational ; stauffer2008social . Interactions are represented by an interaction energy g_ij(s_i, s_j; β_ij), which controls the preference for any particular pair of states s_i and s_j to occur together. The quantity β_ij represents any external parameters, such as temperature in a classical spin system or infection rate in an epidemiological model, that control the nature of the interaction. We also allow for the inclusion of an external field f_i(s_i; γ_i) with parameters γ_i, which controls the intrinsic propensity for s_i to take a particular state. This could be used for instance to encode individual risk of catching a disease in an epidemic model.

Given these definitions, we write the probability P(s | β, γ) that the complete set of spins takes value s in the Boltzmann form

P(\mathbf{s} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}) = \frac{e^{-H(\mathbf{s})}}{Z} \qquad (1)

where the Hamiltonian

H(\mathbf{s}) = \sum_{(i,j) \in E} g_{ij}(s_i, s_j; \beta_{ij}) + \sum_{i \in V} f_i(s_i; \gamma_i) \qquad (2)

is minus the log-probability of the state to within an arbitrary additive constant, and the partition function

Z = \sum_{\mathbf{s}} e^{-H(\mathbf{s})} \qquad (3)

is the appropriate normalizing constant, ensuring that P(s | β, γ) sums to unity. In this paper we will primarily be concerned with computing the single-site (or one-point) marginal probabilities

P_i(s_i) = \sum_{\mathbf{s} \setminus s_i} P(\mathbf{s} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}) \qquad (4)

where s ∖ s_i denotes all spins with the exception of s_i. For convenience we have dropped β and γ from the notation on the left of the equation, but it should be clear contextually that P_i(s_i) depends on both of these variables.

The one-point marginals reveal useful information about physical systems, such as the magnetization of classical spin models or the position of a phase transition. They are important for statistical inference problems, where they give the posterior probability of a variable taking a given state after averaging over contributions from all other variables (e.g., the total probability of an individual being infected with a disease at a given time). Unfortunately, direct computation of one-point marginals is difficult because the number of terms in the sum in Eq. (4) grows exponentially with the number of spins. The message passing method gives us a way to get around this difficulty and compute P_i(s_i) accurately and rapidly.
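To see concretely why the direct sum is infeasible, the following minimal sketch evaluates Eq. (4) by exhaustive enumeration for a toy zero-field Ising system (the graph, temperature, and all names here are illustrative assumptions, not part of the method):

```python
# Brute-force evaluation of the one-point marginals of Eq. (4) for a small
# zero-field Ising system.  The sum visits all 2^n spin states, which is
# exactly the exponential cost that message passing is designed to avoid.
import itertools
import numpy as np

edges = [(0, 1), (1, 2), (2, 0), (2, 3)]  # toy graph: a triangle plus a pendant node
n = 4
beta = 0.5                                 # inverse temperature, absorbed into H

def hamiltonian(s):
    """Dimensionless Hamiltonian of Eq. (2) with g_ij = -beta * s_i * s_j, f_i = 0."""
    return -beta * sum(s[i] * s[j] for i, j in edges)

Z = 0.0
up = np.zeros(n)                           # unnormalized P_i(s_i = +1)
for s in itertools.product([-1, +1], repeat=n):
    w = np.exp(-hamiltonian(s))            # Boltzmann weight of Eq. (1)
    Z += w
    for i in range(n):
        if s[i] == +1:
            up[i] += w

print("P_i(s_i = +1) =", up / Z)           # the marginals of Eq. (4)
```

Already for n = 30 spins the loop would visit roughly 10^9 states, and each additional spin doubles the cost; the message passing equations developed below avoid this sum entirely.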

Message passing can also be used to calculate other quantities. For instance, we will show how to compute the average energy (also called the internal energy) U, which is given by

U = \langle H \rangle = \sum_{\mathbf{s}} H(\mathbf{s})\, P(\mathbf{s} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}) \qquad (5)

The average energy is primarily of interest in thermodynamic calculations, although it may also be of interest for statistical inference, where it corresponds to the average log-likelihood.

We can also compute the two-point correlation function between spins s_i and s_j,

P_{ij}(s_i, s_j) = \sum_{\mathbf{s} \setminus \{s_i, s_j\}} P(\mathbf{s} \mid \boldsymbol{\beta}, \boldsymbol{\gamma}) \qquad (6)

This function can be computed by first calculating the one-point marginal P_i(s_i), then fixing the value of s_i and repeating the calculation for s_j. The same approach can also be used to compute higher-order, n-point correlation functions.

II.2 Message passing equations

Our method operates by dividing a network into neighborhoods cantwell2019message . A neighborhood N_i^{(r)} around node i is defined as the node i itself and all of its edges and neighboring nodes, plus all nodes and edges along paths of length r or less between the neighbors of i. See Fig. 1 for examples. The key to our approach is to focus initially on networks in which there are no paths longer than r between the neighbors of i, meaning that all paths are inside N_i^{(r)}. This means that all correlations between spins within N_i^{(r)} are accounted for by edges that are also within N_i^{(r)}, which allows us to write exact message passing equations for these networks. Equivalently, we can define a primitive cycle of length ℓ starting at node i to be a cycle of length ℓ (i.e., a self-avoiding loop) such that at least one of its edges is not on any shorter cycle beginning and ending at i. Our methods are then exact on any network that contains no primitive cycles of length greater than r + 2.
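Although the construction is stated abstractly, the neighborhoods are straightforward to build in practice. The sketch below is one possible implementation under our reading of the definition above, assuming the networkx library; it collects node i, its incident edges and neighbors, and every simple path of length r or less between two neighbors of i that does not pass through i.

```python
# Construct the neighborhood N_i^{(r)}: node i, its incident edges and
# neighbors, plus all nodes and edges on paths of length <= r between
# neighbors of i that avoid i itself.
import networkx as nx
from itertools import combinations

def neighborhood(G, i, r):
    """Return (nodes, edges) of N_i^{(r)}; edges stored as frozensets."""
    nodes = {i} | set(G[i])
    edges = {frozenset((i, j)) for j in G[i]}
    H = G.subgraph([v for v in G if v != i])        # exclude i from connecting paths
    for u, v in combinations(G[i], 2):
        for path in nx.all_simple_paths(H, u, v, cutoff=r):
            nodes.update(path)
            edges.update(frozenset(e) for e in zip(path[:-1], path[1:]))
    return nodes, edges

# Example: in a triangle 1-2-3 the r = 1 neighborhood of node 1 contains all
# three nodes and all three edges, so the triangle is captured exactly.
G = nx.Graph([(1, 2), (2, 3), (3, 1), (3, 4)])
print(neighborhood(G, 1, 1))
```

With r = 0 this returns just the star around node i, recovering the neighborhoods of traditional message passing.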

We develop a series of such methods, where the r-th member of the series is exact on networks that contain primitive cycles of length r + 2 and less only. The message passing equations become progressively more complex as r gets larger: they are very tractable for smaller values but become impractical when r is large. In many real-world networks the longest primitive cycle will be relatively long, requiring an unwieldy set of equations for an exact solution. However, long loops introduce weaker correlations between variables than short ones, and moreover the density of long loops is in many cases lower: the network is “locally dense but globally sparse.” In this situation, we find that the message passing equations for low values of r, while formally approximate, give excellent results. They account correctly for the effect of the short loops in the network, while making only a small approximation by omitting the long ones.

In practice, quite modest values of r can give good results. The smallest possible choice, r = 0, corresponds to assuming there are no loops in the network at all, i.e., that the network is a tree. This is the assumption made by traditional message passing methods, and it gives poor results on many real-world networks. The next approximation after this, however, with r = 1, which correctly accounts for the effect of loops of length three in the network (i.e., triangles), produces substantially better results, and the r = 2 approximation (which includes loops of length four) is in many cases impressively accurate. In the following developments we drop the superscript (r) from our notation for convenience—the same equations apply for all values of r.

Having defined the initial neighborhood N_i, we now further define a neighborhood N_j^{(i)} to be node j plus all edges in N_j that are not contained in N_i, along with the nodes at their ends. Our method involves writing the marginal probability distribution on the spin at node i in terms of a set of messages received from nodes j that are in N_i, including nodes that are not immediate neighbors of i. This contrasts with traditional message passing, in which messages are received only from the immediate neighbors of i. These messages are then in turn calculated from further messages that j receives from nodes k in N_j^{(i)}, and so forth.

When written in this manner, the messages i receives are independent of one another in any network with no primitive cycles longer than r + 2. Messages received from any two nodes j and k within N_i are necessarily independent since they are calculated from the corresponding neighborhoods N_j^{(i)} and N_k^{(i)}, which are disconnected from one another: if they were connected by any path then that path would create a primitive cycle starting at i but passing outside of N_i, of which there are none in the hypothesized network. By the same argument, we also know that N_i and N_j^{(i)} overlap only at the single node j, for any j in N_i.

With these observations in mind, we can start at a focal node i and consider N_i as comprising a central set of nodes and edges surrounding i. Then we can think of the set of neighborhoods N_j^{(i)} for all j in N_i as comprising the next “layer” in the network, the sets N_k^{(j)} for all k in N_j^{(i)} as a third layer, and so forth until all nodes and edges in the network are accounted for. In a network with no primitive cycles longer than r + 2, this procedure counts all interactions exactly once, allowing us to rewrite our Hamiltonian as a sum of independent contributions from the various layers thus:

H(\mathbf{s}) = H_{N_i}(\mathbf{s}_{N_i}) + \sum_{j \in N_i \setminus i} H_{N_j^{(i)}}(\mathbf{s}_{N_j^{(i)}}) + \sum_{j \in N_i \setminus i} \sum_{k \in N_j^{(i)} \setminus j} H_{N_k^{(j)}}(\mathbf{s}_{N_k^{(j)}}) + \cdots \qquad (7)

where s_{N_i} and s_{N_j^{(i)}} are the sets of spins for the nodes in the neighborhoods N_i and N_j^{(i)}, and we have defined the local Hamiltonians

H_{N_i}(\mathbf{s}_{N_i}) = \sum_{(j,k) \in N_i} g_{jk}(s_j, s_k; \beta_{jk}) + f_i(s_i; \gamma_i) \qquad (8)
H_{N_j^{(i)}}(\mathbf{s}_{N_j^{(i)}}) = \sum_{(k,l) \in N_j^{(i)}} g_{kl}(s_k, s_l; \beta_{kl}) + f_j(s_j; \gamma_j) \qquad (9)

The decomposition of Eq. (7) is illustrated pictorially in Fig. 1.

Figure 1: Diagram of the expansion used in the Hamiltonian decomposition of Eq. (7). The focal node 1 is in red while the rest of its neighborhood N_1 is in green. Nodes and edges in purple represent a neighborhood N_j^{(1)}, which excludes node 1. We also label the corresponding spin and message variables used in Eqs. (11) and (12).

The essential feature of this decomposition is that it breaks sums over spins, such as those in Eqs. (3) and (4), into a product of sums over the individual neighborhoods. Because these neighborhoods are, as we have said, independent, this means that the partition function and related quantities factorize into products of sums over a few spins each, which can easily be performed numerically. For instance, the one-point marginal of Eq. (4) takes the form

P_i(s_i) = \frac{1}{Z} \sum_{\mathbf{s} \setminus s_i} e^{-H_{N_i}(\mathbf{s}_{N_i})} \prod_{j \in N_i \setminus i} \biggl[ e^{-H_{N_j^{(i)}}(\mathbf{s}_{N_j^{(i)}})} \prod_{k \in N_j^{(i)} \setminus j} e^{-H_{N_k^{(j)}}(\mathbf{s}_{N_k^{(j)}})} \cdots \biggr] \qquad (10)

which can be written recursively as

P_i(s_i) = \frac{1}{Z_i} \sum_{\mathbf{s}_{N_i} \setminus s_i} e^{-H_{N_i}(\mathbf{s}_{N_i})} \prod_{j \in N_i \setminus i} q_j^{(i)}(s_j) \qquad (11)

with

q_j^{(i)}(s_j) = \frac{1}{Z_j^{(i)}} \sum_{\mathbf{s}_{N_j^{(i)}} \setminus s_j} e^{-H_{N_j^{(i)}}(\mathbf{s}_{N_j^{(i)}})} \prod_{k \in N_j^{(i)} \setminus j} q_k^{(j)}(s_k) \qquad (12)

where the normalization constants Z_i and Z_j^{(i)} ensure that the marginals and messages are normalized so that they sum to 1. (In practice, we simply normalize the messages by dividing by their sum.) The quantity q_j^{(i)}(s_j) is equal to the marginal probability of node j having spin s_j when all the edges in N_i are removed. Alternatively, one can think of it as a local external field on node j that influences the probability distribution of s_i. To make this more explicit one could rewrite Eq. (11) as

P_i(s_i) = \frac{1}{Z_i} \sum_{\mathbf{s}_{N_i} \setminus s_i} \exp\biggl[ -H_{N_i}(\mathbf{s}_{N_i}) + \sum_{j \in N_i \setminus i} \ln q_j^{(i)}(s_j) \biggr] \qquad (13)

where the sum of log-messages, ∑_{j ∈ N_i∖i} ln q_j^{(i)}(s_j), plays the role of the external field.

Equations (11) and (12) define our message passing algorithm and can be solved for the messages q_j^{(i)} by simple iteration, starting from any suitable set of initial values and applying the equations repeatedly until convergence is reached.
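To make the structure of the iteration concrete, here is a minimal sketch of the simplest member of the family, r = 0, applied to the zero-field Ising model. In this case every neighborhood N_j^{(i)} is just the star around j minus the edge to i, and Eqs. (11) and (12) reduce to standard belief propagation; the initial values, update schedule, and convergence tolerance are illustrative choices.

```python
# Fixed-point iteration of the message passing equations for the zero-field
# Ising model in the r = 0 (loopless) case, where Eq. (12) reduces to
# standard belief propagation.  msgs[(i, j)] stores q_j^{(i)}(s_j = +1).
import numpy as np

def ising_bp(adj, beta, tol=1e-10, max_iter=1000):
    """adj: dict node -> set of neighbors.  Returns marginals P_i(s_i = +1)."""
    msgs = {(i, j): 0.6 for i in adj for j in adj[i]}   # slightly magnetized start
    for _ in range(max_iter):
        delta, new = 0.0, {}
        for (i, j) in msgs:
            # Eq. (12) with r = 0: product over j's neighbors other than i
            # of the pair sum  sum_{s_k} e^{beta * s_j * s_k} q_k^{(j)}(s_k).
            w_up, w_dn = 1.0, 1.0
            for k in adj[j]:
                if k == i:
                    continue
                qk = msgs[(j, k)]
                w_up *= qk * np.exp(beta) + (1 - qk) * np.exp(-beta)
                w_dn *= qk * np.exp(-beta) + (1 - qk) * np.exp(beta)
            new[(i, j)] = w_up / (w_up + w_dn)          # normalize the message
            delta = max(delta, abs(new[(i, j)] - msgs[(i, j)]))
        msgs = new
        if delta < tol:
            break
    marg = {}                                           # Eq. (11): combine incoming messages
    for i in adj:
        w_up, w_dn = 1.0, 1.0
        for j in adj[i]:
            qj = msgs[(i, j)]
            w_up *= qj * np.exp(beta) + (1 - qj) * np.exp(-beta)
            w_dn *= qj * np.exp(-beta) + (1 - qj) * np.exp(beta)
        marg[i] = w_up / (w_up + w_dn)
    return marg

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}      # triangle plus pendant node
print(ising_bp(adj, beta=0.5))
```

For r > 0 the inner loops are replaced by sums over the joint states of each neighborhood, as in Eq. (12), but the overall fixed-point structure of the program is unchanged.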

With only slight modification we can use the same approach to calculate the internal energy as well. The contribution to the internal energy from the interactions of a single node i is f_i(s_i; γ_i) + ½ ∑_j A_ij g_ij(s_i, s_j; β_ij), where the factor of ½ compensates for double counting of interactions, each edge contributing to the energy at both of its endpoints. Summing over all nodes i and weighting by the appropriate Boltzmann probabilities, the total internal energy is

U = \sum_i \frac{1}{Z_i} \sum_{\mathbf{s}_{N_i}} \biggl[ f_i(s_i; \gamma_i) + \frac{1}{2} \sum_{j} A_{ij}\, g_{ij}(s_i, s_j; \beta_{ij}) \biggr] e^{-H_{N_i}(\mathbf{s}_{N_i})} \prod_{j \in N_i \setminus i} q_j^{(i)}(s_j) \qquad (14)

All of the quantities appearing here are known a priori, except for the messages q_j^{(i)} and the normalizing constants Z_i, which are calculated in the message passing process. Performing the message passing and then using the final converged values in Eq. (14) gives us our internal energy.

II.3 Implementation

For less dense networks, those with node degrees up to about 20, the message passing equations of Eqs. (11) and (12) can be implemented directly and work well. The method is also easily parallelizable: we can update all messages in parallel based on their values from the previous iteration, and likewise compute the final marginals in parallel.

For networks with higher degrees the equations can become intractable, the huge reduction in complexity due to the factorization of the Hamiltonian notwithstanding. For a model in which each spin can take one of c distinct states, the sum over states in a neighborhood containing ν other spins has c^ν terms, which can quickly become computationally expensive to evaluate. Moreover, if just a single node has too large a neighborhood it can make the entire computation intractable, as that single neighborhood can consume more computational power than is available.

In such situations, therefore, we take a different approach. We note that Eq. (12) is, up to its normalization, effectively an expectation

q_j^{(i)}(s_j) \propto \Bigl\langle \delta_{\sigma_j, s_j} \prod_{k \in N_j^{(i)} \setminus j} q_k^{(j)}(\sigma_k) \Bigr\rangle \qquad (15)

where σ denotes a state of the spins in N_j^{(i)}, δ_{σ_j, s_j} is the Kronecker delta, and we use the shorthand

\langle F \rangle = \frac{\sum_{\boldsymbol{\sigma}} F(\boldsymbol{\sigma})\, e^{-H_{N_j^{(i)}}(\boldsymbol{\sigma})}}{\sum_{\boldsymbol{\sigma}} e^{-H_{N_j^{(i)}}(\boldsymbol{\sigma})}} \qquad (16)

We approximate this average using Markov chain Monte Carlo importance sampling over spin states, and after convergence of the messages the final estimates of the marginals P_i(s_i) can also be obtained by Monte Carlo, this time on the spins in N_i. We describe the details in Section III.

II.4 Calculating the partition function

The partition function Z is perhaps the most fundamental quantity in equilibrium statistical mechanics. From a knowledge of the partition function one can calculate virtually any other thermodynamic variable of interest. Objects equivalent to Z also play a central role in other areas, such as Bayesian statistics, where the quantity known as the model evidence, the marginal likelihood of observed data given a hypothesized model, is mathematically analogous to the partition function and plays an important role in model fitting and selection mackay2003information ; migon2014statistical ; friel2012estimating .

Unfortunately, the partition function is difficult to calculate in practice. The calculation can be done analytically in some special cases salinas2001introduction ; baxter2016exactly , but numerical calculations are difficult due to the need to sum over an exponentially large number of states, and Monte Carlo calculations are challenging because of the difficulty of correctly normalizing the Boltzmann distribution.

Another concept central to statistical mechanics is the entropy

S = -\sum_{\mathbf{s}} P(\mathbf{s}) \ln P(\mathbf{s}) \qquad (17)

which has broad applications not just in physics but across the sciences shannon1951prediction ; bialek2012biophysics ; cabral2013entropy . Like the partition function, the entropy is difficult to calculate numerically, and for exactly the same reason, since the two are closely related. For the canonical distribution of Eq. (1) the entropy is given in terms of the internal energy and the free energy F = −ln Z by

S = U - F = U + \ln Z \qquad (18)

Even if we know the internal energy U, therefore (which is relatively straightforward to calculate), the entropy is at least as hard to calculate as the partition function. Indeed the fundamental difficulty of normalizing the Boltzmann distribution is equivalent to establishing the zero of the entropy, a well known hard problem (unsolvable within classical thermodynamics, requiring the additional axiom of the Third Law).
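The relation (18) itself is easy to verify by exhaustive enumeration on a system small enough to solve directly, as in this short check (all values illustrative):

```python
# Verify S = U + ln Z (Eq. (18)) by direct enumeration for a tiny Ising system.
import itertools
import numpy as np

edges, n, beta = [(0, 1), (1, 2), (2, 0)], 3, 0.7
H = lambda s: -beta * sum(s[i] * s[j] for i, j in edges)

states = list(itertools.product([-1, +1], repeat=n))
w = np.array([np.exp(-H(s)) for s in states])   # Boltzmann weights of Eq. (1)
Z = w.sum()
P = w / Z
U = sum(p * H(s) for p, s in zip(P, states))    # internal energy, Eq. (5)
S = -np.sum(P * np.log(P))                      # entropy, Eq. (17)
print(S, U + np.log(Z))                         # the two agree to machine precision
```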

As we now show, the entropy can be calculated using our message passing formalism by appropriately factorizing the probability distribution over spin states. Since we have already developed a prescription for computing U (see Eq. (14)), this also allows us to calculate the partition function.

As shown in Section A.4 of the appendix, the state probability P(s) in Eq. (1) can be rewritten in the factorized form

P(\mathbf{s}) = \frac{\prod_{i} P_{N_i}(\mathbf{s}_{N_i})}{\prod_{i \leftrightarrow j} P_{N_i \cap N_j}(\mathbf{s}_{N_i \cap N_j})} \qquad (19)

where P_{N_i} is the joint marginal distribution of the variables in the neighborhood of node i, P_{N_i ∩ N_j} is the joint marginal distribution in the intersection of the neighborhoods N_i and N_j, and i ↔ j denotes pairs of nodes that are contained in each other’s neighborhoods.

By a series of manipulations, this form can be further expressed as the pure product

(20)

where

(21)

with 1[⋯] being the indicator function and

(22)

Substituting (20) into Eq. (17), we get an expression for the entropy thus:

(23)

Note that, like the well known Bethe approximation for the entropy yedidia2003understanding , this expression has contributions from the one- and two-point marginals P_i and P_ij of Eqs. (11) and (6), but it also contains a term that depends on the joint marginal P_{N_i ∩ N_j} in the intersection N_i ∩ N_j, which may be nontrivial if the intersection contains more than two nodes. As shown in the appendix, we can calculate this joint marginal using the message passing equation

(24)

where H_{N_i ∩ N_j} denotes the terms of the Hamiltonian of Eq. (2) that fall within N_i ∩ N_j and Z_{i ∩ j} is the corresponding normalizing constant. For sufficiently small intersections, P_{N_i ∩ N_j} can be computed exactly. In other cases we can calculate it using Monte Carlo methods similar to those we used previously for the marginals P_i.

II.5 Comparison with generalized belief propagation

Message passing methods for probabilistic models on loopy networks have been proposed in the past, the best known being the generalized belief propagation method of Yedidia et al. yedidia2001generalized . Generalized belief propagation employs a region-based approximation yedidia2005constructing , in which the free energy is approximated by a sum of independent local free energies of regions of edges and their accompanying nodes. Once the regions are defined it is straightforward to write down belief propagation equations, which can be used to calculate marginals and other quantities of interest, including approximations to the partition function and entropy. Perhaps the best known example of generalized belief propagation, at least within the statistical physics community, is the cluster variational method, in which the regions are defined so as to be closed under the intersection operation pelizzola2005cluster and the resulting free energy is called the Kikuchi free energy kikuchi1951theory .

The accuracy and complexity of generalized belief propagation is determined by the specific choice of regions, which has been described as being “more of an art than a science” yedidia2003understanding . Loops contained within regions are correctly accounted for in the belief propagation, while those that span two or more regions are not and introduce error. At the same time, the computational complexity of the belief propagation calculations increases exponentially with the size of the regions yedidia2003understanding , so choosing the right regions is a balancing act between enclosing as many loops as possible while not making the regions too large. A number of heuristics have been proposed for choosing the regions kappen2002novel ; pakzad2002minimal ; welling2004choice , but real-world networks can pose substantial difficulties because they often contain both high degrees and many loops newman2018networks , which effectively forces us to compromise either by leaving loops out or by using very large regions. Our method can have a significant advantage in these systems because it can accommodate large, tightly connected neighborhoods through local Monte Carlo sampling. Our method also has the benefit that the neighborhoods are constructed automatically based on the network structure rather than being chosen by the user.

III Example applications

As an archetypal application of our methods we consider the Ising model on various example networks. The ferromagnetic Ising model in zero external field is equivalent in our notation to the choice

g_{ij}(s_i, s_j) = -\beta s_i s_j, \qquad f_i(s_i) = 0 \qquad (25)

where β is the inverse temperature and the spins take values s_i = ±1. Note that temperature in this notation is considered a part of the Hamiltonian. It is more conventional to write the temperature separately, so that the Hamiltonian has dimensions of energy rather than being dimensionless as here, but incorporating the temperature is notationally convenient in the present case. It effectively makes the temperature an external parameter β_ij = β in Eq. (2), with the same value for every edge.

As example calculations, we will compute the average magnetization m, which is given by

m = \frac{1}{n} \sum_{i} \sum_{s_i = \pm 1} s_i\, P_i(s_i) \qquad (26)

and the heat capacity C, given by

C = \frac{dU}{dT} \qquad (27)

As detailed in Section A.1 of the appendix, we employ an extension of the message passing equations to compute C that avoids having to use a numerical derivative to evaluate Eq. (27). In brief, we consider the messages to be functions of β and then define their derivatives with respect to β as their own set of messages,

\eta_j^{(i)}(s_j) = \frac{d q_j^{(i)}(s_j)}{d\beta} \qquad (28)

with their own associated message passing equations, derived by differentiating Eq. (12). We then compute the heat capacity C by differentiating Eq. (14) with respect to β, expressing the result in terms of the η_j^{(i)}, and substituting it into Eq. (27).

III.1 Phase transition

In many geometries, the ferromagnetic Ising model has a phase transition at a nonzero critical temperature between a symmetric state with zero average magnetization and a symmetry-broken state with nonzero magnetization. Making the substitution (25) in Eqs. (11) and (12), we can show that the message passing equations for the Ising model always have a trivial solution q_j^{(i)}(s_j) = 1/2 for all i, j, and s_j. This choice is a fixed point of the message passing iteration: when started at this point the iteration will remain there indefinitely. Looking at Eq. (26), we see that this fixed point corresponds to magnetization m = 0. If the message passing iteration converges to this trivial fixed point, therefore, it tells us that the magnetization is zero and we are above the critical temperature; if it settles elsewhere then the magnetization is nonzero and we are below the critical temperature. Thus the phase transition corresponds to the point at which the fixed point changes from being attracting to being repelling.

This behavior is well known in standard belief propagation, where it has been shown that on networks with long loops only there is a critical temperature T_c below which the trivial fixed point becomes unstable, and hence the system develops nonzero magnetization, and that this temperature corresponds precisely to the conventional zero-field continuous phase transition on these networks mooij2005properties . Extending the same idea to the present case, we expect the phase transition on a loopy network to fall at the corresponding point at which the trivial fixed point of our message passing formulation changes between stable and unstable.

Moreover, because the values of the messages at the trivial fixed point are known, we can compute an expression for the phase transition point without performing any message passing. We treat the message passing iteration as a dynamical system and perform a linear stability analysis of the trivial fixed point. Perturbing around the fixed point (shorthand for setting all q_j^{(i)}(s_j) = 1/2) and keeping terms to linear order, we find that the dynamics is governed by the Jacobian

(29)

where B is a generalization of the so-called non-backtracking matrix Krzakala13 ; KNZ14 to our loopy message passing formulation:

(30)

and each element of B is weighted by a correlation function between the corresponding pair of spins within their shared neighborhood—see Section A.3 of the appendix for details.

When the magnitude of the leading eigenvalue λ₁ of this Jacobian is less than 1, the trivial fixed point is stable; when it is greater than 1 the fixed point is unstable. Hence we can locate the phase transition temperature by numerically evaluating the Jacobian and finding the point at which |λ₁| crosses 1, for instance by binary search.
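In the special case r = 0 the Jacobian is simply tanh β times the ordinary non-backtracking matrix B, so the instability condition can be solved in closed form as tanh β_c = 1/λ₁(B) mooij2005properties ; Krzakala13 , and no search is required. The following sketch illustrates this simplest case; for r > 0 the correlation factors in the Jacobian depend on temperature themselves, and one falls back on the binary search just described. The example graph and function name are illustrative.

```python
# Critical temperature for r = 0 (standard belief propagation) from the
# leading eigenvalue of the non-backtracking matrix B.  B is indexed by
# directed edges, with an entry for each pair of edges k->i, i->j with j != k.
import numpy as np

def critical_temperature_r0(edges):
    directed = [(i, j) for i, j in edges] + [(j, i) for i, j in edges]
    idx = {e: t for t, e in enumerate(directed)}
    B = np.zeros((len(directed), len(directed)))
    for (k, i) in directed:
        for (u, j) in directed:
            if u == i and j != k:               # walk k -> i -> j, no backtracking
                B[idx[(k, i)], idx[(u, j)]] = 1.0
    lam = max(abs(np.linalg.eigvals(B)))        # leading eigenvalue magnitude
    # Trivial fixed point destabilizes when tanh(beta) * lam = 1 (needs lam > 1,
    # i.e., a network that actually has a transition).
    beta_c = np.arctanh(1.0 / lam)
    return 1.0 / beta_c                         # T_c = 1 / beta_c

edges = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]  # two triangles joined at node 0
print(critical_temperature_r0(edges))
```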

III.2 A model network

As a first example application, we examine the behavior of our method on a model network created precisely so as to have short loops only up to a specified maximum length. The network has short primitive cycles only of length r + 2 and less for a given choice of r, though it can also have long loops—it is “locally dense but globally sparse” in the sense discussed previously. Indeed this turns out to be a crucial point. It is well known that the Ising model does not have a normal phase transition on a true tree, because at any finite temperature there is always a nonzero density of defects in the spin state (meaning pairs of adjacent spins that are oppositely oriented), which on a tree divide the network into finite sized regions, imposing a finite correlation length and hence no critical behavior. Similarly, in the case of a network with only short loops and no long ones there is no true phase transition. The long loops are necessary to produce criticality, a point discussed in detail in allard2019accuracy .

To generate networks that have short primitive cycles only up to a certain length, we first generate a random bipartite network—a network with two types of nodes and connections only between unlike kinds—then “project” down onto one type of node, producing a network composed of a set of complete subgraphs or cliques. In detail, the procedure is as follows.

  1. We first specify the degrees of all the nodes, of both types, in the bipartite network.

  2. We represent these degrees by “stubs” of edges emerging, in the appropriate numbers, from each node, then we match stubs at random in pairs to create our random bipartite network.

  3. We project this network onto the nodes of type 1, meaning that any two such nodes that are both connected to the same neighbor of type 2 are connected directly with an edge in the projection. After all such edges have been added, the type-2 nodes are discarded.

  4. Finally, we remove a fraction ρ of the edges in the projected network at random.

If ρ = 0 the network is composed of fully connected cliques, but when ρ > 0 some cliques will be lacking some edges, and hence the network is composed of a collection of subgraphs of size equal to the degrees of the corresponding nodes of type 2 from which they were projected. If we limit these degrees to some maximum value then there will be no short primitive cycles of length longer than that value.
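A minimal implementation of this four-step procedure might look as follows (the degree sequences, the removal fraction ρ, and all names are illustrative choices; networkx is assumed):

```python
# Generate a "locally dense, globally sparse" test network: random bipartite
# graph by stub matching, projected onto the type-1 nodes, with a fraction
# rho of the projected edges removed at random.
import random
import networkx as nx

def clique_projection_network(deg1, deg2, rho, seed=0):
    """deg1, deg2: degree sequences for type-1 and type-2 nodes; stub totals must match."""
    rng = random.Random(seed)
    stubs1 = [i for i, d in enumerate(deg1) for _ in range(d)]
    stubs2 = [j for j, d in enumerate(deg2) for _ in range(d)]
    rng.shuffle(stubs1)
    rng.shuffle(stubs2)
    groups = {}                                  # type-2 node -> its type-1 members
    for i, j in zip(stubs1, stubs2):             # steps 1 and 2: random stub matching
        groups.setdefault(j, set()).add(i)
    G = nx.Graph()
    G.add_nodes_from(range(len(deg1)))
    for members in groups.values():              # step 3: each group projects to a clique
        members = sorted(members)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                G.add_edge(members[a], members[b])
    for e in list(G.edges()):                    # step 4: dilute the cliques
        if rng.random() < rho:
            G.remove_edge(*e)
    return G

# Type-2 degrees capped at 4, so cliques have at most four nodes and the
# projection contains no short primitive cycles of length greater than four.
G = clique_projection_network([3] * 400, [4] * 300, rho=0.2)
```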

Figure 2 shows the magnetization per spin, entropy, and heat capacity for the ferromagnetic Ising model on a large example network generated using this procedure, with type-2 degrees limited to four so that the projection contains no primitive cycles of length greater than four. By also limiting the degrees of the type-1 nodes in the bipartite graph we ensure that no neighborhood in the projection is too large to prevent a complete summation over states, and hence that Monte Carlo estimation of the sums in the message passing equations is unnecessary.

Results are shown for belief propagation calculations with r = 0, 1, and 2, the last of which should, in principle, be exact except for the weak correlations introduced by the presence of long loops in the network. We also show in the figure the magnitude of the leading eigenvalue of the Jacobian for each value of r. The points at which this eigenvalue equals 1, which give estimates of the critical temperature for each r, are indicated by the vertical lines. Also shown in the figure for comparison are results from a direct Monte Carlo simulation of the system, with the entropy calculated from values of the heat capacity computed from energy fluctuations and then numerically integrated using the identity

S(T) = S(T_0) + \int_{T_0}^{T} \frac{C(T')}{T'}\, dT' \qquad (31)
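In practice this integral can be evaluated with a simple quadrature over the measured temperature grid; a minimal sketch with placeholder data:

```python
# Recover the entropy from heat capacity data via Eq. (31), up to the
# additive constant S(T_0).  The grid and the C(T) values are placeholders
# standing in for measurements from energy fluctuations.
import numpy as np

T = np.linspace(1.0, 6.0, 200)          # temperature grid (illustrative)
C = np.exp(-(T - 3.0) ** 2)             # stand-in heat capacity data
S0 = 0.0                                # unknown constant S(T_0)
S = S0 + np.array([np.trapz(C[: k + 1] / T[: k + 1], T[: k + 1])
                   for k in range(len(T))])
```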

The message passing calculations offer significantly faster results for this system: even at the largest value of r the message passing was about 100 times faster than the Monte Carlo simulations.

Looking at Fig. 2, we can see that as we increase r the message passing results approach those from the direct Monte Carlo, except close to the phase transition, where the Monte Carlo calculations suffer from finite size effects that smear the phase transition, to which the message passing approach appears largely immune. While the results for r = 0 are quite far from the direct Monte Carlo results, most of the improvement in accuracy from our method is already present even at r = 1. Going to r = 2 offers only a small additional improvement in this case.

The apparent position of the phase transition aligns well with the predictions derived from the leading eigenvalue of the Jacobian for each value of r. The transition is particularly clear in the gradient discontinuity of the magnetization. For r = 1 and 2 the heat capacity appears to exhibit a discontinuity at the transition, which differs from the divergence we expect on low-dimensional lattices but bears a resemblance to the behavior seen on Bethe lattices and other homogeneous tree-like networks bethe1935statistical ; mancini2006equations ; dorogovtsev2008critical .

Figure 2: Results for the ferromagnetic Ising model in zero field as a function of temperature T in a large network generated using the procedure described in Section III.2. The top panel shows the average magnetization, while the bottom one shows the heat capacity and the entropy (the latter shifted up for visualization purposes). The magnitude of the leading eigenvalue of the Jacobian is also shown in the top panel for all three values of r, and we can see that the apparent positions of the phase transition, revealed by discontinuities in the physical quantities or their gradients, correspond closely to the temperatures at which the associated eigenvalues are equal to 1.

III.3 Real-world networks

For our next example we look at an application to a real-world network, where we do not expect the method to be exact, though, as we will see, it nonetheless performs well. The network we examine has larger local neighborhoods than our synthetic example, which means we are not able to sum exhaustively over all configurations of the spins in Eq. (12) (and similarly in Eq. (11)), so, as described in Section II.3, we instead make use of Monte Carlo sampling to estimate the messages q_j^{(i)} and marginals P_i(s_i).

The summation over local spins in Eq. (12) is equivalent to computing the expectation in Eq. (15). To calculate q_j^{(i)} we fix the values of its incoming messages and perform Monte Carlo sampling over the states of the spins in the neighborhood with the Hamiltonian of Eq. (9). Then we compute the average in Eq. (15) separately for the cases s_j = +1 and s_j = −1 and normalize to ensure that the results sum to one. The resulting values for q_j^{(i)} can then be used as incoming messages for calculating other messages in other neighborhoods. We perform the Monte Carlo using the Wolff cluster algorithm wolff1989collective , which makes use of the Fortuin–Kasteleyn percolation representation of the Ising model to flip large clusters of spins simultaneously and can significantly reduce the time needed to obtain independent samples, particularly close to the critical point. Once the messages have converged to their final values we compute the marginals P_i(s_i) by performing a second Monte Carlo sampling, this time over the spins in N_i with the Hamiltonian of Eq. (8). More details on the procedure are given in Section A.2 of the appendix.
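A sketch of the local sampling step appears below. It assumes the ±1 Ising form used here and our reading of Eq. (15): the spins of a single neighborhood are sampled with the Wolff algorithm under the neighborhood Hamiltonian alone, and the product of incoming messages is accumulated separately for the two values of s_j. Burn-in, error control, and all names are illustrative.

```python
# Estimate one message q_j^{(i)}(s_j = +1) by Wolff sampling of the spins in
# the neighborhood N_j^{(i)} (its nodes and internal edges), per Eq. (15).
# q_in maps each node k other than j to its fixed incoming message
# q_k^{(j)}(s_k = +1).
import numpy as np

def wolff_message_estimate(nodes, edges, j, q_in, beta, sweeps=10000, seed=0):
    rng = np.random.default_rng(seed)
    nbrs = {v: [] for v in nodes}
    for a, b in edges:
        nbrs[a].append(b)
        nbrs[b].append(a)
    s = {v: rng.choice([-1, +1]) for v in nodes}
    p_add = 1.0 - np.exp(-2.0 * beta)            # Wolff bond probability, ferromagnet
    acc = {+1: 0.0, -1: 0.0}
    node_list = list(nodes)
    for _ in range(sweeps):
        seed_node = node_list[rng.integers(len(node_list))]
        cluster, frontier = {seed_node}, [seed_node]
        while frontier:                          # grow one Fortuin-Kasteleyn cluster
            v = frontier.pop()
            for w in nbrs[v]:
                if w not in cluster and s[w] == s[v] and rng.random() < p_add:
                    cluster.add(w)
                    frontier.append(w)
        for v in cluster:                        # flip the whole cluster
            s[v] = -s[v]
        # accumulate the product of incoming messages, split by the value of s_j
        prod = np.prod([q_in[k] if s[k] == +1 else 1.0 - q_in[k] for k in q_in])
        acc[s[j]] += prod
    return acc[+1] / (acc[+1] + acc[-1])         # normalized estimate of the message
```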

The Monte Carlo approach combines the best aspects of message passing and traditional Monte Carlo calculations. Message passing reduces the sums we need to perform to sets of spins much smaller than the entire network, while the Monte Carlo approach dramatically reduces the number of spin states that need to be evaluated. The approach has other advantages too. For instance, because of the small neighborhood sizes it shows improved performance in systems with substantial energy barriers that might otherwise impede ergodicity, such as antiferromagnetic systems. But perhaps its biggest advantage is that it effectively allows us to sample very large numbers of states of the network without taking very large samples of individual neighborhoods. If we sample M configurations from one neighborhood and M configurations from another, then in effect we are summing over M² possible combinations of states in the union of the two neighborhoods. Depending on the value of r, there are at least 2m neighborhoods in a network, where m is the number of edges (one neighborhood N_j^{(i)} for each direction of each edge), and hence we are effectively summing over at least M^{2m} states overall, a number that increases exponentially with network size. Enormous effective sample sizes are thus easily reachable, far beyond what is possible with traditional Monte Carlo methods.

Figure 3 shows the results of applying these methods to a network from davis2011university representing the structure of an electric power grid, along with results from direct Monte Carlo simulations on the same network. As the figure shows, the magnetization is again poorly approximated by the traditional (r = 0) message passing algorithm, but improves as r increases. In particular, the behavior in the region of the phase transition is quite poor for r = 0, which does not provide a good estimate of the position of the transition, but for r = 1 and 2 one can extract much better estimates of the critical temperature. We also see a much clearer phase transition in the message passing results than in the standard Monte Carlo, because of finite size effects in the latter. These results all suggest that for real systems our method can give substantial improvements over both ordinary belief propagation and direct Monte Carlo simulation, and in some cases reveals qualitatively different behavior altogether.

Figure 3: Results for the ferromagnetic Ising model in zero field as a function of temperature T on a network representing the structure of an electric power grid. Again the message passing results approximate the true solution progressively better as r grows larger.

IV Discussion

In this paper we have presented a new class of message passing algorithms for solving probabilistic models on networks that contain a high density of short loops. Taking the Ising model as an example, we have shown that our methods give substantially improved results for calculations of magnetization, heat capacity, entropy, marginal spin probabilities, and other quantities over standard message passing methods that do not account for the presence of loops. Our methods are exact on networks with short loops up to a fixed maximum length that we can choose, and give good approximations on networks with loops of any length.

There are many ways in which the methods and results of this paper could be extended. We have studied only one application in detail, the Ising model, but the formalism we present is a general one that could be applied to many other models. In principle, any model with sparse pairwise interactions (i.e., interactions whose number scales sub-quadratically with the number of variables) could be studied using these methods. For example, there is a large class of generative models of networks in which each edge is a random variable, drawn independently from a distribution that depends on the properties of the adjacent nodes. Examples include the Chung–Lu model CL02b and the stochastic block model and its variants HLL83 ; KN11a . If we assume an observed network to be drawn from such a model then we can use statistical inference to estimate the values of hidden node attributes that influence edge probability, such as community membership. Our message passing methods could be applied to such inference calculations and could in principle give more accurate results in the common case where the observed network contains many short loops.

Another potential application in the realm of statistical inference is the inverse Ising model, the problem of inferring the parameters of an Ising or Ising-like model from an observed sequence of spin states, which has numerous applications including the reconstruction of neural pathways schneidman2006weak , the inference of protein structure morcos2011direct , and the analysis of correlations within financial markets bury2013market . It can be shown that the one- and two-point correlation functions of the observed spins are sufficient statistics to reliably estimate coupling and external field parameters nguyen2017inverse , and our method could be used to compute these statistics on loopy networks with greater accuracy than traditional message passing and faster than standard Monte Carlo simulation. Other potential applications, further afield from traditional statistical physics, include the solution of constraint satisfaction problems, coding theory, and combinatorial optimization.

Acknowledgements.
This work was funded in part by the US Department of Defense NDSEG fellowship program (AK) and by the National Science Foundation under grants DMS–1710848 and DMS–2005899 (MEJN).

References

  • (1) M. E. J. Newman, Networks, 2nd ed., Oxford University Press, Oxford (2018).
  • (2) I. Z. Kiss, J. C. Miller, and P. L. Simon, Mathematics of Epidemics on Networks, Springer International Publishing, Berlin (2017).
  • (3) B. A. Carreras, V. E. Lynch, I. Dobson, and D. E. Newman, Dynamical and probabilistic approaches to the study of blackout vulnerability of the power transmission grid. Proceedings of the 37th Annual Hawaii International Conference on System Sciences, pp. 7–14, Institute of Electrical and Electronics Engineers, New York (2004).
  • (4) S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes, Ising model on networks with an arbitrary distribution of connections. Phys. Rev. E 66, 016104 (2002).
  • (5) M. I. Jordan, Learning in Graphical Models, Kluwer Academic Publishers, Dordrecht (1998).
  • (6) D. Koller and N. Friedman, Probabilistic Graphical Models: Principles and Techniques, MIT Press, Cambridge, MA (2009).
  • (7) G. Mussardo, Statistical Field Theory: An Introduction to Exactly Solved Models in Statistical Physics, Oxford University Press, Oxford (2010).
  • (8) C. N. Yang, The spontaneous magnetization of a two-dimensional Ising model. Physical Review 85, 808 (1952).
  • (9) H. A. Bethe, Statistical theory of superlattices. Proc. R. Soc. London A 150, 552–575 (1935).
  • (10) M. Mezard and A. Montanari, Information, Physics, and Computation, Oxford University Press, Oxford (2009).
  • (11) D. J. Watts and S. H. Strogatz, Collective dynamics of small-world networks. Nature 393, 440 (1998).
  • (12) M. Eckstein, M. Kollar, K. Byczuk, and D. Vollhardt, Hopping on the Bethe lattice: Exact results for densities of states and dynamical mean-field theory. Phys. Rev. B 71, 235119 (2005).
  • (13) F. L. Metz, I. Neri, and D. Bollé, Spectra of sparse regular graphs with loops. Phys. Rev. E 84, 055101 (2011).
  • (14) D. Bollé, F. L. Metz, and I. Neri, On the spectra of large sparse graphs with cycles. Preprint arXiv:1206.1512 (2012).
  • (15) M. E. J. Newman, Random graphs with clustering. Phys. Rev. Lett. 103, 058701 (2009).
  • (16) S. Yoon, A. V. Goltsev, S. N. Dorogovtsev, and J. Mendes, Belief-propagation algorithm and the Ising model on networks with arbitrary distributions of motifs. Physical Review E 84, 041144 (2011).
  • (17) A. Montanari and T. Rizzo, How to compute loop corrections to the Bethe approximation. Journal of Statistical Mechanics: Theory and Experiment 2005, 10011 (2005).
  • (18) M. Chertkov and V. Y. Chernyak, Loop calculus in statistical physics and information science. Physical Review E 73, 065102 (2006).
  • (19) M. E. J. Newman and J. Park, Why social networks are different from other types of networks. Physical Review E 68, 036122 (2003).
  • (20) J. S. Yedidia, W. T. Freeman, and Y. Weiss, Generalized belief propagation. Proceedings of the 13th International Conference on Neural Information Processing Systems, pp. 668–674, MIT Press, Cambridge, MA (2000).
  • (21) M. Welling, On the choice of regions for generalized belief propagation. Proceedings of the 20th Conference on Uncertainty in Artificial Intelligence, pp. 585–592, AUAI Press, Banff, Canada (2004).
  • (22) G. T. Cantwell and M. E. J. Newman, Message passing on networks with loops. Proc. Natl. Acad. Sci. USA 116, 23398 (2019).
  • (23) W. O. Kermack and A. G. McKendrick, A contribution to the mathematical theory of epidemics. Proc. R. Soc. London A 115, 700–721 (1927).
  • (24) R. Durrett, Spatial Epidemic Models: Their structure and Relation to Data, Cambridge University Press, Cambridge (1995).
  • (25) D. Stauffer and S. Solomon, Ising, Schelling and self-organising segregation. Eur. Phys. J. B 57, 473–479 (2007).
  • (26) A. Pelizzola, Cluster variation method in statistical physics and probabilistic graphical models. J. Phys. A 38, R309 (2005).
  • (27) F. Krzakala, F. Ricci-Tersenghi, L. Zdeborová, R. Zecchina, E. W. Tramel, and L. F. Cugliandolo, Statistical Physics, Optimization, Inference, and Message-passing Algorithms, Oxford University Press, Oxford (2016).
  • (28) S. Geman and C. Graffigne, Markov random field image models and their applications to computer vision. Proceedings of the International Congress of Mathematicians, p. 2, International Congress of Mathematicians, Berkeley, CA (1986).
  • (29) M. Yasuda and T. Horiguchi, Triangular approximation for Ising model and its application to Boltzmann machine. Physica A 368, 83–95 (2006).
  • (30) A. Decelle, F. Krzakala, C. Moore, and L. Zdeborová, Asymptotic analysis of the stochastic block model for modular networks and its algorithmic applications. Phys. Rev. E 84, 066106 (2011).
  • (31) W.-X. Zhou and D. Sornette, Self-organizing Ising model of financial markets. Eur. Phys. J. B 55, 175–181 (2007).
  • (32) S. Galam, Rational group decision making: A random field Ising model at T = 0. Physica A 238, 66–80 (1997).
  • (33) D. Stauffer, Social applications of two-dimensional Ising models. American Journal of Physics 76, 470 (2008).
  • (34) D. J. C. MacKay, Information Theory, Inference and Learning Algorithms, Cambridge University Press, Cambridge (2003).
  • (35) H. S. Migon, D. Gamerman, and F. Louzada, Statistical Inference: An Integrated Approach, CRC Press, Boca Raton, FL (2014).
  • (36) N. Friel and J. Wyse, Estimating the evidence—a review. Statistica Neerlandica 66, 288–308 (2012).
  • (37) S. Salinas, Introduction to Statistical Physics, Springer, New York (2001).
  • (38) R. J. Baxter, Exactly Solved Models in Statistical Mechanics, Academic Press, London (1982).
  • (39) C. E. Shannon, Prediction and entropy of printed English. Bell System Technical Journal 30, 50–64 (1951).
  • (40) W. Bialek, Biophysics: Searching for Principles, Princeton University Press, Princeton, NJ (2012).
  • (41) P. Cabral, G. Augusto, M. Tewolde, and Y. Araya, Entropy in urban systems. Entropy 15, 5223–5236 (2013).
  • (42) J. S. Yedidia, W. T. Freeman, and Y. Weiss, Understanding belief propagation and its generalizations. Exploring Artificial Intelligence in the New Millennium 8, 236–239 (2003).
  • (43) J. S. Yedidia, W. T. Freeman, and Y. Weiss, Constructing free-energy approximations and generalized belief propagation algorithms. IEEE Transactions on Information Theory 51, 2282–2313 (2005).
  • (44) R. Kikuchi, A theory of cooperative phenomena. Physical Review 81, 988–1003 (1951).
  • (45) H. J. Kappen and W. Wiegerinck, Novel iteration schemes for the cluster variation method. Advances in Neural Information Processing Systems, pp. 415–422, MIT Press, Cambridge, MA (2002).
  • (46) P. Pakzad and V. Anantharam, Minimal graphical representation of Kikuchi regions. Proceedings of the Annual Allerton Conference on Communication Control and Computing, pp. 1586–1595, University of Illinois, Champaign, IL (2002).
  • (47) J. Mooij and H. Kappen, On the properties of the Bethe approximation and loopy belief propagation on binary networks. J. Stat. Mech. P11012 (2005).
  • (48) F. Krzakala, C. Moore, E. Mossel, J. Neeman, A. Sly, L. Zdeborová, and P. Zhang, Spectral redemption in clustering sparse networks. Proc. Natl. Acad. Sci. USA 110, 20935–20940 (2013).
  • (49) B. Karrer, M. E. J. Newman, and L. Zdeborová, Percolation on sparse networks. Phys. Rev. Lett. 113, 208702 (2014).
  • (50) A. Allard and L. Hébert-Dufresne, On the accuracy of message-passing approaches to percolation in complex networks. Preprint arXiv:1906.10377 (2019).
  • (51) U. Wolff, Collective Monte Carlo updating for spin systems. Phys. Rev. Lett. 62, 361–364 (1989).
  • (52) F. Mancini and A. Naddeo, Equations-of-motion approach to the spin-1/2 Ising model on the Bethe lattice. Phys. Rev. E 74, 061108 (2006).
  • (53) S. N. Dorogovtsev, A. V. Goltsev, and J. F. Mendes, Critical phenomena in complex networks. Rev. Mod. Phys. 80, 1275–1335 (2008).
  • (54) T. A. Davis and Y. Hu, The University of Florida sparse matrix collection. ACM Transactions on Mathematical Software (TOMS) 38, 1 (2011).
  • (55) F. Chung and L. Lu, The average distances in random graphs with given expected degrees. Proc. Natl. Acad. Sci. USA 99, 15879–15882 (2002).
  • (56) P. W. Holland, K. B. Laskey, and S. Leinhardt, Stochastic blockmodels: First steps. Social Networks 5, 109–137 (1983).
  • (57) B. Karrer and M. E. J. Newman, Stochastic blockmodels and community structure in networks. Phys. Rev. E 83, 016107 (2011).
  • (58) E. Schneidman, M. J. Berry II, R. Segev, and W. Bialek, Weak pairwise correlations imply strongly correlated network states in a neural population. Nature 440, 1007–1012 (2006).
  • (59) F. Morcos, A. Pagnani, B. Lunt, A. Bertolino, D. S. Marks, C. Sander, R. Zecchina, J. N. Onuchic, T. Hwa, and M. Weigt, Direct-coupling analysis of residue coevolution captures native contacts across many protein families. Proc. Natl. Acad. Sci. USA 108, 1293–1301 (2011).
  • (60) T. Bury, Market structure explained by pairwise interactions. Physica A 392, 1375–1385 (2013).
  • (61) H. C. Nguyen, R. Zecchina, and J. Berg, Inverse statistical problems: From the inverse Ising problem to data science. Advances in Physics 66, 197–261 (2017).

Appendix A Supplementary information

A.1 Calculation of the heat capacity using message passing

The heat capacity, which is given by

C = \frac{dU}{dT} = -\beta^2 \frac{dU}{d\beta} \qquad (32)

can be calculated from the expression for the internal energy

(33)

where instead of incorporating the β dependence into the Hamiltonian as in the main paper, we now display it explicitly. In this expression, N_i denotes the neighborhood of node i as in the main text, D_i denotes the node i and its immediately adjacent edges and nodes, and H_{N_i} and H_{D_i} represent the terms in the Hamiltonian for these two subgraphs:

(34)

and

(35)

with the β dependence omitted from the definitions of the functions. With the β dependence written in this way the message passing equations take the form

(36)

and

(37)

with

(38)
(39)

Differentiating the message passing equation (37) with respect to β and defining the quantity

\eta_j^{(i)}(s_j) = \frac{d q_j^{(i)}(s_j)}{d\beta} \qquad (40)

we get

(41)

which can be regarded as a new message passing equation for the derivative messages η_j^{(i)}. To apply it, we first solve for the messages q_j^{(i)} in the usual fashion, then fix their values and iterate Eq. (41) from a suitable initial condition until convergence.

For large neighborhoods, where the sums over spin states cannot be performed exhaustively, the local Monte Carlo procedure described in the main text carries over naturally. We define

(42)

and then rewrite Eq. (41) as an average

(43)

which can be evaluated using Monte Carlo sampling as previously.

We can also differentiate the normalization constant of Eq. (38), which yields

(44)

which can again be written as an average

(45)

where we have used a shorthand analogous to that of Eq. (42):

(46)

Differentiating Eq. (33) and substituting from Eqs. (43) and (45), we now find, after some manipulation, that

(47)

which can be substituted into Eq. (32) to calculate C.

A.2 Local Monte Carlo simulation for the Ising model

As discussed in the main text, when neighborhoods are too large to allow us to sum exhaustively over their states we can approximate the message passing equations by Monte Carlo sampling. Taking again the example of the Ising model, the message passing equations are

(48)

where the messages in this case represent the probability of the corresponding spin taking the value +1. If we divide the numerator and denominator of each of these equations by a suitable common factor, we get

(49)