The Ising model is a fundamental probability distribution defined in terms of a graph $G = (V, E)$ whose nodes and edges are associated with scalar parameters $(\theta_v)_{v \in V}$ and $(\theta_{u,v})_{\{u,v\} \in E}$ respectively. The distribution samples a vector $x \in \{\pm 1\}^V$ with probability:
$$p(x) = \exp\left(\sum_{v \in V} \theta_v x_v + \sum_{(u,v) \in E} \theta_{u,v} x_u x_v - \Phi(\vec{\theta})\right),$$
where $\Phi(\vec{\theta})$ serves to provide normalization. Roughly speaking, there is a random variable $X_v$ at every node of $G$, and this variable may be in one of two states, or spins: up ($+1$) or down ($-1$). The scalar parameter $\theta_v$ models a local field at node $v$. The sign of $\theta_v$ represents whether this local field favors $X_v$ taking the value $+1$, i.e. the up spin, when $\theta_v > 0$, or the value $-1$, i.e. the down spin, when $\theta_v < 0$, and its magnitude represents the strength of the local field. Similarly, $\theta_{u,v}$ represents the direct interaction between nodes $u$ and $v$. Its sign represents whether it favors equal spins, when $\theta_{u,v} > 0$, or opposite spins, when $\theta_{u,v} < 0$, and its magnitude corresponds to the strength of the direct interaction. Of course, depending on the structure of $G$ and the node and edge parameters, there may be indirect interactions between nodes, which may overwhelm local fields or direct interactions.
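For very small graphs, the distribution above can be computed exactly by enumerating all $2^n$ configurations. The sketch below is illustrative only: it assumes nodes are labeled $0, \ldots, n-1$ and that edge parameters are passed as a dictionary keyed by pairs `(u, v)` with `u < v`; the function name `ising_distribution` is hypothetical.

```python
import itertools
import math


def ising_distribution(n, theta_node, theta_edge):
    """Exact Ising distribution on {-1, +1}^n by brute-force enumeration.

    theta_node : list of n local fields theta_v
    theta_edge : dict mapping an edge (u, v), u < v, to theta_{u,v}
    Returns a dict mapping each configuration (tuple of +/-1) to p(x).
    Only feasible for small n, since there are 2^n configurations.
    """
    weights = {}
    for x in itertools.product((-1, 1), repeat=n):
        # exponent: sum_v theta_v x_v + sum_{(u,v) in E} theta_{u,v} x_u x_v
        energy = sum(theta_node[v] * x[v] for v in range(n))
        energy += sum(t * x[u] * x[v] for (u, v), t in theta_edge.items())
        weights[x] = math.exp(energy)
    z = sum(weights.values())  # z = exp(Phi(theta)), the normalizing constant
    return {x: w / z for x, w in weights.items()}


# Ferromagnetic triangle with no external field.
p = ising_distribution(3, [0.0, 0.0, 0.0], {(0, 1): 0.4, (0, 2): 0.4, (1, 2): 0.4})
```

With positive edge parameters, aligned configurations such as `(1, 1, 1)` receive more mass than mixed ones, and without external fields the distribution is symmetric under flipping every spin.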
Many popular models, for example, the usual ferromagnetic Ising model [Ising25, Onsager44], the Sherrington-Kirkpatrick mean field model [SherringtonK75] of spin glasses, the Hopfield model [Hopfield82] of neural networks, and the Curie-Weiss model [DeserCG68], all belong to the above family of distributions, with various special structures on $G$, the $\theta_{u,v}$’s and the $\theta_v$’s. Since its introduction in Statistical Physics, the Ising model has found a myriad of applications in diverse research disciplines, including probability theory, Markov chain Monte Carlo, computer vision, theoretical computer science, social network analysis, game theory, computational biology, and neuroscience; see e.g. [LevinPW09, Chatterjee05, Felsenstein04, DaskalakisMR11, GemanG86, Ellison93, MontanariS10]
and their references. The ubiquity of these applications motivates the problem of inferring Ising models from samples, or inferring statistical properties of Ising models from samples. This type of problem has enjoyed much study in statistics, machine learning, and information theory; see, e.g., [ChowL68, AbbeelKN06, CsiszarT06, Chatterjee07, RavikumarWL10, JalaliJR11, SanthanamW12, BreslerGS14, Bresler15, VuffrayMLC16, BreslerK16, Bhattacharya16, BhattacharyaM16, MartindelCampoCU16, KlivansM17, HamiltonKM17, DaskalakisDK18].
Despite the wealth of theoretical study and practical applications of this model, outlined above, there are still aspects of it that are poorly understood. In this work, we focus on the important topic of concentration of measure. We are interested in studying the concentration properties of polynomial functions of the Ising model. That is, for a random vector $X$ sampled from $p$ as above and a polynomial $f$, we are interested in the concentration of $f(X)$ around its expectation $\mathbb{E}[f(X)]$. Since the coordinates of $X$ take values in $\{\pm 1\}$, we can without loss of generality focus our attention on multilinear functions (Definition 1).
While the theory of concentration inequalities for functions of independent random variables has reached a high level of sophistication, proving concentration of measure for functions of dependent random variables is significantly harder, the main tools being martingale methods, logarithmic Sobolev inequalities and transportation cost inequalities. One shortcoming of the latter methods is that explicit constants are very hard, if not impossible, to obtain. For the Ising model, in particular, the log-Sobolev inequalities of Stroock and Zegarlinski [StroockZ92], known under high temperature,¹ do not give explicit constants, and it is also not clear whether they extend to systems beyond the lattice.
¹ High temperature is a widely studied regime of the Ising model where it enjoys a number of useful properties such as decay of correlations and fast mixing of the Glauber dynamics. Throughout this paper we will take “high temperature” to mean that Dobrushin’s conditions of weak dependence are satisfied. See Definition 2.
An alternative approach, proposed recently by Chatterjee [Chatterjee05], is an adaptation to the Ising model of Stein’s method of exchangeable pairs. This powerful method is well known in probability theory, and has been used to derive concentration inequalities with explicit constants for functions of dependent random variables (see [MackeyJCFT14, DaskalakisDK18] for some recent works). Chatterjee uses this technique to establish concentration inequalities for Lipschitz functions of the Ising model under high temperature. While these inequalities are tight (and provide Gaussian tails) for linear functions of the Ising model, they are unfortunately not tight for higher-degree polynomials, in that the concentration radius is off by factors that depend on the dimension $n = |V|$. For example, consider the function $f_a(X) = \sum_{i \neq j} a_{ij} X_i X_j$ of an Ising model without external fields, where the $a_{ij}$’s are signs. Chatterjee’s results imply that this function concentrates at radius $\pm O(n^{1.5})$, but as we show this is suboptimal by a factor of $\tilde{\Omega}(\sqrt{n})$.
Our main technical contribution is to obtain near-tight concentration inequalities for polynomial functions of the Ising model, whose concentration radii are tight up to logarithmic factors. A corollary of our main result (Theorem 5) is as follows:
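Consider any degree-$d$ multilinear function $f$ with coefficients in $[-1, 1]$, defined on an Ising model $p$ without external field in the high-temperature regime. Then there exists a constant $C = C(d) > 0$ (depending only on $d$) such that for any $r = \tilde{\Omega}_d(n^{d/2})$, we have
$$\Pr_{X \sim p}\left[\,|f(X) - \mathbb{E}[f(X)]| > r\,\right] \le 2\exp\left(-C \cdot \frac{r^{2/d}}{n \log n}\right).$$
The concentration radius is tight up to logarithmic factors, and the tail bound is tight up to a $O_d(1/\log n)$ factor in the exponent of the tail bound.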
In the presence of external fields, it is easy to see that the above concentration does not hold, even for bilinear functions, as observed in Section 3.7. Motivated by our applications in Section 5, we extend the above concentration of measure result to centered bilinear functions (where each variable $X_i$ appears as $X_i - \mathbb{E}[X_i]$ in the function), a result that also holds under arbitrary external fields; see Theorem 3. We leave extensions of this result to higher-degree multilinear functions to the next version of this paper.
Lastly, like Chatterjee and Stroock and Zegarlinski, we prove our results under high temperature. On the other hand, it is easy to construct low temperature Ising models where no non-trivial concentration holds.²
² Consider an Ising model with no external fields, comprising two disjoint cliques of half the vertices with infinitely strong bonds; i.e. $\theta_v = 0$ for all $v$, and $\theta_{u,v} = \infty$ if $u$ and $v$ belong to the same clique. Now consider the multilinear function $f(X) = \sum_{u \not\sim v} X_u X_v$, where $u \not\sim v$ denotes that $u$ and $v$ are not neighbors (i.e. belong to different cliques). It is easy to see that the maximum absolute value of $f(X)$ is $\Omega(n^2)$ and that there is no concentration at radius better than some $\Omega(n^2)$.
With our theoretical understanding in hand, we proceed with an experimental evaluation of the efficacy of multilinear functions applied to hypothesis testing. Specifically, given a binary vector, we attempt to determine whether or not it was generated by an Ising model. Our focus is on testing whether choices in social networks can be approximated as an Ising model, a common and classical assumption in the social sciences [Ellison93, MontanariS10]. We apply our method to both synthetic and real-world data. On synthetic data, we investigate when our statistics are successful in detecting departures from the Ising model. For our real-world data study, we analyze the Last.fm dataset from HetRec’11 [CantadorBK11]. Interestingly, when considering musical preferences on a social network, we find that the Ising model may be more or less appropriate depending on the genre of music.
1.1 Related Work
As mentioned before, Chatterjee previously used the method of exchangeable pairs to prove variance and concentration bounds for linear statistics of the Ising model [Chatterjee05]. In [DaskalakisDK18], the authors apply and extend this method to prove variance bounds for bilinear statistics. The present work improves upon this by proving concentration rather than bounding the variance, as well as by considering general degrees $d$ rather than just $d = 2$. In simultaneous work,³ Gheissari, Lubetzky, and Peres proved concentration bounds which are qualitatively similar to ours, though the techniques are somewhat different [GheissariLP17].
³ The present work was under submission to NIPS 2017 from May 19 to September 4, 2017, and thus not made public until after being accepted.
In Section 2, we define the notation we use in this paper. We describe and prove our results for concentration of bilinear functions in Section 3. This method serves as a blueprint for our main result, concentration of higher-order multilinear functions, which we show in Section 4. In Section 5, we describe our experimental investigation and results.
We will abuse notation, referring to both the probability distribution $p$ and the random vector $X$ that it samples in $\{\pm 1\}^{|V|}$ as the Ising model. That is, $X \sim p$. We will subscript $X$ as follows. At times, we will consider a sequence of $X$’s at various “time steps”: we will use $X_t$ or $X_s$ to denote random vectors in this sequence. Other times, we will need to consider the value of the vector at a particular node: we will use $X_u$ or $X_v$ to indicate the random variables at those nodes. Whether we index based on time step versus node should be apparent from the choice of subscript variable, and otherwise clear from context. Occasionally, we will use both: $X_{t,u}$ denotes the variable corresponding to node $u$ in the Ising model at time step $t$. Throughout the paper we will write $[n]$ for the set $\{1, \ldots, n\}$.
Definition 1.
A degree-$d$ multilinear function defined on $n$ variables $x_1, \ldots, x_n$ is a polynomial $f_a : \{\pm 1\}^n \to \mathbb{R}$ such that
$$f_a(x) = \sum_{S \subseteq [n] \,:\, |S| \le d} a_S \prod_{i \in S} x_i,$$
where $a = (a_S)_{S \subseteq [n],\, |S| \le d}$ is a coefficient vector.
When the degree $d = 1$, we will refer to the function as a linear function, and when the degree $d = 2$ we will call it a bilinear function. Note that since $x_i^2 = 1$ for $x_i \in \{\pm 1\}$, any polynomial function of an Ising model is a multilinear function. We will use $a$ to denote the coefficient vector of such a multilinear function. Note that we will use permutations of the subscripts to refer to the same coefficient, i.e., $a_{ijk}$ is the same as $a_{jik}$. Also we will use the term $d$-linear function to refer to a multilinear function of degree $d$.
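The reduction from arbitrary polynomials to multilinear ones uses only $x_i^2 = 1$: each exponent can be replaced by its parity. A minimal sketch, assuming a polynomial is given as a dictionary from exponent tuples to coefficients (a representation chosen here purely for illustration; the function names are hypothetical):

```python
from collections import defaultdict


def to_multilinear(poly):
    """Reduce a polynomial in +/-1 variables to its multilinear form.

    poly : dict mapping a tuple of exponents (e_1, ..., e_n) to a coefficient.
    Since x_i^2 = 1, x_i^e equals x_i when e is odd and 1 when e is even,
    so each monomial collapses to prod_{i in S} x_i with S = {i : e_i odd}.
    Returns a dict mapping frozenset S to the coefficient a_S.
    """
    coeffs = defaultdict(float)
    for exps, c in poly.items():
        s = frozenset(i for i, e in enumerate(exps) if e % 2 == 1)
        coeffs[s] += c
    return {s: c for s, c in coeffs.items() if c != 0}  # drop cancelled terms


def eval_multilinear(coeffs, x):
    """Evaluate f_a(x) = sum_S a_S prod_{i in S} x_i."""
    total = 0.0
    for s, c in coeffs.items():
        term = c
        for i in s:
            term *= x[i]
        total += term
    return total


# x_1^3 x_2^2 + 2 x_1 x_2 reduces to x_1 + 2 x_1 x_2 on {-1, +1}^2.
a = to_multilinear({(3, 2): 1.0, (1, 1): 2.0})
```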
We say an Ising model has no external field if $\theta_v = 0$ for all $v \in V$. An Ising model is ferromagnetic if $\theta_{u,v} \ge 0$ for all $(u, v) \in E$.
We now give a formal definition of the high-temperature regime, also known as Dobrushin’s uniqueness condition – in this paper, we will use the terms interchangeably.
Definition 2 (Dobrushin’s Uniqueness Condition).
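Consider an Ising model $p$ defined on a graph $G = (V, E)$ with $|V| = n$ and parameter vector $\vec{\theta}$. Suppose $\max_{v \in V} \sum_{u \neq v} \tanh(|\theta_{u,v}|) \le 1 - \eta$ for some $\eta > 0$. Then $p$ is said to satisfy Dobrushin’s uniqueness condition, or be in the high temperature regime. In this paper, we use the notation that an Ising model is $\eta$-high temperature to parameterize the extent to which it is inside the high temperature regime. Note that since $\tanh(|x|) \le |x|$ for all $x$, the above condition follows from simpler conditions which avoid having to deal with hyperbolic functions. For instance, either of the following two conditions:
$$\max_{v \in V} \sum_{u \neq v} |\theta_{u,v}| \le 1 - \eta \qquad \text{or} \qquad \beta\, d_{\max} \le 1 - \eta$$
is sufficient to imply Dobrushin’s condition (where $\beta = \max_{u,v} |\theta_{u,v}|$ and $d_{\max}$ is the maximum degree of $G$).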
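The exact Dobrushin quantity and its simplified upper bound are easy to compare numerically. In this sketch the helper names are hypothetical, and each edge is assumed to be listed once as a `(u, v)` key.

```python
import math


def dobrushin_eta(theta_edge, n):
    """eta = 1 - max_v sum_{u != v} tanh(|theta_{u,v}|).

    The model is eta-high temperature (Dobrushin's condition) when this is > 0.
    """
    row_sums = [0.0] * n
    for (u, v), t in theta_edge.items():
        row_sums[u] += math.tanh(abs(t))
        row_sums[v] += math.tanh(abs(t))
    return 1.0 - max(row_sums)


def simple_eta(theta_edge, n):
    """Same quantity with tanh(|x|) replaced by its upper bound |x|."""
    row_sums = [0.0] * n
    for (u, v), t in theta_edge.items():
        row_sums[u] += abs(t)
        row_sums[v] += abs(t)
    return 1.0 - max(row_sums)


# 4-cycle with theta = 0.3 on every edge: beta * d_max = 0.6, so eta >= 0.4.
cycle = {(0, 1): 0.3, (1, 2): 0.3, (2, 3): 0.3, (3, 0): 0.3}
```

Because $\tanh(|x|) \le |x|$, `dobrushin_eta` is never smaller than `simple_eta`. This is why the simplified conditions are sufficient but can be slightly conservative.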
In some situations, we may use the parameter $\eta$ implicitly and simply say the Ising model is in the high temperature regime. In general, when one refers to the temperature of an Ising model, a high temperature corresponds to small $|\theta_{u,v}|$ values, and a low temperature corresponds to large $|\theta_{u,v}|$ values.
We will use the following lemma which shows concentration of measure for Lipschitz functions on the Ising model in high temperature. It is a well-known result and can be found for instance as Theorem 4.3 of [Chatterjee05].
Lemma 1 (Lipschitz Concentration Lemma).
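Suppose that $f(X)$ is a function of an Ising model in the $\eta$-high-temperature regime. Suppose the Lipschitz constants of $f$ are $l_1, l_2, \ldots, l_n$ respectively. That is,
$$|f(X_1, \ldots, X_i, \ldots, X_n) - f(X_1, \ldots, X_i', \ldots, X_n)| \le l_i$$
for all values of $X_1, \ldots, X_{i-1}, X_{i+1}, \ldots, X_n$ and for any $X_i$ and $X_i'$. Then,
$$\Pr\left[|f(X) - \mathbb{E}[f(X)]| > r\right] \le 2\exp\left(-\frac{\eta r^2}{4\|l\|_2^2}\right).$$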
Note that this immediately implies sharp concentration bounds for linear functions on the Ising model.
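The Lipschitz constants in Lemma 1 can be computed by brute force on small examples. For a linear function $\sum_i a_i x_i$, flipping $x_i$ changes the value by exactly $2|a_i|$. Hence $\|l\|_2 = 2\|a\|_2 = O(\sqrt{n})$ for bounded coefficients, which is what makes Lemma 1 sharp for linear functions. For contrast, the sketch below (function name illustrative) also checks the bilinear function $\sum_{i<j} x_i x_j$, whose Lipschitz constants are $2(n-1)$.

```python
import itertools


def lipschitz_constants(f, n):
    """Brute-force l_i = max over x of |f(x) - f(x with x_i flipped)|.

    Enumerates all of {-1, +1}^n, so only suitable for small n.
    """
    l = [0.0] * n
    for x in itertools.product((-1, 1), repeat=n):
        fx = f(x)
        for i in range(n):
            y = list(x)
            y[i] = -y[i]  # flip the i-th spin
            l[i] = max(l[i], abs(fx - f(y)))
    return l


a = [1.0, -1.0, 0.5]
lin = lambda x: sum(ai * xi for ai, xi in zip(a, x))
print(lipschitz_constants(lin, 3))  # -> [2.0, 2.0, 1.0]
```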
We will refer to elements in $\{\pm 1\}^n$ as both states and configurations of the Ising model. The name states will be more natural when considering Markov chains such as the Glauber dynamics. Glauber dynamics is the canonical Markov chain for sampling from an Ising model. Glauber dynamics define a reversible, ergodic Markov chain whose stationary distribution is identical to the corresponding Ising model. In many relevant settings, including the high-temperature regime, the dynamics are rapidly mixing (i.e., in $O(n \log n)$ steps) and hence offer an efficient way to sample from Ising models. We consider the basic variant known as single-site Glauber dynamics. The dynamics are a Markov chain defined on the set $\{\pm 1\}^n$. They proceed as follows:
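Let $X_t$ denote the state of the dynamics at time $t$. Start at any state $X_0 \in \{\pm 1\}^n$.
Let $N(u)$ be the set of neighbors of node $u$. Pick a node $u$ uniformly at random and update $X$ as follows:
$$X_{t+1,u} = \begin{cases} +1 & \text{with probability } \dfrac{\exp\left(\theta_u + \sum_{v \in N(u)} \theta_{u,v} X_{t,v}\right)}{\exp\left(\theta_u + \sum_{v \in N(u)} \theta_{u,v} X_{t,v}\right) + \exp\left(-\theta_u - \sum_{v \in N(u)} \theta_{u,v} X_{t,v}\right)}, \\[6pt] -1 & \text{otherwise,} \end{cases}$$
and $X_{t+1,v} = X_{t,v}$ for all $v \neq u$.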
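Here is a minimal sketch of single-site Glauber dynamics following the update rule above. It uses the identity $e^{h}/(e^{h} + e^{-h}) = 1/(1 + e^{-2h})$ for the local field $h = \theta_u + \sum_{v \in N(u)} \theta_{u,v} X_{t,v}$. The data layout (neighbor lists, plus a dictionary keyed by `frozenset({u, v})`) and the function names are assumptions made for illustration.

```python
import math
import random


def glauber_step(x, theta_node, theta_edge, neighbors, rng):
    """Apply one single-site Glauber update to x in place; return the chosen node.

    x          : list of spins in {-1, +1}
    theta_node : list of local fields theta_v
    theta_edge : dict mapping frozenset({u, v}) to theta_{u,v}
    neighbors  : neighbors[u] is the list N(u)
    rng        : a random.Random instance
    """
    u = rng.randrange(len(x))  # node chosen uniformly at random
    h = theta_node[u] + sum(theta_edge[frozenset((u, v))] * x[v] for v in neighbors[u])
    p_plus = 1.0 / (1.0 + math.exp(-2.0 * h))  # Pr[X_u = +1 | neighbors]
    x[u] = 1 if rng.random() < p_plus else -1
    return u


def run_glauber(x0, theta_node, theta_edge, neighbors, steps, seed=0):
    """Run `steps` Glauber updates starting from x0 and return the final state."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(steps):
        glauber_step(x, theta_node, theta_edge, neighbors, rng)
    return x
```

Take two nodes joined by an edge with $\theta_{u,v} = 0.5$ and no external field. The chain should spend roughly a $1/(1 + e^{-1}) \approx 0.731$ fraction of its steps with aligned spins, which matches the stationary probability of $\{X_1 = X_2\}$.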
Glauber dynamics for an Ising model in the high temperature regime are fast mixing. In particular, they mix in $O(n \log n)$ steps. To be more concrete, for an Ising model in $\eta$-high temperature, we define
$$t_{mix} = \frac{n \log n}{\eta}.$$
The dynamics for an Ising model in high temperature also display the cutoff phenomenon. Due to this, we have Lemma 2.
Let $X_0$ be any starting state for the Glauber dynamics and let $t = c \cdot t_{mix}$ for some $c \ge 1$. If $X_t$ is the state reached after $t$ steps of the dynamics, then
$$d_{TV}(X_t, p) \le \frac{1}{n^{c-1}}$$
for all starting states $X_0$, where $d_{TV}(X_t, p)$ denotes the total variation distance between the distribution of $X_t$ and $p$.
This follows in a straightforward manner from the cutoff phenomenon observed with respect to the mixing of the Glauber dynamics in this setting. The bound on the mixing time of Glauber dynamics for high temperature Ising models (Theorem 15.1 of [LevinPW09])⁴ gives us that to achieve $d_{TV}(X_t, p) \le \epsilon$, it suffices to run the dynamics for $\frac{n(\log n + \log(1/\epsilon))}{\eta}$ steps. This implies that after $t = c \cdot t_{mix} = \frac{c\, n \log n}{\eta}$ steps, the total variation distance achieved is
$$\epsilon = \frac{1}{n^{c-1}}.$$
⁴ Note that Theorem 15.1 of [LevinPW09] uses a definition of high temperature which is less general than the one we present here. But it can also be shown via very similar calculations to hold for our more general version of the high temperature regime.
The Hamming distance between $x, y \in \{\pm 1\}^n$ is defined as $d_H(x, y) = \sum_{i=1}^{n} \mathbb{1}[x_i \neq y_i]$.
Definition 4 (The greedy coupling).
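Consider two instances of Glauber dynamics associated with the same Ising model $p$: $X^{(1)}$ and $X^{(2)}$. The following coupling procedure is known as the greedy coupling. Start chain 1 at $x$ and chain 2 at $y$, and in each time step $t$, choose a node $v \in V$ uniformly at random to update in both runs. Let $p^{(1)}$ denote the probability that the first chain sets $X^{(1)}_{t,v} = 1$ and let $p^{(2)}$ be the probability that the second chain sets $X^{(2)}_{t,v} = 1$. Let $r_1 \le r_2$ be a rearrangement of the values $p^{(1)}, p^{(2)}$ in increasing order, and refer to the chain whose update probability is $r_c$ as chain $c$. Also let $r_0 = 0$ and $r_3 = 1$. Draw a number $U$ uniformly at random from $[0, 1]$ and couple the updates according to the following rule:
If $r_i \le U < r_{i+1}$ for some $0 \le i \le 2$, set $X^{(c)}_{t,v} = -1$ for all $1 \le c \le i$ and $X^{(c)}_{t,v} = +1$ for all $i < c \le 2$.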
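The update rule of the greedy coupling can be written directly, assuming one shared uniform draw $U$ per step. The sketch below (names hypothetical) sorts the two probabilities, locates the interval $[r_i, r_{i+1})$ containing $U$, and assigns $-1$ to the chains of rank at most $i$ and $+1$ to the rest.

```python
def greedy_coupled_update(p1, p2, u):
    """One greedily coupled node update for two Glauber chains.

    p1, p2 : probabilities that chain 1 / chain 2 set the chosen node to +1
    u      : one Uniform[0, 1) draw shared by both chains
    Returns the new spins (s1, s2).
    """
    probs = (p1, p2)
    order = sorted(range(2), key=lambda c: probs[c])  # chains by increasing probability
    r = [0.0] + [probs[c] for c in order] + [1.0]      # r_0 <= r_1 <= r_2 <= r_3
    i = max(k for k in range(3) if r[k] <= u)           # r_i <= u < r_{i+1}
    spins = [0, 0]
    for rank, c in enumerate(order, start=1):
        spins[c] = -1 if rank <= i else 1               # -1 for ranks <= i, +1 otherwise
    return tuple(spins)
```

Each chain ends up setting $+1$ exactly when $U < p^{(c)}$, so its marginal update is the correct Glauber update. The two chains disagree only when $U$ falls between the two probabilities, which happens with probability $|p^{(1)} - p^{(2)}|$, the smallest possible for any coupling of two $\pm 1$ updates.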
We summarize some properties of this coupling in the following lemma, which appear in Chapter 15 of [LevinPW09].
The greedy coupling (Definition 4) satisfies the following properties.
It is a valid coupling.
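If $p$ is an Ising model in $\eta$-high temperature, then
$$\mathbb{E}\left[d_H\big(X^{(1)}_t, X^{(2)}_t\big) \,\middle|\, X^{(1)}_{t-1}, X^{(2)}_{t-1}\right] \le \left(1 - \frac{\eta}{n}\right) d_H\big(X^{(1)}_{t-1}, X^{(2)}_{t-1}\big).$$
The distribution of $X^{(1)}_t$, for any $t$, conditioned on $X^{(1)}_0$ is independent of $X^{(2)}_0$.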
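Iterating the contraction property is what turns the greedy coupling into a mixing-time bound. As a sketch, assume the one-step inequality $\mathbb{E}[d_H(X^{(1)}_t, X^{(2)}_t) \mid X^{(1)}_{t-1}, X^{(2)}_{t-1}] \le (1 - \eta/n)\, d_H(X^{(1)}_{t-1}, X^{(2)}_{t-1})$ holds for an $\eta$-high temperature model, where $d_H$ is the Hamming distance. The tower property, $d_H(x, y) \le n$, and Markov’s inequality then give:

```latex
\begin{aligned}
\mathbb{E}\big[d_H(X^{(1)}_t, X^{(2)}_t)\big]
  &= \mathbb{E}\Big[\mathbb{E}\big[d_H(X^{(1)}_t, X^{(2)}_t) \,\big|\, X^{(1)}_{t-1}, X^{(2)}_{t-1}\big]\Big]
   \le \Big(1 - \frac{\eta}{n}\Big)\, \mathbb{E}\big[d_H(X^{(1)}_{t-1}, X^{(2)}_{t-1})\big] \\
  &\le \Big(1 - \frac{\eta}{n}\Big)^{t} d_H(x, y) \le n\, e^{-\eta t / n}, \\
\Pr\big[X^{(1)}_t \neq X^{(2)}_t\big]
  &\le \mathbb{E}\big[d_H(X^{(1)}_t, X^{(2)}_t)\big]
   \le n\, e^{-\eta t/n} = n^{1-c} \quad \text{at } t = \frac{c\, n \log n}{\eta}.
\end{aligned}
```

By the standard coupling inequality, the total variation distance between the chains started at $x$ and at $y$ is at most $\Pr[X^{(1)}_t \neq X^{(2)}_t]$. This is the usual route to mixing-time bounds of the kind in Chapter 15 of [LevinPW09].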
We briefly review some definitions from the theory of martingales in this section.
A probability space is defined by a triple $(\Omega, \mathcal{F}, \Pr)$, where $\Omega$ is the set of possible outcomes of the probability space, $\mathcal{F}$ is a $\sigma$-field, i.e. the set of all measurable events of the space, and $\Pr$ is a function which maps events in $\mathcal{F}$ to probability values.
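A sequence of random variables $B_0, B_1, \ldots, B_n$ on the probability space $(\Omega, \mathcal{F}, \Pr)$ is a martingale sequence if for all $i \ge 0$, $\mathbb{E}[B_{i+1} \mid B_0, B_1, \ldots, B_i] = B_i$.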
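A stopping time with respect to a martingale sequence $\{B_i\}_{i \ge 0}$ defined on $(\Omega, \mathcal{F}, \Pr)$ is a function $\tau : \Omega \to \{0, 1, 2, \ldots\}$ such that, for all $i \ge 0$, whether the event $\{\tau = i\}$ occurs is determined by $B_0, \ldots, B_i$. Also, $\tau = \infty$ is allowed.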
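Let $X_1, \ldots, X_n$ be a set of possibly dependent random variables. Consider any function $f(X_1, \ldots, X_n)$ on them. Then the sequence $B_0, B_1, \ldots, B_n$ where
$$B_0 = \mathbb{E}[f(X_1, \ldots, X_n)], \qquad B_i = \mathbb{E}[f(X_1, \ldots, X_n) \mid X_1, \ldots, X_i] \quad \text{for } 1 \le i \le n,$$
is a martingale sequence and is known as the Doob martingale of the function $f$.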
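On a tiny Ising model, the Doob martingale can be computed exactly, which makes the martingale property easy to check even though the spins are dependent. This sketch assumes no external field and edges given as `(u, v)` keys; the helper names are illustrative. `doob_term` returns $B_i$ for a given prefix $(X_1, \ldots, X_i)$ (indexed from 0 in the code). `martingale_gap` measures the largest violation of $\mathbb{E}[B_{i+1} \mid X_1, \ldots, X_i] = B_i$, which should be zero up to rounding.

```python
import itertools
import math


def ising_probs(n, theta_edge):
    """Exact Ising distribution with no external field (small n only)."""
    w = {}
    for x in itertools.product((-1, 1), repeat=n):
        w[x] = math.exp(sum(t * x[u] * x[v] for (u, v), t in theta_edge.items()))
    z = sum(w.values())
    return {x: wx / z for x, wx in w.items()}


def prefix_mass(p, prefix):
    """Pr[(X_1, ..., X_i) = prefix] under p."""
    i = len(prefix)
    return sum(px for x, px in p.items() if x[:i] == prefix)


def doob_term(p, f, prefix):
    """B_i = E[f(X) | (X_1, ..., X_i) = prefix], computed exactly."""
    i = len(prefix)
    num = sum(px * f(x) for x, px in p.items() if x[:i] == prefix)
    return num / prefix_mass(p, prefix)


def martingale_gap(p, f, n):
    """Largest |E[B_{i+1} | X_1..X_i] - B_i| over all prefixes; 0 for a martingale."""
    gap = 0.0
    for i in range(n):
        for prefix in itertools.product((-1, 1), repeat=i):
            nxt = sum(prefix_mass(p, prefix + (s,)) * doob_term(p, f, prefix + (s,))
                      for s in (-1, 1)) / prefix_mass(p, prefix)
            gap = max(gap, abs(nxt - doob_term(p, f, prefix)))
    return gap
```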
Many tools for proving concentration results, such as McDiarmid’s inequality, come from the theory of martingales. In our proof, the following two martingale inequalities will be useful. The first is the well-known Azuma’s inequality.
Lemma 4 (Azuma’s Inequality).
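Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space. Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n$ be an increasing sequence of sub-$\sigma$-fields of $\mathcal{F}$. Let $Y_1, \ldots, Y_n$ be random variables on $\Omega$ such that $Y_i$ is $\mathcal{F}_i$-measurable. Suppose they represent a sequence of martingale increments. That is, $\mathbb{E}[Y_i \mid \mathcal{F}_{i-1}] = 0$, or $\left\{\sum_{j=1}^{i} Y_j\right\}_{i \ge 1}$ forms a martingale sequence defined on the space $(\Omega, \mathcal{F}, \Pr)$. Let $c_1, \ldots, c_n$ be such that $|Y_i| \le c_i$ for all $i$. Then for all $r > 0$,
$$\Pr\left[\left|\sum_{i=1}^{n} Y_i\right| \ge r\right] \le 2\exp\left(-\frac{r^2}{2\sum_{i=1}^{n} c_i^2}\right).$$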
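Azuma’s bound is simple to evaluate, and inverting it shows the radius it certifies: with failure probability $\delta$, the sum is within $\sqrt{2\sum_i c_i^2 \log(2/\delta)}$ of zero. Below is a minimal sketch (function names hypothetical). It includes a seeded simulation of a $\pm 1$ random walk, whose increments satisfy $|Y_i| \le 1$.

```python
import math
import random


def azuma_tail(r, c):
    """Azuma's bound 2 exp(-r^2 / (2 sum_i c_i^2)) on Pr[|sum_i Y_i| >= r]."""
    return 2.0 * math.exp(-r * r / (2.0 * sum(ci * ci for ci in c)))


def azuma_radius(c, delta):
    """Radius r at which Azuma's bound equals delta."""
    return math.sqrt(2.0 * sum(ci * ci for ci in c) * math.log(2.0 / delta))


def empirical_tail(n_steps, r, trials=5000, seed=0):
    """Monte Carlo estimate of Pr[|S_n| >= r] for a simple +/-1 random walk."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        s = sum(rng.choice((-1, 1)) for _ in range(n_steps))
        hits += abs(s) >= r
    return hits / trials


c = [1.0] * 100
print(azuma_tail(20, c), empirical_tail(100, 20))  # bound vs. simulated frequency
```

For 100 unit increments, the bound at $r = 20$ is $2e^{-2}$. The simulated frequency comes out well below it, since Azuma’s inequality uses only the worst-case increment sizes.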
The second inequality, due to Freedman, is a generalization of Azuma’s inequality. It applies when a bound on the martingale increments only holds until some stopping time, unlike Azuma’s inequality, which requires a bound on the martingale increments at all times.
Lemma 5 (Freedman’s Inequality (Proposition 2.1 in [Freedman75])).
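Let $(\Omega, \mathcal{F}, \Pr)$ be a probability space. Let $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_n$ be an increasing sequence of sub-$\sigma$-fields of $\mathcal{F}$. Let $Y_1, \ldots, Y_n$ be random variables on $\Omega$ such that $Y_i$ is $\mathcal{F}_i$-measurable. Suppose they represent a sequence of martingale increments. That is, $\left\{\sum_{j=1}^{i} Y_j\right\}_{i \ge 1}$ forms a martingale sequence defined on the space $(\Omega, \mathcal{F}, \Pr)$. Let $\tau$ be a stopping time defined on $(\Omega, \mathcal{F}, \Pr)$ and $b \ge 0$ be such that $|Y_i| \le b$ for $i \le \tau$. Let $S_i = \sum_{j=1}^{i} Y_j$ and $V_i = \sum_{j=1}^{i} \mathrm{Var}[Y_j \mid \mathcal{F}_{j-1}]$. Then, for all $r, \sigma^2 > 0$,
$$\Pr\left[S_i \ge r \text{ and } V_i \le \sigma^2 \text{ for some } i \le \tau\right] \le \exp\left(-\frac{r^2}{2(\sigma^2 + br)}\right).$$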
3 Concentration of Measure for Bilinear Functions
In this section, we prove our main concentration result for bilinear functions of the Ising model. This is not as technically involved as the result for general-degree multilinear functions, but exposes many of the main conceptual ideas. The theorem statement is as follows:
Consider any bilinear function on an Ising model (defined on a graph such that ) in -high-temperature regime with no external field. Let . If , then for any , we have
Note that, for the sake of convenience in our proof, this theorem is stated for bilinear functions where all terms are of degree exactly $2$. One can immediately obtain concentration for all bilinear functions by combining this result with concentration bounds for linear functions (see, e.g., Lemma 1). Since linear functions concentrate in a much tighter radius ($O(\sqrt{n})$, rather than $\tilde{O}(n)$), this comes at a minimal additional cost.
3.1 Overview of the Technique
A well-known approach to proving concentration inequalities for functions of dependent random variables is via martingale tail bounds. For instance, Azuma’s inequality yields such bounds without requiring any form of independence among the random variables it considers. It gives useful tail bounds whenever one can bound the martingale increments (i.e., the differences between consecutive terms of the martingale sequence) of the underlying martingale in absolute value. Such an approach is fruitful in showing concentration of linear functions on the Ising model in high temperature. The Glauber dynamics associated with Ising models in high temperature are fast mixing and offer a natural way to define a martingale sequence. In particular, consider the Doob martingale corresponding to any linear function $f$ for which we wish to show concentration, defined on the state of the dynamics at some time step $t^*$, i.e. $f(X_{t^*})$. If we choose $t^*$ larger than the mixing time $t_{mix}$, then $X_{t^*}$ would be very close to a sample from $p$ irrespective of the starting state. We set the first term of the martingale sequence as $\mathbb{E}[f(X_{t^*}) \mid X_0]$, and the last term is simply $f(X_{t^*})$. By bounding the martingale increments we can show that $|f(X_{t^*}) - \mathbb{E}[f(X_{t^*}) \mid X_0]|$ concentrates at the right radius with high probability. By making $t^*$ large enough we can argue that $\mathbb{E}[f(X_{t^*}) \mid X_0] \approx \mathbb{E}[f(X)]$. Also, crucially, $t^*$ need not be too large since the dynamics are fast mixing. Hence we don’t incur too big a hit when applying Azuma’s inequality, and one can argue that linear functions are concentrated with a radius of $\tilde{O}(\sqrt{n})$. Crucial to this argument is the fact that linear functions are $O(1)$-Lipschitz (when the entries of $a$ are constant), bounding the Doob martingale differences to be $O(1)$.
The challenge with bilinear functions is that they are $O(n)$-Lipschitz: a naive application of the same approach gives a radius of concentration of $\tilde{O}(n^{3/2})$, which, albeit better than the trivial radius of $O(n^2)$, is not optimal. To show stronger concentration for bilinear functions, at a high level, the idea is to bootstrap the known fact that linear functions of the Ising model concentrate well at high temperature.
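The radii quoted above come from inverting Azuma’s inequality. As a back-of-the-envelope sketch, assume a horizon $t^* = \Theta(n \log n)$ steps (i.e. constant $\eta$), a fixed failure probability $\delta$, and increments bounded by $c_i$:

```latex
r_{\mathrm{Azuma}} = \sqrt{2 \sum_{i=1}^{t^*} c_i^2 \,\log(2/\delta)}
\quad\Longrightarrow\quad
\begin{cases}
c_i = O(1) \ (\text{linear}): & r = O\big(\sqrt{n \log n}\big) = \tilde{O}(\sqrt{n}),\\[4pt]
c_i = O(n) \ (\text{bilinear}): & r = O\big(\sqrt{n \log n \cdot n^2}\big) = \tilde{O}(n^{3/2}).
\end{cases}
```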
The key insight is that, when we have a $d$-linear function, its Lipschitz constants are bounds on the absolute values of certain $(d-1)$-linear functions. In particular, this implies that the Lipschitz constants of a bilinear function are bounds on the absolute values of certain associated linear functions. And although a worst-case bound on the absolute value of linear functions with bounded coefficients would be $O(n)$, the fact that linear functions are concentrated within a radius of $\tilde{O}(\sqrt{n})$ means that bilinear functions are $\tilde{O}(\sqrt{n})$-Lipschitz in spirit. In order to exploit this intuition, we turn to more sophisticated concentration inequalities, namely Freedman’s inequality (Lemma 5). This is a generalization of Azuma’s inequality, which handles the case when the martingale differences are only bounded until some stopping time (very roughly, the first time we reach a state where the expectation of the linear function after mixing is large). To apply Freedman’s inequality, we need to define a stopping time which has two properties:
The stopping time is larger than $t^*$ with high probability. Hence, with good probability the process doesn’t stop too early. The harm if the process stops too early (at $\tau < t^*$) is that we will not be able to effectively decouple $f(X_{t^*})$ from the choice of $X_0$. $t^*$ is chosen to be larger than the mixing time of the Glauber dynamics precisely because it allows us to argue that $\mathbb{E}[f(X_{t^*}) \mid X_0] \approx \mathbb{E}[f(X)]$.
For all times $i$ less than the stopping time, the martingale increments are bounded, i.e. $|B_i - B_{i-1}| = \tilde{O}(\sqrt{n})$, where $\{B_i\}_{i \ge 0}$ is the martingale sequence.
We observe that the martingale increments corresponding to a martingale defined on a bilinear function have the flavor of the conditional expectations of certain linear functions, which can be shown to concentrate at a radius $\tilde{O}(\sqrt{n})$ when the process starts at its stationary distribution. This provides us with a natural way of defining the stopping time: the first time when one of these conditional expectations deviates by more than $\tilde{O}(\sqrt{n})$ from the origin. The stopping time we use is a bit more involved, but its formulation is completely guided by the criteria listed above. Once we have defined the stopping time, the next step before we can apply Freedman’s inequality is to bound the conditional variance of the martingale increments, which we do again using the property that the martingale increments are bounded up until the stopping time. Finally, we apply Freedman’s inequality to bound the desired quantity.
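A rough calculation shows why increments that stay bounded before the stopping time lead to radius $\tilde{O}(n)$. This is only a heuristic sketch. It ignores logarithmic factors and the event that $\tau$ occurs before $t^*$. It writes $Y_i$ for the increments, $S_i = \sum_{j \le i} Y_j$ and $V_i = \sum_{j \le i} \mathrm{Var}[Y_j \mid \mathcal{F}_{j-1}]$, and it assumes $|Y_i| \le b = \tilde{O}(\sqrt{n})$ for $i \le \tau$ over a horizon $t^* = \tilde{O}(n)$. Freedman’s inequality is used in the form $\exp(-r^2/(2(\sigma^2 + br)))$:

```latex
\begin{aligned}
V_{t^*} = \sum_{i \le t^*} \mathrm{Var}[Y_i \mid \mathcal{F}_{i-1}]
  &\le t^* b^2 = \tilde{O}(n)\cdot\tilde{O}(n) = \tilde{O}(n^2) =: \sigma^2, \\
\Pr\big[S_i \ge r,\ V_i \le \sigma^2 \text{ for some } i \le \tau\big]
  &\le \exp\!\Big(-\frac{r^2}{2(\sigma^2 + b r)}\Big)
   = \exp\!\Big(-\frac{r^2}{\tilde{O}(n^2) + \tilde{O}(\sqrt{n})\, r}\Big).
\end{aligned}
```

For $r$ up to order $n^{3/2}$, the denominator is dominated by the $\tilde{O}(n^2)$ term. The bound therefore becomes small as soon as $r$ exceeds $n$ by a large enough polylogarithmic factor. That gives a radius of $\tilde{O}(n)$, compared with $\tilde{O}(n^{3/2})$ from Azuma’s inequality with worst-case increments of order $n$.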
It is worth noting that the martingale approach described above is closely related to the technique of exchangeable pairs exposited by Chatterjee [Chatterjee05]. When we look at differences for the martingale sequence defined using the Glauber dynamics, we end up analyzing an exchangeable pair of the following form: sample $X$ from the Ising model $p$, then take a step along the Glauber dynamics starting from $X$ to reach $X'$. The pair $(X, X')$ forms an exchangeable pair. This is precisely how Chatterjee’s application of exchangeable pairs is set up. Chatterjee then goes on to study a function of $X$ and $X'$ which serves as a proxy for the variance of $f(X)$, and obtains concentration results by bounding the absolute value of this function. The definition of the function involves considering two greedily coupled runs of the Glauber dynamics, just as we do in our martingale-based approach.
To summarize, our proof of bilinear concentration involves showing various concentration properties for linear functions via Azuma’s inequality (Section 3.3), showing that the martingale has $\tilde{O}(\sqrt{n})$-bounded differences before our stopping time (Section 3.5), proving that the stopping time is larger than the mixing time with high probability (Lemma 8), and combining these ingredients using Freedman’s inequality (Section 3.6).
The organization of this section is as follows. We will first focus on proving concentration for bilinear statistics with no external field. In Section 3.2, we state some additional preliminaries, and describe the martingale sequence and stopping time we will consider. In Section 3.3, we prove certain concentration properties of linear functions of the Ising model – in particular, these will be useful in showing that the stopping time is large. In Section 3.5, we show that our martingale sequence has bounded differences before the stopping time. In Section 3.6, we put the pieces together and prove bilinear concentration. In Section 3.7, we discuss how to prove concentration for bilinear statistics under an external field. Note that under an external field, not all bilinear functions of the Ising model concentrate, and thus our statistics require appropriate recentering. In Section 3.8, we briefly argue that the exponential behavior of the tail is inherent – for example, it could not be improved to a Gaussian tail.
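We will consider functions $f_a$ with $\|a\|_\infty \le 1$; Theorem 2 then follows by a scaling argument. Let $a = (a_{uv})_{u \neq v}$ be such a coefficient vector (with $a_{uv} = a_{vu}$), and define $f_a : \{\pm 1\}^n \to \mathbb{R}$ as follows:
$$f_a(x) = \sum_{\{u, v\} :\, u \neq v} a_{uv} x_u x_v.$$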
The quantity of interest which we would like to bound is $\Pr\left[|f_a(X) - \mathbb{E}[f_a(X)]| > r\right]$, where $X$ is a sample from the Ising model $p$. For the time being, we will focus on the setting with no external field for ease of exposition.⁵
⁵ Concentration under an external field (with appropriate re-centering) is discussed in Section 3.7.
A crucial quantity to the whole discussion will be $f_a(X) - f_a(X')$, where $X'$ is obtained by taking a single step of the Glauber dynamics associated with a high temperature Ising model starting from $X$. Define $f^v_a(x) = \sum_{u \neq v} a_{uv} x_u$. These linear functions will arise as a result of looking at $f_a(X) - f_a(X')$, as shown in the following claim:
If $X'$ is obtained by taking a step of the Glauber dynamics starting from $X$, and $u$ is the node chosen for the update, then
$$|f_a(X) - f_a(X')| \le 2\,|f^u_a(X)|.$$
In each step of the Glauber dynamics, a node $u$ is chosen uniformly at random and updated according to the distribution of $X_u$ conditioned on its neighbors under the Ising model. With some probability, the dynamics leave node $u$ unchanged (i.e. update it to its current value $X_u$). In this scenario, $f_a(X) - f_a(X') = 0$. If, on the other hand, the dynamics flip the sign of node $u$, then $f_a(X) - f_a(X') = 2X_u f^u_a(X)$. Since $|X_u| = 1$, $|f_a(X) - f_a(X')| \le 2|f^u_a(X)|$. ∎
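The identity behind the claim can be checked by brute force. The sketch below assumes the bilinear sum runs over unordered pairs $\{u, v\}$ with a symmetric coefficient matrix ($a_{uv} = a_{vu}$, zero diagonal), and the function names are illustrative. Flipping $x_u$ only affects the terms containing $x_u$, which sum to $x_u f^u_a(x)$, so the change is exactly $2 x_u f^u_a(x)$.

```python
import random


def bilinear(a, x):
    """f_a(x) = sum over unordered pairs u < v of a[u][v] * x_u * x_v."""
    n = len(x)
    return sum(a[u][v] * x[u] * x[v] for u in range(n) for v in range(u + 1, n))


def linear_part(a, x, u):
    """f_a^u(x) = sum_{v != u} a[u][v] * x_v (a is assumed symmetric)."""
    return sum(a[u][v] * x[v] for v in range(len(x)) if v != u)


def flip_difference(a, x, u):
    """f_a(x) - f_a(x') where x' is x with spin u flipped."""
    y = list(x)
    y[u] = -y[u]
    return bilinear(a, x) - bilinear(a, y)


def random_symmetric(n, rng):
    """Random symmetric integer coefficients in {-1, 0, 1} with zero diagonal."""
    a = [[0] * n for _ in range(n)]
    for u in range(n):
        for v in range(u + 1, n):
            a[u][v] = a[v][u] = rng.choice((-1, 0, 1))
    return a
```

Integer coefficients keep the arithmetic exact, so the identity `flip_difference(a, x, u) == 2 * x[u] * linear_part(a, x, u)` can be tested with plain equality.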
Next, we define a martingale sequence associated with any bilinear function of the Ising model. A sufficiently strong tail inequality on the difference between the first and last terms of the martingale will get us very close to the desired concentration result.
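Let $t^* > t_{mix}$ be a fixed time horizon. Let $X_0$ be a sample from the Ising model $p$. Consider a walk of the Glauber dynamics starting at $X_0$ and running for $t^*$ steps: $X_0, X_1, \ldots, X_{t^*}$. Each $X_t$ can be viewed as a function of all the random choices made by the dynamics up to that point. That is, $X_t$ is determined by $(X_0, Z_1, \ldots, Z_t)$, where $Z_i$ is a random variable representing the random choices made by the dynamics in step $i$. More precisely, $Z_i$ represents the realization of the random choice of which node to (attempt to) update and a random variable $U_i$ drawn uniformly from $[0, 1]$ (based upon which we decide whether or not to update the node’s variable). Hence $Z_i = (v_i, U_i)$, where $v_i \in V$ and $U_i \in [0, 1]$. Consider the Doob martingale associated with $f_a(X_{t^*})$ defined on the probability space $(\Omega, \mathcal{F}, \Pr)$, where $\Omega$ is the set of all possible values of the variables $X_0, Z_1, \ldots, Z_{t^*}$ under the above described stochastic process and $\Pr$ is the function which assigns probability to events in $\mathcal{F}$ according to the underlying stochastic process. Also consider the increasing sequence of sub-$\sigma$-fields $\mathcal{F}_0 \subseteq \mathcal{F}_1 \subseteq \cdots \subseteq \mathcal{F}_{t^*}$, where $\mathcal{F}_i$ is the $\sigma$-field generated by the variables $X_0, Z_1, \ldots, Z_i$. The terms in the martingale sequence are as follows.
$$B_i = \mathbb{E}[f_a(X_{t^*}) \mid X_0, Z_1, \ldots, Z_i], \qquad 0 \le i \le t^*.$$
Since the dynamics are Markovian, we can also write $B_i$ as follows:
$$B_i = \mathbb{E}[f_a(X_{t^*}) \mid X_i].$$
Note that we deliberately choose to skip the term $\mathbb{E}[f_a(X_{t^*})]$ and start the martingale sequence at $B_0 = \mathbb{E}[f_a(X_{t^*}) \mid X_0]$ instead. This is crucial because it enables us to obtain strong bounds on the martingale increments. We have a good understanding of how $\mathbb{E}[f_a(X_{t^*}) \mid X_i]$ differs from $\mathbb{E}[f_a(X_{t^*}) \mid X_{i-1}]$, but a priori we cannot bound $|\mathbb{E}[f_a(X_{t^*}) \mid X_0] - \mathbb{E}[f_a(X_{t^*})]|$ in the same way.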
At this point, we could try to apply Azuma’s inequality by bounding the martingale increments $|B_i - B_{i-1}|$. However, these increments can be $O(n)$ in magnitude, which would yield a radius of concentration of $\tilde{O}(n^{3/2})$ from Azuma’s inequality. As was remarked earlier, this is weak, and we will see how we can show a radius of concentration of $\tilde{O}(n)$ by harnessing the fact that the martingale increments are rarely, if ever, of order $n$. This is because of concentration of linear functions on the Ising model. To harness this fact, we appeal to Freedman’s inequality (Lemma 5), and the first order of business in applying Freedman’s inequality effectively is to define a stopping time on the martingale sequence such that two things hold:
The stopping time is larger than $t^*$ with high probability. Hence, with good probability the process doesn’t stop too early. The harm if the process stops too early (at $\tau < t^*$) is that we will not be able to effectively decouple $f_a(X_{t^*})$ from the choice of $X_0$. $t^*$ was chosen to be larger than the mixing time of the Glauber dynamics precisely because it allows us to argue that $\mathbb{E}[f_a(X_{t^*}) \mid X_0] \approx \mathbb{E}[f_a(X)]$.
For all $i$ less than the stopping time, $|B_i - B_{i-1}| = \tilde{O}(\sqrt{n})$.
With the above criterion in mind, we define a stopping time on the martingale sequence.
Consider the martingale sequence defined in Definition 9. Define the set to be the following set of configurations:
where , for all , is defined as 0 for . Let be a stopping time defined as follows:
Note that the event $\{\tau = i\}$ lies in the $\sigma$-field $\mathcal{F}_i$, and hence the above definition is a valid stopping time.
3.3 Properties of Linear Functions of the Ising Model
In this section, we prove the following lemma, concerned primarily with a particular type of concentration of linear functions on the Ising model.
Let be a sample from an Ising model at -high temperature with no external field, and be obtained by taking steps along the Glauber dynamics corresponding to with the condition that the dynamics start at . For any linear function such that , define . Then the following hold for any ,
First, if $X_0 \sim p$, then since $p$ is the stationary distribution of the associated Glauber chain, $X_t \sim p$ as well. Hence, $\mathbb{E}[f(X_t)] = \mathbb{E}_{X \sim p}[f(X)]$ for all $t \ge 0$.
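The stationarity fact used here can be verified exactly on a tiny model by pushing $p$ through one step of the Glauber dynamics. The sketch below is illustrative: helper names are hypothetical, nodes are labeled $0, \ldots, n-1$, and edges are keyed by `(u, v)`. It enumerates all configurations, applies the update rule with each node chosen with probability $1/n$, and returns the resulting distribution, which should coincide with $p$.

```python
import itertools
import math


def ising_probs(n, theta_node, theta_edge):
    """Exact Ising distribution on {-1, +1}^n by enumeration (small n only)."""
    w = {}
    for x in itertools.product((-1, 1), repeat=n):
        e = sum(theta_node[v] * x[v] for v in range(n))
        e += sum(t * x[u] * x[v] for (u, v), t in theta_edge.items())
        w[x] = math.exp(e)
    z = sum(w.values())
    return {x: wx / z for x, wx in w.items()}


def local_field(x, u, theta_node, theta_edge):
    """h = theta_u + sum over neighbors v of theta_{u,v} * x_v."""
    h = theta_node[u]
    for (a, b), t in theta_edge.items():
        if a == u:
            h += t * x[b]
        elif b == u:
            h += t * x[a]
    return h


def glauber_push(dist, n, theta_node, theta_edge):
    """Exact distribution after one Glauber step started from `dist`."""
    out = {x: 0.0 for x in dist}
    for x, px in dist.items():
        for u in range(n):  # each node is selected with probability 1/n
            p_plus = 1.0 / (1.0 + math.exp(-2.0 * local_field(x, u, theta_node, theta_edge)))
            for s, ps in ((1, p_plus), (-1, 1.0 - p_plus)):
                y = x[:u] + (s,) + x[u + 1:]
                out[y] += px * ps / n
    return out
```

The check passes with or without external fields, because the heat-bath update is reversible with respect to $p$.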
For showing the second property, we will first bound the Lipschitz constants of the function . We denote by the vector of Lipschitz constants of . Since is a linear function, . We have for any such that ,
where (11) holds for any valid coupling of the two chains starting at and respectively, in particular, we use the greedy coupling (Definition 4) here. (12) follows because , and (13) follows because the expected Hamming distance between and , due to the contracting nature of the Glauber dynamics under the greedy coupling (Lemma 3), is smaller than which is equal to 1. Also note that . Hence, applying Lemma 1 to , we get
Note that we could apply Lemma 1 to because was drawn from the stationary distribution of the Glauber dynamics.
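One way to write out the Lipschitz bound is the following chain of inequalities. It is a sketch under these assumptions: the linear function is $f(x) = \sum_v a_v x_v$ with $|a_v| \le 1$, $d_H(x, y) = 1$, and $X^{(1)}, X^{(2)}$ are greedily coupled chains started at $x$ and $y$ whose expected Hamming distance contracts by a factor $1 - \eta/n$ per step (as in Lemma 3).

```latex
\begin{aligned}
\big|\mathbb{E}[f(X_t) \mid X_0 = x] - \mathbb{E}[f(X_t) \mid X_0 = y]\big|
  &= \big|\mathbb{E}\big[f(X^{(1)}_t) - f(X^{(2)}_t)\big]\big| \\
  &\le \mathbb{E}\Big[\sum_{v} |a_v|\,\big|X^{(1)}_{t,v} - X^{(2)}_{t,v}\big|\Big]
   \le 2\,\mathbb{E}\big[d_H(X^{(1)}_t, X^{(2)}_t)\big] \\
  &\le 2\Big(1 - \frac{\eta}{n}\Big)^{t} d_H(x, y) \le 2 .
\end{aligned}
```

The first equality uses only that each coupled chain, viewed on its own, is the Glauber chain from its starting state. The result is that every Lipschitz constant of $x \mapsto \mathbb{E}[f(X_t) \mid X_0 = x]$ is at most $2$, uniformly in $t$.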
To show the third property, we will define a martingale similar to the one defined in Definition 9 and apply Azuma’s inequality to it. Consider a run of the Glauber dynamics starting at $X_0$ and running for $t$ steps. We will view $X_t$ as a function of all the random choices made by the dynamics up to step $t$. That is, $X_t$ is determined by $(X_0, Z_1, \ldots, Z_t)$, where $Z_i$ denotes the random choices made by the dynamics during step $i$. More precisely, $Z_i$ represents the realization of the random choice of which node to (attempt to) update and a random variable $U_i$ drawn uniformly from $[0, 1]$ (based upon which we decide whether or not to update the node’s variable). Hence $Z_i = (v_i, U_i)$, where $v_i \in V$ and $U_i \in [0, 1]$. Consider the Doob martingale defined on $f(X_t)$:
$$B_i = \mathbb{E}[f(X_t) \mid X_0, Z_1, \ldots, Z_i], \qquad 0 \le i \le t.$$
Since the dynamics are Markovian, we can also write $B_i$ as follows:
$$B_i = \mathbb{E}[f(X_t) \mid X_i].$$
Next we will bound the increments of the above martingale and apply Azuma’s inequality to get the desired tail bound. In the following calculation, we will use the notation $x \to y$, where $x, y \in \{\pm 1\}^n$, to denote that $y$ is a possible transition according to a single step of the dynamics starting from $x$. For any $1 \le i \le t$,