 # Joint estimation of parameters in Ising model

We study joint estimation of the inverse temperature and magnetization parameters $(\beta,B)$ of an Ising model with a non-negative coupling matrix $A_n$ of size $n\times n$, given one sample from the Ising model. We give a general bound on the rate of consistency of the bi-variate pseudo-likelihood estimator. Using this, we show that estimation at rate $n^{-1/2}$ is always possible if $A_n$ is the adjacency matrix of a bounded degree graph. If $A_n$ is the scaled adjacency matrix of a graph whose average degree goes to $+\infty$, the situation is a bit more delicate. In this case estimation at rate $n^{-1/2}$ is still possible if the graph is not regular (in an asymptotic sense). Finally, we show that consistent estimation of both parameters is impossible if the graph is Erdős–Rényi with parameter $p>0$ free of $n$, thus confirming that estimation is harder on approximately regular graphs with large degree.


## 1 Introduction

Suppose $(\beta,B)$ are unknown parameters, and $A_n$ is an $n\times n$ symmetric matrix with non-negative entries and zeros on the diagonal. For $x\in\{-1,1\}^n$, define a p.m.f. by setting

$$\mathbb{P}_{n,\beta,B}(X=x)=\frac{1}{Z_n(\beta,B)}\,e^{\frac{\beta}{2}x'A_nx+B\sum_{i=1}^{n}x_i}.\tag{1.1}$$

This is the Ising model with coupling matrix $A_n$, inverse temperature parameter $\beta$ and magnetization parameter $B$. The study of Ising models is a growing area which has received significant attention in Statistics and Machine Learning in recent years. The theoretical investigation into the properties of Ising models can be broadly classified into two categories. One branch assumes that the matrix $A_n$ is the unknown parameter of interest, and focuses on estimating it under the assumption that i.i.d. copies are available from the model described in (1.1) (c.f. [1, 7, 21, 22] and references therein). Another branch works under the assumption that only one observation is available from the model in (1.1) (c.f. [5, 8, 11, 15, 17] and references therein). In this setting, estimation of the whole matrix (which has of order $n^2$ entries) is impossible from a single vector of size $n$. As such, the standard assumption is that the matrix $A_n$ is completely specified, and the focus is on estimating the parameters $(\beta,B)$. In this direction, the behavior of the MLE for the Curie-Weiss model (when $A_n$ is the scaled adjacency matrix of the complete graph) was studied in earlier work, where the authors showed that the MLE of $\beta$ is consistent if $B$ is known, and vice versa. They also show that if both $\beta$ and $B$ are unknown, then the joint MLE for the model does not exist with probability 1. This raises the natural question as to whether there are other estimators which work in this case. Focusing on the case when $B$ is known, subsequent work gave general sufficient conditions under which the pseudo-likelihood estimate for $\beta$ is consistent, and, developing on this approach, studied the rate of consistency of the pseudo-likelihood estimator at all parameter values, demonstrating interesting phase transition properties in the rate of the pseudo-likelihood estimator. The question of joint estimation of $(\beta,B)$ for a general matrix $A_n$ has also been raised. To the best of our knowledge, this question has not been addressed in the literature. This will be the focus of this paper.

### 1.1 Main results

Throughout this paper we will assume that $(\beta,B)$ are unknown parameters of interest, and the coupling matrix $A_n$ has non-negative entries and is completely known. We will also assume the following two conditions:

$$\max_{i\in[n]}\sum_{j=1}^{n}A_n(i,j)\le\gamma,\tag{1.2}$$

$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i,j=1}^{n}A_n(i,j)>0.\tag{1.3}$$

Here $\gamma$ is a finite constant free of $n$. Note that (1.2) implies that $A_n$ satisfies [2, Eqn (1.10)], as well as [8, Cond (a), Thm 1.1], where $\|A_n\|$ denotes the operator norm of $A_n$. The most common examples of the coupling matrix $A_n$ are scaled adjacency matrices of labelled simple graphs, defined as follows:

###### Definition 1.1.

For a graph $G_n$ with vertices labelled by $[n]:=\{1,\ldots,n\}$, define the coupling matrix $A_n$ by setting

$$A_n(i,j):=\frac{n}{2E(G_n)}\,1\{\text{vertices }i\text{ and }j\text{ are connected in }G_n\},$$

where $E(G_n)$ is the number of edges in the graph $G_n$.
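For concreteness, the coupling matrix of Definition 1.1 can be built directly from an edge list. A minimal sketch (the helper name and the 4-cycle example are ours, not the paper's):

```python
import numpy as np

def scaled_adjacency(n, edges):
    """Coupling matrix of Definition 1.1: A_n(i,j) = n / (2 E(G_n)) * 1{i ~ j}."""
    A = np.zeros((n, n))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    num_edges = len(edges)                 # E(G_n)
    return A * (n / (2.0 * num_edges))

# 4-cycle on 4 vertices: every row sum is n/(2E) * degree = (4/8) * 2 = 1
A = scaled_adjacency(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

Under this scaling the $i$-th row sum equals $d_i(G_n)/\bar d(G_n)$, which is the quantity controlled by (1.2).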

The scaling of the adjacency matrix by $\frac{n}{2E(G_n)}$ ensures that the resulting Ising model has non-trivial phase transition properties, which is of much interest in Statistical Physics and Applied Probability. The influence of phase transitions on inference has received recent attention (c.f. [5, 19]). Under this scaling, (1.3) holds trivially, as $\frac{1}{n}\sum_{i,j=1}^{n}A_n(i,j)=1$. Condition (1.2) demands that the maximum degree of $G_n$ is of the same order as the average degree. Below we give examples of some graphs for which (1.2) holds. We will also use these as running examples for all our future assumptions.

###### Definition 1.2.

For a simple graph $G_n$ on $n$ vertices, let $d_1(G_n),\ldots,d_n(G_n)$ denote the labelled degrees of $G_n$.

1. $G_n$ is a $d_n$-regular graph for some $d_n\ge1$. In this case we have $d_i(G_n)=d_n$ for all $i\in[n]$, and so (1.2) holds with $\gamma=1$.

2. $G_n$ is an Erdős–Rényi graph with parameter $p\in(0,1)$, where $p$ is fixed. In this case we have

$$\frac{\max_{i\in[n]}d_i(G_n)}{np}\stackrel{p}{\to}1,\qquad \frac{2E(G_n)}{n^2p}\stackrel{p}{\to}1,$$

and so (1.2) holds with probability tending to $1$ for any $\gamma>1$.

3. $G_n$ is a convergent sequence of dense graphs converging to the graphon $W$ which is not identically $0$ (see the survey literature on graphons/graph limits). In this case we have

$$\max_{i\in[n]}d_i(G_n)\le n-1,\qquad \frac{2E(G_n)}{n^2}\to\int_{[0,1]^2}W(x,y)\,dx\,dy,$$

and so (1.2) holds for all large $n$ with any $\gamma>\Big(\int_{[0,1]^2}W(x,y)\,dx\,dy\Big)^{-1}$.

4. $G_n$ is a bi-regular bipartite graph, defined as follows:

###### Definition 1.3.

$G_n$ has bipartition sets $P_n$ and $Q_n$, with sizes $a_n$ and $b_n$ respectively, and each vertex in $P_n$ has degree $c_n$, and each vertex in $Q_n$ has degree $d_n$. Finally assume that

$$\lim_{n\to\infty}\frac{a_n}{n}=p\in(0,1).$$

Note that the parameters are related, as we have $a_n+b_n=n$ and $a_nc_n=b_nd_n=E(G_n)$.

In this case the row sums of $A_n$ equal $\frac{nc_n}{2E(G_n)}=\frac{n}{2a_n}\to\frac{1}{2p}$ on $P_n$ and $\frac{n}{2b_n}\to\frac{1}{2(1-p)}$ on $Q_n$, and so (1.2) holds for all large $n$ with any $\gamma>\max\big\{\frac{1}{2p},\frac{1}{2(1-p)}\big\}$.

We will now introduce the bivariate pseudo-likelihood estimator.

###### Definition 1.4.

For any $i\in[n]$ we have

$$\mathbb{P}_{n,\beta,B}(X_i=1\mid X_j=x_j,\,j\ne i)=\frac{e^{\beta m_i(x)+B}}{e^{\beta m_i(x)+B}+e^{-\beta m_i(x)-B}},$$

where $m_i(x):=\sum_{j=1}^{n}A_n(i,j)x_j$. Define the pseudo-likelihood as the product of the one-dimensional conditional distributions (see [3, 4]):

$$\prod_{i=1}^{n}\mathbb{P}_{n,\beta,B}(X_i=x_i\mid X_j=x_j,\,j\ne i)=2^{-n}\exp\Big\{\sum_{i=1}^{n}\big(\beta x_im_i(x)+Bx_i-\log\cosh(\beta m_i(x)+B)\big)\Big\}.$$

On taking logarithms and differentiating with respect to $(\beta,B)$ we get the vector $PL_n(\beta,B|x):=(Q_n(\beta,B|x),R_n(\beta,B|x))$, where

$$Q_n(\beta,B|x):=\sum_{i=1}^{n}m_i(x)\big(x_i-\tanh(\beta m_i(x)+B)\big),\qquad R_n(\beta,B|x):=\sum_{i=1}^{n}\big(x_i-\tanh(\beta m_i(x)+B)\big).$$

The bi-variate equation

$$PL_n(\beta,B|x)=(0,0)$$

will be referred to as the pseudo-likelihood equation in this paper. If the pseudo-likelihood equation has a unique root, denote it by $(\hat\beta_n,\hat B_n)$. This is the pseudo-likelihood estimator for the parameter vector $(\beta,B)$.
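Numerically, Definition 1.4 reduces estimation to a two-dimensional root-finding problem. Since the function being maximized is concave (see Section 2), a damped Newton iteration on $(Q_n,R_n)$ using the Hessian matrix of the log pseudo-likelihood typically converges. The sketch below is an illustrative implementation, not the authors' code; it performs no existence check, which Theorem 1.7 shows can fail.

```python
import numpy as np

def pseudolikelihood_estimate(A, x, iters=200, tol=1e-10):
    """Solve the pseudo-likelihood equation (Q_n, R_n) = (0, 0) by a
    damped Newton iteration, with m_i(x) = sum_j A(i, j) x_j.
    Illustrative sketch only: assumes a unique root exists."""
    m = A @ x
    beta, B = 0.0, 0.0
    for _ in range(iters):
        t = np.tanh(beta * m + B)
        Q = m @ (x - t)                     # Q_n(beta, B | x)
        R = np.sum(x - t)                   # R_n(beta, B | x)
        if np.hypot(Q, R) < tol:
            break
        th = 1.0 - t ** 2                   # theta_i = sech^2(beta m_i + B)
        H = np.array([[m ** 2 @ th, m @ th],
                      [m @ th, th.sum()]])  # negative Hessian of log PL
        step = np.linalg.solve(H, np.array([Q, R]))
        norm = np.hypot(*step)
        if norm > 1.0:                      # damp long steps for stability
            step /= norm
        beta += step[0]
        B += step[1]
    return beta, B
```

Because the negative Hessian is positive definite whenever the local fields $m_i(x)$ are not all equal, each Newton step is an ascent direction for the log pseudo-likelihood.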

###### Definition 1.5.

Suppose $W_n$ and $V_n$ are two non-negative random variables on the probability space $\big(\{-1,1\}^n,\mathbb{P}_{n,\beta,B}\big)$, where $\mathbb{P}_{n,\beta,B}$ is the Ising p.m.f. given in (1.1). We will say $W_n=O_p(V_n)$ if the sequence $W_n/V_n$ is tight. We will say $W_n=\Theta_p(V_n)$ if both $W_n=O_p(V_n)$ and $V_n=O_p(W_n)$. We will say $W_n=o_p(V_n)$ if $W_n/V_n\stackrel{p}{\to}0$.

###### Definition 1.6.

Let $\Theta$ denote the set of all parameters $(\beta,B)$ such that .

Our first result gives a general upper bound on the error of the pseudo-likelihood estimator.

###### Theorem 1.7.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.3), and $(\beta_0,B_0)\in\Theta$. Set

$$T_n(x):=\frac{1}{n}\sum_{i=1}^{n}\big(m_i(x)-\bar m(x)\big)^2,\qquad \bar m(x):=\frac{1}{n}\sum_{i=1}^{n}m_i(x).$$

1. The pseudo-likelihood estimator $(\hat\beta_n,\hat B_n)$ exists iff $x\notin\bigcup_{\ell=1}^{4}A_{\ell,n}$, where

$$A_{1,n}:=\{x\in\{-1,1\}^n:\,T_n(x)=0\},$$
$$A_{2,n}:=\{x\in\{-1,1\}^n:\,m_i(x)x_i=|m_i(x)|\text{ for all }i\in[n]\},$$
$$A_{3,n}:=\{x\in\{-1,1\}^n:\,m_i(x)x_i=-|m_i(x)|\text{ for all }i\in[n]\},$$
$$A_{4,n}:=\{\mathbf{1},-\mathbf{1}\}.$$

2. If the true parameter is $(\beta_0,B_0)$, then we have

$$\lim_{n\to\infty}\mathbb{P}_{n,\beta_0,B_0}\big(A_{2,n}^c\cap A_{3,n}^c\cap A_{4,n}^c\big)=1.$$

3. Further, if $nT_n(X)\stackrel{p}{\to}+\infty$, then

$$\big\|(\hat\beta_n-\beta_0,\hat B_n-B_0)\big\|=O_p\Big(\frac{1}{\sqrt{nT_n(X)}}\Big).$$

In particular, $(\hat\beta_n,\hat B_n)$ is jointly consistent for $(\beta_0,B_0)$ in this case.

An immediate consequence of Theorem 1.7 is the following corollary.

###### Corollary 1.8.

In the setting of Theorem 1.7, if we further have

$$T_n(X)=\Theta_p(1),\tag{1.4}$$

then under $(\beta_0,B_0)$ we have $\|(\hat\beta_n-\beta_0,\hat B_n-B_0)\|=O_p(n^{-1/2})$, i.e. the pseudo-likelihood estimator is jointly consistent.

Corollary 1.8 shows that (1.4) is a sufficient condition for consistency of the pseudo-likelihood estimate. Note that condition (1.4) is an implicit condition, and it is not clear when it holds. We will now give an exact characterization of (1.4) in terms of the matrix $A_n$ for “mean field” matrices, introduced in the following definition.

###### Definition 1.9.

We say that a sequence of matrices $\{A_n\}$ satisfies the mean field condition if we have

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i,j=1}^{n}A_n(i,j)^2=0.\tag{1.5}$$

Condition (1.5) was first introduced to study the limiting behavior of the normalizing constant of Ising and Potts models. In particular, if $A_n=\frac{n}{2E(G_n)}G_n$, where $G_n$ is the adjacency matrix of a graph, then (1.5) holds iff $\bar d(G_n):=\frac{2E(G_n)}{n}\to\infty$. Indeed, this is because

$$\sum_{i,j=1}^{n}A_n(i,j)^2=\frac{n^2}{4E(G_n)^2}\sum_{i,j=1}^{n}G_n(i,j)^2=\frac{n^2}{2E(G_n)},$$

which is $o(n)$ iff $\bar d(G_n)\to\infty$. Thus (1.5) holds in the following examples:

1. $G_n$ is a $d_n$-regular graph with $d_n\to\infty$. In this case we have $\bar d(G_n)=d_n\to\infty$.

2. $G_n$ is an Erdős–Rényi graph with parameter $p\in(0,1)$ fixed. In this case we have $\bar d(G_n)\sim np\to\infty$ with probability tending to $1$.

3. $G_n$ is a convergent sequence of dense graphs converging to the graphon $W$ which is not identically $0$. In this case we have $\bar d(G_n)=\Theta(n)\to\infty$.

4. $G_n$ is a bi-regular bipartite graph with parameters as in Definition 1.3, such that $\frac{E(G_n)}{n}\to\infty$. In this case we have $\bar d(G_n)=\frac{2E(G_n)}{n}\to\infty$, and so (1.5) holds.
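For scaled adjacency matrices as in Definition 1.1, the statistic in (1.5) reduces to $1/\bar d(G_n)$, so the mean field condition is easy to check numerically. The sketch below contrasts a dense example (the complete graph, i.e. the Curie-Weiss model) with a sparse one (the path graph); all helper names are ours:

```python
import numpy as np

def mean_field_stat(A):
    """The quantity (1/n) * sum_{i,j} A(i,j)^2 appearing in condition (1.5)."""
    return float(np.sum(A * A) / A.shape[0])

def scaled_adjacency(G):
    """Scale a 0/1 adjacency matrix as in Definition 1.1; G.sum() = 2 E(G_n)."""
    return G * (G.shape[0] / G.sum())

n = 200
K = np.ones((n, n)) - np.eye(n)      # complete graph: average degree n - 1
P = np.zeros((n, n))                 # path graph: average degree ~ 2
idx = np.arange(n - 1)
P[idx, idx + 1] = 1.0
P[idx + 1, idx] = 1.0

# For scaled adjacency matrices the statistic equals 1 / (average degree):
# 1/(n-1) for the complete graph, and about 1/2 for the path graph.
```

The complete graph satisfies (1.5) while the bounded-degree path graph does not; the latter instead satisfies the sparse-graph condition (1.8) of Theorem 1.15 below.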

###### Definition 1.10.

Given the coupling matrix $A_n$, let $R_n(i):=\sum_{j=1}^{n}A_n(i,j)$ denote the $i$-th row sum of $A_n$, and let $\bar R_n:=\frac{1}{n}\sum_{i=1}^{n}R_n(i)$.

Our next result now gives a simple sufficient condition for joint consistency of the pseudo-likelihood estimator.

###### Theorem 1.11.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.5). If

$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\big(R_n(i)-\bar R_n\big)^2>0,\tag{1.6}$$

then we have $T_n(X)=\Theta_p(1)$ for all $(\beta_0,B_0)\in\Theta$. Consequently we have $\|(\hat\beta_n-\beta_0,\hat B_n-B_0)\|=O_p(n^{-1/2})$.

Note that (1.6) and (1.2) together imply (1.3). Thus if $A_n$ is the scaled adjacency matrix of a graph with $\bar d(G_n)\to\infty$, the pseudo-likelihood estimator is consistent whenever the graph is slightly irregular. In particular, the pseudo-likelihood estimator is consistent in the following examples:

1. $G_n$ is a convergent sequence of dense graphs converging to the graphon $W$ such that the normalized degree function $R(x):=\frac{\int_0^1W(x,y)\,dy}{\int_{[0,1]^2}W(y,z)\,dy\,dz}$ is not constant almost surely with respect to Lebesgue measure. In this case we have

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\big(R_n(i)-\bar R_n\big)^2=\int_0^1\Big(R(x)-\int_0^1R(y)\,dy\Big)^2dx>0,$$

and so (1.6) holds. Also, as previously verified, (1.2) and (1.5) hold as well. Thus the pseudo-likelihood estimator is consistent on $\Theta$.

2. $G_n$ is a bi-regular bipartite graph with parameters as defined above, such that $\frac{E(G_n)}{n}\to\infty$ and $p\ne\frac12$.

In this case (1.2) and (1.5) were verified before. Finally, we have $R_n(i)=\frac{n}{2a_n}$ for $i\in P_n$, and $R_n(i)=\frac{n}{2b_n}$ for $i\in Q_n$. This gives

$$\frac{1}{n}\sum_{i=1}^{n}\big(R_n(i)-\bar R_n\big)^2\sim\frac{a_n}{n}\Big(\frac{1}{2p}-1\Big)^2+\frac{b_n}{n}\Big(\frac{1}{2(1-p)}-1\Big)^2\to\frac{(2p-1)^2}{4p(1-p)}>0.$$

Thus the pseudo-likelihood estimator is consistent on $\Theta$. Note that $p=\frac12$ iff the graph is asymptotically regular.
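The irregularity statistic in (1.6) is likewise easy to compute. The sketch below checks the bi-regular bipartite computation above on the complete bipartite graph $K_{2,6}$ (so $p=1/4$), and confirms that a cycle, being regular, has zero irregularity; the helper names are ours:

```python
import numpy as np

def irregularity(A):
    """(1/n) * sum_i (R_n(i) - Rbar_n)^2, with R_n(i) the row sums of A_n."""
    R = A.sum(axis=1)
    return float(np.mean((R - R.mean()) ** 2))

def scaled_adjacency(G):
    """Scale a 0/1 adjacency matrix as in Definition 1.1."""
    return G * (G.shape[0] / G.sum())

# complete bipartite graph K_{2,6}: bi-regular with a_n = 2, b_n = 6, p = 1/4
n, a = 8, 2
G = np.zeros((n, n))
G[:a, a:] = 1.0
G[a:, :a] = 1.0

# 6-cycle: 2-regular, so all row sums of the scaled matrix equal 1
C = np.zeros((6, 6))
idx = np.arange(6)
C[idx, (idx + 1) % 6] = 1.0
C[(idx + 1) % 6, idx] = 1.0
```

For $K_{2,6}$ the statistic matches the limit $(2p-1)^2/\big(4p(1-p)\big)=1/3$ exactly, since the graph is exactly (not just asymptotically) bi-regular.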

This raises the natural question as to what happens for regular graphs. The following theorem addresses this question by showing that whenever the coupling matrix is mean field and asymptotically regular, the random variable $T_n(X)$ converges to $0$ in probability.

###### Theorem 1.12.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.5). If

$$\lim_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\big(R_n(i)-\bar R_n\big)^2=0,\tag{1.7}$$

then we have $T_n(X)=o_p(1)$ for all $(\beta_0,B_0)\in\Theta$.

Theorem 1.12 along with the upper bound of Theorem 1.7 together suggest that consistency may not be attained by the pseudo-likelihood estimator for asymptotically regular graphs with degree going to $+\infty$. The following theorem confirms this conjecture for the special case when $G_n$ is an Erdős–Rényi graph with parameter $p>0$, free of $n$.

###### Theorem 1.13.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix is $A_n=\frac{n}{2E(G_n)}G_n$, with $G_n$ a random graph from $\mathcal{G}(n,p)$, the Erdős–Rényi model with parameter $p>0$ free of $n$. Let $t\in(0,1)$ be fixed, and let

$$\Theta_t:=\big\{(\beta,B)\in(0,\infty)^2:\,t=\tanh(\beta t+B)\big\}.$$

Let $\mathbb{Q}_{n,\beta,B}$ denote the joint law of $(G_n,X)$ on $\{0,1\}^{\binom{n}{2}}\times\{-1,1\}^n$. Then, setting $\mathbb{Q}_n$ to be the product measure on the same space under which $G_n\sim\mathcal{G}(n,p)$ and the coordinates of $X$ are i.i.d. with mean $t$, we have that $\mathbb{Q}_{n,\beta,B}$ is contiguous to $\mathbb{Q}_n$ for every $(\beta,B)\in\Theta_t$. Consequently, there does not exist any sequence of estimates (functions of $(G_n,X)$) which is consistent for $(\beta,B)$ in $\Theta_t$ (and hence in $\Theta$).

###### Remark 1.14.

It was pointed out in earlier work that the joint MLE for $(\beta,B)$ doesn’t exist for the Curie-Weiss model. The above theorem extends this by showing that consistent estimates do not exist when the underlying graph is Erdős–Rényi. Note that if we set $p=1$ in the Erdős–Rényi model we get a complete graph on $n$ vertices, which corresponds to the Curie-Weiss model. We conjecture that there are no consistent estimates for both parameters whenever the graph sequence is regular with degree going to $+\infty$.

If the average degree of the graph sequence stays bounded, joint estimation of both parameters at rate $n^{-1/2}$ is always possible, as shown in the following theorem.

###### Theorem 1.15.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and

$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i,j=1}^{n}A_n(i,j)^2>0.\tag{1.8}$$

Then, for any $(\beta_0,B_0)\in\Theta$ we have $\|(\hat\beta_n-\beta_0,\hat B_n-B_0)\|=O_p(n^{-1/2})$, i.e. the joint pseudo-likelihood estimator is consistent.

Note that (1.2) and (1.8) together imply (1.3). To see how (1.8) captures sparse graphs, recall that for a graph $G_n$ with adjacency matrix $G_n$ and $A_n=\frac{n}{2E(G_n)}G_n$ we have $\frac{1}{n}\sum_{i,j=1}^{n}A_n(i,j)^2=\frac{n}{2E(G_n)}=\frac{1}{\bar d(G_n)}$. Thus if (1.8) holds, then we must have $\limsup_{n\to\infty}\bar d(G_n)<\infty$, which says that the graph sequence is sparse. In particular, Theorem 1.15 shows consistency when the underlying graph has a uniformly bounded degree sequence, irrespective of whether $G_n$ is regular or not.

To complete the picture, we show that if one of the two parameters is known, then the pseudo-likelihood estimator for the other parameter is consistent, for all $(\beta_0,B_0)\in\Theta$. Thus joint estimation is indeed a much harder problem than estimation of the individual parameters. The proof of this proposition appears in the appendix.

###### Proposition 1.16.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.3).

1. If $B$ is known, then the equation $Q_n(\beta,B|X)=0$ has a unique root $\hat\beta_n$, which satisfies $\hat\beta_n\stackrel{p}{\to}\beta_0$ under $\mathbb{P}_{n,\beta_0,B}$.

2. If $\beta$ is known, then the equation $R_n(\beta,B|X)=0$ has a unique root $\hat B_n$, which satisfies $\hat B_n\stackrel{p}{\to}B_0$ under $\mathbb{P}_{n,\beta,B_0}$.

### 1.2 Interpretation of results for graphs

Even though all our results apply to general matrices $A_n$ with non-negative entries, the most interesting examples for our theorems arise when $A_n$ is the scaled adjacency matrix of a simple graph $G_n$ as in Definition 1.1; the conditions also take a simpler form in this case. This subsection describes all our results in this special case. Recall that $d_1(G_n),\ldots,d_n(G_n)$ are the degrees of $G_n$, and let $\bar d(G_n):=\frac{2E(G_n)}{n}$ denote the average degree. Also assume that $E(G_n)\ge1$, as has been the case throughout the paper. Finally note that (1.2) is equivalent to $\max_{i\in[n]}d_i(G_n)=O(\bar d(G_n))$, which we will assume throughout this subsection.

• For any graph $G_n$, the pseudo-likelihood estimate of $\beta$ is consistent if $B$ is known, and vice versa (Proposition 1.16).

• If $\bar d(G_n)\to\infty$, and the graph is somewhat irregular as captured by the condition

$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\Big[\frac{d_i(G_n)}{\bar d(G_n)}-1\Big]^2>0,$$

then the pseudo-likelihood estimator for $(\beta,B)$ is jointly consistent (Theorem 1.11).

• If $\bar d(G_n)\to\infty$ and the graph is somewhat regular as captured by the condition

$$\limsup_{n\to\infty}\frac{1}{n}\sum_{i=1}^{n}\Big[\frac{d_i(G_n)}{\bar d(G_n)}-1\Big]^2=0,$$

then we believe that the pseudo-likelihood estimator for $(\beta,B)$ is not jointly consistent (Theorem 1.12). The only reason this statement is suggestive and not rigorous is that Theorem 1.7 only provides an upper bound and not a matching lower bound.

• For the particular case when $G_n$ is an Erdős–Rényi random graph with parameter $p>0$ free of $n$, there are no estimators which are jointly consistent for $(\beta,B)$ (Theorem 1.13). Thus the estimation problem is indeed harder on asymptotically regular graphs with large degree.

• If $G_n$ is a graph with $\bar d(G_n)$ bounded, then the pseudo-likelihood estimator for $(\beta,B)$ is jointly consistent (Theorem 1.15), irrespective of whether $G_n$ is regular or not.

Figure 1 gives a gist of the above discussion on a summary tree.

### 1.3 Simulation

Our results demonstrate a dichotomy in the joint $\sqrt{n}$-consistency of $(\hat\beta_n,\hat B_n)$ based on whether the coupling matrix is approximately regular or not. In what follows, we address this dichotomy using simulation. At first, we fix different values of the pair $(\beta,B)$ on the line $m=\tanh(m\beta+B)$ for $m=0.3$. Next, we draw two random regular graphs of different degrees, each with $n=100$ nodes. For each value of $(\beta,B)$, we generate a sample from the Ising model with the scaled adjacency matrix of each graph as coupling matrix. On each of those samples, we estimate $(\beta,B)$ by solving the bivariate pseudo-likelihood equation. We repeat the same experiment with $n=200$ nodes and fresh random regular graphs. In Figure 2, we plot the corresponding pseudo-likelihood estimates of $(\beta,B)$ for $n=100$ and $n=200$ respectively. In both figures, the estimates for the two graphs are colored in green and red respectively. Notice that the fit along the line is visibly better in one case than in the other, in line with the dichotomy above.

Figure 2: Plot of the pseudo-likelihood estimate $(\hat\beta_n,\hat B_n)$ for $n=100$ (left) and $n=200$ (right), where $(\beta,B)$ lies on the line $m=\tanh(m\beta+B)$ (black line in the plot) for $m=0.3$.

The rest of the paper is outlined as follows: Section 2 details the proof of Theorem 1.7. Section 3 proves Theorem 1.11 and Theorem 1.12 with the help of Theorem 3.2, the proof of which is deferred to the appendix. Finally, Section 4 gives the proofs of Theorem 1.13 and Theorem 1.15. The proof of Proposition 1.16 is also deferred to the appendix.

## Acknowledgement

We thankfully acknowledge helpful discussions with Bhaswar B. Bhattacharya and Sourav Chatterjee at various stages of this work.

## 2 Proof of Theorem 1.7

The following lemma is a collection of estimates to be used throughout the rest of this paper.

###### Lemma 2.1.

Suppose $X$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.5).

Then setting

$$f_n(x):=\frac{\beta}{2}x'A_nx+B\sum_{i=1}^{n}x_i$$

and

$$b_i(x):=\mathbb{E}(X_i\mid X_j=x_j,\,j\ne i)=\tanh(\beta m_i(x)+B),$$

the following hold:

$$\limsup_{n\to\infty}\frac{1}{n^2}\,\mathbb{E}\big[f_n(X)-f_n(b(X))\big]^2=0,\tag{2.1}$$
$$\limsup_{n\to\infty}\frac{1}{n}\,\mathbb{E}\Big[\sum_{i=1}^{n}(X_i-b_i(X))m_i(X)\Big]^2<\infty,\tag{2.2}$$
$$\limsup_{n\to\infty}\frac{1}{n}\,\mathbb{E}\Big[\sum_{i=1}^{n}(X_i-b_i(X))\Big]^2<\infty.\tag{2.3}$$
###### Proof of Lemma 2.1.

Various versions of these estimates already exist in the literature. In particular, (2.1) follows on invoking [9, Lemma 3.1] or [2, Lemma 3.2] along with the assumption that $A_n$ satisfies (1.5), and (2.2) follows on invoking [9, Lemma 3.2] along with the assumption that $A_n$ satisfies (1.2). Finally, (2.3) follows as an easy consequence of [19, Lemma 1].

We also need the following lemma for proving Theorem 1.7 and Proposition 1.16. The proof of the lemma is deferred to the appendix.

###### Lemma 2.2.

Suppose $X$ is an observation from the Ising model (1.1) such that (1.2) and (1.3) hold. If the true parameter is $(\beta_0,B_0)$, then there exists $\delta>0$ such that

$$\limsup_{n\to\infty}\frac{1}{n}\log\mathbb{P}_{n,\beta_0,B_0}\Big(\Big|\sum_{i=1}^{n}X_im_i(X)\Big|<\delta n\Big)<0.$$

### 2.1 Proof of Theorem 1.7

1. Setting

$$\widetilde{PL}_n(\beta,B|x):=\sum_{i=1}^{n}\big(\beta x_im_i(x)+Bx_i-\log\cosh(\beta m_i(x)+B)\big),\tag{2.4}$$

note that $\nabla\widetilde{PL}_n(\beta,B|x)=PL_n(\beta,B|x)$. Differentiating the function twice we get the negative Hessian matrix given by

$$H_n(\beta,B|x)=\begin{bmatrix}\sum_{i=1}^{n}m_i(x)^2\theta_i(\beta,B|x)&\sum_{i=1}^{n}m_i(x)\theta_i(\beta,B|x)\\[4pt]\sum_{i=1}^{n}m_i(x)\theta_i(\beta,B|x)&\sum_{i=1}^{n}\theta_i(\beta,B|x)\end{bmatrix},\tag{2.5}$$

where $\theta_i(\beta,B|x):=\mathrm{sech}^2(\beta m_i(x)+B)$. The determinant of the Hessian is given by

$$\Big[\sum_{i=1}^{n}m_i(x)^2\theta_i(\beta,B|x)\Big]\times\Big[\sum_{i=1}^{n}\theta_i(\beta,B|x)\Big]-\Big[\sum_{i=1}^{n}m_i(x)\theta_i(\beta,B|x)\Big]^2=\frac{1}{2}\sum_{i,j=1}^{n}\theta_i(\beta,B|x)\theta_j(\beta,B|x)\big(m_i(x)-m_j(x)\big)^2$$
$$\ge\frac{1}{2}\,\mathrm{sech}^4(\beta\gamma+|B|)\sum_{i,j=1}^{n}\big(m_i(x)-m_j(x)\big)^2=\mathrm{sech}^4(\beta\gamma+|B|)\,n^2T_n(x),$$

which gives

$$\det H_n(\beta,B|x)\ge\mathrm{sech}^4(\beta\gamma+|B|)\,n^2T_n(x).\tag{2.6}$$
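The algebraic identity behind (2.6) is an instance of Lagrange's identity, and is easy to sanity-check numerically; the vectors below are arbitrary stand-ins for $(m_i(x))$ and $(\theta_i)$, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m = rng.normal(size=12)                  # stand-in for m_i(x)
theta = rng.uniform(0.2, 1.0, size=12)   # stand-in for theta_i in (0, 1]

# determinant of the 2x2 matrix in (2.5) ...
lhs = (m ** 2 @ theta) * theta.sum() - (m @ theta) ** 2

# ... equals (1/2) * sum_{i,j} theta_i * theta_j * (m_i - m_j)^2
diff = m[:, None] - m[None, :]
rhs = 0.5 * np.sum(np.outer(theta, theta) * diff ** 2)

# lower bound as in (2.6): theta_i >= theta_min gives theta_min^2 * n^2 * T_n
n = len(m)
T_n = np.mean((m - m.mean()) ** 2)
bound = theta.min() ** 2 * n ** 2 * T_n
```

In the proof, $\theta_{\min}$ is replaced by the uniform bound $\mathrm{sech}^2(\beta\gamma+|B|)$, which is valid since $|m_i(x)|\le\gamma$ by (1.2).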

Since on the set $A_{1,n}^c$ we have $T_n(x)>0$, it follows from (2.6) that the Hessian of $\widetilde{PL}_n(\cdot,\cdot|x)$ is negative definite, and so the function is strictly concave. To show that there exists a global maximizer $(\hat\beta_n,\hat B_n)$, it thus suffices to show that

$$\lim_{\beta\to\pm\infty}\widetilde{PL}_n(\beta,B|x)=-\infty,\qquad \lim_{B\to\pm\infty}\widetilde{PL}_n(\beta,B|x)=-\infty.$$

To see this, note that $x\notin A_{2,n}$ implies there exists $i\in[n]$ such that $m_i(x)\ne0$ and $x_im_i(x)=-|m_i(x)|$. Since every other summand in (2.4) is bounded above for fixed $B$, we have

$$\widetilde{PL}_n(\beta,B|x)\le\beta x_im_i(x)-\log\cosh(\beta m_i(x)+B)+C(n,B)$$

for a constant $C(n,B)$ free of $\beta$, which on letting $\beta\to+\infty$ gives $\widetilde{PL}_n(\beta,B|x)\to-\infty$. A similar argument shows that if $x\notin A_{3,n}$, then $\widetilde{PL}_n(\beta,B|x)\to-\infty$ as $\beta\to-\infty$. Finally, it is immediate that $\lim_{B\to\pm\infty}\widetilde{PL}_n(\beta,B|x)=-\infty$ for any $x\notin A_{4,n}$. Thus there exists a unique global maximizer of the function $\widetilde{PL}_n(\cdot,\cdot|x)$, and so $(\hat\beta_n,\hat B_n)$ is the unique root of $PL_n(\beta,B|x)=(0,0)$.

We will now show that if $x\in A_{\ell,n}$ for some $1\le\ell\le4$, then the pseudo-likelihood estimator is not defined.

• On the set $A_{1,n}$ we have $T_n(x)=0$, which implies $m_i(x)=\bar m(x)$ for all $i\in[n]$. This implies that $Q_n(\beta,B|x)=\bar m(x)R_n(\beta,B|x)$, and so the equation $PL_n(\beta,B|x)=(0,0)$ is equivalent to

$$R_n(\beta,B|x)=0\iff\bar x=\tanh(\beta\bar m(x)+B).$$

Since the function $\widetilde{PL}_n(\cdot,\cdot|x)$ is concave, it follows that any $(\beta,B)$ satisfying this equation is a global maximizer, and hence in this case the set of maximizers is a line in the two-dimensional plane and hence not unique. Thus the pseudo-likelihood estimator is not defined.

• On the set $A_{2,n}$ we have

$$Q_n(\beta,B|x)=\sum_{i=1}^{n}|m_i(x)|-\sum_{i=1}^{n}m_i(x)\tanh(\beta m_i(x)+B)>0,$$

and so the equation $PL_n(\beta,B|x)=(0,0)$ has no roots, and so the pseudo-likelihood estimator is not defined.