1 Introduction
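Suppose $(\beta,B)$ are unknown parameters, and $A_n$ is an $n\times n$ symmetric matrix with nonnegative entries and $0$ on the diagonal. For $\sigma\in\{-1,1\}^n$, define a p.m.f. by setting
$$\mathbb{P}_{\beta,B}(\sigma) := \frac{1}{Z_n(\beta,B)}\exp\Big(\frac{\beta}{2}\sigma^\top A_n\sigma + B\sum_{i=1}^n\sigma_i\Big), \tag{1.1}$$
where $Z_n(\beta,B)$ is the normalizing constant.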
This is the Ising model with coupling matrix $A_n$, inverse temperature parameter $\beta$ and magnetization parameter $B$. The study of Ising models is a growing area which has received significant attention in Statistics and Machine Learning in recent years. Theoretical investigations into the properties of Ising models can be broadly classified into two categories. One branch assumes that the matrix $A_n$ is the unknown parameter of interest, and focuses on estimating $A_n$ under the assumption that i.i.d. copies are available from the model described in (1.1) (cf. [1, 7, 21, 22] and references therein). Another branch works under the assumption that only one observation is available from the model in (1.1) (cf. [5, 8, 11, 15, 17] and references therein). In this setting, estimating the whole matrix $A_n$ (which has of order $n^2$ entries) from a single vector $\sigma$ of size $n$ is impossible. As such, the standard assumption is that the matrix $A_n$ is completely specified, and the focus is on estimating the parameters $(\beta,B)$. In this direction, the behavior of the MLE for the Curie-Weiss model (when $A_n$ is the scaled adjacency matrix of the complete graph) was studied in [11], where the authors showed that, in an appropriate parameter regime, the MLE of $\beta$ is consistent for $\beta$ if $B$ is known, and vice versa. They also showed that if both $\beta$ and $B$ are unknown, then the joint MLE for the model does not exist with probability 1. This raises the natural question of whether other estimators work in this case. Focusing on the case when $B$ is known, [8] gave general sufficient conditions under which the pseudolikelihood estimator for $\beta$ is consistent. Building on this approach, [5] studied the rate of consistency of the pseudolikelihood estimator at all values of $\beta$, demonstrating interesting phase transition properties in this rate. The question of joint estimation of $(\beta,B)$ for a general matrix $A_n$ was raised in [8]. To the best of our knowledge, this question has not been addressed in the literature, and it is the focus of this paper.

1.1 Main results
Throughout this paper we assume that $(\beta,B)$ are unknown parameters of interest, and that the coupling matrix $A_n$ has nonnegative entries and is completely known. We also assume the following two conditions:
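$$\max_{1\le i\le n}\sum_{j=1}^n A_n(i,j)\le\gamma, \tag{1.2}$$
$$\liminf_{n\to\infty}\frac{1}{n}\sum_{i,j=1}^n A_n(i,j)>0. \tag{1.3}$$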
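Here $\gamma$ is a finite constant free of $n$. Since the operator norm of a symmetric matrix with nonnegative entries is at most its maximum row sum, (1.2) implies $\sup_n\|A_n\|\le\gamma$, where $\|A_n\|$ is the operator norm of $A_n$; hence $A_n$ satisfies [2, Eqn (1.10)] as well as [8, Cond (a), Thm 1.1]. The most common examples of coupling matrices are scaled adjacency matrices of labelled simple graphs, defined as follows: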
Definition 1.1.
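For a graph $G_n$ with vertices labelled by $[n]:=\{1,\ldots,n\}$, define the coupling matrix $A_n$ by setting
$$A_n(i,j):=\frac{n}{2|E(G_n)|}\,\mathbf{1}\{(i,j)\in E(G_n)\},$$
where $|E(G_n)|$ is the number of edges in the graph $G_n$.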
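A minimal plain-Python sketch of Definition 1.1 follows, assuming the graph is given as a list of undirected edges on vertices $0,\ldots,n-1$. It builds $A_n$ for a star graph and checks three facts used in this section: the entries of $A_n$ sum to $n$, the $i$-th row sum equals $d_i/\bar d$ (degree over average degree $\bar d=2|E(G_n)|/n$), and the operator norm of $A_n$ does not exceed its maximum row sum. The function names are hypothetical, and the power iteration assumes the all-ones starting vector is not orthogonal to the leading eigenvector, which holds in this example.

```python
import math

def scaled_adjacency(n, edges):
    """Definition 1.1: A_n(i, j) = n / (2|E|) if {i, j} is an edge, else 0."""
    scale = n / (2 * len(edges))
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = scale
    return A

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def operator_norm(A, iters=100):
    """Power iteration on A^2 (positive semidefinite since A is symmetric).

    Assumes the all-ones start vector is not orthogonal to the top eigenvector.
    """
    v = [1.0] * len(A)
    for _ in range(iters):
        w = matvec(A, matvec(A, v))
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
    return math.sqrt(sum(x * x for x in matvec(A, v)))

# Star graph on n = 5 vertices with hub 0: degrees (4, 1, 1, 1, 1), average degree 8/5.
n = 5
edges = [(0, k) for k in range(1, n)]
A = scaled_adjacency(n, edges)
row_sums = [sum(row) for row in A]   # equal to d_i / d_bar
total = sum(row_sums)                # equal to n under this scaling
gamma = max(row_sums)                # smallest constant that works in (1.2) for this graph
norm_A = operator_norm(A)            # bounded above by gamma
print(row_sums, total, gamma, norm_A)
```

For a star on $n$ vertices the hub's row sum is $n/2$, so no constant $\gamma$ free of $n$ works in (1.2); highly irregular graphs of this kind are exactly what (1.2) excludes.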
The scaling of the adjacency matrix by $n/(2|E(G_n)|)$ ensures that the resulting Ising model has nontrivial phase transition properties (see, e.g., [2]), which are of much interest in Statistical Physics and Applied Probability. The influence of phase transitions on inference has received recent attention (cf. [5, 19]). Under this scaling, (1.3) holds trivially, as $\sum_{i,j=1}^nA_n(i,j)=n$. Since the $i$-th row sum of $A_n$ equals $d_i/\bar d$, where $d_i$ is the degree of vertex $i$ and $\bar d:=2|E(G_n)|/n$ is the average degree, condition (1.2) demands that the maximum degree of $G_n$ be of the same order as the average degree. Below we give examples of some graphs for which (1.2) holds; we will also use them as running examples for all our subsequent assumptions.
Definition 1.2.
For a simple graph $G_n$ on $n$ vertices, let $(d_1,\ldots,d_n)$ denote the labelled degrees of $G_n$.

$G_n$ is a $d_n$-regular graph for some $d_n\ge1$. In this case $\sum_{j=1}^nA_n(i,j)=1$ for all $i$, and so (1.2) holds with $\gamma=1$.

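$G_n$ is an Erdős-Rényi graph $\mathcal G(n,p)$ with parameter $p\in(0,1]$, where $p$ is fixed. In this case we have
$$\max_{1\le i\le n}\Big|\frac{d_i}{\bar d}-1\Big|\xrightarrow{P}0,$$
and so (1.2) holds with probability tending to $1$ for any $\gamma>1$.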

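$G_n$ is a biregular bipartite graph with parameters $(a_n,b_n,k_n,\ell_n)$ defined as follows:
Definition 1.3.
$G_n$ has bipartition sets $V_1$ and $V_2$, with sizes $a_n$ and $b_n$ respectively; each vertex in $V_1$ has degree $k_n$, and each vertex in $V_2$ has degree $\ell_n$. Finally, assume that $\lim_{n\to\infty}a_n/n=\alpha\in(0,1)$.
Note that the parameters are related, as $a_n+b_n=n$ and $a_nk_n=b_n\ell_n=|E(G_n)|$.
In this case the average degree is $\bar d=2a_nk_n/n$, the row sums of $A_n$ equal $n/(2a_n)$ on $V_1$ and $n/(2b_n)$ on $V_2$, and so (1.2) holds for all large $n$ with any $\gamma>\max\{\frac{1}{2\alpha},\frac{1}{2(1-\alpha)}\}$.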
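The row sums in the regular and biregular examples above can be checked numerically. The plain-Python sketch below uses a cycle as the regular graph and the complete bipartite graph $K_{a,b}$ as the simplest biregular bipartite graph (degree $b$ on one side, degree $a$ on the other). These specific graphs and the helper names are illustrative choices, not taken from the paper.

```python
def normalized_degrees(n, edges):
    """Row sums of A_n from Definition 1.1, i.e. d_i / d_bar with d_bar = 2|E| / n."""
    deg = [0] * n
    for i, j in edges:
        deg[i] += 1
        deg[j] += 1
    d_bar = 2 * len(edges) / n
    return [d / d_bar for d in deg]

def cycle(n):
    """A 2-regular graph on vertices 0..n-1."""
    return [(i, (i + 1) % n) for i in range(n)]

def complete_bipartite(a, b):
    """K_{a,b}: parts {0..a-1} (degree b) and {a..a+b-1} (degree a)."""
    return [(i, a + j) for i in range(a) for j in range(b)]

R_cycle = normalized_degrees(10, cycle(10))
a, b = 3, 9
n = a + b
R_bip = normalized_degrees(n, complete_bipartite(a, b))
print(max(R_cycle), min(R_cycle))   # regular: every row sum equals 1
print(R_bip[0], R_bip[-1])          # biregular: n/(2a) on one side, n/(2b) on the other
```

When the two parts have different sizes, the row sums take two distinct values, which is the kind of irregularity that matters for joint consistency later in this section.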
We will now introduce the bivariate pseudolikelihood estimator.
Definition 1.4.
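For any $1\le i\le n$ we have
$$\mathbb P_{\beta,B}(\sigma_i=1\mid\sigma_j,\,j\ne i)=\frac{e^{\beta m_i+B}}{e^{\beta m_i+B}+e^{-\beta m_i-B}},$$
where $m_i:=\sum_{j=1}^nA_n(i,j)\sigma_j$. Define the pseudolikelihood as the product of the one-dimensional conditional distributions (see [3, 4]):
$$\mathrm{PL}(\beta,B):=\prod_{i=1}^n\frac{e^{\sigma_i(\beta m_i+B)}}{e^{\beta m_i+B}+e^{-\beta m_i-B}}.$$
Taking logarithms and differentiating with respect to $(\beta,B)$ gives the vector $(U_n(\beta,B),V_n(\beta,B))$, where
$$U_n(\beta,B):=\sum_{i=1}^nm_i\big(\sigma_i-\tanh(\beta m_i+B)\big),\qquad V_n(\beta,B):=\sum_{i=1}^n\big(\sigma_i-\tanh(\beta m_i+B)\big).$$
The bivariate equation
$$U_n(\beta,B)=0,\qquad V_n(\beta,B)=0$$
will be referred to as the pseudolikelihood equation in this paper. If the pseudolikelihood equation has a unique root in $\mathbb R^2$, denote it by $(\hat\beta_n,\hat B_n)$; this is the pseudolikelihood estimator of the parameter vector $(\beta,B)$.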
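Each factor of the pseudolikelihood is a logistic-type probability in the linear predictor $\beta m_i+B$, so the pseudolikelihood equation can be solved numerically like a two-parameter logistic regression. The plain-Python sketch below does this with Newton's method plus step halving; Newton's method is our choice here, not necessarily the authors' solver, and all function names are hypothetical. Before iterating, it checks the standard existence condition for a logistic maximum likelihood estimate with one covariate and an intercept: the values $m_i$ at sites with $\sigma_i=+1$ and at sites with $\sigma_i=-1$ must overlap, otherwise no unique finite root exists (this covers, among others, the case where all spins are equal and the case where all $m_i$ coincide).

```python
import math

def local_fields(A, sigma):
    """m_i = sum_j A[i][j] * sigma_j."""
    return [sum(a * s for a, s in zip(row, sigma)) for row in A]

def log_pl(m, sigma, beta, B):
    """Log-pseudolikelihood: sum_i [sigma_i x_i - log(e^{x_i} + e^{-x_i})], x_i = beta m_i + B."""
    total = 0.0
    for mi, si in zip(m, sigma):
        x = beta * mi + B
        # log(e^x + e^-x) = |x| + log(1 + e^{-2|x|}), computed stably
        total += si * x - (abs(x) + math.log1p(math.exp(-2.0 * abs(x))))
    return total

def grad_and_neg_hessian(m, sigma, beta, B):
    """Return the score vector (U_n, V_n) and the entries of the negative Hessian."""
    g_beta = g_B = h_bb = h_bB = h_BB = 0.0
    for mi, si in zip(m, sigma):
        t = math.tanh(beta * mi + B)
        w = 1.0 - t * t                  # sech^2(beta m_i + B)
        g_beta += mi * (si - t)
        g_B += si - t
        h_bb += w * mi * mi
        h_bB += w * mi
        h_BB += w
    return (g_beta, g_B), (h_bb, h_bB, h_BB)

def pseudolikelihood_estimate(A, sigma, tol=1e-12, max_iter=200):
    """Solve the pseudolikelihood equation; return None when no unique root exists."""
    m = local_fields(A, sigma)
    plus = [mi for mi, si in zip(m, sigma) if si == 1]
    minus = [mi for mi, si in zip(m, sigma) if si == -1]
    # Overlap condition for a finite, unique logistic-type maximizer.
    if not plus or not minus or min(plus) >= max(minus) or min(minus) >= max(plus):
        return None
    beta, B = 0.0, 0.0
    for _ in range(max_iter):
        (gb, gB), (hbb, hbB, hBB) = grad_and_neg_hessian(m, sigma, beta, B)
        det = hbb * hBB - hbB * hbB
        d_beta = (hBB * gb - hbB * gB) / det   # Newton direction (-Hessian)^{-1} * score
        d_B = (hbb * gB - hbB * gb) / det
        step, f0 = 1.0, log_pl(m, sigma, beta, B)
        while step > 1e-12 and log_pl(m, sigma, beta + step * d_beta, B + step * d_B) < f0:
            step /= 2.0                        # halve the step until F does not decrease
        beta, B = beta + step * d_beta, B + step * d_B
        if step * (abs(d_beta) + abs(d_B)) < tol:
            break
    return beta, B

n = 8
# A cycle has |E| = n, so Definition 1.1 gives A_n(i, j) = 1/2 for neighbours.
A = [[0.5 if (j - i) % n in (1, n - 1) else 0.0 for j in range(n)] for i in range(n)]
sigma = [1, 1, 1, 1, -1, 1, -1, -1]
est = pseudolikelihood_estimate(A, sigma)
print(est)
print(pseudolikelihood_estimate(A, [1] * n))   # all spins equal: no root, so None
```

Because the log-pseudolikelihood is concave, step halving guarantees that each accepted Newton step does not decrease it, which keeps the iteration stable even when a full Newton step would overshoot.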
Definition 1.5.
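Suppose $X_n$ and $Y_n$ are two nonnegative random variables on the probability space $(\{-1,1\}^n,\mathbb P_{\beta,B})$, where $\mathbb P_{\beta,B}$ is the Ising p.m.f. given in (1.1). We say $X_n=O_p(Y_n)$ if the sequence $X_n/Y_n$ is tight. In particular, this implies that $\lim_{K\to\infty}\limsup_{n\to\infty}\mathbb P_{\beta,B}(X_n>KY_n)=0$. We say $X_n=\Theta_p(Y_n)$ if both $X_n=O_p(Y_n)$ and $Y_n=O_p(X_n)$. We say $X_n=o_p(Y_n)$ if $X_n/Y_n\xrightarrow{P}0$.

Definition 1.6.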
Let denote the set of all parameters such that .
Our first result gives a general upper bound on the error of the pseudolikelihood estimator.
Theorem 1.7.
The following is an immediate consequence of Theorem 1.7.
Corollary 1.8.
In the setting of Theorem 1.7, if we further have
(1.4) 
then $(\hat\beta_n,\hat B_n)\xrightarrow{P}(\beta,B)$ under $\mathbb P_{\beta,B}$, i.e., the pseudolikelihood estimator is jointly consistent.
Corollary 1.8 shows that (1.4) is a sufficient condition for consistency of the pseudolikelihood estimator. However, (1.4) is an implicit condition, and it is not clear when it holds. We now give an exact characterization of (1.4) in terms of the matrix $A_n$ for “mean field” matrices, introduced in the following definition.
Definition 1.9.
We say that a sequence of matrices $\{A_n\}_{n\ge1}$ satisfies the mean field condition if
$$\lim_{n\to\infty}\frac1n\sum_{i,j=1}^nA_n(i,j)^2=0. \tag{1.5}$$
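Condition (1.5) was first introduced in [2] to study the limiting behavior of the normalizing constant of Ising and Potts models. In particular, if $A_n$ is the scaled adjacency matrix of a graph $G_n$ as in Definition 1.1, then (1.5) holds iff the average degree $\bar d:=2|E(G_n)|/n$ tends to $\infty$. Indeed, this is because
$$\frac1n\sum_{i,j=1}^nA_n(i,j)^2=\frac1n\cdot2|E(G_n)|\cdot\Big(\frac{n}{2|E(G_n)|}\Big)^2=\frac{n}{2|E(G_n)|}=\frac1{\bar d},$$
which is $o(1)$ iff $\bar d\to\infty$. Thus (1.5) holds in the following examples: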

$G_n$ is a $d_n$-regular graph with $d_n\to\infty$. In this case the average degree equals $d_n$.

$G_n$ is an Erdős-Rényi graph with parameter $p_n$ such that $np_n\to\infty$. In this case the average degree divided by $np_n$ converges to $1$ in probability.

$G_n$ is a convergent sequence of dense graphs converging to a graphon $W$ which is not almost everywhere $0$. In this case the average degree divided by $n$ converges to $\int_0^1\int_0^1W(x,y)\,dx\,dy>0$.

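$G_n$ is a biregular bipartite graph with parameters as in Definition 1.3, such that its average degree tends to $\infty$. In this case $\frac1n\sum_{i,j}A_n(i,j)^2$, the reciprocal of the average degree, tends to $0$, and so (1.5) holds.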
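The identity $\frac1n\sum_{i,j}A_n(i,j)^2=1/\bar d$ behind these examples is easy to confirm numerically. The plain-Python sketch below sums the squared entries of the scaled adjacency matrix explicitly for a cycle (bounded degree, so not mean field) and for the complete graph (average degree $n-1$, so mean field). The graph choices and helper names are illustrative.

```python
def mean_field_statistic(n, edges):
    """(1/n) * sum_{i,j} A_n(i,j)^2 for the scaled adjacency matrix of Definition 1.1."""
    scale = n / (2 * len(edges))
    A = [[0.0] * n for _ in range(n)]
    for i, j in edges:
        A[i][j] = A[j][i] = scale
    return sum(x * x for row in A for x in row) / n

def cycle(n):
    return [(i, (i + 1) % n) for i in range(n)]

def complete_graph(n):
    return [(i, j) for i in range(n) for j in range(i + 1, n)]

for n in (10, 50, 200):
    # cycle: average degree 2, so the statistic stays at 1/2 (bounded degree, not mean field)
    # complete graph: average degree n - 1, so the statistic is 1/(n - 1) -> 0 (mean field)
    print(n, mean_field_statistic(n, cycle(n)), mean_field_statistic(n, complete_graph(n)))
```

The same computation also illustrates the sparse regime: for any bounded-degree sequence the statistic stays bounded away from $0$.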
Definition 1.10.
Given the coupling matrix $A_n$, let $R_i:=\sum_{j=1}^nA_n(i,j)$ denote the $i$-th row sum of $A_n$.
Our next result gives a simple sufficient condition for joint consistency of the pseudolikelihood estimator.
Theorem 1.11.
Note that (1.6) and (1.2) together imply (1.3). Thus if $A_n$ is the scaled adjacency matrix of a graph whose average degree tends to $\infty$, the pseudolikelihood estimator is consistent whenever the graph is slightly irregular. In particular, the pseudolikelihood estimator is consistent in the following examples:

$G_n$ is a convergent sequence of dense graphs converging to a graphon $W$ such that the function $x\mapsto\int_0^1W(x,y)\,dy$ is not almost everywhere constant with respect to Lebesgue measure. In this case the empirical distribution of the row sums of $A_n$ converges to a non-degenerate limit, and so (1.6) holds. Also, as verified above, (1.2) and (1.5) hold as well. Thus the pseudolikelihood estimator is jointly consistent.

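$G_n$ is a biregular bipartite graph with parameters as defined above, such that the average degree tends to $\infty$ and the two parts of the bipartition have asymptotically different sizes, so that the row sums of $A_n$ take two asymptotically different values.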
This raises the natural question as to what happens for regular graphs. The following theorem addresses this question by showing that whenever the coupling matrix is mean field and asymptotically regular, the random variable is .
Theorem 1.12.
Theorem 1.12, together with the upper bound of Theorem 1.7, suggests that the pseudolikelihood estimator may not be consistent for asymptotically regular graphs with degree going to $\infty$. The following theorem confirms this conjecture for the special case when $G_n$ is an Erdős-Rényi graph with parameter $p$ free of $n$.
Theorem 1.13.
Suppose $\sigma$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ is the scaled adjacency matrix (Definition 1.1) of $G_n$, a random graph from $\mathcal G(n,p)$, the Erdős-Rényi graph with parameter $p$ free of $n$. Let be fixed, and let
Let denote the joint law of and on . Then, setting to be product measure on under which , we have that is contiguous to for every . Consequently, under there does not exist any sequence of estimates (functions of ) which is consistent for in (and hence in ).
Remark 1.14.
It was pointed out in [11] that the MLE for $(\beta,B)$ does not exist for the Curie-Weiss model. The above theorem extends this by showing that consistent estimators do not exist when the underlying graph is Erdős-Rényi. Note that setting $p=1$ in the Erdős-Rényi model gives the complete graph on $n$ vertices, which corresponds to the Curie-Weiss model. We conjecture that there are no jointly consistent estimators of both parameters whenever the graph sequence is regular with degree going to $\infty$.
If the average degree of a graph sequence does not go to $\infty$, joint estimation of both parameters at rate $\sqrt n$ is always possible, as shown in the following theorem.
Theorem 1.15.
Note that (1.2) and (1.8) together imply (1.3). To see how (1.8) captures sparse graphs, recall that for any graph $G_n$ with scaled adjacency matrix $A_n$ as in Definition 1.1 and average degree $\bar d:=2|E(G_n)|/n$ we have
$$\frac1n\sum_{i,j=1}^nA_n(i,j)^2=\frac1{\bar d}.$$
Thus if (1.8) holds, then we must have $\limsup_{n\to\infty}\bar d<\infty$, which says that the graph sequence is sparse. In particular, Theorem 1.15 shows consistency when the underlying graph has a uniformly bounded degree sequence, irrespective of whether $G_n$ is regular or not.
To complete the picture, we show that if one of the two parameters is known, then the pseudolikelihood estimator of the other parameter is consistent (Proposition 1.16). Thus joint estimation is indeed a much harder problem than estimation of the individual parameters. The proof of Proposition 1.16 appears in the appendix.
1.2 Interpretation of results for graphs
Even though all our results apply to general matrices with nonnegative entries, the most interesting examples for our theorems arise when $A_n$ is the scaled adjacency matrix of a simple graph $G_n$ as in Definition 1.1, where the conditions also take a simpler form. This subsection describes all our results in this special case. Recall that $(d_1,\ldots,d_n)$ are the degrees of $G_n$, and let $\bar d:=\frac1n\sum_{i=1}^nd_i$ denote the average degree. Also assume that $(\beta,B)\in\Theta$, as has been the case throughout the paper. Finally, note that (1.2) is equivalent to $\max_{1\le i\le n}d_i\le\gamma\bar d$, which we assume throughout this subsection.

For any graph $G_n$, the pseudolikelihood estimator of $\beta$ is consistent if $B$ is known, and vice versa (Proposition 1.16).

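If the average degree $\bar d$ tends to $\infty$ and the graph is somewhat irregular, as captured by the condition
$$\liminf_{n\to\infty}\frac1n\sum_{i=1}^n\Big(\frac{d_i}{\bar d}-1\Big)^2>0,$$
then the pseudolikelihood estimator for $(\beta,B)$ is jointly consistent (Theorem 1.11).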

If the average degree $\bar d$ tends to $\infty$ and the graph is somewhat regular, as captured by the condition
$$\lim_{n\to\infty}\frac1n\sum_{i=1}^n\Big(\frac{d_i}{\bar d}-1\Big)^2=0,$$
then we believe that the pseudolikelihood estimator for $(\beta,B)$ is not jointly consistent (Theorem 1.12). This statement is only suggestive rather than rigorous because Theorem 1.7 provides an upper bound but no matching lower bound.

For the particular case when $G_n$ is an Erdős-Rényi random graph with parameter $p$ free of $n$, no estimator is jointly consistent for $(\beta,B)$ (Theorem 1.13). Thus the estimation problem is indeed harder on asymptotically regular graphs with large degree.

If $G_n$ has bounded average degree $\bar d$, then the pseudolikelihood estimator for $(\beta,B)$ is jointly consistent (Theorem 1.15), irrespective of whether $G_n$ is regular or not.
Figure 1 summarizes the above discussion in the form of a tree.
1.3 Simulation
Our results demonstrate a dichotomy in the joint consistency of $(\hat\beta_n,\hat B_n)$ depending on whether the coupling matrix is approximately regular. We now illustrate this dichotomy through simulation. First, we fix different values of the pair $(\beta,B)$ on the line for . Next, we draw two random regular graphs and with and , with nodes. For each value of $(\beta,B)$, we generate a sample from the Ising model with the scaled adjacency matrices of the graphs and . On each of these samples, we estimate $(\beta,B)$ by solving the bivariate pseudolikelihood equation. We repeat the same experiment with number of nodes , and random regular graphs (). In Figure 2, we plot the corresponding pseudolikelihood estimates of $(\beta,B)$ for and respectively. In both figures, the estimates for the case (resp. ) are colored green (resp. red). Notice that the fit in the case when is more prominent than in the case when .
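The simulation requires samples from (1.1). A standard way to draw them, and an assumption here since this section does not specify the sampler used for Figure 2, is single-site Gibbs sampling: each spin is updated in turn using the conditional probability $\mathbb P_{\beta,B}(\sigma_i=1\mid\sigma_j,\,j\ne i)=(1+\tanh(\beta m_i+B))/2$, which is the conditional law from Definition 1.4. In the plain-Python sketch below, a deterministic circulant $d$-regular graph stands in for the random regular graphs of the simulation, and all function names are hypothetical.

```python
import math
import random

def circulant_regular(n, d):
    """Hypothetical stand-in for a random d-regular graph (d even, d < n):
    vertex i is joined to i +- 1, ..., i +- d/2 (mod n)."""
    half = d // 2
    return [[(i + k) % n for k in range(-half, half + 1) if k != 0] for i in range(n)]

def gibbs_ising(neighbors, scale, beta, B, sweeps, rng):
    """Single-site Gibbs sampler for (1.1) with A_n = scale * adjacency matrix."""
    n = len(neighbors)
    sigma = [rng.choice((-1, 1)) for _ in range(n)]
    for _ in range(sweeps):
        for i in range(n):
            m_i = scale * sum(sigma[j] for j in neighbors[i])
            # P(sigma_i = +1 | rest) = e^x / (e^x + e^-x) = (1 + tanh x) / 2, x = beta m_i + B
            p_plus = 0.5 * (1.0 + math.tanh(beta * m_i + B))
            sigma[i] = 1 if rng.random() < p_plus else -1
    return sigma

rng = random.Random(0)
n, d = 2000, 10
nbrs = circulant_regular(n, d)
# For a d-regular graph, Definition 1.1 gives scale n / (2|E|) = 1/d.
sigma = gibbs_ising(nbrs, 1.0 / d, beta=0.0, B=1.0, sweeps=5, rng=rng)
mean_spin = sum(sigma) / n
print(mean_spin, math.tanh(1.0))   # beta = 0: independent spins with mean tanh(B)
```

With $\beta=0$ the spins are independent with mean $\tanh(B)$, which gives a quick sanity check of the sampler. For $\beta>0$ the number of sweeps needed for the chain to mix is problem dependent; the resulting samples can then be passed to any solver of the pseudolikelihood equation.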
The rest of the paper is organized as follows. Section 2 gives the proof of Theorem 1.7. Section 3 proves Theorems 1.11 and 1.12 with the help of Theorem 3.2, whose proof is deferred to the appendix. Finally, Section 4 gives the proofs of Theorems 1.13 and 1.15. The proof of Proposition 1.16 is also deferred to the appendix.
Acknowledgement
We gratefully acknowledge helpful discussions with Bhaswar B. Bhattacharya and Sourav Chatterjee at various stages of this work.
2 Proof of Theorem 1.7
The following lemma collects estimates that will be used throughout the rest of this paper.
Lemma 2.1.
Suppose $\sigma$ is an observation from the Ising model (1.1), where the coupling matrix $A_n$ satisfies (1.2) and (1.5).
Then setting
and
the following hold:
(2.1)  
(2.2)  
(2.3) 
Proof of Lemma 2.1.
Various versions of these estimates already exist in the literature. In particular, (2.1) follows on invoking [9, Lemma 3.1] or [2, Lemma 3.2] along with the assumption that $A_n$ satisfies (1.5), and (2.2) follows on invoking [9, Lemma 3.2] along with the assumption that $A_n$ satisfies (1.2). Finally, (2.3) is an easy consequence of [19, Lemma 1].
∎
We also need the following lemma to prove Theorem 1.7 and Proposition 1.16. Its proof is deferred to the appendix.
Lemma 2.2.
2.1 Proof of Theorem 1.7

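Setting
$$F_n(\beta,B):=\log\mathrm{PL}(\beta,B)=\sum_{i=1}^n\Big[\sigma_i(\beta m_i+B)-\log\big(e^{\beta m_i+B}+e^{-\beta m_i-B}\big)\Big], \tag{2.4}$$
where $m_i=\sum_{j=1}^nA_n(i,j)\sigma_j$, note that $\nabla F_n(\beta,B)=\big(\sum_{i=1}^nm_i(\sigma_i-\tanh(\beta m_i+B)),\ \sum_{i=1}^n(\sigma_i-\tanh(\beta m_i+B))\big)$. Differentiating the function $F_n$ twice, we get the negative Hessian matrix
$$-\nabla^2F_n(\beta,B)=\sum_{i=1}^nu_i\begin{pmatrix}m_i^2&m_i\\m_i&1\end{pmatrix}, \tag{2.5}$$
where $u_i:=\mathrm{sech}^2(\beta m_i+B)>0$. The determinant of the Hessian is given by
$$\det\big(-\nabla^2F_n(\beta,B)\big)=\Big(\sum_{i=1}^nu_im_i^2\Big)\Big(\sum_{i=1}^nu_i\Big)-\Big(\sum_{i=1}^nu_im_i\Big)^2,$$
which gives
$$\det\big(-\nabla^2F_n(\beta,B)\big)=\frac12\sum_{i,j=1}^nu_iu_j(m_i-m_j)^2. \tag{2.6}$$
The right-hand side of (2.6) is strictly positive whenever $m_1,\ldots,m_n$ are not all equal, and then the diagonal entries of $-\nabla^2F_n$ are positive as well. On this event the Hessian is therefore negative definite, and so the function $F_n$ is strictly concave. To show that there exists a global maximizer $(\hat\beta_n,\hat B_n)$, it thus suffices to show that $F_n(\beta,B)\to-\infty$ as $\|(\beta,B)\|\to\infty$.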
To see this, note that implies there exists such that , and . Since we have
on letting gives A similar argument shows that if , then Finally, it is immediate that for any . Thus there exists a unique global maximum for the function , and so is the unique root of .
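The key algebraic step behind (2.6) is the identity $\big(\sum_iu_im_i^2\big)\big(\sum_iu_i\big)-\big(\sum_iu_im_i\big)^2=\frac12\sum_{i,j}u_iu_j(m_i-m_j)^2$, valid for any reals $m_i$ and weights $u_i$. The short Python check below verifies it on random inputs, with random positive numbers standing in for $u_i=\mathrm{sech}^2(\beta m_i+B)$ and $m_i=\sum_jA_n(i,j)\sigma_j$; it is a numerical sanity check, not a proof, and the helper names are hypothetical.

```python
import random

def det_neg_hessian(m, u):
    """Determinant of [[sum u m^2, sum u m], [sum u m, sum u]]."""
    s0 = sum(u)
    s1 = sum(ui * mi for ui, mi in zip(u, m))
    s2 = sum(ui * mi * mi for ui, mi in zip(u, m))
    return s2 * s0 - s1 * s1

def pairwise_form(m, u):
    """(1/2) * sum_{i,j} u_i u_j (m_i - m_j)^2, the right-hand side of (2.6)."""
    n = len(m)
    return 0.5 * sum(u[i] * u[j] * (m[i] - m[j]) ** 2 for i in range(n) for j in range(n))

rng = random.Random(1)
n = 7
m = [rng.uniform(-2.0, 2.0) for _ in range(n)]
u = [rng.uniform(0.1, 1.0) for _ in range(n)]   # positive stand-ins for sech^2(beta m_i + B)
print(det_neg_hessian(m, u), pairwise_form(m, u))

# If all m_i are equal, both sides vanish and the Hessian is singular.
m_equal = [0.3] * n
print(det_neg_hessian(m_equal, u), pairwise_form(m_equal, u))
```

The second check mirrors the degenerate case treated next, where equal local fields make the maximizer non-unique.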
We will now show that if for some , then the pseudolikelihood estimator is not defined.

On this set we have which implies for all . This implies that , and so the equation is equivalent to
Since the function is convex, it follows that any $(\beta,B)$ satisfying this equation is a global maximizer. Hence, in this case, the set of maximizers is a line in the plane, so the maximizer is not unique and the pseudolikelihood estimator is not defined.

On this set we have
and so the pseudolikelihood equation has no roots; hence the pseudolikelihood estimator is not defined.

Similarly, on this set
