1. Introduction
With the ever-increasing demand for modeling dependent network data in modern statistics, the need for appropriate statistical frameworks for dependent data has grown considerably in the recent past. One useful and mathematically tractable model, originally coined by physicists for describing magnetic spins of particles and later adopted by statisticians for modeling dependent binary data, is the Ising model [1]. It has found immense applications in diverse areas such as image processing [25, 26], spatial statistics [23], and disease mapping in epidemiology [24]. The Ising model is a discrete exponential family on the set of all binary tuples of a fixed length, with sufficient statistic given by a quadratic form, designed to capture pairwise dependence between the binary variables arising from an underlying network structure. However, in most real-life scenarios, pairwise interactions are not enough to capture all the complex dependencies in network data. For example, the behavior of an individual in a peer group depends not just on pairwise interactions, but is a more complex function of higher-order interactions with colleagues. Similarly, in physics, it is known that the atoms on a crystal surface do not just interact in pairs, but in triangles, quadruples and higher-order tuples. A useful framework for capturing such higher-order dependencies is the tensor Ising model [13], where the quadratic interaction term in the sufficient statistic is replaced by a multilinear polynomial of degree $p \geq 2$. Although constructing consistent estimates of the natural parameter in general tensor Ising models is possible [13], more exact inferential tasks such as constructing confidence intervals and hypothesis testing are not possible, unless one imposes additional constraints on the underlying network structure. One such useful structural assumption is that all $p$-tuples of nodes in the underlying network interact, and that too with the same intensity. The corresponding model is called the $p$-tensor Curie-Weiss model [12], which is a discrete exponential family on the hypercube, with probability mass function given by:
(1) $\mathbb{P}_{\beta, p}(\boldsymbol{x}) := \dfrac{1}{2^N Z_N(\beta, p)} \exp\left\{\dfrac{\beta}{N^{p-1}} \sum_{1 \leq i_1, \ldots, i_p \leq N} x_{i_1} x_{i_2} \cdots x_{i_p}\right\}, \qquad \boldsymbol{x} \in \{-1, 1\}^N.$
Here, $Z_N(\beta, p)$ is a normalizing constant required to ensure that the probabilities in (1) add up to $1$, and $\beta \geq 0$ is the natural parameter. It is precisely this inexplicit normalizing constant $Z_N(\beta, p)$, that hinders estimation of the parameter $\beta$ using the maximum likelihood approach. In this context, and also for more general Ising models, an alternative, computationally efficient algorithm was suggested by Chatterjee [11] for the case $p = 2$, which goes by the name of Maximum Pseudolikelihood (MPL) Estimation [16, 17], and is based on computing explicit conditional distributions. To elaborate, the MPL estimate is obtained by maximizing the pseudolikelihood function:
$\hat{\beta}_N^{MPL} := \arg\max_{\beta \geq 0} \prod_{i=1}^{N} \mathbb{P}_{\beta, p}\left(X_i \mid X_j,\, j \neq i\right),$

where $\boldsymbol{X} := (X_1, \ldots, X_N)$ is simulated from the model (1). The consistency of the MPL estimator in the so-called low temperature regime (high values of the parameter $\beta$) was established in [11], and later extended to tensor Ising models for $p \geq 3$ in [13]. More precisely, it is shown in [13] and [12] that there exists a threshold $\beta^*(p) > 0$, such that for all $\beta > \beta^*(p)$, both $\sqrt{N}\,(\hat{\beta}_N^{MPL} - \beta)$ and $\sqrt{N}\,(\hat{\beta}_N^{ML} - \beta)$ are tight, where $\hat{\beta}_N^{ML}$ is the Maximum Likelihood (ML) estimator of $\beta$. Further, consistent estimation (and consistent testing) is impossible in the regime $\beta < \beta^*(p)$. The exact asymptotics of both $\hat{\beta}_N^{MPL}$ and $\hat{\beta}_N^{ML}$ for the model (1) for $\beta$ above the threshold $\beta^*(p)$ were worked out in [13]
, where it is shown that both these statistics converge in distribution to the same normal distribution (see Figure 1). Consequently, both the ML and MPL estimates have the same asymptotic variance everywhere above the threshold, and in fact, both saturate the Cramér-Rao information lower bound in this regime. The goal of this paper is to calculate and compare another notion of efficiency of estimators, called
Bahadur efficiency, for the ML and MPL estimators in model (1). In his seminal paper [2], Bahadur introduced the concept of the slope of a test statistic to calculate the minimum sample size required to ensure its significance at a given level. The setting considered in [2] involved i.i.d. samples coming from a certain parametric family, and the goal was to determine the minimum sample size $N(\alpha)$ required, so that a test statistic $T_n$ (a function of the first $n$ samples) becomes (and remains) significant at level $\alpha$ for all $n \geq N(\alpha)$, i.e. the $p$-value corresponding to $T_n$ becomes (and remains) bounded by $\alpha$ for all $n \geq N(\alpha)$. If one considers testing a simple null hypothesis
$H_0 : \theta = \theta_0$, then the above discussion may be quantified by defining:

$L_n := 1 - F_n(T_n),$

where $T_n$ is the test statistic, and $F_n$ is the cumulative distribution function of $T_n$ under $H_0$. The value $L_n$ typically converges to $0$ exponentially fast with probability $1$ under alternatives $\theta$, and this rate is often an indication of the asymptotic efficiency of $T_n$ against $\theta$ [3, 4, 5, 6, 7]. In particular, if under $\theta$ we have the following with probability $1$:

(2) $\lim_{n \to \infty} -\dfrac{2}{n} \log L_n = c(\theta),$
then one can easily verify that $N(\alpha) \sim 2\log(1/\alpha)/c(\theta)$ (see Proposition 8 in [2]) as $\alpha \to 0$. The constant $c(\theta)$ is called the Bahadur slope of $T_n$ at $\theta$. However, as mentioned in [2], it is in general a non-trivial problem to determine the existence of the limit in (2), and to evaluate it. This issue is addressed in two steps in [2], where it is shown that if $T_n$ satisfies the following two conditions:

For every alternative $\theta$, $T_n/\sqrt{n} \to b(\theta)$ as $n \to \infty$ under $\theta$ with probability $1$, for some parametric function $b$ defined on the alternative space,

$n^{-1} \log\left[1 - F_n(\sqrt{n}\, t)\right] \to -f(t)$ as $n \to \infty$ for every $t$ in an open interval $I$ which includes each value of $b(\theta)$, where $f$ is a continuous function on $I$, with $0 < f(t) < \infty$,
then the Bahadur slope exists for every alternative $\theta$, and is given by $c(\theta) = 2 f(b(\theta))$ (see [2]). In this context, let us mention that if the convergence in (2) holds in probability, then $c(\theta)$ is called the weak Bahadur slope of $T_n$ (see [8]). Finally, if we have two competing estimators $T_{1,n}$ and $T_{2,n}$ estimating the same parameter $\theta$, then the Bahadur asymptotic relative efficiency (ARE) is given by the ratio of their Bahadur slopes (see [8]):

$\mathrm{ARE}_{1,2}(\theta) := c_1(\theta)/c_2(\theta).$
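To fix ideas, here is a minimal numerical sketch (not part of the original development) of the convergence (2) and of the relation $N(\alpha) \sim 2\log(1/\alpha)/c(\theta)$, in the textbook setting of i.i.d. $N(\theta, 1)$ samples with $T_n = \sqrt{n}\,\overline{X}_n$ for testing $H_0 : \theta = 0$; here $b(\theta) = \theta$, $f(t) = t^2/2$, so the slope is $c(\theta) = 2 f(b(\theta)) = \theta^2$. All function names and parameter values below are our illustrative choices.

```python
import math
import random

def log_sf_normal(x):
    """log of the standard normal survival function 1 - Phi(x).
    Uses math.erfc when it does not underflow, else the standard
    asymptotic approximation log(1 - Phi(x)) ~ -x^2/2 - log(x*sqrt(2*pi))."""
    if x < 30:
        return math.log(0.5 * math.erfc(x / math.sqrt(2)))
    return -0.5 * x * x - math.log(x * math.sqrt(2 * math.pi))

def simulated_weak_slope(theta, n, seed=0):
    """One realization of -(2/n) log L_n for T_n = sqrt(n) * Xbar_n,
    based on n i.i.d. N(theta, 1) samples; L_n = 1 - Phi(T_n) under H0."""
    rng = random.Random(seed)
    xbar = sum(rng.gauss(theta, 1) for _ in range(n)) / n
    t_n = math.sqrt(n) * xbar
    return -2.0 * log_sf_normal(t_n) / n

theta = 1.0
est = simulated_weak_slope(theta, n=5000)
alpha = 0.01
n_alpha = 2 * math.log(1 / alpha) / est  # Bahadur's approximate minimum sample size
print(est, n_alpha)  # est should be close to the slope theta^2 = 1
```

With the empirical rate in hand, $2\log(1/\alpha)/c(\theta)$ gives a rough minimum sample size for significance at level $\alpha$, which is the quantity studied for the ML and MPL tests below.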
In this paper, we will derive the weak Bahadur slopes of both $\hat{\beta}_N^{MPL}$ and $\hat{\beta}_N^{ML}$ in the tensor Curie-Weiss model (1). This will, in turn, enable one to compute the Bahadur relative efficiency between any of these two estimators and some other reference estimator. Similar results have been derived in [14] in the context of Markov random fields on lattices, and in [22] in the context of $d$-dimensional nearest-neighbor isotropic Ising models, but to the best of our knowledge, this is the first such work on tensor Curie-Weiss models. Our basic tool will be some recent results on large deviations of the average magnetization in the Curie-Weiss model, established in [9] and [10]. Also, throughout the rest of the paper, we will view the entries of the tuple $\boldsymbol{X}$ as dependent samples, and refer to the length $N$ of $\boldsymbol{X}$ as the sample size (although, technically speaking, we have just one multivariate sample from the model (1)). One of our most interesting findings is that for $p \geq 3$, the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree for each value of the null parameter in a neighborhood of the threshold, provided the alternative parameter is sufficiently large. This loss of Bahadur efficiency for the MPL estimator near the threshold can be attributed to its functional form, which derives false signal from a regime where the average magnetization is very close to $0$.
The rest of the paper is organized as follows. In Section 2, we derive the Bahadur slopes and the optimal sample sizes for the tests based on the ML and the MPL estimators. In Section 3, we provide numerical illustrations of our theoretical findings in various settings. Finally, proofs of some technical results needed for showing the main theorem (Theorem 1) in Section 2 are given in the appendix.
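As a concrete illustration of the objects studied in the sequel, the following self-contained sketch draws an approximate sample from model (1) via Glauber dynamics (our choice of sampler; none is prescribed by the paper) and evaluates the closed-form MPL estimate $\tanh^{-1}(\overline{x}_N)/(p\,\overline{x}_N^{\,p-1})$ recalled in Section 2. All parameter values, burn-in lengths, and function names are illustrative assumptions.

```python
import math
import random

def glauber_magnetization(n_spins, beta, p, burn_in=200, keep=100, seed=1):
    """Run Glauber dynamics for the p-tensor Curie-Weiss model (1)
    (Hamiltonian beta * N * (average magnetization)^p) and return the
    average magnetization over the last `keep` sweeps, after `burn_in`
    sweeps started from the all-(+1) configuration."""
    rng = random.Random(seed)
    x = [1] * n_spins
    total = n_spins
    recorded = []
    for sweep in range(burn_in + keep):
        for i in range(n_spins):
            s_rest = total - x[i]  # sum of all the other spins
            # exact conditional log-odds of x_i = +1 given the rest
            log_odds = beta * ((s_rest + 1) ** p - (s_rest - 1) ** p) / n_spins ** (p - 1)
            new_spin = 1 if rng.random() < 1.0 / (1.0 + math.exp(-log_odds)) else -1
            total += new_spin - x[i]
            x[i] = new_spin
        if sweep >= burn_in:
            recorded.append(total / n_spins)
    return sum(recorded) / len(recorded)

def mpl_estimate(m, p):
    """Closed-form MPL estimate arctanh(m) / (p * m^(p - 1))."""
    return math.atanh(m) / (p * m ** (p - 1))

p, beta = 3, 0.75                 # beta above the p = 3 threshold (~0.672)
m = glauber_magnetization(400, beta, p)
print(mpl_estimate(m, p))         # roughly recovers the true beta for large N
```

Averaging the magnetization over several post-burn-in sweeps (rather than using a single configuration) is a stabilizing choice on our part; it keeps the plug-in estimate away from the singularity of $\tanh^{-1}$ at $\pm 1$.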
2. Theoretical Results
In this section, we derive the weak Bahadur slopes of the MPL and ML estimators of $\beta$ in the model (1). The ML estimator does not have an explicit form, but it is shown in [13] that the MPL estimator is given by:

$\hat{\beta}_N^{MPL} = \dfrac{\tanh^{-1}\left(\overline{X}_N\right)}{p\, \overline{X}_N^{\,p-1}}, \qquad \text{where } \overline{X}_N := \dfrac{1}{N} \sum_{i=1}^{N} X_i.$
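As a quick sanity check on the closed form $\hat{\beta}_N^{MPL} = \tanh^{-1}(\overline{X}_N)/(p\,\overline{X}_N^{\,p-1})$: above the threshold, the limiting positive magnetization $m_*$ satisfies the stationarity condition $m_* = \tanh(\beta p m_*^{p-1})$, so plugging $m_*$ into the closed form returns $\beta$ exactly, consistent with consistency of the estimator. A short numerical verification (function names ours):

```python
import math

def positive_fixed_point(beta, p, tol=1e-12):
    """Solve m = tanh(beta * p * m^(p-1)) for the positive root by
    fixed-point iteration started at m = 1 (the positive root is
    stable for beta above the threshold)."""
    m = 1.0
    for _ in range(10_000):
        m_new = math.tanh(beta * p * m ** (p - 1))
        if abs(m_new - m) < tol:
            return m_new
        m = m_new
    return m

def mpl_closed_form(m, p):
    """arctanh(m) / (p * m^(p-1)) evaluated at magnetization m."""
    return math.atanh(m) / (p * m ** (p - 1))

for p, beta in [(2, 0.8), (3, 0.75), (4, 0.9)]:
    m_star = positive_fixed_point(beta, p)
    print(p, beta, mpl_closed_form(m_star, p))  # recovers beta in each case
```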
Furthermore, it is shown in [12] and [13] that both the ML and MPL estimators have the same asymptotic normal distribution:

$\sqrt{N}\left(\hat{\beta}_N - \beta\right) \xrightarrow{d} N\left(0,\, \sigma^2(\beta, p)\right)$

for all $\beta > \beta^*(p)$, where $\hat{\beta}_N$ is either $\hat{\beta}_N^{MPL}$ or $\hat{\beta}_N^{ML}$,

(3) $\sigma^2(\beta, p) := \dfrac{1}{p^2 m_*^{2p-2}}\left(\dfrac{1}{1 - m_*^2} - \beta p (p-1)\, m_*^{p-2}\right),$

$m_* = m_*(\beta, p)$ is the unique positive global maximizer of the function $H_{\beta, p}(x) := \beta x^p - I(x)$ on $[-1, 1]$, and

$I(x) := \dfrac{1+x}{2} \log(1+x) + \dfrac{1-x}{2} \log(1-x), \qquad x \in [-1, 1].$
A few initial values of the threshold are $\beta^*(2) = 0.5$ and $\beta^*(3) \approx 0.672$. The exact value of $\beta^*(p)$ is in general inexplicit, but $\beta^*(p) \to \log 2$ as $p \to \infty$ (see Lemma A.1 in [13]).
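These threshold values can be reproduced numerically. One standard characterization (our derivation, not quoted from [13]) is that $\beta^*(p)$ is the smallest $\beta$ for which $\beta x^p - I(x)$ attains a positive value on $(0, 1)$; equivalently, $\beta^*(p) = \inf_{0 < m < 1} I(m)/m^p$. A brute-force grid minimization recovers $\beta^*(2) = 0.5$, $\beta^*(3) \approx 0.672$ and $\beta^*(4) \approx 0.689$:

```python
import math

def rate_I(x):
    """I(x) = ((1+x)/2) log(1+x) + ((1-x)/2) log(1-x), the entropy term in H."""
    return 0.5 * (1 + x) * math.log(1 + x) + 0.5 * (1 - x) * math.log(1 - x)

def threshold(p, grid=100_000):
    """Numerical infimum of I(m) / m^p over m in (0, 1)."""
    return min(rate_I(k / grid) / (k / grid) ** p for k in range(1, grid))

thresholds = {p: threshold(p) for p in (2, 3, 4)}
print(thresholds)  # values increase in p toward log 2 ~ 0.6931
```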
In this paper, we will consider testing the hypothesis:

$H_0 : \beta = \beta_0 \quad \text{versus} \quad H_1 : \beta > \beta_0$

for some known $\beta_0 > \beta^*(p)$. The most powerful test is based on the sufficient statistic $\overline{X}_N^{\,p}$, and its asymptotic power is derived in [18]. Clearly, one can think of using the statistic $T_N := \sqrt{N}\, \hat{\beta}_N$ for testing the above hypotheses, where $\hat{\beta}_N$ is either $\hat{\beta}_N^{MPL}$ or $\hat{\beta}_N^{ML}$, and large values of $T_N$ will denote significance. The main goal of this section is to prove conditions (1) and (2) in [2] for deriving the exact Bahadur slope of $T_N$. We begin with the proof of condition (1), with almost sure convergence replaced by convergence in probability.
Lemma 1.
Under every $\beta > \beta^*(p)$, we have:

$\dfrac{T_N}{\sqrt{N}} \xrightarrow{P} \beta,$

where $T_N := \sqrt{N}\, \hat{\beta}_N$, and $\hat{\beta}_N$ is either $\hat{\beta}_N^{MPL}$ or $\hat{\beta}_N^{ML}$.
Proof.
Note that $T_N/\sqrt{N} = \hat{\beta}_N$, where $\hat{\beta}_N$ is either $\hat{\beta}_N^{MPL}$ or $\hat{\beta}_N^{ML}$. It follows from [12] and [13], that under every $\beta > \beta^*(p)$, $\hat{\beta}_N \xrightarrow{P} \beta$. This proves Lemma 1.
∎
We will now prove condition (2). For this, we will need the following lemma on the large deviations of $\overline{X}_N$, which follows from [10].
Lemma 2.

For every set $A \subseteq [-1, 1]$ whose interior is dense in its closure, we have:

$\lim_{N \to \infty} \dfrac{1}{N} \log \mathbb{P}_{\beta_0, p}\left(\overline{X}_N \in A\right) = -\inf_{x \in A}\left[\sup_{y \in [-1, 1]} H_{\beta_0, p}(y) - H_{\beta_0, p}(x)\right],$

where $H_{\beta_0, p}(x) := \beta_0 x^p - \frac{1+x}{2}\log(1+x) - \frac{1-x}{2}\log(1-x)$.
Proof.
We now state and prove the main result of this paper, concerning the Bahadur slopes of the tests based on the MPL and ML estimators, and the minimum sample sizes required to ensure their significance. Towards this, we define a function as:
Theorem 1.
The Bahadur slopes of the tests based on $\hat{\beta}_N^{MPL}$ and $\hat{\beta}_N^{ML}$ for the model (1) at an alternative $\beta > \beta_0$ are respectively given by:
Consequently, the minimum sample sizes required, so that the tests based on $\hat{\beta}_N^{MPL}$ and $\hat{\beta}_N^{ML}$ become (and remain) significant at level $\alpha$, are respectively given by:
Proof.
We deal with the test based on the MPL estimator first. To begin with, note that $T_N/\sqrt{N} = \hat{\beta}_N^{MPL}$. Fix $t > 0$, whence we have by Lemma 2:
where the last step follows from Lemma 2, since it follows from the proof of Lemma 5 that the set in question is a union of finitely many disjoint, non-degenerate intervals, and hence, its interior is dense in its closure.
In view of the above discussion, we conclude that the function $f$ in condition (2) is given by:
Since the relevant functions are continuous (by Lemma 6) on their respective domains, we conclude (in view of Lemma 5) that $f$ is continuous on an open neighborhood of $b(\beta)$. Also, in view of Lemma 5, the argument given below for the ML estimator, and the fact that $b(\beta) > 0$, it will follow that $0 < f < \infty$ on a nonempty open neighborhood of $b(\beta)$.
The Bahadur slope of the test based on the MPL estimator at an alternative $\beta > \beta_0$ is then given by $2 f(b(\beta))$. This completes the proof of Theorem 1 for the test based on the MPL estimator.
For the test based on the ML estimator, note that for every $t > 0$, we have:
The last step follows from the following facts:

The function $\beta \mapsto \log Z_N(\beta, p)$ is strictly convex in $\beta$ (Lemma C.5 in [12]), and hence, its derivative $\frac{\partial}{\partial \beta} \log Z_N(\beta, p)$ is strictly increasing in $\beta$;

The ML equation is given by:

$\dfrac{\partial}{\partial \beta} \log Z_N\left(\hat{\beta}_N^{ML}, p\right) = N\, \overline{X}_N^{\,p}.$
Now, it follows from [12] and the dominated convergence theorem that
Fix $\varepsilon > 0$, to begin with. Then, there exists $N_0 \geq 1$, such that
for all $N \geq N_0$. Let us first consider the case where $p$ is odd. We then have the following by Lemma 2:

Similarly, we also have:
Since $\varepsilon$ can be arbitrarily small, and the limiting function is continuous, we must have for all odd $p$:
(5) 
Next, suppose that $p$ is even. In this case, $\boldsymbol{X}$ and $-\boldsymbol{X}$ have the same distribution, and hence, so do $\overline{X}_N$ and $-\overline{X}_N$. Hence, for every positive real number $t$, we have:
Hence, the same argument as for the case of odd $p$ also works here, showing that (5) holds when $p$ is even, too.
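The sign-flip symmetry used above can be checked by exact enumeration for a small system: for even $p$, the weight $\exp(\beta N \overline{x}_N^{\,p})$ in (1) is invariant under $\boldsymbol{x} \mapsto -\boldsymbol{x}$, so $\overline{X}_N$ and $-\overline{X}_N$ have the same distribution, and $\mathbb{P}(|\overline{X}_N| \geq t) = 2\, \mathbb{P}(\overline{X}_N \geq t)$ for every $t > 0$. A brute-force check with the illustrative values $N = 10$, $p = 4$, $\beta = 0.7$:

```python
import math
from itertools import product

def magnetization_pmf(n_spins, beta, p):
    """Exact distribution of the average magnetization under model (1),
    by enumerating all 2^N configurations (unnormalized weight exp(beta*N*m^p))."""
    pmf = {}
    for x in product([-1, 1], repeat=n_spins):
        m = sum(x) / n_spins
        pmf[m] = pmf.get(m, 0.0) + math.exp(beta * n_spins * m ** p)
    z = sum(pmf.values())
    return {m: w / z for m, w in pmf.items()}

pmf = magnetization_pmf(10, 0.7, 4)       # p even
sym_gap = max(abs(pmf[m] - pmf[-m]) for m in pmf)
t = 0.5
two_sided = sum(q for m, q in pmf.items() if abs(m) >= t)
one_sided = sum(q for m, q in pmf.items() if m >= t)
print(sym_gap, two_sided, 2 * one_sided)  # sym_gap ~ 0; two-sided = 2 * one-sided
```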
In view of the above discussion, we conclude that the function $f$ in condition (2) is given by:
The following result compares the Bahadur slopes and the optimal sample sizes for the tests based on the MPL and the ML estimators.
Corollary 1.
For every and , we have
For , and , we have the following:
(6) 
A sufficient condition for (6) to hold for all , is . On the other hand, for every , for all large enough, in which case, the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree. Further, for every and every fixed , we have:
(7) 
Proof.
The result for follows directly from Lemma 5. For , it follows from Lemma 4 that on , and hence, on . Consequently,
(8) 
(6) now follows from Lemma 5. Now, it follows from (8) that
This shows that the condition is sufficient to ensure equality of the Bahadur slopes and the optimal sample sizes. On the other hand, if , then . Since (by Lemma 6), we must have for all large enough, which shows, in view of (6), that the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree in this case.
Note that for $p \geq 3$ and fixed $\beta_0$, the set of all alternatives $\beta$ satisfying equality of the Bahadur slopes and the optimal sample sizes for the ML and MPL estimates, is given by:
where . The reason behind the discrepancy between the efficiencies of the ML and MPL estimators near the threshold is the functional form of the latter. For $p \geq 3$, unlike the ML estimator, the MPL estimator takes very high values if the average magnetization is close to $0$. This false signal, coming from the average magnetization lying in a region very close to $0$, leads to an increase in the null probability of the MPL estimator exceeding the observed MPL estimate, thereby inflating its $p$-value. This inflation occurs only in a close neighborhood of the threshold, because for lower values of the null parameter $\beta_0$, there is a higher probability that the average magnetization is small.
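The blow-up described above is easy to see directly from the MPL functional $g_p(m) := \tanh^{-1}(m)/(p\, m^{p-1})$: as $m \to 0$, $g_2(m) \to 1/2$, while $g_p(m) \approx m^{2-p}/p \to \infty$ for every $p \geq 3$. A quick numerical illustration (function names ours):

```python
import math

def g(m, p):
    """The MPL functional arctanh(m) / (p * m^(p-1)) at magnetization m."""
    return math.atanh(m) / (p * m ** (p - 1))

for m in (0.2, 0.1, 0.01, 0.001):
    print(m, g(m, 2), g(m, 3))
# p = 2: values approach 1/2 as m -> 0; p = 3: values blow up like 1/(3m)
```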
It follows from Lemma C.12 in [12] that
Since eventually becomes as approaches from the right, the rate at which approaches as for , is determined just by the term in the denominator of the formula for , and is given by .
It follows from the proof of Corollary 1 that for a fixed ,
Another interesting phenomenon is that for fixed , is a nonzero constant for for some . This implies that as long as , once the separation between the null and alternative parameters exceeds a certain finite value, the optimal sample size requirement for the MPL estimator does not decrease with further increase in the separation, unlike that for the ML estimator; this is an undesirable property of the MPL estimator.