Bahadur Efficiency in Tensor Curie-Weiss Models

The tensor Ising model is a discrete exponential family used for modeling binary data on networks with not just pairwise, but higher-order dependencies. In this exponential family, the sufficient statistic is a multi-linear form of degree p≥ 2, designed to capture p-fold interactions between the binary variables sitting on the nodes of a network. A particularly useful class of tensor Ising models are the tensor Curie-Weiss models, where one assumes that all p-tuples of nodes interact with the same intensity. Computing the maximum likelihood estimator (MLE) is computationally cumbersome in this model, due to the presence of an inexplicit normalizing constant in the likelihood, for which the standard alternative is to use the maximum pseudolikelihood estimator (MPLE). Both the MLE and the MPLE are consistent estimators of the natural parameter, provided the latter lies strictly above a certain threshold, which is slightly below log 2, and approaches log 2 as p increases. In this paper, we compute the Bahadur efficiencies of the MLE and the MPLE above the threshold, and derive the optimal sample size (number of nodes) needed for either of these tests to achieve significance. We show that the optimal sample size for the MPLE and the MLE agree if either p=2 or the null parameter is greater than or equal to log 2. On the other hand, if p≥ 3 and the null parameter lies strictly between the threshold and log 2, then the two differ for sufficiently large values of the alternative. In particular, for every fixed alternative above the threshold, the Bahadur asymptotic relative efficiency of the MLE with respect to the MPLE goes to ∞ as the null parameter approaches the threshold. We also provide graphical presentations of the exact numerical values of the theoretical optimal sample sizes in different settings.


1. Introduction

With the ever increasing demand for modeling dependent network data in modern statistics, there has been a noticeable rise in the need for appropriate statistical frameworks for such data. One such useful and mathematically tractable model, originally introduced by physicists for describing magnetic spins of particles and later used by statisticians for modeling dependent binary data, is the Ising model [1]. It has found immense applications in diverse areas such as image processing [26], neural networks [25], spatial statistics [23], and disease mapping in epidemiology [24].

The Ising model is a discrete exponential family on the set of all binary tuples of a fixed length, with sufficient statistic given by a quadratic form, designed to capture pairwise dependence between the binary variables arising from an underlying network structure. However, in most real-life scenarios, pairwise interactions are not enough to capture all the complex dependencies in network data. For example, the behavior of an individual in a peer group depends not just on pairwise interactions, but is a more complex function of higher order interactions with colleagues. Similarly, in physics, it is known that the atoms on a crystal surface do not just interact in pairs, but in triangles, quadruples and higher order tuples. A useful framework for capturing such higher order dependencies is the p-tensor Ising model [13], where the quadratic interaction term in the sufficient statistic is replaced by a multilinear polynomial of degree p. Although constructing consistent estimates of the natural parameter in general p-tensor Ising models is possible [13], more exact inferential tasks such as constructing confidence intervals and hypothesis testing are not possible, unless one imposes additional constraints on the underlying network structure. One such useful structural assumption is that all p-tuples of nodes in the underlying network interact, all with the same intensity. The corresponding model is called the p-tensor Curie-Weiss model [12], which is a discrete exponential family on the hypercube {−1,1}^N, with probability mass function given by:

P_β(x) = (1/Z_N(β)) exp( (β/N^{p−1}) ∑_{i_1,…,i_p=1}^{N} x_{i_1} ⋯ x_{i_p} ),    x = (x_1,…,x_N) ∈ {−1,1}^N.          (1)

Here Z_N(β) is a normalizing constant required to ensure that the probabilities in (1) add up to 1, and β is the parameter of interest. It is precisely this inexplicit normalizing constant Z_N(β) that hinders estimation of the parameter β using the maximum likelihood approach. In this context, and also for more general Ising models, an alternative, computationally efficient algorithm was suggested by Chatterjee [11] for the case p = 2, which goes by the name of Maximum Pseudolikelihood (MPL) Estimation [16, 17], and is based on computing explicit conditional distributions. To elaborate, the MPL estimate β̂_MPL is obtained by maximizing the pseudolikelihood function:

∏_{i=1}^{N} P_β(X_i | X_j, j ≠ i),

where X = (X_1, …, X_N) is simulated from the model (1). The √N-consistency of the MPL estimator in the so-called low temperature regime (high values of the parameter β) was established in [11], and later extended to tensor Ising models with p ≥ 3 in [13]. More precisely, it is shown in [13] and [12] that there exists a threshold β*(p), such that for all β > β*(p), both √N(β̂_MPL − β) and √N(β̂_ML − β) are tight, where β̂_ML denotes the Maximum Likelihood (ML) estimator of β. Further, consistent estimation (and consistent testing) is impossible in the regime β < β*(p).
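As a concrete illustration of the pseudolikelihood recipe described above, the following is a minimal numerical sketch (our own, not taken from [11] or [13]); it assumes that the exponent in (1) can be written as βN m(x)^p, with m(x) the average of the ±1 spins, and uses the conditional distributions implied by that form.

import numpy as np
from scipy.optimize import minimize_scalar

def neg_log_pseudolikelihood(beta, x, p):
    # Negative log-pseudolikelihood under the assumed exponent beta * N * m(x)^p.
    N = len(x)
    m_minus_i = x.mean() - x / N          # leave-one-out averages (1/N) * sum_{j != i} x_j
    # log-odds of x_i = +1 versus x_i = -1 given the remaining spins
    log_odds = beta * N * ((m_minus_i + 1.0 / N) ** p - (m_minus_i - 1.0 / N) ** p)
    # log P(x_i | rest) = x_i * log_odds / 2 - log(2 cosh(log_odds / 2))
    loglik = 0.5 * x * log_odds - np.logaddexp(0.5 * log_odds, -0.5 * log_odds)
    return -loglik.sum()

def mpl_estimate(x, p, beta_max=5.0):
    # One-dimensional maximization of the pseudolikelihood over beta.
    res = minimize_scalar(neg_log_pseudolikelihood, bounds=(0.0, beta_max),
                          args=(x, p), method="bounded")
    return res.x

# Toy usage on a strongly magnetized spin configuration (not simulated from the model).
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=500, p=[0.1, 0.9])
print(mpl_estimate(x, p=3))

The point of the sketch is only that the pseudolikelihood factorizes over coordinates and involves no normalizing constant, so computing the MPL estimate reduces to a cheap one-dimensional search.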

The exact asymptotics of both β̂_MPL and β̂_ML for the model (1), for β above the threshold β*(p), were worked out in [13], where it is shown that both these statistics converge in distribution to the same normal distribution (see Figure 1). Consequently, both the ML and MPL estimates have the same asymptotic variance everywhere above the threshold, and in fact, both saturate the Cramér-Rao information lower bound in this regime. The goal of this paper is to calculate and compare another notion of efficiency of estimators, called Bahadur efficiency, for the ML and MPL estimators in model (1).

Figure 1. Histogram of √N(β̂_MPL − β), where β̂_MPL is the MPL estimator in the 4-tensor Curie-Weiss model, at a parameter value above the estimation threshold [13].

In his seminal paper [2], Bahadur introduced the concept of the slope of a test statistic to calculate the minimum sample size required to ensure its significance at a given level. The setting considered in [2] involved i.i.d. samples X_1, X_2, … coming from a certain parametric family {P_θ}, and the goal was to detect the minimum sample size N(α) required, so that a test statistic T_n (a function of the first n samples) becomes (and remains) significant at level α for all n ≥ N(α), i.e. the p-value corresponding to T_n becomes (and remains) bounded by α for all n ≥ N(α). If one considers testing a simple null hypothesis H_0: θ = θ_0, then the above discussion may be quantified by defining the p-value:

K_n := 1 − F_n(T_n),

where T_n = T_n(X_1, …, X_n) and F_n is the cumulative distribution function of T_n under H_0. The p-value K_n typically converges to 0 exponentially fast with probability 1 under an alternative θ, and this rate is often an indication of the asymptotic efficiency of T_n against θ [3, 4, 5, 6, 7]. In particular, if under θ we have the following with probability 1:

−(2/n) log K_n → c_T(θ) as n → ∞,          (2)

then one can easily verify that N(α)/(2 log(1/α)) → 1/c_T(θ) as α → 0 (see Proposition 8 in [2]). The constant c_T(θ) is called the Bahadur slope of T at θ. However, as mentioned in [2], it is in general a non-trivial problem to determine the existence of the Bahadur slope in (2), and to evaluate it. This issue is addressed in two steps in [2], where it is shown that if T_n satisfies the following two conditions:

  1. For every alternative θ, T_n/√n → b(θ) as n → ∞ under θ with probability 1, for some parametric function b(·) defined on the alternative space,

  2. (1/n) log[1 − F_n(√n t)] → −f(t) as n → ∞ for every t in an open interval which includes each value b(θ), where f is a continuous positive function on this interval,

then the Bahadur slope exists for every alternative θ, and is given by c_T(θ) = 2 f(b(θ)) (see [2]). In this context, let us mention that if the convergence (2) holds in probability, then c_T(θ) is called the weak Bahadur slope of T (see [8]). Finally, if we have two competing estimators T_{1,n} and T_{2,n} estimating the same parameter θ, then the Bahadur asymptotic relative efficiency (ARE) of T_1 with respect to T_2 is given by the ratio of their Bahadur slopes (see [8]):

ARE(T_1, T_2; θ) = c_{T_1}(θ) / c_{T_2}(θ).
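To fix ideas, here is a small worked example using only the definitions above (the numbers are ours and purely illustrative): if a statistic T_1 has slope c_{T_1}(θ) = 0.1 at a fixed alternative θ, then remaining significant at level α = 0.01 requires roughly

N_{T_1}(α) ≈ 2 log(1/α) / c_{T_1}(θ) = 2 log(100) / 0.1 ≈ 9.21 / 0.1 ≈ 92

samples, whereas a competing statistic T_2 with slope 0.2 needs only about 46; the ratio 92/46 = 2 is exactly the Bahadur ARE c_{T_2}(θ)/c_{T_1}(θ) = 0.2/0.1 of T_2 with respect to T_1.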

In this paper, we will derive the weak Bahadur slopes of both β̂_MPL and β̂_ML in the tensor Curie-Weiss model (1). This will, in turn, enable one to compute the Bahadur relative efficiency between either of these two estimators and some other reference estimator. Similar results have been derived in [14] in the context of Markov random fields on lattices, and in [22] in the context of d-dimensional nearest neighbor isotropic Ising models, but to the best of our knowledge, this is the first such work on tensor Curie-Weiss models. Our basic tool will be some recent results on large deviations of the average magnetization in the Curie-Weiss model, established in [9] and [10]. Also, throughout the rest of the paper, we will view the entries of the tuple X = (X_1, …, X_N) as dependent samples, and refer to the length N of X as the sample size (although technically speaking, we have just one multivariate sample from the model (1)). One of our most interesting findings is that for p ≥ 3, the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree for each value of the null parameter in (β*(p), log 2), provided the alternative parameter is sufficiently large. This loss of Bahadur efficiency for the MPL estimator near the threshold can be attributed to its functional form, which derives a false signal from a regime where the average magnetization is very close to 0.

The rest of the paper is organized as follows. In Section 2, we derive the Bahadur slopes and the optimal sample sizes for the tests based on the ML and the MPL estimators. In Section 3, we provide numerical illustrations of our theoretical findings in various settings. Finally, proofs of some technical results needed for proving the main theorem (Theorem 1) in Section 2 are given in the appendix.

2. Theoretical Results

In this section, we derive the weak Bahadur slopes of the MPL and ML estimators of β in the model (1). The ML estimator does not have an explicit form, but it is shown in [13] that the MPL estimator is given by:

Furthermore, it is shown in [12] and [13] that both the ML and MPL estimators have the same asymptotic normal distribution:

for all β > β*(p), where β̂_N is either β̂_MPL or β̂_ML,

(3)

is the unique positive global maximizer of , and

A few initial values of the threshold are and . The exact value of β*(p) is in general inexplicit, but β*(p) → log 2 as p → ∞ (see Lemma A.1 in [13]).
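Since β*(p) is inexplicit, the following numerical sketch (our own illustration, not quoted from [13]) approximates it under an assumed variational characterization: β*(p) is taken to be the smallest β at which βx^p − I(x), with I(x) = ((1+x)/2)log(1+x) + ((1−x)/2)log(1−x), becomes nonnegative at some x > 0, and the positive maximizer of this function plays the role of the quantity referred to in (3). The computed values should approach log 2 as p grows, consistent with Lemma A.1 in [13].

import numpy as np

GRID = np.linspace(1e-6, 1 - 1e-9, 100_000)

def rate_I(x):
    # Assumed rate function I(x) = ((1+x)/2) log(1+x) + ((1-x)/2) log(1-x).
    return 0.5 * ((1 + x) * np.log1p(x) + (1 - x) * np.log1p(-x))

def sup_H(beta, p):
    # sup over x in (0, 1) of beta * x^p - I(x), evaluated on a fine grid.
    return np.max(beta * GRID ** p - rate_I(GRID))

def threshold(p, tol=1e-6):
    # Bisection for the smallest beta with sup_H(beta, p) >= 0 (assumed characterization).
    lo, hi = 0.0, 1.0                     # the threshold lies below log 2 < 1
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        lo, hi = (lo, mid) if sup_H(mid, p) >= 0 else (mid, hi)
    return hi

def m_star(beta, p):
    # Positive global maximizer of beta * x^p - I(x) (assumed form of the maximizer in (3)).
    return GRID[np.argmax(beta * GRID ** p - rate_I(GRID))]

for p in (2, 3, 4, 10):
    b = threshold(p)
    print(p, round(b, 4), round(m_star(b + 0.05, p), 3), "log 2 =", round(np.log(2), 4))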

Figure 2. Optimal sample size for the tests based on MLE and MPLE with varying (with logarithmic vertical scale).
Figure 3. Optimal sample size for the tests based on MLE and MPLE with varying .
Figure 4. Optimal sample size for the tests based on MLE and MPLE with varying .

In this paper, we will consider testing the hypothesis

H_0: β = β_0   against   H_1: β > β_0

for some known β_0 > β*(p). The most powerful test is based on the sufficient statistic, and its asymptotic power is derived in [18]. Clearly, one can think of using the statistic T_N := √N(β̂_N − β_0) for testing the above hypotheses, where β̂_N is either β̂_MPL or β̂_ML, and large values of T_N will denote significance. The main goal of this section is to verify conditions (1) and (2) in [2] for deriving the exact Bahadur slope of T_N. We begin with the proof of condition (1), with almost sure convergence replaced by convergence in probability.
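As an aside, because the exponent in (1) depends on the configuration only through the average magnetization, the null distribution of the sufficient statistic is explicit, and the exact p-value of the one-sided test based on it can be computed by direct summation. The sketch below is our own illustration, again under the assumed exponent βN m^p (for even p one would work with |m| because of the spin-flip symmetry); none of its ingredients are quoted from the paper.

import numpy as np
from scipy.special import gammaln

def magnetization_pmf(N, beta, p):
    # Exact pmf of the average magnetization m = (2k - N)/N, k = 0, ..., N,
    # under the assumed weights C(N, k) * exp(beta * N * m^p).
    k = np.arange(N + 1)
    m = (2 * k - N) / N
    logw = gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1) + beta * N * m ** p
    logw -= logw.max()                    # stabilize before exponentiating
    w = np.exp(logw)
    return m, w / w.sum()

def exact_pvalue(m_obs, N, beta0, p):
    # One-sided p-value P_{beta0}(magnetization >= m_obs), appropriate for odd p.
    m, pmf = magnetization_pmf(N, beta0, p)
    return pmf[m >= m_obs - 1e-12].sum()

# Toy usage: N = 100 nodes, p = 3, null value beta0 = 0.70, observed magnetization 0.9.
print(exact_pvalue(0.9, N=100, beta0=0.70, p=3))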

Lemma 1.

Under every β > β*(p), we have:

T_N/√N → β − β_0 in probability,

where β̂_N is either β̂_MPL or β̂_ML.

Proof.

Note that T_N/√N = β̂_N − β_0, where β̂_N is either β̂_MPL or β̂_ML. It follows from [12] and [13] that under every β > β*(p), β̂_N → β in probability. This proves Lemma 1. ∎

We will now prove condition (2). For this, we will need the following lemma on the large deviations of the average magnetization X̄_N := N^{−1} ∑_{i=1}^N X_i, which follows from [10].

Lemma 2.

For every subset A of [−1,1] such that the interior of A is dense in its closure, we have:

where is as defined in (3).

Proof.

It follows from display (18) in [10] that X̄_N satisfies a large deviation principle (LDP) with rate function:

Using the identity

we have:

It now follows from Lemma 3 that

(4)

Lemma 2 now follows from (4) and the fact that is a continuous function on . ∎
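The large deviation statement in Lemma 2 can be checked numerically in small examples. The sketch below is our own sanity check, not part of the paper's argument: it compares exact finite-N tail probabilities of the average magnetization with the prediction of an LDP whose rate function is taken, as an assumption, to be the tilted Curie-Weiss form I_β(x) = (I(x) − βx^p) − min_y (I(y) − βy^p).

import numpy as np
from scipy.special import gammaln

def rate_I(x):
    # Assumed rate function I(x) = ((1+x)/2) log(1+x) + ((1-x)/2) log(1-x).
    return 0.5 * ((1 + x) * np.log1p(x) + (1 - x) * np.log1p(-x))

def log_tail_prob(N, beta, p, t):
    # (1/N) * log P_beta(magnetization >= t), from the exact pmf under the
    # assumed weights C(N, k) * exp(beta * N * m^p).
    k = np.arange(N + 1)
    m = (2 * k - N) / N
    logw = gammaln(N + 1) - gammaln(k + 1) - gammaln(N - k + 1) + beta * N * m ** p
    return (np.logaddexp.reduce(logw[m >= t]) - np.logaddexp.reduce(logw)) / N

def ldp_prediction(beta, p, t):
    # -inf_{x >= t} I_beta(x) for the assumed tilted rate function I_beta.
    grid = np.linspace(-1 + 1e-9, 1 - 1e-9, 400_000)
    J = rate_I(grid) - beta * grid ** p
    J -= J.min()
    return -np.min(J[grid >= t])

beta, p, t = 0.6, 3, 0.8
for N in (200, 1000, 5000):
    print(N, round(log_tail_prob(N, beta, p, t), 4))
print("LDP prediction:", round(ldp_prediction(beta, p, t), 4))

The (1/N)-scaled log-probabilities printed above should approach the LDP prediction as N grows.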

We now state and prove the main result in this paper, about the Bahadur slopes of the tests based on the MPL and ML estimators and the minimum sample sizes required to ensure their significance. Towards this, we define a function as:

Theorem 1.

The Bahadur slopes of the tests based on β̂_MPL and β̂_ML for the model (1) at an alternative β_1 > β_0 are respectively given by:

Consequently, the minimum sample sizes required, so that the tests based on β̂_MPL and β̂_ML become (and remain) significant at level α, are respectively given by:

Proof.

We deal with the test based on the MPL estimator first. To begin with, note that . Fix , whence we have by Lemma 2:

where the last step follows from Lemma 2, since it follows from the proof of Lemma 5 that the set is a union of finitely many disjoint, non-degenerate intervals, and hence, its interior is dense in its closure.

In view of the above discussion, we conclude that the function in condition (2) is given by:

Since and are continuous functions (by Lemma 6) on and respectively, we conclude (in view of Lemma 5) that is continuous on an open neighborhood of . Also, in view of Lemma 5, the argument given below for the ML estimator, and the fact that , it will follow that on a non-empty open neighborhood of .

The Bahadur slope of at an alternative is then given by . This completes the proof of Theorem 1 for the test based on the MPL estimator.

For the test based on the ML estimator, note that for every , we have:

The last step follows from the following facts:

  1. The function is strictly convex in (Lemma C.5 in [12]) and hence, is strictly increasing in ;

  2. The ML equation is given by

  3. .

Now, it follows from [12] and the dominated convergence theorem that

Fix , to begin with. Then, there exists , such that

for all . Let us first consider the case where p is odd. We then have the following by Lemma 2:

Similarly, we also have:

Since can be arbitrarily small, and is continuous, we must have for all odd p:

(5)

Next, suppose that p is even. In this case, X and −X have the same distribution, and hence, so do X̄_N and −X̄_N. Hence, for every positive real number , we have:

Hence, the same argument as for the case of odd p also works here, showing that (5) holds when p is even, too.

In view of the above discussion, we conclude that the function in condition (2) is given by:

Since and are continuous functions (by Lemma 6) on and respectively, we conclude that is continuous on an open neighborhood of . Also, (and hence, on a non-empty open neighborhood of ), since is strictly decreasing on , and (by Lemma 6).

The Bahadur slope of at an alternative is thus given by . This completes the proof of Theorem 1 for the test based on the ML estimator. The proof of Theorem 1 is now complete. ∎

The following result compares the Bahadur slopes and the optimal sample sizes for the tests based on the MPL and the ML estimators.

Figure 5. Optimal sample size for the tests based on MLE and MPLE with varying (with logarithmic vertical scale).
Figure 6. Optimal sample size for the tests based on MLE and MPLE with varying (with logarithmic vertical scale).
Corollary 1.

For every and , we have

For , and , we have the following:

(6)

A sufficient condition for (6) to hold for all , is . On the other hand, for every , for all large enough, in which case, the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree. Further, for every and every fixed , we have:

(7)
Proof.

The result for follows directly from Lemma 5. For , it follows from Lemma 4 that on , and hence, on . Consequently,

(8)

(6) now follows from Lemma 5. Now, it follows from (8) that

This shows that the condition is sufficient to ensure equality of the Bahadur slopes and the optimal sample sizes. On the other hand, if , then . Since (by Lemma 6), we must have for all large enough, which shows, in view of (6), that the Bahadur slopes and optimal sample sizes for the tests based on the MPL and ML estimators do not agree in this case.

Finally, towards proving (7), note that

On the other hand, we also have:

Hence,

Corollary 1 now follows from Theorem 1 and Lemma 5. ∎

Figure 7. Optimal sample size for the tests based on MLE and MPLE with varying (with logarithmic vertical scale).

Note that for and fixed , the set of all satisfying equality of the Bahadur slopes and the optimal sample sizes for the ML and MPL estimates, is given by:

where . The reason behind the discrepancy between the efficiencies of the ML and MPL estimators near the threshold is the functional form of the latter. For p ≥ 3, unlike the ML estimator, the MPL estimator takes very high values if the average magnetization is close to 0. This false signal, coming from the average magnetization lying in a region very close to 0, leads to an increase in the null probability of the MPL estimator exceeding the observed MPL estimate, thereby inflating its p-value. This inflation occurs only in a close neighborhood of the threshold, because for lower values of the parameter β_0, there is a higher probability that the average magnetization is small.
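A back-of-the-envelope computation makes the size of this effect visible; it assumes (in line with the pseudolikelihood sketch in the Introduction, and not as a statement of the paper's exact formula) that the MPL estimate behaves approximately like arctanh(m̄)/(p · m̄^{p−1}) as a function of the average magnetization m̄:

for p = 3 and m̄ = 0.05:  arctanh(0.05)/(3 × 0.05²) ≈ 0.05004/0.0075 ≈ 6.7,
for p = 2 and m̄ = 0.05:  arctanh(0.05)/(2 × 0.05) ≈ 0.05004/0.1 ≈ 0.5.

So a configuration with essentially no magnetization produces an enormous MPL estimate when p ≥ 3, but a perfectly moderate one when p = 2, which is exactly the false signal described above.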

It follows from Lemma C.12 in [12] that

Since eventually becomes as approaches from the right, the rate at which approaches as for , is determined just by the term in the denominator of the formula for , and is given by .

It follows from the proof of Corollary 1 that for a fixed ,

Another interesting phenomenon is that for fixed , is a non-zero constant for for some . This implies that as long as , once the separation between and exceeds a certain finite value, the optimal sample size requirement for the MPL estimator does not decrease with further increase in the separation, unlike , which is an undesirable property of the MPL estimator.