Graphical models are a powerful tool in high-dimensional statistical inference. The graph structure of a graphical model gives a simple way to visualize the dependency among the variables in a multivariate random vector. The analysis of graph structures plays a fundamental role in a wide variety of applications, including information retrieval, bioinformatics, image processing and social networks (Besag, 1993; Durbin et al., 1998; Wasserman and Faust, 1994; Grabowski and Kosiński, 2006). Motivated by these applications, theoretical results on graph estimation (Meinshausen and Bühlmann, 2006; Liu et al., 2009; Montanari and Pereira, 2009; Ravikumar et al., 2011; Cai et al., 2011), single edge inference (Jankova et al., 2015; Ren et al., 2015; Neykov et al., 2015; Gu et al., 2015) and combinatorial inference (Neykov et al., 2016; Neykov and Liu, 2017) have been established in the literature.
In this paper we are concerned with the distinct problem of structure detection. In structure detection problems one is interested in testing whether the underlying graph is empty (i.e., the random variables are independent) versus the alternative that the graph contains a subgraph with a certain structure. A variety of detection problems have been considered in the literature (see, for example, Addario-Berry et al., 2010; Arias-Castro et al., 2012, 2015b, 2015a). These works mainly focus on covariance or precision matrix detection problems and establish minimax lower and upper bounds.
While covariance and precision matrix detection problems are inherently related to the Gaussian graphical model, in this paper we focus on detection problems under the zero-field ferromagnetic Ising model. The Ising model is a probability model for binary data originally developed in statistical mechanics (Ising, 1925), and it has a wide range of modern applications including image processing (Geman and Geman, 1984), social networks and bioinformatics (Ahmed and Xing, 2009). Below we formally introduce the model and the problems of interest.
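Zero-field ferromagnetic Ising model. Under a zero-field Ising model, the binary vector $X \in \{-1, 1\}^d$ follows a distribution with probability mass function given by
$$\mathbb{P}_\Theta(X = x) = \frac{1}{Z(\Theta)} \exp\Big(\sum_{1 \le j < k \le d} \theta_{jk} x_j x_k\Big), \qquad x \in \{-1, 1\}^d,$$
where $\Theta = (\theta_{jk}) \in \mathbb{R}^{d \times d}$ is a symmetric interaction matrix with zero diagonal entries and $Z(\Theta)$ is the partition function defined as
$$Z(\Theta) = \sum_{x \in \{-1, 1\}^d} \exp\Big(\sum_{1 \le j < k \le d} \theta_{jk} x_j x_k\Big).$$
The non-zero elements of the symmetric matrix $\Theta$ specify a graph $G(\Theta)$ with vertex set $[d] = \{1, \ldots, d\}$ and edge set $E(\Theta) = \{(j, k) : \theta_{jk} \neq 0\}$. We will refer to the graph $G(\Theta)$ as $G$ whenever the underlying matrix is clear from the context. It is not hard to check that, by the definition of $\mathbb{P}_\Theta$, the vector $X$ is Markov with respect to $G$; that is, two elements $X_j$ and $X_k$ are independent given the remaining coordinates of $X$ if and only if $\theta_{jk} = 0$.
Here, the term zero-field specifies that there is no external magnetic field affecting the system, meaning that the energy function consists purely of terms of degree 2 (i.e., there are no main effects). In this paper, we further focus on zero-field ferromagnetic models, where we also assume that $\theta_{jk} \ge 0$ for all $j, k \in [d]$. In addition, our analysis is carried out in the high-temperature setting, where the magnitudes of the $\theta_{jk}$'s are below a certain level. More specifically, throughout this paper we assume an upper bound on $\|\Theta\|_F$, the Frobenius norm of $\Theta$.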
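To make the model concrete, the sketch below enumerates all $2^d$ configurations to compute the partition function and the probability mass function exactly, and draws i.i.d. samples from it. It assumes the $\{-1, 1\}$ coding and the parametrization $\exp(\sum_{j<k}\theta_{jk}x_jx_k)$; the function names are hypothetical, and brute-force enumeration is only feasible for small $d$ (larger $d$ would require an MCMC sampler such as Gibbs sampling).

```python
import itertools

import numpy as np


def ising_pmf(theta):
    """Exact pmf of a zero-field Ising model on {-1, +1}^d by enumeration.

    Assumed parametrization (hypothetical helper, small d only):
        p(x) = exp(sum_{j<k} theta[j, k] * x_j * x_k) / Z(theta),
    with theta symmetric and zero on the diagonal.
    Returns (states, probabilities, partition function).
    """
    theta = np.asarray(theta, dtype=float)
    d = theta.shape[0]
    if not (np.allclose(theta, theta.T) and np.allclose(np.diag(theta), 0.0)):
        raise ValueError("theta must be symmetric with zero diagonal")
    states = np.array(list(itertools.product([-1, 1], repeat=d)))  # (2^d, d)
    # Because diag(theta) = 0, x^T theta x / 2 = sum_{j<k} theta_jk x_j x_k.
    energy = 0.5 * np.einsum("sj,jk,sk->s", states, theta, states)
    weights = np.exp(energy)
    partition = weights.sum()  # Z(theta)
    return states, weights / partition, partition


def pairwise_moments(theta):
    """Exact matrix of E[X_j X_k] under the model."""
    states, probs, _ = ising_pmf(theta)
    return np.einsum("s,sj,sk->jk", probs, states, states)


def sample_ising(theta, n, seed=0):
    """Draw n i.i.d. samples by sampling configuration indices from the exact pmf."""
    states, probs, _ = ising_pmf(theta)
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(states), size=n, p=probs)
    return states[idx]


if __name__ == "__main__":
    theta = np.zeros((3, 3))
    theta[0, 1] = theta[1, 0] = 0.3  # a single ferromagnetic edge
    # Both values agree up to floating-point rounding.
    print(pairwise_moments(theta)[0, 1], np.tanh(0.3))
```

For a single edge with weight $\theta$, the partition function is $4(e^{\theta} + e^{-\theta})$ and the exact moment is $\mathbb{E}[X_jX_k] = \tanh(\theta)$, which gives a simple sanity check; under the null ($\Theta = 0$) every configuration has probability $2^{-d}$.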
Structure detection problems. As described in the previous paragraph, a zero-field ferromagnetic Ising model specifies a graph $G(\Theta)$. In a structure detection problem, we are interested in testing whether the underlying graph is the empty graph versus the alternative that $G(\Theta)$ belongs to a set of graphs with a certain structure. Specifically, let $G_0$ be the empty graph, and let $\mathcal{G}$ be a class of graphs not containing $G_0$. The following hypothesis testing problem is an example of a detection problem. Given a sample of $n$ independent observations $X^{(1)}, \ldots, X^{(n)}$ from a zero-field ferromagnetic Ising model $\mathbb{P}_\Theta$, we aim to test
$$H_0: G(\Theta) = G_0 \quad \text{versus} \quad H_1: G(\Theta) \in \mathcal{G}. \qquad (1)$$
The term “detection” is used here in the sense that if one rejects the null hypothesis, the presence of a non-null graph has been detected. In (1) the graph class $\mathcal{G}$ can be arbitrary, which makes the hypothesis testing problem (1) very general. We now give a specific instance of this problem which is of particular importance. Let $G^*$ be a fixed graph with $s$ non-isolated vertices, which represents some specific graph structure. The structure detection problem that considers all possible “positions” of $G^*$ is of the following form:
$$H_0: G(\Theta) = G_0 \quad \text{versus} \quad H_1: G(\Theta) \in \mathcal{G}(G^*), \qquad (2)$$
where $\mathcal{G}(G^*)$ is the class of all graphs that contain a size-$s$ subgraph isomorphic to $G^*$.
While problems (1) and (2) give good intuition about what a detection problem is, in order to facilitate testing we need to impose certain assumptions on the matrix $\Theta$; otherwise, even for graphs vastly different from the empty graph, there might not be enough “separation” between the null and the alternative hypotheses. Since the underlying graph is specified by the matrix $\Theta$, we can reformulate problems (1) and (2) as testing problems on $\Theta$. Given a class of graphs $\mathcal{G}$, we define the corresponding parameter space with minimum signal strength as
The results of our paper cover the following examples.
Empty graph versus non-empty graph. We consider testing whether the underlying graph of the Ising model is empty or not. Clearly, since our null hypothesis is that the graph is empty, this is a detection problem. It is an instance of (2) in which $G^*$ is a single edge.
Clique detection. A clique is a set of vertices such that every two distinct vertices are adjacent. We consider detecting graphs that contain a clique of size $s$, which is an instance of (2) with $G^*$ the complete graph on $s$ vertices. This generalizes the previous example, since a non-empty graph is exactly a graph containing a clique of size 2.
Star detection. A star is a tree in which all leaves are connected to the same node. We consider detecting graphs that contain an $s$-star; this is an instance of (2) with $G^*$ being a star.
Community structure detection. In this example we consider a class of graphs with a more complex structure. Let $k$ and $m$ be positive integers. A community is represented by a $k$-clique, which means that every two members of the same community are connected. For a community $C$, we select one fixed representative vertex and denote it by $r(C)$. We consider the class of graphs that contain at least $m$ disjoint communities such that for every two different communities $C$ and $C'$, there exists an edge connecting $r(C)$ and $r(C')$. In this example we set $\mathcal{G}$ to be this class of graphs.
Figure 1. Examples of the detection problems: (a) non-empty graph; (b) clique; (c) star; (d) community structure.
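The four example structures can be instantiated as explicit edge sets, which is convenient for experimenting with the tests discussed later. The helpers below are hypothetical illustrations. In the community builder, the first listed member of each community is used as its representative, and the output is the minimal graph of the class (graphs in the class may contain additional edges).

```python
from itertools import combinations


def clique_edges(vertices):
    """All undirected edges of a clique on the given vertices."""
    return {frozenset(e) for e in combinations(vertices, 2)}


def star_edges(center, leaves):
    """Edges of a star: every leaf is joined to the same center."""
    return {frozenset((center, leaf)) for leaf in leaves}


def community_edges(communities):
    """Minimal community-structure graph: each community is a clique, and the
    representatives (here, the first member of each community) form a clique."""
    edges = set()
    for members in communities:
        edges |= clique_edges(members)
    representatives = [members[0] for members in communities]
    edges |= clique_edges(representatives)
    return edges


if __name__ == "__main__":
    comms = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11]]
    # 3 communities of size 4: 3 * 6 within-community edges + 3 cross edges
    print(len(community_edges(comms)))
```

Edges are stored as `frozenset` pairs so that $(j, k)$ and $(k, j)$ denote the same undirected edge.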
1.1 Main Contributions
There are three major contributions of this paper.
First, we develop a novel technique to derive minimax lower bounds for structure detection problems in Ising models. Our proof technique relates the Ising model probability mass function and the $\chi^2$-divergence between two distributions to the number of certain Eulerian subgraphs of the underlying graph. With this technique, we obtain a general information-theoretic lower bound for an arbitrary alternative hypothesis, which applies immediately to all four examples described above.
Second, we propose a linear scan test on the sample covariance matrix that, in certain regimes, matches our minimax lower bound for arbitrary structure detection problems. Together with our general minimax lower bound, this procedure reveals that a quantity called arboricity (i.e., a certain maximum edge-to-vertex ratio of graphs in the alternative hypothesis) essentially determines the information-theoretic limit of the testing problem. This matches the intuition that in order to distinguish a graph with small signal strength from the empty graph, one needs to examine the densest part of the graph. Furthermore, the denser the graph, the easier it is to detect, and the precise measure of graph density turns out to be the graph arboricity.
Third, we study the computational lower bound of structure detection problems. Based on a conjecture on the computational hardness of sparse Principal Component Analysis (PCA), which has been studied in recent works (Berthet and Rigollet, 2013b, a; Gao et al., 2014), we prove that no polynomial time linear test on the sample covariance matrix can detect structures successfully unless the signal strength is sufficiently large. In addition, we derive another computational lower bound under the oracle computational model studied by Feldman et al. (2015a, b); Wang et al. (2015).
1.2 Related Work
Plenty of work has been done on graph estimation (also known as graph selection) in Ising models. Santhanam and Wainwright (2012) gave the first information-theoretic lower bounds for graph selection in bounded edge cardinality and bounded vertex degree models. Later, Tandon et al. (2014) proposed a general framework for obtaining information-theoretic lower bounds for graph selection in ferromagnetic Ising models, and showed that the lower bound is specified by certain structural conditions. On the other hand, Ravikumar et al. (2010) proposed an algorithm for structure learning based on $\ell_1$-regularized logistic regression that works in the high-temperature regime (Montanari and Pereira, 2009). Bresler (2015) gave a polynomial time algorithm that works in both the low and high temperature regimes. Compared to graph estimation, structure detection is a statistically easier problem. As a consequence, the conditions on signal strength that we exhibit in this paper are weaker than the corresponding requirements in the graph estimation literature.
Structure detection problems have been studied in Addario-Berry et al. (2010); Arias-Castro et al. (2012, 2015b, 2015a). However, all these works focus on Gaussian random vectors. Specifically, Addario-Berry et al. (2010) study testing, based on a single observation, the existence of specific subsets of components of a Gaussian vector whose means are non-zero. Arias-Castro et al. (2012) consider the correlation graph of a Gaussian random vector and establish upper and lower bounds for detecting certain classes of fully connected cliques based on one sample. In a follow-up work, Arias-Castro et al. (2015b) generalize the result to multiple i.i.d. samples. Arias-Castro et al. (2015a) give another related result on detecting a region of a Gaussian Markov random field against a background of white noise. The major difference between these existing works and ours is that we focus on detection in the Ising model, and our results apply not only to cliques but also to general graph structures. Recently, Neykov et al. (2016), Lu et al. (2017) and Neykov and Liu (2017) proposed a novel problem in which one tests whether the underlying graph obeys certain combinatorial properties. We stress that, while related to structure detection, these problems are fundamentally different, as structure detection is a statistically simpler task. It is therefore not surprising that the algorithms we develop are very different from those in the aforementioned works, and that the proofs of our lower bounds use different techniques.
Our computational lower bound follows the recent line of work on computational barriers for statistical models (Berthet and Rigollet, 2013b, a; Ma et al., 2015; Gao et al., 2014; Brennan et al., 2018) based on the planted clique conjecture. Berthet and Rigollet (2013b) focus on testing methods based on Minimum Dual Perturbation (MDP) and semidefinite programming (SDP), and prove that such polynomial time testing methods cannot attain the minimax optimal rate for sparse PCA. Berthet and Rigollet (2013a) prove a computational lower bound for a generalized sparse PCA problem that includes all multivariate distributions with certain tail probability assumptions on the quadratic form. Ma et al. (2015) consider the Gaussian submatrix detection problem and propose a framework to analyze computational limits of continuous random variables by constructing a sequence of asymptotically equivalent discretized models. Inspired by the results in Ma et al. (2015), Gao et al. (2014) consider the computational lower bound for Gaussian sparse Canonical Correlation Analysis (CCA) as well as sparse PCA problems. Our computational lower bound builds on these previous studies of the sparse PCA problem. We summarize these results and directly base our result for Ising models on a sparse PCA conjecture. This allows us to use a novel proof technique that utilizes the high-dimensional central limit theorems of Chernozhukov et al. (2014).
Other related works on Ising models include the following. Berthet et al. (2016) study the Ising block model by providing efficient methods for block structure recovery as well as information-theoretic lower bounds. Mukherjee et al. (2018) study the upper and lower bounds for detection of a sparse external magnetic field in Ising models. Daskalakis et al. (2018) consider goodness-of-fit and independence testing in Ising models using pairwise correlations. Gheissari et al. (2017) establish concentration inequalities for polynomials of a random vector in contracting Ising models.
1.3 Notation
We use the following notation in our paper. For a vector and a number , let . We also define . For a matrix , we denote , and for .
We also use the standard asymptotic notations and . Let and be two sequences and assume that is non-zero for large enough . We write if and if .
Let be the complete vertex set. In this paper we consider graphs with vertices over the vertex set . For a graph , let , where are undirected pairs. Moreover, we denote by the set of non-isolated vertices of .
1.4 Organization of the Paper
Our paper is organized as follows. In Section 2, we present our main information-theoretic lower bound result as well as its applications to various detection problems. In Section 3 we develop a general procedure to construct optimal linear scan tests on the sample covariance matrix. In Section 4 we examine the computational limit of the linear tests on the sample covariance matrix by comparing the covariance matrices of Ising and sparse PCA models. Sections 5 and 6 contain the proofs of the main results of Sections 2 and 3 respectively. The remaining detailed proofs are all placed in Section A.1. In Section B we provide an additional proof of a computational lower bound under the oracle computational model.
2 Lower Bounds
The minimax risk of detection problem (4) is defined as
where and are the joint probability measures of i.i.d. samples under null and alternative hypotheses respectively. The infimum in (6) is taken over all measurable test functions . If , we say that any test is asymptotically powerless.
In this section, we derive necessary conditions on the signal strength required for detection problems to admit tests which are not asymptotically powerless. Our results will show that the difficulty of testing an empty graph against is determined by a quantity called arboricity, which was originally introduced in graph theory by Nash-Williams (1961) to quantify the minimum number of forests into which the edges of a given graph can be partitioned.
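For a graph $G$ and a vertex set $S \subseteq [d]$, let $G_S$ be the graph obtained by restricting $G$ to the vertices in $S$ (i.e., removing all edges incident to vertices outside $S$). The arboricity of $G$ is defined as follows:
$$\Gamma(G) = \max_{S \subseteq [d],\, S \neq \emptyset} \left\lceil \frac{|E(G_S)|}{|S| - 1} \right\rceil, \qquad (7)$$
where $\lceil \cdot \rceil$ is the ceiling function, and $0/0$ is understood as $0$. The arboricity of a graph measures how dense the graph is. For an illustration of arboricity see Figure 2. Let $G_0$ denote the empty graph. By definition $\Gamma(G_0) = 0$.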
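Assuming (7) takes the standard Nash-Williams form, i.e. the maximum over vertex subsets $S$ of $\lceil |E(G_S)| / (|S| - 1) \rceil$, the arboricity of a small graph can be computed by brute force. The sketch below is a hypothetical helper for illustration only: it enumerates all subsets, so its cost grows as $2^d$.

```python
from itertools import combinations


def arboricity(num_vertices, edges):
    """Arboricity of a graph on vertices 0..num_vertices-1 via the
    Nash-Williams formula: max over vertex subsets S with |S| >= 2 of
    ceil(|E(G_S)| / (|S| - 1)). Subsets with |S| <= 1 contain no edges and
    contribute 0 (the 0/0 := 0 convention). Brute force: O(2^num_vertices)."""
    edges = [tuple(e) for e in edges]
    best = 0
    for size in range(2, num_vertices + 1):
        for subset in combinations(range(num_vertices), size):
            inside_set = set(subset)
            inside = sum(1 for u, v in edges if u in inside_set and v in inside_set)
            best = max(best, -(-inside // (size - 1)))  # integer ceiling
    return best


if __name__ == "__main__":
    k5 = list(combinations(range(5), 2))
    star = [(0, leaf) for leaf in range(1, 5)]
    print(arboricity(5, k5), arboricity(5, star))  # 3 and 1
```

For example, the clique on $s$ vertices has arboricity $\lceil s/2 \rceil$, since $\binom{s}{2}/(s-1) = s/2$, whereas any non-empty forest, such as a star, has arboricity 1.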
For a given graph the larger is, the more different and are. We further define
to measure the difference in graph density between and in a worst case sense. Let be a nonempty subset of such that all graphs in have arboricity . By the definition of , such nonempty exists, and may not be unique. Our analysis works for arbitrary choices of which satisfy the incoherence condition (Neykov et al., 2016) defined as follows. (Negative association and incoherence condition) For , we say the random variables are negatively associated if for any with , any distinct indices , and any coordinate-wise non-decreasing functions and , we have
We say that the graph set is incoherent if for any fixed graph , the binary random variables
are negatively associated with respect to uniformly sampling . For a graph , we denote by the adjacency matrix of . Then given , we define the corresponding parameter set with minimal signal strength as
Let , , and . Then for , by definition (recall (3)) we have , and therefore
By (8), it follows that to give a lower bound on it suffices to lower bound . We are ready to introduce our main theorem.
Theorem 2. Let be a non-empty subset of such that all graphs in have arboricity . Define , where is the uniform distribution over . If is incoherent, , and
then we have
Inequality (9) shows that the necessary signal strength of detection problems is determined by the minimum of three terms. While the first term is related to both the structural properties of graphs in and the sample size , the second and third terms are independent of . Therefore, when the sample size is large enough, is the leading term determining the necessary signal strength, and the other two terms mainly serve as scaling conditions on .
The condition (9) given by Theorem 2 is comparable to the “multi-edge” results given in Neykov et al. (2016), where the authors give minimax lower bounds for combinatorial inference problems in Gaussian graphical models. Unlike our results in Theorem 2, the necessary signal strength for Gaussian graphical models given by Neykov et al. (2016) does not explicitly involve graph arboricity. It is also unclear under what conditions the lower bound given by Neykov et al. (2016) is sharp. In comparison, in this paper we show that graph arboricity is the appropriate quantity that gives sharp lower bounds for any structure detection problem under the incoherence condition and the sparsity assumption for some . It is also worth comparing Theorem 2 to the results of Neykov and Liu (2017). The lower bounds on the signal in Neykov and Liu (2017) typically involve the quantity , which is generally much larger than the right-hand side of (9) when is large enough. This is intuitively clear since detection problems are statistically easier than graph property testing. Our proof strategy is also completely different from the one used by Neykov and Liu (2017): it relies on high-temperature expansions rather than on Dobrushin’s comparison theorem.
In Theorem 2, the incoherence condition of is not always easy to check. However, it is known that this condition is satisfied by a variety of discrete distributions, including the multinomial and hypergeometric distributions (Joag-Dev and Proschan, 1983; Dubhashi and Ranjan, 1998). In particular, Theorem 2.11 in Joag-Dev and Proschan (1983) states that negative association holds for all permutation distributions. Therefore, for detection problems of the form (5), the incoherence condition is always satisfied by picking to be the set of all graphs isomorphic to . This leads to the following corollary (recall that we are assuming ).
Corollary 2. Let be a graph with vertices and be the class of all graphs that contain a size- subgraph isomorphic to . Let . If
then we have
In this section we apply Corollary 2 to specific detection problems.
[Empty graph versus non-empty graph] Consider testing empty graph versus non-empty graph defined in Section 1. If
we have .
[Clique Detection] For the clique detection problem defined in Section 1, if
we have .
[Star Detection] For the star detection problem defined in Section 1, if and
[Community structure detection] For the community structure detection problem defined in Section 1, if , and
we have .
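To calculate $\Gamma(\mathcal{G})$, we utilize the fact that arboricity equals the minimum number of forests into which the edges of a given graph can be partitioned (Nash-Williams, 1961). Let $G \in \mathcal{G}$ be a graph formed by exactly $m$ communities $C_1, \ldots, C_m$ together with the edges connecting their representatives. For $i = 1, \ldots, m$, we know that $C_i$ is a $k$-clique, whose arboricity is $\lceil k/2 \rceil$. Therefore inside $C_i$, we can partition the graph into $\lceil k/2 \rceil$ forests. There is also an $m$-clique in $G$ consisting of the cross-community edges. This clique can be partitioned into $\lceil m/2 \rceil$ forests. Note that this $m$-clique shares only one vertex, $r(C_i)$, with the community $C_i$. Therefore, for any forest in the partition of this $m$-clique and any forest in the partition of $C_i$, we can merge them into a single forest, because the resulting graph is still acyclic. We can keep merging forests from the other communities. Eventually, we can merge forests from distinct communities into a forest of the $m$-clique without introducing any cycles. If $\lceil k/2 \rceil \ge \lceil m/2 \rceil$, we obtain $\lceil k/2 \rceil$ forests that form a partition of $G$; if $\lceil k/2 \rceil < \lceil m/2 \rceil$, then the partition contains $\lceil m/2 \rceil$ forests. Therefore, by the equivalent definition of arboricity given in Nash-Williams (1961), we have $\Gamma(G) \le \max\{\lceil k/2 \rceil, \lceil m/2 \rceil\}$. On the other hand, since $G$ contains an $m$-clique, obviously $\Gamma(G) \ge \lceil m/2 \rceil$. Similarly, $\Gamma(G) \ge \lceil k/2 \rceil$, and hence $\Gamma(G) = \max\{\lceil k/2 \rceil, \lceil m/2 \rceil\}$. Therefore $\Gamma(\mathcal{G}) = \max\{\lceil k/2 \rceil, \lceil m/2 \rceil\}$.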
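The forest-merging argument can be sanity-checked numerically for small communities. The sketch below is hypothetical: it builds $m$ disjoint $k$-cliques whose first vertices (used as representatives) form an $m$-clique, computes the arboricity by brute force over vertex subsets using the Nash-Williams formula, and compares it with $\max\{\lceil k/2 \rceil, \lceil m/2 \rceil\}$. The letters $k$ (community size) and $m$ (number of communities) are labels chosen for this illustration.

```python
from itertools import combinations


def arboricity(num_vertices, edges):
    """Nash-Williams formula by brute force over vertex subsets (small graphs only)."""
    best = 0
    for size in range(2, num_vertices + 1):
        for subset in combinations(range(num_vertices), size):
            s = set(subset)
            inside = sum(1 for u, v in edges if u in s and v in s)
            best = max(best, -(-inside // (size - 1)))  # integer ceiling
    return best


def community_graph(k, m):
    """m disjoint k-cliques; the first vertex of each clique is its representative,
    and the m representatives are joined into an m-clique."""
    communities = [list(range(i * k, (i + 1) * k)) for i in range(m)]
    edges = [e for c in communities for e in combinations(c, 2)]
    edges += list(combinations([c[0] for c in communities], 2))
    return k * m, edges


def predicted_arboricity(k, m):
    """max(ceil(k / 2), ceil(m / 2)) with integer arithmetic."""
    return max((k + 1) // 2, (m + 1) // 2)


if __name__ == "__main__":
    for k, m in [(4, 3), (5, 2), (2, 5)]:
        nv, E = community_graph(k, m)
        print(k, m, arboricity(nv, E), predicted_arboricity(k, m))
```

Either the communities or the cross-community clique can be the binding term: for $(k, m) = (5, 2)$ the maximum comes from the communities, while for $(k, m) = (2, 5)$ it comes from the clique of representatives.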
3 Upper Bounds
In this section we construct upper bounds for the hypothesis testing problem (1). We propose a general framework for testing an empty graph against an arbitrary graph set . We remind the reader that the arboricity of a graph is defined in (7) as
where is the graph obtained by restricting on the vertex set . The arboricity of is then defined as
We now introduce the concepts of witnessing subgraph and witnessing set. Before that, we remind the reader that in this paper all graphs have vertices (i.e., all graphs are over the vertex set ), unless otherwise specified. Therefore a subgraph of a graph is a graph with vertices whose edge set is a subset of the edge set of the larger graph, i.e., where . Importantly, the notation and refer to the non-isolated vertices of and , which may be strict subsets of .
[Witnessing Subgraph] For a graph we call the graph a witnessing subgraph of with respect to , if is a subgraph of and . Here we remark that for to be a witnessing subgraph of , it is unnecessary to have . Instead, we only require that , which is a weaker requirement since by definition we have for any . This implies that every graph has at least one witnessing subgraph, which may be obtained from the densest subgraph of (with potential edge pruning). [Witnessing Set] We call the collection of graphs a witnessing set of , if for every , there exists such that is a witnessing subgraph of . By the definition of , and as we previously argued, every graph must have at least one witnessing subgraph. Therefore at least one witnessing set of exists. We define the set of witnessing graphs in order to facilitate the development of scan tests. Below we formalize a test statistic which scans over all graphs in . Importantly, in order to match the lower bound given by Theorem 2, it is not sufficient to scan directly over the graphs from the set . This is because the graphs in may contain non-essential edges, which may introduce noise into the test. In contrast, the graphs from trim these non-essential edges and focus only on the essential parts of the graphs in .
We now introduce our general testing procedure. Our test is based on a witnessing set . For we define
where is the -th sample and , are the -th and -th components of respectively. Our test then scans over all possible and calculates the corresponding . We define
and is a large enough absolute constant. The following theorem justifies the use of the test defined in (15).
Theorem 3. Given any fixed , suppose that and . If
for a large enough absolute constant , when is large enough we have that the test of (15) satisfies
The detailed proof of Theorem 3 is given in Section 6. We can compare our upper bound result with Corollary 2. For testing problems of the form (5), we can always choose a subgraph of with as a witnessing subgraph (if there are multiple such subgraphs pick any of them), and construct to be the set consisting of all graphs isomorphic to . For this we have . Therefore
[Empty graph versus non-empty graph] Consider testing empty graph versus non-empty graph defined in Section 1. If , and
for a large enough constant , then when is large enough, we have
[Clique Detection] For the clique detection problem defined in Section 1, if , and
for a large enough constant , then when is large enough, we have
[Star Detection] For the star detection problem defined in Section 1, if , and
for a large enough constant , then when is large enough, we have
[Community structure detection] Consider the community structure detection problem defined in Section 1. If , and
for a large enough constant , then when is large enough, we have
If , we have , and we can choose as a witnessing set of ; if , we have , and is a witnessing set of . The rest of proof is identical to the clique detection problem, and we omit the details. ∎
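The scan test described in this section can be sketched as follows. The exact normalization of $U_G$ and the threshold in (15) are not reproduced here; the sketch assumes, hypothetically, that $U_G$ averages the sample second moments $\frac{1}{n}\sum_i X^{(i)}_j X^{(i)}_k$ over the edges $(j, k)$ of $G$, and that the caller supplies the threshold (which in (15) involves a large enough absolute constant). The witnessing set in the usage example, all triangles on $d$ vertices, corresponds to 3-clique detection; the alternative there is a toy positively correlated distribution, not an Ising sample.

```python
from itertools import combinations

import numpy as np


def edge_statistic(X, edges):
    """Hypothetical U_G: average over edges (j, k) of (1/n) sum_i X_j^(i) X_k^(i)."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    total = sum(X[:, j] @ X[:, k] for j, k in edges)
    return total / (n * len(edges))


def scan_test(X, witnessing_set, threshold):
    """Scan all graphs in the witnessing set; reject if the largest statistic
    exceeds the threshold. Returns (decision, index of maximizer, max statistic)."""
    stats = [edge_statistic(X, edges) for edges in witnessing_set]
    best = int(np.argmax(stats))
    return int(stats[best] > threshold), best, stats[best]


def triangle_witnessing_set(d):
    """All 3-cliques on d vertices, each given as a list of edges."""
    return [[(a, b), (a, c), (b, c)] for a, b, c in combinations(range(d), 3)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n, d = 2000, 8
    W = triangle_witnessing_set(d)
    X_null = rng.choice([-1, 1], size=(n, d))  # independent coordinates (null)
    print(scan_test(X_null, W, threshold=0.1))
    # Toy alternative: coordinates 0, 1, 2 each agree with a shared sign w.p. 0.8.
    X_alt = X_null.copy()
    z = rng.choice([-1, 1], size=n)
    for j in range(3):
        X_alt[:, j] = np.where(rng.random(n) < 0.8, z, -z)
    print(scan_test(X_alt, W, threshold=0.1))
```

Scanning over witnessing subgraphs rather than whole graphs keeps non-essential edges out of the average, so they do not dilute the signal carried by the densest part of the graph.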
4 Computational Lower Bound
Our results in Section 3 suggest that in order to match the information-theoretic lower bound, one should first determine the densest subgraphs of graphs in , and then scan over all possible positions of such subgraphs. However, such tests may not be computationally efficient: for the structure detection problem (5), if the densest part of contains vertices, then our test requires scanning over at least different positions, and cannot be done in polynomial time if for some constant . On the other hand, one can always relax the testing problem into the “empty graph versus non-empty graph” problem, which, according to Section 3.1, can be tested by scanning over single edges in polynomial time. However, this requires signal strength for some constant to distinguish the null and the relaxed alternative, which does not match the information-theoretic lower bound in Theorem 2 for the original detection problem with large maximum arboricity . In this section, we give a detailed analysis of such computational-statistical tradeoffs, and show that the signal strength requirement , up to a logarithmic factor, cannot be improved for polynomial time linear tests.
Let be the sample covariance matrix calculated with samples from the Ising model. We define polynomial time linear tests on as follows. [Polynomial time Linear Test] We call a test a polynomial time linear test if there exist an integer for some constant , a binary function and linear functions such that
and since for each , is a linear function of the above is of the form (22). However, the test (15) may not be a polynomial time linear test according to our definition since the number of graphs in may not be bounded by for a constant .
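The linearity claim can be checked numerically: summing the entries $\hat\Sigma_{jk}$ over the edges of a graph equals $\frac12\langle A_G, \hat\Sigma\rangle$, where $A_G$ is the adjacency matrix, which is a linear function of $\hat\Sigma$. The sketch assumes the uncentered estimator $\hat\Sigma = \frac1n\sum_i X^{(i)}X^{(i)\top}$ (the zero-field model has mean zero) and an edge-sum form of $U_G$; both are assumptions, and the helper names are hypothetical.

```python
import numpy as np


def sample_covariance(X):
    """Uncentered sample covariance (1/n) sum_i X^(i) X^(i)^T."""
    X = np.asarray(X, dtype=float)
    return X.T @ X / X.shape[0]


def adjacency_matrix(d, edges):
    """Symmetric 0/1 adjacency matrix A_G of a graph on d vertices."""
    A = np.zeros((d, d))
    for j, k in edges:
        A[j, k] = A[k, j] = 1.0
    return A


def edge_sum(sigma_hat, edges):
    """Sum of sigma_hat[j, k] over the edges of G."""
    return sum(sigma_hat[j, k] for j, k in edges)


def linear_form(sigma_hat, A):
    """(1/2) <A_G, sigma_hat>: a linear function of sigma_hat, since
    sigma_hat is symmetric and each edge appears twice in A_G."""
    return 0.5 * float(np.sum(A * sigma_hat))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.choice([-1, 1], size=(50, 6))
    S = sample_covariance(X)
    edges = [(0, 1), (1, 2), (2, 0), (3, 5)]
    print(edge_sum(S, edges), linear_form(S, adjacency_matrix(6, edges)))
```

Because each statistic is an inner product with a fixed matrix, a scan over a witnessing set fits the form of a polynomial time linear test exactly when the number of scanned graphs is polynomially bounded.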
4.1 Main Computational Lower Bound Result
In this section we give our main result on the computational lower bound of structure testing problems in Ising models. Our result is based on a sparse PCA conjecture. Denote by the vector whose -th entries are and other entries are . Let be the set of covariance matrices from the Gaussian spiked model. In sparse PCA, we consider the following hypothesis testing problem for i.i.d. samples :
We denote by and the probability measure under and respectively. [Computational Hardness of Sparse PCA] Let be an absolute constant. If for some small enough constant , then for any polynomial time test , we have
Conjecture 4.1 is derived by Gao et al. (2014) under the widely believed planted clique conjecture and additional assumptions, which essentially require that for some constant and for some small enough constant . It has also been studied in Berthet and Rigollet (2013a) and Brennan et al. (2018). We now give our main theorem on the computational lower bound of hypothesis testing problems of the form (5). Under Conjecture 4.1, if for some small enough constant , then for any polynomial time linear test as in (22) and any with non-isolated vertices, we have