With the prevalence of network data in recent years, the problem of comparing two populations of networks is gaining increasing attention (Kolaczyk, 2009). Our motivation is brain connectivity analysis, which studies functional and structural brain architectures through neurophysiological measures of brain activities and synchronizations (Varoquaux and Craddock, 2013). Accumulated evidence has suggested that, compared to a healthy brain, the brain connectivity network is altered in the presence of numerous neurological disorders, for example, Alzheimer’s disease, attention deficit hyperactivity disorder, and autism spectrum disorder, among others. Such alterations are believed to hold crucial insights into disease pathologies (Fox and Greicius, 2010). A typical brain connectivity study collects imaging scans, such as functional magnetic resonance imaging or diffusion tensor imaging, from groups of subjects with and without a disorder. Based on the imaging scan, a network is constructed for each individual subject, with the nodes corresponding to a common set of brain regions, and the edges encoding the functional or structural associations between the regions. A fundamental scientific question of interest is to compare the brain networks and to identify local connectivity patterns that differ between the two populations. Network comparison is equally interesting in many other scientific areas as well, for instance, clinical genomics, where it is of crucial interest to understand and compare the gene regulatory networks of patients with and without cancer (Luscombe et al., 2004).
In the context of brain connectivity analysis, there has been a rich literature on network estimation methods (Ahn et al., 2015; Qiu et al., 2016; Han et al., 2016; Wang et al., 2016; Zhu and Li, 2018, among many others). There is, however, a relative paucity of network inference
methods. Even though both can produce, in effect, a concise representation of the network structure, network inference is a fundamentally different problem from network estimation. Among the few existing network inference solutions, Kim et al. (2014) first summarized the network through a set of network metrics, then employed a standard two-sample test. This strategy is commonly employed in the neuroscience literature and is easy to implement. However, it remains unclear to what extent each network metric provides a meaningful representation of brain function and structure (Fornito et al., 2013). Chen et al. (2015) developed a method to detect differentially expressed connectivity subnetworks under different clinical conditions by searching clusters of the graph. They resorted to a permutation test to obtain the p-value of the selected subnetwork, but did not provide the statistical significance of individual links and their differences. Besides, they only controlled the family-wise error rate, instead of the false discovery rate. Xia et al. (2015) first encoded the connectivity network by a partial correlation matrix computed from vector-valued data under a normal distribution. They then proposed a multiple testing procedure to compare the partial correlation matrices from the two populations, along with a proper false discovery control. Xia and Li (2019)
further extended the test to matrix-valued data under a matrix normal distribution. In both cases, the test statistics were constructed based on the vector- or matrix-valued data, which, as we explain next, may not be directly observable. Moreover, the underlying data distribution may not always be normal or matrix normal. Durante and Dunson (2018) developed a fully Bayesian solution for network comparison, which is very flexible and can handle the data format of our problem, but it requires the specification of a series of prior distributions and can be computationally intensive.
Applications such as brain connectivity analysis actually raise new challenges for network inference. First of all, the observed data come in the form of matrices, each of which encodes the network structure for one individual subject, and is the number of network nodes. For instance, in brain structural connectivity, what one observes are the numbers of white matter fibers between pairs of brain anatomical regions, and this matrix of counts forms a network, with brain regions constituting the nodes and the fiber counts the links. This is entirely different from the data format studied in most network methods, where a network structure is inferred from some vector-valued or matrix-valued data and usually takes the form of a Pearson correlation or partial correlation matrix. This fundamental difference in terms of the available data format would thus require a completely new problem formulation and inferential procedure. Second, in a multitude of applications including brain connectivity analysis, the sample size is usually very small, e.g., in the tens. This calls for a testing procedure that is powerful enough to detect differentially expressed links under a limited sample size. In this article, we address the problem of comparing two populations of network data, more precisely, the two population means of networks. Meanwhile, we aim to tackle the new data format, and to explicitly enhance the power of the test.
Specifically, suppose we observe two groups of network data, and , where , are the numbers of network samples for the two groups, respectively. Suppose , , where is some distribution with a symmetric mean matrix . Our goal is to test whether the two population means are the same; i.e.,
If the global null in (1) is rejected, we further aim to identify at which locations the two mean matrices are different. That is, we wish to simultaneously test,
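In standard notation (our own reconstruction, since the display equations (1) and (2) did not survive extraction; here $M_1$ and $M_2$ denote the two symmetric $p \times p$ population mean matrices, with entries $m_{1,jk}$ and $m_{2,jk}$), the two testing problems read:

```latex
% Global test (1)
H_0 : M_1 = M_2 \quad \text{versus} \quad H_1 : M_1 \neq M_2 .

% Simultaneous test (2), for all 1 \le j < k \le p
H_{0,jk} : m_{1,jk} = m_{2,jk} \quad \text{versus} \quad
H_{1,jk} : m_{1,jk} \neq m_{2,jk} .
```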
In Xia et al. (2015), the observed data, , , , is of the vector form, and is assumed to follow a normal distribution with the covariance matrix . Let denote the corresponding partial correlation matrix, i.e., the standardized version of . The network structure is then encoded by , and the problem turns into testing whether . Xia and Li (2019) followed a similar setup, except that the observed data becomes a matrix, and is assumed to follow a matrix normal distribution with the covariance , and the network is still encoded by the standardized version of . The key difference in our setting is that we do not always observe directly, but instead only. This difference in data format completely distinguishes our method from nearly all existing solutions such as Xia et al. (2015) and Xia and Li (2019). Moreover, we do not impose that the underlying data follows a normal or matrix normal distribution. Instead, we consider a general class of distributions for satisfying some moment condition. Our method works for many different types of network links, for instance, binary links when follows a light-tailed distribution, or count links when follows a heavy-tailed distribution.
For the global test (1), we develop a global test statistic as the maximum of a set of individual test statistics, which in our application context may not have a clear correlation structure. We then derive its limiting null distribution, and show that the resulting global test is asymptotically minimax optimal in power. For the simultaneous test (2), we first develop a multiple testing procedure, and show that it can asymptotically control the false discovery rate at the pre-specified level. Next we propose a method to substantially enhance the power of the simultaneous inference procedure for (2). Specifically, we extend the grouping-adjusting-pooling idea of Xia et al. (2019), and modify it for our inference of network data. This enhancement in power is particularly useful in applications such as brain connectivity analysis, where the sample size is often limited.
Our proposal makes multiple contributions. First, to the best of our knowledge, there has been no existing inference procedure that directly targets the network data in the form of . Our method bridges this gap, and offers a timely solution to a range of scientific applications where this form of inference problem is commonly encountered. Second, we develop a power enhancement approach that is of practical importance under the limited sample size. Although motivated by Xia et al. (2019), our power enhancement method differs from Xia et al. (2019) considerably in several ways. We explicitly compare the two power enhancement procedures in Section 4.5. As such, our method provides a useful addition to the general toolbox of network inference.
We adopt the following notation throughout this article. For a symmetric matrix , let and
denote the largest and smallest eigenvalues of, respectively. For a set , let denote its cardinality. Write and assume that . For two sequences of real numbers and , write if there exists a constant such that holds for all , write if , and write if there are positive constants and such that for all .
The rest of the article is organized as follows. Section 2 presents the moment conditions for the distribution of and shows that they are easily satisfied in numerous types of network data. Section 3 develops the global testing and the simultaneous testing procedures for the two-sample network comparison, and Section 4 studies power enhancement, both of which are key to our proposal. Section 5 presents the simulations, and a brain connectivity analysis as an illustration. The Supplementary Appendix collects all technical proofs.
2 Moment Conditions and Examples
We begin with some moment conditions imposed on . We then give a number of examples and show that those conditions are easily satisfied in numerous types of network data.
2.1 Moment conditions
We assume that the distribution of the network data satisfies one of the following two conditions: a sub-Gaussian-type tail, or a polynomial-type tail, as stated below.
(Sub-Gaussian-tail). Suppose that . Suppose that there exist some constants and , such that, for ,
(Polynomial-tail). Suppose that for some constants and . Suppose that there exist some constants and , such that, for ,
We first comment that both conditions are common, and similar conditions have often been assumed in the high-dimensional setting (Cai et al., 2014; Van de Geer et al., 2014). These moment conditions are much weaker than the Gaussian assumption usually required in the testing literature (Schott, 2007; Srivastava and Yanagihara, 2010). Next we discuss a number of network data examples that satisfy the above moment conditions, including Bernoulli and mixture Bernoulli data, Poisson data, as well as correlation and partial correlation data. Furthermore, we discuss some examples where the distributions are heavy-tailed, but after some data transformation, they still satisfy the moment conditions. Examples include transformed normal count data and transformed Wishart count data.
2.2 Binary network data
The binary network is arguably the most common network data type, where each link is a binary indicator. The Bernoulli distribution is often assumed; i.e., each link follows a Bernoulli distribution with mean , for a constant , , and . In such a case, satisfies the sub-Gaussian-tail condition in (C1), e.g., with and .
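As a quick numerical sanity check (a sketch with our own illustrative choice of constants, not the paper's), for a Bernoulli link $X$ with mean $m$ one can evaluate $E[\exp\{\eta (X - m)^2\}]$ in closed form and confirm it stays uniformly bounded in $m$, as the sub-Gaussian-tail condition requires:

```python
import numpy as np

# For X ~ Bernoulli(m), E[exp(eta * (X - m)^2)] has a closed form:
# m * exp(eta * (1 - m)^2) + (1 - m) * exp(eta * m^2).
def bernoulli_mgf_of_squared_deviation(m, eta):
    return m * np.exp(eta * (1.0 - m) ** 2) + (1.0 - m) * np.exp(eta * m ** 2)

eta = 1.0  # our own illustrative choice of the constant in (C1)
means = np.linspace(0.01, 0.99, 99)
values = bernoulli_mgf_of_squared_deviation(means, eta)

# Since (X - m)^2 <= 1 for a binary link, the expectation is bounded
# by exp(eta) = e for every mean m in (0, 1), so (C1) holds with K = e.
assert np.all(values <= np.e + 1e-12)
```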
The same holds true for the mixture Bernoulli distribution as discussed in Durante and Dunson (2018). That is, for some integer and randomly selected subject to and , , with for some constant , , , and . For this example, again satisfies the sub-Gaussian-tail condition in (C1), with and .
2.3 Correlation network data
Correlation network is another equally common network data type. In brain functional connectivity analysis and many other applications, the network is often encoded by a Pearson correlation or a partial correlation matrix. Take the Pearson correlation network as an example. The functional imaging data is usually summarized as a spatial-temporal matrix. That is, for the th subject in the th group, the observed data is of the form , , where is the number of brain regions, and is the number of repeated measures. Then the brain functional connectivity network is encoded by the sample correlation matrix , where denotes the th column of the matrix and denotes the sample mean vector (Fornito et al., 2013). Next we show that, as long as satisfies one of the conditions in Lemma 1, then satisfies the sub-Gaussian-tail condition (C1). Let .
Suppose satisfies one of the following conditions: (i) , and there exist constants such that ; (ii) , for some , and there exist constants such that , for . Then satisfies the sub-Gaussian-tail condition in (C1), with and .
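To make the construction in Section 2.3 concrete, the following sketch builds a subject's Pearson-correlation network from a time-by-region data matrix (the function name and toy dimensions are our own, for illustration only):

```python
import numpy as np

# Build a Pearson-correlation brain network from one subject's
# time-by-region data matrix: T repeated measures, p brain regions.
def correlation_network(X):
    """X: (T, p) array; returns the p x p sample correlation matrix."""
    return np.corrcoef(X, rowvar=False)  # columns are the p regions

rng = np.random.default_rng(0)
T, p = 200, 10                       # repeated measures, brain regions
X = rng.standard_normal((T, p))
A = correlation_network(X)

assert A.shape == (p, p)
assert np.allclose(A, A.T)           # the network is symmetric
assert np.allclose(np.diag(A), 1.0)  # unit self-correlations
```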
We remark that a result similar to Lemma 1 can be obtained for the partial correlation network, by using the inverse regression techniques of Liu (2013). Xia and Li (2019) tackled the network comparison problem assuming is directly observable and follows a matrix normal distribution. Lemma 1 suggests that the test we develop later is still applicable when is available, even though it may not be as powerful as the test of Xia and Li (2019) in this case. On the other hand, the main focus of this article is to develop a test for comparing two networks even when is not observed, but only is. As such, our test is more general than that of Xia and Li (2019).
2.4 Poisson network data
The count network is another common network data type, where each link is a count. For instance, in brain structural connectivity analysis, the link is the number of white matter fibers between anatomical brain regions. The Poisson distribution is often imposed; i.e., follows a Poisson distribution with mean , , , . For any constant , let be the smallest integer that is no smaller than , where is as defined in (C2). Then satisfies the polynomial-tail condition (C2), with upper bounded by , and is the number of ways to partition a set of objects into non-empty subsets.
2.5 Transformed network data
We next consider some examples where the original network data , , and is some heavy-tailed distribution that only differs in the mean matrix between the two groups. In such cases, testing the means of the original samples is equivalent to testing the means of the transformed data, , where is some one-to-one transformation function. We next give two examples, where after transformation, the transformed data satisfy (C1), (C2), or both.
One example is the log-normal count network. After the logarithmic transformation of , the transformed data follows a normal distribution, and thus both (C1) and (C2) are satisfied. This can be further extended to the transformed normal mixture network.
Another example is the transformed Wishart count network, where the transformed data follows the Wishart distribution with the scale matrix
and the degrees of freedom. For this case, satisfies the sub-Gaussian-tail condition (C1). Moreover, in this case, the testing problems (1) and (2) are closely related to the covariance matrix testing problems studied in Li and Chen (2012) and Cai et al. (2013). The key difference between our method and the existing ones is that, we only observe , but not the original vector samples. This example can also be further extended to the case of the product of Gaussian mixtures network, or the Wishart mixtures network.
3 Two-sample Test on Network Data
We begin with the construction of the test statistics for the two testing problems (1) and (2). We then develop a global testing procedure for (1), and a simultaneous testing procedure for (2). For each test, we derive the corresponding asymptotic properties.
3.1 Test statistics
3.2 Global test
In brain connectivity analysis and many other applications, it is generally postulated that the differences between two network structures concentrate on a small number of brain regions. This translates to a sparse alternative in our global test. Correspondingly, we construct the global test statistic as,
We remark that, in some applications, the test statistics ’s may be correlated, and a global test statistic that utilizes such correlations may result in a more powerful test. However, this may not always be the case. For instance, in our brain connectivity application, the nodes are usually the brain anatomical regions, which can scatter at distant locations of the brain. As a result, there is no obvious correlation structure for ’s.
Let denote the covariance matrix of vech(), where , and vech is the operator that turns the upper triangular part of into a vector. Let denote the corresponding correlation matrix. We introduce two conditions.
for some constant , .
for some constant .
Both conditions are mild. Particularly, Condition (A1) on the eigenvalues of the covariance matrix is common in the high dimensional setting (Bickel et al., 2008; Rothman et al., 2008; Yuan, 2010; Cai et al., 2014). Condition (A2) is also mild, because if , then is singular. We next obtain the limiting distribution of our test statistic .
Suppose that (A1)-(A2), and one of (C1) and (C2) hold. Then for any ,
Based on this limiting null distribution, we define the asymptotic -level test as,
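In code, the max-type global test can be sketched as follows. The Gumbel-type cutoff is the classical extreme-value calibration for max statistics of this kind (cf. Cai, Liu and Xia, 2013); since the exact constants of this paper's limiting law did not survive extraction, treat the cutoff constants below as an assumption, and the function name and toy data as ours:

```python
import numpy as np

# Sketch of a max-type global test for H0: M1 = M2, given two stacks of
# observed networks A1 (n1, p, p) and A2 (n2, p, p). The cutoff uses the
# classical Gumbel calibration for max statistics; the exact constants of
# the paper's limiting law are assumed here, not taken from the text.
def global_max_test(A1, A2, alpha=0.05):
    n1, p, _ = A1.shape
    n2 = A2.shape[0]
    iu = np.triu_indices(p, k=1)                  # upper-triangular links
    d = A1.mean(axis=0)[iu] - A2.mean(axis=0)[iu]
    var = A1.var(axis=0, ddof=1)[iu] / n1 + A2.var(axis=0, ddof=1)[iu] / n2
    W2 = d ** 2 / var                             # squared standardized stats
    M = W2.max()                                  # the global max statistic
    cutoff = (4 * np.log(p) - np.log(np.log(p))
              - 2 * np.log(np.sqrt(8 * np.pi) * (-np.log(1 - alpha))))
    return M, M > cutoff

# Illustration on synthetic networks with one strongly altered link.
rng = np.random.default_rng(1)
A1 = rng.standard_normal((50, 15, 15))
A2 = rng.standard_normal((50, 15, 15))
A2[:, 0, 1] += 3.0                                # plant a large difference
M, reject = global_max_test(A1, A2)
```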
We next study the power and the asymptotic optimality of the test . Toward that end, define the sparsity of as . We also introduce a class of ,
Suppose that one of (C1) and (C2) holds. Then,
Furthermore, suppose that for some . Let and . Then there exists a constant such that, for all sufficiently large and ,
where is the set of all -level tests, i.e., for all .
This theorem shows that the null hypothesis in (1) can be rejected by
with high probability if the pair of network means belongs to the class. In addition, under the mild sparsity condition , the lower bound rate of cannot be further improved, because for a sufficiently small , any -level test is unable to reject the null correctly uniformly over with probability tending to 1. Hence, the global test is asymptotically minimax optimal in power.
3.3 Simultaneous test
We next develop a multiple testing procedure for (2) based on the test statistic in (4). Let be the threshold level such that is rejected if . Let be the set of true nulls, and the set of true alternatives, where . Denote by and the total number of false positives and rejections, respectively. Then we define the false discovery proportion and false discovery rate by
An ideal choice of would reject as many true positives as possible while controlling the FDP at the pre-specified level . That is, we select . Since is unknown, we estimate it conservatively by , where
is the standard normal cumulative distribution function. This leads to our multiple testing procedure, as summarized in Algorithm 1.
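The threshold search behind Algorithm 1 can be sketched as follows (a simplified rendering with our own variable names; `W` holds the standardized link statistics, the estimated null tail uses 2(1 − Φ(t)) = erfc(t/√2), and the grid upper bound follows the usual convention for such procedures, which we assume here):

```python
import numpy as np
from math import erfc, sqrt, log

# Sketch of the FDP-controlling threshold search of Algorithm 1.
# W: the q = p(p-1)/2 standardized link statistics.
def fdr_threshold(W, alpha=0.05):
    W = np.abs(np.asarray(W, dtype=float))
    q = W.size
    t_max = sqrt(4 * log(q) - 2 * log(log(q)))   # conventional grid upper bound
    for t in np.linspace(0.0, t_max, 2000):
        rejections = max(int((W >= t).sum()), 1)
        # estimated FDP: expected null exceedances over observed rejections
        fdp_hat = erfc(t / sqrt(2.0)) * q / rejections
        if fdp_hat <= alpha:
            return t           # smallest threshold with estimated FDP <= alpha
    return sqrt(4 * log(q))    # fall back to a Bonferroni-type cutoff

# Illustration: 10 strong signals hidden among 990 null statistics.
rng = np.random.default_rng(4)
W = np.concatenate([np.full(10, 8.0), rng.standard_normal(990)])
t = fdr_threshold(W)
```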
We next show that this testing procedure controls the FDR and FDP asymptotically at the pre-specified level. For notational simplicity, we write FDP=FDP() and FDR=FDR(), where is obtained in Algorithm 1. Define , and . We further introduce some conditions.
, for some constant and any sufficiently small constant .
for some constants and .
for some constant .
Condition (B1) on is mild, as it only requires a small number of and to have a standardized difference of the order for any sufficiently small constant . Condition (B2) is mild, as it requires that not too many are highly correlated, but still allows the number of highly correlated pairs to grow at the order of . Condition (B3) is also a natural and mild assumption, because if it does not hold, i.e., , then we can simply reject all the hypotheses. As a result, we would have , , and the FDR would tend to zero. Under these conditions, we obtain the asymptotic properties of our multiple testing procedure in terms of false discovery control.
Suppose that (A1)-(A2), (B1)-(B3), and one of (C1) and (C2) hold. Then,
4 Power Enhancement
In brain connectivity analysis and many other applications, the sample size is often small, whereas the number of nodes can be moderate to large. This results in a limited power for the proposed testing procedure. We explore in this section an explicit power enhancement method that has the potential to substantially improve the power of the simultaneous inference developed in Section 3.3. We borrow the idea of grouping, adjusting and pooling (GAP) that was first proposed in Xia et al. (2019). However, our method differs from Xia et al. (2019) in many ways, including a different, and actually less restrictive, assumption, a different set of primary and auxiliary statistics, and a different modification of the multiple testing procedure. We show that the modified procedure is asymptotically more powerful, while still controlling the FDR and FDP asymptotically. We obtain these properties assuming the sub-Gaussian-tail condition (C1). Parallel results can be obtained under the polynomial-tail condition (C2) as well, but are technically more involved. We begin by describing the intuition behind our power enhancement solution, then derive the proper auxiliary statistic for our inference problem. We then develop the modified simultaneous testing procedure, and study its asymptotic properties in terms of power improvement and false discovery control. We also compare in detail our method with the GAP method of Xia et al. (2019).
We recognize that there exists additional information in the data that is potentially useful to improve the simultaneous testing procedure of Algorithm 1. We first discuss our intuition, then use a simple example to illustrate where the auxiliary information lies and how it can facilitate our multiple testing procedure.
In a multitude of applications including brain connectivity analysis, it is often believed that the difference between the two networks under different biological conditions is small. This means is sparse. Accordingly, one can find a baseline matrix , such that and are individually sparse. Let denote the support of , , and denote the union support. Note that the set of alternative hypotheses defined in Section 3.3 is the same as , if for every . In general, is a proper subset of . Since and are both sparse, we realize that the cardinality of is small. Moreover, the following relationship holds true:
Therefore, the knowledge about is useful to help narrow down the search in multiple testing. In other words, if one can find a way to identify possible entries in , it would provide useful information about the set of true alternatives , or equivalently, the set of true nulls . As a consequence, it can potentially increase the power of the testing procedure.
A key observation is that, while the test statistic is built on the difference between and as defined in Section 3.1, the sum of and can provide crucial information about . Consider a toy example where the network data is binary, and follows a Bernoulli distribution with mean , . Assume that for of the pairs, for of the pairs, and for the rest of the pairs, and . In this example, for the pairs , the sum of and is either very small, i.e., 0.2, or very large, i.e., 1.8. Meanwhile, for the pairs , the sum is in between. Hence, this sum contains useful information about , and can potentially enhance the power of the multiple testing procedure.
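The arithmetic of this toy example (our own rendering, using only the numbers quoted in the text; the elided fractions of pairs are not reproduced) makes the separation explicit:

```python
# Toy Bernoulli example: on the union support the link means are 0.1 or
# 0.9 in both groups, elsewhere they are 0.5. The *sum* of the two group
# means is then extreme on the support and moderate off it.
support_pairs = [(0.1, 0.1), (0.9, 0.9)]   # links in the union support
null_pair = (0.5, 0.5)                     # a link outside the support

support_sums = [m1 + m2 for m1, m2 in support_pairs]
assert abs(min(support_sums) - 0.2) < 1e-12   # very small sum
assert abs(max(support_sums) - 1.8) < 1e-12   # very large sum
assert abs(sum(null_pair) - 1.0) < 1e-12      # in-between sum
```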
Based on the above discussion, we can see that, the more sparsity structure information the auxiliary statistics can capture, the more information they can provide about the union support , and the more substantial power gain the test can achieve. In general, the sparser the true difference is, the more information the auxiliary statistics can offer.
4.2 Auxiliary statistics
We next formally construct the auxiliary statistic that provides useful information about the union support . It is important to note that the auxiliary statistic should be constructed so that it is asymptotically independent of the test statistic in (4). This way, the null distribution of would not be distorted by the incorporation of the auxiliary statistic.
Recall that is the sample variance of as defined in (3). We construct the auxiliary statistic as,
where . The next proposition shows that the test statistic and the auxiliary statistic are asymptotically independent under the null hypothesis. Define
Suppose (C1) holds. For any constants and , we have
uniformly for , , and , with . Furthermore, for all with an integer constant ,
uniformly for and , where .
4.3 Power enhanced simultaneous test
Based on , we now modify the simultaneous testing procedure of Algorithm 1. We first describe the main idea. We next summarize the modified testing procedure in Algorithm 2. Finally, we discuss some specific choices of the key parameters of the algorithm.
Since there are tests to carry out simultaneously, we rearrange the pairs of into . After obtaining all the p-values, , from Algorithm 1, our basic idea is to adjust those p-values by , with being the adjusting weights, . We utilize the auxiliary statistics to help compute the adjusting weights , by groups. Specifically, we consider a set of grid values, , where , and are some pre-specified constants. We divide the index set into groups according to the auxiliary statistics . As an example, we take . That is, we choose two grid points in , and obtain groups of indices, , , and . For each group , we compute its cardinality, . We also estimate the proportion, , of alternatives in , . To do so, we employ the method of Schweder and Spjøtvoll (1982) and Storey (2002) to obtain an estimate first, then stabilize it by , where is a small positive number; we set . Then for all the indices in , we compute the group-wise adjusting weight:
This idea of adjusting the weights by groups is motivated by our intuition discussed in Section 4.1. After obtaining the weights, we adjust the p-values and apply the Benjamini-Hochberg procedure (Benjamini and Hochberg, 1995, BH) to the adjusted p-values . Finally, we search all possible choices of among , and find the one that yields the largest number of rejections. We apply the BH procedure again to the adjusted p-values under this choice of to obtain the final adjusted rejection region. We summarize this modified simultaneous testing procedure in Algorithm 2.
Step 1.1: Compute the test statistics and auxiliary statistics .
Step 1.2: Compute the p-values: , .
Step 1.3: Input the pre-specified constants , , and .
Step 1.4: Compute the grid set:
Step 2.1: Construct , .
Step 2.2: For each , compute the cardinality, .
Step 2.3: For each , estimate the proportion, , of alternatives in .
Step 2.4: Compute the adjusting weights , , according to (5).
Step 2.5: Adjust the p-values: , .
Step 2.6: Apply the BH procedure, and record the total number of rejections.
Step 3.1: Choose that yields the largest number of rejections.
Step 3.2: Compute the corresponding adjusted p-values: , .
Step 3.3: Reorder all the adjusted p-values: .
Step 3.4: Output the rejection region , where .
We discuss some specific choices of the parameters in Algorithm 2. First, the number of groups is usually set at . As shown in Xia et al. (2019), when , there is little additional power gain, but a more expensive computation. Second, the constants and can be chosen so that is equal to the smallest value of the auxiliary statistics and is equal to the largest value of the auxiliary statistics. If the absolute values of the smallest and largest auxiliary statistics exceed , we truncate at and to stabilize and expedite the computation. We note here that, if the network data are non-negative, such as the binary and Poisson network data, then both and are non-negative. By contrast, in Xia et al. (2019), and were fixed at and . Finally, can be any integer as far as the theoretical validity is concerned. Numerically, a larger value of implies a more precise grid search, but at the cost of a heavier computational burden. We choose such that the gap between two adjacent grid points, , equals approximately.
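A condensed sketch of Algorithm 2 follows (our own simplified rendering: two interior grid points only, Storey's estimator with a fixed λ = 0.5 and stabilizer 0.01, and the elided weight formula (5) replaced by the standard proportion-based weight w ∝ π̂/(1 − π̂), normalized to mean one; all of these choices are assumptions, not the paper's exact specification):

```python
import numpy as np

def storey_pi_hat(pvals, lam=0.5, xi=0.01):
    """Storey (2002) estimate of the alternative proportion in a group,
    kept away from 0 and 1 for stability (xi is our own choice)."""
    pi0 = min(1.0, (pvals > lam).mean() / (1.0 - lam))
    return min(max(1.0 - pi0, xi), 1.0 - xi)

def weighted_bh(pvals, aux, alpha=0.05, cut=(-1.0, 1.0)):
    """Group by the auxiliary statistics, weight the p-values group-wise,
    then run Benjamini-Hochberg on the weighted p-values."""
    groups = np.digitize(aux, cut)            # 3 groups from 2 grid points
    w = np.empty_like(pvals)
    for g in range(3):
        idx = groups == g
        if idx.any():
            pi = storey_pi_hat(pvals[idx])
            w[idx] = pi / (1.0 - pi)          # larger weight = likelier signal
    w *= pvals.size / w.sum()                 # normalize weights to mean 1
    p_adj = np.minimum(pvals / w, 1.0)
    # Benjamini-Hochberg on the adjusted p-values
    order = np.argsort(p_adj)
    thresh = alpha * np.arange(1, pvals.size + 1) / pvals.size
    passed = np.flatnonzero(p_adj[order] <= thresh)
    k = passed[-1] + 1 if passed.size else 0
    reject = np.zeros(pvals.size, dtype=bool)
    reject[order[:k]] = True
    return reject

# Illustration: 20 signals (tiny p-values, large auxiliary statistics)
# among 480 nulls (uniform p-values, small auxiliary statistics).
rng = np.random.default_rng(3)
pvals = np.concatenate([np.full(20, 1e-8), rng.uniform(size=480)])
aux = np.concatenate([np.full(20, 5.0), rng.normal(0.0, 0.3, size=480)])
reject = weighted_bh(pvals, aux)
```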
4.4 FDR control and power enhancement
Denote the adjusted p-values from Algorithm 2, and the ordered adjusted p-values. The corresponding adjusted FDP is:
where is the cutoff obtained from Step 3.4 of Algorithm 2, and is the indicator function. Accordingly, . The next theorem shows that the modified procedure can still control FDR and FDP asymptotically.
Suppose (A1)-(A2), (B1)-(B3), and (C1) hold. Suppose for some constant . Then,