1 Introduction
Alzheimer’s disease (AD) is the most common neurodegenerative disease. For AD diagnosis, neuroscience researchers often resort to Brain Connectivity Analysis (BCA) to reveal the underlying pathogenic mechanism through correlations in neurophysiological measurements of brain activity. Functional Magnetic Resonance Imaging (fMRI), which records the blood oxygen level dependent (BOLD) time series, has been the mainstream imaging modality for studying brain functional connectivity. The observed fMRI data have a spatial-by-temporal matrix structure, where the rows correspond to the brain regions and the columns correspond to the time points; see the fMRI and matrix data in part (A) of Figure
1. Growing evidence has shown that the brain connectivity network experiences alterations in the presence of Alzheimer’s disease (Ryali et al., 2012; Higgins et al., 2019; Xia and Li, 2019). Differential network analysis, or network comparison, has been an important way to provide deep insights into disease pathologies; see Yuan et al. (2015); Tian et al. (2016); Ji et al. (2016); Yuan et al. (2016); Ji et al. (2017); Zhang et al. (2017); He et al. (2018a); Grimes et al. (2019). Most of the literature focuses on differential network modelling for vector-form data. For the observed fMRI spatial-by-temporal matrix data, directly applying the existing differential network analysis methods for vector-form data would ignore the intrinsic matrix structure and may result in unreliable conclusions. Zhu and Li (2018) employed nonconvex penalization to tackle the estimation of multiple graphs from matrix-valued data, while Xia and Li (2019) formulated the problem as testing the equality of individual entries of partial correlation matrices, both under a matrix normal distribution assumption.
An ensuing problem is the image-guided medical diagnosis of AD, i.e., to classify a new subject into the AD or control group from the fMRI data. Classification, or discriminant analysis, is a classical problem in statistics. There are a good number of standard classification methods for vector-valued predictors, such as logistic regression and linear discriminant analysis, together with modified versions that cope with high-dimensional predictors; see, for example,
Bickel and Levina (2004); Zhu and Hastie (2004); Zou and Hastie (2005); Park and Hastie (2008); Fan and Fan (2008); Cai and Liu (2011); Mai et al. (2012); He et al. (2016); Ma et al. (2020). For the fMRI matrix data, the naive idea, which first vectorizes the matrix and then adopts a standard vector-valued-predictor classification method, does not take advantage of the intrinsic matrix structure and thus leads to low classification accuracy. To address this issue, several authors presented logistic regression-based methods and linear discriminant criteria for classification with matrix-valued predictors: Zhou and Li (2014) proposed a nuclear norm penalized likelihood estimator of the regression coefficient matrix in a generalized linear model; Li and Yuan (2005); Zhong and Suslick (2015); Molstad and Rothman (2019) modified Fisher’s linear discriminant criterion for matrix-valued predictors. In this article, we propose a method which achieves Simultaneous Differential Network analysis and Classification for Matrix-Variate data (SDNCMV). The SDNCMV is an ensemble learning algorithm, which involves three main steps. First, we propose an individual-specific spatial graphical model to construct the between-region network measures (connection strengths); see part (A) in Figure 1. In practice, we measure the between-region network strengths by a specific transformation function of the partial correlations so that their values are well separated between the two groups. In the second step, with the constructed individual between-region network strengths and confounding factors, we adopt the bootstrap technique and train Penalized Logistic Regression (PLR) models with bootstrap samples; see part (B) in Figure 1. Finally, we ensemble the results from the bootstrap PLRs to boost the classification accuracy and network comparison power; see part (C) in Figure 1.
The advantages of SDNCMV over the existing state-of-the-art methods lie in the following aspects. First, as far as we know, the SDNCMV is the first proposal in the neuroscience literature to achieve network comparison and classification for matrix fMRI data simultaneously. Second, the SDNCMV performs much better than the existing competitors in terms of classification accuracy and network comparison power, especially when the two populations share a similar mean (the data are usually demeaned) but differ in the population (partial) correlation structure; note that almost all existing classification methods focus on a population mean difference with a common correlation structure. Third, the SDNCMV is an ensemble machine learning procedure, which makes a strategic decision based on the different PLRs fitted to bootstrap samples; it is thus more robust and powerful. Fourth, the SDNCMV addresses an important issue in brain network comparison, i.e., it can adjust for confounding factors such as age and gender, which has not been taken into full account in the past literature. The results are illustrated through both simulation studies and real fMRI data on Alzheimer’s disease.
The rest of the paper is organized as follows. In Section 2, we present the two-stage procedure which achieves classification and network comparison simultaneously. In Section 3, we assess the performance of our method and some state-of-the-art competitors via Monte Carlo simulation studies. Section 4 illustrates the proposed method through an AD study. We summarize our method and present some future directions in Section 5.
2 Methodology
2.1 Notations and Model Setup
In this section, we introduce the notations and model setup. We adopt the following notations throughout the paper. For any vector , let , . Let be a square matrix of dimension , and define , , . We denote the trace of as and let be the vector obtained by stacking the columns of . Let be the operator that stacks the columns of the upper triangular elements of a matrix, excluding the diagonal elements, into a vector. For instance, , then . The notation represents the Kronecker product. For a set , denote by the cardinality of . For two real numbers , define .
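To make the column-stacking operator on the off-diagonal upper triangle concrete, here is a minimal Python sketch (the function name `vech_offdiag` is ours, chosen for illustration):

```python
import numpy as np

def vech_offdiag(A):
    """Stack the upper-triangular elements of A (excluding the diagonal)
    column by column into a vector, matching the column-stacking
    convention described in the text."""
    p = A.shape[0]
    return np.array([A[i, j] for j in range(p) for i in range(j)])

A = np.array([[1.0, 2.0, 3.0],
              [2.0, 5.0, 6.0],
              [3.0, 6.0, 9.0]])
# Columns of the upper triangle in order: (1,2), (1,3), (2,3)
print(vech_offdiag(A))  # [2. 3. 6.]
```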
Let be the spatial-temporal matrix-variate data from fMRI with spatial locations and time points. We assume that the spatial-temporal matrix-variate follows the matrix normal distribution with the Kronecker product covariance structure defined as follows.
We say follows a matrix normal distribution with the Kronecker product covariance structure , denoted as
if and only if follows a multivariate normal distribution with mean and covariance , where and denote the covariance matrices of the spatial locations and time points, respectively.
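Sampling from this distribution is straightforward via Cholesky factors: if Z has i.i.d. standard normal entries, A A' equals the spatial covariance, and B B' equals the temporal covariance, then M + A Z B' follows the matrix normal law above. A hedged Python sketch (function and variable names are ours):

```python
import numpy as np

def rmatnorm(M, Sigma_S, Sigma_T, rng):
    """Draw one matrix normal sample: vec of the result is multivariate
    normal with mean vec(M) and covariance Sigma_T (Kronecker) Sigma_S,
    under the column-stacking vec convention."""
    A = np.linalg.cholesky(Sigma_S)   # spatial factor, A A' = Sigma_S
    B = np.linalg.cholesky(Sigma_T)   # temporal factor, B B' = Sigma_T
    Z = rng.standard_normal((Sigma_S.shape[0], Sigma_T.shape[0]))
    return M + A @ Z @ B.T
```

A useful consequence is that the variance of entry (i, j) equals the product of the i-th spatial and j-th temporal diagonal variances.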
The matrix normal distribution framework is scientifically relevant in neuroimaging analysis and BCA studies; see, for example, Yin and Li (2012); Leng and Tang (2012); Zhou (2014); Xia and Li (2017); Zhu and Li (2018); Xia and Li (2019). Under the matrix normal distribution assumption of , we have
where and denote the spatial precision matrix and temporal precision matrix, respectively. and are only identifiable up to a scale factor. In fact, in brain connectivity network analysis, the partial correlations, or equivalently the scaled precision matrix elements, are a commonly adopted correlation measure (Peng et al., 2009; Zhu and Li, 2018). In addition, the primary interest in BCA studies is to infer the connectivity network characterized by the spatial precision matrix, while the temporal precision matrix is of little interest. Under the matrix normal framework, a region-by-region spatial partial correlation matrix, , characterizes the brain connectivity graph, where is the diagonal matrix of . Brain connectivity analysis is, in essence, to infer the spatial partial correlation matrix .
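The scaling from a precision matrix to a partial correlation matrix can be sketched in a few lines of Python (illustrative only; the standard convention sets the (i, j) partial correlation to minus the (i, j) precision entry divided by the square root of the product of the two diagonal entries):

```python
import numpy as np

def partial_corr(Omega):
    """Scale a precision matrix Omega to the partial correlation matrix:
    off-diagonal entries are -Omega_ij / sqrt(Omega_ii * Omega_jj),
    the diagonal is 1. The map is scale-invariant, so the unidentified
    scale factor in the Kronecker decomposition cancels."""
    d = np.sqrt(np.diag(Omega))
    R = -Omega / np.outer(d, d)
    np.fill_diagonal(R, 1.0)
    return R
```

Scale invariance is exactly why the identifiability issue mentioned above is harmless for BCA: multiplying the precision matrix by any positive constant leaves the partial correlations unchanged.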
In the current article, we focus on the Brain Connectivity Alteration Detection (BCAD) of Alzheimer’s disease, i.e., on identifying the differences in the spatial partial correlation matrices of the AD group and the control group. The most notable feature that differentiates our proposal from the existing BCAD-related literature is that we take the variation of the strengths of spatial network connections across subjects into account. At the same time, we also focus on the fMRI-guided medical diagnosis of Alzheimer’s disease. Let be the subjects from the AD group and from the control group, which are all spatial-temporal matrices. We assume that and , which encodes individual-specific between-region connectivity strengths. Most of the existing literature assumes that the individuals in the same group share the same spatial precision matrix, while our proposal allows individual heterogeneity. Recent medical studies provide evidence for the presence of substantial heterogeneity of covariance network connections between individuals and subgroups of individuals; see, for example, Chen et al. (2011); Alexander-Bloch et al. (2013); Xie et al. (2020).
2.2 The SDNCMV Procedure
In this section, we present the detailed procedure for Simultaneous Differential Network analysis and Classification for Matrix-Variate data (SDNCMV). The SDNCMV involves the following three main steps: (1) construct the individual-specific between-region network measures; (2) train the Penalized Logistic Regression (PLR) models with bootstrap samples; (3) ensemble the results from the bootstrap PLRs to boost the classification accuracy and network comparison power. We give the details in the following subsections.
2.2.1 Individual-specific between-region network measures
In this section, we introduce the procedure to construct the individual-specific between-region network measures. We focus on the AD group; the control group can be dealt with similarly. Recall that is the th subject from the AD group, and let be the th column of . The individual spatial sample covariance matrix is simply
(2.1) 
where . We assume that is larger than or comparable to , which is typically the case for fMRI data. We also assume that the spatial precision matrix is sparse and estimate it by the Constrained ℓ1-Minimization for Inverse Matrix Estimation (CLIME) method of Cai et al. (2011). That is,
(2.2) 
where is a tuning parameter. In the optimization problem in (2.2), the symmetry condition is not imposed on , and the solution is not symmetric in general. We construct the final CLIME estimator by symmetrizing as follows. We compare the pair of off-diagonal entries at the symmetric positions and , and assign the one with the smaller magnitude to both entries. That is,
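A minimal Python sketch of this smaller-magnitude symmetrization (illustrative, not the authors' implementation):

```python
import numpy as np

def symmetrize_min(Omega_hat):
    """Symmetrize a (generally asymmetric) CLIME solution: for each
    off-diagonal pair, keep the entry with the smaller magnitude and
    assign it to both symmetric positions."""
    O = Omega_hat.copy()
    p = O.shape[0]
    for i in range(p):
        for j in range(i + 1, p):
            keep = O[i, j] if abs(O[i, j]) <= abs(O[j, i]) else O[j, i]
            O[i, j] = O[j, i] = keep
    return O
```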
One may also use other individual-graph estimation methods such as the Graphical Lasso (Friedman et al., 2008). We adopt CLIME for the sake of computational efficiency: it is very attractive for high-dimensional data as it can be easily implemented by linear programming. We select the tuning parameter by the Dens criterion function proposed by Wang et al. (2016). We use the R package “DensParcorr” to implement the procedure for inferring the individual partial correlation matrix, which allows the user to specify the desired density level . We set the desired density level at in practice. For each subject in the AD group, we can obtain an estimator of the individual spatial precision matrix by the CLIME procedure, that is, . In parallel, we can obtain for the control group subjects. We then scale the spatial precision matrices to obtain the partial correlation matrices
where and are the diagonal matrices of and , respectively. We then define the individual-specific between-region network measures as follows:
where and are the th elements of and , respectively. In fact, the defined between-region network measures are the Fisher transformation of the estimated partial correlations. The Fisher transformation is well known as an approximate variance-stabilizing transformation, which alleviates the possible effects of skewed distributions and/or outliers and contributes to the outstanding performance of the PLR in the second stage.
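A short Python sketch of the Fisher transformation applied to the off-diagonal partial correlations (the row-major ordering via `numpy.triu_indices` is our illustrative choice):

```python
import numpy as np

def network_strengths(R):
    """Fisher z-transform of the off-diagonal partial correlations:
    z = arctanh(r) = 0.5 * log((1 + r) / (1 - r)), an approximate
    variance-stabilizing map."""
    p = R.shape[0]
    iu = np.triu_indices(p, k=1)  # upper-triangle indices, row-major
    return np.arctanh(R[iu])
```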
2.2.2 Penalized logistic regression
Logistic regression is a classical statistical model for a binary classification problem. In this section we introduce the penalized logistic regression model and the bootstrap procedure.
Let

and let the binary response variable be denoted as

with observations , where . Denote

as the probability of the event , i.e., . The second-stage logistic model for the Alzheimer’s disease outcome is:

(2.3)
where denote the confounder covariates (e.g. age and gender) and is the Fisher transformation of the spatial partial correlation between the th and th regions , i.e.,
Note that there are variables in total in (2.3), which could be very many when is large. Thus we are motivated to impose a sparsity penalty on the coefficients . Finally, the Penalized Logistic Regression (PLR) model for estimating is as follows:
(2.4) 
where , is the th observation of , is a penalty function, and denotes the individual-specific network strengths estimated from the first-stage model, i.e.,
The penalty function is selected as the elastic net penalty (Zou and Hastie, 2005) to balance the and penalties of the lasso and ridge methods, i.e.,
where is a tuning parameter. The tuning parameters can be selected by cross-validation (CV) in practice.
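As an illustration, an elastic-net penalized logistic regression of this form can be fitted with scikit-learn. The data below are synthetic, and the penalty settings (`l1_ratio`, `C`) are placeholders that would be chosen by cross-validation in practice:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical inputs: rows of X are per-subject Fisher-transformed
# network strengths (plus confounders); y is the AD/control label.
rng = np.random.default_rng(0)
n, q = 60, 40
X = rng.standard_normal((n, q))
beta = np.zeros(q)
beta[:3] = 2.0                                # a few truly differential edges
y = (X @ beta + 0.5 * rng.standard_normal(n) > 0).astype(int)

# Elastic-net penalized logistic regression (the saga solver supports
# the l1_ratio mixing parameter between lasso and ridge).
plr = LogisticRegression(penalty="elasticnet", solver="saga",
                         l1_ratio=0.5, C=1.0, max_iter=5000)
plr.fit(X, y)
selected = np.flatnonzero(plr.coef_[0])       # estimated differential edges
```

The nonzero coefficients play the role of the estimated support set discussed below: they identify candidate differential edges while simultaneously defining the classifier.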
If the coefficient , there exists an edge between brain regions and in the differential network, which can be used by the PLR to discriminate the AD subjects from the control subjects. Thus, with the estimated support set of , we can recover the differential edges, and at the same time, the subjects in the test set are classified based on the fitted . That is how our proposal achieves Simultaneous Differential Network analysis and Classification for Matrix-Variate data (SDNCMV). It is worth pointing out that we can modify the model in (2.3) so that it includes the original variables as part of the confounders and also adds a penalty on . In the current paper, we mainly focus on the predictive ability of the individual-specific network strengths irrespective of the original variables.
2.2.3 Ensemble learning with the bootstrap
In this section, we introduce a bootstrap-based ensemble learning method which further boosts the classification accuracy and the network comparison power. With the subjects from the AD group and subjects from the control group, we randomly sample subjects from the AD group and subjects from the control group separately, both with replacement, and then conduct the two steps introduced in the last two sections. We repeat the resampling times, and we denote the resulting regression coefficients by and the out-of-sample classification labels by . We classify a new test sample as AD if . Similarly, to boost the network comparison power, we calculate the differential edge weight for each pair of nodes , defined as . For a prespecified threshold , we declare the edge to be in the differential network if . A simple subjective way to determine is to take the value . We also recommend determining by a scree plot; see the real example in Section 4. The simulation study in the following section shows that the bootstrap-assisted ensemble learning boosts the classification accuracy and the network comparison power considerably. The entire SDNCMV procedure is summarized in Algorithm 1.
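The three ensemble steps (resample each group with replacement, refit the PLR, then majority-vote the labels and accumulate per-edge selection counts) can be sketched as follows. This is an illustrative Python skeleton, not the authors' implementation, and the elastic-net settings are placeholders:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def sdncmv_ensemble(X0, X1, X_test, B=50, seed=0):
    """Bootstrap ensemble sketch: X0/X1 hold the network-strength
    features of the two groups (rows = subjects). Returns majority-vote
    labels for X_test and the per-feature selection counts, i.e. the
    differential edge weights."""
    rng = np.random.default_rng(seed)
    votes = np.zeros(X_test.shape[0])
    edge_weight = np.zeros(X0.shape[1])
    for _ in range(B):
        i0 = rng.integers(0, len(X0), len(X0))   # resample control group
        i1 = rng.integers(0, len(X1), len(X1))   # resample AD group
        X = np.vstack([X0[i0], X1[i1]])
        y = np.r_[np.zeros(len(i0)), np.ones(len(i1))]
        m = LogisticRegression(penalty="elasticnet", solver="saga",
                               l1_ratio=0.5, C=1.0,
                               max_iter=5000).fit(X, y)
        votes += m.predict(X_test)               # accumulate predicted labels
        edge_weight += (m.coef_[0] != 0)         # accumulate selections
    labels = (votes > B / 2).astype(int)         # majority vote
    return labels, edge_weight
```

A test subject is labeled AD when more than half of the bootstrap PLRs vote for it, and an edge enters the estimated differential network when its selection count exceeds the chosen threshold.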
3 Simulation Study
In this section, we conduct simulation studies to investigate the performance of the proposed method SDNCMV in terms of classification accuracy and network comparison power. In Section 3.1, we introduce the synthetic data generating settings, and we show the classification accuracy and network comparison results in Sections 3.2 and 3.3, respectively.
3.1 Simulation Settings
In this section, we introduce the data generating procedure for the numerical study. We mainly focus on the predictive ability of the between-region network measures without regard to the confounding factors. To this end, the data were generated from the matrix normal distribution with mean zero, i.e., for the AD group, we generated independent samples from with ; and for the control group, we generated independent samples from with . The scenarios for the covariance matrices are introduced in detail below.
For the temporal covariance matrices and , the first structural type is the autoregressive (AR) correlation, where and . The second structural type is the band correlation (BC), where for and 0 otherwise, and for and 0 otherwise. The temporal covariance matrices of the individuals in the same group are exactly the same for simplicity.
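For concreteness, the two temporal correlation types can be generated as follows (a Python sketch; the correlation value and bandwidth passed in are illustrative assumptions, since the paper's numeric settings are not reproduced here):

```python
import numpy as np

def ar_cov(T, rho):
    """AR(1)-type correlation: entry (i, j) equals rho**|i - j|."""
    idx = np.arange(T)
    return rho ** np.abs(np.subtract.outer(idx, idx))

def band_cov(T, rho, k):
    """Band correlation: rho on the first k off-diagonals, 0 beyond,
    and 1 on the diagonal."""
    D = np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
    return np.where(D == 0, 1.0, np.where(D <= k, rho, 0.0))
```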
For the spatial covariance matrices, we first construct the graph structure . We consider two types of graph structure: the Hub structure and the Small-World structure. We resort to the R package “huge” to generate the Hub structure with 5 equally-sized and non-overlapping hub graphs and use the R package “rags2ridges” to generate the Small-World structure with 10 small-world graphs and a 5% probability of rewiring. The two graph structures and the corresponding heat maps of the partial correlation matrix are shown in Figure 2. For further details of these two graph structures, one may refer to Zhao et al. (2012) and van Wieringen and Peeters (2016). Then, based on the graph structure , we generate two base matrices, for the AD group and for the control group. In detail, we determine the positions of the nonzero elements of the matrices and by . Then we filled the nonzero positions in matrix with random numbers from a uniform distribution with support . We randomly selected two blocks of and changed the signs of the element values to obtain . To ensure that these two matrices are positive definite, we let and . The differential network is in essence modelled as . With the base matrices , we then generate the individual-specific precision matrices. We generated independent matrices by changing the nonzero elements in matrix to random values from the normal distribution . Finally, we get the individual-specific precision matrices by , with the same structure but different elements. The covariance matrices are set to be for . By a similar procedure, we get the precision matrices and covariance matrices for . To sum up, we considered four covariance structure scenarios for and :

Scenario 1: and , where and are of the AR covariance structure type and and are based on with the Hub structure.

Scenario 2: and , where and are of the AR covariance structure type and and are based on with the Small-World structure.

Scenario 3: and , where and are of the BC covariance structure type and and are based on with the Hub structure.

Scenario 4: and , where and are of the BC covariance structure type and and are based on with the Small-World structure.
We set , and , and the number of bootstrap replications ; all simulation results are based on 100 replications. To assess classification accuracy, we also generate test samples independently for each replication, where are the sizes of the test samples from the AD group and the control group, respectively. Also note that even with , there are variables in the second-stage penalized logistic regression.
3.2 Classification Accuracy Assessment
To evaluate the classification performance of our method, we considered four competitors: the Matrix-valued Linear Discriminant Analysis (MLDA) by Molstad and Rothman (2019) and three classical vector-valued machine learning methods, namely Random Forest (vRF), Support Vector Machine (vSVM) and vector-valued Penalized Logistic Regression (vPLR). To implement the vector-valued machine learning methods, we simply vectorize the matrices of dimension and treat them as observations of a random vector of dimension . For a fair comparison, we also consider the Random Forest (RF), Support Vector Machine (SVM) and Penalized Logistic Regression (PLR) methods with the same input variables as SDNCMV, i.e., the variables . It is worth noting that these methods have not been used with such inputs for classification in the neuroscience community. We adopted existing R packages to implement these competitors, i.e., “MatrixLDA” for MLDA, “randomForest” for RF and vRF, “e1071” for SVM and vSVM, and “glmnet” for PLR and vPLR. We compute the misclassification error rate on the same test set of size for each replication to compare classification accuracy. Table 1 shows the misclassification rates averaged over 100 replications for Scenarios 1–4 with , from which we can clearly see that SDNCMV performs slightly better than RF, SVM and PLR, while showing overwhelming superiority over the vector-valued competitors (vRF, vSVM, vPLR) and MLDA in various scenarios. The methods MLDA, vRF, vSVM and vPLR yield results close to a coin toss. The proposed SDNCMV, in contrast, has misclassification rates of 0.0% in all simulation settings. RF, SVM and PLR also perform well, and RF seems to perform better than SVM and PLR. This indicates that the constructed network strength variables are powerful for the classification of imaging-type data.
With fixed , increasing from 50 to 100 results in much better performance for SDNCMV, RF, SVM and PLR, while with fixed , increasing from 100 to 150 results in worse performance. A question naturally arises: why do MLDA, vRF, vSVM and vPLR perform so poorly? In the data generating procedure, note that the data in the two groups both have mean zero, and MLDA, vSVM, vPLR and vRF rely on the mean difference for classification. Our proposal provides an effective solution to a rarely studied problem: when two multivariate populations share the same or similar mean but different covariance structures, how can test samples be classified into the correct class from the observed training data? Most of the existing literature on classification assumes that the two populations have a mean difference but a common covariance matrix.
3.3 Network Comparison Assessment
In order to assess the performance of differential network estimation, we compare our method with two approaches for the joint estimation of multiple matrix Gaussian graphs proposed by Zhu and Li (2018), which are denoted as Nonconvex and Convex. Zhu and Li (2018) compared the performance of differential network estimation between their methods and some state-of-the-art vector-valued methods and concluded that the former are better, so we do not compare our method with any vector-valued approaches in this study.
The criteria that we employ to evaluate the performance of different approaches are true positive rate (TPR), true negative rate (TNR) and true discovery rate (TDR). Denote the true differential matrix by and its estimate by . Then, the TPR, TNR and TDR are defined as:
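These criteria can be computed from the off-diagonal supports of the true and estimated differential matrices; a small illustrative Python sketch:

```python
import numpy as np

def diff_net_metrics(Delta_true, Delta_hat):
    """TPR/TNR/TDR on the off-diagonal support of the differential
    matrix: TPR = recovered true edges / true edges, TNR = correctly
    excluded non-edges / non-edges, TDR = recovered true edges /
    estimated edges (i.e. the precision of the estimate)."""
    iu = np.triu_indices(Delta_true.shape[0], k=1)
    t = Delta_true[iu] != 0          # true edge indicators
    e = Delta_hat[iu] != 0           # estimated edge indicators
    tpr = (t & e).sum() / max(t.sum(), 1)
    tnr = (~t & ~e).sum() / max((~t).sum(), 1)
    tdr = (t & e).sum() / max(e.sum(), 1)
    return tpr, tnr, tdr
```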
To evaluate the performance of the differential network estimators by these methods, we present the average values of TPR, TNR and TDR over 100 replications, as well as the PrecisionRecall (PR) curves.
Table 2 shows the TPRs, TNRs and TDRs of the different methods averaged over 100 replications for Scenarios 1–4 with . From Table 2, it can be seen that the three methods have comparable TPRs in all scenarios, while SDNCMV has TNRs comparable to those of the Nonconvex method, which are almost all higher than those of the Convex method. The superiority of SDNCMV is clearly shown by the TDRs, which are higher than those of the Nonconvex and Convex methods by a large margin. Figure 3 shows the Precision-Recall curves for Scenarios 1–4 with varied combinations of and , where the red line represents SDNCMV, the blue line the Nonconvex method, and the green line the Convex method. From the PR curves, we can see the great advantages of SDNCMV over the Nonconvex and Convex methods in terms of differential network structure recovery.
In summary, the simulation results show that our method SDNCMV outperforms its competitors in terms of both classification accuracy and network comparison power, illustrating its advantage in various scenarios.
4 The fMRI Data of Alzheimer’s disease
4.1 Description of the dataset and the preprocessing procedures
We applied the method to the Alzheimer’s Disease Neuroimaging Initiative (ADNI) study. The ADNI was launched in 2003 as a public-private partnership, led by Principal Investigator Michael W. Weiner, MD. One of the main purposes of the ADNI project is to examine differences in neuroimaging between Alzheimer’s Disease (AD) patients and normal controls (NC). Data used in our analysis were downloaded from the ADNI website (http://www.adni.loni.usc.edu) and included resting-state fMRI (rs-fMRI) images collected at screening for the AD and NC participants. A T1-weighted high-resolution anatomical image (MPRAGE) and a series of resting-state functional images were acquired with a 3.0 Tesla MRI scanner (Philips Systems) during longitudinal visits. The rs-fMRI scans were acquired with 140 volumes, TR/TE = 3000/30 ms, a flip angle of 80, and an effective voxel resolution of 3.3 x 3.3 x 3.3 mm. More details can be found at the ADNI website (http://www.adni.loni.usc.edu). Quality control was performed on the fMRI images both by following the Mayo Clinic quality control documentation (version 02022015) and by visual examination. After the quality control, 61 subjects were included in the analysis: 30 AD patients and 31 NC (normal control) subjects. For the gender distribution, there are 14 (47%) males for AD and 14 (45%) males for NC. The mean (SD) of age for each group is 72.88 (7.12) for AD and 74.38 (5.93) for NC.
Standard procedures were taken to preprocess the rs-fMRI data. Skull stripping was conducted on the T1 images to remove extracranial material. The first 4 volumes of the fMRI were removed to stabilize the signal, leaving 136 volumes for subsequent preprocessing. We registered each subject’s anatomical image to the 8th volume of the slice-time-corrected functional image, and then the subjects’ images were normalized to the MNI standard brain space. Spatial smoothing with a 6 mm FWHM Gaussian kernel and motion corrections were applied to the functional images. A validated confound regression approach (Satterthwaite et al., 2014; Wang et al., 2016; Kemmer et al., 2015) was performed on each subject’s rs-fMRI time series data to remove potential confounding factors including motion parameters, global effects, and white matter (WM) and cerebrospinal fluid (CSF) signals. Furthermore, motion-related spike regressors were included to bound the observed displacement, and the functional time series data were band-pass filtered to retain frequencies between 0.01 and 0.1 Hz, which is the relevant range for rs-fMRI.
To construct the brain networks, we considered Power’s 264-node system (Power et al., 2011), which provides good coverage of the whole brain. Therefore, the number of brain regions is , and the temporal dimension is . For each subject, we first applied an efficient algorithm implemented in the R package “DensParcorr” (Wang et al., 2016) to obtain the partial correlations between each pair of the 264 brain regions.
4.2 Classification results of the fMRI data
We applied the proposed SDNCMV method and the other methods to classify the AD and NC groups based on the rs-fMRI connectivity. Since the number of between-region network measures is very large, i.e., , we first adopt a Sure Independence Screening procedure (Fan and Lv, 2008) on the connectivity variables to avoid possible overfitting and improve computational efficiency. We used the R package “SIS” to filter out 85% of the variables and retained the remaining variables in the active set. For a fair comparison, we use the same variables for the methods SDNCMV, RF, SVM and PLR. For the vector-valued methods vRF, vSVM and vPLR, we applied the screening method to filter out 85% of the total variables. The MLDA is computationally intractable for this real dataset and is thus omitted. For all the methods, we take the confounders into account: age, education level, gender, and Apolipoprotein E (APOE).
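A marginal screening step in the spirit of SIS can be sketched as follows (illustrative Python; the actual analysis used the R package “SIS”, and the retained fraction of 15% matches the 85% filtering rate described above):

```python
import numpy as np

def sis_screen(X, y, keep_frac=0.15):
    """Rank features by absolute marginal correlation with the response
    and keep the top fraction, returning their column indices sorted
    from the strongest marginal signal downward."""
    Xc = X - X.mean(0)
    yc = y - y.mean()
    score = np.abs(Xc.T @ yc) / (np.linalg.norm(Xc, axis=0)
                                 * np.linalg.norm(yc) + 1e-12)
    k = max(1, int(np.ceil(keep_frac * X.shape[1])))
    return np.argsort(score)[::-1][:k]   # indices of retained features
```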
To assess the classification accuracy of the different methods, we randomly select 20 subjects from the AD group and 21 subjects from the NC group as training subjects, leaving 10 subjects in each group as test subjects. That is, we have 41 subjects in the training set and 20 in the test set. We repeat this random sampling procedure 100 times and report the average misclassification errors in Table 3; we further show boxplots of the errors in Figure 4. From Table 3, we can see that the proposed SDNCMV does not make any mistake in classifying the test subjects, and neither does the SVM method. The RF method also performs very well compared with the vector-valued methods. In summary, the proposed SDNCMV has performance comparable to these well-accepted machine learning methods.
4.3 Network comparison results of the fMRI data
For network comparisons between the AD and NC groups, we used all 30 AD subjects and 31 NC subjects. We assigned the 264 nodes in Power’s system to 14 functional modules. Figure 6 visualizes the location and number of nodes for each functional module; nodes with different colors belong to different functional modules. For more information, see https://github.com/brainspaces/power264. All the brain visualizations in this article were created using the BrainNet Viewer proposed by Xia et al. (2013). For the SDNCMV method, we adopt the same screening procedure as described in the last section, and we determine the threshold as follows. We draw a scree plot in Figure 5, where the horizontal axis is the value of and the vertical axis is the number of differential edges estimated by SDNCMV. From Figure 5, we can see that when is greater than about 100, the change in the number of differential edges becomes stable. Therefore, we selected 100 as the predetermined value for the threshold . For the Nonconvex and Convex methods, the tuning parameter was selected by minimizing a prediction criterion with five-fold cross-validation, as described in Zhu and Li (2018). In total, 105 differential edges were identified by SDNCMV, 3,316 by Nonconvex and 14,826 by Convex. The Convex and Nonconvex methods select so many differential edges that we only show the top 10% of edges with the largest absolute values in Figure 7. As we can see from Figure 7, the differential edges identified by the Nonconvex and Convex methods are too dense to be biologically interpretable. Examining the 105 differential edges identified by the proposed SDNCMV, we find that they mainly involve nodes located in the Somatomotor-Hand, Default Mode, and Cingulo-opercular modules. In Table 4, we show the top 15 differential edges identified by SDNCMV and their numbers of occurrences over 200 bootstrap replicates.
The Somatomotor-Hand module is the hand-movement-controlling part of the somatic motor area, which occupies most of the central anterior gyrus and executes movements selected and planned by other areas of the brain. Suva et al. (1999) indicated that the somatomotor area is significantly affected in AD and suggested that motor dysfunction occurs in the late and terminal stages of AD. The default mode is a group of brain regions that show lower levels of activity when we are engaged in a particular task, like paying attention, but higher levels of activity when we are awake and not involved in any specific mental exercise. An abundance of literature has shown that default mode changes are closely related to AD (Grieder et al., 2018; Banks et al., 2018; Pei et al., 2018). The Cingulo-opercular network is composed of the anterior insula/operculum, dorsal anterior cingulate cortex, and thalamus, and its function has been particularly difficult to characterize due to the network’s pervasive activity and frequent coactivation with other control-related networks. Nevertheless, some scholars have studied its relationship with Alzheimer’s disease. For example, Tumati et al. (2020) found that loss of segregation in the Cingulo-opercular network was associated with apathy in AD and suggested that network-level changes in AD patients may underlie specific neuropsychiatric symptoms. In summary, the findings from the proposed SDNCMV are consistent with evidence from a wide range of neuroscience and clinical studies.
5 Summary and Discussions
In this article, we focused on the network comparison and two-class classification problems for matrix-valued fMRI data. We proposed an effective method that fulfills the goals of network comparison and classification simultaneously. Numerical studies show that SDNCMV performs favorably compared with state-of-the-art classification and network comparison methods. We also illustrated the utility of SDNCMV by analyzing fMRI data on Alzheimer's disease. Our method can be widely applied to brain connectivity alteration detection and image-guided medical diagnosis of complex diseases.
The matrix normal assumption on the matrix-valued data can be relaxed so that we only assume a Kronecker product covariance structure. When outliers or heavy-tailedness exist in the data, a robust version of SDNCMV can be devised along similar lines: the sample covariance matrix in (2.1) can be replaced by a robust estimator such as the adaptive Huber estimator or the median-of-means estimator (Avella-Medina et al., 2018). Another way to relax the matrix normal assumption is through the matrix nonparanormal distribution of Ning and Liu (2013), which can be viewed as a latent-variable model: the latent variables follow a matrix normal distribution, and are thus symmetrically distributed, while the observed variables need not be. In this case, we can adopt the Kendall's tau based correlation estimators in (2.1), which have been widely studied in the literature; see, e.g., Liu et al. (2012); He et al. (2018b); Fan et al. (2018); He et al. (2019); Yu et al. (2019).
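For intuition, the rank-based substitute mentioned above can be sketched as follows: under an elliptical or nonparanormal model, the transformed Kendall's tau, sin(pi/2 * tau-hat), consistently estimates the underlying Pearson correlation (Liu et al., 2012). The function below is an illustrative stand-in for the sample correlation in (2.1), not the exact estimator used by SDNCMV.

```python
import numpy as np
from scipy.stats import kendalltau

def kendall_corr(X):
    """Rank-based correlation matrix: entrywise sin(pi/2 * Kendall's tau),
    a robust surrogate for the Pearson correlation under elliptical or
    nonparanormal models."""
    n, p = X.shape
    R = np.eye(p)
    for j in range(p):
        for k in range(j + 1, p):
            tau, _ = kendalltau(X[:, j], X[:, k])
            R[j, k] = R[k, j] = np.sin(np.pi * tau / 2.0)
    return R

# Heavy-tailed toy data with independent columns: the rank-based estimate
# stays close to the identity even though Pearson correlation is unstable.
rng = np.random.default_rng(1)
Z = rng.standard_t(df=3, size=(500, 4))
R_hat = kendall_corr(Z)
```

Because Kendall's tau depends on the data only through pairwise ranks, the estimate is unaffected by monotone marginal transformations and by moderate heavy tails, which is what makes it attractive when the matrix normal assumption is dropped.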
In the second-stage model, we focused on a binary outcome and adopted logistic regression for classification. It is straightforward to extend the method to a general clinical outcome through a more flexible generalized linear model (GLM).
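As a concrete illustration of such a second-stage classifier, the minimal sketch below fits an l1-penalized logistic regression on screened edge-wise features. The simulated features, sample sizes, and the scikit-learn call are assumptions for illustration rather than the exact model (2.3).

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Illustrative second stage: rows of X are subjects, columns are screened
# candidate differential edges, and y is a binary disease label. An
# l1-penalized logistic fit both classifies and retains a sparse set of
# edges; swapping in another GLM family would handle a non-binary outcome.
rng = np.random.default_rng(2)
n, p = 120, 40
X = rng.normal(size=(n, p))
beta = np.zeros(p)
beta[:3] = 2.0                                # only 3 truly informative edges
prob = 1.0 / (1.0 + np.exp(-(X @ beta)))
y = rng.binomial(1, prob)

clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.5).fit(X, y)
selected = np.flatnonzero(clf.coef_[0])       # indices of retained edges
```

The sparsity of the fitted coefficients is what ties classification and differential-edge selection together in a single model.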
Future work includes developing inferential tools for assessing the significance of a selected edge, e.g., global testing and multiple testing procedures based on the logistic regression model in (2.3).
Acknowledgements
This work was supported by grants from the National Natural Science Foundation of China [grant numbers 11801316, 81803336, 11971116]; the Natural Science Foundation of Shandong Province [grant numbers ZR2018BH033, ZR2019QA002]; and the National Center for Advancing Translational Sciences [grant number UL1TR002345 to L.L.]. Data collection and sharing for this project was funded by the Alzheimer's Disease Neuroimaging Initiative (ADNI) (National Institutes of Health Grant U01 AG024904) and DOD ADNI (Department of Defense award number W81XWH-12-2-0012). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie; Alzheimer's Association; Alzheimer's Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc.; Biogen; Bristol-Myers Squibb Company; CereSpir, Inc.; Cogstate; Eisai Inc.; Elan Pharmaceuticals, Inc.; Eli Lilly and Company; EuroImmun; F. Hoffmann-La Roche Ltd and its affiliated company Genentech, Inc.; Fujirebio; GE Healthcare; IXICO Ltd.; Janssen Alzheimer Immunotherapy Research & Development, LLC; Johnson & Johnson Pharmaceutical Research & Development LLC; Lumosity; Lundbeck; Merck & Co., Inc.; Meso Scale Diagnostics, LLC; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada. Private sector contributions are facilitated by the Foundation for the National Institutes of Health (www.fnih.org). The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer's Therapeutic Research Institute at the University of Southern California. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California.
References
Alexander-Bloch, A., Giedd, J.N., Bullmore, E., 2013. Imaging structural covariance between human brain regions. Nature Reviews Neuroscience 14, 322–336.
Avella-Medina, M., Battey, H.S., Fan, J., Li, Q., 2018. Robust estimation of high-dimensional covariance and precision matrices. Biometrika 105, 271–284.
Banks, S.J., Zhuang, X., Bayram, E., Bird, C., Cordes, D., Caldwell, J.Z.K., Cummings, J.L., for the Alzheimer's Disease Neuroimaging Initiative, 2018. Default mode network lateralization and memory in healthy aging and Alzheimer's disease. Journal of Alzheimer's Disease 66, 1223–1234. doi:10.3233/JAD-180541.
Bickel, P.J., Levina, E., 2004. Some theory for Fisher's linear discriminant function, 'naive Bayes', and some alternatives when there are many more variables than observations. Bernoulli 10, 989–1010.
Cai, T., Liu, W., 2011. A direct estimation approach to sparse linear discriminant analysis. Journal of the American Statistical Association 106, 1566–1577.
Cai, T., Liu, W., Luo, X., 2011. A constrained ℓ1 minimization approach to sparse precision matrix estimation. Journal of the American Statistical Association 106, 594–607.
Chen, Z.J., He, Y., Rosa-Neto, P., Gong, G., Evans, A.C., 2011. Age-related alterations in the modular organization of structural cortical network by using cortical thickness from MRI. NeuroImage 56, 235–245.
Fan, J., Fan, Y., 2008. High-dimensional classification using features annealed independence rules. The Annals of Statistics 36, 2605–2637.
Fan, J., Liu, H., Wang, W., 2018. Large covariance estimation through elliptical factor models. The Annals of Statistics 46, 1383–1414.
Fan, J., Lv, J., 2008. Sure independence screening for ultrahigh dimensional feature space. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 70, 849–911.
Friedman, J., Hastie, T., Tibshirani, R., 2008. Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9, 432–441.
Grieder, M., Wang, D.J.J., Dierks, T., Wahlund, L.O., Jann, K., 2018. Default mode network complexity and cognitive decline in mild Alzheimer's disease. Frontiers in Neuroscience 12, 770. doi:10.3389/fnins.2018.00770.
Grimes, T., Potter, S.S., Datta, S., 2019. Integrating gene regulatory pathways into differential network analysis of gene expression data. Scientific Reports 9, 5479.
He, Y., Ji, J., Xie, L., et al., 2018a. A new insight into underlying disease mechanism through semiparametric latent differential network model. BMC Bioinformatics 19, 493.
He, Y., Zhang, L., Ji, J., Zhang, X., 2019. Robust feature screening for elliptical copula regression model. Journal of Multivariate Analysis 173, 568–582.
He, Y., Zhang, X., Wang, P., 2016. Discriminant analysis on high dimensional Gaussian copula model. Statistics & Probability Letters 117, 100–112.
He, Y., Zhang, X., Zhang, L., 2018b. Variable selection for high dimensional Gaussian copula regression model: An adaptive hypothesis testing procedure. Computational Statistics & Data Analysis 124.
Higgins, I.A., Kundu, S., Choi, K.S., Mayberg, H.S., Guo, Y., 2019. A difference degree test for comparing brain networks. Human Brain Mapping 40, 4518–4536.
Ji, J., He, D., Feng, Y., et al., 2017. JDINAC: joint density-based nonparametric differential interaction network analysis and classification using high-dimensional sparse omics data. Bioinformatics 33, 3080–3087.
Ji, J., Yuan, Z., Zhang, X., et al., 2016. A powerful score-based statistical test for group difference in weighted biological networks. BMC Bioinformatics 17, 86.
Kemmer, P.B., Guo, Y., Wang, Y., Pagnoni, G., 2015. Network-based characterization of brain functional connectivity in Zen practitioners. Frontiers in Psychology 6.
Leng, C., Tang, C.Y., 2012. Sparse matrix graphical models. Journal of the American Statistical Association 107, 1187–1200.
Li, M., Yuan, B., 2005. 2DLDA: A statistical linear discriminant analysis for image matrix. Pattern Recognition Letters 26, 527–532.
Liu, H., Han, F., Yuan, M., et al., 2012. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics 40, 2293–2326.
Ma, R., Cai, T.T., Li, H., 2020. Global and simultaneous hypothesis testing for high-dimensional logistic regression models. Journal of the American Statistical Association, in press.
Mai, Q., Zou, H., Yuan, M., 2012. A direct approach to sparse discriminant analysis in ultrahigh dimensions. Biometrika 99, 29–42.
Molstad, A.J., Rothman, A.J., 2019. A penalized likelihood method for classification with matrix-valued predictors. Journal of Computational and Graphical Statistics 28, 11–22.
Ning, Y., Liu, H., 2013. High-dimensional semiparametric bigraphical models. Biometrika 100, 655–670.
Park, M.Y., Hastie, T., 2008. Penalized logistic regression for detecting gene interactions. Biostatistics 9, 30–50.
Pei, S., Guan, J., Zhou, S., 2018. Classifying early and late mild cognitive impairment stages of Alzheimer's disease by fusing default mode networks extracted with multiple seeds. BMC Bioinformatics 19, 523. doi:10.1186/s12859-018-2528-0.
Peng, J., Wang, P., Zhou, N., et al., 2009. Partial correlation estimation by joint sparse regression models. Journal of the American Statistical Association 104, 735–746.
Power, J.D., Cohen, A.L., Nelson, S.M., Wig, G.S., Barnes, K.A., Church, J.A., Vogel, A.C., Laumann, T.O., Miezin, F.M., Schlaggar, B.L., et al., 2011. Functional network organization of the human brain. Neuron 72, 665–678.
Ryali, S., Chen, T., Supekar, K., Menon, V., 2012. Estimation of functional connectivity in fMRI data using stability selection-based sparse partial correlation with elastic net penalty. NeuroImage 59, 3852–3861.
Satterthwaite, T.D., Wolf, D.H., Roalf, D.R., Ruparel, K., Erus, G., Vandekar, S., Gennatas, E.D., Elliott, M.A., Smith, A., Hakonarson, H., et al., 2014. Linked sex differences in cognition and functional connectivity in youth. Cerebral Cortex 25, 2383–2394.
Suva, D., Favre, I., Kraftsik, R., Esteban, M., Lobrinus, A., Miklossy, J., 1999. Primary motor cortex involvement in Alzheimer disease. Journal of Neuropathology & Experimental Neurology 58, 1125–1134. doi:10.1097/00005072-199911000-00002.
Tian, D., Gu, Q., Ma, J., 2016. Identifying gene regulatory network rewiring using latent differential graphical models. Nucleic Acids Research 44, e140.
Tumati, S., Marsman, J.B.C., De Deyn, P.P., Martens, S., Aleman, A., 2020. Functional network topology associated with apathy in Alzheimer's disease. Journal of Affective Disorders 266, 473–481. doi:10.1016/j.jad.2020.01.158.
Wang, Y., Kang, J., Kemmer, P.B., Guo, Y., 2016. An efficient and reliable statistical method for estimating functional connectivity in large scale brain networks using partial correlation. Frontiers in Neuroscience 10, 123.
van Wieringen, W.N., Peeters, C.F.W., 2016. Ridge estimation of inverse covariance matrices from high-dimensional data. Computational Statistics & Data Analysis 103, 284–303.
Xia, M., Wang, J., He, Y., 2013. BrainNet Viewer: a network visualization tool for human brain connectomics. PLoS ONE 8, e68910. doi:10.1371/journal.pone.0068910.
Xia, Y., Li, L., 2017. Hypothesis testing of matrix graph model with application to brain connectivity analysis. Biometrics 73, 780–791.
Xia, Y., Li, L., 2019. Matrix graph hypothesis testing and application in brain connectivity alternation detection. Statistica Sinica 29, 303–328.
Xie, S., Li, X., McColgan, P., Scahill, R.I., Wang, Y., 2020. Identifying disease-associated biomarker network features through conditional graphical model. Biometrics, in press.
Yin, J., Li, H., 2012. Model selection and estimation in the matrix normal graphical model. Journal of Multivariate Analysis 107, 119.
Yu, L., He, Y., Zhang, X., 2019. Robust factor number specification for large-dimensional elliptical factor model. Journal of Multivariate Analysis 174, 104543.
Yuan, H., Xi, R., Deng, M., 2015. Differential network analysis via the lasso penalized D-trace loss. Biometrika 104, 755–770.
Yuan, Z., Ji, J., Zhang, X., et al., 2016. A powerful weighted statistic for detecting group differences of directed biological networks. Scientific Reports 6, 34159.
Zhang, X.F., Ou-Yang, L., Yan, H., 2017. Incorporating prior information into differential network analysis using nonparanormal graphical models. Bioinformatics 33, 2436.
Zhao, T., Liu, H., Roeder, K., Lafferty, J., Wasserman, L., 2012. The huge package for high-dimensional undirected graph estimation in R. Journal of Machine Learning Research 13, 1059–1062.
Zhong, W., Suslick, K.S., 2015. Matrix discriminant analysis with application to colorimetric sensor array data. Technometrics 57, 524–534.
Zhou, H., Li, L., 2014. Regularized matrix regression. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 76, 463–483.
Zhou, S., 2014. Gemini: Graph estimation with matrix variate normal instances. The Annals of Statistics 42, 532–562.
Zhu, J., Hastie, T., 2004. Classification of gene microarrays by penalized logistic regression. Biostatistics 5, 427–443.
Zhu, Y., Li, L., 2018. Multiple matrix Gaussian graphs estimation. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 80, 927–950.
Zou, H., Hastie, T., 2005. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Statistical Methodology) 67, 301–320.