Causal inference of brain connectivity from fMRI with ψ-Learning Incorporated Linear non-Gaussian Acyclic Model (ψ-LiNGAM)

06/16/2020 ∙ by Aiying Zhang, et al. ∙ Tulane University 0

Functional connectivity (FC) has become a primary means of understanding brain functions by identifying brain network interactions and, ultimately, how those interactions produce cognitions. A popular definition of FC is by statistical associations between measured brain regions. However, this could be problematic since the associations can only provide spatial connections but not causal interactions among regions of interests. Hence, it is necessary to study their causal relationship. Directed acyclic graph (DAG) models have been applied in recent FC studies but often encountered problems such as limited sample sizes and large number of variables (namely high-dimensional problems), which lead to both computational difficulty and convergence issues. As a result, the use of DAG models is problematic, where the identification of DAG models in general is nondeterministic polynomial time hard (NP-hard). To this end, we propose a ψ-learning incorporated linear non-Gaussian acyclic model (ψ-LiNGAM). We use the association model (ψ-learning) to facilitate causal inferences and the model works well especially for high-dimensional cases. Our simulation results demonstrate that the proposed method is more robust and accurate than several existing ones in detecting graph structure and direction. We then applied it to the resting state fMRI (rsfMRI) data obtained from the publicly available Philadelphia Neurodevelopmental Cohort (PNC) to study the cognitive variance, which includes 855 individuals aged 8-22 years. Therein, we have identified three types of hub structure: the in-hub, out-hub and sum-hub, which correspond to the centers of receiving, sending and relaying information, respectively. We also detected 16 most important pairs of causal flows. Several of the results have been verified to be biologically significant.



There are no comments yet.


page 1

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

Functional magnetic resonance imaging (fMRI) has been commonly used to noninvasively detect human brain activity by measuring the blood oxygenation-level dependent (BOLD) changes. In the past decades, a large number of models have been developed to analyze brain functional connectivity (FC) using fMRI. By incorporating graph theory, the brain can be described as a graph, where a node represents a well-defined region of interest (ROI) and an edge represents a functional interconnection between the nodes (ROIs) it connects. Then, the brain FC can be mathematically estimated and quantified. Based on the representation form of the graph, methods for FC estimation can be roughly divided into two categories: undirected graphical models, and directed acyclic graph (DAG) models. The undirected graphical models characterize the brain FC through the association coefficients among various ROIs, which reflect their spatial relationships. For instance, the Pearson correlation is a typical measurement in undirected graphical models from static to dynamic FC estimations

[1, 2], which describes the linear correlation between the ROIs. In a complex system like the brain, the Pearson correlation is much weaker marginally [3]

. That is, all nodes (ROIs) are directly or indirectly correlated and it is difficult to distinguish significant connections through a dense network constructed by Pearson correlations. To this end, partial correlations have been proposed to explore direct associations between two nodes, removing the effect of other random variables

[4, 5]. Approaches to characterize statistical associations are often used for estimating brain network interactions, but fail to reveal the direction of information flow behind these interactions. It is natural to shift the focus of FC from association to causal interactions for in-depth research [6].

Directed acyclic graph (DAG) models, also known as Bayesian networks, are designed to model causal relationships in complex systems. Causality models pinpoint the key connectivity characteristics and simultaneously remove some redundant features for diagnosis. Methods for DAG identification can be divided into four categories: constraint-based methods, score-based methods, non-Gaussian based methods, and hybrids of these categories. Constraint-based methods, such as the prominent PC algorithm [7], first learn the skeleton of the DAG from conditional independence relationships and then orient the edges. Score-based methods, on the other hand, posit a scoring criterion for each possible DAG model, and then search for the graph with the highest score given the observations. An example is the greedy equivalence search (GES) algorithm proposed by Chickering [8], which greedily optimizes the -penalized likelihood such as the Bayesian Information Criterion (BIC). Both the constraint-based and the score-based methods are not optimal for predicting the direction of causal relationships but are accurate in identifying the causal skeleton (graph structure without directions) in fMRI study [9]. The non-Gaussian based methods, which refer to the Linear Non-Gaussian Acyclic Model (LiNGAM), aim to estimate linear Bayesian networks for continuous data using the non-Gaussian information. The key aspect of LiNGAM is the use of non-Gaussian data, which makes it possible to identify more of the graph structure than the traditional Gaussian setting. Several methods have been proposed in the literature, such as ICA-LiNGAM [10], direct LiNGAM [11], and pairwise LiNGAM [12]. However, compared to the other two categories of methods, LiNGAM requires a larger number of data points in the relevant dimension to converge to the true graph.

In biomedical applications, we are often faced with small sample size problems where the number of variables/nodes greatly exceeds the number of samples/observations, i.e., high dimensional cases. As a result, both computational difficulty and convergence issue often arise based only on the observations. This is especially the case for the estimation of DAG models, which is in general nondeterministic polynomial time hard (NP-hard) [13]. To this end, we propose a -learning incorporated linear non-Gaussian acyclic model (-LiNGAM), particularly for high-dimensional cases. The -learning method [3] was first proposed to fast estimate the undirected graph with the equivalent partial correlations,i.e., the -correlation. The method works well especially for high-dimensional cases, with successful applications to gene regulatory network inference [14] and brain FC analysis by us recently [4]. Herein, we apply the -learning method to refine the set of likely causal connectivity, facilitating causal inferences with fast computation. Following the same idea of LiNGAM (e.g., [11]), we assume that the observations are non-Gaussian and continuous, and incorporate prior information from direct LiNGAM into the DAG estimation. To acquire the prior knowledge, we estimate the undirected graph first, since the skeleton of the directed graph is always included in it. To be more specific, we apply the nonparanormal transformation [15] first, which converts the problem to a Gaussian graphical model (GGM), and then adopt the -learning to identify the undirected graph structure. The contributions of this paper are generally two-folds. Mathematically, we overcome the high-dimensionality difficulty of the LiNGAM methods and increase the convergency speed to the true DAG, even under the condition of small sample size. Biomedically, we apply the proposed model to the resting-state fMRI (rsfMRI) data from PNC, aiming to explain the cognitive variance through the directed functional connectivity differences. Unlike previous studies using association models [1, 4], our work is able to orient the causal directions and identify three types of hub structures: in-hub, out-hub and sum-hub from the directed graphs.

The remainder of this paper is organized as follows. In Section II, we first introduce some background knowledge of directed graphs and then describe the proposed -LiNGAM method. Both simulation studies and the rsfMRI analysis using PNC data are shown in Section III, followed by some discussions and concluding remarks in the last two sections.

Ii Methods

In this section, we first introduce some basic concepts of DAGs and the relationship between directed and undirected graphs in Section II.A. Then we describe the LiNGAM and the method for prior knowledge estimation in Section II.B and II.C, respectively. Finally, we summarize the -LiNGAM algorithm in Section II.D.

Ii-a Background: directed graph

A graph is composed of a node set and an edge set . In our setting, the nodes in set

correspond to the components of a random vector

. An edge is directed if but , and we denote it as and node is called a parent of node . A directed acyclic graph (DAG) means all the edges in are directed and there is no circle in graph (see Fig. 1 (a)). An edge is undirected if and , and we denote it as . An undirected graph is composed of undirected edges. The skeleton of a DAG , , is the undirected graph obtained from by substituting undirected edges for directed edges (see Fig. 1 (b)).

Fig. 1: Illustrations of the concepts in directed graphs. (a) gives an example of the cyclic and acyclic graphs. (b) shows the relationships between a DAG and its corresponding skeleton and moral graph . (c) is the Venn diagram describing the relationships in a network system and the representations of the directed and undirected graphs of the system. D represents the directed graph and U represents the undirected graph.

The two graphical structures, directed and undirected graphs, express different independence properties in a system. Fig. 1 (c) gives an illustration of a directed/ undirected graph whose independence properties cannot be expressed by each other. A directed graph can be converted into an undirected graph, called a moral graph [16]. In general, we first add additional undirected links between all pairs of parents for each node in the graph and then drop the arrows on the original links to give the moral graph . The relationship among the skeleton of the DAG , the moral graph and the undirected graph is .


The linear non-Gaussian acyclic model (LiNGAM) was first proposed by Shimizu et al. [10], which is a Bayesian network (BN) using a structural equation model (SEM) for non-Gaussian variants. LiNGAM assumes that the casual relationships of the variables can be represented graphically by a DAG , where the node set represents the corresponding variables and denotes the directed edges. Let be the weighted adjacency matrix specifying the edge weights of the underlying DAG . The observed random vector is assumed to be generated from the following linear SEM,

where is a continuous random vector; the ’s,

have non-Gaussian distributions with non-zero variances, and are independent of each other.

A property of acyclicity is that there exists at least one permutation of variables such that , . In other words, the weight matrix can be reordered to a strictly lower triangular matrix according to the permutation . The goal of LiNGAM is to find the correct permutation and estimate the weight matrix . Since the components of are independent and non-Gaussian, Shimizu et al. [10]

first proposed the independent component analysis (ICA) based algorithm known as ICA-LiNGAM. However, like most ICA algorithms, the convergency of ICA-LiNGAM is affected by the initial state, and an appropriate parameter selection is not straightforward. The direct LiNGAM

[11] method has been developed without the need for initial guess or algorithmic parameters, which also has the guaranteed convergence. However, it requires a large number of samples to reach the true graph and the computational time is much longer than the ICA-LiNGAM. Fortunately, the direct LiNGAM is flexible to incorporate prior knowledge. With a proper selection of priors, both the convergence and the computation can be largely accelerated due to the significant shrinkage of searching space.

Ii-C Prior knowledge estimation

We have discussed the relationship between directed graphs and undirected graphs in Section II.A. The undirected graph can be treated as the upper bound of the DAG, which can be estimated first as the prior information. Our goal in this step is to eliminate as many confounding edges as possible and only keep the direct associated edges, and therefore we select partial correlation coefficients as the measure of dependencies. Gaussian graphical models (GGMs) have become a popular tool for the estimation of undirected graphs using partial correlation coefficients due to their mathematical simplicity. To be more specific, let denote a -dimensional random vector following a multivariate Gaussian distribution , where and denote the unknown mean and the covariance matrix, respectively. It has been proven that the partial correlation between and on the condition of all other variables can be expressed as


where is the precision matrix (i.e., the inverse of the covariance matrix ), denotes the (,)th entry of and denotes the index set of variables [17], [18]. Thus, the construction of undirected graphs using GGMs is equivalent to the estimation of the precision matrices.

In our model, the variables are assumed to be non-Gaussian continuous, thus, the original GGMs with Gaussian assumption cannot be directly applied. To this end, we adopt the nonparanormal transformation proposed by Liu et al. [15], known as a semiparametric Gaussian copula model, to render data normal. Subsequently, we adopt the -learning method [3] to infer the GGM structure due to its computational efficiency and flexible framework as demonstrated by our previous work [4].


We now illustrate the -LiNGAM we proposed for high dimensional DAG estimation. First, we follow the same definition of the prior matrix as in [11], which is given as follows:


The procedure of the -LiNGAM algorithm is summarized in Algorithm 1. The nonparanormal transformation can be implemented through the R package huge. The R package equSA is designed for the -learning method. Since we intend to find the upper bound of the DAG, we set a broader significance level with . The code for direct LiNGAM is publicly available at

Observed sample vectors , where the ’s are non-Gaussian continuous.
Estimated weighted DAG adjacency matrix
1. Prior knowledge estimation: undirected graph structure estimation.
  a. Use the nonparanormal transformation [15] to render normal (Gaussian).
  b. Apply -learning method [3] to infer the GGM structure and acquire the edge set of the undirected graph.
  c. Extract the prior matrix from , where , if and otherwise .
2. Identify the casual order using the direct LiNGAM with the prior matrix [11].
3. Obtain the estimated weighted DAG adjacency matrix .
  a. Construct a strictly lower triangular matrix by following the causal order , and the corresponding with the same order.
  b. Estimate the connection strengths consistent with by solving sparse regressions of the form
  c. Obtain by converting to the original order.
Algorithm 1 -LiNGAM algorithm

Iii Results

Iii-a Simulation studies

In this section, we evaluated our proposed model through a series of simulation studies. We simulated the random DAG through the R package pcalg

with the edge probability

, where is an edge degree parameter and is the total number of variables. Given , we assigned uniformly random weights to the edges to obtain the weighted adjacency matrix : , if ; otherwise . Given , we generated from two non-Gaussian noise selections: Exponential (Exp) and Chi-squared (Chisq) noises. The exponential noise is set to have rate , i.e., ,

, and the chi-squared noise is set to be central with degree of freedom

, i.e. , . We illustrated the performance of the proposed -LiNGAM under the cases of small sample size and high dimensionality. We set and sampled the random vectors for each noise selection with and the degree parameter . For each scenario, we simulated 10 datasets independently. We assessed the performance of the four methods through the true positive rate (TPR), false discovery rate (FDR), and structural hamming distance (SHD) [19]. SHD is a commonly used metric based on the number of operations needed to transform the estimated DAG into the true graph [20]. In simple terms, SHD counts the total number of edge insertions, deletions or flips in the transformation.

(a) Subfigure 1 list of figures text
(b) Subfigure 1 list of figures text
Fig. 2: Simulation results under small sample size and high-dimensional cases, which represent the average performance in terms of TPR, FDR and SHD under various variable (), degree parameter () and noise (Exp, Chisq) settings with sample size .

Fig. 2 shows the performance of -LiNGAM under small sample size and high-dimensional settings with (i.e., ). As the graph gets denser ( increases) or larger ( increases), although the TPR descends the FDR remains at a low range. To further demonstrate the superiorty of the -LiNGAM, we compared it with existing methods, which were the PC algorithm [7], GES [8], the ICA-LiNGAM [10] and the direct LiNGAM [11] with (i.e., ) in Appendix.A. (The two LiNGAM methods cannot be applied for high-dimensional cases.) From the results between -LiNGAM and direct LiNGAM, -LiNGAM has significantly solved the convergence speed issue. In fact, the -LiNGAM has outperformed the other methods under each setting, especially under large variable numbers and/or low degree parameter settings (see Appendix.A Figure A, B). Compared to direct LiNGAM, the computational complexity of -LiNGAM is also favorable. The mean CPU times of the two methods under various variable sizes ’s on a 4 GHz computer are shown in TABLE I.

TABLE I: The mean CPU times (in seconds) for under various variable sizes ’s.

Iii-B Brain network analysis based on fMRI

Iii-B1 Data description

In this work, we intend to investigate the relationship between the variance of cognitive abilities and the brain functional activity. The dataset we used is the publicly available Philadelphia Neurodevelopmental Cohort (PNC), which is a large-scale collaborative study of the Brain Behavior Laboratory at the University of Pennsylvania and the Children’s Hospital of Philadelphia [21]. The data contains (among other fMRI modalities [22]) rsfMRI for nearly subjects aged from to years old. All MRI scans were acquired on a single 3T Siemens TIM Trio whole-body scanner. Blood oxygenation level-dependent (BOLD) fMRI was acquired using a whole-brain, single-shot, multislice, gradient-echo (GE) echoplanar (EPI) sequence of volumes (372s) with the following parameters TR/TE=3000/32 ms, flip , FOV mm, matrix , slice thickness/gap . The resulting nominal voxel size was mm [22]. Standard preprocessing steps were applied using SPM12 (, including motion correction, spatial normalization to standard Montreal Neurological Institute (MNI) space, and spatial smoothing with a mm full width at half max (FWHM) Gaussian kernel. The effect of head motion ( parameters) was further regressed out and BOLD time series went through a band-pass filter ( Hz to Hz) to suppress physiological artifacts [23]. Finally, we reduced the dimensionality of the data by employing the standard ROIs template proposed by Power et al. [24] with a mm sphere radius. For the ROIs, we divide them into 12 functional network (FN) modules, including sensory/somatomotor network (SSN), cingulo-opercular task control network (CON), auditory network (AUD), default mode network (DMN), memory retrieval network (MRN), visual network (VN), fronto-parietal task control network (FPN), salience network (SN), subcortical network (SCN), ventral attention network (VAN), dorsal attention network (DAN) and cerebellum network (CERE).

Beyond imaging, all subjects underwent a -hour long computerized neurocognitive battery (CNB) adapted from tasks applied in functional neuroimaging studies to evaluate a broad range of cognitive domains [25]. For this work, we relied on performance scores (ratio of total correct responses) from the wide range achievement test (WRAT), which measures an individual’s learning ability (reading, spelling, and mathematics) [26], and provides an estimate of intelligence quotient (IQ). To mitigate the influence of age over the final results, we excluded subjects whose ages were below 18 years old [27]

. We then converted the WRAT scores to z-scores based upon each subject’s raw score and the sample mean in order to provide a standard metric. We only kept subjects whose absolute z-score values were above 0.5, i.e.,

, to better compare and explain the brain cognitive differences. Finally, subjects were left for analysis and divided into 2 groups according to the WRAT scores (high WRAT group: subjects/low WRAT group: subjects). The detailed demographic information is summarized in TABLE II.

Group Mean WRAT (SD) Age Range Sex Ratio (M:F)

Characteristics of the subjects in this study. Here SD denotes the standard deviation, M and F represent male and female respectively.

Iii-B2 Results

First of all, we calculated the Anderson–-Darling score as suggested in [28] to ensure that the rsfMRI data meet our non-Gaussianity assumption. For each subject, we then applied the -LiNGAM method, and obtained the corresponding weighted directed adjacency matrix at the significance level (with FDR correction). We extracted three network statistics, as suggested in [5], i.e., density, transitivity, and global efficiency (GE), to reflect various aspects of the directed brain functional network. Density is the fraction of present connections to all possible connections, which also shows the mean network degree. The degree of a node is defined as the number of edges connected to the node. For a directed graph, the degree of a node is composed of two elements: in-degree and out-degree, which are named after the directions of the edges. Transitivity calculates the ratio between the number of triangles and the number of connected node triples. Transitivity is a measure of the brain functional segregation, i.e., the ability for specialized processing to occur within densely interconnected groups of brain regions [29]. GE calculates the average inverse shortest path length, which indicates the ability to combine specialized information from distributed brain regions, also known as the functional integration ability. The network statistics are calculated through the Brain Connectivity Toolbox ( In TABLE III, we find that although the brain network density of the low WRAT group is much larger than that of the high WRAT group, there are no big differences between them according to transitivity and global efficiency.

Group Density Transitivity Global Efficiency
TABLE III: Network statistics

To identify group structures, we conducted a one-sample t-test for each group to select the significant edges at significance level

and set threshold of the weights (i.e., the mean weights should be above ), which guarantee the results to be both statistically significant and numerically noticeable. Finally, and pairs of directed brain connections were identified for the high and low WRAT groups, respectively (see Fig. 3). To gain more insights into the obtained networks, we analyzed the inferred hub nodes in three ways: in-hubs, out-hubs and sum-hubs. Here we define hubs as the nodes with degrees at least two standard deviation higher than the mean degrees ([30]). The in-hubs (centers that receive information) and out-hubs (central nodes that convey out information) are selected through the in- and out- degrees, respectively, while the sum-hubs are the ones that do not belong to the in- and out- hubs but their sum degrees are high, i.e., transitional centers. As shown in TABLE IV, there is one in-hub ROI located in the right inferior frontal gyrus, triangular (IFGT.R) and one out-hub ROI located in the right middle occipital gyrus (MOG.R) for the high WRAT group. For the low WRAT group, in-hub ROIs have been selected, which are located in the right middle frontal gyrus (MFG.R), right precentral gyrus (PREG.R), right inferior temporal gyrus (ITG.R), and right middle temporal gyrus (MTG.R); out-hub ROI is located at the right superior temporal gyrus (STG.R). There are no sum-hubs identified for both groups.

(a) Subfigure 1 list of figures text
(b) Subfigure 1 list of figures text
Fig. 3: Directed brain connectivity for each group, where the arrows indicate the causal flow.
High WRAT Group
Type Index MNI AR FN Degree
In-hub 175 IFGT.R FPN
Out-hub 157 MOG.R VN
Low WRAT Group
Type Index MNI AR FN Degree
In-hub 206 MFG.R SN
205 PRE.R SN
11 ITG.R -
84 MTG.R -
Out-hub 235 STG.R VAN
TABLE IV: The hub ROIs from the identified directed network of each group, where AR represents the anatomical region.

To further compare the differences between the high and low WRAT groups, we conducted a Welch two sample t-test at significance level and Cohen’s d effect size statistics [31]. According to Cohen [32], the difference is negligible when . Thus, we chose and selected significant edges that perform differently between the two groups. The numbers of edges (features) for various strength levels of difference are given in TABLE V. The hub ROIs discovered with have been summarized in TABLE VI. Among them, in-hub ROIs are located at the right postcentral gyrus (POST.G), left precuneus (PQ.L), right precuneus (PQ.R) and left middle occipital gyrus (MOG.L); out-hub ROIs are located at the left rolandic operculum (RO.L) and right putamen (PUT.R); sum-hub ROIs are located at the right lingual gyrus (LING.R), left calcarine sulcus (CAL.L), and right thalamus (THA.R).

# of edges
TABLE V: The numbers of edges for different Cohen’s d statistics thresholds.
Type Index MNI AR FN Degree
In-hub 25 POST.R SSN
251 PQ.R DAN
147 MOG.L VN
Out-hub 70 RO.L AUD
Sum-hub 148 LING.R VN
146 CAL.L VN
TABLE VI: The hub ROIs of the different connections selected from threshold , where AR represents the anatomical region.

Iii-B3 Support Vector Machine based Classification

With the available phenotype information (WRAT score high/low), we did classification based on our selected features (significantly different connections) as a validation to our discovery. Specifically, we applied linear support vector machine (SVM) with default parameter value

[33] and compared the results with various inputs, which are given as follows:

  1. Pearson correlation ([1]): Calculate the Pearson correlation matrix for each subject and use the upper triangular elements of the matrix as the input.

  2. Partial correlation ([3]): Apply the -learning method to calculate the high-dimensional partial correlation matrix and use the upper triangular elements of the matrix as the input.

  3. PC ([7]): Apply the PC-algorithm and acquire the adjacency matrix of the completed partially directed acyclic graph (CPDAG), which contains both undirected and directed edges. Use all the elements from the adjacency matrix as the input.

  4. -LiNGAM: Apply the proposed -LiNGAM method to each subject and acquire the weighted adjacency matrix . Use all the elements from as the input.

  5. : Apply the proposed -LiNGAM method to each subject first, then compare the differences between the two groups as described in III.B 2). Set

    as the threshold for the feature selection (significantly different connections) and only choose the elements of the selected features from

    as the input.

  6. : Same as e., except for setting as the threshold.

  7. : Same as e., except for setting as the threshold.

  8. : Same as e., except for setting as the threshold.

All the algorithms were implemented in R. The -learning method is available in the R package equSA. The PC-algorithm was implemented through the R package pcalg. The SVM algorithm was implemented through the R package e1071. A 5-fold cross-validation (CV) procedure was implemented to evaluate the classification performance in all experiments. All subjects were randomly partitioned into 5 disjoint subsets with similar class distributions and size

. Each subset in turn was used as the test set and the remaining 4 subsets were used to train the SVM classifier. The classifier accuracy was estimated by comparing against the ground-truth labels on the test set. We then averaged the 5 individual accuracy measures as one test result from the CV. The whole process was repeated 20 times to acquire the final mean accuracy rate (ACC) for each input. In TABLE VII, we find that generally using the directed information (PC,

-LiNGAM) as inputs is better than using the undirected associations, however, the improvement is not very big. We then selected important features based on the different connections and set various thresholds to compare classification results. After feature selection from the -LiNGAM result, the ACC has been improved. Especially, when we set and chose features as the input, the ACC has increased dramatically, which suggests that these features captured the major differences of brain connectivity between the high and low WRAT groups. The details of the causal connections are shown in Fig. 4 and TABLE VIII. When , i.e., features were selected, the ACC was the highest; it indicates that the selected features can explain the most cognitive variance between the two groups.

Input Pearson Partial PC -LiNGAM
TABLE VII: The mean classification results (ACC) by SVM with various inputs.
Fig. 4: Sagittal views of the causal connections, where the arrows indicate the causal flow.
ROI Direction ROI
Index MNI AR (L/R) FN Index MNI AR (L/R) FN
6 Parahippocampus (L) - 134 Precuneus (L) MRN
234 Thalamus (R) SCN
7 Parahippocampus (R) - 10 Inferior temporal gyrus (R) -
98 Superior frontal gyrus, DMN 95 Precuneus (R) DMN
medial (L)
154 Inferior occipital lobe (L) VN
100 Middle frontal gyrus (L) DMN 151 Lingual gyrus (L) VN
109 Superior frontal gyrus, DMN 85 Insula (R) -
medial orbital (L)
132 Inferior frontal gyrus, - 183 Cerebellum -
orbital (L) crus 1 (L)
145 Calcarine (R) VN 94 Middle cingulate gyrus (L) DMN
180 Superior frontal gyrus, FPN 13 Precuneus (L) SSN
orbital (R)
186 Precentral gyrus (R) FPN 175 Inferior frontal gyrus, FPN
triangular (R)
235 Superior temporal gyrus (R) VAN 120 Middle temporal gyrus (L) DMN
239 Middle temporal gyrus (R) VAN 236 Middle temporal gyrus (L) VAN
247 Fusiform gyrus (R) - 105 Superior frontal gyrus, DMN
medial (R)
256 Superior parietal gyrus (R) DAN 257 Middle temporal gyrus (R) DAN
262 Inferior temporal gyrus (L) DAN 263 Superior parietal gyrus (L) DAN
TABLE VIII: The causal connections selected for , where L and R represent the left and the right side of the brain, respectively.

Iii-B4 Summary

By applying the -LiNGAM method, we obtained each subject’s weighted adjacency matrix . From the mean network statistics extracted from of each group, we found that although the mean densities of the two groups were significantly different (low high), there were no big differences in terms of transitivity and GE. In other words, the ability to combine and process specialized information was similar between the low and high WRAT groups; however, populations in the low WRAT group used more brain connections. The group-wise directed connectivity is shown in Fig. 3 and the detailed hub information is given in TABLE IV. We found that both groups had an out-hub whose degree was relatively large, respectively. To be more specific, for the high WRAT group, the ROI located at the right middle occipital gyrus (MOG.R) has significant influences on other areas at rest, while the ROI located at the right superior temporal gyrus (STG.R) has sent extensive information to other ROIs. To compare the differences between groups, as suggested from the classification results, the selected features explained the most cognitive variance when setting . We then identified in-hubs, out-hubs and sum-hubs (see TABLE VI) from the different connections. Combining all the results above, functional modules (the VN, SN and SCN) have been mostly involved in the cognitive differences. Besides, we concluded that the left middle occipital gyrus (MOG.L), precuneus (PQ), and the right postcentral gyrus (POST.G) are influenced differently; left rolandic operculum (RO.L) and right putamen (PUT.R) have aberrant outgoing performance; the right lingual gyrus (LING.R), left calcarine sulcus (CAL.L), and right thalamus (THA.R), as the transitional areas, perform variously at different cognitive levels. TABLE VIII presents most important pairs of causal flows.

Iv Discussion

In the procedure of identifying group brain connectivity patterns and feature selection (significantly different edges), we have used both significance tests (one-sample and two-sample t-tests) and effect size statistics (regression parameters and Cohen’s d statistics). The reasons we did both are described as follows. Firstly, the sample sizes of the two groups are different, while the t-value and the corresponding p-value are dependent on the sample size, which makes the statistical results not comparable. The effect size is not dependent on the sample size, hence comes to be a rational choice. Secondly, effect size statistics provide a uniform standard, but the significance of the value stays unknown. In our study, we applied both to make sure that the results were statistically significant and numerically meaningful.

Some of our findings have been already validated in the literature. For instance, considerable prior studies have implicated that the precuneus (PQ), as a central node in the human brain, exhibits heightened connectivity within both DMN and task-positive networks while at rest [34], and is important for supporting complex cognition and behavior [35]. The study in [36] showed that precuneus played an important role in integrating both internally and externally driven information. Kilory et al. [37] have found positive relationships between PQ and intelligence quotient (IQ). All the evidence agrees with our finding that PQ, as the in-side of the causal connections, is affected differently at rest for various cognitive ability (IQ) ranges. The postcentral gyrus (POST), otherwise known as the primary somatosensory cortex, has been reported to correlate with cognitive abilities [38, 39]. Our results give a possible explanation that POST, as an in-hub of different causal connectivity in various cognitive groups, receives signals aberrantly from other ROIs. In this sense, the out-hub features (centers that convey out information differently) should attract more attention. Among them, Putamen (PUT) has been found to relate to many regions of the cerebral cortex, as well as the thalamus, claustrum, and others through various pathways. In many studies, it has become apparent that the putamen is involved in various types of learning, such as reinforcement and implicit learning [40] and category learning [41], and thus relates to IQ [42]. Our conclusion agrees with the previous studies and suggests that putamen is a center for sending out information that causes the cognitive variance. The superior temporal gyrus (STG), has been discovered to be an important structure in the pathway consisting of the amygdala and prefrontal cortex, which are all involved in cognition [43, 44]. It is essential to the function of language in individuals who may have an impaired vocabulary, or are developing a sense of language. Several fMRI studies have suggested a link between insight based problem solving and activity in the right anterior superiortemporal gyrus (STG.R) [45], which was identified as the out-hub ROI of the low WRAT group in our analysis. The thalamus has been generally believed to act as a relay station, relaying information between different subcortical areas and the cerebral cortex [46], which strongly correlates with level of intelligence [47].

We have discovered some hub nodes in the occipital lobe, more precisely at the middle occipital gyrus (MOG), left rolandic operculum (RO.L) and the right lingual gyrus (LING.R). The lingual gyrus is a structure in the visual cortex, which plays an important role in the identification and recognition of words [48], and has been found to correlate with cognitive variance [39]. The roles of MOG and RO.L in cognitive differences are new discoveries, which may suggest potential functional basis of cognition variances but still need further validations.

V Conclusion

In this paper, we propose a -learning incorporated linear non-Gaussian Acyclic model (-LiNGAM) to study the casual interactions in human brain. As suggested by its name, the proposed method assumes non-Gaussianity of the data and believes that the non-Gaussian information can help to identify the direction of the edge. The main contributions of our work can be summarized as follows. First, the proposed method well integrates both the undirected graph and the directed acyclic graph, in the sense that we incorporate the undirected graph estimation as prior information into the direct LiNGAM model to perform DAG construction. Since the DAG is a subset of the undirected graph, the first step of prior screening can mitigate the irrelevant information but still maintain the true subset, which speeds up both numerical convergence and computation. Second, the simulation results in III. A show that the proposed method is stable with different settings and demonstrates improved performance in graph structure detection compared with other competing methods. Therefore, -LiNGAM is a more solid statistical model for DAG estimation. Third, the proposed method has been applied to rs-fMRI data from PNC, to explain brain functions through directed FC differences. We have identified significant causal connections that perform differently between the high and low WRAT groups. Three types of hub structures were found (see TABLE VI). Several of our hub ROIs have already been validated to have correlations with the cognitive variance. Our findings can further illustrate the roles of the hub ROIs in the brain causal flow. Finally, the classification results in III.B 3) prove the reliability of our discoveries, which also indicate that it can be widely applied to feature selection tasks in many other neuroimaging studies.

Future work for this line of research includes the following. First, for more reliable biomedical discovery, more datasets are needed to cross validate our findings. Second, the PNC data contains nearly subjects of age from late childhood to young adulthood. In this work, we only used the subjects over years old. It would also be interesting to track functional causal brain connectivity development over various age periods, which requires a joint DAG estimation model such as [49] or time-series DAG model. We are currently working on the expansion of the -LiNGAM method for causal inferences from multiple functional connectivity networks at different brain development stages, which will be reported elsewhere.

Vi Acknowledgment

The work has been funded by NIH (R01GM109068, R01MH104680, R01MH107354, P20GM103472, 2R01EB005846, 1R01EB006841, R01MH121101, R01MH103220), and NSF (#1539067).


  • [1] B. Cai, P. Zille, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y.-P. Wang, “Estimation of dynamic sparse connectivity patterns from resting state fmri,” IEEE transactions on Medical Imaging, vol. 37, pp. 1224–1234, 2018.
  • [2]

    G. Zhang, B. Cai, A. Zhang, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y.-P. Wang, “Estimating dynamic functional brain cconnectivity with a sparse hidden markov model,”

    IEEE transactions on medical imaging, vol. 39, pp. 488–498, 2019.
  • [3] F. Liang, Q. Song, and P. Qiu, “An equivalent measure of partial correlation coefficients for high dimensional gaussian graphical models,” J. Amer. Statist. Assoc., vol. 110, pp. 1248–1265, 2015.
  • [4] A. Zhang, J. Fang, F. Liang, V. D. Calhoun, and Y. Wang, “Aberrant brain connectivity in schizophrenia detected via a fast gaussian graphical model,” IEEE Journal of Biomedical and Health Informatics, vol. 23 (4), pp. 1479–1489, 2018.
  • [5] A. Zhang, B. Cai, W. Hu, B. Jia, F. Liang, T. Wilson, J. Stephen, V. Calhoun, and Y. Wang, “Joint bayesian-incorporating estimation of multiple gaussian graphical models to study brain connectivity development in adolescence,” IEEE transactions on medical imaging, vol. 39, pp. 357–365, 2019.
  • [6] A. T. Reid, D. B. Headley, R. D. Mill, R. Sanchez-Romero, L. Q. Uddin, D. Marinazzo, D. J. Lurie, P. A. Valdés-Sosa et al., “Advancing functional connectivity research from association to causation,” Nature neuroscience, vol. 1(10), 2019.
  • [7] P. Spirtes, C. Glymour, R. Scheines, D. Heckerman, C. Meek, G. Cooper, and T. Richardson, Causation, prediction, and search.   MIT press, 2000.
  • [8] D. Chickering, “Optimal structure identification with greedy search,”

    Journal of machine learning research

    , vol. 3, pp. 507–554, 2002.
  • [9] S. Smith, K. Miller, G. Salimi-Khorshidi, M. Webster, C. Beckmann, T. Nichols, J. Ramsey, and M. Woolrich, “Optimal structure identification with greedy search,” Network modelling methods for FMRI. Neuroimage, vol. 54(2), pp. 875–891, 2011.
  • [10] S. Shimizu, P. Hoyer, A. Hyvärinen, and A. Kerminen, “A linear non-gaussian acyclic model for causal discovery,” Journal of Machine Learning Research, vol. 7, pp. 2003–2030, 2006.
  • [11] S. Shimizu, T. Inazumi, Y. Sogawa, A. Hyvärinen, Y. Kawahara, T. Washio, P. Hoyer, and K. Bollen, “Directlingam: A direct method for learning a linear non-gaussian structural equation model,” Journal of Machine Learning Research, vol. 12, pp. 1225–1248, 2011.
  • [12] A. Hyvärinen and S. Smith, “Pairwise likelihood ratios for estimation of non-gaussian structural equation models,” Journal of Machine Learning Research, vol. 14, pp. 111–152, 2013.
  • [13] D. Heckerman, D. Geiger, and D. Chickering, “Learning bayesian networks: The combination of knowledge and statistical data,” Machine learning, vol. 20(3), pp. 197–243, 1995.
  • [14] B. Jia, S. Xu, G. Xiao, V. Lamba, and F. Liang, “Learning gene regulatory networks from next generation sequencing data,” Biometrics, vol. 73, no. 4, pp. 1221–1230, 2017.
  • [15] H. Liu, J. Lafferty, and L. Wasserman, “The nonparanormal: Semiparametric estimation of high dimensional undirected graphs,” Journal of Machine Learning Research, vol. 10, pp. 2295–2328, 2009.
  • [16] C. M. Bishop, Pattern recognition and machine learning.   springer, 2006.
  • [17] A. P. Dempster, “Covariance selection,” Biometrics, vol. 28, pp. 157–175, 1972.
  • [18] S. Lauritzen, Graphical Models.   Oxford: Oxford University Press, 1996.
  • [19] I. Tsamardinos, L. Brown, and C. Aliferis, “The max-min hill-climbing bayesian network structure learning algorithm,” Machine learning, vol. 65, pp. 31–78, 2006.
  • [20] M. Kalisch and P. Bühlmann, “Estimating high-dimensional directed acyclic graphs with the pc-algorithm,” Journal of Machine Learning Research, vol. 8, pp. 613–636, 2007.
  • [21] T. D. Satterthwaite et al., “The philadelphia neurodevelopmental cohort: A publicly available resource for the study of normal and abnormal brain development in youth,” Neuroimage, vol. 124, pp. 1115–1119, 2016.
  • [22] T. D. Satterthwaite, M. A. Elliott, K. Ruparel, J. Loughead, K. Prabhakaran, M. E. Calkins, R. Hopson, C. Jackson, J. Keefe, M. Riley, and F. D. Mentch, “Neuroimaging of the philadelphia neurodevelopmental cohort,” Neuroimage, vol. 86, pp. 544–553, 2014.
  • [23] B. Cai, G. Zhang, W. Hu, A. Zhang, P. Zille, Y. Zhang, J. M. Stephen, T. W. Wilson, V. D. Calhoun, and Y.-P. Wang, “Refined measure of functional connectomes for improved identifiability and prediction,” Human brain mapping, vol. 40, pp. 4843–4858, 2019.
  • [24] J. D. Power, D. A. Fair, B. L. Schlaggar, and S. E. Petersen, “The development of human functional brain networks,” Neuron, vol. 67(5), pp. 735–748, 2010.
  • [25] T. Moore, S. Reise, R. Gur, H. Hakonarson, and R. Gur, “Psychometric properties of the penn computerized neurocognitive battery,” Neuropsychology, vol. 29(2), p. 235, 2015.
  • [26] G. S. Wilkinson and G. J. Robertson, WRAT 4: wide range achievement test.   Psychological Assessment Resources Lutz, FL, 2006.
  • [27] L. Xiao, J. Stephen, T. Wilson, V. Calhoun, and Y.-P. Wang, “Alternating diffusion map based fusion of multimodal brain connectivity networks for iq prediction,” IEEE Transactions on Biomedical Engineering, vol. 66 (8), pp. 2140–2151, 2018.
  • [28] J. Ramsey, R. Sanchez-Romero, and C. Glymour, “Non-gaussian methods and high-pass filters in the estimation of effective connections,” Neuroimage, vol. 84, pp. 986–1006, 2014.
  • [29] M. Rubinov and O. Sporns, “Complex network measures of brain connectivity: uses and interpretations,” Neuroimage, vol. 52, pp. 1059–1069, 2010.
  • [30] J. Fang et al., “Fast and accurate detection of complex imaging genetics associations based on greedy projected distance correlation,” IEEE Transactions on Medical Imaging, vol. 37, pp. 860–870, 2018.
  • [31] J. Cohen, Statistical power analysis for the behavioral sciences. Abingdon.   United Kingdom: Routledge, 1988.
  • [32] ——, “A power primer,” Psychological bulletin, vol. 112(1), p. 155, 1992.
  • [33] B. Schölkopf, A. J. Smola, F. Bach et al., Learning with kernels: support vector machines, regularization, optimization, and beyond.   MIT press, 2002.
  • [34] R. Leech, S. Kamourieh, C. F. Beckmann, and D. J. Sharp, “Fractionating the default mode network: distinct contributions of the ventral and dorsal posterior cingulate cortex to cognitive control,” Journal of Neuroscience, vol. 31(9), pp. 3217–3224, 2011.
  • [35] A. V. Utevsky, D. V. Smith, and S. A. Huettel, “Precuneus is a functional core of the default-mode network,” Journal of Neuroscience, vol. 34(3), pp. 932–940, 2014.
  • [36] A. E. Cavanna and M. R. Trimble, “The precuneus: a review of its functional anatomy and behavioural correlates,” Brain, vol. 129(3), pp. 564–583, 2006.
  • [37] E. Kilroy et al., “Relationships between cerebral blood flow and iq in typically developing children and adolescents,” Journal of cognitive science, vol. 12(2), p. 151, 2011.
  • [38] W. E. Brown, S. R. Kesler, S. Eliez, I. S. Warsofsky, M. Haberecht, and A. L. Reiss, “A volumetric study of parietal lobe subregions in turner syndrome,” Developmental medicine and child neurology, vol. 46, no. 9, pp. 607–609, 2004.
  • [39] W. Johnson, R. E. Jung, R. Colom, and R. J. Haier, “Cognitive abilities independent of iq correlate with regional brain structure,” Intelligence, vol. 36, no. 1, pp. 18–28, 2008.
  • [40] H. Yamada, N. Matsumoto, and M. Kimura, “Tonically active neurons in the primate caudate nucleus and putamen differentially encode instructed motivational outcomes of action,” Journal of Neuroscience, vol. 24, no. 14, pp. 3500–3510, 2004.
  • [41] S. W. Ell, N. L. Marchant, and R. B. Ivry, “Focal putamen lesions impair learning in rule-based, but not information-integration categorization tasks,” Neuropsychologia, vol. 44, no. 10, pp. 1737–1751, 2006.
  • [42] P. A. MacDonald, H. Ganjavi, D. L. Collins, A. C. Evans, and S. Karama, “Investigating the relation between striatal volume and iq,” Brain imaging and behavior, vol. 8, no. 1, pp. 52–59, 2014.
  • [43] R. Adolphs, “Is the human amygdala specialized for processing social information?” Annals of the New York Academy of Sciences, vol. 985, no. 1, pp. 326–340, 2003.
  • [44] E. D. Bigler, S. Mortensen, E. S. Neeley, S. Ozonoff, L. Krasny, M. Johnson, J. Lu, S. L. Provencal, W. McMahon, and J. E. Lainhart, “Superior temporal gyrus, language function, and autism,” Developmental neuropsychology, vol. 31, no. 2, pp. 217–238, 2007.
  • [45] M. Jung-Beeman, E. M. Bowden, J. Haberman, J. L. Frymiare, S. Arambel-Liu, R. Greenblatt, P. J. Reber, and J. Kounios, “Neural activity when people solve verbal problems with insight,” PLoS biology, vol. 2, no. 4, p. e97, 2004.
  • [46] M. Gazzaniga and R. B. Ivry, Cognitive Neuroscience: The Biology of the Mind: Fourth International Student Edition.   WW Norton, 2013.
  • [47] G. Pergola, C. Bellebaum, B. Gehlhaar, B. Koch, M. Schwarz, I. Daum, and B. Suchan, “The involvement of the thalamus in semantic retrieval: a clinical group study,” Journal of cognitive neuroscience, vol. 25, no. 6, pp. 872–886, 2013.
  • [48] A. Mechelli, G. W. Humphreys, K. Mayall, A. Olson, and C. J. Price, “Differential effects of word length and visual contrast in the fusiform and lingual gyri during,” Proceedings of the Royal Society of London. Series B: Biological Sciences, vol. 267, no. 1455, pp. 1909–1913, 2000.
  • [49] Y. Wang, S. Segarra, and C. Uhler, “High-dimensional joint estimation of multiple directed gaussian graphical models,” arXiv preprint, vol. arXiv:1804.00778, 2018.