Estimating and Inferring the Maximum Degree of Stimulus-Locked Time-Varying Brain Connectivity Networks

Neuroscientists have enjoyed much success in understanding brain functions by constructing brain connectivity networks using data collected under highly controlled experimental settings. However, these experimental settings bear little resemblance to our real-life experience in day-to-day interactions with the surroundings. To address this issue, neuroscientists have been measuring brain activity under natural viewing experiments in which the subjects are given continuous stimuli, such as watching a movie or listening to a story. The main challenge with this approach is that the measured signal consists of both the stimulus-induced signal and intrinsic-neural and non-neuronal signals. By exploiting the experimental design, we propose to estimate a stimulus-locked brain network by treating non-stimulus-induced signals as nuisance parameters. In many neuroscience applications, it is important to identify brain regions that are connected to many other brain regions during a cognitive process. We propose an inferential method to test whether the maximum degree of the estimated network is larger than a pre-specified number. We prove that the type I error can be controlled and that the power increases to one asymptotically. Simulation studies are conducted to assess the performance of our method. Finally, we analyze a functional magnetic resonance imaging dataset obtained under the Sherlock Holmes movie stimuli.


1 Introduction

In the past few decades, much effort has been put into understanding task-based brain connectivity networks. For instance, in a typical visual mapping experiment, subjects are presented with a simple static visual stimulus and are asked to maintain fixation at the visual stimulus, while their brain activities are measured. Under such highly controlled experimental settings, numerous studies have shown that there are substantial similarities across brain connectivity networks constructed for different subjects (Press et al. 2001, Hasson et al. 2003). However, such experimental settings bear little resemblance to our real-life experience in several aspects: natural viewing consists of a continuous stream of perceptual stimuli; subjects can freely move their eyes; there are interactions among viewing, context, and emotion (Hasson et al. 2004). To address this issue, neuroscientists have started measuring brain activity under continuous natural stimuli, such as watching a movie or listening to a story (Hasson et al. 2004, Simony et al. 2016, Chen et al. 2017). The main scientific question is to understand the dynamics of the brain connectivity network that are specific to the continuous natural stimuli.

In the neuroscience literature, a typical approach for constructing a brain connectivity network is to calculate a sample covariance matrix for each subject: the covariance matrix encodes marginal relationships for each pair of brain regions within each subject. More recently, graphical models have been used in modeling brain connectivity networks: graphical models encode conditional dependence relationships between each pair of brain regions, given the others (Rubinov & Sporns 2010). A graph consists of d nodes, each representing a random variable, as well as a set of edges joining pairs of nodes corresponding to conditionally dependent variables. There is a vast literature on learning the structure of static undirected graphical models, and we refer the reader to Drton & Maathuis (2017) for a detailed review.

Under natural continuous stimuli, it is often of interest to estimate a dynamic brain connectivity network, i.e., a graph that changes over time. A natural candidate for this purpose is the time-varying Gaussian graphical model (Zhou et al. 2010, Kolar et al. 2010). The time-varying Gaussian graphical model assumes

 \bX(z) \mid Z = z \sim N_d\{0, \bSigma_X(z)\}, \qquad (1)

where \bSigma_X(z) is the covariance matrix of \bX(z) given Z = z, and Z has a continuous density. The inverse covariance matrix \bTheta_X(z) = \{\bSigma_X(z)\}^{-1} encodes conditional dependence relationships between pairs of random variables at time z: \bTheta_{X,jk}(z) = 0 if and only if the jth and kth variables are conditionally independent given the other variables at time z.

In natural viewing experiments, the main goal is to construct a brain connectivity network that is locked to the processing of external stimuli, referred to as a stimulus-locked network (Simony et al. 2016, Chen et al. 2017, Regev et al. 2018). Constructing a stimulus-locked network can better characterize the dynamic changes of brain patterns across the continuous stimulus (Simony et al. 2016). The main challenge in constructing a stimulus-locked network is the lack of highly controlled experiments that remove spontaneous and individual variations. The measured blood-oxygen-level dependent (BOLD) signal consists of not only signal that is specific to the stimulus, but also intrinsic neural signal (random fluctuations) and non-neuronal signal (physiological noise) that are specific to each subject. The intrinsic neural signal and non-neuronal signal can be interpreted as measurement error or latent variables that confound the stimulus-specific signal. We refer to non-stimulus-induced signals as subject specific effects throughout the manuscript. Thus, directly fitting (1) using the measured data will yield a time-varying graph that primarily reflects intrinsic BOLD fluctuations within each brain rather than BOLD fluctuations due to the natural continuous stimulus.

In this paper, we exploit the experimental design aspect of natural viewing experiments and propose to estimate a dynamic stimulus-locked brain connectivity network by treating the intrinsic neural signal and non-neuronal signal as nuisance parameters. Our proposal exploits the fact that the same stimulus will be given to multiple independent subjects, and that the intrinsic neural signal and non-neuronal signal for different subjects are independent. Thus, this motivates us to estimate a brain connectivity network across two brains rather than within each brain. In fact, this approach has been considered in Simony et al. (2016) and Chen et al. (2017) where they estimated brain connectivity networks by calculating covariance for brain regions between two brains.

After estimating the stimulus-locked brain connectivity network, the next important question is to infer whether there are any regions of interest that are connected to many other regions of interest during a cognitive process (Hagmann et al. 2008). These highly connected brain regions are referred to as hub nodes, and the number of connections for each brain region is referred to as its degree. Identifying hub brain regions that are specific to the given natural continuous stimulus will lead to a better understanding of the cognitive processes in the brain, and may shed light on various cognitive disorders. In the existing literature, several authors have proposed statistical methodologies to estimate networks with hubs (see, for instance, Tan et al. 2014). In this paper, we instead focus on developing a novel inferential framework to test the hypothesis that there exists at least one time point such that the maximum degree of the graph is greater than some pre-specified number k.

Our proposed inferential framework is motivated by two major components: (1) the Gaussian multiplier bootstrap for approximating the distribution of suprema of empirical processes (Chernozhukov et al. 2013, 2014b), and (2) the step-down method for multiple hypothesis testing problems (Romano & Wolf 2005). In concurrent work, Neykov et al. (2019) proposed a framework for testing general graph structure on a static graph. In Appendix A, we will show that our proposed method can be extended to testing a large family of graph structures similar to that of Neykov et al. (2019).

2 Stimulus-Locked Time-Varying Brain Connectivity Networks

2.1 A Statistical Model

Let \bX(z), \bS(z), and \bE(z) be the observed data, stimulus-induced signal, and subject specific effects at time z, respectively. Assume that Z is a continuous random variable with a continuous density. For a given Z = z, we model the observed data as the summation of the stimulus-induced signal and the subject specific effects:

 \bX(z) = \bS(z) + \bE(z), \quad \bS(z) \mid Z = z \sim N_d\{0, \bSigma(z)\}, \quad \bE(z) \mid Z = z \sim N_d\{0, \bLambda_X(z)\}, \qquad (2)

where \bSigma(z) is the covariance matrix of the stimulus-induced signal, and \bLambda_X(z) is the covariance matrix of the subject specific effects. We assume that \bS(z) and \bE(z) are independent for all z. Thus, estimating the stimulus-locked brain connectivity network amounts to estimating the support of \{\bSigma(z)\}^{-1}. Fitting the model in (1) using the observed data will yield an estimate of \{\bSigma(z) + \bLambda_X(z)\}^{-1}, and thus, (1) fails to estimate the stimulus-locked brain connectivity network.

To address this issue, we exploit the experimental design aspect of natural viewing experiments. In many studies, neuroscientists measure brain activity for multiple subjects under the same continuous natural stimulus (Chen et al. 2017, Simony et al. 2016). Let \bX(z) and \bY(z) be the measured data for two subjects at time point z. Since the same natural stimulus is given to both subjects, this motivates the following statistical model:

 \bX(z) = \bS(z) + \bE_X(z), \quad \bY(z) = \bS(z) + \bE_Y(z), \quad \bS(z) \mid Z = z \sim N_d\{0, \bSigma(z)\}, \quad \bE_X(z) \mid Z = z \sim N_d\{0, \bLambda_X(z)\}, \quad \bE_Y(z) \mid Z = z \sim N_d\{0, \bLambda_Y(z)\}, \qquad (3)

where \bS(z) is the stimulus-induced signal, and \bE_X(z) and \bE_Y(z) are the subject specific effects at time z. Model (3) motivates the calculation of the inter-subject covariance between two subjects rather than the within-subject covariance. For a given time point z, we have

 \EE[\bX(z)\{\bY(z)\}^T \mid Z = z] = \EE[\bS(z)\{\bS(z)\}^T \mid Z = z] + \EE[\bE_X(z)\{\bE_Y(z)\}^T \mid Z = z] = \bSigma(z).

That is, we estimate \bSigma(z) via the inter-subject covariance by treating \bLambda_X(z) and \bLambda_Y(z) as nuisance parameters. In the neuroscience literature, several authors have calculated the inter-subject covariance matrix to estimate marginal dependencies among brain regions that are stimulus-locked (Chen et al. 2017, Simony et al. 2016). They have found that the inter-subject covariance better captures the stimulus-locked marginal relationships for pairs of brain regions.
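As a quick numerical illustration of this identity (our own sketch, not part of the paper's analysis), the following simulates model (3) with a time-constant \bSigma and toy nuisance covariances, and compares the inter-subject and within-subject sample covariances:

```python
import numpy as np

# Illustrative check: with X = S + E_X and Y = S + E_Y (all independent,
# mean zero), the inter-subject cross-covariance E[X Y^T] recovers
# Sigma = Cov(S), while the within-subject covariance E[X X^T] equals
# Sigma + Lambda_X. Dimensions and covariances are toy choices.
rng = np.random.default_rng(0)
d, n = 5, 200_000

Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(d), np.arange(d)))  # AR(1)-type Sigma
Lam_X, Lam_Y = 2.0 * np.eye(d), 3.0 * np.eye(d)   # nuisance covariances

S = rng.multivariate_normal(np.zeros(d), Sigma, size=n)
X = S + rng.multivariate_normal(np.zeros(d), Lam_X, size=n)
Y = S + rng.multivariate_normal(np.zeros(d), Lam_Y, size=n)

inter = X.T @ Y / n    # inter-subject: estimates Sigma
within = X.T @ X / n   # within-subject: estimates Sigma + Lambda_X

print(np.max(np.abs(inter - Sigma)))   # small
print(np.max(np.abs(within - Sigma)))  # about 2 (the Lambda_X diagonal)
```

The within-subject covariance is biased away from \bSigma by exactly the nuisance term, which is why fitting model (1) to a single subject's data fails here.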

For simplicity, throughout the paper, we focus on two subjects. When there are multiple subjects, we can split the subjects into two groups, and use the average of each group to estimate the stimulus-locked brain network. We also discuss a U-statistic type estimator for the case of multiple subjects in Appendix B.

2.2 Inter-Subject Time-Varying Gaussian Graphical Models

We now propose inter-subject time-varying Gaussian graphical models for estimating stimulus-locked time-varying brain networks. Let \{(Z_i, \bX_i, \bY_i)\}_{i \in [n]} be independent realizations of the triplet (Z, \bX, \bY). Both subjects share the same Z_i since they are given the same continuous stimulus. Let K(\cdot) be a symmetric kernel function. To obtain an estimate for \bSigma(z), we propose the inter-subject kernel smoothed covariance estimator

 \hat{\bSigma}(z) = \frac{\sum_{i \in [n]} K_h(Z_i - z)\, \bX_i \bY_i^T}{\sum_{i \in [n]} K_h(Z_i - z)}, \qquad (4)

where K_h(u) = K(u/h)/h, h is the bandwidth parameter, and [n] = \{1, \ldots, n\}. For simplicity, we use the Epanechnikov kernel

 K(u) = 0.75 \cdot (1 - u^2) \cdot \ind\{|u| \le 1\}, \qquad (5)

where \ind\{|u| \le 1\} is an indicator function that takes value one if |u| \le 1 and zero otherwise. The choice of kernel is not essential as long as it satisfies the regularity conditions in Section 5.1.
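As a concrete sketch of the estimator in (4) with the kernel in (5), one might write the following (array shapes and the toy data are our own illustrative choices, not from the paper):

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 (1 - u^2) 1{|u| <= 1}, as in (5)."""
    return 0.75 * (1.0 - u ** 2) * (np.abs(u) <= 1)

def inter_subject_cov(z, Z, X, Y, h):
    """Kernel smoothed inter-subject covariance estimator, as in (4).

    Z : (n,) observation times in [0, 1]
    X, Y : (n, d) measurements for the two subjects
    h : bandwidth
    """
    w = epanechnikov((Z - z) / h) / h       # K_h(Z_i - z)
    num = (X * w[:, None]).T @ Y            # sum_i K_h(Z_i - z) X_i Y_i^T
    return num / w.sum()

# toy usage with a time-constant Sigma (here Cov(S) = I)
rng = np.random.default_rng(1)
n, d, h = 2000, 4, 0.2
Z = rng.uniform(0, 1, n)
S = rng.normal(size=(n, d))
X = S + rng.normal(size=(n, d))
Y = S + rng.normal(size=(n, d))
Sig_hat = inter_subject_cov(0.5, Z, X, Y, h)
print(Sig_hat.round(2))  # approximately the identity matrix
```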

Let \bTheta(z) = \{\bSigma(z)\}^{-1}. Given the kernel smoothed inter-subject covariance estimator in (4), there are multiple approaches to obtain an estimate of the inverse covariance matrix \bTheta(z). We consider the CLIME estimator proposed by Cai et al. (2011). Let e_j be the jth canonical basis vector in \RR^d. For a vector \btheta \in \RR^d, let \|\btheta\|_1 = \sum_{j \in [d]} |\btheta_j| and let \|\btheta\|_\infty = \max_{j \in [d]} |\btheta_j|. For each j \in [d], the CLIME estimator takes the form

 \hat{\bTheta}_j(z) = \argmin_{\btheta \in \RR^d} \|\btheta\|_1 \quad \text{subject to} \quad \big\| \hat{\bSigma}(z) \cdot \btheta - e_j \big\|_\infty \le \lambda, \qquad (6)

where \lambda is a tuning parameter that controls the sparsity of \hat{\bTheta}_j(z). We construct an estimator for the stimulus-locked brain network as the support of \hat{\bTheta}(z) = [\hat{\bTheta}_1(z), \ldots, \hat{\bTheta}_d(z)].
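For illustration, each column subproblem in (6) is a linear program and can be solved with an off-the-shelf LP solver. The sketch below uses `scipy.optimize.linprog`; the paper's experiments use the R package `clime`, and the toy matrices here are our own:

```python
import numpy as np
from scipy.optimize import linprog

def clime_column(Sigma_hat, j, lam):
    """One column of the CLIME estimator (6):
    minimize ||theta||_1  subject to  ||Sigma_hat @ theta - e_j||_inf <= lam.
    Solved as an LP with theta = u - v, u >= 0, v >= 0."""
    d = Sigma_hat.shape[0]
    e_j = np.zeros(d); e_j[j] = 1.0
    c = np.ones(2 * d)                        # objective: 1'u + 1'v = ||theta||_1
    A = np.block([[Sigma_hat, -Sigma_hat],    # Sigma (u - v) - e_j <= lam
                  [-Sigma_hat, Sigma_hat]])   # -(Sigma (u - v) - e_j) <= lam
    b = np.concatenate([lam + e_j, lam - e_j])
    res = linprog(c, A_ub=A, b_ub=b, bounds=(0, None), method="highs")
    uv = res.x
    return uv[:d] - uv[d:]

# toy usage: recover a column of a known sparse precision matrix
Theta = np.array([[2.0, 0.5, 0.0],
                  [0.5, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])
Sigma = np.linalg.inv(Theta)
theta0 = clime_column(Sigma, 0, lam=1e-4)
print(theta0.round(2))  # close to Theta[:, 0] = [2, 0.5, 0]
```

With the exact \bSigma and a tiny \lambda, the LP solution nearly recovers the true column; in practice \lambda is chosen by cross-validation as described next.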

There are two tuning parameters in our proposed method: a bandwidth parameter h that controls the smoothness of the estimated covariance matrix, and a tuning parameter \lambda that controls the sparsity of the estimated network. The bandwidth parameter can be selected according to the scientific context. For instance, in many neuroscience applications that involve continuous natural stimuli, we select h such that there are always at least 30% of the time points that have non-zero kernel weights. In the following, we propose an L-fold cross-validation type procedure to select \lambda. We first partition the time points into L folds. Let C_\ell be the index set containing the time points for the \ell th fold. Let \hat{\bTheta}^{(-\ell)}(z_i, \lambda) be the estimated inverse covariance matrix using data excluding the \ell th fold, and let \hat{\bSigma}^{(\ell)}(z_i) be the kernel smoothed covariance estimator using data only from the \ell th fold. We calculate the following quantity for various values of \lambda:

 \mathrm{cv}_\lambda = \frac{1}{L} \sum_{\ell=1}^{L} \sum_{i \in C_\ell} \big\| \hat{\bSigma}^{(\ell)}(z_i)\, \hat{\bTheta}^{(-\ell)}(z_i, \lambda) - \mathbf{I}_d \big\|_{\max}, \qquad (7)

where \|\cdot\|_{\max} is the element-wise max norm for a matrix. From extensive numerical studies, we find that picking the \lambda that minimizes the above quantity tends to be too conservative. We instead propose to pick the smallest \lambda with \mathrm{cv}_\lambda smaller than the minimum plus two standard deviations.
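A simplified sketch of this selection rule follows. To stay self-contained it makes two substitutions that are our own: a single covariance per fold replaces kernel smoothing at every held-out time point, and a ridge-regularized inverse stands in for the CLIME estimator:

```python
import numpy as np

rng = np.random.default_rng(2)

def theta_hat(Sigma_hat, lam):
    # Stand-in precision estimator (ridge-regularized inverse); the paper
    # uses CLIME here -- this placeholder keeps the sketch short.
    return np.linalg.inv(Sigma_hat + lam * np.eye(Sigma_hat.shape[0]))

def cv_scores(X, Y, lams, L=5):
    # Per-fold CV scores in the spirit of (7), simplified to one
    # inter-subject covariance per fold.
    n, d = X.shape
    folds = np.array_split(rng.permutation(n), L)
    scores = np.empty((len(lams), L))
    for l, idx in enumerate(folds):
        mask = np.ones(n, bool); mask[idx] = False
        Sig_tr = (X[mask].T @ Y[mask]) / mask.sum()   # data excluding fold l
        Sig_te = (X[idx].T @ Y[idx]) / len(idx)       # data from fold l only
        for a, lam in enumerate(lams):
            Th = theta_hat((Sig_tr + Sig_tr.T) / 2, lam)
            scores[a, l] = np.max(np.abs(Sig_te @ Th - np.eye(d)))
    return scores

def select_lam(lams, scores):
    # Smallest lam whose mean CV score is below the minimum plus two
    # (across-fold) standard deviations, per the rule in the text.
    mean = scores.mean(axis=1)
    thresh = mean.min() + 2.0 * scores[mean.argmin()].std()
    return lams[np.flatnonzero(mean <= thresh).min()]

# toy usage
n, d = 500, 5
S = rng.normal(size=(n, d))
X, Y = S + rng.normal(size=(n, d)), S + rng.normal(size=(n, d))
lams = np.array([0.01, 0.05, 0.1, 0.5, 1.0])
sc = cv_scores(X, Y, lams)
lam_sel = select_lam(lams, sc)
print(lam_sel)
```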

2.3 Inference on Maximum Degree

We consider testing the hypothesis:

 H_0: \text{for all } z \in [0,1], \text{ the maximum degree of the graph is no greater than } k, \quad \text{versus} \quad H_1: \text{there exists a } z_0 \in [0,1] \text{ such that the maximum degree of the graph is greater than } k. \qquad (8)

In the existing literature, many authors have proposed tests for whether there is an edge between two nodes in a graph (see, Neykov et al. 2018, and the references therein). Due to the \ell_1 penalty used to encourage a sparse graph, classical test statistics are no longer asymptotically normal. We employ the de-biased test statistic

 \check{\bTheta}_{jk}(z) = \hat{\bTheta}_{jk}(z) - \frac{\{\hat{\bTheta}_j(z)\}^T \{\hat{\bSigma}(z)\, \hat{\bTheta}_k(z) - e_k\}}{\{\hat{\bTheta}_j(z)\}^T \hat{\bSigma}_j(z)}, \qquad (9)

where \hat{\bSigma}_j(z) is the jth column of \hat{\bSigma}(z). The subtrahend in (9) is the bias introduced by imposing an \ell_1 penalty during the estimation procedure.

We use (9) to construct a test statistic for testing the maximum degree of a time-varying graph. Let G = (V, E) be an undirected graph, where V = [d] is a set of nodes and E is a set of edges connecting pairs of nodes. Let

 T_E = \sup_{z \in [0,1]} \max_{(j,k) \in E(z)} \sqrt{nh} \cdot \big| \check{\bTheta}_{jk}(z) \big|. \qquad (10)

The edge set is defined based on the hypothesis testing problem. In the context of testing the maximum degree of a time-varying graph as in (8), the maximum is taken over all possible edges between pairs of nodes. Throughout the manuscript, we will use the notation E(z) to indicate some predefined known edge set. This general edge set will be different for testing different graph structures, and we refer the reader to Appendix A for details.

Since the test statistic (10) involves taking the supremum over z \in [0,1] and the maximum over all edges in E(z), it is challenging to evaluate its asymptotic distribution. To this end, we generalize the Gaussian multiplier bootstrap proposed in Chernozhukov et al. (2013) and Chernozhukov et al. (2014b) to approximate the distribution of the test statistic T_E. Let \xi_1, \ldots, \xi_n be independent standard normal random variables. We construct the bootstrap statistic as

 T^B_E = \sup_{z \in [0,1]} \max_{(j,k) \in E(z)} \sqrt{nh} \cdot \left| \frac{\sum_{i \in [n]} \{\hat{\bTheta}_j(z)\}^T K_h(Z_i - z) \{\bX_i \bY_i^T\, \hat{\bTheta}_k(z) - e_k\}\, \xi_i / n}{\{\hat{\bTheta}_j(z)\}^T \hat{\bSigma}_j(z)} \right|. \qquad (11)

We denote the conditional (1-\alpha)-quantile of T^B_E given \{(Z_i, \bX_i, \bY_i)\}_{i \in [n]} as

 c(1-\alpha, E) = \inf\big\{ t \in \RR \;\big|\; P\big[ T^B_E \le t \mid \{(Z_i, \bX_i, \bY_i)\}_{i \in [n]} \big] \ge 1 - \alpha \big\}. \qquad (12)

The quantity c(1-\alpha, E) can be calculated numerically using Monte-Carlo simulation. In Section 5.2, we show that the quantile of T_E in (10) can be estimated accurately by the conditional (1-\alpha)-quantile of the bootstrap statistic.
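The Monte-Carlo step can be sketched as follows. Here `psi` is a placeholder for the per-observation summands of (11) at each discretized (time point, edge) pair; computing `psi` from data, and the exact \sqrt{nh} scaling, are abstracted away, so this only illustrates the quantile computation in (12):

```python
import numpy as np

def bootstrap_quantile(psi, alpha=0.05, B=1000, rng=None):
    """Monte-Carlo estimate of the conditional (1 - alpha)-quantile in (12).

    psi : (n, m) array of per-observation contributions, one column per
          discretized (time point, edge) pair in (11); psi is assumed
          computed upstream from the data.
    Each bootstrap draw multiplies rows by i.i.d. N(0, 1) weights xi_i and
    takes the max-abs of the resulting column averages (times sqrt(n)).
    """
    rng = rng or np.random.default_rng(0)
    n = psi.shape[0]
    xi = rng.normal(size=(B, n))                       # multiplier weights
    TB = np.sqrt(n) * np.max(np.abs(xi @ psi / n), axis=1)
    return np.quantile(TB, 1 - alpha)

# toy usage: unit-variance contributions, m = 10 "edges"
rng = np.random.default_rng(3)
psi = rng.normal(size=(500, 10))
c_hat = bootstrap_quantile(psi, alpha=0.05, rng=rng)
print(round(c_hat, 2))  # roughly the 95% quantile of the max of 10 |N(0,1)|'s
```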

We now propose an inference framework for testing hypothesis problems of the form (8). Our proposed method is motivated by the step-down method in Romano & Wolf (2005) for multiple hypothesis testing. The details are summarized in Algorithm 1. Algorithm 1 involves evaluating all values of z \in [0,1]. In practice, we implement the proposed method by discretizing z into a large number of time points. We note that there will be an approximation error from taking the maximum over the discretized time points instead of the supremum over the continuous trajectory. This approximation error can be made arbitrarily small by increasing the density of the discretization.

In Section 5.2, we will show that Algorithm 1 is able to control the type I error at a pre-specified level \alpha. Moreover, the power of the proposed inferential method increases to one as we increase the number of time points n. In fact, the proposed inferential method can be generalized to testing a wide variety of structures that satisfy the monotone graph property. Some examples of monotone graph properties are that the graph is connected, the graph has no more than k connected components, the maximum degree of the graph is larger than k, the graph has no more than k isolated nodes, and the graph contains a clique of size larger than k. This generalization will be presented in Appendix A.

3 Simulation Studies

We perform numerical studies to evaluate the performance of our proposal using the inter-subject covariance relative to the typical time-varying Gaussian graphical model using the within-subject covariance. To this end, we define the true positive rate as the proportion of correctly identified non-zeros in the true inverse covariance matrix, and the false positive rate as the proportion of zeros that are incorrectly identified to be non-zeros. To evaluate our testing procedure, we calculate the type I error rate and power as the proportion of falsely rejected and correctly rejected null hypotheses, respectively, over a large number of data sets.

To generate the data, we first construct the inverse covariance matrix \bTheta(z) for z \in [0,1]. At z = 0, we randomly set some off-diagonal elements of \bTheta(0) to equal 0.3 with equal probability. At z = 1/2, we set additional off-diagonal elements of \bTheta(1/2) to equal 0.3. At z = 1, we randomly select two columns of \bTheta(1) and add edges to each of the two columns. This guarantees that the maximum degree of the graph is greater than k. To ensure that the inverse covariance matrix is smooth, for z \in (0, 1/2), we construct \bTheta(z) by taking linear interpolations between the elements of \bTheta(0) and \bTheta(1/2). For z \in (1/2, 1), we construct \bTheta(z) in a similar fashion based on \bTheta(1/2) and \bTheta(1). The construction is illustrated in Figure 1.

To ensure that the inverse covariance matrix is positive definite, we set \bTheta(z) \leftarrow \bTheta(z) + \{|\lambda_{\min}(\bTheta(z))| + \epsilon\} \mathbf{I}_d for a small constant \epsilon > 0, where \lambda_{\min}(\bTheta(z)) is the minimum eigenvalue of \bTheta(z). We then rescale the matrix such that the diagonal elements of \bTheta(z) equal one. The covariance \bSigma(z) can be obtained by taking the inverse of \bTheta(z) for each value of z. Model (3) involves the subject specific covariance matrices \bLambda_X and \bLambda_Y. For simplicity, we assume that these covariance matrices stay constant over time. We generate \bLambda_X by setting the diagonal elements to be one and the off-diagonal elements to be 0.3, and then adding random perturbations to the off-diagonal elements. The matrix \bLambda_Y is generated similarly.

To generate the data according to (3), we first generate Z_i for i \in [n]. Given Z_i, we generate the stimulus-induced signal \bS(Z_i) \sim N_d\{0, \bSigma(Z_i)\}. We then simulate \bE_X(Z_i) \sim N_d(0, \bLambda_X) and \bE_Y(Z_i) \sim N_d(0, \bLambda_Y). Finally, for each value of Z_i, we generate

 \bX(Z_i) = \bS(Z_i) + \bE_X(Z_i) \quad \text{and} \quad \bY(Z_i) = \bS(Z_i) + \bE_Y(Z_i).

Note that both \bX(Z_i) and \bY(Z_i) share the same generated \bS(Z_i) since both subjects are given the same natural continuous stimulus. In the following sections, we assess the performance of our proposal relative to that of the typical approach for time-varying Gaussian graphical models using the within-subject covariance matrix as input. We then evaluate the proposed inferential procedure of Section 2.3 by calculating its type I error and power.
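A compact end-to-end sketch of this data-generating mechanism is below. The anchor times (0, 1/2, 1), the edge counts, the positive-definiteness shift `eps`, and the identity nuisance covariances are our own illustrative assumptions, not values from the paper:

```python
import numpy as np

rng = np.random.default_rng(4)
d, n = 10, 300

def make_pd(Theta, eps=0.1):
    # Shift by |minimum eigenvalue| + eps, then rescale to unit diagonal,
    # as described in Section 3 (eps is an illustrative choice).
    lam_min = np.linalg.eigvalsh(Theta).min()
    Theta = Theta + (abs(lam_min) + eps) * np.eye(d)
    s = 1.0 / np.sqrt(np.diag(Theta))
    return Theta * np.outer(s, s)

def random_theta(n_edges):
    # Identity plus a few random symmetric off-diagonal entries set to 0.3.
    T = np.eye(d)
    rows, cols = np.triu_indices(d, k=1)
    idx = rng.choice(d * (d - 1) // 2, n_edges, replace=False)
    T[rows[idx], cols[idx]] = T[cols[idx], rows[idx]] = 0.3
    return T

# anchor precision matrices at z = 0, 1/2, 1 (assumed anchor times)
anchors = [make_pd(random_theta(5)), make_pd(random_theta(8)), make_pd(random_theta(8))]

def theta_z(z):
    # Piecewise-linear interpolation between anchors, then a PD adjustment.
    if z <= 0.5:
        T = (1 - 2 * z) * anchors[0] + 2 * z * anchors[1]
    else:
        T = (2 - 2 * z) * anchors[1] + (2 * z - 1) * anchors[2]
    return make_pd(T)

# sample data from model (3); Lambda_X = Lambda_Y = identity for brevity
Z = np.sort(rng.uniform(0, 1, n))
X, Y = np.empty((n, d)), np.empty((n, d))
for i, z in enumerate(Z):
    Sigma = np.linalg.inv(theta_z(z))
    S = rng.multivariate_normal(np.zeros(d), Sigma)
    X[i] = S + rng.normal(size=d)   # S + E_X
    Y[i] = S + rng.normal(size=d)   # S + E_Y
print(X.shape, Y.shape)
```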

3.1 Estimation

To mimic the data application we consider, we generate the data as described above. Given the data \{(Z_i, \bX_i, \bY_i)\}_{i \in [n]}, we estimate the covariance matrix at each time point of interest using the inter-subject kernel smoothed covariance estimator as defined in (4). To obtain estimates of the inverse covariance matrix \bTheta(z), we use the CLIME estimator as described in (6), implemented using the R package clime. There are two tuning parameters h and \lambda: we fix h and vary the tuning parameter \lambda to obtain the ROC curve in Figure 2. The smoothing parameter h is selected such that there are always at least 30% of the time points that have non-zero kernel weights. We compare our proposal to time-varying Gaussian graphical models with the kernel smoothed within-subject covariance matrix. The true and false positive rates, averaged over 100 data sets, are in Figure 2.

From Figure 2, we see that our proposed method outperforms the typical approach, which fits time-varying Gaussian graphical models to the within-subject covariance matrix. This is because the typical approach does not estimate the parameter of interest, as discussed in Section 2.2. Our proposed method treats the subject specific effects as nuisance parameters and is able to estimate the stimulus-locked graph accurately.

3.2 Testing the Maximum Degree of a Time-Varying Graph

In this section, we evaluate the proposed inferential method in Algorithm 1 by calculating its type I error and power. In all of our simulation studies, we fix the significance level \alpha and the number of bootstrap samples, and consider a range of sample sizes n. As before, we select the smoothing parameter h such that at least 30% of the time points have non-zero kernel weights. The tuning parameter \lambda is then selected using the cross-validation criterion defined in (7). The tuning parameter is selected for one of the simulated data sets; for computational purposes, we use this value of the tuning parameter across all replications.

We construct the test statistic T_E and the Gaussian multiplier bootstrap statistic T^B_E as defined in (10) and (11), respectively. Both statistics involve evaluating the supremum over z. In our simulation studies, we approximate the supremum by taking the maximum of the statistics over 50 evenly spaced grid points in [z_{\min}, z_{\max}].

Our testing procedure tests the hypothesis

 H_0: \text{for all } z \in [z_{\min}, z_{\max}], \text{ the maximum degree of the graph is no greater than } k,
 H_1: \text{there exists a } z_0 \in [z_{\min}, z_{\max}] \text{ such that the maximum degree of the graph is greater than } k.

For the power analysis, we construct \bTheta(z) according to Figure 1 by randomly selecting two columns of \bTheta(1) and adding edges to each of the two columns. This ensures that the maximum degree of the graph is greater than k. To evaluate the type I error under H_0, instead of adding edges to the two columns, we add edges such that the maximum degree of the graph remains no greater than k. For the purpose of illustrating the type I error and power in the finite sample setting, we increase the signal-to-noise ratio of the data by reducing the effect of the nuisance parameters in the data generating mechanism described in Section 3. The type I error and power, averaged over the simulated data sets, are reported in Table 1. We see that the type I error is controlled and that the power increases to one as we increase the number of time points n.

4 Sherlock Holmes Data

We analyze a brain imaging data set studied in Chen et al. (2017). This data set consists of fMRI measurements of 17 subjects watching audio-visual movie stimuli in an fMRI scanner. More specifically, the subjects were asked to watch a 23-minute segment of the BBC television series Sherlock, taken from the beginning of the first episode of the series. The fMRI measurements were taken every 1.5 seconds of the movie, yielding a sequence of brain images for each subject. To understand the dynamics of the brain connectivity network under natural continuous stimuli, we partition the movie into 26 scenes (Chen et al. 2017). The data were pre-processed with slice time correction, motion correction, linear detrending, high-pass filtering, and coregistration to a template brain (Chen et al. 2017). Furthermore, for each subject, we mitigate issues caused by non-neuronal signal sources by regressing out the average white matter signal.

There are measurements for 271,633 voxels in this data set. For interpretation purposes, we reduce the dimension from 271,633 voxels to 172 regions of interest (ROIs) as described in Baldassano et al. (2015). We map the brain images taken across the 23 minutes into the interval [0,1] chronologically. We then standardize each of the 172 ROIs to have mean zero and standard deviation one.

We first estimate the stimulus-locked time-varying brain connectivity network. To this end, we construct the inter-subject kernel smoothed covariance matrix as defined in (4). Since there are 17 subjects, we randomly split the 17 subjects into two groups, and use the averaged data from each group to construct (4). Note that we could also construct a brain connectivity network for each pair of subjects separately. We then obtain estimates of the inverse covariance matrix using the CLIME estimator as in (6). We set the smoothing parameter h so that at least 30% of the kernel weights are non-zero across all time points. For the sparsity tuning parameter, our theoretical results suggest picking \lambda proportional to a rate depending on n, d, and h to guarantee a consistent estimator. We select the constant of proportionality from a sequence of candidate values using the L-fold cross-validation procedure described in (7). Heatmaps of the estimated stimulus-locked brain connectivity networks for three different scenes in Sherlock are in Figure 3.

From Figure 3, we see that there are quite a number of connections between brain regions that remain the same across different scenes in the movie. It is also evident that the graph structure changes across different scenes. We see that most brain regions are very sparsely connected, with the exception of a few ROIs. This raises the question of identifying whether there are hub ROIs that are connected to many other ROIs under audio-visual stimuli.

To answer this question, we perform a hypothesis test of whether there are hub nodes that are connected to many other nodes in the graph across the 26 scenes, and if so, which ROIs they correspond to. More formally, we test the hypothesis

 H_0: \text{for all } z \in [0,1], \text{ the maximum degree of the graph is no greater than } 15,
 H_1: \text{there exists a } z_0 \in [0,1] \text{ such that the maximum degree of the graph is greater than } 15.

The number 15 is chosen because we are interested in testing whether any brain region is connected to more than 10% of the total number of brain regions. We apply Algorithm 1 with 26 values of z corresponding to the middle of the 26 scenes. Figure 4 shows the ROIs that have more than 12 rejected edges across the 26 scenes based on Algorithm 1. Since the maximum degree of the rejected nodes in some scenes is larger than 15, we reject the null hypothesis that the maximum degree of the graph is no greater than 15. In Figure 5, we plot a sagittal snapshot of the brain connectivity network, visualizing the rejected edges from Algorithm 1 and the identified hub ROIs.

From Figure 4, we see that the rejected hub nodes (nodes that have more than 15 rejected edges) correspond to the frontal pole (7), temporal fusiform cortex (16, 100), lingual gyrus (17), and precuneus (102) regions of the brain. Many studies have suggested that the frontal pole plays a significant role in higher order cognitive operations such as decision making and moral reasoning (among others, Okuda et al. 2003). The fusiform cortex is linked to face and body recognition (see, Iaria et al. 2008, and the references therein). In addition, the lingual gyrus is known for its involvement in the processing of visual information about parts of human faces (McCarthy et al. 1999). Thus, it is not surprising that these ROIs have more than 15 rejected edges, since the brain images were collected while the subjects were exposed to an audio-visual movie stimulus.

Compared to the lingual gyrus, temporal fusiform cortex, and the frontal pole, the precuneus is the least well-understood of these brain regions in the current literature. We see from Figure 4 that the precuneus is the most connected ROI across many scenes. This is supported by the observation in Hagmann et al. (2008) that the precuneus serves as a hub region connected to many other parts of the brain. In recent years, Lerner et al. (2011) and Ames et al. (2015) conducted experiments in which subjects were asked to listen to a story in an fMRI scanner. Their results suggest that the precuneus represents high-level concepts in the story, integrating feature information arriving from many different ROIs of the brain. Interestingly, we find that the precuneus has the highest number of rejected edges during the first half of the movie and that the number of rejected edges decreases significantly during the second half of the movie. Our results correspond well to the findings of Lerner et al. (2011) and Ames et al. (2015), in which the precuneus is active when the subjects comprehend the story. However, this also raises an interesting scientific question for future study: is the precuneus active only while the subjects are trying to comprehend the story; that is, does it become less active once the story is understood?

5 Theoretical Results

We establish uniform rates of convergence for the proposed estimators, and show that the testing procedure in Algorithm 1 is a uniformly valid test. We study the asymptotic regime in which n, d, and s are allowed to increase. In the context of the Sherlock Holmes data set, n is the total number of brain images obtained under the continuous stimulus, d is the number of brain regions, and s is the maximum number of connections for each brain region in the true stimulus-locked brain connectivity network. The current theoretical results assume that Z is a random variable with a continuous density. Our theoretical results can be easily generalized to the case when Z_1, \ldots, Z_n are fixed.

5.1 Theoretical Results on Parameter Estimation

Our proposed estimator involves a kernel function K(\cdot): we require K(\cdot) to be symmetric, bounded, unimodal, and compactly supported. More formally, for l = 2,

 \int K(u)\, du = 1, \quad \int u K(u)\, du = 0, \quad \int u^l K(u)\, du < \infty, \quad \int K^l(u)\, du < \infty. \qquad (13)

In addition, we require the total variation of K(\cdot) to be bounded, i.e., \mathrm{TV}(K) < \infty. In other words, we require the kernel function to be a smooth function. The Epanechnikov kernel we consider in (5) for analyzing the Sherlock data satisfies (13). A unimodal kernel function is natural in our setting: for instance, to estimate the brain network in the "police press conference" scene, we expect the brain images within that scene to play a larger role than brain images that are far away from the scene. One practical limitation of the conditions on the kernel function is the symmetry condition. When estimating a stimulus-locked brain network at a particular time point, the ideal case is to weight the previous brain images more heavily than the future brain images, since there may be some time lag in information processing. Capturing this effect requires a more carefully designed kernel function, which is beyond the scope of this paper.

Next, we impose regularity conditions on the marginal density f_Z(\cdot) of Z. We assume that f_Z(\cdot) is bounded away from zero by a constant on the support of Z. Furthermore, f_Z(\cdot) is twice continuously differentiable, and its first two derivatives are bounded by a constant.

Next, we impose smoothness assumptions on the inter-subject covariance matrix \bSigma(z). Our theoretical results hold for any positive definite subject-specific covariance matrices \bLambda_X(z) and \bLambda_Y(z), since these matrices are treated as nuisance parameters.

There exists a constant M_\sigma such that

 \sup_{z \in [0,1]} \max_{j,k \in [d]} \max\big\{ |\bSigma_{jk}(z)|, |\dot{\bSigma}_{jk}(z)|, |\ddot{\bSigma}_{jk}(z)| \big\} \le M_\sigma.

In other words, we assume that the inter-subject covariance matrices are smooth and do not change too rapidly across neighboring time points. This assumption is natural for a dynamic brain network, where we expect the network to change smoothly over time. The assumptions on the marginal density and on Σ(z) are standard in the nonparametric statistics literature (see, for instance, Chapter 2 of Pagan & Ullah 1999).

The following theorem establishes the uniform rate of convergence for Σ̂(z). Assume that the bandwidth h satisfies h → 0 and nh → ∞. Under the assumptions above, we have

 sup_{z∈[0,1]} ‖Σ̂(z) − Σ(z)‖_max = O_P{ h² + √(log(d/h)/(nh)) }.

Theorem 5.1 guarantees that our estimator Σ̂(z) converges uniformly to the population parameter Σ(z) under the max norm, provided the right-hand side vanishes asymptotically; for instance, any bandwidth satisfying h → 0 and nh/log(d/h) → ∞ suffices. The upper bound is the sum of two terms, h² and √(log(d/h)/(nh)), which are known as the bias and variance terms, respectively, in the kernel smoothing literature (see, for instance, Chapter 2 of Pagan & Ullah 1999).
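To make the bias–variance trade-off concrete, here is a minimal sketch of a generic Nadaraya–Watson-type kernel-smoothed covariance estimator. The function name `smoothed_covariance` and the mean-centering step are our own simplifications and may differ from the paper's inter-subject construction:

```python
import numpy as np

def smoothed_covariance(X, z_obs, z0, h):
    """Kernel-smoothed covariance estimate at time z0 (illustrative sketch).

    X     : (n, d) array of observations, row i observed at time z_obs[i].
    z_obs : (n,) array of time points in [0, 1].
    z0    : target time point.
    h     : bandwidth; only observations with |z_i - z0| <= h receive weight.
    """
    u = (z_obs - z0) / h
    w = np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u**2), 0.0)  # Epanechnikov weights
    w = w / w.sum()
    mu = w @ X                               # kernel-weighted mean
    Xc = X - mu
    return (w[:, None] * Xc).T @ Xc          # kernel-weighted covariance, (d, d)
```

Shrinking h reduces the h² bias term but inflates the √(log(d/h)/(nh)) variance term, matching the trade-off in Theorem 5.1.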

Next, we establish theoretical results for Θ̂(z). Recall that the stimulus-locked brain connectivity network is encoded by the support of the inverse covariance matrix Θ(z): Θ_{jk}(z) = 0 if and only if the jth and kth brain regions are conditionally independent given all of the other brain regions. We consider the following class of inverse covariance matrices:

 U_{s,M} = { Θ ∈ ℝ^{d×d} ∣ Θ ≻ 0, ‖Θ‖₂ ≤ ρ, max_{j∈[d]} ‖Θ_j‖₀ ≤ s, max_{j∈[d]} ‖Θ_j‖₁ ≤ M }. (14)

Here, ‖Θ‖₂ is the largest singular value of Θ, and ‖Θ_j‖₀ is the number of non-zero entries in the jth column Θ_j.

Brain connectivity networks are usually densely connected due to intrinsic-neural and non-neuronal signals, such as background processing. Instead of assuming an overall sparse brain network, we assume that the stimulus-locked brain network is sparse, while allowing the intrinsic brain network unrelated to the stimulus to be dense. The sparsity assumption on the stimulus-locked brain network is plausible in this setting since it characterizes brain activities that are specific to the stimulus: for instance, we may believe that only certain brain regions are active during the cognitive process. The remaining conditions in (14) are satisfied since Θ(z) is the inverse of a positive definite covariance matrix.

Given Theorem 5.1, the following corollary establishes the uniform rates of convergence for Θ̂(z) obtained from the CLIME estimator defined in (6). It follows directly from the proof of Theorem 6 in Cai et al. (2011).

Assume that Θ(z) ∈ U_{s,M} for all z ∈ [0,1], and let the sparsity tuning parameter λ be of the order M{ h² + √(log(d/h)/(nh)) }. Under the same conditions as in Theorem 5.1,

 sup_{z∈[0,1]} ‖Θ̂(z) − Θ(z)‖_max = O_P{ h² + √(log(d/h)/(nh)) }; (15)
 sup_{z∈[0,1]} max_{j∈[d]} ‖Θ̂_j(z) − Θ_j(z)‖₁ = O_P[ s · { h² + √(log(d/h)/(nh)) } ]; (16)
 sup_{z∈[0,1]} max_{j∈[d]} ‖{Θ̂_j(z)}ᵀ Σ̂(z) − e_j‖_∞ = O_P{ h² + √(log(d/h)/(nh)) }. (17)

In the real data analysis, Corollary 5.1 is helpful for selecting the sparsity tuning parameter λ: it motivates a tuning parameter of the form λ = C{ h² + √(log(d/h)/(nh)) } to guarantee a statistically consistent estimate of the stimulus-locked brain network. To select the constant C, we consider a sequence of candidate values and choose the appropriate one using the data-driven cross-validation procedure in (7).
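The tuning-parameter grid described above can be sketched as follows; the function name `lambda_grid` and the particular set of candidate constants are our own illustrative choices, and the constant would in practice be chosen by the cross-validation procedure in (7):

```python
import numpy as np

def lambda_grid(n, d, h, constants=(0.25, 0.5, 1.0, 2.0, 4.0)):
    """Candidate sparsity parameters of the theoretically motivated form
    lambda = C * {h^2 + sqrt(log(d/h) / (n*h))}, for a grid of constants C
    to be compared by cross-validation."""
    rate = h**2 + np.sqrt(np.log(d / h) / (n * h))
    return [c * rate for c in constants]
```

Only the constant C is data-driven; the rate factor is fixed by the theory, which keeps the search one-dimensional.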

5.2 Theoretical Results on Topological Inference

In this section, we first show that the distribution of the test statistic T_E can be approximated by the conditional (1−α)-quantile of the bootstrap statistic, c(1−α, E). Next, we show that the proposed testing method in Algorithm 1 is valid in the sense that the type I error can be controlled at a pre-specified level α.

Recall from (12) the definition of c(1−α, E). The following theorem shows that the Gaussian multiplier bootstrap is valid for approximating the quantile of the test statistic T_E in (10). Our results build on the series of work on Gaussian approximation for the multiplier bootstrap in high dimensions (see, e.g., Chernozhukov et al. 2013, 2014b). We see from (10) that T_E involves taking a supremum over both z ∈ [0,1] and a dynamic edge set. Because of this dynamic edge set, existing theoretical results for Gaussian multiplier bootstrap methods cannot be directly applied. We construct a novel Gaussian approximation result for the supremum of the relevant empirical process by carefully characterizing the capacity of the dynamic edge set.

Assume that the bandwidth h and the sparsity level s satisfy suitable scaling conditions relative to n and d. Under the same conditions as in Corollary 5.1, we have

 lim_{n→∞} sup_{Θ(·)∈U_{s,M}} P_{Θ(·)}{ T_E ≥ c(1−α, E) } ≤ α.

Some of the scaling conditions are standard in nonparametric estimation (Tsybakov 2009). The most notable scaling conditions arise from the Gaussian approximation for the multiplier bootstrap (Chernozhukov et al. 2013). These scaling conditions hold asymptotically as long as the number of brain images n is much larger than the maximum degree s of the graph. This corresponds well with the real data analysis, where we expect only certain ROIs to be active during information processing.

Recall the hypothesis testing problem in (8). We now show that the type I error of the proposed inferential method for testing the maximum degree of a time-varying graph can be controlled at a pre-specified level α. Assume that the same conditions as in Theorem 5.2 hold. Under the null hypothesis in (8), we have

 lim_{n→∞} P_null(Algorithm 1 rejects the null hypothesis) ≤ α.

To study the power of the proposed method, we define the signal strength of a precision matrix Θ as

 Sigdeg(Θ) := max_{E′⊆E(Θ), Deg(E′)>k} min_{e∈E′} |Θ_e|, (18)

where Deg(E′) denotes the maximum degree of the graph with edge set E′. Under the alternative hypothesis in (8), the maximum degree of the graph is greater than k. We define the parameter space under the alternative:

 (19)

The following theorem presents the power analysis of Algorithm 1. Assume that the same conditions as in Theorem 5.2 hold, and select the smoothing parameter h to balance the bias and variance terms in Theorem 5.1. Under the alternative hypothesis in (8) and the assumption that the signal strength Sigdeg(Θ(·)) exceeds C times the estimation rate in Theorem 5.1, where C is a fixed large constant, we have

 lim_{n→∞} P_{Θ(·)}(Algorithm 1 rejects the null hypothesis) = 1 (20)

for any fixed α ∈ (0,1). The signal strength condition in (18) is weaker than the typical minimal signal strength condition required for testing a single edge in a conditional independence graph. The condition requires only that there exists a subgraph whose maximum degree is larger than k and whose minimal signal strength is above a certain level. In our real data analysis, this requires only the edges of brain regions that are connected to many other brain regions to be strong, which is plausible since these regions should exhibit high brain activity.

6 Discussion

We consider estimating stimulus-locked brain connectivity networks from data obtained under natural continuous stimuli. Due to the lack of highly controlled experiments that remove all spontaneous and individual variation, the measured brain signal consists not only of the stimulus-induced signal, but also of intrinsic-neural and non-neuronal signals that are subject-specific. Typical approaches for estimating time-varying Gaussian graphical models fail to estimate the stimulus-locked brain connectivity network accurately due to the presence of these subject-specific effects. By exploiting the experimental design aspect of the problem, we propose a simple approach for estimating the stimulus-locked brain connectivity network. In particular, rather than calculating the within-subject smoothed covariance matrix, as in the typical approach for modeling time-varying Gaussian graphical models, we propose to construct the inter-subject smoothed covariance matrix, treating the subject-specific effects as nuisance parameters.

To answer the scientific question of whether there are any brain regions that are connected to many other brain regions during the given stimulus, we propose an inferential method for testing the maximum degree of a stimulus-locked time-varying graph. In our analysis, we found that several interesting brain regions, such as the fusiform cortex, lingual gyrus, and precuneus, are highly connected. According to the neuroscience literature, these brain regions are mainly responsible for high-order cognitive operations and face and body recognition, and serve as control regions that integrate information from other brain regions. We have also extended the proposed inferential framework to testing various topological graph structures; these extensions are detailed in Appendix A.

A practical limitation of our proposed method is the Gaussian assumption on the data. While we focus on the time-varying Gaussian graphical model in this paper, our framework can be extended to other types of time-varying graphical models, such as the time-varying discrete graphical model or the time-varying nonparanormal graphical model (Kolar et al. 2010, Lu et al. 2015b). Another limitation is the independence assumption on the data across time points. All of our theoretical results can be generalized to the case when the data are correlated across time points, and we leave such a generalization for future work.

Acknowledgement

We thank Hasson’s lab at the Princeton Neuroscience Institute for providing us with the fMRI data set obtained under the audio-visual movie stimulus. We thank Janice Chen for very helpful conversations on preprocessing the fMRI data set and interpreting the results of our analysis.

Appendix A Inference on Topological Structure of Time-Varying Graph

In this section, we generalize Algorithm 1 in the main manuscript to testing various graph structures that satisfy the monotone graph property. In Section A.1, we briefly introduce some concepts from graph theory, including the notions of isomorphism, graph property, monotone graph property, and critical edge set. In Section A.2, we provide a test statistic and an estimate of its quantile using the Gaussian multiplier bootstrap. We then develop an algorithm to test the dynamic topological structure of a time-varying graph that satisfies the monotone graph property.

A.1 Graph Theory

Let G = (V, E) be an undirected graph, where V is a set of nodes and E is a set of edges connecting pairs of nodes. Let 𝒢 be the set of all graphs with the same number of nodes. For any two graphs G₁ = (V, E₁) and G₂ = (V, E₂), we write G₁ ⊆ G₂ if G₁ is a subgraph of G₂, that is, if E₁ ⊆ E₂. We start by introducing some concepts from graph theory (see, for instance, Chapter 4 of Lovász 2012). Two graphs G₁ and G₂ are said to be isomorphic if there exists a permutation π of the vertices such that (j, k) ∈ E₁ if and only if (π(j), π(k)) ∈ E₂. The notion of isomorphism is used in the graph theory literature to quantify whether two graphs have the same topological structure, up to a permutation of the vertices (see Chapter 1.2 of Bondy & Murty 1976). We provide two concrete examples of the notion of isomorphism in Figure 6.
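The definition of isomorphism translates directly into a brute-force check for small graphs, sketched below; the function name `are_isomorphic` is our own, and the factorial search over permutations is only feasible for a handful of nodes:

```python
from itertools import permutations

def are_isomorphic(E1, E2, d):
    """Brute-force isomorphism check for small graphs on nodes 0, ..., d-1:
    search for a vertex permutation mapping one edge set onto the other."""
    E1 = {frozenset(e) for e in E1}
    E2 = {frozenset(e) for e in E2}
    if len(E1) != len(E2):   # isomorphic graphs must have the same edge count
        return False
    for pi in permutations(range(d)):
        if {frozenset((pi[j], pi[k])) for j, k in E1} == E2:
            return True
    return False
```

For example, the two paths 0–1–2 and 1–0–2 are isomorphic (relabel the center node), while a triangle and a path are not.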

Next, we introduce the notion of a graph property. A graph property is a property of graphs that depends only on the structure of the graph; that is, a graph property is invariant under permutation of the vertices. A formal definition is given as follows: a graph property is a function 𝒫 mapping graphs to {0, 1} such that 𝒫(G₁) = 𝒫(G₂) whenever G₁ and G₂ are isomorphic. A graph G satisfies the graph property if 𝒫(G) = 1. Some examples of graph properties are: the graph is connected, the graph has no more than k connected components, the maximum degree of the graph is larger than k, the graph has no more than k isolated nodes, the graph contains a clique of size larger than k, and the graph contains a triangle. For instance, the two graphs in Figures 6(i) and 6(ii) are isomorphic and satisfy the graph property of being connected.

For two graphs G₁ ⊆ G₂, a graph property 𝒫 is monotone if 𝒫(G₁) = 1 implies that 𝒫(G₂) = 1. In other words, a graph property is monotone if it is preserved under the addition of new edges. Many graph properties of interest, such as those listed above, are monotone. In Figure 7, we present several examples of monotone graph properties by showing that adding edges to the graph does not change the graph property. For instance, we see from Figure 7(a) that the existing graph with gray edges is connected; after adding the red edges, the graph remains connected, so the property is monotone. Another example is the property of having maximum degree at least three, as in Figure 7(c): adding the red dashed edges to the graph preserves the property.
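Monotonicity is easy to illustrate in code for the maximum-degree property; the helper names below are our own and the edge sets are toy examples, not data from the paper:

```python
def max_degree(edges, d):
    """Maximum degree of an undirected graph on nodes 0, ..., d-1."""
    deg = [0] * d
    for j, k in edges:
        deg[j] += 1
        deg[k] += 1
    return max(deg)

def has_max_degree_at_least(edges, d, k):
    """Monotone graph property: 'the maximum degree is at least k'."""
    return max_degree(edges, d) >= k

# Monotonicity in action: the property survives the addition of new edges.
E1 = {(0, 1), (0, 2), (0, 3)}   # node 0 already has degree 3
E2 = E1 | {(1, 2), (2, 3)}      # a supergraph of E1
```

Since E2 contains E1, any monotone property holding for E1 (here, maximum degree at least 3) must also hold for E2.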

For a given graph property 𝒫, we define the class of edge sets satisfying the property as

 P = { E ⊆ V × V ∣ 𝒫(G) = 1 }. (21)

Finally, we introduce the notion of critical edge set in the following definition. Given any edge set , we define the critical edge set of for a given monotone graph property as

 𝒞(E, 𝒫) = { e ∣ e ∉ E, there exists E′ ⊇ E such that E′ ∈ P and E′ ∖ {e} ∉ P }. (22)

For a given monotone graph property 𝒫, the critical edge set is the set of edges that may change the graph property once added to the existing graph. We provide two examples in Figure 8. Suppose that 𝒫 is the graph property of being connected. In Figure 8(a), we see that the graph is not connected, and thus 𝒫(G) = 0. Adding any of the red dashed edges in Figure 8(b) changes 𝒫(G) from 0 to 1.
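Definition (22) can be evaluated by brute force for tiny graphs, which is useful for building intuition; the function names below are our own, and the exhaustive search over supersets is exponential, so this is purely illustrative:

```python
from itertools import combinations

def critical_edge_set(E, d, prop):
    r"""Brute-force critical edge set from definition (22), for tiny graphs:
    an edge e not in E is critical if some superset E' of E (containing e)
    satisfies the property while E' \ {e} does not."""
    all_edges = {frozenset(p) for p in combinations(range(d), 2)}
    E = {frozenset(e) for e in E}
    candidates = all_edges - E
    critical = set()
    for e in candidates:
        others = [f for f in candidates if f != e]
        found = False
        for r in range(len(others) + 1):       # search over supersets E' ∋ e
            for S in combinations(others, r):
                Ep = E | {e} | set(S)
                if prop(Ep) and not prop(Ep - {e}):
                    found = True
                    break
            if found:
                break
        if found:
            critical.add(e)
    return critical

def max_deg_at_least(edges, k):
    """Example monotone property: maximum degree at least k."""
    deg = {}
    for e in edges:
        for v in e:
            deg[v] = deg.get(v, 0) + 1
    return bool(deg) and max(deg.values()) >= k
```

For instance, with three nodes, the existing edge {0,1}, and the property "maximum degree at least 2", both remaining edges are critical: adding either one completes a degree-2 node.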

A.2 An Algorithm for Topological Inference

Throughout the rest of the paper, we denote by G(z) the graph at time z ∈ [0,1]. We consider hypothesis testing problems of the form

 H₀: 𝒫{G(z)} = 0 for all z ∈ [0,1]  versus  H₁: there exists a z₀ ∈ [0,1] such that 𝒫{G(z₀)} = 1, (23)

where G(z) is the true underlying graph and 𝒫 is a given monotone graph property as defined in Definition A.1. We provide two concrete examples of the hypothesis testing problem in (23). Number of connected components:

 H₀: for all z ∈ [0,1], the number of connected components is greater than k;
 H₁: there exists a z₀ ∈ [0,1] such that the number of connected components is not greater than k.

Maximum degree of the graph:

 H₀: for all z ∈ [0,1], the maximum degree of the graph is not greater than k;
 H₁: there exists a z₀ ∈ [0,1] such that the maximum degree of the graph is greater than k.

We now propose an algorithm to test the topological structure of a time-varying graph. The proposed algorithm is very general and is able to test hypothesis problems of the form in (23). It is motivated by the step-down algorithm of Romano & Wolf (2005) for testing multiple hypotheses simultaneously. The main crux of our algorithm is as follows. By the definition of the critical edge set in (22), the critical edge set contains the edges that may change the graph property from 𝒫(G) = 0 to 𝒫(G) = 1. Thus, at each iteration of the proposed algorithm, it suffices to test whether any edges in the current critical edge set are rejected, and to augment the current edge set with the rejected edges. Since 𝒫 is a monotone graph property, if at some iteration the augmented graph satisfies 𝒫, we directly reject the null hypothesis; this follows from the definition of a monotone graph property, since adding more edges does not change the graph property. Otherwise, we repeat this process until the null hypothesis is rejected or no more edges in the critical edge set are rejected. We summarize the procedure in Algorithm 2.