Separating Stimulus-Induced and Background Components of Dynamic Functional Connectivity in Naturalistic fMRI

We consider the challenges in extracting stimulus-related neural dynamics from other intrinsic processes and noise in naturalistic functional magnetic resonance imaging (fMRI). Most studies rely on inter-subject correlations (ISC) of low-level regional activity and neglect varying responses in individuals. We propose a novel, data-driven approach based on low-rank plus sparse (L+S) decomposition to isolate stimulus-driven dynamic changes in brain functional connectivity (FC) from the background noise, by exploiting shared network structure among subjects receiving the same naturalistic stimuli. The time-resolved multi-subject FC matrices are modeled as a sum of a low-rank component of correlated FC patterns across subjects, and a sparse component of subject-specific, idiosyncratic background activities. To recover the shared low-rank subspace, we introduce a fused version of principal component pursuit (PCP) by adding a fusion-type penalty on the differences between the rows of the low-rank matrix. The method improves the detection of stimulus-induced group-level homogeneity in the FC profile while capturing inter-subject variability. We develop an efficient algorithm via a linearized alternating direction method of multipliers to solve the fused-PCP. Simulations show accurate recovery by the fused-PCP even when a large fraction of FC edges are severely corrupted. When applied to natural fMRI data, our method reveals FC changes that were time-locked to auditory processing during movie watching, with dynamic engagement of sensorimotor systems for speech-in-noise. It also provides a better mapping to auditory content in the movie than ISC.


Bayesian Mixed Effect Sparse Tensor Response Regression Model with Joint Estimation of Activation and Connectivity

Brain activation and connectivity analyses in task-based functional magn...

Recovering low-rank structure from multiple networks with unknown edge distributions

In increasingly many settings, particularly in neuroimaging, data sets c...

Fused graphical lasso for brain networks with symmetries

Neuroimaging is the growing area of neuroscience devoted to produce data...

Aligning individual brains with Fused Unbalanced Gromov-Wasserstein

Individual brains vary in both anatomy and functional organization, even...

A constrained ICA-EMD Model for Group Level fMRI Analysis

Independent component analysis (ICA), as a data driven method, has shown...

Uncovering the Topology of Time-Varying fMRI Data using Cubical Persistence

Functional magnetic resonance imaging (fMRI) is a crucial technology for...

Group Linear non-Gaussian Component Analysis with Applications to Neuroimaging

Independent component analysis (ICA) is an unsupervised learning method ...

I Introduction

Naturalistic functional magnetic resonance imaging (fMRI) is an emerging approach in cognitive neuroscience, employing naturalistic stimuli (e.g., movies, spoken narratives, music, etc.) to provide an ecologically-valid paradigm that mimics real life scenarios [25]. Complementary to traditional task-based paradigms with strictly controlled artificial stimuli, naturalistic paradigms offer a better understanding of neural processing and network interactions in realistic contexts, where stimuli typically engage the brain in continuous integration of dynamic streams of multimodal (e.g., audiovisual) information. Resting-state fMRI, although entirely unconstrained, is vulnerable to confounds and difficult to link with ongoing cognitive states. Natural stimuli like movies have shown higher test-retest reliability than resting and task fMRI [29]. In this paper, we focus on a major challenge in naturalistic fMRI data analysis – to extract stimulus-related neural dynamics from other intrinsic and noise contributions.

Early studies use traditional methods for task fMRI such as the general linear model (GLM) to detect brain activation driven by natural stimuli, using annotations of specific features or events (e.g., faces, scenes or speech in a film [3]) as regressors. However, GLM-based analysis requires explicit a priori models of relevant features of the stimuli and associated neural responses as predictors, where minor model misspecification can result in low detection power. It is therefore effective only for simple parametric designs, but extremely rigid for capturing the richer dynamical information present in natural stimuli. Inter-subject correlation (ISC) analyses provide a powerful, data-driven alternative for handling naturalistic paradigms without a pre-defined response model, by leveraging the reliable, shared neural responses across different subjects when exposed to the same continuous naturalistic stimulation [12, 11]. By correlating fMRI time courses from the same regions across subjects, ISC can detect inter-subject synchronization of brain activity in specific areas activated by the stimuli, e.g., prefrontal cortex during movie viewing [13]. It was recently extended to inter-subject functional correlation (ISFC) to characterize stimulus-related functional networks by computing the correlations between all pairs of regions across subjects [22]. Use of ISFC has revealed, for example, engagement of default mode networks during narrative comprehension [22]

. The ISFC increases specificity to stimulus-locked inter-regional correlations, by filtering out intrinsic, task-unrelated neural dynamics as well as non-neuronal artifacts (e.g., respiratory rate; head motion) that are theoretically uncorrelated across subjects. Time-resolved ISFC computed using a sliding-window technique has been used to track movie-induced dynamic changes in functional connectivity

[4]. A recent study [10] explored ISC of time-courses of dynamic connectivity rather than regional activity. For a review of the ISC family of approaches see [20]. However, one limitation of IS(F)C is that it has no inherent way to detect individual differences in neural activity, due to the averaging-out of all uncorrelated variations across subjects. Moreover, the ISFC analysis inherently relies on the Pearson correlations between activity time courses of subjects, and thus how it can be generalized to other measures of brain connectivity, e.g., for directed functional networks is still unclear.

We propose a new approach based on low-rank plus sparse (L+S) decomposition to separate stimulus-induced and background components of dynamic functional connectivity (dFC) in naturalistic fMRI. Our method is inspired by the successful applications of L+S decomposition in video surveillance to separate a slowly-changing background (modeled by a low-rank subspace) from moving foreground objects (sparse outliers) from a video sequence

[5, 19]

. We formulate a time-varying L+S model for dFC, where the time-resolved, multi-subject functional connectivity (FC) matrices are represented as the sum of a low-rank component and a sparse component. The low-rank component corresponds to the similar or correlated connectivity patterns across subjects elicited by the same stimuli, which can be well-approximated by a common low-dimensional subspace. The sparse component captures the corruptions on stimulus-related FC edges, arising from background activity (i.e., intrinsic processes and artifacts) that are idiosyncratic and occur occasionally in certain subjects. We apply the robust principal component analysis (RPCA)

[31, 8] at each time point to recover a sequence of time-resolved low-rank FC matrices from the sparse background noise, by solving a convex optimization that minimizes a combination of nuclear norm and norm (called principal component pursuit (PCP)). Under mild assumptions (i.e., the sparse matrix is sufficiently sparse), the RPCA-PCP can exactly recover the shared low-rank FC structure across subjects, even though an unknown fraction of the FC edges are corrupted by artifacts of arbitrarily large magnitude, as shown in [8] for the general case. A related work [2] imposed L+S constraints on FC matrices of individual subjects and focused on modeling raw fMRI signals, another [14] used a low-rank approximation of stacked FC networks across subjects. However, both studies are limited to analyses of static FC and resting-state data.

We further introduce a novel L+S decomposition method called the fused PCP, by adding a new fusion-type penalty in the PCP to shrink the successive differences between the columns of the low-rank matrix towards zero. The fused PCP encourages homogeneity in connectivity patterns across subjects, by smoothing out the remaining stimulus-unrelated individual differences in the common connectivity structure. Moreover, it can provide a better recovery of the low-rank FC matrices in presence of dense noise (i.e., many of the FC edges are corrupted) than the generic PCP. We develop an efficient algorithm via linearized alternating direction method of multipliers (ADMM) to solve the proposed fused PCP program which admits no closed-form solution. We evaluated the performance of the proposed algorithm for L+S recovery via simulations. Application to naturalistic fMRI data reveals dynamic FC patterns that are reliably related to processing of continuous auditory events during movie watching.

The contributions of this work are as follows: (1) This is the first work to propose a L+S decomposition method for naturalistic fMRI, which is is model-free and data-driven in detecting stimulus-induced dynamic FC. (2) The proposed L+S learning algorithm offers a novel way to isolate low-rank parts of correlated multi-subject FC from idiosyncratic components over time. It exploits shared neuronal activity across subjects in terms of the network connectivity directly rather than the low-level BOLD responses in ISFC. This is equivalent to denoising the FC networks instead of raw fMRI signals via ISC, and thus extracts better FC structure information. By utilizing cross-subject reliability of FC measures captured in the low-rank part, we show that it provides a better mapping to naturalistic stimuli than the ISFC. Furthermore, our method advances a more general analysis framework applicable to other FC measures beyond the Pearson correlation in ISFC. (3) Unlike ISFC which computes a group-level FC profile related only to the stimuli, the proposed method recovers both the stimulus-related and unrelated FC patterns. This allows us to quantify confounding influence on the stimulus-induced FC, including potentially interesting stimulus-independent sources of neural dynamics, such as intrinsic processes, attentional variations and other individual differences. (4) While extracting a shared connectivity structure, our method also retains subject-specific FC patterns unavailable from ISFC, which is useful for studying individual differences in dynamic FC during naturalistic stimulation.

Ii Low-Rank and Sparse Matrix Decomposition for Dynamic Functional Connectivity

Ii-a Proposed Model

Suppose we observe a set of weighted connectivity matrices for multi-subject, time-dependent functional brain networks with same set of nodes, where elements in are measures of pairwise interactions between nodes in the network at time point for th subject. is the number of time points and is the number of subjects. Given the observed , we aim to separate the task or stimulus-related brain FC dynamics from the background composed of intrinsic neural correlations, noise and other non-neuronal contributions. Our approach builds on the hypothesis that FC networks across different subjects admit a common underlying structure and exhibit similar or highly-correlated patterns, due to shared neuronal response among subjects when experiencing the same stimuli, e.g., watching the same movie in a naturalistic setting. On the other hand, the background connectivity networks are assumed to be relatively sparse compared to the stimulus-induced ones, and some confounding fluctuations such as motion-related artifacts occur only occasionally in certain subjects. These background activities are spontaneous, not time-locked to stimuli, and thus varying considerably across subjects.

In particular, we propose a decomposition of the time-dependent connectivity matrices as a superposition of a low-rank and a sparse component that represent respectively the correlated stimulus-induced connectivity and the subject-varying background activities. To formulate, let

be the vectorized version of

, and to denote the concatenation of the vectors of connectivity metrics over all subjects at time . We consider a low-rank plus sparse (L+S) decomposition of the time-dependent matrices


where is the low-rank matrix with , is a sparse matrix with a fraction of non-zero entries, and , are respectively the column vectors of stimulus-related and background connectivity metrics for the -th subject at time . The stimulus-induced connectivity that are correlated between subjects at each time can be well-approximated by the columns of that lie on a low-dimensional subspace spanned by common bases. For example, rank one yields a constant connectivity pattern over all subjects. The columns of captures the subject-specific background activities. Unlike the small Gaussian noise assumption, entries of may have arbitrarily large magnitude with unknown sparse support, and thus capturing random, gross corruptions from the background events on the stimulus-related information in functional connectivity.

Ii-B L+S Decomposition

Ii-B1 Principal Component Pursuit

To recover the sequence of low-rank matrices of stimulus-related connectivity from the corrupted observations in (1), we consider solving the following optimization problem


where is the nuclear norm of matrix

, i.e. the sum of all its singular values,

is the norm of and is a regularization parameter controlling the trade-off between the low-rank and sparse components. The nuclear norm induces sparsity of the vector of singular values, or equivalently for the matrix to be low rank, while the penalty imposes element-wise sparsity of . Minimization of (2) corresponds to applying the well-known PCP approach [8] at each time point. Under the incoherence conditions on the row and column subspaces of , the PCP solution can perfectly recover the low-rank and sparse components, provided that the rank of is not too large and is reasonably sparse [8, 9]. Many efficient and scalable algorithms such as the augmented Lagrange multipliers (ALM) method [18] have been proposed to solve this convex optimization.

Ii-B2 Fused L+S Recovery

The slowly-varying or possibly constant stimulus-related connectivity patterns across subjects may not be fully explained by a low-rank structure estimated in (

2). To better capture this inter-subject similarity in connectivity structure, we introduce a novel L+S decomposition by solving a fused version of PCP (fused PCP)


Here, the additional fused term , regularized by tuning parameter , encourages the stimulus-induced group homogeneity by penalizing the differences in the connectivity profiles between subjects. For some sufficiently large , the penalty shrinks inter-subject differences in some connectivity coefficients to 0, resulting in similar (but not necessarily identical) connectivity patterns across subjects. Clearly, the model (3) includes the original L+S decomposition (2) as a special case when .

Define a matrix where

is first-order difference matrix and is a identity matrix. We can reformulate the objective function in (3) as


Iii Optimization

For the fused L+S decomposition, we develop an ADMM algorithm to solve the convex optimization problem (3) which has no closed form solution. We introduce a set of surrogate variables where . Then, the minimization of (4) is equivalent to


The augmented Lagrangian function of (5) is


where are Lagrange multipliers, is a pre-specified step-size parameter, denotes the norm and is the Frobenius norm. In a rescaled form, (6) can be written as


where and are rescaled Lagrange multipliers.

We can obtain the parameter estimates by minimizing the augmented Lagrangian via ADMM algorithm [6], which alternates between minimizing (7) with respect to each set of primal variables sequentially, and the updating of the dual variables . Let initialize the parameters by for . The proposed ADMM algorithm solves the following subproblems iteratively until convergence, with parameter updates at each iteration given by


Next we derive the solution for each subproblem.

Iii-1 Update

To update , the subproblem (8) can be separated into minimizations with respect to ’s at individual time points (similarly for subproblems (9)-(10)), where each is updated independently


Due to the non-identity matrix , (13) does not have a closed-form solution. Motivated by the linearized ADMM method in [30]

recently applied to lasso regression

[17, 16], we linearize the quadratic term in (13) as


where the parameter controls the proximity to . Substituting (14) into (13) and after some algebra, we obtain the following approximate subproblem for (13)


where is a matrix such that with . We show that (15) can be solved via the singular value thresholding (SVT) [7]. More precisely, let be the element-wise soft-thresholding (shrinkage) operator for a matrix at level , . We also denote by the singular value thresholding operator given by where

is the singular value decomposition (SVD) of

. Then, admits an explicit solution


The derivation of (16) is given in Supplementary Section I. Note that when , (16) reduces to the solution of the ALM algorithm for the low-rank component [18] in the original L+S decomposition problem (2).

Iii-2 Update

The minimization in subproblem (9) with respect to at each time point is equivalent to

We can get a closed-form solution by applying the element-wise soft-thresholding operator


Iii-3 Update

Similarly, the subproblem (10) with respect to each is

where the solution is also a soft-thresholding operation


We chose the parameter , as suggested by [8] for the PCP. The tuning parameter can be selected using cross-validation. To establish the convergence of the linearized ADMM algorithm, the parameter is required to satisfy where denotes the spectral radius [16]. We set . We set the stopping criterion as

or the maximum iteration number of 5000 is reached. The proposed ADMM algorithm for solving the fused L+S decomposition is summarized in Algorithm 1.

0:  A series of vectorized multi-subject dynamic connectivity metrics .
0:  , , , where denotes the spectral radius.
0:  , .
1:  for  do
2:     while not converge do
3:        Compute : with
4:        Compute
5:        Update
6:        Update
7:        Update
8:        Update
9:        Update
10:     end while
11:  end for
11:   and .
Algorithm 1 ADMM for Fused L+S Decomposition

Iv Simulation

We evaluate the performance of the proposed fused L+S decomposition method via simulation studies. The goal is to assess the performance of L+S algorithms in separating the common structure from individual-specific background components in multi-subject FC networks. Here, we focus on the static snapshot of dynamic networks at a particular time point, and drop the index for notational brevity. We generate the data matrix of concatenated connectivity metrics of weighted, undirected networks for subjects as follows


where , with , whose columns correspond to a set of vectorized basis connectivity matrices that are shared across subjects, and with being a vector of mixing weights for subject . To emulate the modular structure of brain networks [26] induced by stimuli or tasks, we generate each basis connectivity matrix from a stochastic block model (SBM) [21]

with two equal-sized communities with intra- and inter-community edge probabilities of 0.95 and 0.2, respectively. The edge strengths are randomly drawn from the uniform distribution

. The subject-dependent mixing weights

are clustered according to two subgroups, which are independently sampled from normal distributions

for and for to represent the strong and the weak response to the stimulus in the two subgroups, respectively. We set such that the variability within each subgroup is small. The resulting matrix has rank , and the connectivity profiles in the columns of are similar within each subgroup. The background components of individual subjects are generated independently as sparse symmetric matrices with support set uniformly chosen at random at a constant sparsity level (fraction of non-zero entries), and whose non-zero entries are i.i.d. samples from indicating large magnitude of noise in . The simulations were replicated 100 times.

We compare the performance of the fused L+S decomposition with the original version of PCP in recovering and from the observed matrix . For the fused L+S, we select the optimal parameter over a large grid of values, which gives the best recovery performance on an independently generated validation set. To measure the accuracy of recovery, we computed the root mean squared error (RMSE) between the ground-truth and estimated low-rank and sparse components, defined by and .

(d) (a)
(e) (b)
(f) (c)
Fig. 1: Performance comparison of original and fused L+S decomposition algorithms in recovering low-rank and sparse structures from simulated multi-subject connectivity matrices. RMSEs of estimated low-rank (top) and sparse (bottom) components against increasing level of sparsity for different ranks . Results are averages over 100 replications.
Fig. 2: L+S recovery of synthetic multi-subject FC networks. One realization of simulated matrix (top) whose columns are vectorized upper-triangular part of undirected FC matrices with nodes for subjects. The dimension is . Low-rank (middle) and sparse (bottom) components recovered from using the fused L+S method.

We investigate robustness of the methods to increasing rank and sparsity level of the ground-truth components. Fig.1 plots the average RMSEs over 100 replications as a function of (varied from 0.1 to 0.8 with an increment of 0.1) for with fixed and . As expected, estimation errors of both methods for the low-rank and sparse components increase with and , generally. The larger fraction of corrupted entries and the rank of the data matrix render the estimated and more deviated from the ground-truth, in consistency with theoretical results in previous studies [8]. Both methods perform comparably when is small. However, as increases, the fused L+S estimator clearly outperforms the original version with substantially lower errors with a slower growth rate. This suggests the robustness of the fused L+S method under the presence of severe dense noise. It provides a better recovery of the low-rank matrix of shared connectivity patterns even if almost all of its entries are grossly corrupted by the background components, as evidenced by the relatively stable errors when and . This is due to the additional fused penalty term which smooths out the individual-specific random background noise, when leveraging on the inter-subject similarity of the stimulus-induced connectivity profiles.

Fig.2 shows fused L+S decomposition of a simulated multi-subject connectivity matrix with underlying rank and a fraction of corrupted edges in each subject. The estimate successfully recovers the correlated connectivity patterns across subjects with distinct sub-group shared responses from the highly-corrupted , while filters out subject-specific background noise without prior information of the locations of the corrupted edges (the support of ).

Fig. 3: Fused L+S decomposition of time-resolved, multi-subject FC from movie-watching fMRI. (Left) Original dynamic pairwise correlations between 16 ROIs and the estimated low-rank and sparse components, averaged over all subjects. (Right) Mean dynamic correlations across all edges and the estimated low-rank and sparse components. The group-average (blue lines) and individual (thin lines) time courses depict distributions of dynamic correlations over subjects. The time-varying low-rank components are temporally locked across subjects, suggesting potential shared response to stimulus sequence in the movie.

V Analysis of Movie-Watching fMRI

In this section, we applied the proposed fused L+S decomposition approach to separating the stimulus-related and background components in time-varying FC from fMRI data of a group of subjects exposed to complex naturalistic stimulation from movie viewing. We tested the established hypothesis that, as speech becomes more difficult to process, e.g., as through speech-in-noise, somatosensory, motor, and auditory systems are engaged more to facilitate processing [23].

V-a Data Acquisition & Stimuli

Thirteen subjects watched a television show during which 3 Tesla fMRI data were collected at the Joan & Sanford I. Weill Medical College of Cornell University by Dr Skipper (GE Medical Systems, Milwaukee, WI; TR = 1.5 s; see [24] for full details). The show, Are you smarter than a 5th grader?, was 32min 24sec long. Among other features, audio events were annotated at a 25ms resolution by a trained coder. These were grouped into five general categories including noise, music, speech (speech that occurred in silence), sp+noise (speech that co-occurred with noise, including background talking, clapping, music etc.) and silence. The audio annotations were down-sampled to match the resolution of fMRI data by selecting the auditory category that occurred most frequently within each 1.5 s TR.

V-B Preprocessing

The fMRI data were minimally preprocessed, including slice timing correction, despiking, registration, nonlinear transformation to MNI coordinate space, and masking of voxels outside of the brain. We used the Freesurfer parcellation to obtain 16 regions of interest (ROIs) associated with language and auditory processing: These ROIs include the central sulcus (CS), planum polare (PP), planum temporale (PT), postcentral gyrus (PoCG), precentral gyrus (PreCG), superior temporal gyrus (STG), transverse temporal gyrus (TTG) and transverse temporal sulcus (TTS) from both left and right hemispheres. These regions are commonly associated with auditory, somatosensory and motor processing generally and speech perception more specifically. The mean BOLD time series were computed across voxels within each ROI.

V-C Shared Structure in Dynamic Connectivity Across Subjects

To characterize dynamic FC, we computed time-varying correlations between the 16 ROI fMRI time series for each subject, using sliding windows of 15 TRs (22.5 s) with step size of 1 TR (1.5 s) between windows. The choice of window length of 22.5 s was based on the inverse of the lowest frequency present in the data [15] after high-pass filtering at 0.05 Hz. This window length has been used in [4] to compute the sliding-window ISFC for movie-watching fMRI data. At each time point , the vectorized correlation coefficients of individual subjects are stacked as columns of matrix . We then performed the fused L+S decomposition on the multi-subject dynamic correlation matrices by solving (3) using the proposed ADMM algorithm. We selected the penalty parameter based on the performance of the resulting low-rank models in predicting the time-varying auditory stimuli (see Section IV-E). From a range of values from 0.001 to 0.5, we identified as optimal, giving the highest classification accuracy on the validation set in a leave-one-subject-out cross-validation. Fig. 3 shows the extracted time-varying low-rank and sparse components averaged across subjects and across all edges. In Fig. 3 (left), the low-rank component shows noticeable time-varying structure with higher specificity in time compared to the original dynamic correlations. It provides a better localization of dynamic changes in connectivity which were possibly elicited by the movie and time-locked to the processing of specific ongoing movie events. From Fig. 3 (right), it is apparent that the low-rank components are highly synchronized across subjects over time, while identifying some inter-subject variability possibly arising from varied responses of different individuals to the stimuli. In contrast, the sparse components show temporally uncorrelated patterns. The low-rank part is specifically sensitive to synchronized responses across subjects. This suggests that the proposed fused L+S decomposition is capable of isolating reliable FC changes driven by a correlated source across subjects (exposed to the same movie stimuli in our case). The uncorrelated background sources of dynamic FC across subjects are filtered out as the residual sparse components. These sources may correspond to subject-specific dynamic fluctuations of FC networks as well as non-neuronal artifacts unrelated to the stimulus-processing.

For illustration, Fig. 4 shows decomposition for a snapshot of connectivity networks at time point TRs (selected based on median correlation over time). The low-rank component identifies a common network structure underlying all subjects, as evident from the similar and highly-correlated connectivity patterns detected across subjects. As validated by simulation, our algorithm can reliably recover the common structure from the subject-specific background noise, which could be arbitrarily large in magnitude and contaminated most of the network correlations (e.g., the dense residuals for subject 3). The estimated rank was , where individual networks (columns of ) can be expressed as a linear combination of common basis connectivity matrices with different mixing weights for each subject. See Supplementary Fig. 1 for additional example decomposition at TRs.

Fig. 4: Common structure in time-resolved functional connectivity across subjects. (Top) An example of original functional connectivity matrices for subjects at time window TRs. (Middle) Low-rank component with recovered by our fused L+S approach, revealing shared connectivity structure across subjects. (Bottom) Estimated sparse errors or residuals correspond to individual-specific background components. The selected time window corresponds to the median value of mean correlation over all edges and subjects.

V-D Low-Rank Components Reveal Stimulus-Induced Connectivity Patterns

To test the aforementioned hypothesis, we examined whether the extracted low-rank components are related to the auditory stimuli present in the movie. We did so by fitting a multinomial logistic regression model to learn the mapping between the audio annotations and the time-resolved low-rank connectivity metrics. We set the auditory category labels as responses and the connectivity edges as predictors. The silence events were used as the reference category. To evaluate the goodness-of-fit and how well the low-rank components are temporally mapped to the auditory events, we used the fitted logistic model to decode the probability of the actual category labels at each time point. Fig. 

5 shows the decoded time courses for each auditory category. For all categories, high probabilities are precisely aligned with the intervals when those categories are present, indicating accurate mapping of the low-rank components to the stimulus annotations.

Fig. 5: Decoded probability of a specific auditory stimulus category being present in the movie over time based on the multinomial logistic regression fitted on the extracted low-rank components. Colored lines indicate time courses of decoded probabilities averaged over all subjects. Gray regions show time points when the stimulus category was actually present in the movie.
Fig. 6: Stimulus-induced connectivity networks during movie watching as revealed by the low-rank components. (Top) Average functional connectivity maps (across 13 subjects and time steps) associated with each of the four categories of auditory events in the natural movie. (Bottom) The corresponding spatial maps of regression coefficients from the fitted logistic model. Only functional connections showing significant stimulus-related effects are displayed (masked by regression coefficients with , Bonferroni-corrected for multiple testings).

In Fig. 6, we plot the mean functional connectivity maps and the estimated regression coefficient maps associated with the four auditory categories in the movie. The results are based on the low-rank components at a lag of 3 s (i.e., at 2 time points before stimulus onsets at ). The latter was empirically determined to exhibit the strongest connectivity by testing at previous and future time points (at intervals of 1.5 s or 1 TR) relative to the stimulus onsets (See Supplementary Fig. 2 and Fig. 3). This lag was expected because it approximately includes to delay, rise time and peak of a canonical hemodynamic response function associated with any one stimulus lasting about 1.5 s. For details, see Supplementary Section III for the additional analyses on the temporal effect of brain response. As shown in Fig. 6, the low-rank component detected distinct connectivity patterns, with relatively different distributions of connectivity across the different auditory environments. As hypothesized, we observe a noticeable increase in connectivity when speech was accompanied by noise (sp+noise

), compared to speech in silence. We used a group-level t-test to contrast the FC between the

sp+noise and speech conditions. Fig. 7(a) shows difference in the mean FC matrices (across subjects and time points) from the low-rank component between the two conditions. The connections were thresholded at (Bonferroni-corrected). Specifically, there was significantly higher FC throughout regions typically involved in speech perception and language comprehension for speech-in-noise compared to speech. This included connections PT.R-TTS.R and PT.R-STG.R which showed the largest increase in FC strength. Furthermore, our method identified significant increase in connectivity between the regions of auditory and speech processing with the premotor, motor and somatosensory regions in response to speech mixed with noise, e.g., TTS.L-PoCG.R, PT.R-PreCG.R and TTG.R-PoCG.R, as shown in Fig. 7(b).

In summary, results support hypotheses about the role of ‘the motor system’ in speech perception [1, 23]. In particular, a distributed set of brain regions involved in producing speech are dynamically recruited in perceiving speech, particularly in more effortful listening situations. This might be because speech production regions are engaged in predicting upcoming speech sounds. In noisy environments, this process needs to be engaged more as predictions are inherently less accurate.

Fig. 7: Difference in functional connectivity between the speech-in-noise (sp+noise) and speech-only (speech) events as revealed by the low-rank components at time lag of 3 s (or 2 TRs). (a) Connectivity maps show increase (left) and decrease (right) in FC during sp+noise compared to speech events. Only edges with significant difference in FC strength (absolute value of correlation coefficients) are shown (, Bonferroni-corrected). The five connections exhibiting largest increase in FC include PT.R-TTS.R, TTS.L-PoCG.R, PP.R-TTS.R, TTS.L-PT.R and PT.R-STG.R. (b) Connectogram shows changes increase (hot colors) and decrease (blue) in FC strength between auditory (AUD) and premotor-motor-somatosensory (PMS) regions. The thickness of links indicate magnitude of changes.

V-E Across-Subject Prediction of Stimulus Annotations

We trained a support vector machine (SVM) classifier on the time-varying low-rank FC components to evaluate any improvement in predicting the different auditory categories over the original observed version of time-varying FC metrics. We also compared the performance with the inter-subject functional correlation (ISFC)

[22], a widely-used method for isolating stimulus-induced inter-regional correlations between brains exposed to the same stimuli from other intrinsic and non-neuronal activities. We computed the ISFC in a sliding-window manner as in [4], which showed that the time-varying ISFC can detect movie-driven fMRI FC changes by averaging out uncorrelated dynamic fluctuations. Following [22], we performed across-subject classification but using the time-resolved instead of static FC patterns to predict the stimulus events over time. We used leave-one-subject-out cross-validation. In each fold, one subject was held out for testing and the remaining subjects were used to train the SVM. We computed the time-resolved low-rank components for the training set from the standard FC, using the fused L+S algorithm. Since the test set is unseen from the training set, we extracted the low-rank components for the test subject via orthogonal projection of the test data , where is a basis matrix estimated based on the SVD of train data . Classification accuracy was computed as the percentage of time points correctly assigned to the true category labels.

Fig. 8 shows the classification results using the different time-varying FC metrics at a lag of 3 s relative to the stimulus annotations. Analysis of the temporal effects on the classification performance shows that the highest accuracy was obtained at this time lag (Supplementary Fig. 5). The low-rank components significantly outperformed both the standard FC and ISFC in discriminating ongoing auditory events in the movie. The ISFC is slightly better than the standard FC. The superior decoding performance suggests that the low-rank components are more stimulus-related, extracting novel information about the stimulus-induced dynamic functional changes not captured by the FC and ISFC. Among the L+S algorithms, addition of fused penalty improves the cross-subject prediction considerably, due to the further smoothing of subject-specific fluctuations to uncover shared neuronal responses that are time-locked to the same stimulus processing. As expected, the residual sparse components, corresponding to uncorrelated effects of intrinsic neuronal processes and noise, were poorly predictive of the stimuli across subjects.

Fig. 8: Accuracy scores of leave-one-subject-out prediction of stimulus annotations for five categories of audio events present in the movie over time using different time-resolved connectivity features. The categories considered include Noise, Music, Speech, Sp+Noise and Silence.

Vi Conclusion

We developed a novel L+S decomposition method for separating stimulus-induced and background components of dynamic FC networks during naturalistic neuroimaging, by leveraging on the time-locked nature of naturalistic stimuli. The method provides a principled way to recover both the stimulus-evoked, shared FC dynamics across individuals and the stimulus-unrelated idiosyncratic variations in individuals, respectively modeled as the low-rank and sparse structures of multi-subject dFC networks. The proposed fused PCP solved via an efficient ADMM algorithm can capture the stimulus-induced similarities in FC profiles between subjects more effectively, and its robustness to dense noise allows us to filter out corruptions of FC edges that are not necessarily sparse. Our proposed framework is general, broadly applicable to other neuroimaging data such as electroencephalography (EEG), and can incorporate other FC measures beyond simple correlations in ISC-based analyses. In an application to movie-watching fMRI, our method identified time-locked changes in FC networks across subjects, which were meaningfully related to the processing of complex, time-varying auditory events in the movie, e.g., dynamic recruitment of speech production systems in perceiving speech in noisy environments. It also revealed potentially interesting individual differences in FC patterns that may relate to behavioral appraisal of stimuli. The extracted low-rank FC components also show better prediction of the movie auditory annotations than ISFC, suggesting the gain in information about the naturalistic stimuli by using shared network structure instead of synchronization of low-level regional activity. Thus, the proposed L+S recovery approach opens new opportunities for mapping brain network dynamics to stimulus features and behavior during naturalistic paradigms. Future studies can explore potential applications of L+S analysis of natural fMRI to connectome-based prediction of individual traits and behavior, and diagnosis of cognitive and affective disorders. Further extensions are needed to improve the scalability of L+S learning algorithm to handle large-sized brain networks. The dynamic FC analysis can also be extended to investigate state-related changes in the network structure, e.g., dynamic community structure across subjects during naturalistic paradigms, by applying regime-switching models [27, 28] on the extracted time-resolved FC components.


  • [1] P. Adank (2012) The neural bases of difficult speech comprehension and speech production: two activation likelihood estimation (ale) meta-analyses. Brain and Language 122 (1), pp. 42–54. Cited by: §V-D.
  • [2] P. Aggarwal and A. Gupta (2018) Low rank and sparsity constrained method for identifying overlapping functional brain networks. PloS one 13 (11), pp. 1–19. Cited by: §I.
  • [3] A. Bartels and S. Zeki (2004) Functional brain mapping during free viewing of natural scenes. Human Brain Mapp. 21 (2), pp. 75–85. Cited by: §I.
  • [4] T. A. Bolton, D. Jochaut, A. Giraud, and D. Van De Ville (2018) Brain dynamics in ASD during movie-watching show idiosyncratic functional integration and segregation. Human Brain Mapp. 39 (6), pp. 2391–2404. Cited by: §I, §V-C, §V-E.
  • [5] T. Bouwmans and E. H. Zahzah (2014) Robust pca via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122, pp. 22–34. Cited by: §I.
  • [6] S. Boyd, N. Parikh, E. Chu, B. Peleato, J. Eckstein, et al. (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (1), pp. 1–122. Cited by: §III.
  • [7] J. Cai, E. J. Candès, and Z. Shen (2010) A singular value thresholding algorithm for matrix completion. SIAM J. Optimization 20 (4), pp. 1956–1982. Cited by: §III-1.
  • [8] E. J. Candès, X. Li, Y. Ma, and J. Wright (2011) Robust principal component analysis?. J. ACM 58 (3), pp. 1–37. Cited by: §I, §II-B1, §III-3, §IV.
  • [9] V. Chandrasekaran, S. Sanghavi, P. A. Parrilo, and A. S. Willsky (2011) Rank-sparsity incoherence for matrix decomposition. SIAM J. Optimization 21 (2), pp. 572–596. Cited by: §II-B1.
  • [10] X. Di and B. B. Biswal (2020) Intersubject consistent dynamic connectivity during natural vision revealed by functional mri. NeuroImage, pp. 1–9. Cited by: §I.
  • [11] U. Hasson, R. Malach, and D. J. Heeger (2010) Reliability of cortical activity during natural stimulation. Trends Cognitive Sci. 14 (1), pp. 40–48. Cited by: §I.
  • [12] U. Hasson, Y. Nir, I. Levy, G. Fuhrmann, and R. Malach (2004) Intersubject synchronization of cortical activity during natural vision. Science 303 (5664), pp. 1634–1640. Cited by: §I.
  • [13] I. P. Jääskeläinen, K. Koskentalo, M. H. Balk, T. Autti, J. Kauramäki, C. Pomren, and M. Sams (2008) Inter-subject synchronization of prefrontal cortex hemodynamic activity during natural viewing. Open Neuroimaging J. 2, pp. 14–19. Cited by: §I.
  • [14] X. Jiang, L. Zhang, L. Qiao, and D. Shen (2020)

    Estimating functional connectivity networks via low-rank tensor approximation with applications to mci identification

    IEEE Trans. Biomed. Eng. 67 (7), pp. 1912–1920. Cited by: §I.
  • [15] N. Leonardi and D. Van De Ville (2015) On spurious and real fluctuations of dynamic functional connectivity during rest. NeuroImage 104, pp. 430–436. Cited by: §V-C.
  • [16] M. Li, Q. Guo, W. Zhai, and B. Chen (2020) The linearized alternating direction method of multipliers for low-rank and fused lasso matrix regression model. J. Appl. Stat., pp. 1–18. Cited by: §III-1, §III-3.
  • [17] X. Li, L. Mo, X. Yuan, and J. Zhang (2014) Linearized alternating direction method of multipliers for sparse group and fused lasso models. Comput. Stat. Data Anal. 79, pp. 203–221. Cited by: §III-1.
  • [18] Z. Lin, M. Chen, and Y. Ma (2010) The augmented lagrange multiplier method for exact recovery of corrupted low-rank matrices. arXiv preprint arXiv:1009.5055. Cited by: §II-B1, §III-1.
  • [19] X. Liu, G. Zhao, J. Yao, and C. Qi (2015) Background subtraction based on low-rank and structured sparse decomposition. IEEE Trans. Image Process. 24 (8), pp. 2502–2514. Cited by: §I.
  • [20] S. A. Nastase, V. Gazzola, U. Hasson, and C. Keysers (2019) Measuring shared responses across subjects using intersubject correlation. Cited by: §I.
  • [21] D. M. Pavlovic, P. E. Vértes, E. T. Bullmore, W. R. Schafer, and T. E. Nichols (2014) Stochastic blockmodeling of the modules and core of the caenorhabditis elegans connectome. PloS One 9 (7). Cited by: §IV.
  • [22] E. Simony, C. J. Honey, J. Chen, O. Lositsky, Y. Yeshurun, A. Wiesel, and U. Hasson (2016) Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Comm. 7, pp. 12141. Cited by: §I, §V-E.
  • [23] J. I. Skipper, J. T. Devlin, and D. R. Lametti (2017) The hearing ear is always found close to the speaking tongue: review of the role of the motor system in speech perception. Brain and language 164, pp. 77–105. Cited by: §V-D, §V.
  • [24] J. I. Skipper and J. D. Zevin (2017) Brain reorganization in anticipation of predictable words. bioRxiv 101113. External Links: Document Cited by: §V-A.
  • [25] S. Sonkusare, M. Breakspear, and C. Guo (2019) Naturalistic stimuli in neuroscience: Critically acclaimed. Trends Cognitive Sci. 23 (8), pp. 699–714. Cited by: §I.
  • [26] O. Sporns and R. F. Betzel (2016) Modular brain networks. Annu. Rev. Psychol. 67, pp. 613–640. Cited by: §IV.
  • [27] C. M. Ting, H. Ombao, S. B. Samdin, and S. H. Salleh (2018) Estimating dynamic connectivity states in fMRI using regime-switching factor models. IEEE Trans. Med. Imaging 37 (4), pp. 1011–1023. Cited by: §VI.
  • [28] C. M. Ting, S. B. Samdin, M. Tang, and H. Ombao (2020) Detecting dynamic community structure in functional brain networks across individuals: A multilayer approach. IEEE Trans. Med. Imaging. Note: In press Cited by: §VI.
  • [29] J. Wang, Y. Ren, X. Hu, V. T. Nguyen, L. Guo, J. Han, and C. C. Guo (2017) Test–retest reliability of functional connectivity networks during naturalistic fmri paradigms. Human Brain Mapp. 38 (4), pp. 2226–2241. Cited by: §I.
  • [30] X. Wang and X. Yuan (2012) The linearized alternating direction method of multipliers for dantzig selector. SIAM J. Sci. Comput. 34 (5). Cited by: §III-1.
  • [31] J. Wright, A. Ganesh, S. Rao, Y. Peng, and Y. Ma (2009) Robust principal component analysis: Exact recovery of corrupted low-rank matrices via convex optimization. In Adv. Neural Inf. Process. Syst., pp. 2080–2088. Cited by: §I.