I Introduction
Naturalistic functional magnetic resonance imaging (fMRI) is an emerging approach in cognitive neuroscience, employing naturalistic stimuli (e.g., movies, spoken narratives, music, etc.) to provide an ecologicallyvalid paradigm that mimics real life scenarios [25]. Complementary to traditional taskbased paradigms with strictly controlled artificial stimuli, naturalistic paradigms offer a better understanding of neural processing and network interactions in realistic contexts, where stimuli typically engage the brain in continuous integration of dynamic streams of multimodal (e.g., audiovisual) information. Restingstate fMRI, although entirely unconstrained, is vulnerable to confounds and difficult to link with ongoing cognitive states. Natural stimuli like movies have shown higher testretest reliability than resting and task fMRI [29]. In this paper, we focus on a major challenge in naturalistic fMRI data analysis – to extract stimulusrelated neural dynamics from other intrinsic and noise contributions.
Early studies use traditional methods for task fMRI such as the general linear model (GLM) to detect brain activation driven by natural stimuli, using annotations of specific features or events (e.g., faces, scenes or speech in a film [3]) as regressors. However, GLMbased analysis requires explicit a priori models of relevant features of the stimuli and associated neural responses as predictors, where minor model misspecification can result in low detection power. It is therefore effective only for simple parametric designs, but extremely rigid for capturing the richer dynamical information present in natural stimuli. Intersubject correlation (ISC) analyses provide a powerful, datadriven alternative for handling naturalistic paradigms without a predefined response model, by leveraging the reliable, shared neural responses across different subjects when exposed to the same continuous naturalistic stimulation [12, 11]. By correlating fMRI time courses from the same regions across subjects, ISC can detect intersubject synchronization of brain activity in specific areas activated by the stimuli, e.g., prefrontal cortex during movie viewing [13]. It was recently extended to intersubject functional correlation (ISFC) to characterize stimulusrelated functional networks by computing the correlations between all pairs of regions across subjects [22]. Use of ISFC has revealed, for example, engagement of default mode networks during narrative comprehension [22]
. The ISFC increases specificity to stimuluslocked interregional correlations, by filtering out intrinsic, taskunrelated neural dynamics as well as nonneuronal artifacts (e.g., respiratory rate; head motion) that are theoretically uncorrelated across subjects. Timeresolved ISFC computed using a slidingwindow technique has been used to track movieinduced dynamic changes in functional connectivity
[4]. A recent study [10] explored ISC of timecourses of dynamic connectivity rather than regional activity. For a review of the ISC family of approaches see [20]. However, one limitation of IS(F)C is that it has no inherent way to detect individual differences in neural activity, due to the averagingout of all uncorrelated variations across subjects. Moreover, the ISFC analysis inherently relies on the Pearson correlations between activity time courses of subjects, and thus how it can be generalized to other measures of brain connectivity, e.g., for directed functional networks is still unclear.We propose a new approach based on lowrank plus sparse (L+S) decomposition to separate stimulusinduced and background components of dynamic functional connectivity (dFC) in naturalistic fMRI. Our method is inspired by the successful applications of L+S decomposition in video surveillance to separate a slowlychanging background (modeled by a lowrank subspace) from moving foreground objects (sparse outliers) from a video sequence
[5, 19]. We formulate a timevarying L+S model for dFC, where the timeresolved, multisubject functional connectivity (FC) matrices are represented as the sum of a lowrank component and a sparse component. The lowrank component corresponds to the similar or correlated connectivity patterns across subjects elicited by the same stimuli, which can be wellapproximated by a common lowdimensional subspace. The sparse component captures the corruptions on stimulusrelated FC edges, arising from background activity (i.e., intrinsic processes and artifacts) that are idiosyncratic and occur occasionally in certain subjects. We apply the robust principal component analysis (RPCA)
[31, 8] at each time point to recover a sequence of timeresolved lowrank FC matrices from the sparse background noise, by solving a convex optimization that minimizes a combination of nuclear norm and norm (called principal component pursuit (PCP)). Under mild assumptions (i.e., the sparse matrix is sufficiently sparse), the RPCAPCP can exactly recover the shared lowrank FC structure across subjects, even though an unknown fraction of the FC edges are corrupted by artifacts of arbitrarily large magnitude, as shown in [8] for the general case. A related work [2] imposed L+S constraints on FC matrices of individual subjects and focused on modeling raw fMRI signals, another [14] used a lowrank approximation of stacked FC networks across subjects. However, both studies are limited to analyses of static FC and restingstate data.We further introduce a novel L+S decomposition method called the fused PCP, by adding a new fusiontype penalty in the PCP to shrink the successive differences between the columns of the lowrank matrix towards zero. The fused PCP encourages homogeneity in connectivity patterns across subjects, by smoothing out the remaining stimulusunrelated individual differences in the common connectivity structure. Moreover, it can provide a better recovery of the lowrank FC matrices in presence of dense noise (i.e., many of the FC edges are corrupted) than the generic PCP. We develop an efficient algorithm via linearized alternating direction method of multipliers (ADMM) to solve the proposed fused PCP program which admits no closedform solution. We evaluated the performance of the proposed algorithm for L+S recovery via simulations. Application to naturalistic fMRI data reveals dynamic FC patterns that are reliably related to processing of continuous auditory events during movie watching.
The contributions of this work are as follows: (1) This is the first work to propose a L+S decomposition method for naturalistic fMRI, which is is modelfree and datadriven in detecting stimulusinduced dynamic FC. (2) The proposed L+S learning algorithm offers a novel way to isolate lowrank parts of correlated multisubject FC from idiosyncratic components over time. It exploits shared neuronal activity across subjects in terms of the network connectivity directly rather than the lowlevel BOLD responses in ISFC. This is equivalent to denoising the FC networks instead of raw fMRI signals via ISC, and thus extracts better FC structure information. By utilizing crosssubject reliability of FC measures captured in the lowrank part, we show that it provides a better mapping to naturalistic stimuli than the ISFC. Furthermore, our method advances a more general analysis framework applicable to other FC measures beyond the Pearson correlation in ISFC. (3) Unlike ISFC which computes a grouplevel FC profile related only to the stimuli, the proposed method recovers both the stimulusrelated and unrelated FC patterns. This allows us to quantify confounding influence on the stimulusinduced FC, including potentially interesting stimulusindependent sources of neural dynamics, such as intrinsic processes, attentional variations and other individual differences. (4) While extracting a shared connectivity structure, our method also retains subjectspecific FC patterns unavailable from ISFC, which is useful for studying individual differences in dynamic FC during naturalistic stimulation.
Ii LowRank and Sparse Matrix Decomposition for Dynamic Functional Connectivity
Iia Proposed Model
Suppose we observe a set of weighted connectivity matrices for multisubject, timedependent functional brain networks with same set of nodes, where elements in are measures of pairwise interactions between nodes in the network at time point for th subject. is the number of time points and is the number of subjects. Given the observed , we aim to separate the task or stimulusrelated brain FC dynamics from the background composed of intrinsic neural correlations, noise and other nonneuronal contributions. Our approach builds on the hypothesis that FC networks across different subjects admit a common underlying structure and exhibit similar or highlycorrelated patterns, due to shared neuronal response among subjects when experiencing the same stimuli, e.g., watching the same movie in a naturalistic setting. On the other hand, the background connectivity networks are assumed to be relatively sparse compared to the stimulusinduced ones, and some confounding fluctuations such as motionrelated artifacts occur only occasionally in certain subjects. These background activities are spontaneous, not timelocked to stimuli, and thus varying considerably across subjects.
In particular, we propose a decomposition of the timedependent connectivity matrices as a superposition of a lowrank and a sparse component that represent respectively the correlated stimulusinduced connectivity and the subjectvarying background activities. To formulate, let
be the vectorized version of
, and to denote the concatenation of the vectors of connectivity metrics over all subjects at time . We consider a lowrank plus sparse (L+S) decomposition of the timedependent matrices(1) 
where is the lowrank matrix with , is a sparse matrix with a fraction of nonzero entries, and , are respectively the column vectors of stimulusrelated and background connectivity metrics for the th subject at time . The stimulusinduced connectivity that are correlated between subjects at each time can be wellapproximated by the columns of that lie on a lowdimensional subspace spanned by common bases. For example, rank one yields a constant connectivity pattern over all subjects. The columns of captures the subjectspecific background activities. Unlike the small Gaussian noise assumption, entries of may have arbitrarily large magnitude with unknown sparse support, and thus capturing random, gross corruptions from the background events on the stimulusrelated information in functional connectivity.
IiB L+S Decomposition
IiB1 Principal Component Pursuit
To recover the sequence of lowrank matrices of stimulusrelated connectivity from the corrupted observations in (1), we consider solving the following optimization problem
(2) 
where is the nuclear norm of matrix
, i.e. the sum of all its singular values,
is the norm of and is a regularization parameter controlling the tradeoff between the lowrank and sparse components. The nuclear norm induces sparsity of the vector of singular values, or equivalently for the matrix to be low rank, while the penalty imposes elementwise sparsity of . Minimization of (2) corresponds to applying the wellknown PCP approach [8] at each time point. Under the incoherence conditions on the row and column subspaces of , the PCP solution can perfectly recover the lowrank and sparse components, provided that the rank of is not too large and is reasonably sparse [8, 9]. Many efficient and scalable algorithms such as the augmented Lagrange multipliers (ALM) method [18] have been proposed to solve this convex optimization.IiB2 Fused L+S Recovery
The slowlyvarying or possibly constant stimulusrelated connectivity patterns across subjects may not be fully explained by a lowrank structure estimated in (
2). To better capture this intersubject similarity in connectivity structure, we introduce a novel L+S decomposition by solving a fused version of PCP (fused PCP)(3) 
Here, the additional fused term , regularized by tuning parameter , encourages the stimulusinduced group homogeneity by penalizing the differences in the connectivity profiles between subjects. For some sufficiently large , the penalty shrinks intersubject differences in some connectivity coefficients to 0, resulting in similar (but not necessarily identical) connectivity patterns across subjects. Clearly, the model (3) includes the original L+S decomposition (2) as a special case when .
Define a matrix where
is firstorder difference matrix and is a identity matrix. We can reformulate the objective function in (3) as
(4) 
Iii Optimization
For the fused L+S decomposition, we develop an ADMM algorithm to solve the convex optimization problem (3) which has no closed form solution. We introduce a set of surrogate variables where . Then, the minimization of (4) is equivalent to
(5) 
The augmented Lagrangian function of (5) is
(6) 
where are Lagrange multipliers, is a prespecified stepsize parameter, denotes the norm and is the Frobenius norm. In a rescaled form, (6) can be written as
(7) 
where and are rescaled Lagrange multipliers.
We can obtain the parameter estimates by minimizing the augmented Lagrangian via ADMM algorithm [6], which alternates between minimizing (7) with respect to each set of primal variables sequentially, and the updating of the dual variables . Let initialize the parameters by for . The proposed ADMM algorithm solves the following subproblems iteratively until convergence, with parameter updates at each iteration given by
(8)  
(9)  
(10)  
(11)  
(12) 
Next we derive the solution for each subproblem.
Iii1 Update
To update , the subproblem (8) can be separated into minimizations with respect to ’s at individual time points (similarly for subproblems (9)(10)), where each is updated independently
(13) 
Due to the nonidentity matrix , (13) does not have a closedform solution. Motivated by the linearized ADMM method in [30]
recently applied to lasso regression
[17, 16], we linearize the quadratic term in (13) as(14) 
where the parameter controls the proximity to . Substituting (14) into (13) and after some algebra, we obtain the following approximate subproblem for (13)
(15) 
where is a matrix such that with . We show that (15) can be solved via the singular value thresholding (SVT) [7]. More precisely, let be the elementwise softthresholding (shrinkage) operator for a matrix at level , . We also denote by the singular value thresholding operator given by where
is the singular value decomposition (SVD) of
. Then, admits an explicit solution(16) 
The derivation of (16) is given in Supplementary Section I. Note that when , (16) reduces to the solution of the ALM algorithm for the lowrank component [18] in the original L+S decomposition problem (2).
Iii2 Update
The minimization in subproblem (9) with respect to at each time point is equivalent to
We can get a closedform solution by applying the elementwise softthresholding operator
(17) 
Iii3 Update
Similarly, the subproblem (10) with respect to each is
where the solution is also a softthresholding operation
(18) 
We chose the parameter , as suggested by [8] for the PCP. The tuning parameter can be selected using crossvalidation. To establish the convergence of the linearized ADMM algorithm, the parameter is required to satisfy where denotes the spectral radius [16]. We set . We set the stopping criterion as
or the maximum iteration number of 5000 is reached. The proposed ADMM algorithm for solving the fused L+S decomposition is summarized in Algorithm 1.
Iv Simulation
We evaluate the performance of the proposed fused L+S decomposition method via simulation studies. The goal is to assess the performance of L+S algorithms in separating the common structure from individualspecific background components in multisubject FC networks. Here, we focus on the static snapshot of dynamic networks at a particular time point, and drop the index for notational brevity. We generate the data matrix of concatenated connectivity metrics of weighted, undirected networks for subjects as follows
(19) 
where , with , whose columns correspond to a set of vectorized basis connectivity matrices that are shared across subjects, and with being a vector of mixing weights for subject . To emulate the modular structure of brain networks [26] induced by stimuli or tasks, we generate each basis connectivity matrix from a stochastic block model (SBM) [21]
with two equalsized communities with intra and intercommunity edge probabilities of 0.95 and 0.2, respectively. The edge strengths are randomly drawn from the uniform distribution
. The subjectdependent mixing weightsare clustered according to two subgroups, which are independently sampled from normal distributions
for and for to represent the strong and the weak response to the stimulus in the two subgroups, respectively. We set such that the variability within each subgroup is small. The resulting matrix has rank , and the connectivity profiles in the columns of are similar within each subgroup. The background components of individual subjects are generated independently as sparse symmetric matrices with support set uniformly chosen at random at a constant sparsity level (fraction of nonzero entries), and whose nonzero entries are i.i.d. samples from indicating large magnitude of noise in . The simulations were replicated 100 times.We compare the performance of the fused L+S decomposition with the original version of PCP in recovering and from the observed matrix . For the fused L+S, we select the optimal parameter over a large grid of values, which gives the best recovery performance on an independently generated validation set. To measure the accuracy of recovery, we computed the root mean squared error (RMSE) between the groundtruth and estimated lowrank and sparse components, defined by and .
We investigate robustness of the methods to increasing rank and sparsity level of the groundtruth components. Fig.1 plots the average RMSEs over 100 replications as a function of (varied from 0.1 to 0.8 with an increment of 0.1) for with fixed and . As expected, estimation errors of both methods for the lowrank and sparse components increase with and , generally. The larger fraction of corrupted entries and the rank of the data matrix render the estimated and more deviated from the groundtruth, in consistency with theoretical results in previous studies [8]. Both methods perform comparably when is small. However, as increases, the fused L+S estimator clearly outperforms the original version with substantially lower errors with a slower growth rate. This suggests the robustness of the fused L+S method under the presence of severe dense noise. It provides a better recovery of the lowrank matrix of shared connectivity patterns even if almost all of its entries are grossly corrupted by the background components, as evidenced by the relatively stable errors when and . This is due to the additional fused penalty term which smooths out the individualspecific random background noise, when leveraging on the intersubject similarity of the stimulusinduced connectivity profiles.
Fig.2 shows fused L+S decomposition of a simulated multisubject connectivity matrix with underlying rank and a fraction of corrupted edges in each subject. The estimate successfully recovers the correlated connectivity patterns across subjects with distinct subgroup shared responses from the highlycorrupted , while filters out subjectspecific background noise without prior information of the locations of the corrupted edges (the support of ).
V Analysis of MovieWatching fMRI
In this section, we applied the proposed fused L+S decomposition approach to separating the stimulusrelated and background components in timevarying FC from fMRI data of a group of subjects exposed to complex naturalistic stimulation from movie viewing. We tested the established hypothesis that, as speech becomes more difficult to process, e.g., as through speechinnoise, somatosensory, motor, and auditory systems are engaged more to facilitate processing [23].
Va Data Acquisition & Stimuli
Thirteen subjects watched a television show during which 3 Tesla fMRI data were collected at the Joan & Sanford I. Weill Medical College of Cornell University by Dr Skipper (GE Medical Systems, Milwaukee, WI; TR = 1.5 s; see [24] for full details). The show, Are you smarter than a 5th grader?, was 32min 24sec long. Among other features, audio events were annotated at a 25ms resolution by a trained coder. These were grouped into five general categories including noise, music, speech (speech that occurred in silence), sp+noise (speech that cooccurred with noise, including background talking, clapping, music etc.) and silence. The audio annotations were downsampled to match the resolution of fMRI data by selecting the auditory category that occurred most frequently within each 1.5 s TR.
VB Preprocessing
The fMRI data were minimally preprocessed, including slice timing correction, despiking, registration, nonlinear transformation to MNI coordinate space, and masking of voxels outside of the brain. We used the Freesurfer parcellation to obtain 16 regions of interest (ROIs) associated with language and auditory processing: These ROIs include the central sulcus (CS), planum polare (PP), planum temporale (PT), postcentral gyrus (PoCG), precentral gyrus (PreCG), superior temporal gyrus (STG), transverse temporal gyrus (TTG) and transverse temporal sulcus (TTS) from both left and right hemispheres. These regions are commonly associated with auditory, somatosensory and motor processing generally and speech perception more specifically. The mean BOLD time series were computed across voxels within each ROI.
VC Shared Structure in Dynamic Connectivity Across Subjects
To characterize dynamic FC, we computed timevarying correlations between the 16 ROI fMRI time series for each subject, using sliding windows of 15 TRs (22.5 s) with step size of 1 TR (1.5 s) between windows. The choice of window length of 22.5 s was based on the inverse of the lowest frequency present in the data [15] after highpass filtering at 0.05 Hz. This window length has been used in [4] to compute the slidingwindow ISFC for moviewatching fMRI data. At each time point , the vectorized correlation coefficients of individual subjects are stacked as columns of matrix . We then performed the fused L+S decomposition on the multisubject dynamic correlation matrices by solving (3) using the proposed ADMM algorithm. We selected the penalty parameter based on the performance of the resulting lowrank models in predicting the timevarying auditory stimuli (see Section IVE). From a range of values from 0.001 to 0.5, we identified as optimal, giving the highest classification accuracy on the validation set in a leaveonesubjectout crossvalidation. Fig. 3 shows the extracted timevarying lowrank and sparse components averaged across subjects and across all edges. In Fig. 3 (left), the lowrank component shows noticeable timevarying structure with higher specificity in time compared to the original dynamic correlations. It provides a better localization of dynamic changes in connectivity which were possibly elicited by the movie and timelocked to the processing of specific ongoing movie events. From Fig. 3 (right), it is apparent that the lowrank components are highly synchronized across subjects over time, while identifying some intersubject variability possibly arising from varied responses of different individuals to the stimuli. In contrast, the sparse components show temporally uncorrelated patterns. The lowrank part is specifically sensitive to synchronized responses across subjects. This suggests that the proposed fused L+S decomposition is capable of isolating reliable FC changes driven by a correlated source across subjects (exposed to the same movie stimuli in our case). The uncorrelated background sources of dynamic FC across subjects are filtered out as the residual sparse components. These sources may correspond to subjectspecific dynamic fluctuations of FC networks as well as nonneuronal artifacts unrelated to the stimulusprocessing.
For illustration, Fig. 4 shows decomposition for a snapshot of connectivity networks at time point TRs (selected based on median correlation over time). The lowrank component identifies a common network structure underlying all subjects, as evident from the similar and highlycorrelated connectivity patterns detected across subjects. As validated by simulation, our algorithm can reliably recover the common structure from the subjectspecific background noise, which could be arbitrarily large in magnitude and contaminated most of the network correlations (e.g., the dense residuals for subject 3). The estimated rank was , where individual networks (columns of ) can be expressed as a linear combination of common basis connectivity matrices with different mixing weights for each subject. See Supplementary Fig. 1 for additional example decomposition at TRs.
VD LowRank Components Reveal StimulusInduced Connectivity Patterns
To test the aforementioned hypothesis, we examined whether the extracted lowrank components are related to the auditory stimuli present in the movie. We did so by fitting a multinomial logistic regression model to learn the mapping between the audio annotations and the timeresolved lowrank connectivity metrics. We set the auditory category labels as responses and the connectivity edges as predictors. The silence events were used as the reference category. To evaluate the goodnessoffit and how well the lowrank components are temporally mapped to the auditory events, we used the fitted logistic model to decode the probability of the actual category labels at each time point. Fig.
5 shows the decoded time courses for each auditory category. For all categories, high probabilities are precisely aligned with the intervals when those categories are present, indicating accurate mapping of the lowrank components to the stimulus annotations.In Fig. 6, we plot the mean functional connectivity maps and the estimated regression coefficient maps associated with the four auditory categories in the movie. The results are based on the lowrank components at a lag of 3 s (i.e., at 2 time points before stimulus onsets at ). The latter was empirically determined to exhibit the strongest connectivity by testing at previous and future time points (at intervals of 1.5 s or 1 TR) relative to the stimulus onsets (See Supplementary Fig. 2 and Fig. 3). This lag was expected because it approximately includes to delay, rise time and peak of a canonical hemodynamic response function associated with any one stimulus lasting about 1.5 s. For details, see Supplementary Section III for the additional analyses on the temporal effect of brain response. As shown in Fig. 6, the lowrank component detected distinct connectivity patterns, with relatively different distributions of connectivity across the different auditory environments. As hypothesized, we observe a noticeable increase in connectivity when speech was accompanied by noise (sp+noise
), compared to speech in silence. We used a grouplevel ttest to contrast the FC between the
sp+noise and speech conditions. Fig. 7(a) shows difference in the mean FC matrices (across subjects and time points) from the lowrank component between the two conditions. The connections were thresholded at (Bonferronicorrected). Specifically, there was significantly higher FC throughout regions typically involved in speech perception and language comprehension for speechinnoise compared to speech. This included connections PT.RTTS.R and PT.RSTG.R which showed the largest increase in FC strength. Furthermore, our method identified significant increase in connectivity between the regions of auditory and speech processing with the premotor, motor and somatosensory regions in response to speech mixed with noise, e.g., TTS.LPoCG.R, PT.RPreCG.R and TTG.RPoCG.R, as shown in Fig. 7(b).In summary, results support hypotheses about the role of ‘the motor system’ in speech perception [1, 23]. In particular, a distributed set of brain regions involved in producing speech are dynamically recruited in perceiving speech, particularly in more effortful listening situations. This might be because speech production regions are engaged in predicting upcoming speech sounds. In noisy environments, this process needs to be engaged more as predictions are inherently less accurate.
VE AcrossSubject Prediction of Stimulus Annotations
We trained a support vector machine (SVM) classifier on the timevarying lowrank FC components to evaluate any improvement in predicting the different auditory categories over the original observed version of timevarying FC metrics. We also compared the performance with the intersubject functional correlation (ISFC)
[22], a widelyused method for isolating stimulusinduced interregional correlations between brains exposed to the same stimuli from other intrinsic and nonneuronal activities. We computed the ISFC in a slidingwindow manner as in [4], which showed that the timevarying ISFC can detect moviedriven fMRI FC changes by averaging out uncorrelated dynamic fluctuations. Following [22], we performed acrosssubject classification but using the timeresolved instead of static FC patterns to predict the stimulus events over time. We used leaveonesubjectout crossvalidation. In each fold, one subject was held out for testing and the remaining subjects were used to train the SVM. We computed the timeresolved lowrank components for the training set from the standard FC, using the fused L+S algorithm. Since the test set is unseen from the training set, we extracted the lowrank components for the test subject via orthogonal projection of the test data , where is a basis matrix estimated based on the SVD of train data . Classification accuracy was computed as the percentage of time points correctly assigned to the true category labels.Fig. 8 shows the classification results using the different timevarying FC metrics at a lag of 3 s relative to the stimulus annotations. Analysis of the temporal effects on the classification performance shows that the highest accuracy was obtained at this time lag (Supplementary Fig. 5). The lowrank components significantly outperformed both the standard FC and ISFC in discriminating ongoing auditory events in the movie. The ISFC is slightly better than the standard FC. The superior decoding performance suggests that the lowrank components are more stimulusrelated, extracting novel information about the stimulusinduced dynamic functional changes not captured by the FC and ISFC. Among the L+S algorithms, addition of fused penalty improves the crosssubject prediction considerably, due to the further smoothing of subjectspecific fluctuations to uncover shared neuronal responses that are timelocked to the same stimulus processing. As expected, the residual sparse components, corresponding to uncorrelated effects of intrinsic neuronal processes and noise, were poorly predictive of the stimuli across subjects.
Vi Conclusion
We developed a novel L+S decomposition method for separating stimulusinduced and background components of dynamic FC networks during naturalistic neuroimaging, by leveraging on the timelocked nature of naturalistic stimuli. The method provides a principled way to recover both the stimulusevoked, shared FC dynamics across individuals and the stimulusunrelated idiosyncratic variations in individuals, respectively modeled as the lowrank and sparse structures of multisubject dFC networks. The proposed fused PCP solved via an efficient ADMM algorithm can capture the stimulusinduced similarities in FC profiles between subjects more effectively, and its robustness to dense noise allows us to filter out corruptions of FC edges that are not necessarily sparse. Our proposed framework is general, broadly applicable to other neuroimaging data such as electroencephalography (EEG), and can incorporate other FC measures beyond simple correlations in ISCbased analyses. In an application to moviewatching fMRI, our method identified timelocked changes in FC networks across subjects, which were meaningfully related to the processing of complex, timevarying auditory events in the movie, e.g., dynamic recruitment of speech production systems in perceiving speech in noisy environments. It also revealed potentially interesting individual differences in FC patterns that may relate to behavioral appraisal of stimuli. The extracted lowrank FC components also show better prediction of the movie auditory annotations than ISFC, suggesting the gain in information about the naturalistic stimuli by using shared network structure instead of synchronization of lowlevel regional activity. Thus, the proposed L+S recovery approach opens new opportunities for mapping brain network dynamics to stimulus features and behavior during naturalistic paradigms. Future studies can explore potential applications of L+S analysis of natural fMRI to connectomebased prediction of individual traits and behavior, and diagnosis of cognitive and affective disorders. Further extensions are needed to improve the scalability of L+S learning algorithm to handle largesized brain networks. The dynamic FC analysis can also be extended to investigate staterelated changes in the network structure, e.g., dynamic community structure across subjects during naturalistic paradigms, by applying regimeswitching models [27, 28] on the extracted timeresolved FC components.
References
 [1] (2012) The neural bases of difficult speech comprehension and speech production: two activation likelihood estimation (ale) metaanalyses. Brain and Language 122 (1), pp. 42–54. Cited by: §VD.
 [2] (2018) Low rank and sparsity constrained method for identifying overlapping functional brain networks. PloS one 13 (11), pp. 1–19. Cited by: §I.
 [3] (2004) Functional brain mapping during free viewing of natural scenes. Human Brain Mapp. 21 (2), pp. 75–85. Cited by: §I.
 [4] (2018) Brain dynamics in ASD during moviewatching show idiosyncratic functional integration and segregation. Human Brain Mapp. 39 (6), pp. 2391–2404. Cited by: §I, §VC, §VE.
 [5] (2014) Robust pca via principal component pursuit: A review for a comparative evaluation in video surveillance. Comput. Vision Image Understanding 122, pp. 22–34. Cited by: §I.
 [6] (2011) Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 (1), pp. 1–122. Cited by: §III.
 [7] (2010) A singular value thresholding algorithm for matrix completion. SIAM J. Optimization 20 (4), pp. 1956–1982. Cited by: §III1.
 [8] (2011) Robust principal component analysis?. J. ACM 58 (3), pp. 1–37. Cited by: §I, §IIB1, §III3, §IV.
 [9] (2011) Ranksparsity incoherence for matrix decomposition. SIAM J. Optimization 21 (2), pp. 572–596. Cited by: §IIB1.
 [10] (2020) Intersubject consistent dynamic connectivity during natural vision revealed by functional mri. NeuroImage, pp. 1–9. Cited by: §I.
 [11] (2010) Reliability of cortical activity during natural stimulation. Trends Cognitive Sci. 14 (1), pp. 40–48. Cited by: §I.
 [12] (2004) Intersubject synchronization of cortical activity during natural vision. Science 303 (5664), pp. 1634–1640. Cited by: §I.
 [13] (2008) Intersubject synchronization of prefrontal cortex hemodynamic activity during natural viewing. Open Neuroimaging J. 2, pp. 14–19. Cited by: §I.

[14]
(2020)
Estimating functional connectivity networks via lowrank tensor approximation with applications to mci identification
. IEEE Trans. Biomed. Eng. 67 (7), pp. 1912–1920. Cited by: §I.  [15] (2015) On spurious and real fluctuations of dynamic functional connectivity during rest. NeuroImage 104, pp. 430–436. Cited by: §VC.
 [16] (2020) The linearized alternating direction method of multipliers for lowrank and fused lasso matrix regression model. J. Appl. Stat., pp. 1–18. Cited by: §III1, §III3.
 [17] (2014) Linearized alternating direction method of multipliers for sparse group and fused lasso models. Comput. Stat. Data Anal. 79, pp. 203–221. Cited by: §III1.
 [18] (2010) The augmented lagrange multiplier method for exact recovery of corrupted lowrank matrices. arXiv preprint arXiv:1009.5055. Cited by: §IIB1, §III1.
 [19] (2015) Background subtraction based on lowrank and structured sparse decomposition. IEEE Trans. Image Process. 24 (8), pp. 2502–2514. Cited by: §I.
 [20] (2019) Measuring shared responses across subjects using intersubject correlation. Cited by: §I.
 [21] (2014) Stochastic blockmodeling of the modules and core of the caenorhabditis elegans connectome. PloS One 9 (7). Cited by: §IV.
 [22] (2016) Dynamic reconfiguration of the default mode network during narrative comprehension. Nature Comm. 7, pp. 12141. Cited by: §I, §VE.
 [23] (2017) The hearing ear is always found close to the speaking tongue: review of the role of the motor system in speech perception. Brain and language 164, pp. 77–105. Cited by: §VD, §V.
 [24] (2017) Brain reorganization in anticipation of predictable words. bioRxiv 101113. External Links: Document Cited by: §VA.
 [25] (2019) Naturalistic stimuli in neuroscience: Critically acclaimed. Trends Cognitive Sci. 23 (8), pp. 699–714. Cited by: §I.
 [26] (2016) Modular brain networks. Annu. Rev. Psychol. 67, pp. 613–640. Cited by: §IV.
 [27] (2018) Estimating dynamic connectivity states in fMRI using regimeswitching factor models. IEEE Trans. Med. Imaging 37 (4), pp. 1011–1023. Cited by: §VI.
 [28] (2020) Detecting dynamic community structure in functional brain networks across individuals: A multilayer approach. IEEE Trans. Med. Imaging. Note: In press Cited by: §VI.
 [29] (2017) Test–retest reliability of functional connectivity networks during naturalistic fmri paradigms. Human Brain Mapp. 38 (4), pp. 2226–2241. Cited by: §I.
 [30] (2012) The linearized alternating direction method of multipliers for dantzig selector. SIAM J. Sci. Comput. 34 (5). Cited by: §III1.
 [31] (2009) Robust principal component analysis: Exact recovery of corrupted lowrank matrices via convex optimization. In Adv. Neural Inf. Process. Syst., pp. 2080–2088. Cited by: §I.