Fast shared response model for fMRI data

09/27/2019 · Hugo Richard, et al. · Inria

The shared response model provides a simple but effective framework to analyse fMRI data of subjects exposed to naturalistic stimuli. However, when the number of subjects or runs is large, fitting the model requires a large amount of memory and computational power, which limits its use in practice. In this work, we introduce the FastSRM algorithm that relies on an intermediate atlas-based representation. It provides considerable speed-up in time and memory usage, hence it allows easy and fast large-scale analysis of naturalistic-stimulus fMRI data. Using four different datasets, we show that our method outperforms the original SRM algorithm while being about 5x faster and 20x to 40x more memory efficient. Based on this contribution, we use FastSRM to predict age from movie watching data on the CamCAN sample. Besides delivering accurate predictions (mean absolute error of 7.5 years), FastSRM extracts topographic patterns that are predictive of age, demonstrating that brain activity during free perception reflects age.







1 Introduction

When exposed to naturalistic stimuli (e.g. movie watching), subjects’ experience is closer to their everyday life than with classical psychological experiments. This makes naturalistic paradigms an attractive class of stimulation protocols for brain imaging. While there is broad interest in understanding how the brain reacts in such ecological conditions, the recorded brain activity is difficult to analyse. Standard methods such as the general linear model Poline and Brett (2012) require the experimenter to construct a design matrix that models features of the presented stimuli across time. Such design matrices are notoriously difficult to construct for naturalistic stimuli, as one has to rely on manual annotations (see Huth et al. (2012)) or deep learning techniques (see e.g. Güçlü and van Gerven (2017), Eickenberg et al. (2017), Richard et al. (2018) or Güçlü and van Gerven (2015)) that are hard to use and provide high-dimensional, cumbersome models of the stimulus.

Hasson et al. (2004) showed that brains exposed to the same natural stimuli exhibit synchronous activity. The shared response model (SRM) Chen et al. (2015) models this behaviour: it extracts a common response from different subjects exposed to the same stimuli, together with subject-specific spatial components. The resulting shared response can serve as a design matrix, while the spatial components are naturally seen as weighting factors.

SRM was initially designed to work within regions of interest using few subjects. It was used in Chen et al. (2015) to transfer knowledge between subjects, allowing precise localization of a 15 s time segment of fMRI data from a left-out subject using data from other subjects exposed to the same stimuli.

A first big step towards more scalability was made in Anderson et al. (2016), reducing fitting time and memory requirements by several orders of magnitude thanks to a smart use of the matrix inversion lemma. After this improvement, studies using larger regions of interest have emerged, such as Vodrahalli et al. (2018), which uses SRM to predict text embeddings from fMRI data, or Chen et al. (2017), which shows that shared memories also come with shared structure in neural activity. However, when using full-brain data and a large number of subjects and runs, computational costs remain very high: all data have to be loaded in memory, and the full-brain spatial components of all subjects are updated at each iteration, which leads to a heavy computational burden.

Fortunately, these high costs can be reduced. Intuitively, since the shared response lives in a reduced space, a compressed representation of the input is good enough to find a suitable estimate. With the use of off-the-shelf atlases and careful memory management, we implement this idea and build FastSRM.

FastSRM is scalable with the number of runs and subjects. Fitting time and memory requirements are reduced considerably, making it possible to fit FastSRM using a laptop in a reasonable amount of time. We also find that the use of atlases limits redundancy and diminishes the noise by smoothing (as seen in Varoquaux and Craddock (2013)) yielding better performance than current implementations. FastSRM makes large-scale analysis of movie-watching fMRI fast and easy. We demonstrate its usefulness on CamCAN data where we predict age from movie watching data with a mean absolute error (MAE) of 7.5 years. We also show in an encoding experiment that FastSRM’s ability to transfer data between subjects is superior to current implementations (Anderson et al. (2016)).

Our code is freely available at

There is a long history of using latent factor models for fMRI data analysis. ICA was first applied to fMRI data in McKeown et al. (1998) as an alternative to the general linear model (GLM). A few years later, Beckmann and Smith (2004) proposed probabilistic ICA, reducing the overfitting problems of original ICA by introducing a Gaussian noise model and a low-rank structure. In order to compare patterns across subjects, ICA can be applied on time-wise concatenated data Calhoun et al. (2001). In 2010, Varoquaux et al. (2010) introduced CanICA, which yields more stable group decompositions than previous group ICA methods (Calhoun et al. (2001), Beckmann and Smith (2005) and Smith et al. (2004)).

Since then, a number of different factor models have been applied to fMRI data, such as non-negative matrix factorization Xie et al. (2017) or dictionary learning Varoquaux et al. (2011). All models enforce constraints on the data that are more or less realistic. Abraham et al. (2013) shows that total variation constraints on spatial components yield accurate and stable decompositions across subjects. However, with large quantities of data, the impact of such regularizations on the result vanishes (see Dohmatob et al. (2016)), and in this case one should therefore favor efficient algorithms (such as the online dictionary learning implementation of Mensch et al. (2016)). In practice, most factor algorithms have been used to derive atlases from resting-state data, but other applications exist. In Varoquaux et al. (2013), dictionary learning is used to derive an atlas from task data (contrast maps), and ICA is used in Calhoun et al. (2004) and Calhoun et al. (2002) to study respectively the effect of alcohol and speed on driving behavior thanks to a simulated driving protocol.

SRM was first introduced in Chen et al. (2015). Like Lashkari and Golland (2009), it learns subject-specific atlases and extracts a shared functional representation from the fMRI data of multiple subjects exposed to the same stimuli. It relies on the hypothesis that orthonormality constraints put on spatial components allow for the extraction of more interpretable spatial components from naturalistic-stimulus fMRI data. SRM is not sensitive to the spatial variability of activated areas between subjects, since the only common features across participants are the time courses making up the shared representation. It is thus related to template estimation methods such as hyperalignment Guntupalli et al. (2016) or optimal transport Bazeille et al. (2019). Many variants of SRM exist: Chen et al. (2016b) uses a convolutional autoencoder to preserve spatial locality, Shvartsman et al. (2017) uses a matrix-normal prior on the spatial components and the shared response, Turek et al. (2018) adds a subject-specific component and Turek et al. (2017) gives a semi-supervised version of SRM. All of these methods suffer from the efficiency issues already mentioned for standard SRM. Note that, unless specified otherwise, whenever we refer to SRM, we refer to the currently most efficient implementation of ProbSRM, as provided by Anderson et al. (2016), which is freely available in the brainiak library.

FastSRM needs atlases with a sufficient number of regions to work. Good atlases of various sizes are available in the literature, such as BASC (up to 444 parcels) Bellec et al. (2010), Schaefer (up to 800 parcels) Schaefer et al. (2017) or MODL (up to 1024 parcels) Mensch et al. (2018). We show that taking any such atlas yields quantitatively similar results.

2 Material and methods

2.1 The shared response model (SRM)

The shared response model is a latent factor model. The brain images of subject i during run s are stored in a matrix X_{i,s} of size v × n, where v is the number of voxels and n the number of acquired brain images. At each time t, the brain volume is modeled as a weighted sum of k orthonormal spatial components stored in a matrix W_i of size v × k.

Keeping things simple, we assume that all subjects have the same number of voxels and all runs have the same number of timeframes. The extension to the more general case where each run has its own number of timeframes and each subject its own number of voxels is straightforward. In our implementation, runs can have different numbers of timeframes, but each subject has the same number of voxels. Typical values for the number of voxels v, the number of timeframes n, the number of runs r, the number of subjects m and the number of components k are given in Table 1. Let us introduce the following notations:

  • For each subject i, the concatenation of the acquisition data for all runs:

    X_i = [X_{i,1}, …, X_{i,r}], of size v × nr

  • The concatenation of the brain acquisitions of all subjects for all runs:

    X = [X_1; …; X_m] (vertical concatenation), of size mv × nr

  • The concatenation of the spatial components of all subjects:

    W = [W_1; …; W_m] (vertical concatenation), of size mv × k,

    where W_i, of size v × k, contains the spatial components of subject i.

  • The concatenation of the weights of all runs:

    S = [S_1, …, S_r], of size k × nr,

    where S_s, of size k × n, contains the weights of run s across time.

Formally, SRM is defined by:

X_i = W_i S + E_i, for each subject i,

where the spatial components W_i, the shared response S and the noise E_i need to be estimated from X. An illustration of this definition is given in Figure 1. Framed this way, group versions of dictionary learning, ICA, blind signal separation or matrix factorization can be seen as particular instances of the shared response model. Most versions of SRM impose orthonormality constraints on the spatial components of each subject:

W_i^T W_i = I_k,

where I_k is the identity matrix of size k.


Figure 1: Shared response model: The raw fMRI data are modeled as a weighted combination of subject-specific spatial components with additive noise. The weights are shared between subjects and constitute the shared response to the stimuli.

2.2 Deterministic SRM model (DetSRM)

The deterministic SRM model assumes Gaussian noise with the same variance for all subjects and orthonormal spatial components. Formally, the model reads:

X_i = W_i S + E_i, with E_i i.i.d. Gaussian noise and W_i^T W_i = I_k.

Maximizing the log-likelihood, we obtain the following optimization problem:

min over W_1, …, W_m, S of Σ_i ||X_i − W_i S||² subject to W_i^T W_i = I_k.

This can be solved efficiently using alternate minimization on W and S. At each iteration we have two problems to solve, both of which have a closed-form solution:

W_i = U_i V_i^T, where U_i Σ_i V_i^T = SVD(X_i S^T),

where SVD stands for the singular value decomposition, and

S = (1/m) Σ_i W_i^T X_i.
Assuming that the number of voxels is large compared to the other dimensions, the time and storage complexity of this approach, as implemented in the brainiak library, grow with the number of subjects and runs. This means that the method becomes expensive whenever the number of subjects or runs becomes large.
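The two closed-form updates above can be sketched in a few lines of NumPy. This is an illustrative re-implementation, not the brainiak code; the function name `det_srm` and its parameters are our own:

```python
import numpy as np

def det_srm(X_list, k, n_iter=10, seed=0):
    """Deterministic SRM via alternating closed-form updates.

    X_list : list of m arrays of shape (v, n), one per subject (centered).
    Returns (W, S): W is a list of (v, k) matrices with orthonormal
    columns, S is the (k, n) shared response.
    """
    rng = np.random.RandomState(seed)
    m = len(X_list)
    # Random orthonormal initialization of each subject's components.
    W = [np.linalg.qr(rng.randn(X.shape[0], k))[0] for X in X_list]
    for _ in range(n_iter):
        # S-update (closed form): average of the back-projected data.
        S = sum(Wi.T @ Xi for Wi, Xi in zip(W, X_list)) / m
        # W-update (orthogonal Procrustes): SVD of X_i S^T.
        for i, Xi in enumerate(X_list):
            U, _, Vt = np.linalg.svd(Xi @ S.T, full_matrices=False)
            W[i] = U @ Vt
    S = sum(Wi.T @ Xi for Wi, Xi in zip(W, X_list)) / m
    return W, S
```

Each update can only decrease the objective, so the alternation converges; the W-update is the classical orthogonal Procrustes solution.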

2.3 Probabilistic SRM model (ProbSRM)

In the probabilistic SRM, spatial components are assumed orthonormal, the shared response is modeled through its covariance matrix and the variance of the Gaussian noise is assumed to differ between subjects. In the original paper Chen et al. (2015), an intercept is also learned, but since we remove the mean of each time-course as a preprocessing step, it is of no use here. Formally, the model reads:

X_i = W_i S + E_i, with the columns of S drawn from N(0, Σ_S), E_i Gaussian with subject-specific variance, and W_i^T W_i = I_k.

The optimization is done using an expectation maximization algorithm described in Anderson et al. (2016) and Chen et al. (2015). As for DetSRM, the time and storage complexity of this approach, as implemented in the brainiak library, grow with the number of subjects and runs, so ProbSRM is also expensive whenever the number of subjects or runs becomes large.

2.4 FastSRM model

When we deal with large datasets (i.e. when the number of subjects, runs or voxels is large), the above implementations require huge computational power. We introduce FastSRM, a fast and memory-efficient algorithm. In a first step, we project the data onto a chosen atlas with c parcels. The atlas can be either probabilistic or a strict partition of the set of voxels, but the number of parcels c should be larger than the number of components k of the FastSRM model and small compared to the number of voxels v. In typical settings, c reaches hundreds to one thousand. Projection onto the atlas yields reduced data X_i^A of size c × nr.

In cases where the atlas is a partition, projecting onto the atlas is equivalent to averaging brain activation in each parcel of the atlas. Formally, denoting P_p the p-th parcel of atlas A, we compute the projection over the atlas by:

X_i^A[p, t] = (1 / |P_p|) Σ_{u in P_p} X_i[u, t].
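For a deterministic (label-based) atlas, this projection is just a per-parcel average. A minimal NumPy sketch (the helper name is ours):

```python
import numpy as np

def project_on_atlas(X, labels, c):
    """Average the (v, n) data X within each of the c parcels.

    labels : (v,) integer array, labels[u] = parcel index of voxel u.
    Returns the reduced data of shape (c, n).
    """
    X_red = np.empty((c, X.shape[1]))
    for p in range(c):
        # Mean signal of all voxels belonging to parcel p, at every time t.
        X_red[p] = X[labels == p].mean(axis=0)
    return X_red
```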

In a second step, we apply our preferred SRM algorithm on the reduced data to find the shared response in reduced space (in our implementation we use a deterministic SRM). Since c is small compared to v, this step is very fast even if the number of iterations is high:

X_i^A = W_i^A S + E_i^A,

where the reduced maps W_i^A and the shared response S in reduced space are the output of the deterministic SRM algorithm (DetSRM). The spatial components of each subject are recovered by orthonormal regression using the shared response in reduced space and the full data X_i:

W_i = U_i V_i^T, where U_i Σ_i V_i^T = SVD(X_i S^T).

In FastSRM, as well as in ProbSRM and DetSRM, fitting the algorithm only means learning the spatial components. If a temporal model is needed, it has to be recomputed a posteriori. In order to compute the shared response from the fMRI data of the subjects in a particular run s, one just has to average the projection of the data onto the basis of each subject:

S_s = (1/m) Σ_i W_i^T X_{i,s}.

Note that when the shared response of the train data is needed, one cannot directly use the FastSRM shared response in reduced space, since it does not have the right scale. However, the spatial components obtained by orthonormal regression from the ill-scaled shared response are valid. Indeed, if we multiply the shared response S by a scaling factor λ > 0, then for any subject i the singular value decomposition of X_i (λS)^T is given by U_i (λΣ_i) V_i^T, where U_i Σ_i V_i^T is the singular value decomposition we would obtain with X_i S^T, and therefore the spatial components of subject i are given by W_i = U_i V_i^T, which is independent of the scale factor.
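This scale invariance is easy to verify numerically. The toy check below (our own helper, not part of the released code) confirms that the orthonormal-regression solution is unchanged when the shared response is multiplied by a positive factor:

```python
import numpy as np

def orthonormal_regression(X, S):
    """argmin_W ||X - W S||_F subject to W^T W = I (orthogonal Procrustes)."""
    U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
    return U @ Vt

# Multiplying S by a positive scalar rescales the singular values of
# X S^T but leaves U and V, and hence W = U V^T, unchanged.
```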

The orthonormal regression can easily be replaced by any other kind of regression. One can impose sparsity, non-negativity, smoothness or similarity constraints on the spatial components instead of orthonormality to obtain more interpretable patterns; this, however, leads to heavier computations. In our implementation, we use orthonormal regression. The bottleneck is the projection onto the atlas, whose cost grows with the number of voxels, subjects and runs. Only the spatial components, the atlas and one run of one subject need to be kept in memory at a time. If parallelization is used, memory requirements grow with the number of jobs while fitting time is divided accordingly. It is also possible to write the spatial components to disk instead of keeping them in memory, which adds read/write operations but further reduces the memory footprint. In our experiments, we write spatial components to disk.
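Putting the three steps together, the FastSRM pipeline can be sketched as follows. This is a minimal NumPy illustration assuming a deterministic atlas given as a label vector; the function name and all variable names are ours, not those of the released implementation, and no disk caching or parallelism is shown:

```python
import numpy as np

def fast_srm(X_list, labels, c, k, n_iter=10, seed=0):
    """Toy FastSRM: atlas projection, reduced DetSRM, orthonormal regression.

    X_list : list of m centered (v, n) arrays; labels : (v,) parcel indices.
    Returns full-resolution components W_full (list of (v, k)) and the
    reduced-space shared response S of shape (k, n).
    """
    rng = np.random.RandomState(seed)
    m = len(X_list)
    # Step 1: project every subject onto the atlas (per-parcel averages).
    X_red = [np.stack([X[labels == p].mean(axis=0) for p in range(c)])
             for X in X_list]
    # Step 2: deterministic SRM on the small (c, n) reduced data.
    W_red = [np.linalg.qr(rng.randn(c, k))[0] for _ in range(m)]
    for _ in range(n_iter):
        S = sum(W.T @ Xr for W, Xr in zip(W_red, X_red)) / m
        for i, Xr in enumerate(X_red):
            U, _, Vt = np.linalg.svd(Xr @ S.T, full_matrices=False)
            W_red[i] = U @ Vt
    S = sum(W.T @ Xr for W, Xr in zip(W_red, X_red)) / m
    # Step 3: recover full-resolution components by orthonormal regression
    # of the full data against the reduced-space shared response.
    W_full = []
    for X in X_list:
        U, _, Vt = np.linalg.svd(X @ S.T, full_matrices=False)
        W_full.append(U @ Vt)
    return W_full, S
```

Only step 3 touches the full-resolution data, and it processes one subject at a time, which is where the memory savings come from.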

Figure 2: FastSRM algorithm In step 1, data are projected onto an atlas (top). In step 2 a deterministic SRM algorithm is applied on reduced data to compute the shared response(middle). In step 3, spatial components are recovered by regression from the shared response (bottom).

2.5 Experiments

2.5.1 Datasets

We use five fMRI datasets of subjects exposed to naturalistic stimuli. When needed, datasets are preprocessed with FSL using slice-timing correction, spatial realignment, coregistration to the T1 image and affine transformation of the functional volumes to a template brain (MNI). Using nilearn Abraham et al. (2014), preprocessed data are resampled, masked using a full-brain mask, detrended and standardized after a 5 mm spatial smoothing is applied.


In the SHERLOCK dataset, 17 participants watch episode 1 of the BBC TV show “Sherlock”. These data are publicly available for download. Data were acquired using a 3T scanner with an isotropic spatial resolution of 3 mm. More information, including the preprocessing pipeline, is available in Chen et al. (2016a). Subject 5 is removed because of missing data, leaving us with 16 participants. Although the SHERLOCK data originally contain only one run, we split it into 4 runs of 395 timeframes and one run of 396 timeframes for the needs of our experiments.


In the FORREST dataset, 20 participants listen to an audio version of the movie Forrest Gump. FORREST data are downloaded from OpenfMRI Poldrack et al. (2013). Data were acquired using a 7T scanner with an isotropic spatial resolution of 1 mm (see Hanke et al. (2014) for more details). More information about the FORREST project can be found online. Subject 10 and run 8 are discarded because of missing data. We therefore use full-brain data of 19 subjects split in 7 runs of respectively 451, 441, 438, 488, 462, 439 and 542 timeframes.


In the RAIDERS dataset, 10 participants watch the movie “Raiders of the Lost Ark”. The RAIDERS dataset pertains to the Individual Brain Charting dataset (Pinho et al. (2018)). The data were acquired at NeuroSpin using a 3T scanner with an isotropic spatial resolution of 3 mm. The RAIDERS dataset reproduces the protocol described in Haxby et al. (2011). We use full-brain data of 10 subjects split in 9 runs of respectively 374, 297, 314, 379, 347, 346, 350, 353 and 211 timeframes.


In the CLIPS dataset, 10 participants are exposed to short video clips. The CLIPS dataset also pertains to the Individual Brain Charting dataset (Pinho et al. (2018)). It reproduces the protocol of the original studies described in Nishimoto et al. (2011) and Huth et al. (2012). In our experiments we use the data of 10 participants acquired in 17 runs of 325 timeframes.

At the time of writing, the CLIPS and RAIDERS data from the Individual Brain Charting dataset are not yet public, but they will be in the future. Protocols on the visual stimuli presented are available in a dedicated repository on GitHub. The informed consent of all subjects was obtained before scanning.


In the CamCAN dataset, 647 participants aged from 18 to 88 years watch Alfred Hitchcock’s “Bang! You’re Dead” (edited so that it lasts only 8 minutes). CamCAN consists of data obtained from the CamCAN repository (see Taylor et al. (2017) and Shafto et al. (2014)). We use all available subjects and runs, yielding 647 participants and 1 run of 193 timeframes.

A summary about the size of each dataset is available in Table 2.

2.5.2 fMRI reconstruction: Evaluate the ability to recover BOLD signal on left-out runs

When subjects are exposed to the same stimuli, SRM algorithms posit that the recorded fMRI data can be modeled as a product of two matrices, one of which is fixed across time but subject-specific (the spatial components) while the other varies across time but is common to all subjects (the shared response). Under this framework, once spatial components are known, we can generate an accurate estimation of the data of one subject given the data of all others. In this experiment, we test whether we can recover the data of a left-out subject using previous data of the same subject as well as data from other subjects. We denote X^{-s} the brain recordings of all runs but run s:

X^{-s} = [X_1^{-s}, …, X_m^{-s}]

Similarly, X_i^{-s} refers to the brain recordings of subject i using all runs but run s. The data from all subjects but subject i acquired during run s are denoted X_{-i,s}.

We evaluate our model using a cross-validation scheme known as co-smoothing (see Wu et al. (2018)). First, all brain recordings for all runs but one are used for learning the subjects’ spatial components W_i^{-s}. The exponent -s in W_i^{-s} indicates that these spatial components were learned using all runs but run s.

Then we focus on the left-out run s and use all subjects but subject i to compute a shared response S_s^{-i} for the left-out run.

From W_i^{-s} and S_s^{-i} we compute W_i^{-s} S_s^{-i}, which stands as an estimate of the brain activity of the left-out subject i during the left-out run s.

Figure 3: Experiment — Reconstruct data from a left-out subject: All runs but one are used to compute spatial components for every subject (left). Then the spatial components and data from the left-out run of all subjects but one are used to compute the shared response in the left-out run. At last, the shared response during the left-out run and the spatial components of the test subject are used to predict the data of the test subject in the left-out run. The performance of the model is measured by comparing the prediction and the true data using the R² score.

An illustration of our reconstruction experiment is available in Figure 3. The performance is measured voxel-wise using the R² score between the predicted and the true time-courses as a similarity measure. For any two time-courses x and x̂ of length n, we define the R² score by:

R²(x, x̂) = 1 − Σ_t (x_t − x̂_t)² / Σ_t (x_t − x̄)²,

where x̄ = (1/n) Σ_t x_t. Following the leave-one-out cross-validation scheme, all our experiments are done several times with a different left-out subject to reconstruct. We measure the average R² score across all left-out subjects. Note that we obtain one such value per voxel.
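The voxel-wise score can be computed in a vectorized way. A small sketch (our own helper, essentially equivalent to applying scikit-learn's r2_score per voxel):

```python
import numpy as np

def r2_per_voxel(X_true, X_pred):
    """Voxel-wise R^2 between (v, n) true and predicted time-courses."""
    # Residual sum of squares per voxel.
    sse = ((X_true - X_pred) ** 2).sum(axis=1)
    # Total sum of squares around each voxel's temporal mean.
    sst = ((X_true - X_true.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - sse / sst
```

A perfect reconstruction yields 1 in every voxel, while predicting each voxel's mean yields 0, matching the interpretation used throughout the results.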

2.5.3 Predict age from spatial components

Since spatial components are subject-specific, they should be predictive of subject-specific features such as age. In this experiment, we try to predict subjects’ age from movie-watching data using SRM algorithms. Functionally matched spatial components are obtained using an SRM algorithm. They are divided into two groups (train and test data), where the train set contains 80% of the data and the test set 20%. Within the train set, we split again our data into two groups: the first group is used to train one Ridge model per spatial component, the second group is used to train a Random Forest to predict age from the Ridge predictions. This way of stacking models is similar to the pipeline used in Rahim et al. (2017). We use 5-fold cross-validation to split the train set (so that the number of samples used to train the Random Forest is the number of elements in the train set). Then the train set is used to train one Ridge model per spatial component. On the test set, each Ridge model makes a prediction and the predictions are aggregated using the Random Forest model. An illustration of the process is available in Figure 4.

In each Ridge model, the coefficient that determines the level of l2 penalization is set by generalized cross-validation, an efficient form of leave-one-out cross-validation (see the RidgeCV implementation of scikit-learn Pedregosa et al. (2011)).

The train and test sets are chosen randomly; 5 different choices of train and test sets are made. We report the mean absolute error (MAE) on the test set averaged over the 5 splits.
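A sketch of this stacked pipeline with scikit-learn, assuming each spatial component is summarized as a per-subject feature matrix; function names, hyperparameters and the synthetic data layout are illustrative, not the exact settings of the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import cross_val_predict

def fit_stacked_age_model(components, age, cv=5, seed=0):
    """First level: one RidgeCV per spatial component; second level: a
    Random Forest trained on out-of-fold Ridge predictions.

    components : list of k arrays of shape (n_subjects, n_features).
    age : (n_subjects,) target.
    """
    ridges = [RidgeCV(alphas=np.logspace(-3, 3, 7)) for _ in components]
    # Out-of-fold predictions keep the forest from seeing overfit features.
    oof = np.column_stack([cross_val_predict(r, Xc, age, cv=cv)
                           for r, Xc in zip(ridges, components)])
    # Refit each Ridge on the full train set for later prediction.
    for r, Xc in zip(ridges, components):
        r.fit(Xc, age)
    forest = RandomForestRegressor(n_estimators=100, random_state=seed)
    forest.fit(oof, age)
    return ridges, forest

def predict_age(ridges, forest, components):
    feats = np.column_stack([r.predict(Xc)
                             for r, Xc in zip(ridges, components)])
    return forest.predict(feats)
```

Using out-of-fold first-level predictions mirrors the 5-fold split described above: the forest is trained on Ridge outputs that were not produced by models fitted on the same samples.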

Figure 4: Experiment — Predict age from spatial components extracted using FastSRM: We first learn the spatial components from fMRI data using SRM. We learn one Ridge model per spatial component to predict age across subjects. Then, these models are aggregated using a Random Forest (like in Rahim et al. (2017)).

3 Results and Discussion

3.1 fMRI Reconstruction

We perform the reconstruction experiment on the FORREST, CLIPS, RAIDERS and SHERLOCK datasets. We compare brainiak’s implementation of ProbSRM and DetSRM to our implementation of FastSRM in terms of fitting time, memory usage and performance. In order to be fair, we do not use parallelization and we set the number of iterations to 10, which is ProbSRM’s default. Note that the fitting time of ProbSRM is roughly proportional to the number of iterations, while the number of iterations has a limited impact on the fitting time of FastSRM.

We run our experiments on the full brain and report an R² score per voxel. However, we measure the performance in terms of the mean R² score inside a region of interest (in order to leave out regions where there is no useful information). In order to determine the region of interest, we focus on the results of ProbSRM and keep only the intersection of the regions where the R² score is above a fixed threshold. This means of selecting regions favors ProbSRM. For completeness, full-brain images obtained on the four datasets with ProbSRM and FastSRM, averaged across subjects, are available in Figure 10.

In Figure 5, we plot the mean R² score against the number of components for the ProbSRM, DetSRM and FastSRM algorithms with different atlases. The R² score tends to increase with the number of components (which is expected, as more information can be retrieved when the number of components is high). FastSRM consistently outperforms ProbSRM and DetSRM. This holds for any atlas we chose (BASC (444 parcels), Schaefer (800 parcels), MODL (512 and 1024 parcels)) and for all datasets we tried (SHERLOCK, RAIDERS, CLIPS and FORREST).

In Figure 6, we compare the running time of FastSRM, DetSRM and ProbSRM on the four different datasets. FastSRM is on average (across datasets) about 5 times faster. On the FORREST dataset, we compute a shared response in about 3 minutes, whereas it takes about 20 minutes with ProbSRM or DetSRM.

In Figure 9, we compare the memory (RAM) consumption of FastSRM, DetSRM and ProbSRM on the four different datasets. FastSRM is 20 to 40 times more memory friendly than ProbSRM and 10 to 20 times more memory friendly than DetSRM. On the FORREST dataset, the memory usage of FastSRM is between 1 and 3 GB depending on the number of components and the atlas used. Most modern laptops meet these requirements. On the same dataset, memory consumption is about 80 GB for ProbSRM and 40 GB for DetSRM, which is manageable but costly for small labs. Overall, FastSRM yields better performance than ProbSRM and DetSRM while being much faster and using far less memory. We also show that the atlas used to reduce the data has only a minor impact on performance.

Figure 5: Performance of the methods in an encoding test: We compare the performance (measured in terms of the average R² score in a region of interest) of ProbSRM and FastSRM with different atlases as a function of the number of components used. Atlases tested are MODL with 512 and 1024 parcels, BASC with 444 parcels and Schaefer with 800 parcels. Datasets tested are SHERLOCK (top left), RAIDERS (top right), CLIPS (bottom left) and FORREST (bottom right). No matter which atlas is chosen, FastSRM consistently outperforms ProbSRM.
Figure 6: Fitting time of FastSRM, ProbSRM and DetSRM: We compare the fitting time of ProbSRM, DetSRM and FastSRM with different atlases as a function of the number of components used. Atlases tested are MODL with 512 and 1024 parcels, BASC with 444 parcels and Schaefer with 800 parcels. Datasets tested are SHERLOCK, RAIDERS, CLIPS and FORREST. Left: fitting time (as a fraction of ProbSRM fitting time) averaged over the four datasets. Right: fitting time (in seconds) for each of the four datasets. FastSRM is about 5 times faster than ProbSRM.

3.2 Predicting age from spatial components

Because FastSRM is fast and memory efficient, it enables large-scale analysis of fMRI recordings of subjects exposed to the same naturalistic stimuli. We use all 647 subjects of the CamCAN dataset and demonstrate the usefulness of FastSRM by showing that the spatial components it extracts from movie watching data are predictive of age. A key asset of FastSRM is that these spatial components can be visualized and therefore provide meaningful insights.

Figure 7 shows that FastSRM predicts age with good accuracy (better than ProbSRM and much better than chance), resulting in a mean absolute error (MAE) of 7.5 years. It also shows that on CamCAN data, FastSRM is 4x faster and more than 150x more memory efficient than ProbSRM. As before, in order to ensure a fair comparison, the number of iterations is set to 10 and we do not make use of parallelization. Note that the memory requirements of ProbSRM on the CamCAN dataset (186 GB) make it difficult to use. FastSRM does not suffer from memory issues, making it suitable for analysing big datasets.

A key asset of our pipeline is that we can see which spatial components are most predictive of age using feature importance. Feature importance is assessed by the Gini importance defined in Breiman (2001) and Louppe et al. (2013): it measures, for each feature, the relative reduction in Gini impurity brought by this feature. Feature importance varies between splits, so we use the feature importance averaged over the 5 splits of our pipeline. Figure 7 shows the 3 most important spatial components, representing respectively 16%, 12% and 8% of the total feature importance. These spatial components, in decreasing order of importance, represent the visual dorsal pathway, the precuneus and the visual ventral pathway. The fact that averaged spatial components are interpretable and meaningful allows us to study the influence of age on brain networks involved in movie watching. In Figure 8, we plot the most important spatial component averaged within age groups. We see that this spatial component evolves with age, allowing us to visually identify which regions are meaningful. It turns out that aging is mostly reflected in brain activity as a fading of activity in the spatial correlates of movie watching, particularly in the dorsal visual cortex.

Figure 7: Age prediction from spatial components: (top) FastSRM predicts age with good accuracy (better than ProbSRM and much better than chance), resulting in a mean absolute error (MAE) of 7.5 years. (middle) FastSRM is more than 4x faster than ProbSRM and uses 150x less memory, hence it scales better than ProbSRM. (bottom) The three most important spatial components in terms of the reduction in Gini impurity they bring (see Gini importance or feature importance in Breiman (2001), Louppe et al. (2013)). From top to bottom, the most important spatial component (feature importance: 16%) highlights the visual dorsal pathway, the second most important spatial component (feature importance: 12%) highlights the precuneus and the third most important spatial component (feature importance: 8%) highlights the visual ventral pathway.
Figure 8: Evolution of the most predictive spatial component with age: (Top) Spatial component most predictive of age averaged within groups of different age (18-35, 36-48, 48-61, 61-74, 74-88). (Bottom) Mean activation in the region highlighted by the mask on the left. We see that the activity in the dorsal pathway decreases with age, which explains why this spatial component is a good predictor of age.

3.3 Conclusion

As studies using naturalistic stimuli become more common and larger, both within and across subjects, we need scalable models, especially in terms of memory usage. This is what FastSRM provides. We show that, compared to ProbSRM or DetSRM, FastSRM provides better performance and lower fitting time while requiring far less memory. While FastSRM’s scalability relies on the use of atlases to compress the BOLD signal, we show that the precise choice of the atlas has only marginal effects on performance.

FastSRM allows large scale analysis of fMRI data of subjects exposed to naturalistic stimuli. As one example of such analysis, we show that it can be used to predict age from movie-watching data. Interestingly, although FastSRM is an unsupervised model, it extracts meaningful networks and as such constitutes a practical way of studying subjects exposed to naturalistic stimuli.

We also show that individual information can be extracted from fMRI activity when subjects are exposed to naturalistic stimuli. Our predictive model is reminiscent of that of Bijsterbosch et al. (2018), who showed that ICA components obtained from the decomposition of resting-state data carry important information on individual characteristics.

As a side note, we chose to keep the orthonormality assumptions of the original SRM model, but slight modifications of our implementation of FastSRM would allow one to build more refined models promoting, for example, sparsity, non-negativity or smoothness of the spatial components.

The remaining difficulty with SRM is to interpret the spatio-temporal decomposition. Reverse correlation Hasson et al. (2004) can be used to clarify the cognitive information captured in the shared response.
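A minimal sketch of reverse correlation on the shared response: for each shared component, retrieve the stimulus timeframes where that component is most active, then inspect the corresponding movie scenes (the shared response below is random data for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
# Shared response: components x timeframes (random placeholder)
shared_response = rng.standard_normal((10, 500))

k = 5  # number of peak timeframes to inspect per component
peaks = np.argsort(shared_response, axis=1)[:, ::-1][:, :k]
for comp, frames in enumerate(peaks[:3]):
    print("component %d peaks at timeframes %s" % (comp, sorted(frames)))
```

Mapping the peak timeframes back to the movie content gives a qualitative description of what each shared component responds to.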

Our code is freely available at

Appendix A Appendices

In Table 1, we show typical values for the main dataset parameters. In Table 2, we describe the main parameters of the real datasets we used. In Figure 9, we compare the memory usage of DetSRM, ProbSRM and FastSRM with different atlases as a function of the number of components used while performing the encoding experiment described in Section 2.5.2. In Figure 10, we show, for the same experiment, the R² score per voxel averaged across cross-validation folds.

Figure 9: Memory usage of FastSRM, ProbSRM and DetSRM: We compare the memory usage of DetSRM, ProbSRM and FastSRM with different atlases as a function of the number of components used. Atlases tested are MODL with 512 and 1024 parcels, Basc with 444 parcels and Schaefer with 800 parcels. Datasets tested are SHERLOCK, RAIDERS, CLIPS and FORREST. Left: memory usage (as a fraction of ProbSRM memory usage) averaged over the four datasets. Right: memory usage (in MB) for each of the four datasets. FastSRM is about 20x more memory efficient than ProbSRM with probabilistic atlases (MODL) and 40x more with deterministic atlases (Basc, Schaefer), making it possible to compute a shared response on a large dataset using a modern laptop.
Figure 10: fMRI reconstruction, R² score per voxel averaged across cross-validation folds: We benchmark ProbSRM and FastSRM on the SHERLOCK, RAIDERS, CLIPS and FORREST datasets. An R² score of 1 means perfect reconstruction, while 0 means that the model only predicts the mean of the voxel time-course.
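The per-voxel score used in this figure can be computed as an ordinary coefficient of determination along the time axis; a minimal sketch with a hypothetical helper name `r2_per_voxel`:

```python
import numpy as np

def r2_per_voxel(X_true, X_pred):
    """Per-voxel R^2 score: 1 means perfect reconstruction, 0 means the
    prediction is no better than the voxel's mean time-course.
    Inputs have shape (n_voxels, n_timeframes)."""
    ss_res = ((X_true - X_pred) ** 2).sum(axis=1)
    ss_tot = ((X_true - X_true.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 40))                       # voxels x timeframes
mean_pred = np.tile(X.mean(axis=1, keepdims=True), (1, 40))
print(r2_per_voxel(X, X).mean())                         # 1.0 (perfect reconstruction)
print(r2_per_voxel(X, mean_pred).mean())                 # 0.0 (mean prediction)
```

Averaging these per-voxel scores across cross-validation folds gives the maps shown in the figure.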
Name of variable       Notation   Typical value
Number of voxels
Number of timeframes
Number of runs
Number of subjects
Number of components
Table 1: Typical values for the main dataset parameters.
Dataset    Subjects   Runs   Average run length   Voxels
                             (in timeframes)      (per subject)
CLIPS      10         17     325                  212445
SHERLOCK   16         5      395                  212445
RAIDERS    10         9      330                  212445
FORREST    19         7      465                  212445
CamCAN     647        1      193                  212445


CamCAN Data collection and sharing was provided by the Cambridge Centre for Ageing and Neuroscience (CamCAN). CamCAN funding was provided by the UK Biotechnology and Biological Sciences Research Council (grant number BB/H008217/1), together with support from the UK Medical Research Council and University of Cambridge, UK.

This project has received funding from the European Union’s Horizon 2020 Research and Innovation Programme under Grant Agreement No. 785907 (HBP SGA2).



  • A. Abraham, E. Dohmatob, B. Thirion, D. Samaras, and G. Varoquaux (2013) Extracting brain regions from rest fmri with total-variation constrained dictionary learning. In International Conference on Medical Image Computing and Computer-Assisted Intervention, pp. 607–615. Cited by: §1.
  • A. Abraham, F. Pedregosa, M. Eickenberg, P. Gervais, A. Mueller, J. Kossaifi, A. Gramfort, B. Thirion, and G. Varoquaux (2014) Machine learning for neuroimaging with scikit-learn. Frontiers in neuroinformatics 8, pp. 14. Cited by: §2.5.1.
  • M. J. Anderson, M. Capota, J. S. Turek, X. Zhu, T. L. Willke, Y. Wang, P. Chen, J. R. Manning, P. J. Ramadge, and K. A. Norman (2016) Enabling factor analysis on thousand-subject neuroimaging datasets. In 2016 IEEE International Conference on Big Data (Big Data), pp. 1151–1160. Cited by: §1, §1, §1, §2.3.
  • T. Bazeille, H. Richard, H. Janati, and B. Thirion (2019) Local optimal transport for functional brain template estimation. In International Conference on Information Processing in Medical Imaging, pp. 237–248. Cited by: §1.
  • C. F. Beckmann and S. M. Smith (2004) Probabilistic independent component analysis for functional magnetic resonance imaging. IEEE transactions on medical imaging 23 (2), pp. 137–152. Cited by: §1.
  • C. F. Beckmann and S. M. Smith (2005) Tensorial extensions of independent component analysis for multisubject fmri analysis. Neuroimage 25 (1), pp. 294–311. Cited by: §1.
  • P. Bellec, P. Rosa-Neto, O. C. Lyttelton, H. Benali, and A. C. Evans (2010) Multi-level bootstrap analysis of stable clusters in resting-state fmri. Neuroimage 51 (3), pp. 1126–1139. Cited by: §1.
  • J. D. Bijsterbosch, M. W. Woolrich, M. F. Glasser, E. C. Robinson, C. F. Beckmann, D. C. Van Essen, S. J. Harrison, and S. M. Smith (2018) The relationship between spatial configuration and functional connectivity of brain regions. Elife 7, pp. e32992. Cited by: §3.3.
  • L. Breiman (2001) Random forests. Machine learning 45 (1), pp. 5–32. Cited by: Figure 7, §3.2.
  • V. D. Calhoun, T. Adali, G. D. Pearlson, and J. J. Pekar (2001) A method for making group inferences from functional mri data using independent component analysis. Human brain mapping 14 (3), pp. 140–151. Cited by: §1.
  • V. D. Calhoun, J. J. Pekar, V. B. McGinty, T. Adali, T. D. Watson, and G. D. Pearlson (2002) Different activation dynamics in multiple neural systems during simulated driving. Human brain mapping 16 (3), pp. 158–167. Cited by: §1.
  • V. D. Calhoun, J. J. Pekar, and G. D. Pearlson (2004) Alcohol intoxication effects on simulated driving: exploring alcohol-dose effects on brain activation using functional mri. Neuropsychopharmacology 29 (11), pp. 2097. Cited by: §1.
  • J. Chen, Y.C. Leong, K.A. Norman, and U. Hasson (2016a) Shared experience, shared memory: a common structure for brain activity during naturalistic recall. bioRxiv. Cited by: §2.5.1.
  • J. Chen, Y. C. Leong, C. J. Honey, C. H. Yong, K. A. Norman, and U. Hasson (2017) Shared memories reveal shared structure in neural activity across individuals. Nature neuroscience 20 (1), pp. 115. Cited by: §1.
  • P. C. Chen, J. Chen, Y. Yeshurun, U. Hasson, J. Haxby, and P. J. Ramadge (2015) A reduced-dimension fmri shared response model. In Advances in Neural Information Processing Systems, pp. 460–468. Cited by: §1, §1, §1, §2.3, §2.3.
  • P. Chen, X. Zhu, H. Zhang, J. S. Turek, J. Chen, T. L. Willke, U. Hasson, and P. J. Ramadge (2016b) A convolutional autoencoder for multi-subject fmri data aggregation. arXiv preprint arXiv:1608.04846. Cited by: §1.
  • E. Dohmatob, A. Mensch, G. Varoquaux, and B. Thirion (2016) Learning brain regions via large-scale online structured sparse dictionary learning. In Advances in Neural Information Processing Systems, pp. 4610–4618. Cited by: §1.
  • M. Eickenberg, A. Gramfort, G. Varoquaux, and B. Thirion (2017) Seeing it all: convolutional network layers map the function of the human visual system. NeuroImage 152, pp. 184–194. Cited by: §1.
  • U. Güçlü and M. A. van Gerven (2015) Deep neural networks reveal a gradient in the complexity of neural representations across the ventral stream. Journal of Neuroscience 35 (27), pp. 10005–10014. Cited by: §1.
  • U. Güçlü and M. A. van Gerven (2017) Increasingly complex representations of natural movies across the dorsal stream are shared between subjects. NeuroImage 145, pp. 329–336. Cited by: §1.
  • J. S. Guntupalli, M. Hanke, Y. O. Halchenko, A. C. Connolly, P. J. Ramadge, and J. V. Haxby (2016) A model of representational spaces in human cortex. Cerebral cortex 26 (6), pp. 2919–2934. Cited by: §1.
  • M. Hanke, F. J. Baumgartner, P. Ibe, F. R. Kaule, S. Pollmann, O. Speck, W. Zinke, and J. Stadler (2014) A high-resolution 7-tesla fmri dataset from complex natural stimulation with an audio movie. Scientific data 1, pp. 140003. Cited by: §2.5.1.
  • U. Hasson, Y. Nir, I. Levy, G. Fuhrmann, and R. Malach (2004) Intersubject synchronization of cortical activity during natural vision. science 303 (5664), pp. 1634–1640. Cited by: §1, §3.3.
  • J. V. Haxby, J. S. Guntupalli, A. C. Connolly, Y. O. Halchenko, B. R. Conroy, M. I. Gobbini, M. Hanke, and P. J. Ramadge (2011) A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron 72 (2), pp. 404–416. Cited by: §2.5.1.
  • A. G. Huth, S. Nishimoto, A. T. Vu, and J. L. Gallant (2012) A continuous semantic space describes the representation of thousands of object and action categories across the human brain. Neuron 76 (6), pp. 1210–1224. Cited by: §1, §2.5.1.
  • D. Lashkari and P. Golland (2009) Exploratory fmri analysis without spatial normalization. In International Conference on Information Processing in Medical Imaging, pp. 398–410. Cited by: §1.
  • G. Louppe, L. Wehenkel, A. Sutera, and P. Geurts (2013) Understanding variable importances in forests of randomized trees. In Advances in neural information processing systems, pp. 431–439. Cited by: Figure 7, §3.2.
  • M. J. McKeown, S. Makeig, G. G. Brown, T. Jung, S. S. Kindermann, A. J. Bell, and T. J. Sejnowski (1998) Analysis of fmri data by blind separation into independent spatial components. Human brain mapping 6 (3), pp. 160–188. Cited by: §1.
  • A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux (2016) Dictionary learning for massive matrix factorization. In International Conference on Machine Learning, pp. 1737–1746. Cited by: §1.
  • A. Mensch, J. Mairal, B. Thirion, and G. Varoquaux (2018) Extracting universal representations of cognition across brain-imaging studies. arXiv preprint arXiv:1809.06035. Cited by: §1.
  • S. Nishimoto, A.T. Vu, T. Naselaris, Y. Benjamini, B. Yu, and J. L. Gallant (2011) Reconstructing visual experiences from brain activity evoked by natural movies. Current Biology 21 (19), pp. 1641–1646. Cited by: §2.5.1.
  • F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. (2011) Scikit-learn: machine learning in python. Journal of machine learning research 12 (Oct), pp. 2825–2830. Cited by: §2.5.3.
  • A. L. Pinho, A. Amadon, T. Ruest, M. Fabre, E. Dohmatob, I. Denghien, C. Ginisty, S. Becuwe-Desmidt, S. Roger, L. Laurier, et al. (2018) Individual brain charting, a high-resolution fmri dataset for cognitive mapping. Scientific data 5. Cited by: §2.5.1, §2.5.1.
  • R. A. Poldrack, D. M. Barch, J. Mitchell, T. Wager, A. D. Wagner, J. T. Devlin, C. Cumba, O. Koyejo, and M. Milham (2013) Toward open sharing of task-based fmri data: the openfmri project. Frontiers in neuroinformatics 7, pp. 12. Cited by: §2.5.1.
  • J. Poline and M. Brett (2012) The general linear model and fmri: does love last forever?. Neuroimage 62 (2), pp. 871–880. Cited by: §1.
  • M. Rahim, B. Thirion, D. Bzdok, I. Buvat, and G. Varoquaux (2017) Joint prediction of multiple scores captures better individual traits from brain images. NeuroImage 158, pp. 145–154. Cited by: Figure 4, §2.5.3.
  • H. Richard, A. Pinho, B. Thirion, and G. Charpiat (2018) Optimizing deep video representation to match brain activity. arXiv preprint arXiv:1809.02440. Cited by: §1.
  • A. Schaefer, R. Kong, E. M. Gordon, T. O. Laumann, X. Zuo, A. J. Holmes, S. B. Eickhoff, and B. T. Yeo (2017) Local-global parcellation of the human cerebral cortex from intrinsic functional connectivity mri. Cerebral Cortex 28 (9), pp. 3095–3114. Cited by: §1.
  • M. A. Shafto, L. K. Tyler, M. Dixon, J. R. Taylor, J. B. Rowe, R. Cusack, A. J. Calder, W. D. Marslen-Wilson, J. Duncan, T. Dalgleish, et al. (2014) The cambridge centre for ageing and neuroscience (cam-can) study protocol: a cross-sectional, lifespan, multidisciplinary examination of healthy cognitive ageing. BMC neurology 14 (1), pp. 204. Cited by: §2.5.1.
  • M. Shvartsman, N. Sundaram, M. C. Aoi, A. Charles, T. C. Wilke, and J. D. Cohen (2017) Matrix-normal models for fmri analysis. arXiv preprint arXiv:1711.03058. Cited by: §1.
  • S. M. Smith, M. Jenkinson, M. W. Woolrich, C. F. Beckmann, T. E. Behrens, H. Johansen-Berg, P. R. Bannister, M. De Luca, I. Drobnjak, D. E. Flitney, et al. (2004) Advances in functional and structural mr image analysis and implementation as fsl. Neuroimage 23, pp. S208–S219. Cited by: §1.
  • J. R. Taylor, N. Williams, R. Cusack, T. Auer, M. A. Shafto, M. Dixon, L. K. Tyler, R. N. Henson, et al. (2017) The cambridge centre for ageing and neuroscience (cam-can) data repository: structural and functional mri, meg, and cognitive data from a cross-sectional adult lifespan sample. Neuroimage 144, pp. 262–269. Cited by: §2.5.1.
  • J. S. Turek, C. T. Ellis, L. J. Skalaban, N. B. Turk-Browne, and T. L. Willke (2018) Capturing shared and individual information in fmri data. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 826–830. Cited by: §1.
  • J. S. Turek, T. L. Willke, P. Chen, and P. J. Ramadge (2017) A semi-supervised method for multi-subject fmri functional alignment. In 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1098–1102. Cited by: §1.
  • G. Varoquaux and R. C. Craddock (2013) Learning and comparing functional connectomes across subjects. NeuroImage 80, pp. 405–415. Cited by: §1.
  • G. Varoquaux, A. Gramfort, F. Pedregosa, V. Michel, and B. Thirion (2011) Multi-subject dictionary learning to segment an atlas of brain spontaneous activity. In Information Processing in Medical Imaging, Lecture Notes in Computer Science, Vol. 6801, Kaufbeuren, Germany, pp. 562–573. Cited by: §1.
  • G. Varoquaux, S. Sadaghiani, P. Pinel, A. Kleinschmidt, J. Poline, and B. Thirion (2010) A group model for stable multi-subject ica on fmri datasets. Neuroimage 51 (1), pp. 288–299. Cited by: §1.
  • G. Varoquaux, Y. Schwartz, P. Pinel, and B. Thirion (2013) Cohort-level brain mapping: learning cognitive atoms to single out specialized regions. In International Conference on Information Processing in Medical Imaging, pp. 438–449. Cited by: §1.
  • K. Vodrahalli, P. Chen, Y. Liang, C. Baldassano, J. Chen, E. Yong, C. Honey, U. Hasson, P. Ramadge, K. A. Norman, et al. (2018) Mapping between fmri responses to movies and their natural language annotations. Neuroimage 180, pp. 223–231. Cited by: §1.
  • A. Wu, S. Pashkovski, S. R. Datta, and J. W. Pillow (2018) Learning a latent manifold of odor representations from neural responses in piriform cortex. In Advances in Neural Information Processing Systems, pp. 5378–5388. Cited by: §2.5.2.
  • J. Xie, P. K. Douglas, Y. N. Wu, A. L. Brody, and A. E. Anderson (2017) Decoding the encoding of functional brain networks: an fmri classification comparison of non-negative matrix factorization (nmf), independent component analysis (ica), and sparse coding algorithms. Journal of neuroscience methods 282, pp. 81–94. Cited by: §1.