Analyzing neuroimaging studies is both a large data problem and a small data problem. A single scanning run typically involves hundreds of full-brain scans that each contain tens of thousands of spatial locations (to which we refer as voxels). At the same time, neuroimaging studies tend to have limited statistical power. A typical study considers a cohort of 20-50 participants undergoing tens of stimuli (or even fewer). A challenge in this domain is to develop analysis methods that can appropriately account for both the commonalities and variations among participants and stimulus effects, while also scaling to tens of gigabytes of data and providing a means to reason about the confidence level of predictions.
In this paper, we develop Neural Topographic Factor Analysis (NTFA), a family of models for analysis of spatio-temporal fMRI data suitable for reasoning about variations among differing participants and stimuli (source code was submitted with the paper and is available on GitHub by request). NTFA extends Topographic Factor Analysis (TFA), an established technique for fMRI analysis (Manning et al., 2014b), with a deep generative prior. This prior defines a distribution over embeddings (i.e. vectors of features) for each participant and stimulus, along with a conditional distribution over spatial and temporal factors, which is parameterized by a simple neural network. The result is a structured deep probabilistic model that can characterize variation among participants and stimuli, as well as interaction effects between the two.
We evaluate NTFA on four datasets. We validate that inference and learning recover known group structure in a synthetic dataset, simulated from a simplified generative model that has been designed to contain distinguishable clusters of participants and stimuli. We additionally consider two datasets from publicly available fMRI studies. In the first, participants listen to the narrative “Pie Man” (Simony et al., 2016). In the second, participants with and without major depressive disorder listen to audio stimuli and music (Lepping et al., 2016). Finally, we report results from a pilot study on the neural basis of fear that capitalized on the utility of our NTFA model. Participants were shown video clips of spiders, looming heights, and threatening social situations that varied in how much fear they evoked both within category (i.e. there were both low and high fear inducing spider videos), and across individuals (i.e. some participants were, on average, more and some less influenced by the videos). Our NTFA model, which was naive to the stimulus category and individual differences in fear experience, nevertheless recovered meaningful subject and stimulus embeddings in a fully unsupervised manner. The stimulus embeddings recovered the stimulus categories of videos, and the participant embeddings correlated with behavioral measures of fear sensitivity.
The main contributions of this paper are:
We demonstrate that embeddings, inferred in a fully unsupervised manner, correlate with experimental design variables and behavioral measures (Section 4.2). To our knowledge, NTFA is the first model that is able to characterize participants and stimuli in this manner.
2 Topographic Factor Analysis Methods for fMRI Data
Factor analysis methods are widely used to reduce the dimensionality of neuroimaging data, while at the same time capturing meaningful regularities. These methods decompose the fMRI signal for a trial, a matrix of time points by voxels, into a product between a lower-rank matrix of weights and a lower-rank matrix of factors, where the number of factors is typically much smaller than the number of voxels. General-purpose methods that have been applied to fMRI data include Principal Component Analysis (PCA) (Abdi and Williams, 2010) and Independent Component Analysis (ICA) (Hyvärinen et al., 2001). A number of methods have also been developed specifically for fMRI analysis, such as Hyper-Alignment (HA) (Haxby et al., 2011) and Topographic Latent Source Analysis (TLSA) (Gershman et al., 2011).
TFA is a probabilistic factor analysis model that uses radial basis functions to define spatially smooth factors. Hierarchical Topographic Factor Analysis (HTFA) extends TFA by assuming that model parameters are drawn from a hierarchical Gaussian prior shared across trials.
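As a notational sketch of the low-rank decomposition described at the start of this section, a single trial can be written as follows. The symbols $Y$, $W$, $F$, $T$, $V$ and $K$ are assumptions introduced here for illustration, since the original notation is not reproduced in this text.

```latex
Y \approx W F, \qquad
Y \in \mathbb{R}^{T \times V}, \quad
W \in \mathbb{R}^{T \times K}, \quad
F \in \mathbb{R}^{K \times V}, \quad
K \ll V .
```

Here $T$ is the number of time points, $V$ the number of voxels, and $K$ the number of factors. Each row of $F$ is one spatial factor, and each column of $W$ is the time course of that factor's activation.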
Concretely, let us consider a dataset comprising a number of trials (i.e. continuous recordings), each of which contains a series of time points for voxels at given spatial positions. TFA defines a probabilistic model that approximates each trial as a product between a matrix of time-varying weights and a matrix of spatially-varying factors. To do so, TFA assumes that the data is noisily sampled from the matrix product of the weights and factors.
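The sampling assumption above can be illustrated with a minimal sketch. It assumes isotropic Gaussian observation noise with a fixed standard deviation; the function name `sample_trial`, the array sizes, and the noise level are hypothetical choices made for this example, not values from the paper.

```python
import numpy as np

def sample_trial(weights, factors, noise_std, rng):
    """Draw one trial Y ~ Normal(W F, noise_std^2), element-wise.

    weights: (T, K) time-varying weights
    factors: (K, V) spatially-varying factors
    """
    mean = weights @ factors  # (T, K) @ (K, V) -> (T, V)
    return mean + noise_std * rng.standard_normal(mean.shape)

rng = np.random.default_rng(0)
T, V, K = 50, 200, 3  # hypothetical sizes: time points, voxels, factors
W = rng.standard_normal((T, K))
F = rng.standard_normal((K, V))
Y = sample_trial(W, F, noise_std=0.1, rng=rng)

# The noiseless signal W F has rank K; the residual is the observation noise.
residual = Y - W @ F
```

The noiseless part of the signal has rank at most `K`, which is what makes the representation low-dimensional.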
TFA combines this likelihood with a prior over the weights and factors, which together define a probabilistic model. TFA performs inference by approximating the posterior with a variational distribution and optimizing its parameters.
TFA assumes a mean for each factor’s weights over time, and defines a hierarchical Gaussian prior in which the weights are distributed around these means.
To model the factors, TFA employs a kernel function that ensures spatial smoothness of factor values at similar voxel positions. This kernel function is normally a radial basis function (RBF), which models each factor as a Gaussian with its center at a spatial location and a width determined by the kernel hyper-parameters. TFA places Gaussian priors over both the positions and widths.
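A minimal sketch of how RBF factors of this kind can be evaluated on a voxel grid is shown below. The exact parameterization (squared distance divided by the exponential of a log-width) is an assumption made for illustration, as are the function name `rbf_factors`, the toy grid, and the chosen centers and widths.

```python
import numpy as np

def rbf_factors(voxel_positions, centers, log_widths):
    """Evaluate K radial basis functions at V voxel positions.

    voxel_positions: (V, 3) spatial coordinates of the voxels
    centers:         (K, 3) factor centers
    log_widths:      (K,)   log-widths of the factors
    Returns a (K, V) matrix of factor values in (0, 1].
    """
    diff = voxel_positions[None, :, :] - centers[:, None, :]  # (K, V, 3)
    sq_dist = np.sum(diff ** 2, axis=-1)                       # (K, V)
    return np.exp(-sq_dist / np.exp(log_widths)[:, None])

# Toy 5 x 5 x 5 grid with 8 mm spacing (hypothetical, for illustration only).
axis = np.arange(0.0, 40.0, 8.0)
grid = np.stack(np.meshgrid(axis, axis, axis, indexing="ij"), axis=-1).reshape(-1, 3)

centers = np.array([[8.0, 8.0, 8.0], [24.0, 24.0, 24.0]])
log_widths = np.log(np.array([64.0, 256.0]))
F = rbf_factors(grid, centers, log_widths)
```

Each row of `F` peaks at its center and decays smoothly with distance, so neighboring voxels receive similar factor values. A factor is described by four numbers (three coordinates and a log-width) regardless of the number of voxels.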
Interpreting factor analysis probabilistically enables us to incorporate additional assumptions in order to capture variation and similarities between multiple sets of trials. HTFA (Manning et al., 2014a, 2018) introduces variables representing each factor’s mean position and width across trials, around which the positions and widths for individual trials are distributed.
This prior is able to model multimodal responses to an extent, in the sense that factor positions and widths for individual trials are allowed to vary relative to a shared mean. However, the Gaussian hyperprior in HTFA assumes that neural responses across trials ought to have a unimodal distribution.
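Schematically, a hierarchical prior of this kind can be written as below. This is a sketch under assumed notation: $x^F_{n,k}$ and $\rho^F_{n,k}$ stand for the center and log-width of factor $k$ in trial $n$, bars denote the shared means, and the variances $\sigma_x^2$ and $\sigma_\rho^2$ are placeholders rather than quantities taken from the paper.

```latex
x^F_{n,k} \sim \mathcal{N}\!\left(\bar{x}^F_k,\; \sigma_x^2 I\right), \qquad
\rho^F_{n,k} \sim \mathcal{N}\!\left(\bar{\rho}^F_k,\; \sigma_\rho^2\right), \qquad
n = 1, \dots, N, \quad k = 1, \dots, K .
```

Because every trial-level parameter is Gaussian around a single shared mean, the distribution of each factor across trials has one mode, which is the limitation noted above.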
3 Neural Topographic Factor Analysis
NTFA extends TFA to model the range of variation across participants and stimuli. We assume exactly the same factor analysis model as TFA, which is to say that we model the fMRI signal as a linear combination of time-dependent weights and spatially varying Gaussian factors. NTFA additionally infers embedding vectors for individual participants and stimuli. We then learn a mapping from embeddings to the parameters of the likelihood model, parameterized by a neural network. This replaces the unimodal Gaussian hyperprior in HTFA with a deep generative model, and incorporates a mechanism for parameter sharing across trials.
We model trials in which participants undergo a set of stimuli and are scanned for a number of time points. We assume that participant embeddings and stimulus embeddings are shared across trials. For simplicity, we will assume that both embeddings have the same dimensionality and are distributed according to a Gaussian prior.
For each participant, we define the RBF center and log-width in terms of a neural mapping from the participant’s embedding.
Here the mapping is a neural network parameterized by a set of weights, which models how variations between participants and stimuli affect the factor positions and widths in brain activations. We similarly assume that a second neural network parameterizes the distribution over weights. For each trial and time point, the embeddings of the trial’s participant and stimulus determine the distribution over weights.
The likelihood model is exactly the same as in TFA.
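The generative process described in this section can be sketched as follows. This is a simplified illustration, not the authors' implementation: the network architecture (one hidden layer with `tanh`), all sizes, and the noise level are assumptions, and for brevity the factor network outputs point values for centers and log-widths rather than the parameters of a distribution over them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: participants, stimuli, embedding dim, factors, time points, voxels.
P, S, D, K, T, V = 4, 3, 2, 5, 20, 100
H = 8  # hidden units per network (assumed)

def init_mlp(n_in, n_hidden, n_out):
    """Randomly initialise a one-hidden-layer network."""
    return {
        "W1": 0.5 * rng.standard_normal((n_in, n_hidden)), "b1": np.zeros(n_hidden),
        "W2": 0.5 * rng.standard_normal((n_hidden, n_out)), "b2": np.zeros(n_out),
    }

def mlp(params, x):
    hidden = np.tanh(x @ params["W1"] + params["b1"])
    return hidden @ params["W2"] + params["b2"]

# Maps a participant embedding to K centers (3 coordinates each) and K log-widths.
factor_net = init_mlp(D, H, K * 4)
# Maps a (participant, stimulus) embedding pair to K weight means and K weight log-stds.
weight_net = init_mlp(2 * D, H, K * 2)

# Embeddings are drawn from a standard Gaussian prior and shared across trials.
z_p = rng.standard_normal((P, D))
z_s = rng.standard_normal((S, D))

voxels = rng.uniform(0.0, 10.0, size=(V, 3))  # toy voxel coordinates

def generate_trial(p, s, noise_std=0.1):
    """Sample one trial for participant p under stimulus s."""
    out = mlp(factor_net, z_p[p]).reshape(K, 4)
    centers, log_widths = out[:, :3], out[:, 3]
    sq_dist = ((voxels[None] - centers[:, None]) ** 2).sum(-1)  # (K, V)
    factors = np.exp(-sq_dist / np.exp(log_widths)[:, None])     # RBF factors

    w_out = mlp(weight_net, np.concatenate([z_p[p], z_s[s]])).reshape(2, K)
    w_mean, w_std = w_out[0], np.exp(w_out[1])
    weights = w_mean + w_std * rng.standard_normal((T, K))       # (T, K)

    mean = weights @ factors                                     # same likelihood as TFA
    return mean + noise_std * rng.standard_normal(mean.shape)

Y = generate_trial(p=0, s=1)
```

In this sketch, the only trial-specific inputs are the indices of the participant and the stimulus. Everything else (network weights and embeddings) is shared, which is the mechanism for parameter sharing across trials.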
We summarize the generative model for NTFA in Algorithm 1. This model defines a joint density over the data and the latent variables, and hence a posterior distribution over the latent variables conditioned on the data. We approximate the posterior with a fully-factorized variational distribution.
We learn the parameters of the generative model and of the variational distribution by maximizing the evidence lower bound (ELBO) with a doubly-reparameterized gradient estimator (Tucker et al., 2019).
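For reference, the standard form of the ELBO is given below. The notation is assumed for illustration: $Y$ denotes the data, $Z$ collects all latent variables (embeddings, weights, factor centers and log-widths), $\theta$ the parameters of the generative model, and $\lambda$ the variational parameters.

```latex
\mathcal{L}(\theta, \lambda)
= \mathbb{E}_{q_\lambda(Z)}\!\left[ \log \frac{p_\theta(Y, Z)}{q_\lambda(Z)} \right]
= \log p_\theta(Y) - \mathrm{KL}\!\left( q_\lambda(Z) \,\middle\|\, p_\theta(Z \mid Y) \right)
\le \log p_\theta(Y) .
```

Maximizing the bound with respect to $\lambda$ tightens the approximation to the posterior, while maximizing it with respect to $\theta$ fits the generative model to the data.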
The advantage of incorporating neural networks into the generative model is that it enables us to explicitly reason about multimodal response distributions and effects that vary between individual samples. The network weights are shared across trials, as are the stimulus and participant embeddings. This allows NTFA to capture statistical regularities within a whole experiment. At the same time, the use of neural networks ensures that differences in embeddings can be mapped onto a wide range of spatial and temporal responses. Whereas the hierarchical Gaussian priors in HTFA implicitly assume that response distributions are unimodal and uncorrelated across different factors, the neural network in NTFA is able to model such correlations by jointly predicting all factors.
While neural network models can have thousands or even millions of parameters, we emphasize that NTFA in fact has a lower number of trainable parameters than HTFA. TFA and HTFA assume fully-factorized variational distributions, requiring learned parameters for trials with time points. In NTFA, the networks and will have parameters each, whereas the variational distribution will have parameters.
In practice, scanning time limitations impose a trade-off between and (the number of trials in a scanning run, and the length of a trial). For this reason does not always dominate , since often . This means that we can obtain a lower-dimensional variational parameterization by choosing embedding dimensions that are smaller than the number of factors , and letting the neural networks model the interaction between participants and stimuli . As summarized in Table 1, NTFA can have orders of magnitude fewer parameters when as compared to HTFA.
Table 1: |Dataset|HTFA parameters|NTFA parameters|HTFA log-likelihood|NTFA log-likelihood|
We evaluate NTFA on four datasets, setting . First, we verify that NTFA can recover a ground-truth embedding structure that, by construction, contains clearly distinguishable participant and stimulus clusters. We then verify that NTFA can reconstruct the publicly available “Pie Man” dataset used to evaluate previous fMRI analysis models (Anderson et al., 2016). We then present analysis on a publicly available dataset with more than one stimulus in each stimulus category, and finally on an in-house dataset, a pilot study pertaining to the subjective experience of fear. We present embedding results from these datasets analyzed without their resting-state trials. These experimental datasets vary in several qualities including the number of participants, time points, and voxels, and also task variables, testing how NTFA performs in a variety of experimental contexts.
Synthetic Data: To demonstrate that, in addition to accurate reconstructions, NTFA can also learn meaningful embeddings, we created a synthetic fMRI dataset. This dataset consists of three participant groups of five participants each, which we call Group 1, Group 2 and Group 3. All participants underwent two categories of hypothetical stimuli, called Baseline and Task, with five stimuli within each category. Each participant underwent a single hypothetical scanning run with rest trials interleaved between stimuli. We manually defined three distinct factors in a standard MNI_152_8mm brain. We then sampled participant embeddings and stimulus embeddings from mixtures of three and two distinct Gaussians, respectively (Figure 3). We set the means for these Gaussians so that when we noisily combined them, the following conditions would be met:
All participants show no whole-brain response during rest besides random noise.
Under Baseline stimuli, Group 1 exhibits half the response in the first region as compared to under Task stimuli, on average. The rest of the brain shows no response. Similarly, Group 2 and Group 3 exhibit a response in the second and third regions respectively, while the rest of the brain shows no response.
Each Baseline/Task stimulus provokes a response lower or higher than the stimulus category’s average based on the location of the particular stimulus embedding.
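The sampling of synthetic embeddings described above can be sketched as follows. The cluster means, the standard deviation, and the two-dimensional embedding space are hypothetical values chosen for this example; they are not the values used to generate the dataset in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical cluster means in a 2-D embedding space (not the paper's values).
participant_means = np.array([[-2.0, 0.0], [0.0, 2.0], [2.0, 0.0]])  # Groups 1-3
stimulus_means = np.array([[-1.0, -1.0], [1.0, 1.0]])                 # Baseline, Task
cluster_std = 0.2                                                      # assumed spread

def sample_mixture(means, n_per_cluster, std):
    """Draw n_per_cluster points around each mean; return points and cluster labels."""
    labels = np.repeat(np.arange(len(means)), n_per_cluster)
    noise = std * rng.standard_normal((labels.size, means.shape[1]))
    return means[labels] + noise, labels

# Three groups of five participants, two categories of five stimuli.
z_participants, group = sample_mixture(participant_means, 5, cluster_std)
z_stimuli, category = sample_mixture(stimulus_means, 5, cluster_std)
```

Within a cluster, the offset of an individual stimulus embedding from its category mean is what would make that stimulus provoke a response lower or higher than the category average.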
“Pie Man” Narrative Listening (Simony et al., 2016): In a between-subjects design, participants were assigned to one of four experimental stimulus categories that varied in the amount of structured narrative content that was presented. Participants either listened to an intact audio recording of a story (N = 36), or the same recording with paragraphs scrambled (N = 18) or with words scrambled (N = 25). A fourth group of participants was not presented with any recordings (N = 36). The fMRI data had 61,367 voxels and 300 time points for each scanning run. The narrative, entitled Pie Man, was presented at a story-telling event organized by The Moth. The full dataset is available online (http://arks.princeton.edu/ark:/88435/dsp015d86p269k).
Emotional Musical and Nonmusical Stimuli in Depression (Lepping et al., 2016): 19 participants with major depressive disorder and 20 control participants (N = 39) underwent emotional musical and nonmusical stimuli to examine how neural processing of emotionally provocative auditory stimuli is altered within the anterior cingulate cortex and striatum in depression. The fMRI data had 353,600 voxels and 105 time points for each scanning run. The dataset is also available online from the OpenfMRI database (accession number ds000171). We use the shorthand Depression to refer to this data.
The Fear and Affective Videos “AffVids” Dataset: This is a dataset for a pilot study that we have carried out in-house. A total of 22 participants watched videos depicting fear-related content and rated affective and emotional impact after each clip. The stimuli consisted of 36 videos, separated into three fear-related content situations (spiders, heights, and social situations), each clip lasting 20 seconds. Participants provided subjective experience ratings (e.g. how much fear they felt) after each clip. The fMRI data contained 81,638 voxels and 1656 time points per scanning run.
In addition to , we set for analysis of the three real datasets.
We compare NTFA to HTFA in terms of its ability to reconstruct the data, before examining embeddings. For each dataset, we report the log-likelihood as a measure of reconstruction performance and the number of learned parameters to show model complexity.
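As an illustration of how a log-likelihood can serve as a reconstruction score, the sketch below evaluates the Gaussian log-density of the data around a reconstruction. It assumes isotropic Gaussian noise with a known standard deviation; the function name and the toy data are hypothetical, and the exact evaluation protocol used in the paper (for example, how posterior samples are combined) is not specified in this text.

```python
import numpy as np

def gaussian_log_likelihood(data, reconstruction, noise_std):
    """Sum over all entries of log Normal(data | reconstruction, noise_std^2)."""
    resid = data - reconstruction
    n = data.size
    return (-0.5 * n * np.log(2.0 * np.pi * noise_std ** 2)
            - 0.5 * np.sum(resid ** 2) / noise_std ** 2)

rng = np.random.default_rng(0)
data = rng.standard_normal((10, 20))   # toy "trial": 10 time points, 20 voxels

ll_good = gaussian_log_likelihood(data, data.copy(), noise_std=1.0)          # perfect fit
ll_bad = gaussian_log_likelihood(data, np.zeros_like(data), noise_std=1.0)   # all-zero fit
```

A reconstruction that is closer to the data yields a higher log-likelihood, so `ll_good` exceeds `ll_bad` here. Comparing models at the same number of factors keeps the comparison focused on how well each model uses that capacity.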
Across datasets, NTFA exhibits a higher log-likelihood than HTFA (Table 1), which is representative of the current state of the art, with the same number of latent factors. Example reconstructions from the Depression and AffVids datasets are shown in Figure 2.
Synthetic Data: For synthetic data, NTFA recovers stimulus and participant embeddings that are qualitatively very similar to the embeddings that we used to generate the data (Figure 3). We emphasize that embeddings are learned directly from the synthetic data in an entirely unsupervised manner, which means that there is in principle no reason that we would expect embeddings to be exactly the same. However, we do observe that learned embeddings for participants and stimuli are well-separated, appear to have some variance, and are invariant under linear transformations. Moreover, given the “true” number of factors, NTFA is able to reconstruct synthetic data better than HTFA.
The “Pie Man” Narrative Listening dataset: The “Pie Man” dataset (Simony et al., 2016) has previously been used to evaluate TFA and HTFA (Manning et al., 2018). In this study, each participant underwent one trial with one stimulus in each scanning run, so only stimulus embeddings were shared across trials. This does not seem to have reduced NTFA’s reconstruction performance.
Emotional Musical and Nonmusical Stimuli in Depression dataset: On this dataset, NTFA similarly achieved better reconstruction performance than HTFA. An example reconstruction for this dataset can be seen in Figure 2. In analysis without resting-state data, stimulus embeddings display a gradation from tonal/musical to atonal/nonmusical, and participant embeddings show a gradation from control to major depressive participants (Figure 4). It is pertinent to mention that the labels in the figures are used for visualization purposes only; NTFA discovers this structure with no supervision.
Figure 5. Left: Participant embeddings arranged themselves into a cluster with greater self-reported fear across all stimuli, a smaller triad at the bottom reporting less fear across stimuli, and several outliers whose experience varies between stimuli. A lower value (cooler color) indicates a lower mean fear rating, while a higher value (warmer color) indicates a higher mean fear rating. Right: Stimulus embeddings recovered groups of fear stimuli corresponding to heights, spiders, and social threats. The overlap may reflect intentionally low fear intensity on the part of the stimulus designers, leading to decreased fear response and separability.
Fear and Affective Videos dataset: On the AffVids dataset from our in-house pilot study, NTFA achieves visibly better reconstructions than HTFA, as can be seen in Figure 2. In an analysis without resting-state data, NTFA uncovered stimulus embeddings that clustered by stimulus category (Figure 5). The participant embeddings uncovered a group more afraid across stimuli, a group less afraid across stimuli, and a group whose fear intensity varied between stimuli (Figure 5). Participants were not recruited in specific groups (e.g. arachnophobes and acrophobes), and stimuli could be categorized in multiple ways (e.g. by kind or degree).
We have introduced Neural Topographic Factor Analysis (NTFA), an unsupervised model of spatio-temporal fMRI data which learns low-dimensional embeddings for participants and stimuli. We demonstrated that NTFA can recover ground-truth embedding clusters in synthetically generated data, and that it can reconstruct three datasets of real fMRI data better than the state-of-the-art using as few hidden factors as . NTFA attains higher log-likelihood than HTFA across data sets, for the same number of latent factors. When we set , NTFA learned embeddings from the real fMRI datasets that appeared to vary in a neuroscientifically meaningful way. This suggests that NTFA captures meaningful aspects of the underlying data.
Contrary to the expectation that superior reconstruction requires a larger model, we have seen that NTFA requires fewer parameters. In the case of the AffVids dataset, NTFA required on the order of 8.88 million parameters, as opposed to HTFA’s 164 million parameters, a roughly 18-fold reduction that is an advantage in compressing and representing large datasets. This advantage is replicated in datasets with many more trials than participants, as shown in Table 1.
NTFA provides a path towards a more data-driven, discovery-oriented approach to investigating when neural activity varies across participants and stimuli, and when it remains relatively similar. Future work will be able to follow up on these initial findings by exploring how various parameter choices might improve sensitivity to such relationships.
Practically, these techniques reflect a growing and urgent need for scalable analysis approaches which can handle large datasets. That is, there are numerous nationwide and international efforts to collect large fMRI datasets of hundreds or thousands of participants performing numerous cognitive tasks. Traditional analyses which do not efficiently compress the underlying data are not suited to guide inferences performed across such large samples. The approach we describe here is ideally suited for such large datasets and might potentially be able to capture the meaningful aspects of the data in a way that is ultimately interpretable by clinicians and researchers alike.
The authors would like to thank Jeremy Manning for insightful conversations during his visit. This work was generously supported by Intel, startup funds from Northeastern University, the National Science Foundation (NCS 1835309), and the US Army Research Institute for the Behavioral and Social Sciences (ARI W911NF-16-1-0191).
- Abdi and Williams (2010) Hervé Abdi and Lynne J. Williams. Principal component analysis. Wiley interdisciplinary reviews: computational statistics, 2(4):433–459, 2010.
- Anderson et al. (2016) Michael J. Anderson, Mihai Capota, Javier S. Turek, Xia Zhu, Theodore L. Willke, Yida Wang, Po Hsuan Chen, Jeremy R. Manning, Peter J. Ramadge, and Kenneth A. Norman. Enabling factor analysis on thousand-subject neuroimaging datasets. Proceedings - 2016 IEEE International Conference on Big Data, Big Data 2016, pages 1151–1160, 2016. doi: 10.1109/BigData.2016.7840719.
- Burda et al. (2016) Yuri Burda, Roger Grosse, and Ruslan Salakhutdinov. Importance Weighted Autoencoders. In International Conference on Learning Representations, 2016.
- Gershman et al. (2011) Samuel J Gershman, David M Blei, Francisco Pereira, and Kenneth A Norman. A topographic latent source model for fmri data. NeuroImage, 57(1):89–100, 2011.
- Haxby et al. (2011) James V. Haxby, J. Swaroop Guntupalli, Andrew C. Connolly, Yaroslav O. Halchenko, Bryan R. Conroy, M. Ida Gobbini, Michael Hanke, and Peter J. Ramadge. A common, high-dimensional model of the representational space in human ventral temporal cortex. Neuron, 72(2):404–416, 2011.
- Hyvärinen et al. (2001) Aapo Hyvärinen, Juha Karhunen, and Erkki Oja. Independent Component Analysis. Wiley Online Library, 2001.
- Lepping et al. (2016) Rebecca J. Lepping, Ruth Ann Atchley, Evangelia Chrysikou, Laura E. Martin, Alicia A. Clair, Rick E. Ingram, W. Kyle Simmons, and Cary R. Savage. Neural processing of emotional musical and nonmusical stimuli in depression. PLOS ONE, 11(6):1–23, 06 2016. doi: 10.1371/journal.pone.0156859. URL https://doi.org/10.1371/journal.pone.0156859.
- Manning et al. (2014a) Jeremy R. Manning, Rajesh Ranganath, Waitsang Keung, Nicholas B. Turk-Browne, Jonathan D. Cohen, Kenneth A. Norman, and David M. Blei. Hierarchical topographic factor analysis. In Pattern Recognition in Neuroimaging, 2014 International Workshop On, pages 1–4. IEEE, 2014a.
- Manning et al. (2014b) Jeremy R. Manning, Rajesh Ranganath, Kenneth A. Norman, and David M. Blei. Topographic Factor Analysis: A Bayesian Model for Inferring Brain Networks from Neural Data. PLOS ONE, 9(5):e94914, May 2014b. ISSN 1932-6203. doi: 10.1371/journal.pone.0094914.
- Manning et al. (2018) Jeremy R Manning, Xia Zhu, Theodore L Willke, Rajesh Ranganath, Kimberly Stachenfeld, Uri Hasson, David M Blei, and Kenneth A Norman. A probabilistic approach to discovering dynamic full-brain functional connectivity patterns. NeuroImage, 2018.
- Narayanaswamy et al. (2017) Siddharth Narayanaswamy, T. Brooks Paige, Jan-Willem van de Meent, Alban Desmaison, Noah Goodman, Pushmeet Kohli, Frank Wood, and Philip Torr. Learning Disentangled Representations with Semi-Supervised Deep Generative Models. In I. Guyon, U. V. Luxburg, S. Bengio, H. Wallach, R. Fergus, S. Vishwanathan, and R. Garnett, editors, Advances in Neural Information Processing Systems 30, pages 5927–5937. Curran Associates, Inc., 2017.
- Ranganath et al. (2014) Rajesh Ranganath, Sean Gerrish, and David M Blei. Black Box Variational Inference. Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, 33:814–822, 2014. ISSN 15337928. URL http://proceedings.mlr.press/v33/ranganath14.html.
- Simony et al. (2016) Erez Simony, Christopher J Honey, Janice Chen, Olga Lositsky, Yaara Yeshurun, Ami Wiesel, and Uri Hasson. Dynamic reconfiguration of the default mode network during narrative comprehension. Nature communications, 7:12141, 2016.
- Tucker et al. (2019) George Tucker, Dieterich Lawson, Shixiang Gu, and Chris J. Maddison. Doubly Reparameterized Gradient Estimators for Monte Carlo Objectives. In International Conference on Learning Representations, number 1, pages 1–12, 2019. URL http://arxiv.org/abs/1810.04152.