I Introduction
Ia Context
Mental disorders are studied through multiple neuroimaging techniques, but restingstate functional MRI (rsfMRI) has become increasingly important because it allows researchers to image the functional dynamics of a subject’s resting brain over time. Progress in the understanding of mental disorders from rsfMRI data has largely been achieved through linear representation learning techniques, such as independent component analysis (ICA)
[1]. These techniques have opened up the ability to study largescale functional connectivity differences in complex mental disorders such as autism spectrum disorder (ASD) and schizophrenia, either statically or dynamically [2]. The success of linear representation learning and the increased use of deep learning methods in the field of neuroimaging paves the way towards analyzing these functional differences using deep learning techniques. Findings obtained with deep learning analyses can be complemented with previous research and linear representation learning to move towards individualized predictions [3], a better understanding of mental disorders, and more effective individualized treatment.Deep learning methods in fields other than rsfMRI analysis are often applied to minimally processed data. An important side note here is that these nonfMRI datasets are often also one or two magnitudes larger than many rsfMRI datasets. The application of deep learning techniques to rsfMRI data generally involves a dimensionality reduction step after which a supervised neural network is trained to perform classification. Supervised classification is attractive because neural networks gained attention for their outstanding classification performance, most famously on ImageNet
[4]. The nonlinearities that neural networks can model are likely interesting in our understanding of mental disorders as well. A 3dimensional adaptation of supervised convolutional neural networks was able to obtain robust discriminative neuroimaging biomarkers
[5].The methods that are used to perform dimensionality reduction, like independent component analysis (ICA) and principal component analysis (PCA), are unsupervised however. These methods assume that the data is generated through some unseen factors. In case of ICA, these factors are often referred to as intrinsic networks because they are spatially independent and interpretable as separate localized functional networks in the brain. Datadriven methods that are used to find these generative factors, like ICA and PCA, have been extended to restricted Boltzmann machines (RBMs)
[6], which is a precursor to contemporary unsupervised deep learning models like variational autoencoders (VAEs) [7]. VAEs have recently been popularized for nonrsfMRI data and gained attention due to their interpretable latent space and ability to variationally learn generative factors that fit a certain prior. Previous work evaluates representation learning with VAEs on rsfMRI data that has first undergone dimensionality reduction [8, 9, 10, 11]. These dimensionality reductions may incur overly specific inductive biases and, as a result, limit the expressivity of deep learning methods, especially since neural networks are considered universal function approximators. [12].IB Problem statement
This work looks at whether unsupervised deep learning methods can learn informative representations from minimally processed voxelwise rsfMRI data that has not undergone spatial dimensionality reduction(s). Given the high dimensionality of rsfMRI data, it is useful to be able to reduce the data dimensionality in such a way that we retain information about the variables we want to study. This work, therefore, evaluates the validity and usefulness of the proposed nonlinear dimensionality reduction by evaluating the model’s latent representations on downstream age regression and sex prediction tasks. Due to its success as an unsupervised representation learning technique, this work uses a variational autoencoder (VAE) [7]. VAEs learns to maximize the lower bound on the marginal likelihood of the training data.
One of the main reasons that this work considers the age regression and sex prediction task as the main downstream tasks, is the availability of a large rsfMRI dataset that records both demographic factors. In this work, we find that large datasets can be important for representation learning from minimally processed rsfMRI data. The introduced method is also evaluated on a schizophrenia diagnosis prediction task, with and without first pretraining the model on the larger age regression and sex prediction dataset to study whether pretraining improves downstream performance [14, 15]. To evaluate the effect of the dimensionality of the representations on downstream task performance, the age and sex prediction tasks are performed with representations of varying sizes. These results are compared to a linear baseline that performs principal component analysis (PCA) with the same varying number of components.
Ii Related work
Some interesting prior work has used variational autoencoders to analyze rsfMRI data. Such work has focused on modeling functional brain networks and ADHD identification [16], representation learning [10], automatically clustering connectivity patterns [8], schizophrenia, bipolar disorder and autism spectrum disorder classification [9], and spatiotemporal trajectory identification [11]. It is important to note however that each of these methods performs spatial and sometimes temporal dimensionality reduction before they use the data as input for their VAE. The nonlinear dimensionality reduction on voxelwise rsfMRI data, that we propose, may however lead to larger gains in terms of meaningful information retention. Using minimally preprocessed data is often seen as the norm in other deep learning fields because it simplifies the data acquisition process and does not reduce any of the potential information in the data.
Iii Variational Autoencoders
VAEs have become an important model architecture for unsupervised representation learning. Variational methods are a way of describing an unknown probability density function that is hard to sample from and/or approximate, byways of parametrizing a simpler probability distribution, such as a Gaussian, in such a way that it can approximate the unknown distribution
[17]. VAEs provide a neural networkbased autoencoder perspective to this problem. Since VAEs are generative models, their goal is to find the underlying latent factors that generate the data. This ties back to the assumption in methods like independent component analysis (ICA) and principal component analysis (PCA), that the dimensionality of the latent factors is smaller than the original dimensionality of the data. Both of these methods are often used to reduce the dimensionality of rsfMRI data.The problem can formally be set as having an rsfMRI dataset with T timepoints and N subjects. Each is generated from an unknown conditional distribution , where is assumed to be a random unseen continuousvalued variable sampled from a prior distribution [7]
. Both the prior distribution and the conditional distribution are unknown and the integral over the marginal probability of
: is therefore intractable. Bayesian variational methods can be used to tackle this problem and VAEs have become a common nonlinear method to do so [7].VAEs approximate the intractable posterior distribution using a recognition model, parameterized as a neural network, [7], which can be thought of as an encoder from a coding theory perspective. The conditional distribution , also parameterized as a neural network, can then be thought of as a decoder. This allows us to perform optimization using the evidence lower bound (ELBO), which is a lower bound on the marginal likelihood of a data point and is explained in detail in the original VAE paper [7]
. The objective function, with which the VAE is trained, is the average over all of the data points in a dataset. The loss can then be split into two parts, the first part minimizes the KLdivergence between an apriori selected prior and the distribution that is parameterized by the encoder. Although there is no clear consensus, rsfMRI data is often seen as or assumed to be normally distributed
[18]. The prior () we thus choose is a diagonal multivariate normal distribution, where each of the dimensions of the normal distribution does not explicitly depend on each other. The second part of then maximizes the loglikelihood of a data pointgiven an estimated latent variable
. Although each rsfMRI time point is assumed to be an i.i.d. sample when training the VAE, the temporal relation between the latent variables for a single subject () is considered during each prediction task.Iv Data
This work uses multiple datasets, one large dataset to do age regression and sex prediction with. The other dataset is used to perform the schizophrenia diagnosis prediction task on.
Iva Age and sex dataset
The rsfMRI data that is used for age and sex prediction are subjects without any diagnoses or selfreported illnesses (). These subjects were selected from the 22,392 subjects that were available in the UK Biobank repository on April 7th, 2019 [19]. The subjects have a mean age of
with a standard deviation of
, are female. The youngest subject is 45 years old and the oldest is 80. The scanning parameters are explained in greater depth in the original UK Biobank paper [19], however, an important parameter in this work is the repetition time (TR = seconds). With an acquisition time of minutes, the UK Biobank data acquires a total of 490 time points. The data is minimally preprocessed with the Melodic pipeline [19] and registered to the MNI EPI template with the help of FMRIB’s Linear Image Registration Tool (FLIRT). The registration is followed by normalization in SPM12, after which it is smoothed with a mm wide FWHM Gaussian kernel. This results in rsfMRI volumes with a size of x x voxels, and timepoints per subject. The size of the volumes and the number of timepoints lead to large memory requirements during training and rsfMRI data can be noisy. To tackle both problems simultaneously, we use a piecewise aggregate approximation (PAA) to reduce the noise and memory consumption, while still keeping the trend of the time series. PAA takes the average over points in consecutive windows with a certain window size. For the UK Biobank dataset, the window size is set to , which is equivalent to taking the average over a period of about seconds. This reduces the number of time points to .IvB Fbirn
The dataset that is used to evaluate whether meaningful information about a mental illness is retained in the representations is the FBIRN study [20]. The dataset is processed using the NeuroMark preprocessing pipeline [21] to obtain rsfMRI volumes with a size of x x voxels. The repetition time for FBIRN is seconds. To stay in line with the temporal preprocessing that is done for the UK Biobank dataset, we apply PAA to FBIRN as well, but to account for the different repetition times, the window size that is used is . This corresponds to a period of seconds.
V Methodology
The downstream task performances will be compared to a linear baseline method. There is not much previous work that looks at voxelwise rsfMRI representation learning, although other work focuses on supervised sex classification and age regression on the same dataset used in this work [13]. Supervised methods are however not a comparable baseline, although it is insightful as a bound to strive towards. A linear dimensionality reduction method that is in some sense comparable to VAE in terms of what components it tends to learn is principal component analysis (PCA) [22]. Given that PCA and a VAE are comparable and that there are online versions of PCA [23] that do not have large memory requirements, the linear baseline used in this work is IncrementalPCA [23, 24].
Since one dataset is significantly smaller and is prone to overfitting, this work also evaluates whether a model that is pretrained on a large dataset like UK Biobank (n=) can be finetuned on FBIRN (n=) to improve representations for schizophrenia diagnosis prediction. We compare pretrained models that have been finetuned for a variable number of epochs on the FBIRN dataset.
Va Model architecture
The architecture of the model is based on a ResNet [25]. Each residual block in the encoder and decoder has a skip connection. These skip connections allow the network to learn longer dependencies and have been used in VAEs before [26] to improve their variational inference. The activations that are used in the network are exponential linear units (ELUs) [27]
. Instead of batch normalization, the network uses weight normalization
[28]. Both the activation function and weight normalization are considered best practices when training VAEs
[28]. Batch normalization may lead to drift during inference in a VAE which can cause unstable results.The encoder consists of five residual blocks, with , , , , and output channels for each block, respectively. These blocks all downscale their original inputs by two until the last residual block produces an output feature map of x x x , which is flattened to
features. These features are then used to learn the mean and variance of the multivariate Gaussian in the latent space from which latent variable
is sampled. The layer computes the square root of the natural logarithm of the variance instead of the standard deviation to increase the stability of training the network and to make sure variations due to gradient updates have a smaller effect on standard deviations near zero. This allows the network to model the standard deviations near zero more accurately. The first layer in the decoder is a linear layer that maps latent variable to features, which are then reshaped to a x x xfeature map. We use trilinear interpolations on the feature maps with a scale of two to double the size. The rest of the decoder consists of five residual blocks, where each residual block is preceded by a trilinear interpolation layer. The final layer is a
x xtranspose convolutional layer, without an activation function. In earlier iterations of this work we tried to use a sigmoid activation on the last layer, this leads to instability, because the combination of the mean squared error (MSE) as a loss function and a sigmoid leads to a nonconvex objective function. Each of the layers is initialized according to work that proposed ResNets
[25], this initialization is also used in the original ELU paper [27]. The VAE is trained for 100 epochs using the ADAM optimizer [29] with a learning rate of . Before the input data is used it is first rescaled to be between [0, 1], values below are then thresholded to remove possible background noise.VB Regression and classification
After training the VAE, there are multiple ways to evaluate what information is contained in the representations (
. To evaluate whether the temporal information improves classification and regression with simple machine learning classifiers, these classifiers are trained with a subject’s latent temporal average
and also with a subject’s concatenated latent time series. The machine learning classifiers that are used in this work are a support vector machine (SVM) and a knearest neighbor classifier (kNN) for the classification tasks and a support vector regression (SVR) and kneighbor regressor (kNR) for the age regression task. These classifiers give us insight into the linear separability of the representations (SVM) or how well they are clustered (kNN). To take the temporal information between the representations into account more specifically, we also train a longshort term memory (LSTM)
[30] on the full latent time series. The LSTM is either trained with a mean squared error (MSE) for the regression task or a binary crossentropy (BCE) loss for the classification task. The hidden size for the hidden states in the LSTM () are twice the size of the input representations, and all of the hidden states in the LSTM are concatenated together to form a feature vector that is then mapped to a prediction using a linear layer. This allows the model to learn from the hidden state at each timestep more directly. The size of that feature vector can be quite big, so we apply dropout to that last linear layer. This is a common technique to counter overfitting and promote a more robust prediction model [31].VC Evaluation measures
To be able to compare the results obtained using each of the methods, we use multiple evaluation measures. The first measure is used for the classification tasks and computes the area under the receiver operating characteristic (ROCAUC), which is a more complete way of comparing binary classifiers. To evaluate the regression task, we use three measures, the first is the mean average error (MAE) which is the L1norm between the predicted age and the correct age. The second measure is the R2score, which is also referred to as the coefficient of determination. The last measure is the Pearson productmoment correlation between the predicted ages and the true ages.
Vi Experiments
The code for the VAE was implemented by the authors in PyTorch
[32], training was performed with Catalyst [33] and TorchIO [34], and the regression and classification pipelines were implemented using RAPIDSAI [35], scikitlearn [24], and NumPy [36]. To minimize costly transfers between the CPU and the GPU, most of the classifications were done using RAPIDSAI [35] , to make sure the computed representations could be kept in GPU memory without any copies or transfers from or to the CPU. All of the experiments were performed on an NVIDIA DGX1 V100. Due to time restrictions, the UK Biobank experiments could only be performed on one train and test split, because each VAE epoch takes around 45 minutes on a single GPU. Training on multiple training and test folds is essential for the schizophrenia diagnosis prediction task because the variance between predictions can be large, especially for deep learning models. The schizophrenia results are thus trained over 5folds, where one fold is used as the heldout set and the other four folds are used as a training and validation set. To make sure the model does not overfit, we use an early stopping criterion that stops the model if its loss objective has not improved on the validation set for 20 epochs. Further, instead of taking the that is sampled from the distribution that the encoder outputs, we use the mean of that distribution. This is to reduce the stochasticity during inference. The reason we do not use the standard deviation is that it did not improve our preliminary results.To determine the effect that the size of the representations has on the performance of the age regression and sex classification tasks, the model is trained with multiple latent dimensionalities, specifically: , , , and . These tests are done on the UK Biobank dataset because we noticed that training on a larger dataset is more stable.
We tested whether initializing a model for the schizophrenia classification task with a pretrained model on UK Biobank improves the results on that task. The number of latent dimensions that are used is . The finetuning is evaluated for , , , , , , and epochs and compared to a nonpretrained model, and the baseline.
The baseline for this work is IncrementalPCA, which is implemented in scikitlearn [24]. Similar to the VAE, the principal components are obtained for each volume in a subject’s time series independently. The components are whitened and temporally averaged. They are then used as input to the downstream classifiers/regressors.
Vii Results
The age and sex downstream tasks are evaluated for multiple latent dimensionalities and compared with a baseline PCA that has the same number of components. The classification methods are referred to as SVM/SVR and kNN/kNR when the representations for each timestep are concatenated to create a single feature vector (, …, ) for each subject. Classifiers mSVM/mSVR and mkNN/mkNR take the average representation over the timesteps as input for each subject. The dimensional VAE did not converge well which led to worse results for that specific model. It is also important to recognize that the representations used to perform age regression are the same as the ones used to do sex classification. The model is thus able to retain information about both the sex classification and age regression tasks, something a supervised model would have to specifically be trained for with multitask learning.
Viia Age regression results
The age regression task is evaluated using three measures, the mean absolute error (MAE) is seen as most important in this work, the results for the task are shown in Figure 1. All of the VAE models, even the nonconverging dimensional VAE, outperforms the baseline PCA method. The best performing model is the dimensional VAESVR with an MAE of years, an R2 score of , and a correlation between the predicted and ground truth ages of . The general trend for the number of latent dimensions is that more latent dimensions improve the downstream performance on the age regression task. The difference between the dimensional VAE and the dimensional VAE is however significantly smaller than between the two smaller latent dimensionalities. Another interesting result is that the SVR and mSVR always outperform the kNR and mkNR, which suggests that the latent space is linearly separable for the age regression task, as opposed to being clustered based on age. Furthermore, the SVR and mSVR also outperform the LSTM. The LSTM is a sequential method and is therefore good at representing temporal data, it may struggle with data where temporal relations do not aid in regression improvements. The LSTM does perform well on the correlation between the predicted and ground truth ages. The lower performance of the LSTM and the near equivalent performance of the mSVR and SVR may imply that there is no (linear) temporal relationship between age and the VAE’s representations.
Comparatively, previous work [13] reports that on the same dataset, a voxelwise supervised deep learning model achieves an MAE of years, an R2 score of , and a correlation of . It is important to note that the VAE model in this work received no supervised signal to model the features necessary to achieve its downstream age regression results. Further, the same previous work [13] finds that using the ICA time series achieves an MAE of , highlighting the benefits of using our proposed dimensionality reduction because it outperforms the ICA time series. Further, they find that taking the mean over the temporal dimension before using the input for the classification task improves performance, which seems to be the same in this work.
ViiB Sex classification results
The results in Figure 2 show that the age classification task for this dataset is fairly trivial and the baseline performs only slightly worse than the VAESVM and VAEmSVM. The baseline outperforms the dimensional VAE because it did not converge well.
All of the models improve with increasing latent dimensionality, although the model only slightly improves with more than dimensions. Interestingly, in contrast to the age regression results, the LSTM performs only slightly worse than the mSVM and SVM. The mkNN and kNN are always outperformed by the mSVM, SVM, and LSTM. The best performing model is the dimensional VAEmSVM with an ROCAUC of . In general, except for the dimensional VAE, the mSVM outperforms the SVM, which suggests that the temporal information for each subject does not help with the linear separability of sex in the latent space. The differences between the SVM and the mSVM are likely insignificant in terms of determining which one is better, but the SVM does not seem to have any performance gain over the mSVM.
ViiC Schizophrenia classification results
The downstream schizophrenia diagnosis classification task is evaluated for a pretrained (PT) and a nonpretrained model (NPT). The pretrained model is the same for every run but is finetuned on the schizophrenia diagnosis prediction task for a variable number of epochs. It turns out that finetuning the pretrained model for epoch in combination with the downstream LSTM yields the best (an average ROCAUC of ) and least variable results over multiple folds. It is clear from Figure 3 that, especially in combination with the LSTM, finetuning with more epochs leads to worse downstream performance. This is likely because the model starts to overfit on FBIRN, which is likely due it being a much smaller dataset. It is also notable that the pretrained model that is finetuned for epochs on FBIRN in combination with the LSTM still performs better than the model that is only trained on FBIRN for epochs. Further, the fact that the LSTM performs the best indicates that the temporal information in the representations is related to the schizophrenia diagnosis of the subjects. It may also imply that the information about schizophrenia diagnosis is nonlinearly entangled in the representations. The mSVM does generally outperform the SVM, which may further suggest that the temporal information in the representations is at least not linearly related to a schizophrenia diagnosis. We also find that the kNNbased classifiers are consistently outperformed by the other classifiers and that the baseline ranks lowest in terms of performance.
To compare this to other work, we look at Whole MILC [14], which uses a novel form of pretraining. Their best supervised classification model achieves an average ROCAUC on FBIRN of roughly , which is not that much higher than what we have achieved with our unsupervised representations. Note that they utilize data that has undergone dimensionality reduction using ICA with a spatially constrained prior [38]. Our pretraining results are also similar in that we both find an improvement in downstream task performance after pretraining, although they do not use an unsupervised autoencoder as the primary model that they propose in their work.
ViiD Sexbased group differences voxel space
A VAE is a generative model and is thus capable of reconstructing locations from the latent space. To visualize the differences in the VAE models between males and females in its latent space, the average representation for both groups is decoded back into the voxel space and the reconstruction for males is subtracted from the reconstruction for females. The resulting volume is then thresholded at the highest 80th quantile absolute value and the differences are shown in Figure
4. The decoded groupwise differences show that women on average have increased rsfMRI activation in a large area of the prefrontal cortex, which has been reported in the literature before [37]. There also appears to be some increased average activity in the left and right inferior parietal lobules. The activity in the inferior parietal lobules and between the occipital lobe and the cerebellum looks more like noise and does not persist when higher thresholds are used.Viii Conclusion
This work investigated whether unsupervised deep learning techniques, more specifically a VAE, can learn robust representations that can be used in downstream neuroimaging tasks from minimally preprocessed voxelwise rsfMRI data. The representations learned by a VAE were evaluated on multiple downstream tasks and for multiple different latent dimensionalities on the UK Biobank dataset. The larger the dimensionality, the better the performance. It turned out that the sex classification task was rather trivial, but the difference between the baseline and our model became clear on the age regression task. The model was also evaluated on FBIRN, a schizophrenia dataset, both without and with pretraining on UK Biobank first. The optimal number of finetuning epochs on FBIRN turned out to be epoch. Our results open up more work into nonlinear dimensionality reduction of voxelwise rsfMRI data with unsupervised deep learning methods while retaining meaningful information.
Ix Discussion
The results in this work show that there is great potential for voxelwise rsfMRI representation learning with a VAE. The representations that are learned by the VAE contain information that allows linear classifiers and regressors to predict the sex and age of a subject with high precision. The model also performed well on a downstream schizophrenia diagnosis prediction task. The results on the latter especially opened up more work into the pretraining of unsupervised models like the one proposed in this work. The pretrained model is first trained on a larger dataset before finetuning it on neuropsychiatric datasets for a few epochs and using the final model to encode lowdimensional representations. Even without finetuning our pretrained model outperformed a model that was trained only on FBIRN.
Important future considerations include the need for a more efficient solution in terms of computational runtime. An important bottleneck during this project was loading the large data files onto the GPU. Improving data movement and transfer will make experimentation more feasible and may also address the shortcomings that smaller batch sizes may have on the training dynamics of a neural network. If training times can be reduced, it is also possible to train on even larger datasets and perform hyperparameter tuning on a large scale. Larger datasets will likely improve the results on all of the downstream tasks, similar to the improvements pretraining had on the schizophrenia diagnosis prediction task’s performance. Smaller datasets are more likely to lead to overfitting, especially since voxelwise rsfMRI data is highly noisy. Not only should future work look at larger datasets, but also move towards models that more efficiently use the information available in (smaller) datasets. Models that incorporate inductive biases and/or forms of regularization are required to move towards meaningful nonlinear representations for mental illnesses from voxelwise rsfMRI data.
It is also relevant to look at the manifold that our model has learned, especially in terms of the schizophrenia diagnosis prediction task. The inherent generative quality of this model supplies us with a wide range of possible interpretability analyses, because representations can be mapped back into brain space using the model’s decoder. One simple example of a generative interpretability analysis was provided in Section VIID. More complex, temporal, analyses should be designed for complex neuropsychiatric disorders like schizophrenia. Complex neuropsychiatric disorders could also benefit from studying a single subject representation, that combines the temporal information in a subject’s time series into a single vector.
References
 [1] MCKEOWN, Martin J.; SEJNOWSKI, Terrence J. Independent component analysis of fMRI data: examining the assumptions. Human brain mapping, 1998, 6.5‐6: 368372.
 [2] DAMARAJU, Eswar, et al. Dynamic functional connectivity analysis reveals transient states of dysconnectivity in schizophrenia. NeuroImage: Clinical, 2014, 5: 298308.
 [3] SUI, Jing, et al. Neuroimagingbased individualized prediction of cognition and behavior for mental disorders and health: methods and promises. Biological psychiatry, 2020, 88.11: 818828.

[4]
Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 2012, 25: 10971105.
 [5] ABROL, Anees, et al. Deep learning encodes robust discriminative neuroimaging representations to outperform standard machine learning. Nature communications, 2021, 12.1: 117.

[6]
Hjelm, R. Devon, et al. Restricted Boltzmann machines for neuroimaging: an application in identifying intrinsic networks. NeuroImage, 2014, 96: 245260.
 [7] Kingma, Diederik P.; Welling, Max. Autoencoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
 [8] Zhao, Qingyu, et al. Variational autoencoder with truncated mixture of gaussians for functional connectivity analysis. In: International Conference on Information Processing in Medical Imaging. Springer, Cham, 2019. p. 867879.
 [9] Matsubara, Takashi, et al. Deep generative model of individual variability in fmri images of psychiatric patients. IEEE Transactions on Biomedical Engineering, 2020, 68.2: 592605.
 [10] Kim, JungHoon, et al. Representation learning of resting state fMRI with variational autoencoder. NeuroImage, 2021, 241: 118423.
 [11] Zhang, Xiaodi; MALTBIE, Eric; KEILHOLZ, Shella. Spatiotemporal Trajectories in Restingstate FMRI Revealed by Convolutional Variational Autoencoder. bioRxiv, 2021.
 [12] Hornik, Kurt; Stinchcombe, Maxwell; White, Halbert. Multilayer feedforward networks are universal approximators. Neural networks, 1989, 2.5: 359366.
 [13] Abrol, Anees; Hassanzadeh, Reihaneh; Calhoun, Vince. Deep learning in restingstate fMRI. Proceedings of the 43rd annual conference of the IEEE Engineering in Medicine and Biology Society, 2021.
 [14] Mahmood, Usman, et al. Whole MILC: generalizing learned dynamics across tasks, datasets, and populations. In: International Conference on Medical Image Computing and ComputerAssisted Intervention. Springer, Cham, 2020. p. 407417.

[15]
Erhan, Dumitru, et al. Why does unsupervised pretraining help deep learning?. In: Proceedings of the thirteenth international conference on artificial intelligence and statistics. JMLR Workshop and Conference Proceedings, 2010. p. 201208.
 [16] Qiang, Ning, et al. Deep Variational Autoencoder for Modeling Functional Brain Networks and ADHD Identification. In: 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI). IEEE, 2020. p. 554557.
 [17] Mackay, David JC. Information theory, inference and learning algorithms. Cambridge university press, 2003.
 [18] Laumann, Timothy O., et al. On the stability of BOLD fMRI correlations. Cerebral cortex, 2017, 27.10: 47194732.
 [19] Miller, Karla L., et al. Multimodal population brain imaging in the UK Biobank prospective epidemiological study. Nature neuroscience, 2016, 19.11: 15231536.
 [20] Keator, David B., et al. The function biomedical informatics research network data repository. Neuroimage, 2016, 124: 10741079.
 [21] Du, Yuhui, et al. NeuroMark: An automated and adaptive ICA based pipeline to identify reproducible fMRI markers of brain disorders. NeuroImage: Clinical, 2020, 28: 102375.

[22]
Rolinek, Michal; Zietlow, Dominik; Martius, Georg. Variational autoencoders pursue pca directions (by accident). In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2019. p. 1240612415.
 [23] Artac, Matej; Jogan, Matjaz; Leonardis, Ales. Incremental PCA for online visual learning and recognition. In: Object recognition supported by user interaction for service robots. IEEE, 2002. p. 781784.
 [24] Pedregosa, Fabian, et al. Scikitlearn: Machine learning in Python. the Journal of machine Learning research, 2011, 12: 28252830.
 [25] He, Kaiming, et al. Deep residual learning for image recognition. In: Proceedings of the IEEE conference on computer vision and pattern recognition. 2016. p. 770778.
 [26] Kingma, Durk P., et al. Improved variational inference with inverse autoregressive flow. Advances in neural information processing systems, 2016, 29: 47434751.
 [27] Clevert, DjorkArné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289, 2015.
 [28] Salimans, Tim; Kingma, Durk P. Weight normalization: A simple reparameterization to accelerate training of deep neural networks. Advances in neural information processing systems, 2016, 29: 901909.
 [29] Kingma, Diederik P.; BA, Jimmy. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [30] Hochreiter, Sepp; Schmidhuber, Jürgen. Long shortterm memory. Neural computation, 1997, 9.8: 17351780.
 [31] Srivastava, Nitish, et al. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 2014, 15.1: 19291958.
 [32] Paszke, Adam; et al. Pytorch: An imperative style, highperformance deep learning library. Advances in neural information processing systems, 2019, 32: 80268037.
 [33] Kolesnikov, Sergey. Accelerated deep learning R&D. GitHub, 2018, https://github.com/catalystteam/catalyst
 [34] PérezGarcía, Fernando; Sparks, Rachel; Ourselin, Sebastien. TorchIO: a Python library for efficient loading, preprocessing, augmentation and patchbased sampling of medical images in deep learning. Computer Methods and Programs in Biomedicine, 2021, 106236.

[35]
Team, R. D. RAPIDS: Collection of libraries for end to end GPU data science. NVIDIA, Santa Clara, CA, USA, 2018.
 [36] Van der Walt, Stefan; Colbert, S. Chris; Varoquax, Gael. The NumPy array: a structure for efficient numerical computation. Computing in science & engineering, 2011, 13.2: 2230.
 [37] Hill, Ashley C.; Laird, Angela R.; Robinson, Jennifer L. Gender differences in working memory networks: a BrainMap metaanalysis. Biological psychology, 2014, 102: 1829.
 [38] Fu, Zening, et al. Altered static and dynamic functional network connectivity in Alzheimer’s disease and subcortical ischemic vascular disease: shared and specific brain connectivity abnormalities. Human brain mapping, 2019, 40.11: 32033221.
Comments
There are no comments yet.