Understanding Cognitive Fatigue from fMRI Scans with Self-supervised Learning

06/28/2021 ∙ by Ashish Jaiswal, et al. ∙ The University of Texas at Arlington ∙ Kessler Foundation

Functional magnetic resonance imaging (fMRI) is a neuroimaging technique that records neural activations in the brain by capturing the blood oxygen level in different regions as a subject performs a task. Given fMRI data, the problem of predicting a person's state of cognitive fatigue has not been investigated to its full extent. This paper tackles the issue as a multi-class classification problem by dividing the state of cognitive fatigue into six levels, ranging from no fatigue to extreme fatigue. We built a spatio-temporal model that uses convolutional neural networks (CNNs) for spatial feature extraction and a long short-term memory (LSTM) network for temporal modeling of 4D fMRI scans. We also applied a self-supervised method called MoCo to pre-train our model on the public BOLD5000 dataset and fine-tuned it on our labeled dataset to classify cognitive fatigue. Our novel dataset contains fMRI scans recorded from Traumatic Brain Injury (TBI) patients and healthy controls (HCs) while they performed a series of cognitive tasks. This method establishes a state-of-the-art technique for analyzing cognitive fatigue from fMRI data and beats previous approaches to the problem.




1 Introduction

Functional magnetic resonance imaging (fMRI) measures slight changes in blood flow that occur with activity in different brain regions. This imaging technique is completely safe and non-intrusive to the human brain. It is used to identify the parts of the brain that handle critical functions and to evaluate the effects of conditions such as stroke and other diseases. Some abnormalities can only be found with fMRI scans, as they provide detailed access to activity patterns in the human brain. Traumatic Brain Injury (TBI) is one of the most prevalent causes of neurological disorders in the US faul2010traumatic. It is a condition that has been shown to affect working memory christodoulou2001functional and cognitive fatigue kohl2009neural. In this work, we focus on understanding the cognitive fatigue that results from performing standardized cognitive tasks, as it is one of the primary indicators of moderate-to-severe TBI.

Cognitive Fatigue (CF) is a subjective lack of mental energy, perceived by an individual, that interferes with everyday activities deluca2008neural. It is a common condition among people suffering from moderate-to-severe brain injury. There are multiple approaches to assessing CF through cognitive tasks and assessment tests using objective measures such as response time (RT) and error rate (ER) deluca2008neural. However, these measures have certain limitations and do not correlate well with self-reported scores during the tasks wylie2017cognitive. The inability to relate objective measures to self-reported cognitive fatigue led us to study the blood-oxygen-level-dependent (BOLD) signal associated with changes in neural activation. Increased BOLD activation reflects the additional cognitive work that individuals with TBI must expend to perform a given task wylie2017cognitive.

With advancements in deep learning techniques that can efficiently extract meaningful information from images and videos, we built a model that predicts self-reported cognitive fatigue scores from the neural activations captured in fMRI scans. The main contributions of this work are as follows:

  • To the best of our knowledge, cognitive fatigue has received little prior analysis from fMRI data; this work addresses that gap.

  • We worked with a novel dataset that consists of fMRI scans from both TBI patients and healthy subjects.

  • We propose a self-supervised model that learns and predicts the correlation between neural activation and self-reported fatigue scores.

The rest of the paper is structured as follows: Section 2 discusses previous work that examines cognitive fatigue from fMRI data and applies deep learning to brain imaging. Section 3 explains the data collection, the preprocessing stage, and the system architecture. Section 4 presents our experiments and results, followed by our conclusion and future directions in Section 5.

2 Related Work

Previous research has demonstrated that in people with Traumatic Brain Injury (TBI), the caudate nucleus of the basal ganglia shows a distinct pattern of activation over time compared to healthy controls kohl2009neural. This finding is consistent with Chaudhuri and Behan's fatigue model chaudhuri2000fatigue, in which the basal ganglia play a crucial part in the experience of fatigue. Kohl et al. kohl2009neural inferred the presence of fatigue from the pattern of brain activations across time. A subsequent study by Wylie et al. wylie2017cognitive was the first to look into state fatigue in people who have had a moderate-to-severe TBI. Furthermore, they investigated the involvement of the caudate nucleus in fatigue by examining whether its activity changes in direct proportion to the patients' instantaneous (state) fatigue experience genova2013examination.

While the caudate nucleus and the striatum as a whole were previously thought to be solely responsible for motor behavior control middleton2000PL, recent evidence from animal and human research shows that this region is involved in a wide range of cognitive behaviors, including learning tricomi2008feedback; dobryakova2013basal; schultz1993responses, outcome processing delgado2001tracking; delgado2003dorsal, and working memory Lewis2004StriatalCT; Akhlaghpour2016AuthorRD. Recent data indicate that fatigue caused by such cognitive tasks may manifest in caudate nucleus activation dobryakova2015dopamine. In children who have suffered a TBI, cognitive fatigue has been linked to a network of areas in the striatum and prefrontal cortex (PFC), including the vmPFC, nucleus accumbens, and anterior cingulate cortex (ACC) ryan2016uncover.

With the rapid increase in the availability of medical imaging datasets, deep learning has been adopted to process such data for diagnosing various diseases ramesh2020multi and for rehabilitation purposes farahanipad2020hand. Specifically, machine learning techniques have been used to predict diseases and subject traits from fMRI data pereira2009machine; khosla2019machine. Further, convolutional neural networks (CNNs) lecun1995convolutional, an approach known to be very successful at computer vision tasks, have been widely used to analyze the spatial features in fMRI images as well. A 4-layer convolutional neural network was proposed in wang2018task for classification from raw fMRI voxel values.

In shen2019deep, the authors used deep convolutional networks (DCNs) to encode fMRI images into a low-dimensional feature space and decode them back for image reconstruction. Similarly, the authors in mozafari2020reconstructing proposed a large-scale bi-directional generative adversarial network called BigBiGAN to decode and reconstruct natural scenes from fMRI patterns. Furthermore, an architecture based on a sparse convolutional autoencoder was used in huang2017modeling to learn high-level features from handcrafted time series derived from raw fMRI data.

There has also been a recent surge in the use of sequence models to process temporal fMRI data. Mao et al. mao2019spatio applied a specific type of RNN known as Long Short-Term Memory (LSTM) to process spatial features extracted from a CNN network. Another similar work in thomas2018interpretable used bi-directional LSTM along with a CNN.

3 Methodology

To analyze cognitive fatigue from fMRI brain scans, data was recorded from both TBI patients and healthy controls to study the differences in the extent of cognitive fatigue between the two groups. The recorded data was stored in NIfTI format (Neuroimaging Informatics Technology Initiative, a file format for storing neuroimaging data). As raw data in NIfTI format contains noise, a standard preprocessing pipeline was implemented to normalize and smooth the data, as shown in figure 2. Both raw and pre-processed data were used to train different models, and their results were compared. The following sections explain the data collection, pre-processing, and system architecture used to process the data.

Figure 1: A flow diagram of a series of N-back tasks performed during data collection (SR Score: Self-Reported Fatigue Score)

3.1 Data Collection and Pre-processing

During data collection, fMRI scans of the brain were recorded over a period in which each subject was asked to perform a series of standardized cognitive N-back tasks, as shown in figure 1. Thirty participants with moderate-to-severe TBI and 24 healthy controls (HCs) were involved in data collection. The average age of the subjects was 41 years (SD = 12.7). Each participant performed four rounds of both 0-back and 2-back tasks. A resting-state fatigue score was reported at the beginning, followed by scores reported after each round. Functional images were collected in 32 contiguous slices during eight blocks (four at each of two difficulty levels), resulting in 140 acquisitions per block (echo time = 30 ms; repetition time = 2000 ms; field of view = 22 cm; flip angle = 80°; slice thickness = 4 mm; matrix = 64 × 64; in-plane resolution = 3.438). Using the Visual Analog Scale of Fatigue (VAS-F), the subjects were asked to rate the amount of fatigue they experienced (in the range 0-100) after each round of the N-back task. The self-reported scores were mapped to six classes to frame the task as a multi-class classification problem, as represented in figure 6. The size of the final 4D tensor acquired in NIfTI format was 140 × 32 × 64 × 64. The raw fMRI images were preprocessed using Analysis of Functional NeuroImages (AFNI) cox1996afni and other standard techniques discussed in previous works wylie2017cognitive, as shown in figure 2.

Figure 2: Pre-processing pipeline for fMRI scans to convert from raw NIfTI format to normalized and smoothed version
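As an illustration of the normalize-and-smooth stage of the pipeline, the sketch below applies per-scan z-normalization and Gaussian smoothing to a 4D tensor. It is a hypothetical stand-in for the AFNI pipeline actually used, not a reproduction of it; the smoothing width is an assumed value in voxels.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def preprocess_fmri(volume_4d, sigma=1.5):
    """Illustrative normalize-and-smooth step for a 4D fMRI tensor (t, z, y, x).

    NOT the AFNI pipeline from the paper, only a sketch of the two
    operations it performs. `sigma` is the Gaussian width in voxels.
    """
    out = np.empty_like(volume_4d, dtype=np.float64)
    for t in range(volume_4d.shape[0]):
        vol = volume_4d[t].astype(np.float64)
        # z-normalize voxel intensities within each 3D scan
        vol = (vol - vol.mean()) / (vol.std() + 1e-8)
        # Gaussian smoothing over the three spatial axes only
        out[t] = gaussian_filter(vol, sigma=sigma)
    return out

scan = np.random.rand(4, 8, 16, 16)   # toy stand-in for a (140, 32, 64, 64) scan
clean = preprocess_fmri(scan)
print(clean.shape)                    # (4, 8, 16, 16)
```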

3.2 System Architecture

Encoders are vital in any self-supervised pipeline, as they map high-dimensional input to a low-dimensional latent space while ensuring that the significant features of the input are preserved. The choice of encoder architecture determines how well a model learns feature representations. In our encoder, the output from a specific layer is pooled to obtain a one-dimensional feature vector for every input sample.

Figure 3: Spatio-temporal Encoder Architecture: CNN layers extract spatial features while LSTM layers model the temporal aspect of the fMRI images followed by attention-based averaging over time

fMRI scans are 4D and are represented as (t, x, y, z), where t represents the time steps of the individual 3D brain scans and the other three dimensions represent the intensity of the voxels (1 × 1 × 1 mm) in the brain. The temporal relation between scans recorded at different time steps is captured using a Recurrent Neural Network (RNN) based architecture. We combined a CNN architecture with an LSTM network for the encoder, as shown in figure 3. We used three layers of 2D convolution and batch normalization to learn the spatial features of the images.
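The CNN+LSTM encoder can be sketched roughly as follows. The channel widths, hidden size, and the plain temporal average (standing in for the attention-based averaging of figure 3) are illustrative assumptions, not the paper's exact configuration; each 3D scan is treated as a stack of 2D slices fed through shared convolutional layers.

```python
import torch
import torch.nn as nn

class SpatioTemporalEncoder(nn.Module):
    """Sketch of a CNN+LSTM encoder for 4D fMRI input (layer sizes assumed)."""

    def __init__(self, in_slices=32, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Three conv + batch-norm blocks, as described in the text
        self.cnn = nn.Sequential(
            nn.Conv2d(in_slices, 16, 3, stride=2, padding=1), nn.BatchNorm2d(16), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, feat_dim, 3, stride=2, padding=1), nn.BatchNorm2d(feat_dim), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),   # pool each timestep to a single feature vector
        )
        self.lstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True)

    def forward(self, x):                    # x: (batch, t, z, y, x)
        b, t = x.shape[:2]
        frames = x.flatten(0, 1)             # (batch*t, z, y, x): slices as channels
        feats = self.cnn(frames).flatten(1)  # (batch*t, feat_dim)
        feats = feats.view(b, t, -1)
        out, _ = self.lstm(feats)            # temporal modeling over t
        return out.mean(dim=1)               # simple average over time

enc = SpatioTemporalEncoder()
z = enc(torch.randn(2, 5, 32, 64, 64))
print(z.shape)  # torch.Size([2, 64])
```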

The encoder was pre-trained on a public dataset, BOLD5000 chang2019bold5000, in a self-supervised manner and then fine-tuned on our labeled dataset by adding a linear classifier layer at the end.

3.2.1 Self-supervised Pre-training

The performance of any supervised learning method depends highly on the choice and quality of data features and on the annotations available for the data. Unfortunately, traditional supervised methods are hitting a performance bottleneck because they rely on expensive manual annotation. With recent advancements in self-supervised approaches, as discussed in jaiswal2021survey, unlabeled samples can be efficiently utilized to learn data representations: such methods use pseudo-labels generated from the dataset itself during training. One such approach is contrastive learning, which has recently proven very effective for learning image and video representations. In this setting, 4D fMRI data can be treated as a series of videos so that self-supervised methods zadeh2020self can be applied to it.

Figure 4: Self-supervised Pre-training Framework: MoCo algorithm for pre-training on BOLD5000 dataset. The green arrows represent positive pairs and red arrows represent negative pairs.

For pre-training the encoder, we used a contrastive approach: for every sample in a batch of N samples, two augmented versions are generated, resulting in 2N samples in the batch. For each sample, its augmented version is considered the positive candidate, whose similarity to the sample is encouraged to be maximal; in contrast, the similarity between positive and negative pairs is encouraged to be as low as possible. This condition is represented in figure 4 with green and red double-headed arrows. We use cosine similarity to measure the closeness between two samples in the batch. The cosine similarity between two vectors is the cosine of the angle between them and is defined as

sim(u, v) = (u · v) / (||u|| ||v||)    (1)
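As a quick worked example of the cosine similarity defined above, two parallel vectors have a similarity of 1 regardless of their magnitudes:

```python
import torch
import torch.nn.functional as F

# sim(u, v) = u.v / (|u||v|): the cosine of the angle between the vectors.
u = torch.tensor([1.0, 2.0, 3.0])
v = torch.tensor([2.0, 4.0, 6.0])   # parallel to u, so the angle is 0
sim = F.cosine_similarity(u, v, dim=0)
print(round(sim.item(), 4))  # 1.0
```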
During training, we applied extensive spatial and temporal augmentation to the image data. As part of the spatial transformation, we applied random affine, z-normalization, and intensity re-scaling, followed by one of random blur, random gamma, random motion, and random noise. For temporal augmentation, a random starting time t is selected and n consecutive scans are extracted. Finally, the loss is calculated using a variant of the Noise Contrastive Estimation (NCE) function called InfoNCE, which is used when more than one negative sample is present during the learning process and is defined in equation 2:

L_q = −log [ exp(sim(q, k⁺)/τ) / Σ_{i=0}^{K} exp(sim(q, k_i)/τ) ]    (2)

In equation 2, q represents the current sample, k⁺ represents the positive sample (the augmented version of q), and the k_i represent the negative samples (the other samples in the batch); τ represents the temperature coefficient.
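A minimal sketch of the InfoNCE loss for a single query, assuming embeddings are compared with temperature-scaled cosine similarity; the tensor shapes and the temperature default are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query (equation 2); shapes are assumptions.

    query:     (d,)   embedding of the current sample q
    positive:  (d,)   embedding of its augmented view k+
    negatives: (K, d) embeddings of the other samples in the batch
    """
    q = F.normalize(query, dim=0)
    keys = F.normalize(torch.cat([positive.unsqueeze(0), negatives], dim=0), dim=1)
    logits = keys @ q / temperature          # cosine similarities / tau
    # The positive key sits at index 0, so InfoNCE reduces to cross-entropy
    # with target class 0 over the (1 + K) candidates.
    return F.cross_entropy(logits.unsqueeze(0), torch.zeros(1, dtype=torch.long))

q = torch.randn(128)
loss = info_nce(q, q + 0.01 * torch.randn(128), torch.randn(8, 128))
print(float(loss) > 0)  # True: the loss is strictly positive
```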

In this experiment, we used a variant of the contrastive approach called MoCo he2020momentum, which maintains the negative samples in a dictionary queue and has proven effective compared to other methods. Two encoders with the same architectural configuration are used: the main encoder Q (query encoder) is trained end-to-end on the sample pairs, while the second encoder (momentum encoder) is initialized with the same parameters as Q. The momentum encoder maintains the dictionary as a queue of encoded keys, with the current mini-batch enqueued and the oldest mini-batch dequeued. Its parameters θ_k are updated from the query encoder's parameters θ_q using an update parameter m called the momentum coefficient, as represented by equation 3:

θ_k ← m·θ_k + (1 − m)·θ_q    (3)

Only the parameters θ_q are updated by back-propagation.


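The momentum update of equation 3 can be sketched as follows; the toy linear encoder and the momentum coefficient m = 0.999 are assumptions for illustration, standing in for the actual CNN+LSTM encoder:

```python
import copy
import torch
import torch.nn as nn

# Momentum update from equation 3: theta_k <- m*theta_k + (1-m)*theta_q.
query_encoder = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 8))
momentum_encoder = copy.deepcopy(query_encoder)   # initialized with Q's parameters

@torch.no_grad()
def momentum_update(q_enc, k_enc, m=0.999):
    # Only the query encoder is updated by back-propagation; the momentum
    # encoder drifts slowly toward it.
    for p_q, p_k in zip(q_enc.parameters(), k_enc.parameters()):
        p_k.mul_(m).add_((1.0 - m) * p_q)

# Simulate one (hypothetical) optimizer step on the query encoder...
with torch.no_grad():
    for p in query_encoder.parameters():
        p.add_(0.1)
momentum_update(query_encoder, momentum_encoder)
diff = sum((pq - pk).abs().sum() for pq, pk in
           zip(query_encoder.parameters(), momentum_encoder.parameters()))
print(float(diff) > 0)  # True: the momentum encoder lags behind Q
```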
4 Experiments and Discussion

Figure 5: Confusion matrix for self-supervised (fine-tuned) method for classes 0-5 (left-to-right). The diagonal represents the number of correctly classified instances
Fatigue Score (SR) Class
0-10 0
10-20 1
20-40 2
40-60 3
60-80 4
80-100 5
Figure 6: Mapping self-reported (SR) scores to respective class labels
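A hypothetical helper for the mapping in figure 6; the boundary handling (upper-exclusive edges) is an assumption, since the paper does not state how scores on the interval edges are assigned:

```python
def fatigue_class(sr_score):
    """Map a self-reported VAS-F score (0-100) to one of six class labels.

    Boundary handling is an assumed convention: each range is
    upper-exclusive except the last.
    """
    bounds = [10, 20, 40, 60, 80]          # upper edges of classes 0-4
    for label, upper in enumerate(bounds):
        if sr_score < upper:
            return label
    return 5                               # 80-100 -> class 5

print([fatigue_class(s) for s in (5, 15, 30, 50, 70, 95)])  # [0, 1, 2, 3, 4, 5]
```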

For fMRI images, most publicly available datasets contain data in NIfTI format. We used two different data formats to train the models: the raw NIfTI version and the pre-processed, normalized version shown in figure 2. For self-supervised pre-training, we used only the NIfTI data for all four subjects in the BOLD5000 dataset. As shown in figure 4, the encoder was trained using the MoCo he2020momentum algorithm with the Adam optimizer. The pre-training was carried out for a total of 200 epochs, with weight decay and a momentum parameter of 0.9; the learning rate was decayed by a factor of 10 at epochs 120 and 160.
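The schedule just described could be set up as follows. The milestones (epochs 120 and 160) and the x10 decay come from the text; the initial learning rate and weight decay values are hypothetical placeholders, since the extracted text lost them:

```python
import torch
import torch.nn as nn

model = nn.Linear(8, 6)                     # stand-in for the encoder + classifier
# lr and weight_decay below are hypothetical values, not the paper's.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-4)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[120, 160], gamma=0.1)   # decay by 10x at 120 and 160

lrs = []
for epoch in range(200):
    # ... one epoch of training would go here ...
    optimizer.step()                        # no-op step to satisfy the scheduler
    scheduler.step()
    lrs.append(optimizer.param_groups[0]["lr"])
print(round(lrs[119], 6), round(lrs[160], 8))  # 0.0001 1e-05
```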

Initially, the primary encoder was trained on the collected dataset using a supervised approach, which allowed us to set a benchmark for the other approaches. For the supervised method, we used both NIfTI and pre-processed data to train two different models. Separately, once the encoder was pre-trained on the BOLD5000 dataset in a self-supervised way, it was fine-tuned with the NIfTI version of our dataset. Table 1 compares the performance of the different models. Four NVIDIA GTX 1080 Ti GPUs were used to train the models, whereas only one GPU was used for testing. As shown in table 1, our method beats the previous state-of-the-art model at classifying the level of cognitive fatigue from fMRI scans.

Approach Dataset Used Data Format Accuracy
Zadeh et al. zadeh2020towards Private (with Caudate mask) NIfTI 73.00
Supervised Ours NIfTI 74.35
Supervised Ours Pre-processed 82.79
Self-supervised (finetuned) BOLD5000 + Ours NIfTI 86.84
Table 1: Accuracy results for different models on cognitive fatigue classification task. Accuracies are calculated with 3-fold cross-validation

5 Conclusion

In this paper, we presented a spatio-temporal architecture, pre-trained with self-supervised methods, that processes 4D fMRI data to predict cognitive fatigue in both TBI patients and healthy subjects. Unlike previous works that used masks to focus on a particular region of the brain, our method learns on its own which areas of the brain show maximal neural activation. Motivated by video classification, we used CNN layers to extract spatial features and an LSTM network for temporal modeling. Our architecture obtained state-of-the-art performance in classifying different levels of cognitive fatigue from fMRI data. Future work includes exploring large-scale public datasets with better self-supervised algorithms to improve the overall performance and the granularity of cognitive fatigue prediction.