Deep Transfer Learning for Brain Magnetic Resonance Image Multi-class Classification

06/14/2021
by   Beentherize, et al.
0

Magnetic Resonance Imaging (MRI) is a principal diagnostic approach used in the field of radiology to create images of the anatomical and physiological structure of patients. MRI is the prevalent medical imaging practice to find abnormalities in soft tissues. Traditionally they are analyzed by a radiologist to detect abnormalities in soft tissues, especially the brain. The process of interpreting a massive volume of patient's MRI is laborious. Hence, the use of Machine Learning methodologies can aid in detecting abnormalities in soft tissues with considerable accuracy. In this research, we have curated a novel dataset and developed a framework that uses Deep Transfer Learning to perform a multi-classification of tumors in the brain MRI images. In this paper, we adopted the Deep Residual Convolutional Neural Network (ResNet50) architecture for the experiments along with discriminative learning techniques to train the model. Using the novel dataset and two publicly available MRI brain datasets, this proposed approach attained a classification accuracy of 86.40 curated dataset, 93.80 accuracy on the School of Biomedical Engineering dataset. Results of our experiments significantly demonstrate our proposed framework for transfer learning is a potential and effective method for brain tumor multi-classification tasks.

READ FULL TEXT VIEW PDF

page 4

page 5

page 6

page 7

page 9

page 10

page 11

page 12

04/28/2021

Medical Transformer: Universal Brain Encoder for 3D MRI Analysis

Transfer learning has gained attention in medical image analysis due to ...
03/21/2019

Deep Radiomics for Brain Tumor Detection and Classification from Multi-Sequence MRI

Glioma constitutes 80 classified as HGG and LGG. The LGG tumors are less...
11/16/2020

A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Brain tumor is one of the leading causes of cancer-related death globall...
02/02/2021

Transfer Learning in Magnetic Resonance Brain Imaging: a Systematic Review

Transfer learning refers to machine learning techniques that focus on ac...
08/02/2019

A Structural Graph-Based Method for MRI Analysis

The importance of imaging exams, such as Magnetic Resonance Imaging (MRI...
10/05/2020

A Comparative Study of Existing and New Deep Learning Methods for Detecting Knee Injuries using the MRNet Dataset

This work presents a comparative study of existing and new techniques to...

1 Introduction

Recent advances in Artificial Intelligence (AI) and Machine Learning (ML) techniques are massively revolutionising healthcare as new capabilities of automation are being applied in electronic patient record analysis, radical personalization, medical image analysis, drug discovery, etc.

[oguzmachine]. The application AI and ML in different sectors of Healthcare is impacting its outcomes in new and profound ways. One of these outcomes is observed in Magnetic Resonance Imaging analysis [litjens2017survey].

Magnetic Resonance Imaging is a leading modality used in radiology to study anatomical and the physiological processes of the patients. It is the prevalent medical imaging method to identify tomors in brain scans of patients. MRI is frequently used to provide soft-tissue contrast because of its non-invasive approach towards medical imaging [liang2000principles]. MRI images are traditionally analyzed by a radiologist to detect abnormalities of the brain. This process of interpreting huge volumes of patient MRI scans is painstakingly difficult and time-consuming [Talo2019]. In applications where distinguishing between abnormal and healthy tissue is delicate, precise interpretations become imperative [mohan2018mri]

. Machine learning has shown considerable ability to classify, detect correctly, and segment images with precise accuracy and processing speed

[litjens2017survey]. In this research, we propose a framework that uses Deep Transfer Learning to perform fine-grain classification of brain MRI images. This process entails specific categorization of brain tumor from MRI scans of patients. Our proposed framework uses a Convolutional Neural Network (CNN) based on ResNet50 architecture [ResNet].

Brain MRI image classification is an area of extensive research at the intersection of Computer Vision (CV), Machine Learning, and Biomedical Imaging. Researchers have proposed novel methods that are geared toward tackling this research problem

[zacharaki2009classification, mohan2018mri, Summers288]

. These methods range from learning the statistical characteristics of images to dimension reduction techniques to present-day Machine Learning approaches. However, these approaches thus far present inherent limitations of dataset size requirements and computational cost.Current Machine Learning approaches require massive training datasets and extended training time to generalize unseen samples better. Deep learning methodologies have been proposed in

[moritz2000whole, mallat1989theory, chaplot2006classification, el2014computer]; in their studies, they found that a better performance is achieved, though the method required substantial data sets that require massive computational requirements and training procedures. We propose a Deep Transfer Learning framework in this paper to improve further the classification performance of brain tumor using MRI images.

Our proposed approach attained a classification accuracy of 86.40% on the National Institute of Neuroscience & Hospitals, Bangladesh dataset, 93.80% on the Harvard Whole Brain Atlas dataset and 97.05% accuracy on the Southern Medical University School of Biomedical Engineering dataset. Our experimental results indicate the proposed framework for transfer learning is robust, feasible and efficient for brain tumor classification using MRI images.

The remaining portions of the paper are organized as follows: Section 2 presents the problem statement of MRI brain tumor classification and contribution of this research. Section 3 explains the preliminaries of Deep Transfer Learning. Section 4 the proposed methods are discussed while section 5 experimental results, and section 6 discussion and conclusion.

1.1 Related Work

The task of classifying brain MRI is an area that is actively undergoing research. There has been an extensive corpus of the literature of proposed approaches amongst researchers studying brain MRI classification. These approaches are at the intersection of Computer Vision, Image Processing and Artificial Intelligence, and Machine Learning, which have been evolving over the years. The maturation of big data, compute power, and robust Machine Learning approaches have led to predominant use cases. Supervised learning algorithms

[uddin2019comparing]

like Support Vector Machine, Artificial Neural Network, Decision Tree and Random Forest, and unsupervised learning approaches like Fuzzy C-Means, Self-Organising Map, K-Mean Clustering etc., have been pivotal to these range of use cases

[mohan2018mri].

In their paper, Zacharaki et al. [zacharaki2009classification]

reported a multi-stage framework with a Support Vector Machine and K-Nearest Neighbour classification algorithms to first predict three glioma grades with an accuracy of 85% and to further the predict the glioma case into either high or low grade with an accuracy of 88%. Many pieces of literature exist on frameworks for solving brain tumor classification using MRI images

[moritz2000whole, mallat1989theory, chaplot2006classification, el2014computer]. Paul et al., in their 2017 work, presented two deep learning methods to predict brain tumor using axial MRI data. They proposed a fully connected neural network as their first method and a deep convolutional neural network as their second approach with a 91.43% overall accuracy [paul2017deep].

Chaplot et al., in their 2006 paper[chaplot2006classification]

, proposed methods using Self-Organizing Map, Artificial Neural Network, and a Support Vector Machine for brain MRI classification. El-Dahshan et al.

[el2014computer]

, in their work, presented a novel technique consisting of Discrete Wavelet Transform, Principal Component Analysis to reduce feature space of the dataset, and a Feed-Forward Neural Network for a binary classification task of 101 normal and abnormal brain MRI images. Prior research in this direction strongly underscores the relevance of automating brain MRI classification to aid neuroscientists improve healthcare outcomes. Contrary to conventional classification methods that extensively depend on feature engineering as a precursor to the classification task, a Convolutional Neural Networks learns feature representations hierarchically and directly using the data

[mohan2018mri]. This approach is crucial to enabling the classifier to learn complex, rich representations of factors of variation from phenomena. Ertosun and Robin utilized a Convolutional Neural Network pipeline comprising two modules to detect Lower Grade Glioma (LGG) vs. Glioblastoma Multiforme (GBM) and determination of tumor grade using pathological data from The Cancer Genome Atlas [ertosun2015automated]. Their experimental results achieved 96% accuracy of GBM v LGG classification, and 71% for the identification of the tumor grades. Paul et al., proposed two approaches using Fully Connected Neural Networks and Convolutional Neural Network for the classification of three tumor variants: meningioma, glioma, and pituitary [paul2017deep]. Their results show that using axial brain images increases the classification accuracy of the neural network. Afshar et al., in [afshar2019capsule] have proposed a Capsule Network that combines brain MRI images with coarse tumor boundaries to learn spatial relations to enable better classification performance. Their study obtained an accuracy of 90.89%. Anaraki et al., in their 2019 paper [anaraki2019magnetic]

, presented an approach that combines a Convolutional Neural Network with a Genetic Algorithm for Optimization used to classify three Glioma sub-types achieving 90.0% accuracy and in another experiment Glioma, Meningioma, and Pituitary tumor variants obtaining a 94.2% accuracy. Zang et al.

[zhang2013mr]

have presented a hybrid approach that utilized Digital Wavelet Transform as feature extractor and Principal Component Analysis for dimensionality reduction, and a Kernel Support Vector Machine with Particle Swarm Optimization algorithm for the estimation of C and

parameters. Saritha et al. [saritha2013classification] demonstrated Wavelet Entropy-based Spider Web Plots as features coupled with a Probabilistic Neural Network to classify brain tumor images achieving 100% accuracy. Wang et al., in [wang2015feed]

used a pipeline of Stationary Wavelet Transform (SWT) for feature extraction, Principal Component Analysis for dimensionality reduction combined with Particle Swarm Optimization, and Artificial Bee Colony for MRI brain images. Zhang et al.

[zhang2015preclinical]

in their 2015 paper, proposed a novel Computer-Aided Diagnostic system that comprises Discrete Wavelet Packet Transform (DWPT), which serves as a wavelet packet coefficient extractor, Shannon and Tsallis entropies which obtained entropy features from DWPT. Finally, they used a generalized Eigenvalue Proximate Support Vector Machine with a Radial Basis Function to classify MRI brain images into normal and abnormal classes.

Nayak et al., in their 2016 work, proposed a novel classification framework the utilized 2-D discrete Wavelet Transform to extract feature vectors, Probabilistic Principal Component Analysis for feature space reduction, and AdaBoost with Random Forests classifiers. Nayak et al. [ranjan2017stationary] further proposed a Computer-Aided Diagnosis framework the employed contrast limited adaptive histogram equalization as a mechanism to enhance tumor regions in brain MRI images, 2-D stationary wavelet transform for feature extraction, AdaBoost and Support Vector Machine algorithms for normal and abnormal brain MRI classification achieving 99.45% accuracy [nayak2016brain]. Gudigar et al., in a comparative study assessed three multi-resolution analysis methods - Discrete Wavelet Transform, Curvelet Transform and Shearlet Transform, and Particle Swarm Optimization for textual feature extraction from transformed images using Support Vector Machine for classification [gudigar2019application]. [sultan2019multi] have proposed a Deep Neural Network model using Convolutional Neural Network to classify three tumor variants (glioma, meningioma and pituitary tumor) and three glioma sub-types ( Grade II, Grade III and Grade IV) achieving an overall performance accuracy of 96.13% and 98.7% respectively.

Deep learning (DL) is becoming a cornerstone to a great many applications across disciplines [plawiak2018novel, yildirim2017texture, plawiak2014estimation, rzecki2018application]

. Back-propagation algorithm has enabled DL models to construct hierarchies of concepts and representations through layered non-linear transforms with minimal feature engineering and domain knowledge

[bengio2015editorial]. Various DL architectures are applied increasingly to problems across disciplines, from the reconstruction of satellite imagery to drug discovery and protein structure/function prediction. The ability to learn complex hierarchies of factors of variability in signals is a focal reason for the surge in Deep Learning research and practice [cao2018deep]. Deep Learning models are applied to a range of use cases across domains and disciplines [yildirim2018arrhythmia, acharya2017deep, kiranyaz2015real, yildirim2018novel, acharya2017automated, yildirim2018efficient, yildirim2018deep].

2 Problem Statement and Contribution

2.1 Problem Statement

The field of neuroscience is traditionally tasked with analyzing MRI data for the detection of tumors. The procedure, however, grapples with a considerable challenge and requires a substantial level of domain expertise through extensive formal skills acquisition. Researchers have proposed various methods to address this classification task using feature selection and reduction techniques as a precursor to classification

[mallat1989theory, chaplot2006classification, el2014computer]. In our research, we propose a framework using Deep Transfer Learning as a mechanism to solve the multi-class classification task of brain MRI images. Our research uses a Deep Residual CNN based on the ResNet50 architecture.

2.2 Contributions

Our main contribution in research is a novel end-to-end Deep Transfer Learning framework using a ResNet50 CNN architecture that performs multi-classification of brain MRI images. We aim to investigate the transferability of source domain invariant knowledge to the target domain to reduce the algorithmic training time in conjunction with improving the generalization performance of the trained model using minimal target domain dataset. Our second contribution is a publication of a novel brain MRI dataset of 130 patients consisting of 5285 images belonging to 37 categories in collaboration with the National Insitute of Neuroscience & Hospitals (NINS) in Bangladesh. In our study, we conducted experiments with our proposed NINS dataset and two public benchmark datasets: the Biomedical School of Engineering brain tumor dataset [Cheng2017] that consists of 3064 T1-weighted MRI images, and the Whole Brain Atlas that contains 1133 T2-weighted Harvard Medical School benchmark dataset [Summers288].

3 Preliminaries

3.1 Transfer Learning

3.1.1 Formal definition and notation

The framework for transfer learning comprise domain, task, and marginal probabilities [pan2009survey, PMID:27712798] defined as follows: A domain, , is a two-element tuple that consists of input space indicated by

, and a marginal probability,

, is a sample data point. Thus, we can represent the domain formally as

(1)

A Domain consist of two elements: where:

  • Input Space:

  • Marginal distribution:

    (2)

Hence represents a specific vector as represented in the above depiction. A task, , on the other hand, is a two-element tuple of the label space, , and target function, . From a probabilistic standpoint, the target function can be stated thus: as a conditional probability. Given a domain , the definition of a Task is thus stated by two elements:

(3)
  • A Label Space:

  • A predictive function , learned from the feature vectors/label pairs

  • For each feature vector in the domain, predicts its corresponding label:

The formulation of a Transfer Learning setting where is the source domain, is the source task, is the target domain, and is the target task is thus:

(4)
(5)
(6)
(7)

Considering a classification task , with as the input space and as the set of labels. Given two sets of samples drawn from both the source and target domains:

(8)
(9)

The goal of Transfer Learning is to build a functional mapping with a low target classification error on the target domain assuming the and are i.i.d.
probability of miss-classification . Therefore, Transfer Learning as follows is defined as: given a source domain with a source task ,and a target domain with its target task

, the objective of Transfer Learning is to allow the transferability of target conditional probability distribution

in using latent domain invariant knowledge from and given or . Transfer Learning is well suited to cases where the number of target labels is disproportionately smaller than the number of source labels [tan_sun].

Fig. 1: High-level formal representation of Transfer Learning

4 Proposed Method

4.1 System Architecture

In this paper, we used ImageNet

[deng2009imagenet] as the source domain, a dataset with over one million images in one thousand categories of concepts from which the model learned and transferred invariant factors of variation to the target domain Brain MRI. Fig. 2 shows the architectural framework for this Representational Transfer Learning task. Transfer learning is a well-suited framework for healthcare computer vision tasks where target domain datasets for learning are significantly small, and model generalisation is a key consideration [pan2009survey].

Fig. 2: Proposed System Architecture

4.2 System Model and Assumptions

We have shown a high-level overview of our proposed Transfer Learning framework in Fig. 3

. The initial stage entails the setup of the dataset which includes loading both the brain MRI image data and associated class labels followed by batch normalization and cross-validation split into train and validation sets. We have used various data augmentation approaches to overcome overfitting by creating virtual copies of brain MRI images. This includes methods such as zooming, flipping, rotation, mirroring etc. to up-sample the datasets. Afterwards, we used ResNet50 CNN architecture, Optimal Learning Rate Finder, Gradient Descent with Restart (SGDR), and Adaptive Moment Estimation (ADAM). We trained the model through the three stages presented in Fig.

4 accordingly.

Fig. 3: A flow diagram of our proposed transfer learning framework

4.2.1 Mechanism of Deep Transfer Learning Model

This work proposes a Transfer Learning approach to the fine-grain brain MRI classification problem using state-of-the-art pre-trained ResNet50 as the CNN architecture due to its robustness to the exploding gradient problem

[ResNet]. We have used a pre-trained network on the ImagetNet, a dataset that comprises over a million images in a thousand classes of concepts. ResNet architecture has applications in a variety of Computer Vision tasks due to its faster convergence speed relative to Inception and Visual Geometric Group, AlexNet. The architecture is very robust, simple, and useful.

4.2.2 Proposed stages of Deep Transfer Learning

The framework of Deep Transfer Learning overcomes the problem of lack of labelled data, for example, in medical imaging applications by utilizing domain-invariant tacit knowledge to solve related problems. This approach towards learning can lead to better generalization of models with minimal target domain data. Fig.4 illustrates the three stages of our proposed transfer learning framework.

Fig. 4: Deep Transfer Learning Stages 1: Retraining ResNet50; 2: Retraining the frozen layers of the network with augmented data; 3:Unfreezing and retraining the network with augmented data

4.3 Dataset

In this paper, we have published a dataset consisting of 5285 T1-weighted MRI images belonging to 37 categories in collaboration with the National Insitute of Neuroscience & Hospitals (NINS) of Bangladesh. Besides that, we have used two other public datasets, the School of Biomedical Engineering brain MRI dataset that comprises 3064 T1-weighted contrast-enhanced images of 233 study patients indicated in TableII [Cheng2017]. The second dataset is the Harvard Medical School Whole Brain Atlas brain MRI benchmark [Summers288] that consists of 1133 T2-weighted brain MRI images of 38 patients shown in Table III. The size of all images is 256x256 pixels in the axial plane.

Fig. 5: Sample Harvard Whole Brain MRI Dataset

The School of Biomedical Engineering, Southern Medical University brain MRI dataset comprises of three classes of brain tumors, Meningioma, Glioma, and Pituitary tumor, as depicted in Table II [Cheng2017]. The Harvard Medical School dataset entails five classes, Cerebrovascular class (stroke), Neoplastic, Degenerative, and Inflammatory disease, as shown in Table III. We have used a 5-fold cross-validation technique to perform the network re-training, which ensures that we overcome the size limit of our dataset without over-fitting.

Category Total Patients Total Slices
Brain Atrophy 8 264
Brain Infection 2 38
Brain Infection with abscess 2 76
Brain Tumor 3 76
Brain Tumor (Ependymoma) 1 36
Brain Tumor (Hemangioblastoma
Pleomorphic xanthroastrocytoma metastasis)
2 74
Brain tumor (Astrocytoma Ganglioglioma) 1 38
Brain tumor (Dermoid cyst craniopharyngioma) 1 38
Brain tumor - Recurrenceremnant of previous lesion 3 114
Brain tumor operated with ventricular hemorrhage 1 76
Cerebral Hemorrhage 1 36
Cerebral venous sinus thrombosis 1 76
Cerebral abscess 1 36
demyelinating lesions 1 38
Encephalomalacia with gliotic change 2 76
focal pachymeningitis 1 36
Glioma 2 76
Hemorrhagic collection 1 38
Ischemic change demyelinating plaque 1 38
Left Retro-orbital Haemangioma 2 112
Leukoencephalopathy with subcortical cysts 1 38
Malformation (Chiari I) 1 38
Microvascular ischemic change 2 72
Mid triventricular hydrocephalus 1 38
NMOSD ADEM 1 36
Normal 44 1749
Obstructive Hydrocephalus 2 76
Post-operative Status with Small Hemorrhage 1 38
Postoperative encephalomalacia 1 38
Small Vessel Diease Demyelination 1 38
Stroke (Demyelination) 1 38
Stroke (Haemorrhage) 13 564
Stroke(infarct) 19 906
White Matter Disease 1 36
meningioma 2 76
pituitary tumor 1 76
small meningioma 1 36
TABLE I: National Institute of Neurscience & Hospitals brain MRI dataset that contains 5285 T1-weighted contrast-enhanced images
Category Total Patients Total Slices
Meningioma - 708
Glioma - 1426
Pituitary Tumor - 930
TABLE II: Southern Medical University, School of Biomedical Engineering brain MRI dataset that contains 3064 T1-weighted contrast-enhanced images
Category Total Patients Total Slices
Normal 2 65
Degenerative Disease 8 223
Neoplastic Disease 8 277
Inflammatory Infectious Disease 5 189
Cerebrovascular Disease 15 376
TABLE III: Harvard Whole Brain Atlas dataset contains 1133 T2-weighted contrast-enhanced images

4.4 Essential deep learning techniques

In this research, we have applied state-of-the-art techniques in Deep Learning, like, data augmentation, Adam optimizer, Optimal Learning Rate Finder algorithm, and neural network hyper-parameter fine-tuning for the proposed framework.

4.4.1 Data augmentation:

To avoid overfitting the model, which happens because of small training set size, we have used data augmentation techniques such as flipping (vertical and horizontal), rotation, mirroring, and zooming to create virtual copies of the brain MRI images to improve model generalization performance.

4.4.2 Optimal Learning Rate Finder (OLRF

Hyper-parameter tuning is vital towards model performance improvement, but this process is often cumbersome [smith2017cyclical]. Therefore, we used the Optimal Learning Rate Finder algorithm to identify a stable set of learning rates to boost model generalization performance. The learning rate indicates the step size used to update the model weights during training. This affects the rate of convergence; if it is too small, convergence to the optimum on the error surface takes much time, and tiny updates on the model weights are performed. Conversely, when the learning rate is too high, the optimization algorithm shoots over the minimum, leading to divergence, thereby affecting model performance. The strategy used to set the step size has a critical role in out of sample generalization performance.

4.4.3 Stochastic Gradient Descent with Restarts (SGDR)

SGDR is a form of learning rate annealing which uses cosine annealing to incrementally reduce the step size during the process of training a neural network. This approach tends to make minimal changes when the optimizer tends towards desired weight parameter updates of the model [huang2017snapshot].

4.4.4 Classification Cost function: Softmax Classifier(Multinomial Logistic Cost)

The goal in a classification task is to maximize log-likelihood or associate higher probability mass for correct class and lower probability mass for incorrect classes. Therefore we define the loss and cost functions are defined as thus: S = f(x_i,W) scores is the unnormalized log
probabilities of the classes.
P(Y=k—X=x_i) = eskjesj
L_i = -logP(Y=y_i—X=x_i)
Loss Function L_i = -log(esyijesj)
Cost Function  L(W) = 1N∑_i=1^N L_i(f(x_i,W),y_i)  {(x_i,y_i)}_i=1^N

4.5 Performance Evaluation

This subsection presents the metrics we have applied to evaluate the performance of the proposed learning framework. We have used the standard classification performance metric to evaluate the model.

4.5.1 Performance metrics

(10)
(11)
(12)
(13)

where,

False Negatives (B) are class labels that are predicted negative which in the ground truth are positive. The is also known as a type two error.

False Positives () are class examples that are predicted to be positive which in the ground truth are negative. The is also known as a type one error.

True Positive (A) are class labels that are predicted to be positive which in the ground truth are positive.

True Negative () are class labels that are predicted to be negative which in the ground truth are negative.

4.6 Simulation Environment

In this research, we carried out all experiments in Python using fast.ai [howard2018fastai]

, a wrapper framework built on top of PyTorch library. PyTorch is used for Graphics Processing Unit-based accelerated computing

[ketkar2017introduction]. We carried out all experiments on a Linux Server (Ubuntu 10.04.4 LTS) Tesla V100. To evaluate the learning performance, we applied five-fold cross-validation using 80% - 20% dataset split.

5 Experimental Results

Experimental results of our work throughout the three progressive stages of transfer learning are presented. We present the model performance for the three stages of learning using ResNet50, ResNet34, AlexNet, and VGG19 deep learning CNN architectures. Training the neural network comprises three consecutive steps:

  1. Training the network with base layers frozen

  2. Retraining the network with augmented data

  3. Unfreezing the Network and fine-tuning with the augmented data

5.1 Experiment I: National Institute of Neuroscience and Hospitals dataset

5.1.1 Stage I: Retraining the Network

In the first stage, we re-trained the model as a benchmark by freezing the convolution base layers of the network and did not update its weights during this stage. The weight updates only occurred in the fully-connected layers of the network. We utilized the multinomial logistic cost function to measure the loss and a step size of

. We set the number of epochs to 8 to ensure the model does not overfit on the training set because small, and highly imbalanced dataset tends towards model overfitting, which affects out-of-sample performance. On completing the first stage of the training, the model showed an overall accuracy of 87.43% using a fivefold cross-validation strategy. Fig

6 shows the top miss-classified images during this phase of training.

Fig. 6: Illustration of top miss-classified images after stage I of training

We applied OLRF combined with SGDR algorithm to learn a step size in the weight space. This was achieved by boosting the learning rate with respect to the validation loss to discover an optimal learning rate to train the model. SDGR uses cosine annealing to reset the learning rate to traverse regions of the error surface to find the optimal minimum.

5.1.2 Stage II: Retraining the Network with Augmented Data

In this second stage, we retrained the model with the augmented data and set the learning rate hyper-parameter by utilizing OLRF and SGDR algorithms between to . After this stage of retraining, we did not notice a decrease in the train and validation losses as well as the accuracy as shown in Fig 10.

Fig. 7: Illustration of top miss-classified images after stage II of training with augmented training data

5.1.3 Stage III: Unfreezing the Network and Fine-tuning it with the Augmented Data

At this third and final stage, we unfroze all the base convolution layers, fine-tuned the network, and jointly trained them using the augmented data. This is Stage-3 in Fig 9 graph of the error rate across the model architectures. We trained the network for four epochs at this stage to adjust the trained weights to preserve the learned representation. To attain this goal, we set the step size of the last layers higher than the preceding ones during this fine-tuning process. We set the step size in the range of to across the network. After fine-tuning, we achieved a validation accuracy of 84.40% due to the imbalanced nature of the dataset and a degree of noise in it as well.

Fig. 8: Illustration of top miss-classified images after stage III of fine-tuning

We have plotted the error rates of the models, as shown in Fig.9. ResNet50 converges faster during the last four epochs of the third training stage compared to VGG, Alexnet, and ResNet34. This convergence property of ResNet50 is demonstrated through the nearly smooth learning curve, which indicates the model was able to find a stable set of weights though the data is highly noisy, and classes are imbalanced. The VGG model is second to ResNet50 regarding convergence to the local minimum during the training phases consistently across the three stages of training, which indicates superior performance to Alexnet and ResNet34.

Fig. 9: Error Rates across the four model architectures

In Table IV, we have presented a comparison of metrics across the three stages of Transfer Learning in this experiment. The graph in Fig10 gives information on the training and validation accuracy for all training stages for the four CNN model architectures. We observed the network under-fitted since the validation error was lower than the training error. This was solved by extending the number of epochs. At stage two, we used techniques such as vertical flipping, max zooming, and max lighting to augment the MRI images.

Fig. 10: Training vs. Validation Loss for all stages with ResNet50

We assess the classification performance on the validation data for the three stages. During the five-fold cross-validation training, we obtained the confusion matrices, which are shown in Fig.12 through Fig. 14. The matrices give a summary of the classification performance of the model, where the leading diagonals indicate correctly predicted classes and while miss-classified samples are outside the diagonals. At the end of stage I, the model reached an accuracy of 87.43% as shown in Fig.11.

Fig. 11: Accuracy across four different model architectures

However, the average five-fold cross-validation accuracy value did not improve at the end of stage 2, which is mostly due to class-imbalance problem with most classes in the dataset having a smaller number of examples.

Stage Train Loss Validation Loss Error Rate Accuracy Precision Recall F1-Score
I 0.169824 0.401473 0.125662 0.874338 0.874338 0.874338 0.874338
II 0.565008 0.541435 0.189251 0.810749 0.810749 0.810749 0.810749
III 0.434329 0.460459 0.155942 0.844058 0.844058 0.844058 0.844058
TABLE IV: Comparison of metrics across the 3 stages of training with NINS dataset
Fig. 12:

Stage I confusion matrix after training the network with base layers frozen

Fig. 13: Stage II confusion matrix after retraining the network with augmented data
Fig. 14: Stage III confusion matrix after unfreezing the Network and fine-tuning with the augmented data

5.2 Experiment II: Harvard Whole Brain Atlas Dataset

5.2.1 Stage I: Retraining the Network

Following the same methodology used in experiment 5.1 stage 5.1.1

, we froze the convolution layers of the network and did not update it during this stage. The learning rate hyperparameter was assigned

and the number of epochs was set to 6. The epoch set to 6 to avoid overfitting the model on the training set. At this first stage, the training model achieved an overall accuracy of 92.47% in a fivefold cross-validation strategy. Fig 15 shows the top miss-classified images during this phase of training.

Fig. 15: Illustration of top miss-classified images after stage I of training

5.2.2 Stage II: Retraining the Network with Augmented Data

Having completed stage I, we used the OLRF algorithm combined with SDGR to find a stable step size in the weight space. SDGR uses cosine annealing to reset the learning rate to traverse regions of the error surface to find the minimum. We retrained the model with the augmented data and a learning rate set between the intervals to . After this stage of retraining, we saw a decrease in the train and validation losses while the accuracy, however, did not increase.

Fig. 16: Illustration of top miss-classified images after stage II of training with augmented training data

5.2.3 Stage III: Unfreezing the Network and Fine-tuning it with the Augmented Data

Similar to the experiment 5.1.3, we unfroze all the convolution base layers of the network, fine-tuned it, and trained them all using the augmented data. This is the Stage-3 in Fig.18 graph of the error rate across the four model architectures. We have set the step size in the range of to across the network. After fine-tuning, we re-trained the model and achieved a validation accuracy of 93.80%.

Fig. 17: Illustration of top miss-classified images after stage III of fine-tuning

We have plotted the error rates of the models as shown in Fig.18. ResNet50 converges faster during the last two epochs of the third training stage compared to VGG, Alexnet, and ResNet34. This convergence property of ResNet50 is demonstrated through the nearly smooth learning curve, which indicates the model was able to find a stable set of weights. The VGG model is second to ResNet50 regarding convergence to the local minimum during the training phases consistently across the three stages of training, which indicates superior performance to Alexnet and ResNet34.

Fig. 18: Error Rates across the four model architectures

In Table. V, we have presented the train and validation accuracy graph for the experiment. The graph in Fig.19 gives information on the training and validation accuracy for all three stages of training. We observed the network under-fitted since the validation error was lower than the training error. This was solved by extending the number of epochs. At stage two, we used techniques such as vertical flipping, max zooming, and max lighting to augment the MRI images.

Fig. 19: Training vs. Validation Loss for all stages with ResNet50

We assess the classification performance on the validation data for the three stages. During the five-fold cross-validation training, we obtained the confusion matrices, which are presented in Fig.21 through Fig. 23. The matrices give a summary of the classification performance of the model were the leading diagonals indicate correctly predicted classes and while miss-classified samples are outside the diagonals. At the end of stage I, the model reached an accuracy of 92.47% with 17 incorrectly classified images, as shown in Fig.21. This model correctly classified all normal cases in stage II, as shown in Fig. 19.

Fig. 20: Accuracy across four different model architectures

However, the average five-fold cross-validation accuracy value improved from 92.24% to 94.42% at the end of stage 2. Finally, Fig.23 depicts stage III results. 14 images were incorrectly classified, and the model reached 93.80% average five-fold cross-validation accuracy. The whole training procedure takes 310 seconds.

Stage Train Loss Validation Loss Error Rate Accuracy Precision Recall F1-Score
I 0.225288 0.246468 0.075221 0.924779 0.924779 0.924779 0.924779
II 0.399370 0.189434 0.057522 0.942478 0.942478 0.942478 0.942478
III 0.318317 0.184569 0.061947 0.938053 0.938053 0.938053 0.938053
TABLE V: Comparison of metrics across the 3 stages of training with Harvard dataset
Fig. 21: Stage I confusion matrix after training the network with base layers frozen
Fig. 22: Stage II confusion matrix after retraining the network with augmented data
Fig. 23: Stage III confusion matrix after unfreezing the Network and fine-tuning with the augmented data

5.3 Experiment III: Biomedical School Brain MRI dataset

5.3.1 Stage I: Retraining the Network

In this third and final experiment, we used the brain MRI dataset from the School of Biomedical Engineering, which consists of T-1 weighted images. We re-trained the model as a benchmark. We froze the convolution layers of the network and did not update it during this stage. We only trained the weights of the fully-connected layers of the model. We used the multinomial logistic cost function to measure the loss and error rate with a step size of and trained the model for 4 epochs. We set the number of epochs to 4 to ensure the model does not overfit on the training set. The model memorises the given small dataset which affects out-of-sample performance. On completing the first stage of the training, the model showed an overall accuracy of 96.73% using a fivefold cross-validation strategy. Fig 24 shows the top miss-classified images during this phase of training.

Fig. 24: Illustration of top miss-classified images after stage I of training

5.3.2 Stage II: Retraining the Network with Augmented Data

Similar to the preceding experiments, we utilized the OLRF with SGDR algorithms to learn an optimal step size in the weight space. This is achieved by boosting the learning rate with respect to the validation loss to discover an optimal learning rate to train the model. We used an initial learning rate set between the intervals to in this stage of the experiment. SDGR uses cosine annealing to reset the learning rate to traverse regions of the error surface to find the minimum. After this stage of retraining, we saw a decrease in the train and validation losses, as shown in Fig.28. We used techniques such as vertical flipping, max zooming, and max lighting to augment the MRI images. The performance metrics are presented in Table. VI.

Fig. 25: Illustration of top miss-classified images after stage II of training with augmented training data

5.3.3 Stage III: Unfreezing the Network a and fine-tuning it with the Augmented Data

Similar to 5.1.3 and 5.2.3, we unfroze all the convolution layers of the network, fine-tuned the it, and jointly trained them using the augmented data. This is the Stage-3 in Fig.27 graph of the error rate across the model architectures. At this stage, we trained the fully connected layers for 4 epochs, and we want the trained weights to adjust in line with training steps. To attain this goal, we set the step size of the last layers higher than the preceding layers during this process of fine-tuning. For this reason, we varied the learning rate across the layers of the network. At this stage, we set the learning rate to through across the network. After fine-tuning, we re-trained the model and achieved a validation accuracy of 97.05%.

Fig. 26: Illustration of top miss-classified images after stage III of fine-tuning

We have plotted the error rates of the models, as presented in Fig.27. ResNet50 converges faster during the last two epochs of the third training stage compared to VGG, Alexnet, and ResNet34. This convergence property of ResNet50 is demonstrated through the nearly smooth learning curve, which indicates the model was able to find a stable set of weights. The VGG model is second to ResNet50 regarding convergence to the local minimum during the training phases consistently across the three stages of training, which indicates superior performance to Alexnet and ResNet34.

Fig. 27: Error Rates across four different model architectures

In Table. VI, we have presented the training and validation accuracy values for the experiment. The graph in Fig.28 gives information on the training and validation accuracy for all stages of training. We observed the network under-fitted since the validation error was lower than the training error. This was solved by extending the number of epochs. At stage two, we used techniques such as vertical flipping, max zooming, and max lighting to augment the MRI images.

Fig. 28: Training vs. Validation Loss for all stages with ResNet50
Stage Train Loss Validation Loss Error Rate Accuracy Precision Recall F1-Score
I 0.136737 0.121995 0.032680 0.967320 0.967320 0.967320 0.967320
II 0.114440 0.104330 0.035948 0.964052 0.964052 0.964052 0.964052
III 0.084613 0.067653 0.029412 0.970588 0.970588 0.970588 0.970588
TABLE VI: Comparison of metrics across the 3 stages of training with School of Biomedical Engineering dataset

We assess the classification performance on the validation data for the three stages. During the five-fold cross-validation training, we obtained the confusion matrices, which are shown in Fig.30 through Fig. 32. The matrices give a summary of the classification performance of the model were the leading diagonals indicate correctly predicted classes and while miss-classified samples are outside the diagonals. At the end of stage I, the model reached an accuracy of 96.73% with 20 incorrectly classified images across the three classes, as shown in Fig.30. The model in stage II, as shown in Fig. 19 had a steady number of incorrectly classified images.

Fig. 29: Accuracy across four different model architectures

However, the average five-fold cross-validation accuracy value improved from 96.73% to 97.05% at the end of stage III. Finally, Fig.32 depicts stage III results. 17 images were incorrectly classified and the model reached 97.05% average five-fold cross-validation accuracy. The whole training procedure takes 430 seconds.

Fig. 30: Stage I confusion matrix after training the network with base layers frozen
Fig. 31: Stage II confusion matrix after retraining the net-work with augmented data
Fig. 32: Stage III confusion matrix after unfreezing the Network and fine-tuning with the augmented data

6 Discussion and Conclusion

6.1 Discussion

Many researchers have proposed methods and techniques in Machine Learning using standard publicly available datasets for brain MRI classification tasks. These methods vary in their approaches from feature engineering, dimensionality reduction to deep learning-based. Table VII presents a comparison of research on the classification of MRI brain images. The table presents our proposed dataset Study I and three publicly available datasets: Study II (Harvard Whole Brain Atlas) [Summers288], Study III (The Cancer Imaging Archive) [clark2013cancer], Study IV (School of Biomedical Engineering, Guangzhou University) [cheng2015correction]. In their paper, Chaplot et al. [chaplot2006classification] proposed a 3-level Wavelet Transform method for feature extraction and a Support Vector Machine binary brain MRI classifier using 52 MRI datapoints obtaining an accuracy of 98%. Gudigar et al. [gudigar2019application] in a comparative study assessed three multi-resolution analysis methods - Discrete Wavelet, Curvelet, and Shearlet Transform, and Particle Swarm Optimization for textual feature extraction from transformed images using SVM classifier reported accuracy of 97.38% with a dataset of 612 brain MRI scans based on Shearlet Transform. In their 2015 paper, Zhang et al. [zhang2015preclinical], proposed a novel Computer-Aided Diagnostic system that comprises Discrete Wavelet Packet Transform (DWPT), which serves as a Wavelet Packet Coefficient Extractor, Shannon and Tsallis entropies which obtained entropy features from DWPT. Finally, they used a Generalized Eigenvalue Proximate Support Vector Machine with Radial Basis Function for brain MRI binary classification achieved 99.33% accuracy using a dataset of 255 brain MRI images. Saritha et al. [saritha2013classification] have used Wavelet Entropy-based Spider Web Plots’ for dimensionality reduction combined with a Probabilistic Neural Network (PNN) for the classification reported accuracy of 100%. El-Dahshan et al. [el2014computer] in their work presented a novel technique consisting of Discrete Wavelet Transform, PCA to reduce the feature vectors dimensions, and a Feed-Forward Artificial Neural Network to classify 101 brain MRI images achieved 98.6% accuracy. Wang et al. [wang2015feed] used a pipeline of Stationary Wavelet Transform for feature extraction, Principal Component Analysis to reduce the feature space combined with Particle Swarm Optimization and Artificial Vee Colony to classify MRI brain images reported accuracy of 99.45%. Nayak et al.[nayak2016brain] proposed a Computer-Aided Diagnosis framework the employed contrast limited adaptive histogram equalisation as a mechanism to enhance tumor regions in brain MRI images, 2-D Stationary Wavelet Transform to extract features, AdaBoost with a Support Vector Machine algorithms for normal and abnormal brain MRI classification achieving 99.45% accuracy with 255 MRI images. In this paper, we have found in the existing literature that feature engineering, dimensionality reduction, and other hybrid approaches have been predominantly used to solve the brain MRI image classification problem. Various dimensionality reduction methods like Principal Component Analysis are employed to reduce the feature space of training datasets. Most recently, Talo et al. [Talo2019] used Transfer Learning to perform binary classification of the brain, but our study proposes an expressive, fine-grain classifier that was trained on three distinct datasets. We have obtained an accuracy of 84.40% on the National Institute of Neuroscience and Hospitals dataset which comprises 5,285 T1-weighted, 93.80% on the Harvard Whole Brain Atlas comprising 1,133 T2-weighted images and 97.05% on the Biomedical School of Engineering which comprise 3,064 T1-weighted brain images using 5-fold cross-validation with ResNet50 CNN model.

Study Accuracy Study I Accuracy Study II Accuracy Study III Accuracy Study IV Task Method
Cheng et al [cheng2015correction] - - - 91.28% Mutli

SVM and KNN

Paul et al[paul2017deep] - - - 91.43% Multi CNN
Afshar et al [afshar2019capsule] - - - 90.89% Multi CNN
Anaraki et al[anaraki2019magnetic] - - 90.9% 94.2% Multi GA-CNN
Zacharaki et al[zacharaki2009classification] - - 85% - Multi SVM and KNN
Zacharaki et al[zacharaki2009classification] - - 88% - Binary SVM and KNN
El-Dahshan et al[el2014computer] - - 98% - Binary ANN and KNN
Ertosum et al[ertosun2015automated] - - 71% - Multi CNN
Ertosum et al[ertosun2015automated] - - 96% - Binary CNN
Sultan et al[sultan2019multi] - - 98.7% 96.13% Multi CNN
Gudigar et al.[gudigar2019application] - 97.38% - - Binary PSO and SVM
Zhang et al.[zhang2015preclinical] - 99.33% - - Binary GEPSVM
Talo et al.[Talo2019] - 100% - - Binary Deep Transfer Learning
Proposed Study 85.23% 93.80% - 97.05% Multi Deep Transfer Learning
TABLE VII: A comparative survey of brain MRI image classification methodologies

In Fig. 33, we have presented a proposed system deployment diagram depicting the infrastructural setup and configuration of our proposed framework.

Fig. 33: Proposed system deployment diagram

6.2 Conclusion and Future Direction

identified the novelty of this research work that uses Deep Transfer Learning for multi-class classification of brain MRI images. Deep Transfer Learning demonstrates the potential of quickly adapting a model to solving a problem rather than building one from scratch and has shown to be suitable in terms of evaluation performance. Transfer Learning, as an active area of Machine Learning research, highlights the potential of accelerated adaptation of models across varied problem domains where underlying knowledge is invariant and transferable. We conducted three case studies using our novel dataset and two other publicly available datasets with separate contrast-enhancement techniques. Our research utilized data augmentation to address the limitations of the small dataset set size for the multi-class classification problem. We have shown the potential impact of Deep Transfer Learning as a potential research direction in solving the brain MRI multi-class classification task.

References