Within the UK over 11,000 brain tumor cases are diagnosed each year [cancer_research_uk_brain_2015]. The survival rate and period of progression have been shown to improve with treatment [zacharaki_classification_2009]. The process to determine treatment can be difficult and time consuming [yang_discrete_2015], this is further complicated due to limited numbers of staff available to perform these tasks [grant_overview:_2004]. These tasks also lead to the most errors in diagnosis [grant_overview:_2004]. The automation of the tasks involved in diagnosis could help to increase the accuracy and reduce the time it takes for the application of treatment to a patient. Magnetic Resonance Spectroscopy (MRS), also known as MR Spectroscopic Imaging (MRSI) or Chemical Shift Imaging (CSI), provides a non-invasive method for the diagnosis of human tissue such as lung, bone or brain matter. Similar to MR Imaging (MRI), MRS is based upon the principles of Nuclear Magnetic Resonance (NMR) [andrew_nuclear_1992]; however, whereas MRI uses the resultant proton signals to create detailed graphical output, MRS uses the signals to determine the quantity in parts per million (ppm) of various metabolites within cells [gujar_magnetic_2005] which can be seen in the example shown in Figure 1. One area in which MRS has been shown to provide valuable insights is in the case of brain tumors [howe_1h_2003]. In medical imaging research, MRS has proven to produce accurate results in identification of tumor grade classification [sibtain_clinical_2007, howe_1h_2003].
Current state-of-the-art computer vision models in medical imaging have begun to utilize the advancements made in machine learning, specifically in the field of deep learning[litjens_survey_2017]. These deep learning models take advantage of multi-layered networks to extract feature information from the input data. To power the ability for these deep models to extract such feature-rich information, large swaths of data are required [lecun_deep_2015]. This can be an obstacle in the medical imaging field as restrictions in data protection [cios_uniqueness_2002] along with non-standardized practices makes it difficult to collect the required amount of data to work with deep learning models. To be able to apply current state-of-the-art classifiers to MRS brain tumor images more data is required, hence techniques to fabricate or generate data are essential. Advancements in computer vision and machine learning have also given the rise to accurate generative techniques for the creation of artificial data. One such generative technique is Generative Adversarial Networks (GANs) [goodfellow_generative_2014] which have been developed to help create more domain specific/accurate artificial data. GANs use multiple models working against each other to create accurate data, with one model (Generator) attempting to produce artificial data capable of “fooling” a model trained to determine “real” from “fake” data (Discriminator). By using approaches such as GANs the process of creating data that adheres to the domain can be achieved. Recently, variations of the original GAN model have been developed to produce better results. The Deeply Convolutional GAN (DCGAN) [radford_unsupervised_2015], which takes into account the improvements that deep learning models have shown against their traditional counterparts uses a multi-convolutional layered Generator and Discriminator. Although GANs have been shown to produce accurate results in various fields [radford_unsupervised_2015, salimans_improved_2016] they are known to be unstable and hard to train [arjovsky_towards_2017] whereas their more linear counterparts are considered to be easier to train but less expressive. This paper applies three of the state-of-the-art methods in generating synthetic data (GAN, DCGAN and a modified MRS brain tumor classifier [olliverre_pairwise_2017]
) to the domain of MRS for review. To determine the accuracy of the artificial data created each generated dataset was used to train a Random Forest for the classification of brain tumors. The results from the trained Random Forest classifiers were bench-marked against the results of one trained on the real MRS images.
2 Materials & Methods
The MRS dataset used in this study was obtained by St. George’s University London and consisted of a single-voxel MRS training and testing set. Both the test and training sets were acquired using a GE Signa Horizon 1.5T MR system with a Repetition Time (TR) and a short Echo Time (TE) of 2000ms and 30ms respectively. A Point-Resolved spectroscopic sequence protocol was used to acquire the training and test dataset. The World Health Organization classifies tumours into 4 grades (WHO) [centre_international_de_recherche_sur_le_cancer_pathology_2004]: Grades I - IV, with GI and GII being deemed low grade tumors and GIII and GIV said to be high grade, malignant tumors. The composition of the training and test dataset was 137 samples. Of the training set 70 were classified as healthy tissue, 20 as GII (low grade), 10 as GIII and 20 GIV (high grade). The test set consisted of 9 healthy, 3 low grade and 5 high grade which were taken through random sampling of the entire dataset prior to training. Due to the similarity between GIII and GIV tumour tissue in MRS imaging, the GIII data was merged with the GIV data by labelling each as high grade. Figure 2 illustrates MRS spectra of different grades.
The positioning for every scan captured in the dataset was placed on a homogeneous, representative tumor region determined by an expert using post-Gd contrast T1w, T2w and FLAIR structural contrast images alongside the relevant histopathological information. This was to ensure accuracy within the training data and that there was a heterogeneity of MRS characteristics represented within each voxel scanned. The individual labels for the data were achieved via the diagnosis of a biopsy by a practiced physician in which the clinical, radiological and histopathological information of each patient was incorporated to the diagnosis. The nature of MRS data is high-dimensional (of roughly 1,024 dimensions) thus a reduction with Principal Component Analysis (PCA) was applied to explore the data, see Figure3
. Clustering with k-means results in a good separation of the classes, especially between healthy and high grade tissue (Figure2(b)).
This study took a three step approach. First, three different models (GAN, DCGAN and a generative adaptation of PMM) were used to generated synthetic MRS images based off of the training -MRS- dataset. Second, these samples were then each used to train a Random Forest to be able to classify MRS images as either healthy, low or high grade tissue. Finally, the results from the Random Forest classifiers were then analyzed and the synthetic data from the generative models were compared against the mean signal of the training dataset classes. The GAN model used in this study, based on [goodfellow_generative_2014]nair_rectified_2010]
as the activation function with the output layer having 1,024 nodes but utilizing TanH[agostinelli_learning_2014] as the activation function. The discriminator consisted of a linear input layer which took in an MRS image of 1,024 values, followed by a hidden layer consisting of 1,024 nodes using LeakyReLU [agostinelli_learning_2014] as the activation function, the output layer was a singular sigmoidal node [rumelhart_learning_1986]
activation. The full model was trained on each class of tissue for over 150,000 epochs using the Adam optimizer[kingma_adam:_2014]
. Compared to GANs, DCGANs use a deeper learning architecture which normally requires more data. To try and generate more expressive and domain accurate data, alterations were required to the training of the network to accommodate the limited data. The training process was modified to not take random batch samples but to deliberately take the full set of the data available for training at each epoch. The samples were then normalized with batch normalization. The architecture for the generator used in the study was a fully connected input layer comprised of 2048 nodes followed by four transposed one-dimensional convolutional layers using LeakyReLU as the activation function for all but the output layer which used TanH. The discriminator was the inverse with the first four layers being one-dimensional convolutions (using LeakyReLU as the activation function) followed by a single Sigmodial output node. Similar to the GAN model the DCGAN model was trained for at least 250,000 epochs on each tissue class. GANs are a non-linear method for the generation of data which, theoretically, should lead to more expressive data over that of more linear methods but to do so requires larger amounts of training data. To test the benefits, a comparison to a linear based generator in the study was required. For medical imaging data, factors such as patient orientation and relationship between values matters. Therefore, testing requires a method which can generate data that adheres to the domain. The selected method for this study was the Pairwise Mixture Model. The Pairwise Mixture Model (PMM)[olliverre_pairwise_2017] is a model for representing different brain tissue from MRS images and based on the work by asad_supervised_2016 [asad_supervised_2016], the purpose of which was to solve the problem of the heterogeneity of tissue types found within multi-voxel MRS images. The possible types were defined as normal, low (GI and GII type brain tumour tissue) and high (GIII and GIV brain tumour tissue). Each tissue model is expressed as a mean signal and the variation around the mean, calculated from applying PCA to a labelled dataset of homogeneous MRS images each relating to a specific grade. The models could then be as defined:
where is the mean signal, and
respectively are the alpha weight coefficients and eigenvectors - which encode the variation around the mean signal - with K representing the number of eigenvectors determined for model. To calculate the amount of each tissue type found within a certain voxel the assumption that each voxel was a weighted sum of the possible tissue types gives the following:
which can then be viewed as an optimization problem where:
|with representing the available surrounding model signals of .|
The estimated coefficients can be considered to represent the amount of each tissue type (normal, low or high) found within a voxel. By taking the models of the various tissue types from the PMM, it is straightforward to see how by varying the value of the coefficients of the models it is possible to create data that holds to the original domain. There is a limit to the amount of possibly created data (where represents the coefficients) but it is still enough to train a classifier model with a deep learning architecture, based on the generated data alone. The testing of the generated data from each network used a set of simple, shallow Random Forest classifiers which were constructed and trained on the generated data from the GAN, DCGAN and the PMM generation method. A Random Forest was also trained on the training dataset alone as a control set. The Random Forest classifiers were then tested on the MRS image test set with the classification accuracy results recorded and examined. Furthermore the generated MRS images were compared to the mean signal from the training dataset for comparison to determine the adherence to the domain and possibility of expressiveness.
Each model, the GAN, DCGAN and PMM generator, was trained on the MRS single-voxel training dataset for each class (normal, low and high). The GAN and DCGAN each had the architecture detailed in Section 2.2.
From the results shown in Table 1 the GAN produced higher accuracy in the Random Forest classifier trained on its data than with the DCGAN generated method for low grade tumor patients and for normal (healthy) tissue. The classifier trained on PMM data produced higher accuracy for low grade tumor patients over that of the ground truth. The DCGAN produced the worst results over all tissue types.
The most variation within the resultant signals can be seen from both GAN methods, however, the GAN appears to cohere closer to the shape of the real MRS signals as shown in Figure 4, this can be shown through the deltas between signals shown in Table 2.
This paper examines the possibility of generating MRS brain tumor images from limited and uneven data through one state-of-the-art generation technique, a deeper version of the same model and a modified MRS brain tumor model (GAN, DCGAN and PMM respectively). The results showed that the generated data could train a shallow Random Forest classifier to accurately determine the grades of brain tissue to the same level of one trained on real MRS data. The GAN trained model produced higher accuracy in testing and shows the expressiveness and capability in adversarial networks whereas the DCGAN model produced the lowest accuracy in classification as well as in similarity to the training data, highlighting the need for larger amounts of data when working with deep learning models. The linear model was able to produce spectra that were closer in appearance to the mean signal of the training dataset. The next stage is to examine the generated voxels by a domain expert to acknowledge their potential accuracy and expressiveness, the analysis drawn can then be used to determine what needs to be done in order to have the generated data successfully train a deep learning model for the classification of MRS images.
We would like to thank Professor Franklyn Howe at St George’s, University of London, for the brain tumour MR data used in this research as well as his insights into MRS. We are also grateful for the hardware provided to us by NVIDIA that was used in this research.