Multiple sclerosis (MS) is a disabling disease of the central nervous system that disrupts the flow of information within the brain and between the brain and body. It is characterized by the presence of lesions in the brain and spinal cord. Magnetic resonance imaging (MRI) has become one of the most important clinical tools to diagnose and monitor MS, since structural MRI depicts white matter (WM) lesions with high sensitivity (Rovira et al., 2015). The pattern and evolution of lesions have made MRI abnormalities invaluable criteria for the early diagnosis of MS. MRI allows visualization of the dissemination of WM lesions in time and space with high specificity and sensitivity, which is a key factor in recent diagnostic criteria (Filippi et al., 2016). However, in both cross-sectional and longitudinal studies, manual or semiautomated segmentations have been used to compute the total number of lesions and the total lesion volume; these procedures are challenging, time-consuming, and prone to manual errors and inter- and intraobserver variability. This has led to the development of different automated strategies (Lladó et al., 2012).
Recently, deep neural networks have attracted substantial interest. Deep convolutional neural networks (CNNs) have demonstrated groundbreaking performance in brain imaging, especially in tissue segmentation (Zhang et al., 2015; Moeskops et al., 2016) and brain tumor segmentation (Kamnitsas et al., 2017; Havaei et al., 2017). In contrast to previous supervised learning methods, CNNs do not require manual feature engineering or prior guidance. Furthermore, the increase in computing power makes them a very interesting alternative for automated lesion segmentation. CNN-based methods have achieved top-ranking performance in all of the international MS lesion challenges (Styner et al., 2008; com, 2016; Carass et al., 2017; Hashemi et al., 2019).
Studying MS lesions using supervised machine learning algorithms on MRI images requires a large number of samples to be annotated by expert radiologists. However, obtaining annotations of medical images is time-consuming. Several attempts have been made to overcome this challenge by using data augmentation. One of the most common data augmentation methods is to modify the dataset of images using geometric transformations such as image translation, rotation, or flipping (Krizhevsky et al., 2012). However, the generated samples may not represent image appearances in real data, or they may be very similar to the existing images in the training dataset due to the parameters and image operators used (Zhang et al., 2017). In contrast, we propose the generation of synthetic MS lesions on patient or healthy MRI images as a solution to the lack of expert annotations.
The synthesis of MRI images has attracted much interest in several areas of neuroimaging, including replacing missing MRI modalities with synthetic data (van Tulder and de Bruijne, 2015), generating a subject-specific pathology-free image that is not present in the input modality (Bowles et al., 2016), and improving image segmentation and registration performance (Iglesias et al., 2013), among others. The current state of the art in brain MRI synthesis is the work of Chartsias et al. (2018). The authors proposed a deep fully convolutional neural network (FCNN) model for MRI synthesis, which takes different modalities as inputs and outputs synthetic images of the brain in one or more new modalities. This approach could be used for the synthesis of new lesions. However, some limitations should be considered, such as the ability to control the intensity and texture inside the lesions and the requirement of ground-truth masks for obtaining the lesion model.
In this paper, we propose a deep fully convolutional neural network model for MS lesion synthesis. The model takes as inputs T1-w and FLAIR images without MS lesions and outputs synthetic T1-w and FLAIR images with MS lesions. The MS lesion information is encoded as different binary masks passed to the model stacked with the input images. To overcome the limitations of the Chartsias et al. (2018) model, we divide the lesions into different regions based on voxel intensities, encoding this information as different binary masks. These binary masks are computed directly by thresholding the hyperintensities in the FLAIR image, so there is no need for the lesions’ ground truth. That means the proposed MS lesion synthesis model is trained end-to-end without the need of manual expert MS lesion annotations in the training sets. Therefore, to tackle the lack of available ground-truth data needed for supervised MS lesion detection and segmentation strategies, we use the generated synthetic MS lesion images as data augmentation to improve the lesion detection and segmentation performance. This is done by synthesizing the lesions in new brain images, coming from either healthy subjects or from patients with lesions. Our evaluation included a clinical dataset and public MS data from the International Symposium on Biomedical Imaging (ISBI) 2015 MS challenge (Carass et al., 2017). The accuracy of the generated synthetic images with MS lesions is evaluated qualitatively and quantitatively in terms of similarity performance and in terms of lesion detection and segmentation using a well-known state-of-the-art MS lesion segmentation method (Valverde et al., 2017). For the data augmentation evaluation, we analyzed the effect of adding synthetic images on the segmentation performance while training with a different number of training images. 
To simulate a situation with very limited training data, we also analyzed the effect of the synthetic data augmentation starting from the one-image training scenario.
2.1 MS lesion segmentation approach
The segmentation framework used for evaluating the proposed MS lesion generator is the state-of-the-art CNN model proposed by Valverde et al. (2017). Within this MS lesion segmentation framework, a cascade of two identical CNNs is optimized, where the first network is trained to be more sensitive to revealing possible candidate lesion voxels, while the second network is trained to reduce the number of false positive outcomes. For a complete description of the details and motivations for the proposed architecture, please refer to the original publication.
2.2 Synthetic MS lesion generation pipeline
To learn a model for the generation of synthetic MS lesions, images without lesions (used as inputs to the model) and the corresponding images with lesions (used as outputs of the model) are required. This kind of image set is not easy to obtain. One way to solve this would be to use a longitudinal MS dataset; however, MS lesions in the baseline images and new MS lesions in the follow-up images would have to be annotated. Moreover, the baseline and follow-up images would also have to be registered. In that way, the model would be trained to generate new lesions in the follow-up scans. Nevertheless, in this scenario, new lesions in the follow-up images may not be sufficient to train the model, since the volume of most of the new lesions can be relatively low (Salem et al., 2018). Therefore, to overcome the lack of available ground truth, we use the MS lesion generation pipeline shown in Figure 1, which consists of three main stages: first, the creation of an approximate white matter hyperintensity (WMH) mask and several intensity level masks to encode the intensity profile of the WMH voxels (Section 2.2.1); second, the filling of this WMH mask in the MR images with intensities resembling WM (Section 2.2.2); and finally, the generation of MS lesions using the MS lesion generator network on the filled images (Section 2.2.3). Notice that the proposed MS generator was trained using only a cross-sectional MS dataset. The filled images were considered images without lesions (used as inputs to the model), while the original images contained MS lesions (used as outputs of the model during the training process). The following subsections explain the full pipeline in more detail.
2.2.1 WMH mask and intensity level masks
Creating the WMH mask and the intensity level masks is an important step in the proposed MS lesion generator pipeline. The aim is that training the model with intensity level masks instead of MS lesion masks avoids the limitation of having ground-truth. First, the FLAIR image is thresholded to obtain an approximate WMH mask. This mask is used to fill the WMH regions with intensities similar to the ones of the surrounding WM voxels. To learn the model for the generation of WMH voxels and their intensity profile, the range of intensities starting from the initial threshold is divided into different small ranges by increasing the intensity threshold at different steps. These created masks are considered as intensity level masks, which are then used to encode the intensity profile of the WMH voxels. The intensity level masks are stacked with the filled MR images when training the MS generator model. Therefore, the model can be trained with any dataset without requiring manual expert annotations. The approximate WMH mask is computed by FLAIR thresholding. The threshold and intensity level mask are computed as follows:
$$T_{\alpha} = \mu_{GM} + \alpha\,\sigma_{GM}, \qquad IL_{\alpha}(x) = \begin{cases} 1 & \text{if } FLAIR(x) \ge T_{\alpha} \\ 0 & \text{otherwise} \end{cases}$$

where $\mu_{GM}$ and $\sigma_{GM}$ are the parameters of the intensity distribution of gray matter (GM) tissue on the FLAIR image (Cabezas et al., 2014). A small value of $\alpha$ must be chosen to obtain an approximate WMH mask so that all the WMH voxels are included in this mask. Different intensity level masks are obtained by increasing the $\alpha$ value: the higher the value of $\alpha$, the brighter the WMH voxels included in the mask.

In this study, the approximate WMH mask was obtained with $\alpha$ = 0.5. This value was found empirically to ensure that all the WMH voxels were included in the WMH mask. Eight intensity level masks with $\alpha$ = 0.5, 0.8, 1.1, 1.4, 1.7, 2.1, 2.4, and 2.7 were used to encode the WMH intensity profile. This was a trade-off between the memory required and the minimum number of training samples inside each intensity level mask while training the model. Figure 2 describes the creation of the eight intensity level masks ($IL_1$, $IL_2$, …, $IL_8$). The WMH mask is used to fill the WMHs in the original image, and the intensity level masks are used to encode the intensity profile within the obtained WMH mask.
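As a concrete illustration, the thresholding step can be sketched in a few lines of NumPy. This is a minimal sketch under our own naming; it assumes a binary GM mask obtained from a prior tissue segmentation:

```python
import numpy as np

def intensity_level_masks(flair, gm_mask,
                          alphas=(0.5, 0.8, 1.1, 1.4, 1.7, 2.1, 2.4, 2.7)):
    """Threshold a FLAIR volume at mu_GM + alpha * sigma_GM for each alpha.

    Returns the approximate WMH mask (smallest alpha) together with one
    binary mask per alpha, encoding progressively brighter WMH voxels.
    """
    gm_voxels = flair[gm_mask > 0]
    mu, sigma = gm_voxels.mean(), gm_voxels.std()
    masks = [(flair >= mu + a * sigma).astype(np.uint8) for a in alphas]
    return masks[0], masks  # masks[0] is the approximate WMH mask (alpha = 0.5)
```

Because the threshold only increases with $\alpha$, the masks are nested: every voxel in $IL_{k+1}$ is also in $IL_k$.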
2.2.2 WMH filling
After creating the intensity level masks described in the previous section, the WMH mask regions are filled in the input modalities. Similar to the work of Battaglini et al., a local filling method is used here to fill the WMH area with the surrounding WM voxels in all input modalities. First, for each slice in the MR image, the WMHs are split into individual connected regions. Second, each connected region is dilated twice. Each connected region is then filled with values normally sampled using the mean and standard deviation of the WM voxels lying in the first dilated area. Finally, the filled area and its surrounding voxels (voxels in the filled connected region and the two dilated areas) are smoothed using a local Gaussian filter.
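The filling procedure above can be sketched per slice with SciPy; the function name and the Gaussian width are our own choices, not the authors' exact parameters:

```python
import numpy as np
from scipy import ndimage

def fill_wmh_slice(img, wmh, wm_mask, sigma=1.0, seed=0):
    """Fill each connected WMH region of a 2D slice with WM-like intensities."""
    rng = np.random.default_rng(seed)
    filled = img.copy()
    labels, n = ndimage.label(wmh)                 # split WMHs into connected regions
    for i in range(1, n + 1):
        region = labels == i
        ring1 = ndimage.binary_dilation(region) & ~region                # first dilation
        ring2 = ndimage.binary_dilation(region | ring1) & ~(region | ring1)  # second
        wm_ring = ring1 & (wm_mask > 0)            # WM voxels in the first dilated area
        if not wm_ring.any():
            continue
        mu, sd = filled[wm_ring].mean(), filled[wm_ring].std()
        filled[region] = rng.normal(mu, sd, size=int(region.sum()))
        local = region | ring1 | ring2             # region plus its two dilated rings
        filled[local] = ndimage.gaussian_filter(filled, sigma)[local]
    return filled
```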
2.2.3 MS lesion generation model
Figure 3 shows our MS lesion generator architecture, which is inspired by the work of Chartsias et al. (2018). As shown in Figure 3(a), it is a two-input-two-output model based on two encoders and two decoders (T1-w Encoder, FLAIR Encoder, T1-w Decoder, and FLAIR Decoder). The encoders learn the latent representation of the input modalities, while the decoders generate the output modalities. Each decoder is used three times (i.e., it is a shared decoder): once to decode each of the two individual latent representations (the T1-w latent representation and the FLAIR latent representation) and once to decode the fused latent representation. The fused latent representation is computed as the voxel-wise max of the two individual latent representations. At testing time, we used the synthesis result from the fused latent representation as our output. The model has two 2D input patches with nine channels each (one input patch for each input modality). The eight intensity level masks computed as explained in Section 2.2.1 are stacked with each of the filled input modalities: the first channel is the filled image modality, and the other eight channels are the intensity level masks.
Encoder architecture: One independent encoder is built for each input modality following the architecture shown in Figure 3(b). The encoders embed input images into a latent space of 32-channel size. This architecture is inspired by the work of Guerrero et al. (2018). It is a fully convolutional network that follows a U-shaped architecture (Ronneberger et al., 2015). The U-Net’s downsampling followed by the upsampling and skip connections allow the network to exploit information at large spatial scales, while not losing useful local information. Moreover, as discussed in Drozdzal et al. (2016), skip connections facilitate gradient flow during training. Our encoders are shallower than the original U-Net, having three downsample and upsample steps compared to the original four steps.
Decoder architecture: One decoder is built for each output modality following the architecture shown in Figure 3(b). The model is a fully convolutional network to map a multichannel image-sized latent representation to a single channel image of the required modality with synthetic MS lesions.
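The two-encoder/two-decoder layout with a max-fused latent space can be sketched in Keras. This is a simplified sketch, with a single downsampling step rather than the three used in the paper, and all layer sizes are illustrative rather than the authors' exact configuration:

```python
from tensorflow import keras
from tensorflow.keras import layers

def make_encoder(name):
    """Shallow U-Net-style encoder: image + 8 intensity level masks -> 32-channel latent."""
    inp = keras.Input(shape=(64, 64, 9), name=f"{name}_in")
    c1 = layers.Conv2D(32, 3, padding="same", activation="relu")(inp)
    d1 = layers.MaxPool2D()(c1)
    c2 = layers.Conv2D(64, 3, padding="same", activation="relu")(d1)
    u1 = layers.UpSampling2D()(c2)
    m1 = layers.Concatenate()([u1, c1])            # skip connection
    latent = layers.Conv2D(32, 3, padding="same", activation="relu")(m1)
    return keras.Model(inp, latent, name=f"{name}_encoder")

def make_decoder(name):
    """Map a 32-channel latent back to a single-channel synthetic modality."""
    inp = keras.Input(shape=(64, 64, 32))
    out = layers.Conv2D(1, 3, padding="same")(inp)
    return keras.Model(inp, out, name=f"{name}_decoder")

t1_enc, fl_enc = make_encoder("t1"), make_encoder("flair")
t1_dec, fl_dec = make_decoder("t1"), make_decoder("flair")
z_t1, z_fl = t1_enc.output, fl_enc.output
z_fused = layers.Maximum()([z_t1, z_fl])           # voxel-wise max fusion
# each (shared) decoder decodes both individual latents and the fused one
outputs = [dec(z) for dec in (t1_dec, fl_dec) for z in (z_t1, z_fl, z_fused)]
model = keras.Model([t1_enc.input, fl_enc.input], outputs)
```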
2.3 Data augmentation application: generating new synthetic MS lesions
One of the applications of our synthetic MS lesion pipeline is to generate synthetic MS lesions on patient or healthy images and use these synthetic images as data augmentation to increase the MS lesion segmentation and detection performance. The main idea is to modify the original eight intensity level masks of the target image before passing it through the generator network. At testing time, if the intensity level masks are used without any modification, the output images are a generated synthetic version of the input ones containing all the WMHs found in the input image. Passing modified intensity level masks to the generator network will generate the desired modifications (i.e., new MS lesions) on the output images.
Figure 4 depicts how lesion expert annotations for a patient image can be generated on a healthy one through linear and nonlinear registration. After registration, the lesion mask and the eight intensity level masks of the patient subject are resampled to the healthy space. We split the resampled binary lesion mask into individual lesion volumes, in which every single lesion was defined as a spatially disconnected volume. After the lesion separation, the individual lesion volumes are dilated to incorporate the hyperintensities surrounding the lesions that are not annotated as lesion voxels. The intensity level masks of the dilated lesion volumes are copied from the patient resampled masks to the healthy masks. Finally, the healthy images plus their modified intensity level masks are passed through the generator network to add new MS lesions to the synthetic output images. In the same way, new MS lesions can be generated in patient images using patient-to-patient registration. Furthermore, more lesions could be added to the follow-up scans in the longitudinal MS analysis.
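After registration, the mask-editing step reduces to copying the (dilated) lesion regions of the resampled patient intensity level masks into the healthy subject's masks. A minimal sketch, with our own names and with all arrays assumed to be already resampled to the healthy space:

```python
import numpy as np
from scipy import ndimage

def transfer_lesion_masks(healthy_levels, patient_levels, patient_lesion_mask,
                          dilations=1):
    """Copy the intensity-level mask values of each (dilated) patient lesion
    into the healthy subject's intensity level masks."""
    out = [m.copy() for m in healthy_levels]
    labels, n = ndimage.label(patient_lesion_mask)   # split into individual lesions
    for i in range(1, n + 1):
        # dilate to capture surrounding hyperintensities not annotated as lesion
        lesion = ndimage.binary_dilation(labels == i, iterations=dilations)
        for h, p in zip(out, patient_levels):
            h[lesion] = p[lesion]                    # paste the lesion region
    return out
```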
3 Experimental setup
Clinical MS dataset: This dataset consists of 15 healthy subjects and 65 different patients with a clinically isolated syndrome or early relapsing MS (Vall d’Hebron Hospital Center, Barcelona, Spain) who underwent brain MR imaging for monitoring disease evolution and treatment response. Each patient underwent brain MRI within the first months after the onset of symptoms. The scans for all the patients were obtained on the same 3T magnet (Tim Trio; Siemens, Erlangen, Germany) with a 12-channel phased array head coil. The MRI protocol included the following sequences: 1) transverse proton density (PD)- and T2-w fast spin-echo (TR ms, TE ms, voxel size mm), 2) transverse fast FLAIR (TR ms, TE ms, TI ms, flip angle , voxel size mm), and 3) sagittal T1-w 3D magnetization-prepared rapid acquisition of gradient echo (TR ms, TE ms, TI ms, voxel size mm). The dataset was preprocessed as follows: for each patient, the T1-w image was linearly registered to the FLAIR using the NiftyReg tools (https://sourceforge.net/projects/niftyreg/) (Modat et al., 2014, 2010). Afterwards, a brain mask was identified and delineated on the registered T1-w image using the ROBEX Tool (https://www.nitrc.org/projects/robex) (Iglesias et al., 2011). Then, the two images underwent a bias field correction step using the N4 algorithm from the ITK library (https://itk.org/Doxygen/html/classitk_1_1N4BiasFieldCorrectionImageFilter.html) with the standard parameters for a maximum of 400 iterations (Tustison et al., 2010).
ISBI2015 dataset: This dataset consists of 5 training and 14 testing subjects, with 4 or 5 different image time-points per subject, from the ISBI2015 MS lesion challenge (Carass et al., 2017). Each scan was imaged and preprocessed in the same manner by the challenge organizers, with data acquired on a 3.0 Tesla MRI scanner (Philips Medical Systems, Best, The Netherlands) with T1-w MPRAGE, T2-w, PD and FLAIR sequences. For more information about the imaging protocol and preprocessing details, refer to the challenge organizers’ website (http://iacl.ece.jhu.edu/index.php/MSChallenge/data). In the challenge, each subject image was evaluated independently, which led to a final training set and a testing set composed of 21 and 61 images, respectively. Additionally, manual delineations of MS lesions performed by two experts were included for each of the 21 training images.
For both datasets, brain tissue volume was computed using the FAST segmentation method (Zhang et al., 2001). Finally, the WMH mask and the eight intensity level masks were computed by FLAIR thresholding as explained in Section 2.2.1, and the T1-w and FLAIR images were filled using the computed WMH mask, following the method explained in Section 2.2.2.
3.2 MS lesion generator training and implementation details
To perform our experimental tests, we trained the lesion generator models in two different scenarios, one using the MS clinical dataset and the other using the ISBI2015 dataset (see Table 1 for the images used for training). For training the generation network, 2D 64×64 patches with a step size of 32×32 were extracted from the original images, the filled images, and the eight intensity level masks. The extracted patches were split into training and validation sets (70% for training and 30% for validation). The training set was used to adjust the weights of the neural network, while the validation set was used to measure how well the trained model was performing after each epoch. The extracted patches were passed to the network for training in mini-batches of size 32, and the network was set to train for 200 epochs. To prevent overfitting, the training process was automatically terminated when the validation accuracy did not increase for 15 epochs. Regarding the MS lesion segmentation framework, the CNN training and inference procedures were identical to those proposed by Valverde et al. (2017).
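The patch extraction described above is straightforward; a sketch of the 2D sliding-window sampling, with our own function name:

```python
import numpy as np

def extract_patches_2d(img, size=64, step=32):
    """Extract size x size patches from a 2D image with the given stride."""
    patches = [img[y:y + size, x:x + size]
               for y in range(0, img.shape[0] - size + 1, step)
               for x in range(0, img.shape[1] - size + 1, step)]
    return np.stack(patches)
```

The same routine is applied to each slice of the original images, the filled images, and the eight intensity level masks, so that corresponding patches stay aligned.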
The proposed method has been implemented in Python (https://www.python.org), using Keras (https://keras.io) with the TensorFlow (https://www.tensorflow.org/) backend (Abadi et al., 2015). All experiments have been run on a GNU/Linux machine box running Ubuntu 18.04, with 128 GB RAM memory. The model training was carried out on a single TITAN-X GPU (NVIDIA Corp, United States) with 12 GB RAM memory. To promote the reproducibility and usability of our research, the proposed MS lesion generation pipeline is currently available for download at our research website (https://github.com/NIC-VICOROB/MS_Lesions_Generator).
|Datasets|Total number of images|MS lesion generator|MS lesion segmentation model|
|---|---|---|---|
|MS clinical dataset|65 patient images: Group A (36 images), Group B (29 images); 15 healthy images (VHhealthy)|Training: Group A (36 images); Testing: Group B (29 images)|Group B (29 images) is split into Training: VHtrain (15 images) and Testing: VHtest (14 images)|
|ISBI2015 dataset|21 patient images (ISBItrain); 61 patient images (ISBItest)| | |
3.3 Evaluation metrics
To evaluate the performance of the proposed MS lesion generator, we computed the similarity between the original and the synthetic images using the following similarity metrics:
Mean Square Error (MSE):

$$MSE = \frac{1}{N}\sum_{i=1}^{N}\left(G_i - R_i\right)^2$$

where $G_i$ and $R_i$ are the intensities of the generated and the real images, respectively, and $N$ is the number of voxels in the image.
Structural Similarity Index (SSIM):

$$SSIM = \frac{(2\mu_G\mu_R + c_1)(2\sigma_{GR} + c_2)}{(\mu_G^2 + \mu_R^2 + c_1)(\sigma_G^2 + \sigma_R^2 + c_2)}$$

where $(\mu_G, \sigma_G^2)$ and $(\mu_R, \sigma_R^2)$ are the (mean, variance) of the intensities of the generated and the real images, respectively, $\sigma_{GR}$ is the covariance between them, and $c_1$ and $c_2$ are two constants to stabilize the division with a weak denominator.
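Both metrics are simple to compute. The sketch below uses a single global SSIM window for clarity (standard implementations such as `skimage.metrics.structural_similarity` use local windows); the constants $c_1$ and $c_2$ are illustrative:

```python
import numpy as np

def mse(gen, real):
    """Mean square error between generated and real images."""
    return np.mean((gen.astype(float) - real.astype(float)) ** 2)

def ssim_global(gen, real, c1=1e-4, c2=9e-4):
    """Global (single-window) SSIM between generated and real images."""
    g, r = gen.astype(float), real.astype(float)
    mu_g, mu_r = g.mean(), r.mean()
    var_g, var_r = g.var(), r.var()
    cov = ((g - mu_g) * (r - mu_r)).mean()
    return ((2 * mu_g * mu_r + c1) * (2 * cov + c2)) / \
           ((mu_g ** 2 + mu_r ** 2 + c1) * (var_g + var_r + c2))
```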
On the other hand, the quantitative evaluation of the proposed MS lesion generator was performed by segmenting both the original and synthetic images individually using the same MS lesion segmentation framework and comparing the difference between the segmentation results. As explained before, the segmentation framework used to evaluate the proposed MS lesion generator is the MS lesion segmentation method proposed by Valverde et al. (2017), although the proposed data augmentation strategy could be applied to any approach. The evaluation of the resulting segmentations against the available lesion annotations was carried out using the following evaluation metrics:
Dice Similarity Coefficient (DSC), which measures the overall segmentation accuracy between the manual lesion annotations and the output segmentation masks:

$$DSC = \frac{2 \cdot TP}{2 \cdot TP + FP + FN}$$

where $TP$ and $FP$ denote the number of voxels correctly and incorrectly classified as a lesion, respectively, and $FN$ denotes the number of voxels incorrectly classified as a nonlesion.
Sensitivity of the method in detecting lesions between manual lesion annotations and output segmentation masks:

$$Sens = \frac{TP_r}{TP_r + FN_r}$$

where $TP_r$ and $FN_r$ denote the number of correctly detected and missed lesion region candidates, respectively.
Precision of the method in detecting lesions between manual lesion annotations and output segmentation masks:

$$Prec = \frac{TP_r}{TP_r + FP_r}$$

where $TP_r$ and $FP_r$ denote the number of correctly and incorrectly classified lesion region candidates, respectively.
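These three metrics can be sketched with connected-component labeling: a voxel-wise DSC and region-wise sensitivity/precision, with our own function names:

```python
import numpy as np
from scipy import ndimage

def dsc(seg, gt):
    """Voxel-wise Dice similarity coefficient between two binary masks."""
    tp = np.logical_and(seg, gt).sum()
    return 2.0 * tp / (seg.sum() + gt.sum())

def lesion_detection(seg, gt):
    """Region-wise sensitivity and precision from connected components."""
    gt_labels, n_gt = ndimage.label(gt)
    seg_labels, n_seg = ndimage.label(seg)
    # a ground-truth lesion counts as detected if any of its voxels is segmented
    tp_r = sum(1 for i in range(1, n_gt + 1) if seg[gt_labels == i].any())
    # a segmented region counts as correct if it overlaps any ground-truth lesion
    det_r = sum(1 for j in range(1, n_seg + 1) if gt[seg_labels == j].any())
    sensitivity = tp_r / n_gt if n_gt else 0.0    # TPr / (TPr + FNr)
    precision = det_r / n_seg if n_seg else 0.0   # TPr / (TPr + FPr)
    return sensitivity, precision
```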
A paired t-test at the 5% level was used to evaluate the significance of the data augmentation results. Significant results are shown in bold in all tables.
4 Experiments and results
4.1 MS lesion synthesis
In these experiments, qualitative and quantitative evaluations were undertaken by measuring the similarities between the real and the synthetic images in terms of MSE and SSIM metrics and in terms of MS lesion detection and segmentation using a state-of-the-art MS lesion segmentation method (Valverde et al., 2017) and the evaluation metrics described in section 3.3 (see Table 1 for the images used).
Clinical MS dataset: Both VHtrain and VHtest sets were generated using the proposed MS generator yielding VHtrainGen and VHtestGen, respectively. The evaluation of the proposed MS generator on this dataset was performed by measuring the MSE and SSIM metrics between the real and the synthetic images (using Group B images, see Table 1) and by training and testing the MS lesion segmentation model (Valverde et al., 2017) as follows: 1) training with the VHtrain set and testing on the VHtest set; 2) training with the VHtrainGen set and testing on the VHtestGen set; 3) training with the VHtrainGen set and testing on the VHtest set; and 4) training with the VHtrain set and testing on the VHtestGen set.
ISBI2015 dataset: The ISBItrain set was generated using the proposed MS generator, yielding ISBItrainGen. Note that the evaluation of the ISBI 2015 challenge is performed blind by submitting the segmentation masks of the 61 testing cases to the challenge website evaluation platform (https://smart-stats-tools.org/node/26). The evaluation of the proposed MS generator on this dataset was performed by measuring the MSE and SSIM metrics between the real and the synthetic images (using the ISBItrain set, see Table 1). The performance of the two MS lesion segmentation models, one trained with the ISBItrain set and the other trained with the ISBItrainGen set, was evaluated by submitting both to the challenge’s evaluation platform and comparing the accuracy between them.
MS lesion generation on healthy subjects: To evaluate the generation of MS lesions on healthy subjects by using registration, the MS lesions of the VHtrain dataset were generated on the VHhealthy images using linear and nonlinear registration as described in section 2.3. We refer to them as VHGenLinear and VHGenNonlinear, respectively. The evaluation of the proposed MS generator on these datasets was performed by training 3 MS lesion segmentation models using the VHGenLinear, the VHGenNonlinear, and (VHGenLinear + VHGenNonlinear) and testing on the VHtest set.
Table 2 summarizes the MSE and SSIM between the real and synthetic images of the clinical MS and ISBI2015 datasets. Furthermore, the MSE and SSIM of the WMH mask voxels are reported. Figures 5 and 6 show the qualitative assessment of the proposed MS lesion generator on the clinical MS/ISBI2015 datasets and the synthetic MS lesions generated on healthy subjects using linear/nonlinear registration, respectively. The slices are also displayed using jet color maps to show the similarity of intensities inside the original and the synthetic lesions. Table 3 summarizes the MS lesion detection and segmentation results, showing the mean values obtained when training with the original and synthetic images of the clinical MS and ISBI2015 datasets. The mean results when training with the synthetic MS lesions generated on healthy images using the clinical MS dataset lesion set are shown in Table 4.
[Table 2: MSE and SSIM for non-background voxels and WMH mask voxels, reported for the Clinical MS Dataset (Group B set) and the ISBI2015 Dataset (ISBItrain images); further results for the Clinical MS Dataset follow.]
4.2 Data augmentation experiments
In these experiments, we evaluated the use of the proposed MS lesion generator as a data augmentation method by generating the lesion masks on healthy images from the same domain using registration as described in section 2.3. The two deformed generated lesion masks (from linear and nonlinear registration) and the corresponding two synthetic images were added to the original patient image during training as data augmentation.
Clinical MS dataset: For each patient image from the VHtrain set, we created two synthetic images with lesions on a healthy image from the VHhealthy set (VHGenLinear and VHGenNonlinear) as described in section 2.3. These two synthetic images were used together with the original image as data augmentation in the following experimental tests: 1) to analyze the effect of the synthetic data augmentation images on the segmentation performance while training with a different number of training images, two models were trained using 1, 2, 3, 5, 10 or all of the available training images, one model using the original images and the other using the same original images plus their synthetic data augmentation images; and 2) to simulate a situation with limited training data, we analyzed the effect of the synthetic data augmentation on the segmentation performance in the scenario of having only one image for training. Using a single training image with a lesion volume in the range of ml, two models were trained: one used the original image (i.e., from VHtrain) and the other used the same original image plus the two synthetic images generated on the healthy image (i.e., from VHGenLinear and VHGenNonlinear).
ISBI2015 dataset: To simulate a situation with limited training data, we analyzed the effect of the synthetic data augmentation images on the segmentation performance in the one-image training scenario on the overall performance of the testing set. To do so, we chose a single training image from each training subject (ISBItrain), which led to 5 different training sets with a varying number of lesions and a total lesion volume in the range ml. Since no healthy subjects were available from this challenge, we chose the fourth training subject (the image with the smallest lesion load; ml) and filled it as described in Section 2.2.2 (but filling only the MS lesions instead of the WMH areas). We considered this image a healthy subject and refer to it as ISBI-H. The MS lesions of each of the four selected ISBI images were generated on ISBI-H using linear and nonlinear registration, as described in section 2.3, yielding, for each of the four selected patient images, two generated images and their corresponding lesion masks, which were used as data augmentation. Based on this, we undertook the following experiments. 1) To simulate a situation with limited training data, we analyzed the effect of the synthetic data augmentation images on the segmentation performance in the one-image training scenario. Using a single training image from the four selected images, two models were trained, one using the original image and the other using the original image plus its two synthetic images generated on ISBI-H using linear and nonlinear registration. 2) To determine the performance of all the trained models on the blind test set, all trained models from the previous experiment were submitted to the challenge’s evaluation platform, comparing their accuracy to those of the other submitted MS lesion segmentation pipelines fully trained using the entire available training set.
Among the set of evaluated coefficients computed in the challenge, only the DSC, sensitivity and precision metrics are shown for comparison.
Regarding the Clinical MS dataset, Figure 7 shows the DSC, sensitivity and precision coefficients of models trained using a different number of training images, ranging from 1 to 15 images. Table 5 shows the DSC, sensitivity and precision coefficients of the models under the one-image training scenario. Regarding the ISBI2015 dataset, Table 6 shows the performance of each of the one-image scenario models when trained on different images with varying lesion sizes. Table 7 shows the performance of the model trained with ISBI02 plus DA against different top-ranking participant challenge strategies. From the list of compared methods, the best five strategies were based on CNN models (Andermatt et al. (2018); Salehi et al. (2017); Valverde et al. (2017); Birenbaum and Greenspan (2017)), while the others were based on either other supervised learning techniques (Valcarcel et al. (2018); Deshpande et al. (2015); Sudre et al. (2015)) or unsupervised intensity models (Shiee et al. (2010); Jain et al. (2015)).
[Tables 5 and 6: DSC, sensitivity, and precision for the one-image training scenario, comparing each original training image (ORG) against ORG plus data augmentation, for training images with lesion volumes of 0.34 ml (18 lesions), 1.0 ml (6 lesions), 2.0 ml (25 lesions), 5.5 ml (15 lesions), 7.6 ml (42 lesions), 21.5 ml (181 lesions), and 49.4 ml (53 lesions).]
[Table 7: DSC, sensitivity, and precision of ISBI02 + DA compared with Andermatt et al. (2018), Salehi et al. (2017), Valverde et al. (2017), Birenbaum and Greenspan (2017), Deshpande et al. (2015), Jain et al. (2015), Shiee et al. (2010), Valcarcel et al. (2018), and Sudre et al. (2015).]
5 Discussion and future work
We proposed a synthetic MS lesion generator pipeline that generates synthetic images with MS lesions. The intensity level masks introduced in our proposal enabled us to train the model without the need for ground truth. Furthermore, the intensity level masks help the MS lesion generator preserve the intensity gradients inside the synthetic MS lesions. Although the proposed pipeline was used to generate MS lesions on T1-w and FLAIR images using only two encoders and two decoders, the model can easily be extended to new input/output modalities through the addition of new encoders/decoders.
We demonstrated the similarity between the synthetic and real lesions qualitatively and quantitatively on patient and healthy subjects. The synthetic images are very similar to the real ones in terms of the two similarity metrics for nonbackground and WMH mask voxels in both datasets. Regarding the MS lesion segmentation results, the experiments show how similar training on real and synthetic images is in terms of MS lesion detection. On the MS clinical dataset, the performance is 2% lower in terms of DSC and precision when training with the synthetic images than when training with the real images. However, similar results were obtained when training with real images and testing on synthetic images. From the results obtained, synthetic images could be used as either training or testing images. On the ISBI2015 dataset, the performance is very similar in terms of the three coefficients. Good segmentation and detection results were also obtained when training with synthetic MS lesions generated on healthy subjects, and the performance is very similar whether the synthetic images are generated using linear registration, nonlinear registration, or both.
Regarding the data augmentation (DA) experiments, we demonstrated the effect of DA on MS lesion segmentation performance as the number of training images increases. The difference in performance between training with the original images and with the original images plus DA decreases in terms of the three metric coefficients as the number of training images grows, since the DA images generated from linear and nonlinear registration add less variability to the training data once more training images are available. Furthermore, to simulate a situation with limited training data, we analyzed a one-image training scenario. On the MS clinical dataset, a significant improvement was obtained in terms of the three metric coefficients for a lesion volume in the range of ml. On the ISBI2015 dataset, a significant improvement was also obtained in terms of the three metric coefficients, except for ISBI03, where only a significant improvement in precision was obtained. Comparing the accuracy of the best performing model (ISBI02+DA) with those of the other submitted MS lesion segmentation pipelines fully trained on the entire available training set, the proposed one image plus its data augmentation images achieved a performance similar to that of the same fully trained cascaded CNN architecture (score ) (Valverde et al., 2017), which shows the benefit of the proposed data augmentation strategy when training with limited data.
Work is currently underway to build a lesion dictionary containing the information (the annotation and the intensity level masks) of different MS lesions grouped by lesion load. A natural extension is the automatic selection of suitable insertion locations, so that lesions selected from the dictionary could be generated synthetically in multiple locations without manual user involvement. Choosing lesion locations automatically is not an easy task, because inserting lesions in incorrect locations may mislead the training process and decrease the overall performance. We believe that generating synthetic MS lesions on healthy subjects using the dictionary and automatic locations will provide more variability to the training data than linear/nonlinear registration as data augmentation, and the overall performance of the proposed pipeline will improve accordingly.
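The planned dictionary could take a shape like the sketch below. Everything here is hypothetical, since the dictionary is future work: the load thresholds, the `LesionDictionary` class, and its `add`/`sample` interface are purely illustrative.

```python
import numpy as np

def lesion_load_group(volume_ml):
    """Bin a lesion volume (ml) into a load group; thresholds are illustrative."""
    if volume_ml < 5.0:
        return "low"
    if volume_ml < 15.0:
        return "medium"
    return "high"

class LesionDictionary:
    """Stores, per lesion, its annotation mask and intensity level masks,
    grouped by lesion load, so a generator can later draw lesions from a
    requested load group and insert them at chosen locations."""

    def __init__(self):
        self.groups = {"low": [], "medium": [], "high": []}

    def add(self, annotation, intensity_levels, volume_ml):
        self.groups[lesion_load_group(volume_ml)].append(
            {"annotation": annotation, "levels": intensity_levels})

    def sample(self, group, rng=None):
        """Pick one stored lesion from the requested load group."""
        if rng is None:
            rng = np.random.default_rng()
        entries = self.groups[group]
        return entries[rng.integers(len(entries))]

# Toy usage with two lesions of different volumes:
d = LesionDictionary()
d.add(np.ones((3, 3)), np.full((3, 3), 2), volume_ml=3.2)   # small lesion
d.add(np.ones((5, 5)), np.full((5, 5), 3), volume_ml=20.0)  # large lesion
```

Pairing such a dictionary with an automatic location selector (the hard part noted above) is what would remove the remaining manual user involvement.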
In conclusion, the obtained results indicate that the proposed pipeline is able to generate useful synthetic T1-w and FLAIR images with MS lesions that do not differ from real images. Furthermore, combining the synthetic MS lesions generated on healthy images with original patient images from the same domain increases the segmentation and detection accuracy of MS lesions.
Mostafa Salem holds a grant for obtaining the Ph.D. degree from the Egyptian Ministry of Higher Education. This work has been partially supported by La Fundació la Marató de TV3 and by Retos de Investigación grants TIN2014-55710-R, TIN2015-73563-JIN, and DPI2017-86696-R from the Ministerio de Ciencia y Tecnología. The authors gratefully acknowledge the support of NVIDIA Corporation with their donation of the TITAN-X PASCAL GPU used in this research.
- MSSEG Challenge Proceedings: Multiple Sclerosis Lesions Segmentation Challenge Using a Data Management and Processing Infrastructure, Athènes, Greece, 2016.
- Abadi et al.  M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL https://www.tensorflow.org/. Software available from tensorflow.org.
- Andermatt et al.  S. Andermatt, S. Pezold, and P. C. Cattin. Automated segmentation of multiple sclerosis lesions using multi-dimensional gated recurrent units. In Brainlesion: Glioma, Multiple Sclerosis, Stroke and Traumatic Brain Injuries, pages 31–42, 2018.
- Battaglini et al.  M. Battaglini, M. Jenkinson, and N. De Stefano. Evaluating and reducing the impact of white matter lesions on brain volume measurements. Human Brain Mapping, 33(9):2062–2071, 2012.
- Birenbaum and Greenspan  A. Birenbaum and H. Greenspan. Multi-view longitudinal CNN for multiple sclerosis lesion segmentation. Engineering Applications of Artificial Intelligence, 65:111 – 118, 2017.
- Bowles et al.  C. Bowles, C. Qin, C. Ledig, R. Guerrero, R. Gunn, A. Hammers, E. Sakka, D. A. Dickie, M. V. Hernández, N. Royle, J. Wardlaw, H. Rhodius-Meester, B. Tijms, A. W. Lemstra, W. van der Flier, F. Barkhof, P. Scheltens, and D. Rueckert. Pseudo-healthy image synthesis for white matter lesion segmentation. In Simulation and Synthesis in Medical Imaging, pages 87–96, 2016.
- Cabezas et al.  M. Cabezas, A. Oliver, E. Roura, J. Freixenet, J. C. Vilanova, L. Ramió-Torrentà, À. Rovira, and X. Lladó. Automatic multiple sclerosis lesion detection in brain MRI by FLAIR thresholding. Computer Methods and Programs in Biomedicine, 115(3):147 – 161, 2014.
- Carass et al.  A. Carass, S. Roy, A. Jog, J. L. Cuzzocreo, E. Magrath, A. Gherman, J. Button, J. Nguyen, F. Prados, C. H. Sudre, M. J. Cardoso, N. Cawley, O. Ciccarelli, C. A. Wheeler-Kingshott, S. Ourselin, L. Catanese, H. Deshpande, P. Maurel, O. Commowick, C. Barillot, X. Tomas-Fernandez, S. K. Warfield, S. Vaidya, A. Chunduru, R. Muthuganapathy, G. Krishnamurthi, A. Jesson, T. Arbel, O. Maier, H. Handels, L. O. Iheme, D. Unay, S. Jain, D. M. Sima, D. Smeets, M. Ghafoorian, B. Platel, A. Birenbaum, H. Greenspan, P.-L. Bazin, P. A. Calabresi, C. M. Crainiceanu, L. M. Ellingsen, D. S. Reich, J. L. Prince, and D. L. Pham. Longitudinal multiple sclerosis lesion segmentation: Resource and challenge. NeuroImage, 148:77 – 102, 2017.
- Chartsias et al.  A. Chartsias, T. Joyce, M. V. Giuffrida, and S. A. Tsaftaris. Multimodal MR synthesis via modality-invariant latent representation. IEEE Transactions on Medical Imaging, 37(3):803–814, March 2018.
- Deshpande et al.  H. Deshpande, P. Maurel, and C. Barillot. Classification of multiple sclerosis lesions using adaptive dictionary learning. Computerized Medical Imaging and Graphics, 46:2 – 10, 2015. Sparsity Techniques in Medical Imaging.
- Drozdzal et al.  M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal. The importance of skip connections in biomedical image segmentation. In Deep Learning and Data Labeling for Medical Applications, pages 179–187. 2016.
- Filippi et al.  M. Filippi, M. A. Rocca, O. Ciccarelli, N. De Stefano, N. Evangelou, L. Kappos, A. Rovira, J. Sastre-Garriga, M. Tintoré, J. L. Frederiksen, et al. MRI criteria for the diagnosis of multiple sclerosis: MAGNIMS consensus guidelines. The Lancet Neurology, 15(3):292 – 303, 2016.
- Guerrero et al.  R. Guerrero, C. Qin, O. Oktay, C. Bowles, L. Chen, R. Joules, R. Wolz, M. Valdés-Hernández, D. Dickie, J. Wardlaw, and D. Rueckert. White matter hyperintensity and stroke lesion segmentation and differentiation using convolutional neural networks. NeuroImage: Clinical, 17:918 – 934, 2018.
- Hashemi et al.  S. R. Hashemi, S. S. M. Salehi, D. Erdogmus, S. P. Prabhu, S. K. Warfield, and A. Gholipour. Asymmetric loss functions and deep densely-connected networks for highly-imbalanced medical image segmentation: Application to multiple sclerosis lesion detection. IEEE Access, 7:1721–1735, 2019.
- Havaei et al.  M. Havaei, A. Davy, D. Warde-Farley, A. Biard, A. Courville, Y. Bengio, C. Pal, P.-M. Jodoin, and H. Larochelle. Brain tumor segmentation with deep neural networks. Medical Image Analysis, 35:18 – 31, 2017.
- Iglesias et al.  J. E. Iglesias, C.-Y. Liu, P. M. Thompson, and Z. Tu. Robust brain extraction across datasets and comparison with publicly available methods. IEEE Transactions on Medical Imaging, 30(9):1617–1634, 2011.
- Iglesias et al.  J. E. Iglesias, E. Konukoglu, D. Zikic, B. Glocker, K. Van Leemput, and B. Fischl. Is synthesizing MRI contrast useful for inter-modality analysis? In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2013, pages 631–638, 2013.
- Jain et al.  S. Jain, D. M. Sima, A. Ribbens, M. Cambron, A. Maertens, W. V. Hecke, J. D. Mey, F. Barkhof, M. D. Steenwijk, M. Daams, F. Maes, S. V. Huffel, H. Vrenken, and D. Smeets. Automatic segmentation and volumetry of multiple sclerosis brain lesions from MR images. NeuroImage: Clinical, 8:367 – 375, 2015.
- Kamnitsas et al.  K. Kamnitsas, C. Ledig, V. F. Newcombe, J. P. Simpson, A. D. Kane, D. K. Menon, D. Rueckert, and B. Glocker. Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation. Medical Image Analysis, 36:61 – 78, 2017.
- Krizhevsky et al.  A. Krizhevsky, I. Sutskever, and G. E. Hinton. Imagenet classification with deep convolutional neural networks. In Proceedings of the 25th International Conference on Neural Information Processing Systems - Volume 1, NIPS’12, pages 1097–1105, 2012.
- Lladó et al.  X. Lladó, A. Oliver, M. Cabezas, J. Freixenet, J. C. Vilanova, A. Quiles, L. Valls, L. Ramió-Torrentà, and À. Rovira. Segmentation of multiple sclerosis lesions in brain MRI: A review of automated approaches. Information Sciences, 186(1):164 – 185, 2012.
- Modat et al.  M. Modat, G. R. Ridgway, Z. A. Taylor, M. Lehmann, J. Barnes, D. J. Hawkes, N. C. Fox, and S. Ourselin. Fast free-form deformation using graphics processing units. Computer Methods and Programs in Biomedicine, 98(3):278 – 284, 2010.
- Modat et al.  M. Modat, D. M. Cash, P. Daga, G. P. Winston, J. S. Duncan, and S. Ourselin. Global image registration using a symmetric block-matching approach. Journal of Medical Imaging, 1(2):024003, 2014.
- Moeskops et al.  P. Moeskops, M. A. Viergever, A. M. Mendrik, L. S. de Vries, M. J. N. L. Benders, and I. Išgum. Automatic segmentation of MR brain images with a convolutional neural network. IEEE Transactions on Medical Imaging, 35(5):1252–1261, May 2016.
- Ronneberger et al.  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 234–241, 2015.
- Rovira et al.  À. Rovira, M. P. Wattjes, M. Tintoré, C. Tur, T. A. Yousry, M. P. Sormani, N. De Stefano, M. Filippi, C. Auger, M. A. Rocca, F. Barkhof, F. Fazekas, L. Kappos, C. Polman, D. Miller, and X. Montalban. Evidence-based guidelines: MAGNIMS consensus guidelines on the use of MRI in multiple sclerosis: clinical implementation in the diagnostic process. Nature Reviews Neurology, 11(August):1–12, 2015.
- Salehi et al.  S. S. M. Salehi, D. Erdogmus, and A. Gholipour. Tversky loss function for image segmentation using 3D fully convolutional deep networks. In Machine Learning in Medical Imaging, pages 379–387, 2017.
- Salem et al.  M. Salem, M. Cabezas, S. Valverde, D. Pareto, A. Oliver, J. Salvi, À. Rovira, and X. Lladó. A supervised framework with intensity subtraction and deformation field features for the detection of new T2-w lesions in multiple sclerosis. NeuroImage: Clinical, 17:607 – 615, 2018.
- Shiee et al.  N. Shiee, P.-L. Bazin, A. Ozturk, D. S. Reich, P. A. Calabresi, and D. L. Pham. A topology-preserving approach to the segmentation of brain images with multiple sclerosis lesions. NeuroImage, 49(2):1524 – 1535, 2010.
- Styner et al.  M. Styner, J. Lee, B. Chin, M. Chin, O. Commowick, H. Tran, S. Markovic-Plese, V. Jewells, and S. Warfield. 3D segmentation in the clinic: A grand challenge II: MS lesion segmentation. MIDAS Journal, pages 1 – 6, November 2008.
- Sudre et al.  C. H. Sudre, M. J. Cardoso, W. H. Bouvy, G. J. Biessels, J. Barnes, and S. Ourselin. Bayesian model selection for pathological neuroimaging data applied to white matter lesion segmentation. IEEE Transactions on Medical Imaging, 34(10):2079–2102, Oct 2015.
- Tustison et al.  N. J. Tustison, B. B. Avants, P. A. Cook, Y. Zheng, A. Egan, P. A. Yushkevich, and J. C. Gee. N4ITK: Improved n3 bias correction. IEEE Transactions on Medical Imaging, 29(6):1310–1320, 2010.
- Valcarcel et al.  A. M. Valcarcel, K. A. Linn, S. N. Vandekar, T. D. Satterthwaite, J. Muschelli, P. A. Calabresi, D. L. Pham, M. L. Martin, and R. T. Shinohara. Mimosa: An automated method for intermodal segmentation analysis of multiple sclerosis brain lesions. Journal of Neuroimaging, 28(4):389–398, 2018.
- Valverde et al.  S. Valverde, M. Cabezas, E. Roura, S. González-Villà, D. Pareto, J. C. Vilanova, L. Ramió-Torrentà, À. Rovira, A. Oliver, and X. Lladó. Improving automated multiple sclerosis lesion segmentation with a cascaded 3D convolutional neural network approach. NeuroImage, 155:159 – 168, 2017.
- van Tulder and de Bruijne  G. van Tulder and M. de Bruijne. Why does synthesized data improve multi-sequence classification? In Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, pages 531–538, 2015.
- Zhang et al.  C. Zhang, W. Tavanapong, J. Wong, P. C. de Groen, and J. Oh. Real data augmentation for medical image classification. In Intravascular Imaging and Computer Assisted Stenting, and Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, pages 67–76, 2017.
- Zhang et al.  W. Zhang, R. Li, H. Deng, L. Wang, W. Lin, S. Ji, and D. Shen. Deep convolutional neural networks for multi-modality isointense infant brain image segmentation. NeuroImage, 108:214 – 224, 2015.
- Zhang et al.  Y. Zhang, M. Brady, and S. Smith. Segmentation of brain MR images through a hidden Markov random field model and the expectation-maximization algorithm. IEEE Transactions on Medical Imaging, 20(1):45–57, Jan 2001.