Acquiring large medical datasets is an expensive and time-consuming endeavour, especially when they must be manually annotated by experts. In this paper, we propose a combined Variational Autoencoder - Generative Adversarial Network (VAE-GAN) method for producing highly realistic cine-MR images together with their pixel-accurate ground truth.
Generative adversarial networks (GANs) are well known for their ability to generate images whose distribution fits that of a predefined set of data. A subset of GANs are the image-to-image translation networks [isola2017image], which transform an input image from one domain, e.g. segmentation maps, to another domain, e.g. realistic images. Unfortunately, in the context of medical image segmentation, the ground-truth labels are either the result of a segmentation network (whose variety is limited to that of its input images) [shin2018medical] or hand-drawn [abhishek2019mask2lesion]. In this paper, we overcome this problem by using a VAE that learns the underlying cardiac latent distribution and can thus generate an arbitrarily large number of cardiac maps.
Our image generation pipeline uses a module called "SPatially-Adaptive (DE)Normalization" (SPADE) [park2019semantic], a conditional normalization layer within the generator (fig. LABEL:fig:method). The segmentation map is used as the condition for the SPADE module, which forces the generator to output an image whose structure fits that of the cardiac shape. The GAN is complemented by an anatomical variational autoencoder (VAE) [painchaud2019cardiac] whose latent space can be sampled to produce anatomically valid cardiac shapes.
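To illustrate the idea behind SPADE (this is a simplified sketch, not our implementation: the learned convolutions of the real layer are replaced by fixed random linear maps, and normalization statistics are computed per channel over a single feature map):

```python
import numpy as np

def spade(x, segmap, eps=1e-5):
    """Simplified SPatially-Adaptive (DE)Normalization.

    x:      feature map of shape (C, H, W)
    segmap: one-hot segmentation map of shape (K, H, W)

    The feature map is first normalized in a parameter-free way, then
    modulated by a scale (gamma) and bias (beta) that vary spatially
    because they are predicted from the segmentation map.
    """
    rng = np.random.default_rng(0)
    C = x.shape[0]
    K = segmap.shape[0]
    # Parameter-free normalization over the spatial dimensions.
    mu = x.mean(axis=(1, 2), keepdims=True)
    sigma = x.std(axis=(1, 2), keepdims=True)
    x_norm = (x - mu) / (sigma + eps)
    # In the real layer, gamma and beta come from small conv nets applied
    # to the segmentation map; here we use fixed 1x1 "convs" (linear maps
    # over the K label channels) purely for illustration.
    W_gamma = rng.standard_normal((C, K)) * 0.1
    W_beta = rng.standard_normal((C, K)) * 0.1
    gamma = np.einsum('ck,khw->chw', W_gamma, segmap)
    beta = np.einsum('ck,khw->chw', W_beta, segmap)
    return (1 + gamma) * x_norm + beta
```

Because gamma and beta depend on the label at each pixel, the segmentation map directly shapes the generated anatomy instead of being washed out by standard normalization layers.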
During training, the GAN is fed an MRI and, conditioned on its associated anatomical map, outputs a generated MRI; both are then given to the discriminator. Simultaneously, the VAE is trained to learn the latent distribution of the cardiac shapes. At test time, the system is fed an MRI and, conditioned on an arbitrary anatomical map generated by the VAE decoder, generates an MRI whose cardiac shape fits the anatomical map (fig. LABEL:fig:example_mris).
3 Results and discussion
We trained and tested our framework on two cine-MRI datasets, namely ACDC [bernard2018deep] (1902 training slices and 1078 testing slices) and the Sunnybrook Cardiac Data [radau2009evaluation] (478 training slices and 236 testing slices), although the latter was re-annotated by an expert to match the segmentation specification of the ACDC dataset. These datasets contain cine-MRI images at end diastole and end systole and their associated expert segmentation for the left ventricular cavity, the myocardium and the right ventricular cavity.
We trained our VAE-GAN separately on the ACDC and Sunnybrook datasets and then generated 100k synthetic images by sampling the anatomical VAE’s latent space. We then trained an ENet CNN [paszke2016enet] on the original datasets as well as on the 100k synthetic datasets. Table LABEL:tab:acdc_res
summarizes the test results of ENet with and without fine-tuning (i.e. training for a few additional epochs on the original datasets). Also, since our VAE-GAN can be seen as a sophisticated form of data augmentation, we trained ENet with and without traditional data augmentation, i.e. random rotations, flips and shifts.
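The traditional augmentations mentioned above can be sketched as follows (an illustrative simplification, not our training code: rotations are restricted to multiples of 90 degrees and shifts are circular, so the image and its mask stay aligned):

```python
import numpy as np

def augment(image, mask, rng):
    """Apply the same random rotation, flip and shift to an image/mask pair.

    image, mask: 2D square arrays of identical shape.
    rng:         a numpy random Generator.
    """
    # Random 90-degree rotation (k in {0, 1, 2, 3}).
    k = int(rng.integers(0, 4))
    image, mask = np.rot90(image, k), np.rot90(mask, k)
    # Random horizontal flip.
    if rng.random() < 0.5:
        image, mask = np.fliplr(image), np.fliplr(mask)
    # Random circular shift of a few pixels in each direction.
    dy, dx = (int(s) for s in rng.integers(-5, 6, size=2))
    image = np.roll(image, (dy, dx), axis=(0, 1))
    mask = np.roll(mask, (dy, dx), axis=(0, 1))
    return image, mask
```

Applying identical transforms to the image and its segmentation is what keeps the augmented ground truth pixel-accurate.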
The ENet trained on the VAE-GAN-generated datasets with data augmentation reaches Dice scores 2 to 4 percentage points higher than an ENet trained on the original datasets (table LABEL:tab:acdc_res). This gap is even more prominent when ENet is fine-tuned, as its Dice score increases by 6 to 10 percentage points. For instance, the ENet trained on the 100k synthetic Sunnybrook dataset with fine-tuning on the original Sunnybrook dataset has a Dice score of 0.874, compared to only 0.776 when trained on the original Sunnybrook data. This is a significant improvement considering that the Sunnybrook training set contains only 478 2D slices.
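For reference, the Dice scores reported throughout are the standard overlap metric between a predicted and a ground-truth binary mask, which can be computed as:

```python
def dice_score(pred, target):
    """Dice coefficient between two binary masks (flat 0/1 sequences).

    Dice = 2 * |P ∩ T| / (|P| + |T|); by convention two empty masks
    score 1.0 (perfect agreement on the absence of the structure).
    """
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 2.0 * inter / total if total else 1.0
```

For multi-class results such as ours, the score is typically computed per structure (left ventricle, myocardium, right ventricle) and averaged.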
Results also underline that our VAE-GAN coupled with data augmentation provides even better results (in table LABEL:tab:acdc_res, 0.888 vs 0.849 and 0.853 vs 0.803). Moreover, the ENet trained on our VAE-GAN-generated dataset generalizes better: the model trained on the synthetic ACDC dataset has a Dice score of 0.853 on the Sunnybrook dataset, versus 0.773 when trained on the original ACDC dataset.
We presented a novel VAE-GAN cine-MRI cardiac generation model. This method has the unique ability to generate both realistic cardiac MR images and their associated pixel-accurate ground truth. Results were positive after training and testing on two datasets, especially when using data augmentation and fine-tuning. Further investigations could focus on sampling the VAE's latent space to help overcome the class imbalance present in certain medical imaging datasets. Temporal consistency of cine-MRI could also be exploited to augment the generation process in future work.