1 Introduction
A cohort of virtual cardiac magnetic resonance (CMR) images can be simulated to aid the development and adaptation of data-hungry deep-learning (DL) based medical image analysis methods. Recent studies have shown the effectiveness of image simulation for training DL models for CMR image segmentation [12, 9]. Although such models provide accurate anatomical information, their performance remains suboptimal because of the realism gap: the missing texture and simplistic appearance of the simulated images. This holds especially for models trained entirely on simulated images and evaluated on real ones. Generative adversarial networks (GANs) [5], on the other hand, promise to synthesize realistic examples, as demonstrated by applications in multi-modal medical image translation [1, 8, 3]. However, GAN-generated images may not necessarily represent plausible anatomy. The purpose of the current research is to reconcile the two worlds of simulation and synthesis, as defined in [4], and to take advantage of recent developments in computer vision to reduce the realism gap between simulated and real data using GANs for unpaired (unsupervised) style transfer. The contributions are two-fold: 1) physics-based simulation of cardiac MR images on a population of XCAT subjects; 2) GAN-based image-to-image translation for style (texture) transfer from real images. The framework is named sim2real translation.
2 Material and Method
The 4D XCAT phantom [11] is utilized as the basis of the anatomical model for creating virtual subjects by carefully adjusting available parameters for altering the geometry of the human anatomy. We employ our in-house CMR image simulation framework based on the analytical Bloch equations to generate varying image contrast on the labels of the XCAT virtual subjects [9].
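The in-house simulator is not public, so the following is only a minimal sketch of the underlying idea: assigning each tissue label of an XCAT-like label map a steady-state signal derived from the Bloch equations (here a simple spin-echo signal equation). All tissue parameters, label values, and sequence timings below are illustrative assumptions, not the paper's actual settings.

```python
import numpy as np

# Illustrative tissue parameters (T1/T2 in ms, proton density in a.u.).
# These values and label assignments are assumptions for the sketch.
TISSUES = {
    0: {"pd": 0.0,  "t1": 1.0,    "t2": 1.0},    # background
    1: {"pd": 0.85, "t1": 900.0,  "t2": 50.0},   # myocardium (assumed)
    2: {"pd": 0.95, "t1": 1650.0, "t2": 240.0},  # blood pool (assumed)
}

def spin_echo_signal(pd, t1, t2, tr=650.0, te=12.0):
    """Steady-state spin-echo magnitude: PD * (1 - e^{-TR/T1}) * e^{-TE/T2}."""
    return pd * (1.0 - np.exp(-tr / t1)) * np.exp(-te / t2)

def simulate_contrast(label_map, tr=650.0, te=12.0):
    """Map each integer label of the phantom to its simulated intensity."""
    image = np.zeros(label_map.shape, dtype=np.float64)
    for label, p in TISSUES.items():
        image[label_map == label] = spin_echo_signal(p["pd"], p["t1"], p["t2"], tr, te)
    return image
```

Varying TR and TE (or swapping in a different steady-state signal equation) is what produces the varying image contrast mentioned above.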

An unsupervised GAN model based on contrastive learning, known as CUT [10], is used for unpaired translation between real and simulated images: it transfers the realistic style (texture) of the real images onto the simulated ones while preserving the anatomical information (content). Contrastive learning encourages the encoded features of two patches from the same location in the input and translated images to be similar, and dissimilar to patches from other locations. Compared to other unpaired translation frameworks such as CycleGAN [13], CUT is a one-sided network with a much lighter generator architecture and hence requires less data for training. The content of the simulated image is preserved through a multilayer patch-wise contrastive loss added to the adversarial loss, as shown in Figure 1.
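The patch-wise contrastive idea can be sketched as an InfoNCE-style loss in plain NumPy. This is a hedged illustration only: CUT's actual loss operates on multi-layer encoder features inside the network, and all names and the temperature value here are assumptions.

```python
import numpy as np

def patch_nce_loss(queries, keys, tau=0.07):
    """InfoNCE over patch features. queries, keys: (N, D) arrays where
    queries[i] (translated-image patch) and keys[i] (input-image patch at
    the SAME location) form the positive pair; all other rows of `keys`
    act as negatives. Returns the mean cross-entropy over the N patches."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    k = keys / np.linalg.norm(keys, axis=1, keepdims=True)
    logits = q @ k.T / tau                        # (N, N) cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_prob))            # positives sit on the diagonal
```

When corresponding patches have matching features, the diagonal dominates and the loss is near zero; shuffling the correspondence drives the loss up, which is what pushes the generator to keep content aligned.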
The M&Ms challenge data [2] are used as the source of real cardiac MR images. To explore the effects of multi-vendor data, we utilize a subset of 150 subjects, a mix of healthy controls and patients with a variety of heart conditions, scanned using Siemens (Vendor A) and Philips (Vendor B) scanners. We extract four mid-ventricular slices at the end-diastolic (ED) and end-systolic (ES) phases for each subject. All images are resized, centre cropped to 128 x 126, and normalized to the range [0, 1]. The same pre-processing is applied to the simulated images, except that we use the available ground-truth labels of the simulated data to find a bounding box around the heart and crop accordingly instead of centre cropping.
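The two cropping strategies can be sketched as follows. This is a minimal illustration, not the paper's pipeline: the helper names are invented, a square 128 x 128 output is assumed for simplicity, and the label-driven crop is reduced to centring on the foreground mask.

```python
import numpy as np

def normalise(img):
    """Scale intensities to [0, 1]."""
    img = img.astype(np.float64)
    rng = img.max() - img.min()
    return (img - img.min()) / rng if rng > 0 else np.zeros_like(img)

def centre_crop(img, out_h=128, out_w=128):
    """Crop around the image centre (used for the real images)."""
    h, w = img.shape
    top, left = (h - out_h) // 2, (w - out_w) // 2
    return img[top:top + out_h, left:left + out_w]

def label_bbox_crop(img, labels, out_h=128, out_w=128):
    """Crop around the centre of the foreground labels (simulated images),
    clipping the window so it stays inside the image."""
    ys, xs = np.nonzero(labels)
    cy, cx = int(ys.mean()), int(xs.mean())
    top = int(np.clip(cy - out_h // 2, 0, img.shape[0] - out_h))
    left = int(np.clip(cx - out_w // 2, 0, img.shape[1] - out_w))
    return img[top:top + out_h, left:left + out_w]
```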
Two identical sim2real models are trained using the data from vendor A and vendor B (sim2real A and sim2real B) to investigate the network's ability to transfer vendor-specific appearance onto simulated images. We calculate the widely used Fréchet Inception Distance (FID) score [6] between feature vectors computed for real and translated images to evaluate the similarity between the simulated database and its respective real data, before and after translation.
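The FID between two sets of feature vectors can be sketched in plain NumPy. In practice the (N, D) features come from an Inception network; that part is omitted here. The sketch uses the identity Tr((S1 S2)^{1/2}) = Tr((S1^{1/2} S2 S1^{1/2})^{1/2}) so that only symmetric PSD square roots are needed, which `eigh` handles.

```python
import numpy as np

def _sqrtm_psd(mat):
    """Square root of a symmetric positive semi-definite matrix via eigh."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)
    return (vecs * np.sqrt(vals)) @ vecs.T

def fid(feats_a, feats_b):
    """Fréchet distance between the Gaussians fitted to two feature sets:
    ||mu_a - mu_b||^2 + Tr(S_a + S_b - 2 (S_a S_b)^{1/2})."""
    mu_a, mu_b = feats_a.mean(0), feats_b.mean(0)
    s_a = np.cov(feats_a, rowvar=False)
    s_b = np.cov(feats_b, rowvar=False)
    sqrt_a = _sqrtm_psd(s_a)
    covmean = _sqrtm_psd(sqrt_a @ s_b @ sqrt_a)
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(s_a) + np.trace(s_b) - 2.0 * np.trace(covmean))
```

Identical feature sets give an FID of zero; shifting one set's mean while keeping its covariance adds exactly the squared mean distance.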
Additionally, we evaluate the usefulness of our sim2real data in aiding a DL model for the task of cardiac segmentation. We utilize nnU-Net [7], trained to segment the left ventricle (LV), right ventricle (RV), and left ventricular myocardium (MYO). First, we train a model using 150 sim2real images with the styles of vendors A and B and compare it to a model trained on 150 real images. We additionally train a model on a mixed set of real and sim2real data to assess the applicability of the generated data for augmentation.
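The per-structure overlap metric used throughout the evaluation can be sketched as a standard Dice coefficient. The label encoding (1=LV, 2=RV, 3=MYO) is an assumption for illustration; nnU-Net's own evaluation pipeline computes this internally.

```python
import numpy as np

def dice_score(pred, gt, label):
    """Dice overlap for one structure: 2|P ∩ G| / (|P| + |G|).
    Returns 1.0 when the structure is absent from both masks."""
    p, g = (pred == label), (gt == label)
    denom = p.sum() + g.sum()
    return 1.0 if denom == 0 else 2.0 * np.logical_and(p, g).sum() / denom
```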


3 Results
Two examples of simulated images and statistics of the XCAT virtual subjects’ distribution in terms of left ventricular volumes are depicted in Figure 2.
The FID score is computed between the simulated data, the sim2real A data, and the sim2real B data on one side and the data from vendor A and vendor B on the other. A lower FID score indicates more realistic generated images, i.e., feature statistics more similar to those of the real database. The results are shown in Figure 3. As expected, the original simulated data has a high FID score against both real A and real B data. The sim2real models substantially reduce the FID between the simulated data and the real images, indicating improved image realism. Moreover, the vendor-specific imaging features are captured by the network and transferred to the simulated images. One real example from each vendor and each sim2real translation is shown for visual comparison.
| Training: Real | Training: Simulated | Vendor A LV (Dice / HD) | Vendor A RV (Dice / HD) | Vendor A MYO (Dice / HD) | Vendor B LV (Dice / HD) | Vendor B RV (Dice / HD) | Vendor B MYO (Dice / HD) |
|---|---|---|---|---|---|---|---|
| N/A | N=160 | 0.887 / 9.25 | 0.851 / 12.45 | 0.801 / 14.72 | 0.871 / 10.38 | 0.861 / 11.21 | 0.831 / 12.11 |
| N=160 | N/A | 0.901 / 8.19 | 0.878 / 9.35 | 0.863 / 9.88 | 0.893 / 9.31 | 0.872 / 10.67 | 0.849 / 9.76 |
| N=160 | N=160 | 0.915 / 7.85 | 0.882 / 10.21 | 0.872 / 12.32 | 0.911 / 7.28 | 0.874 / 10.85 | 0.851 / 10.21 |
The segmentation performance of three different models can be observed in Table 1, presenting the evaluation of all models on a separate test set from the M&Ms challenge. The results suggest that the model trained with sim2real images already adapts well to real data, exhibiting a slight drop in performance compared to the model trained with real data. Additionally, we observe that augmenting the training with sim2real data has a positive impact on segmentation accuracy (Dice score), particularly for the LV.
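The HD columns in Table 1 report a Hausdorff-type boundary distance. A minimal sketch of the symmetric Hausdorff distance between two sets of boundary points is given below; the paper may use a percentile variant or a library implementation, so treat this only as the definition in code.

```python
import numpy as np

def hausdorff(points_a, points_b):
    """Symmetric Hausdorff distance between two (N, 2) point sets:
    the largest distance from any point of one set to the nearest
    point of the other set, taken in both directions."""
    d = np.linalg.norm(points_a[:, None, :] - points_b[None, :, :], axis=-1)
    return max(d.min(axis=1).max(), d.min(axis=0).max())
```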
4 Discussion and Conclusion
In this work, we created a database of virtual cardiac MR images simulated on the XCAT anatomical phantom and investigated the effectiveness of an unsupervised GAN for the task of simulation-to-real translation, named sim2real. We aimed to reduce the realism gap between the simplified image simulation and complex realistic image textures. Our sim2real models learned the vendor-specific imaging features and mapped them onto the simulated images, reducing the FID scores, which indicates greater similarity between the simulated and real databases. Our usability experiments suggest that sim2real data has good potential to augment real training data, particularly in scenarios where data is scarce.
References
- [1] (2021) Generation of annotated multimodal ground truth datasets for abdominal medical image registration. International journal of computer assisted radiology and surgery 16 (8), pp. 1277–1285. Cited by: §1.
- [2] (2021) Multi-centre, multi-vendor and multi-disease cardiac segmentation: the m&ms challenge. IEEE Transactions on Medical Imaging 40 (12), pp. 3543–3554. Cited by: §2.
- [3] (2017) Adversarial image synthesis for unpaired multi-modal cardiac data. In International workshop on simulation and synthesis in medical imaging, pp. 3–13. Cited by: §1.
- [4] (2018) Simulation and synthesis in medical imaging. IEEE transactions on medical imaging 37 (3), pp. 673–679. Cited by: §1.
- [5] (2014) Generative adversarial networks. arXiv preprint arXiv:1406.2661. Cited by: §1.
- [6] (2017) Gans trained by a two time-scale update rule converge to a local nash equilibrium. Advances in neural information processing systems 30. Cited by: §2.
- [7] (2021) nnU-Net: a self-configuring method for deep learning-based biomedical image segmentation. Nature Methods 18 (2), pp. 203–211. Cited by: §2.
- [8] (2019) Deep ct to mr synthesis using paired and unpaired data. Sensors 19 (10), pp. 2361. Cited by: §1.
- [9] (2020) Heterogeneous virtual population of simulated cmr images for improving the generalization of cardiac segmentation algorithms. In International Workshop on Simulation and Synthesis in Medical Imaging, pp. 68–79. Cited by: §1, §2.
- [10] (2020) Contrastive learning for unpaired image-to-image translation. In European conference on computer vision, pp. 319–345. Cited by: §2.
- [11] (2010) 4D xcat phantom for multimodality imaging research. Medical physics 37 (9), pp. 4902–4915. Cited by: §2.
- [12] (2021) Simulator-generated training datasets as an alternative to using patient data for machine learning: an example in myocardial segmentation with MRI. Computer Methods and Programs in Biomedicine 198, pp. 105817. Cited by: §1.
- [13] (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232. Cited by: §2.