1 Introduction
In current clinical practice, multiple imaging modalities may be available for disease diagnosis and surgical planning. For a specific patient group, a certain imaging modality might be more popular than others. Due to the proliferation of multiple imaging modalities, there is a strong clinical need to develop a cross-modality image transfer analysis system to assist clinical treatment, such as radiation therapy planning.
Machine learning (ML) based methods have been widely used for medical image analysis [40, 39], including detection, segmentation, and tracking of anatomical structures. Such methods are often generic and can be extended from one imaging modality to another by re-training on the target modality. However, a sufficient number of representative training images is required to achieve robustness, and in practice it is often difficult to collect enough of them, especially for a new imaging modality not yet well established in clinical practice. Synthesized data are often used as supplementary training data in the hope that they can boost the generalization capability of a trained ML model. This paper presents a novel method to address these two demanding tasks (Figure 1): cross-modality translation, and improving segmentation models by making use of synthesized data.
A straightforward approach is to formulate cross-modality transfer as an image-to-image translation task. Such methods require pixel-to-pixel correspondence between data from the two domains to build a direct cross-modality reconstruction. However, in the more common scenario, multimodal medical images are 3D and have no cross-modality paired data, so a method that learns from unpaired data is more general purpose. Furthermore, anatomical structures (e.g., shape) in medical images/volumes carry diagnostic information, and keeping them invariant during translation is critical. Yet when using GANs without paired data, there is no direct reconstruction, and relying on discriminators to guarantee this requirement is not enough, as we explain later.
Using synthetic data to overcome the insufficiency of labeled data in CNN training is an active research area. In the medical image domain, there is strong interest in learning unsupervised translation between different modalities, so as to transfer existing labeled data from other modalities. However, the effectiveness of synthetic data heavily depends on the distribution gap between real and synthetic data. A possible way to reduce this gap is to match the two distributions through GANs [30, 3].
In this paper, we present a general-purpose method that realizes both medical volume translation and segmentation. In brief, given two sets of unpaired data in two modalities, we simultaneously learn generators for cross-domain volume-to-volume translation and stronger segmentors that take advantage of synthetic data translated from the other domain. Our method is composed of several 3D CNNs. From the generator-learning view, we train adversarial networks with cycle-consistency to handle data without correspondence, and we propose a novel shape-consistency scheme, supported by another CNN (a segmentor), to guarantee the shape invariance of synthetic images. From the segmentor-learning view, the segmentors directly take advantage of the generators by using synthetic data to boost segmentation performance in an online fashion. Generators and segmentors benefit from each other in our end-to-end training with one joint optimization objective.
On a dataset of 4,496 cardiovascular 3D images in MRI and CT modalities, we conduct extensive experiments to demonstrate the effectiveness of our method qualitatively and quantitatively, from both the generator and segmentor views, with our proposed auxiliary evaluation metrics. We show that using synthetic data as an isolated offline data-augmentation process underperforms our end-to-end online approach. On the volume segmentation task, blindly using synthetic data with a small amount of real data can even distract the optimization when trained offline; our method does not have this problem and leads to consistent improvement.
2 Related work
There are two demanding goals in medical image synthesis. The first is synthesizing realistic cross-modality images [12, 24]; the second is using synthetic data from other modalities with sufficient labeled data to help classification tasks (e.g., domain adaptation).
In computer vision, recent image-to-image translation is formulated as a pixel-to-pixel mapping using encoder-decoder CNNs [16, 21, 42, 18, 34, 9]. Several studies have explored cross-modality translation for medical images using sparse coding [12, 33], GANs [24, 26], CNNs, etc. GANs have attracted wide interest for such tasks because they generate high-quality, less blurry results [10, 1, 2, 41]. More recent studies apply pixel-to-pixel GANs to brain MRI-to-CT image translation [24, 17] and retinal vessel annotation-to-image translation. However, these methods presume that the target images have paired cross-domain data. Learning from unpaired cross-domain data is an attractive yet not well explored problem [33, 22].
Synthesizing medical data to overcome insufficient labeled data has attracted wide interest recently [30, 14, 13]. Due to the diversity of medical modalities, learning an unsupervised translation between modalities is a promising direction. One study demonstrates the benefits on brain (MRI and CT) images by using synthetic data as augmented training data to help lesion segmentation.
Apart from synthesizing data, several studies [20, 23, 36, 35] use adversarial learning as extra supervision on segmentation or detection networks. The adversarial loss constrains the prediction to be close to the distribution of the ground truth. However, such a strategy is a refinement process, so it is less likely to remedy the cost of data insufficiency.
3 Proposed Method
This section introduces our proposed method. We begin by discussing recent advances in image-to-image translation and clarify their problems when used for medical volume-to-volume translation. We then introduce our proposed medical volume-to-volume translation, with adversarial, cycle-consistency, and shape-consistency losses, as well as dual-modality segmentation. Figure 2 illustrates our method.
3.1 Image-to-Image Translation for Unpaired Data
GANs have been widely used for image translation in applications that need a pixel-to-pixel mapping, such as image style transfer. ConditionalGAN shows a strategy to learn such a translation mapping with a conditional setting that captures structure information. However, it needs paired cross-domain images for its pixel-wise reconstruction loss. For some translation tasks, acquiring paired training data from two domains is difficult or even impossible. Recently, CycleGAN and similar methods [18, 37] were proposed to generalize ConditionalGAN to address this issue. Here we use CycleGAN to illustrate the key idea.
Given a set of unpaired data from two domains, $A$ and $B$, CycleGAN simultaneously learns two mappings, $G_A: A \rightarrow B$ and $G_B: B \rightarrow A$, with two generators $G_A$ and $G_B$. To bypass the infeasibility of pixel-wise reconstruction with paired data, i.e., $G_A(x_A) \approx x_B$ or $G_B(x_B) \approx x_A$, CycleGAN introduces an effective cycle-consistency loss for $G_B(G_A(x_A)) \approx x_A$ and $G_A(G_B(x_B)) \approx x_B$. The idea is that the generated target-domain data should be able to return to the exact data in the source domain it was generated from. To guarantee the fidelity of the fake data $G_A(x_A)$ and $G_B(x_B)$, CycleGAN uses two discriminators, $D_B$ and $D_A$, to distinguish real from synthetic data in domains $B$ and $A$, respectively, and thereby encourage the generators to synthesize realistic data.
3.2 Problems in Unpaired Volume-to-Volume Translation
Lacking the supervision of a direct reconstruction error between $G_A(x_A)$ and $x_B$, or between $G_B(x_B)$ and $x_A$, brings uncertainty and difficulty in reaching the desired outputs for more specialized tasks, and it is even more challenging when training 3D CNNs.
To be specific, cycle-consistency has an intrinsic ambiguity with respect to geometric transformations. For example, suppose the generation functions $G_A$ and $G_B$ are cycle consistent, e.g., $G_B(G_A(x_A)) = x_A$. Let $T$ be a bijective geometric transformation (e.g., translation, rotation, scaling, or even a nonrigid transformation) with inverse transformation $T^{-1}$.
It is easy to show that $G_A' = T \circ G_A$ and $G_B' = G_B \circ T^{-1}$ are also cycle consistent, where $\circ$ denotes the composition of two transformations. That means that, using CycleGAN, an image translated from one domain to the other can be geometrically distorted, and the distortion can be recovered when it is translated back to the original domain without incurring any penalty in the data-fidelity cost. From the discriminator perspective, a geometric transformation does not change the realness of synthesized images, since the shape of the training data is arbitrary.
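The following toy example makes the ambiguity concrete (a minimal numerical sketch, assuming a 1-D "volume", stand-in generators, and a flip as the bijective transform $T$; all names are illustrative, not part of the method): the distorted generator pair passes the cycle-consistency check just as well as the undistorted one.

```python
# Toy demonstration that cycle-consistency cannot penalize geometric distortion.
import numpy as np

x = np.arange(8.0)            # a toy sample from domain A

G_A = lambda v: v + 1.0       # stand-in generator A -> B
G_B = lambda v: v - 1.0       # stand-in generator B -> A (inverse of G_A)

T     = lambda v: v[::-1]     # bijective geometric transform (a flip)
T_inv = lambda v: v[::-1]     # its inverse (a flip is self-inverse)

# The original pair is cycle consistent: G_B(G_A(x)) == x.
assert np.allclose(G_B(G_A(x)), x)

# The distorted pair G_A' = T o G_A and G_B' = G_B o T^-1 is *also* cycle
# consistent, even though G_A' geometrically distorts every translation.
G_A_prime = lambda v: T(G_A(v))
G_B_prime = lambda v: G_B(T_inv(v))
assert np.allclose(G_B_prime(G_A_prime(x)), x)

print("both pairs are cycle consistent; the flip goes unpenalized")
```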
Such a problem can destroy anatomical structures in synthetic medical volumes, yet it has not been addressed by existing methods.
3.3 Volume-to-Volume Cycle-consistency
To solve the task of learning generators with unpaired volumes from two domains, $A$ and $B$, we adopt the idea of the cycle-consistency loss (described above) for generators $G_A$ and $G_B$ to force the reconstructed synthetic samples $G_B(G_A(x_A))$ and $G_A(G_B(x_B))$ to be identical to their inputs $x_A$ and $x_B$:

$$\mathcal{L}_{cyc}(G_A, G_B) = \mathbb{E}_{x_A \sim p_d(x_A)} \big[ \| G_B(G_A(x_A)) - x_A \|_1 \big] + \mathbb{E}_{x_B \sim p_d(x_B)} \big[ \| G_A(G_B(x_B)) - x_B \|_1 \big],$$

where $x_A$ is a sample from domain $A$ and $x_B$ is from domain $B$. $\mathcal{L}_{cyc}$ uses the L1 loss over all voxels, which shows better visual results than the L2 loss.
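For concreteness, here is a minimal PyTorch sketch of this loss (an assumed implementation, not the authors' released code); G_A and G_B can be any callables mapping volume tensors across domains.

```python
# Volume-to-volume cycle-consistency loss: voxel-wise L1 between each input
# volume and its reconstruction through both generators.
import torch

def cycle_consistency_loss(G_A, G_B, x_A, x_B):
    """x_A, x_B: 5-D tensors of shape (batch, channel, depth, height, width)."""
    rec_A = G_B(G_A(x_A))   # A -> B -> A
    rec_B = G_A(G_B(x_B))   # B -> A -> B
    # L1 over all voxels; the paper reports it gives less blurry results than L2.
    return (rec_A - x_A).abs().mean() + (rec_B - x_B).abs().mean()
```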
3.4 Volume-to-Volume Shape-consistency
To resolve the intrinsic ambiguity of cycle-consistency with respect to geometric transformations pointed out above, our method introduces two auxiliary mappings, defined as $S_A: A \rightarrow Y$ and $S_B: B \rightarrow Y$, to constrain the geometric invariance of synthetic data. They map the translated data from the respective domain generators into a shared shape space $Y$ (i.e., a semantic label space) and compute pixel-wise semantic ownership. The two mappings are represented by two CNNs, namely segmentors. We use them as extra supervision on the generators to support shape-consistency (see Figure 2), by optimizing

$$\mathcal{L}_{shape}(S_A, S_B, G_A, G_B) = \mathbb{E}_{x_B \sim p_d(x_B)} \Big[ -\tfrac{1}{N} \sum_{i} y_B^i \log \big( S_A(G_B(x_B))_i \big) \Big] + \mathbb{E}_{x_A \sim p_d(x_A)} \Big[ -\tfrac{1}{N} \sum_{i} y_A^i \log \big( S_B(G_A(x_A))_i \big) \Big],$$

where $y_A, y_B \in Y$ denote the ground-truth shape representations of the sample volumes $x_A$ and $x_B$, respectively, and each voxel label $y^i$ takes one out of $C$ classes. $N$ is the total number of voxels in a volume, and $\mathcal{L}_{shape}$ is formulated as a standard multi-class cross-entropy loss.
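A hedged PyTorch sketch of this term follows (assumed implementation; names are illustrative): each segmentor evaluates the translated volume against the ground-truth labels of the source volume it was translated from.

```python
# Shape-consistency loss: segmentors supervise the generators by requiring
# translated volumes to keep their source-volume anatomy.
import torch
import torch.nn.functional as F

def shape_consistency_loss(S_A, S_B, G_A, G_B, x_A, y_A, x_B, y_B):
    """Segmentors output per-class logits (batch, C, D, H, W);
    y_A, y_B are integer label volumes of shape (batch, D, H, W)."""
    # G_B translates B -> A, so the A-domain segmentor S_A must recover y_B.
    loss_B = F.cross_entropy(S_A(G_B(x_B)), y_B)
    # Symmetrically, S_B must recover y_A from the synthetic B-domain volume.
    loss_A = F.cross_entropy(S_B(G_A(x_A)), y_A)
    return loss_A + loss_B
```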
Regularization Shape-consistency provides a level of regularization on the generators. Recall that, unlike ConditionalGAN, we have no paired data, so the only supervision for $G_A(x_A)$ and $G_B(x_B)$ is the adversarial loss, which is not sufficient to preserve all types of information in synthetic images, such as annotation correctness. Prior work introduces a self-regularization loss between an input image and an output image to force the annotations to be preserved. Our shape-consistency loss performs a similar role: it preserves pixel-wise semantic label ownership, regularizing the generators and guaranteeing the invariance of anatomical structures in medical volumes.
3.5 Multi-modal Volume Segmentation
The second parallel task our method addresses is using synthetic data to improve the generalization of the segmentation networks, which are trained together with the generators. From the segmentor view (Figure 2) of $S_A$ and $S_B$, the synthetic volumes $G_B(x_B)$ and $G_A(x_A)$ provide extra training data that improve the segmentors in an online manner. During training, $S_A$ and $S_B$ take both real data and the synthetic data generated online by the generators (see Figure 2). To maximize the usage of synthetic data, we also use the reconstructed synthetic data, $G_B(G_A(x_A))$ and $G_A(G_B(x_B))$, as inputs to the segmentors.
Note that the most straightforward way to use synthetic data is to fuse it with real data and then train a segmentation CNN; we denote this as the ad-hoc offline data-augmentation approach. Compared with it, our method implicitly performs data augmentation in an online manner. Because synthetic data enters our optimization objective directly, our method can use it more adaptively, which offers more stable training and thereby better performance than the offline approach. We demonstrate this in the experiments.
Given the definitions of the cycle-consistency and shape-consistency losses above, we define our full objective as

$$\mathcal{L}(G_A, G_B, D_A, D_B, S_A, S_B) = \mathcal{L}_{GAN}(G_A, D_B) + \mathcal{L}_{GAN}(G_B, D_A) + \lambda \mathcal{L}_{cyc}(G_A, G_B) + \gamma \mathcal{L}_{shape}(S_A, S_B, G_A, G_B).$$

The adversarial loss $\mathcal{L}_{GAN}$ (defined in [42, 16]) encourages local realism of the synthetic data (see architecture details). The weights $\lambda$ and $\gamma$ balance the cycle-consistency and shape-consistency terms during training. To optimize $\mathcal{L}_{GAN}$, $\mathcal{L}_{cyc}$, and $\mathcal{L}_{shape}$, we update the networks alternately: we optimize $G_A$ and $G_B$ with $S_A$/$S_B$ and $D_A$/$D_B$ fixed, and then optimize $S_A$/$S_B$ and $D_A$/$D_B$ (which are independent of each other), respectively, with $G_A$ and $G_B$ fixed.
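A schematic training step under these definitions might look as follows (an assumed sketch reusing the loss helpers above, not the authors' released code; gan_loss is a least-squares GAN loss following CycleGAN, and the optimizer names are illustrative):

```python
# Alternating update: generators first with D/S frozen, then D and S with
# generators frozen.
import torch
import torch.nn.functional as F

def gan_loss(logits, real):
    target = torch.ones_like(logits) if real else torch.zeros_like(logits)
    return F.mse_loss(logits, target)

def joint_step(x_A, y_A, x_B, y_B, G_A, G_B, D_A, D_B, S_A, S_B,
               opt_G, opt_D, opt_S, lam, gamma):
    # 1) Update generators; discriminators and segmentors act as fixed critics.
    opt_G.zero_grad()
    fake_B, fake_A = G_A(x_A), G_B(x_B)
    loss_G = (gan_loss(D_B(fake_B), real=True) + gan_loss(D_A(fake_A), real=True)
              + lam * cycle_consistency_loss(G_A, G_B, x_A, x_B)
              + gamma * shape_consistency_loss(S_A, S_B, G_A, G_B,
                                               x_A, y_A, x_B, y_B))
    loss_G.backward()
    opt_G.step()

    # 2) Update discriminators and segmentors with generators fixed
    #    (synthetic volumes are detached from the generator graph).
    opt_D.zero_grad(); opt_S.zero_grad()
    fake_B, fake_A = G_A(x_A).detach(), G_B(x_B).detach()
    loss_D = (gan_loss(D_B(x_B), real=True) + gan_loss(D_B(fake_B), real=False)
              + gan_loss(D_A(x_A), real=True) + gan_loss(D_A(fake_A), real=False))
    # Segmentors train on real and synthetic volumes: online augmentation.
    loss_S = (F.cross_entropy(S_A(x_A), y_A) + F.cross_entropy(S_B(x_B), y_B)
              + F.cross_entropy(S_A(fake_A), y_B) + F.cross_entropy(S_B(fake_B), y_A))
    (loss_D + loss_S).backward()
    opt_D.step(); opt_S.step()
```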
The generators and segmentors are mutually beneficial: for the full objective to be optimized, the generators must produce synthetic data with low shape-consistency loss, which, from another angle, means low segmentation loss over the synthetic training data.
4 Network Architecture and Details
This section discusses necessary architecture and training details for generating high-quality 3D images.
4.1 Architecture details
Training deep networks end-to-end on 3D volumes is much more difficult (from the optimization and memory perspectives) than on 2D images. Instead of using 2.5D representations or sub-volumes, our method directly processes holistic volumes. Our design trades off network size and maximizes effectiveness, and several key design choices are needed to achieve visually better results. The architecture of our method is composed of 3D fully convolutional layers with instance normalization (which performs better than batch normalization here) and ReLU for generators, or LeakyReLU for discriminators. CycleGAN originally designs generators with multiple residual blocks; in contrast, we make several critical modifications to our generators, with justifications.
First, we find that using both bottom- and top-layer representations is critical to maintaining the anatomical structures in medical images. We use the long-range skip connections of U-net, as they achieve much faster convergence and locally smooth results. ConditionalGAN also uses U-net generators, but we do not downsample feature maps as aggressively as it does: we apply only a few stride-2 convolutions for downsampling, keeping the maximum downsampling rate small, and the upsampling part is symmetric. Two sequential convolutions are used at each resolution, which performs better than using one. Second, we replace transpose-convolutions with stride-2 nearest-neighbor upsampling followed by a convolution, realizing both upsampling and channel changes. It has also been observed that transpose-convolutions can cause checkerboard artifacts due to the uneven overlap of convolutional kernels. This effect is even more severe for 3D transpose-convolutions, as one voxel is covered by multiple overlapping kernels (resulting in 8-fold uneven overlap). Figure 3 compares the results with CycleGAN, demonstrating that our method obtains significantly better visual quality. (We experimented with many different configurations of generators and discriminators; none achieved the desired visual results compared with our configuration.)
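A sketch of such an upsampling block (assumed layer choices consistent with the description above, not the exact published configuration):

```python
# Nearest-neighbor upsampling followed by a 3D convolution, which avoids the
# checkerboard artifacts of 3D transpose-convolutions.
import torch.nn as nn

def up_block(in_ch, out_ch):
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode="nearest"),         # 2x upsampling
        nn.Conv3d(in_ch, out_ch, kernel_size=3, padding=1),  # channel change
        nn.InstanceNorm3d(out_ch),
        nn.ReLU(inplace=True),
    )
```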
For discriminators, we adopt the PatchGAN architecture to classify whether an overlapping sub-volume is real or fake, rather than classifying the whole volume. Such an approach prevents the discriminators from exploiting information from arbitrary volume locations when making decisions.
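A minimal 3D PatchGAN-style discriminator sketch (channel widths and depth are assumptions, not the paper's exact configuration); the output is a grid of real/fake logits, one per overlapping sub-volume, rather than a single score:

```python
# 3D PatchGAN-style discriminator: fully convolutional, per-patch decisions.
import torch.nn as nn

def patch_discriminator(in_ch=1, base=32):
    layers, ch = [], in_ch
    for mult in (1, 2, 4):
        layers += [nn.Conv3d(ch, base * mult, 4, stride=2, padding=1),
                   nn.InstanceNorm3d(base * mult),
                   nn.LeakyReLU(0.2, inplace=True)]
        ch = base * mult
    layers += [nn.Conv3d(ch, 1, 4, stride=1, padding=1)]  # per-patch logits
    return nn.Sequential(*layers)
```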
For segmentors, we use a U-Net, but without any normalization layer. In total, 3 levels of symmetric downsampling and upsampling are performed by strided max-pooling and nearest-neighbor upsampling. At each resolution, we use two sequential convolutions.
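A compact 3D U-Net segmentor sketch matching this description (3 symmetric resolution levels, strided max-pooling down, nearest-neighbor upsampling up, two convolutions per resolution, no normalization layers); channel widths and the class count are assumptions:

```python
# Compact 3D U-Net segmentor with long-range skip connections.
import torch
import torch.nn as nn

def double_conv(in_ch, out_ch):
    return nn.Sequential(
        nn.Conv3d(in_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True),
        nn.Conv3d(out_ch, out_ch, 3, padding=1), nn.ReLU(inplace=True))

class UNet3D(nn.Module):
    def __init__(self, in_ch=1, num_classes=6, base=16):  # 5 regions + background (assumed)
        super().__init__()
        self.enc1 = double_conv(in_ch, base)
        self.enc2 = double_conv(base, base * 2)
        self.enc3 = double_conv(base * 2, base * 4)
        self.bott = double_conv(base * 4, base * 8)
        self.pool = nn.MaxPool3d(2)                    # strided max-pooling
        self.up   = nn.Upsample(scale_factor=2, mode="nearest")
        self.dec3 = double_conv(base * 8 + base * 4, base * 4)
        self.dec2 = double_conv(base * 4 + base * 2, base * 2)
        self.dec1 = double_conv(base * 2 + base, base)
        self.head = nn.Conv3d(base, num_classes, 1)    # per-voxel class logits

    def forward(self, x):
        e1 = self.enc1(x)
        e2 = self.enc2(self.pool(e1))
        e3 = self.enc3(self.pool(e2))
        b  = self.bott(self.pool(e3))
        d3 = self.dec3(torch.cat([self.up(b), e3], dim=1))   # long-range skip
        d2 = self.dec2(torch.cat([self.up(d3), e2], dim=1))
        d1 = self.dec1(torch.cat([self.up(d2), e1], dim=1))
        return self.head(d1)
```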
4.2 Training details
We use the Adam solver for the segmentors and closely follow the settings in CycleGAN to train the generators with discriminators. In the next section, for fast experimentation, we choose to pre-train the generators and segmentors separately first and then train the whole network jointly. We hypothesized that pre-training first would give better performance, because the generators and segmentors can only benefit each other once each produces reasonable outputs. Nevertheless, we observed that training everything from scratch obtains similar results, which demonstrates the effectiveness of coupling both tasks in an end-to-end network and making them converge harmoniously. We pre-train the segmentors and the generators for a number of epochs each; after jointly training for several more epochs, we decrease the learning rates of both generators and segmentors steadily until they reach 0. We found that if the learning rate decreases to a certain small value, the synthetic images begin to show clear artifacts and the segmentors tend to overfit, so we apply early stopping when the segmentation loss no longer decreases for several epochs. In training, the number of training volumes in the two domains can differ; we define one epoch as a full pass through the domain with the larger amount of data.
Table 1 (excerpt): with shape-consistency, w/ SC (Ours): 69.2, 69.6 (S-scores; see Section 5.2).
5 Experimental Results
This section evaluates and discusses our method. We first introduce a 3D cardiovascular image dataset. The heart is a perfect example of the difficulty of acquiring paired cross-modality data: it is a nonrigid organ and it keeps beating, so even CT and MRI scans from the same patient cannot be perfectly aligned. We then evaluate the two tasks addressed by our method, volume segmentation and synthesis, both qualitatively and quantitatively, with our proposed auxiliary evaluation metrics.
5.1 Dataset
We collected 4,354 contrasted cardiac CT scans from patients with various cardiovascular diseases. The in-slice resolution is isotropic and varies from 0.28 mm to 0.74 mm across volumes. The slice thickness (distance between neighboring slices) is larger than the in-slice resolution and varies from 0.4 mm to 2.0 mm. In addition, we collected 142 cardiac MRI scans with a new compressed-sensing scanning protocol. The MRI volumes have a near-isotropic resolution ranging from 0.75 mm to 2.0 mm. This true 3D MRI scan with isotropic voxel size is a new imaging modality, available in only a handful of top hospitals. All volumes are resampled to 1.5 mm resolution for the following experiments, and we crop the volumes around the heart center. The endocardium of all four cardiac chambers is annotated, as is the left ventricle epicardium, resulting in five anatomical regions.
We denote CT as domain $A$ and MRI as domain $B$, and organize the dataset into two sets, $S_1$ and $S_2$. For $S_1$, we randomly select 142 CT volumes from all CT images to match the number of MRI volumes. For both modalities, a portion of the data is used for training and validation and the rest as testing data. For $S_2$, we use all of the remaining 4,212 CT volumes as an extra augmentation dataset, which is used to generate synthetic MRI volumes for segmentation. We fix the testing data in $S_1$ for all experiments.
5.2 Cross-domain Translation Evaluation
We evaluate the generators both qualitatively and quantitatively. Figure 4 shows typical synthetic results of our method. As can be observed visually, the synthetic images are close to real images, and no obvious geometric distortion is introduced during translation. Our method preserves cardiac anatomies such as the aorta and spine well.
Shape invariance evaluation For GAN methods that generate class-specific natural images, prior work proposes the Inception score to evaluate the diversity of generated images using an auxiliary trained classification network.
Inspired by this, we propose the S-score (segmentation score) to evaluate the shape-invariance quality of synthetic images. We train two segmentation networks on the training data of the respective modalities and compare the multi-class Dice scores of synthetic volumes. For each synthetic volume, the S-score is computed against the ground truth of the corresponding real volume it was translated from; a higher score indicates better-matched shape (i.e., less geometric distortion). Table 1 shows the S-score of synthetic data from CT and MRI for generators trained without the shape-consistency loss, denoted w/o SC; this baseline is essentially CycleGAN with our optimized network design. As can be seen, our method with shape-consistency (w/ SC) achieves a large improvement over the baseline on both modalities.
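A sketch of how the S-score could be computed (an assumed implementation; the segmentor stands in for the auxiliary network trained on real data of the target modality):

```python
# S-score: multi-class Dice between a pre-trained segmentor's prediction on a
# synthetic volume and the ground truth of the real volume it came from.
import torch

def s_score(segmentor, synthetic_volume, source_labels, num_classes):
    pred = segmentor(synthetic_volume).argmax(dim=1)  # (batch, D, H, W)
    dices = []
    for c in range(1, num_classes):                   # skip background class 0
        p, g = (pred == c), (source_labels == c)
        inter = (p & g).sum().float()
        denom = p.sum().float() + g.sum().float()
        if denom > 0:
            dices.append((2 * inter / denom).item())
    return sum(dices) / max(len(dices), 1)            # mean Dice over classes
```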
5.3 Segmentation Evaluation
Here we show how well our method can use the synthetic data to improve segmentation. We compare against the ad-hoc approach (ADA) mentioned above. Specifically, we first train two segmentors individually, denoted $S_A$ and $S_B$, and treat their segmentation performance as the baseline, Baseline (R), in the following. We then train generators $G_A$ and $G_B$ with the adversarial and cycle-consistency losses (setting the weight $\gamma$ of the shape-consistency loss to 0). Then, adding synthetic data, we perform the following comparison:
Ad-hoc approach (ADA): We use $G_A$ and $G_B$ to generate synthetic data (for a fair comparison, both synthetic and reconstructed data are used). We fine-tune $S_A$ and $S_B$ using the synthetic data together with real data (Figure 5, left). At each training batch, we take half real and half synthetic data to prevent possible distraction from low-quality synthetic data.
Our method: We join $G_A$, $G_B$, $S_A$, and $S_B$ (together with the discriminators) and fine-tune the overall network in an end-to-end fashion (Figure 5, right), as specified in the training details.
Note that the segmentation network compared here is U-Net. For medical image segmentation, U-Net is well recognized as one of the best end-to-end CNNs; its long-range skip connections usually perform as well as or better than FCN- or ResNet/DenseNet-based architectures, especially on small medical datasets. The U-Net results are thus representative of state-of-the-art medical image segmentation on our dataset.
We perform this experimental procedure on both $S_1$ and $S_2$. In the first experiment, on $S_1$, we test how well our method uses synthetic data to improve segmentation given only limited real data. Since we need to vary the amount of data in one modality while fixing the other, we perform the experiments on both modalities, respectively.
Using a subset of the real data together with all synthetic data from the other modality, Table 2 compares the segmentation results, with the standard multi-class Dice score as the evaluation metric. As can be observed, our method achieves much better performance on both modalities. For CT segmentation, ADA even deteriorates the performance. We speculate that this is because the baseline model, trained with very few real volumes, has not stabilized, and synthetic data distracts the optimization when used offline; our method adapts to it well and leads to significant improvement.
We also demonstrate qualitative results of our method in Figure 6: using only extra synthetic data, our method largely corrects the segmentation errors. Furthermore, we show results with varying amounts of real data in Figure 7 (left and middle). Our method consistently outperforms the ADA. In addition, we notice that the improvement grows more slowly as the amount of real data increases. One reason is that more real data brings the segmentors closer to their capacity, so the effect of extra synthetic data shrinks; this could be balanced out by increasing the segmentor size given sufficient GPU memory.
The second experiment uses $S_2$, which has much more CT data, so we aim at boosting the MRI segmentor. We vary the amount of synthetic data used while using all real MRI data. Figure 7 (right) compares the results: our method still shows better performance and reaches the accuracy of the ADA while using only a fraction of the synthetic data.
5.4 Gap between synthetic and real data
Reducing the distribution gap between real and synthetic data is the key to making synthetic data useful for segmentation. Here we interpret the gap between synthetic and real data by evaluating how much each improves segmentation. On dataset $S_2$, we train an MRI segmentor using a subset of the real data. We then boost the segmentor by adding 1) more real MRI data, 2) synthetic data used offline (ADA), and 3) synthetic data used by our method. As shown in Figure 8, our method significantly reduces the gap left by the ADA across the tested amounts of real data.
Moreover, we found that when using synthetic data as offline augmentation (our comparison baseline), too much synthetic data can cause the network training to diverge, whereas we did not observe this with our method. However, we also observe that the gap becomes harder to reduce as the amount of real data increases; one reason is limited model capacity. We believe this gap reduction is worth further study.
6 Conclusion
In this paper, we present a method that simultaneously learns to translate and segment medical 3D images, two significant tasks in medical imaging. Training generators for cross-domain volume-to-volume translation is more difficult than for 2D images. We address three key problems in synthesizing realistic 3D medical images: 1) learning from unpaired data, 2) keeping anatomy (i.e., shape) consistent, and 3) using synthetic data to improve volume segmentation effectively. We demonstrate that our unified method, which couples the two tasks, is more effective than solving them in isolation. Extensive experiments on a 3D cardiovascular dataset validate the effectiveness and superiority of our method.
References
-  M. Arjovsky, S. Chintala, and L. Bottou. Wasserstein GAN. arXiv preprint arXiv:1701.07875, 2017.
-  D. Berthelot, T. Schumm, and L. Metz. Began: Boundary equilibrium generative adversarial networks. arXiv preprint arXiv:1703.10717, 2017.
-  K. Bousmalis, N. Silberman, D. Dohan, D. Erhan, and D. Krishnan. Unsupervised pixel-level domain adaptation with generative adversarial networks. CVPR, 2017.
-  N. Burgos, M. J. Cardoso, F. Guerreiro, C. Veiga, M. Modat, J. McClelland, A.-C. Knopf, S. Punwani, D. Atkinson, S. R. Arridge, et al. Robust ct synthesis for radiotherapy planning: application to the head and neck region. In MICCAI, 2015.
-  X. Cao, J. Yang, Y. Gao, Y. Guo, G. Wu, and D. Shen. Dual-core steered non-rigid registration for multi-modal images via bi-directional image synthesis. Medical Image Analysis, 2017.
-  P. Costa, A. Galdran, M. I. Meyer, M. D. Abràmoff, M. Niemeijer, A. M. Mendonça, and A. Campilho. Towards adversarial retinal image synthesis. arXiv preprint arXiv:1701.08974, 2017.
-  L. R. Dice. Measures of the amount of ecologic association between species. Ecology, 26(3):297–302, 1945.
-  M. Drozdzal, E. Vorontsov, G. Chartrand, S. Kadoury, and C. Pal. The importance of skip connections in biomedical image segmentation. In International Workshop on Large-Scale Annotation of Biomedical Data and Expert Label Synthesis, 2016.
-  Y. Gong, S. Karanam, Z. Wu, K.-C. Peng, J. Ernst, and P. C. Doerschuk. Learning compositional visual concepts with mutual consistency. arXiv preprint arXiv:1711.06148, 2017.
-  I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In NIPS, 2014.
-  K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition. In CVPR, 2016.
-  Y. Huang, L. Shao, and A. F. Frangi. Simultaneous super-resolution and cross-modality synthesis of 3d medical images using weakly-supervised joint convolutional sparse coding. CVPR, 2017.
-  Y. Huo, Z. Xu, S. Bao, A. Assad, R. G. Abramson, and B. A. Landman. Adversarial synthesis learning enables segmentation without target modality ground truth. ISBI, 2018.
-  J. E. Iglesias, E. Konukoglu, D. Zikic, B. Glocker, K. Van Leemput, and B. Fischl. Is synthesizing mri contrast useful for inter-modality analysis? In MICCAI, 2013.
-  S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. In ICML, 2015.
-  P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. CVPR, 2017.
-  K. Kamnitsas, C. Baumgartner, C. Ledig, V. Newcombe, J. Simpson, A. Kane, D. Menon, A. Nori, A. Criminisi, D. Rueckert, et al. Unsupervised domain adaptation in brain lesion segmentation with adversarial networks. In IPMI, 2017.
-  T. Kim, M. Cha, H. Kim, J. Lee, and J. Kim. Learning to discover cross-domain relations with generative adversarial networks. In ICML, 2017.
-  D. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
-  S. Kohl, D. Bonekamp, H.-P. Schlemmer, K. Yaqubi, M. Hohenfellner, B. Hadaschik, J.-P. Radtke, and K. Maier-Hein. Adversarial networks for the detection of aggressive prostate cancer. arXiv preprint arXiv:1702.08014, 2017.
-  M.-Y. Liu, T. Breuel, and J. Kautz. Unsupervised image-to-image translation networks. arXiv preprint arXiv:1703.00848, 2017.
-  M.-Y. Liu and O. Tuzel. Coupled generative adversarial networks. In NIPS, 2016.
-  P. Luc, C. Couprie, S. Chintala, and J. Verbeek. Semantic segmentation using adversarial networks. NIPS Workshop on Adversarial Training, 2016.
-  D. Nie, R. Trullo, C. Petitjean, S. Ruan, and D. Shen. Medical image synthesis with context-aware generative adversarial networks. arXiv preprint arXiv:1612.05362, 2016.
-  A. Odena, V. Dumoulin, and C. Olah. Deconvolution and checkerboard artifacts. Distill, 2016.
-  A. Osokin, A. Chessel, R. E. C. Salas, and F. Vaggi. Gans for biological image synthesis. arXiv preprint arXiv:1708.04692, 2017.
-  O. Ronneberger, P. Fischer, and T. Brox. U-net: Convolutional networks for biomedical image segmentation. In MICCAI, 2015.
-  H. R. Roth, L. Lu, A. Seff, K. M. Cherry, J. Hoffman, S. Wang, J. Liu, E. Turkbey, and R. M. Summers. A new 2.5D representation for lymph node detection using random sets of deep convolutional neural network observations. In MICCAI, pages 520–527, 2014.
-  T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen. Improved techniques for training gans. In NIPS, 2016.
-  A. Shrivastava, T. Pfister, O. Tuzel, J. Susskind, W. Wang, and R. Webb. Learning from simulated and unsupervised images through adversarial training. CVPR, 2017.
-  D. Ulyanov, A. Vedaldi, and V. Lempitsky. Instance normalization: The missing ingredient for fast stylization. arXiv preprint arXiv:1607.08022, 2016.
-  H. Van Nguyen, K. Zhou, and R. Vemulapalli. Cross-domain synthesis of medical images using efficient location-sensitive deep network. In MICCAI, 2015.
-  R. Vemulapalli, H. Van Nguyen, and S. Kevin Zhou. Unsupervised cross-modal synthesis of subject-specific scans. In ICCV, 2015.
-  J. Xue, H. Zhang, K. Dana, and K. Nishino. Differential angular imaging for material recognition. In CVPR, 2017.
-  Y. Xue, T. Xu, H. Zhang, R. Long, and X. Huang. Segan: Adversarial network with multi-scale loss for medical image segmentation. arXiv preprint arXiv:1706.01805, 2017.
-  D. Yang, D. Xu, S. K. Zhou, B. Georgescu, M. Chen, S. Grbic, D. Metaxas, and D. Comaniciu. Automatic liver segmentation using an adversarial image-to-image network. MICCAI, 2017.
-  Z. Yi, H. Zhang, P. Tan, and M. Gong. Dualgan: Unsupervised dual learning for image-to-image translation. arXiv preprint arXiv:1704.02510, 2017.
-  R. Zhang, P. Isola, and A. A. Efros. Colorful image colorization. In ECCV, 2016.
-  Z. Zhang, P. Chen, M. Sapkota, and L. Yang. Tandemnet: Distilling knowledge from medical images using diagnostic reports as optional semantic references. In MICCAI, 2017.
-  Z. Zhang, Y. Xie, F. Xing, M. Mcgough, and L. Yang. Mdnet: A semantically and visually interpretable medical image diagnosis network. In CVPR, 2017.
-  Z. Zhang, Y. Xie, and L. Yang. Photographic text-to-image synthesis with a hierarchically-nested adversarial network. In CVPR, 2018.
-  J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. ICCV, 2017.