I Introduction
Image-to-image (I2I) translation aims to learn a function that maps images from one domain to another. The I2I framework is applied to many tasks in machine learning and computer vision, such as image inpainting [1], super-resolution [2, 3], colorization [4], style transfer [5, 25, 26, 28, 27], domain adaptation [6, 7, 8, 9, 42], and person re-identification [10, 11, 12, 13, 14]. Aligned image pairs are either difficult to collect for training (e.g., summer ↔ winter) or do not exist at all (e.g., artwork ↔ photo); thus, most work focuses on unpaired I2I models under the assumption that paired data are not available. However, the absence of paired supervision comes at the cost of training stability. Even more problematic, the unpaired setting is ill-posed: it admits infinitely many solutions, and its outputs are multimodal, in that a single input may correspond to multiple possible outputs. To handle this, models employ complex and disentangled architectures [22, 23, 24], which pose substantial difficulties from an optimization perspective.
In recent years, several variants of cycle-consistency constraints [42], normalization techniques [25, 26, 27, 28, 29, 30], and different latent space assumptions [20, 22, 24, 31] have been investigated to achieve semantic-aware cycles, control style information, and disentangle features. Despite these advances, stabilization is rarely discussed because the issue is avoided by dealing with narrow translations or utilizing refined datasets with similar poses and backgrounds. Furthermore, in real-world applications such as domain adaptation and person re-identification, data include various blurs, illuminations, or noise; therefore, training is not straightforward.
In the GAN community, several normalization and gradient-based regularization techniques for the GAN discriminator have been studied, such as batch normalization [32], layer normalization [33], spectral normalization [34], and the gradient penalty [37]. However, [36] empirically revealed that simultaneously enforcing both normalization and gradient-based regularization provides marginal gains or fails. In particular, I2I models adopt several normalization techniques [25, 26, 27, 28, 29, 30]; accordingly, combining I2I models with state-of-the-art gradient-based regularization involves more theoretical uncertainty than in standard GANs.
Zhang et al. [40] first put forward the consistency-regularized GAN (CR-GAN), which brings consistency regularization from semi-supervised learning to the GAN discriminator. CR-GAN surpasses gradient-based approaches, but it is limited to real samples, and regularization using generated images can fail for standard GANs.
In this work, we propose augmented cyclic consistency regularization (ACCR), a novel regularization technique for unpaired I2I translation that requires no gradient information and incorporates consistency regularization on the discriminators, leveraging three types of samples: real, fake, and reconstructed images. We augment the data fed to the discriminators and penalize the discriminators' sensitivity to these perturbations. We show an intuitive illustration of our method in CycleGAN [18] in Fig. 1.
Qualitatively, I2I models guarantee the quality of both fake and reconstructed samples owing to faster learning and a lower risk of mode collapse. This justifies the use of these images. Quantitatively, our method outperforms the CycleGAN baseline, the CR-GAN method, and models with consistency regularization using only fake or only reconstructed samples on MNIST ↔ MNIST-M and MNIST ↔ SVHN. ACCR-CycleGAN improves the baseline by 0.3% on MNIST → MNIST-M, 2.3% on MNIST-M → MNIST, 3.9% on MNIST → SVHN, and 3.7% on SVHN → MNIST, as measured by classification accuracy on fake samples. Moreover, ACCR outperforms the CR-GAN method with other types of data augmentation and cycle-consistent constraints.
The contributions of our work are summarized as follows:

We propose a novel, simple, and effective training stabilizer in unpaired I2I translation using real, fake, and reconstructed samples.

We qualitatively explain why consistency regularization employing fake and reconstructed samples performs well in unpaired I2I models.

Our ACCR quantitatively outperforms the CycleGAN baseline and the CR-GAN method on several datasets, for various cycle-consistent constraints, and with several commonly used data augmentation techniques, as well as combinations thereof.
II Related Work
II-A Image-to-Image Translation
To learn the mapping function with paired training data, Pix2Pix [16] applies conditional GANs using both a latent vector and the input image. The constraint is enforced by the ground truth labels or pairwise correspondence at the pixel level. CycleGAN [18], DiscoGAN [19], and UNIT [20] employ a cycle-consistency constraint to simultaneously learn a pair of forward and backward mappings between two domains given unpaired training data; these mappings are conditioned solely on an input image and accordingly produce a single output.
To achieve multimodal generation, BicycleGAN [17] injects noise into mappings between latent and target spaces to prevent mode collapse in the paired setting. In unpaired multimodal translation, Augmented CycleGAN [21] also injects latent codes into the generators, and concurrent work [22, 23, 24] adopts disentangled representations to produce diversified outputs.
In the field of domain adaptation, the CycleGAN framework is applied in cycle-consistent adversarial domain adaptation models [6, 9, 42], and I2I-translation-based domain adaptation [7, 8] is designed for semantic segmentation of the target domain images. Currently, GAN-based domain adaptation is introduced in person re-identification for addressing challenges in real-world scenarios. CycleGAN-based approaches [10, 11, 12, 13] are widely adopted to transfer pedestrian image styles from one domain to another. State-of-the-art DG-Net [14] makes use of disentangled architecture to encode pedestrians in appearance and structure spaces for implausible person image generation.
However, despite the wide range of use cases, unpaired I2I translation is more difficult from an optimization perspective because of the lack of supervision in the form of paired examples. Moreover, the latest multimodal methods incorporate domain-specific and domain-invariant encoders [22, 23, 24, 31]. These approaches often fail when the amount of training data is limited or when domain characteristics differ significantly [24]. They must learn separate latent spaces, train larger networks, and perform unconditional generation, where a latent vector is mapped directly to a full-size image, in contrast to the previous conditional cases; each of these is problematic. Therefore, our work mainly focuses on the stabilization of unpaired I2I translation.
In general, all the models share a problem whereby the generators cannot faithfully reconstruct the input images, since I2I translations are inherently one-to-many mappings. For instance, in the translation of semantic labels ↔ photo, the original colors, textures, and lighting are impossible to fully recover and vary stochastically because these details are lost in the label domain. This is also the case for other translations such as map ↔ photo and summer ↔ winter, as well as digits. In our work, we make use of this drawback to improve the diversity of fake and reconstructed images in consistency regularization.
II-B Consistency Regularization
Consistency regularization was first proposed in the semi-supervised learning literature [51, 52]. The fundamental idea is simple: a classifier should output similar predictions for unlabeled examples even after they have been randomly perturbed. The random perturbations include data augmentation [47, 48], stochastic regularization (e.g., Dropout [50]) [51, 52], and adversarial perturbations [54]. Analytically, consistency regularization enhances the smoothness of function prediction [53, 54].
CR-GAN [40] introduces consistency regularization in the GAN discriminator and improves state-of-the-art FID scores for conditional generation. In addition, CR-GAN outperforms gradient-based regularizers: Gradient Penalty [37], DRAGAN [38], and JS-Regularizer [39]. However, the CR-GAN method on generated images often fails. We seek to explore this limitation and demonstrate the effectiveness of adding consistency regularization which employs both fake and reconstructed images.
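III Proposed Method
III-A Preliminaries
The goal of unpaired I2I translation is to learn the mappings between two domains $X$ and $Y$ given training data $\{x_i\}_{i=1}^{N}$ and $\{y_j\}_{j=1}^{M}$, where $x_i \in X$ and $y_j \in Y$. We denote the data distributions in the two domains by $x \sim p_{data}(x)$ and $y \sim p_{data}(y)$, the generators by $G: X \rightarrow Y$ and $F: Y \rightarrow X$, and the discriminators by $D_X$ and $D_Y$, where $D_X$ learns to distinguish real data $\{x\}$ from fake data $\{F(y)\}$, and $D_Y$ learns to distinguish $\{y\}$ from $\{G(x)\}$. The objective consists of an adversarial loss [15] and a constraint term $\mathcal{L}_{cst}$ that encourages the generators to produce samples that are structurally similar to the inputs and to avoid excessive hallucinations and mode collapse, which would increase the loss.
III-A1 Adversarial loss
The adversarial loss is employed to match the distribution of the fake images to the target image distribution, and is written as
$$\mathcal{L}_{adv}(G, F, D_X, D_Y) = \mathbb{E}_{y \sim p_{data}(y)}[\log D_Y(y)] + \mathbb{E}_{x \sim p_{data}(x)}[\log(1 - D_Y(G(x)))] + \mathbb{E}_{x \sim p_{data}(x)}[\log D_X(x)] + \mathbb{E}_{y \sim p_{data}(y)}[\log(1 - D_X(F(y)))]. \tag{1}$$
III-A2 Unpaired I2I objective
Unpaired I2I translation requires an additional loss to support the forward and backward mappings between the two domains. Thus, the full objective of unpaired I2I models (UI2I) is given by
$$\mathcal{L}_{UI2I}(G, F, D_X, D_Y) = \mathcal{L}_{adv}(G, F, D_X, D_Y) + \lambda_{cst}\,\mathcal{L}_{cst}(G, F). \tag{2}$$
CycleGAN [18] imposes a pixel-wise constraint in the form of the cycle-consistency loss [18] as the constraint term $\mathcal{L}_{cst}$,
$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{x \sim p_{data}(x)}[\| F(G(x)) - x \|_1] + \mathbb{E}_{y \sim p_{data}(y)}[\| G(F(y)) - y \|_1]. \tag{3}$$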
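As a minimal sketch of the pixel-wise cycle-consistency constraint, the following framework-free Python computes an L1 cycle loss on toy data. The flat-list image representation, the normalization by pixel count, and the toy generators `G`, `F_exact`, and `F_biased` are illustrative assumptions, not the networks used in this work.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)


def cycle_consistency_loss(x, y, G, F):
    """L1 cycle loss: F(G(x)) should recover x and G(F(y)) should recover y."""
    return l1_distance(F(G(x)), x) + l1_distance(G(F(y)), y)


# Hypothetical toy generators acting on flat "images" (lists of pixel values).
def G(img):
    """X -> Y: brighten every pixel."""
    return [p + 0.5 for p in img]


def F_exact(img):
    """Y -> X: exact inverse of G."""
    return [p - 0.5 for p in img]


def F_biased(img):
    """Y -> X: imperfect inverse of G, leaving a residual of 0.25 per pixel."""
    return [p - 0.25 for p in img]


x = [0.0, 0.25, 1.0]
y = [0.5, 0.75, 1.5]
loss_exact = cycle_consistency_loss(x, y, G, F_exact)    # perfect cycles
loss_biased = cycle_consistency_loss(x, y, G, F_biased)  # penalized cycles
```

The loss vanishes only when both cycles reproduce their inputs, which is why it discourages the generators from discarding input structure.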
III-B Consistency Regularization for GANs
CR-GAN [40] proposes a simple, effective, and fast training stabilizer that introduces consistency regularization on the GAN discriminator. Assuming that the decision of the discriminator should be invariant to any valid domain-specific data augmentation, the sensitivity of the discriminator to randomly augmented data is penalized. This improves FID scores in conditional generation. The consistency regularization loss for the discriminator $D$ is given by
$$\mathcal{L}_{cr}(D) = \mathbb{E}_{x \sim p_{data}(x)}[\| D(x) - D(T(x)) \|^2], \tag{4}$$
where $T(x)$ denotes a stochastic data augmentation function, e.g., flipping the image horizontally or translating the image by a few pixels. However, [40] reports that an additional regularization using generated images is not always superior to the original CR-GAN method.
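The penalty on discriminator sensitivity can be illustrated with a small sketch: the squared distance between the discriminator output on an image and on its augmented copy, averaged over a batch. The horizontal flip stands in for the stochastic augmentation, and the two toy "discriminators" `D_mean` and `D_left` are hypothetical functions chosen only to contrast an invariant and a sensitive case.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two output vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def horizontal_flip(image):
    """Flip an image (a list of rows) left to right."""
    return [row[::-1] for row in image]


def consistency_loss(D, images, T):
    """Batch mean of ||D(x) - D(T(x))||^2."""
    return sum(squared_distance(D(x), D(T(x))) for x in images) / len(images)


def D_mean(image):
    """Toy discriminator: mean intensity, which is invariant to flipping."""
    pixels = [p for row in image for p in row]
    return [sum(pixels) / len(pixels)]


def D_left(image):
    """Toy discriminator: left column only, which is sensitive to flipping."""
    return [row[0] for row in image]


batch = [[[0.0, 1.0], [1.0, 0.0]]]
loss_invariant = consistency_loss(D_mean, batch, horizontal_flip)
loss_sensitive = consistency_loss(D_left, batch, horizontal_flip)
```

A discriminator whose output does not change under the augmentation incurs no penalty, whereas one that reacts to the flip is pushed toward invariance.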
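III-C Augmented Cyclic Consistency Regularization
We propose augmented cyclic consistency regularization (ACCR) for stabilizing training in unpaired I2I models. ACCR enforces consistency regularization on the discriminators, leveraging real, fake, reconstructed, and augmented samples. The goal is to verify the effectiveness of consistency regularization even when the fake and reconstructed data come from datasets that include noise (e.g., SVHN or MNIST-M). An overview of ACCR-CycleGAN is shown in Fig. 2.
We define consistency regularization losses on the discriminators $D_X$ and $D_Y$ leveraging real, fake, and reconstructed data, denoted by $\mathcal{L}_{cr}^{real}$, $\mathcal{L}_{cr}^{fake}$, and $\mathcal{L}_{cr}^{rec}$, respectively. $\mathcal{L}_{cr}^{real}$, which is identical to CR-GAN, is written as
$$\mathcal{L}_{cr}^{real} = \mathbb{E}_{x \sim p_{data}(x)}[\| D_X(x) - D_X(T(x)) \|^2] + \mathbb{E}_{y \sim p_{data}(y)}[\| D_Y(y) - D_Y(T(y)) \|^2]. \tag{5}$$
Given fake samples $\{F(y)\}$, $\{G(x)\}$ and augmented samples $\{T(F(y))\}$, $\{T(G(x))\}$, $\mathcal{L}_{cr}^{fake}$ is written as
$$\mathcal{L}_{cr}^{fake} = \mathbb{E}_{y \sim p_{data}(y)}[\| D_X(F(y)) - D_X(T(F(y))) \|^2] + \mathbb{E}_{x \sim p_{data}(x)}[\| D_Y(G(x)) - D_Y(T(G(x))) \|^2], \tag{6}$$
where $T$ denotes a stochastic data augmentation function which is semantics-preserving, such as random crop, random rotation, or cutout [47]. Given reconstructed samples $\{F(G(x))\}$, $\{G(F(y))\}$ and augmented samples $\{T(F(G(x)))\}$, $\{T(G(F(y)))\}$, $\mathcal{L}_{cr}^{rec}$ is written as
$$\mathcal{L}_{cr}^{rec} = \mathbb{E}_{x \sim p_{data}(x)}[\| D_X(F(G(x))) - D_X(T(F(G(x)))) \|^2] + \mathbb{E}_{y \sim p_{data}(y)}[\| D_Y(G(F(y))) - D_Y(T(G(F(y)))) \|^2]. \tag{7}$$
By default, we use random crop as $T$, and explore the effects of other functions in Section IV-E2.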
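Two of the semantics-preserving augmentations named above can be sketched in plain Python as follows. The crop mechanics (zero-padding followed by a random crop back to the original size) and the parameter names `pad` and `size` are assumptions made for illustration; the exact settings used in the experiments are not specified here. The cutout sketch zeroes a square centered at a random pixel and clips it at the image border.

```python
import random


def random_crop(image, pad, rng):
    """Zero-pad by `pad` pixels per side, then crop back to the original size
    at a random offset (assumed crop mechanics)."""
    h, w = len(image), len(image[0])
    padded = [[0] * (w + 2 * pad) for _ in range(h + 2 * pad)]
    for r in range(h):
        for c in range(w):
            padded[r + pad][c + pad] = image[r][c]
    top = rng.randint(0, 2 * pad)
    left = rng.randint(0, 2 * pad)
    return [row[left:left + w] for row in padded[top:top + h]]


def cutout(image, size, rng):
    """Zero out a square of side `size` centered at a random pixel,
    clipped at the image border. The input image is not modified."""
    h, w = len(image), len(image[0])
    cy, cx = rng.randrange(h), rng.randrange(w)
    out = [row[:] for row in image]
    for r in range(max(0, cy - size // 2), min(h, cy + size // 2)):
        for c in range(max(0, cx - size // 2), min(w, cx + size // 2)):
            out[r][c] = 0
    return out


rng = random.Random(0)  # seeded for reproducibility
image = [[1] * 4 for _ in range(4)]
cropped = random_crop(image, pad=1, rng=rng)
erased = cutout(image, size=2, rng=rng)
```

Both functions keep the image size unchanged, so the augmented sample can be fed to the same discriminator as the original.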
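III-D Full Objective
Finally, the objective of augmented cyclic consistency regularized unpaired I2I models (ACCR-UI2I) given unpaired data is written as
$$\mathcal{L}_{ACCR\text{-}UI2I} = \mathcal{L}_{UI2I} + \lambda_{real}\,\mathcal{L}_{cr}^{real} + \lambda_{fake}\,\mathcal{L}_{cr}^{fake} + \lambda_{rec}\,\mathcal{L}_{cr}^{rec}, \tag{8}$$
where the hyperparameters $\lambda_{real}$, $\lambda_{fake}$, and $\lambda_{rec}$ control the weights of the regularization terms.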
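The three regularization terms and their weighted sum can be sketched as below. This is a toy illustration under stated assumptions: images are flat lists, the generators, discriminators, and augmentation passed in the usage example are hypothetical stand-ins, and the default weights 1, 0.5, and 0.5 are chosen only for the example.

```python
def sq_dist(u, v):
    """Squared Euclidean distance between two output vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def cr_term(D, samples, T):
    """Batch mean of ||D(s) - D(T(s))||^2 for one discriminator."""
    return sum(sq_dist(D(s), D(T(s))) for s in samples) / len(samples)


def accr_regularizer(xs, ys, G, F, D_X, D_Y, T,
                     lam_real=1.0, lam_fake=0.5, lam_rec=0.5):
    """Consistency terms on real, fake, and reconstructed samples,
    plus their weighted sum."""
    fake_x = [F(y) for y in ys]        # Y -> X translations
    fake_y = [G(x) for x in xs]        # X -> Y translations
    rec_x = [F(g) for g in fake_y]     # X -> Y -> X reconstructions
    rec_y = [G(f) for f in fake_x]     # Y -> X -> Y reconstructions
    terms = {
        "real": cr_term(D_X, xs, T) + cr_term(D_Y, ys, T),
        "fake": cr_term(D_X, fake_x, T) + cr_term(D_Y, fake_y, T),
        "rec": cr_term(D_X, rec_x, T) + cr_term(D_Y, rec_y, T),
    }
    total = (lam_real * terms["real"] + lam_fake * terms["fake"]
             + lam_rec * terms["rec"])
    return terms, total


# Hypothetical toy components, for illustration only.
terms, total = accr_regularizer(
    xs=[[0.0, 1.0]], ys=[[2.0, 4.0]],
    G=lambda img: [p + 1.0 for p in img],
    F=lambda img: [p - 1.0 for p in img],
    D_X=lambda img: [img[0]],
    D_Y=lambda img: [img[0]],
    T=lambda img: img[::-1],
)
```

In an actual training loop this quantity would be added to the discriminator loss; the sketch only shows how the fake and reconstructed samples extend the real-sample term.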
For comparison, we investigate how generation properties differ between standard GANs and unpaired I2I models, and study why the proposed method is advantageous for I2I models in Section IV-D.
Table I: Classification accuracy (%) on fake samples.
Model | MNIST → MNIST-M | MNIST-M → MNIST | MNIST → SVHN | SVHN → MNIST
CycleGAN
CR-CycleGAN
CR-CycleGAN + CR-Fake (Ours)
CR-CycleGAN + CR-Rec (Ours)
ACCR-CycleGAN (Ours)
Table II: Feature distance (MSE) between real and augmented data.
Feature Distance (MSE) | MNIST test | MNIST-M test
CycleGAN
CR-CycleGAN
ACCR-CycleGAN (Ours)
IV Experiments
This section validates our proposed ACCR method in digit translations with noise and various backgrounds (MNIST ↔ MNIST-M and MNIST ↔ SVHN). First, we present details concerning the datasets and experimental implementation. Next, we conduct a quantitative analysis to demonstrate the performance on digit translations and investigate the feature distance between real and augmented samples in the discriminators to verify the effect of ACCR. We then conduct a qualitative analysis to compare generation quality between unpaired I2I models and standard GANs, in particular at the initial and end epochs. Finally, we conduct ablation studies to compare consistency regularization utilizing fake and reconstructed images and explore the importance of the choice of data augmentation and cycle-consistent constraints.
IV-A Datasets
MNIST ↔ MNIST-M: MNIST [43] contains centered, 28×28 pixel grayscale images of single-digit numbers on a black background, with 60,000 images for training and 10,000 for validation. We rescale the images to 32×32 pixels and extend the channel to RGB. MNIST-M [44] contains centered, 32×32 pixel digits on a varied background, obtained by substituting a patch randomly extracted from color photos in BSDS500 [45], with 59,000 images for training and 1,000 for validation.
MNIST ↔ SVHN: We preprocess MNIST [43] as above. SVHN [46] is the challenging real-world Street View House Number dataset, much larger in scale than the other datasets considered. It contains 32×32 pixel color samples, with 73,257 images for training and 26,032 images for validation. Besides varying in shape and texture, the images often contain extraneous numbers in addition to those which are labeled and centered.
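The MNIST preprocessing described above (rescaling to 32×32 and extending the single channel to RGB) might look like the following sketch. Nearest-neighbor interpolation is an assumption, since the interpolation method is not stated, and the nested-list image format is used only to keep the example dependency-free.

```python
def resize_nearest(image, out_h, out_w):
    """Nearest-neighbor resize of a 2D grayscale image (list of rows).
    The interpolation method is an assumption for this sketch."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]


def gray_to_rgb(image):
    """Replicate the gray value into three identical channels."""
    return [[(p, p, p) for p in row] for row in image]


# A synthetic 28x28 "digit": black background with one bright pixel.
digit = [[0] * 28 for _ in range(28)]
digit[14][14] = 255

resized = resize_nearest(digit, 32, 32)
rgb = gray_to_rgb(resized)
```

After this step the MNIST images have the same spatial size and channel count as the MNIST-M and SVHN samples.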
IV-B Implementation
IV-B1 Network architecture
Our network architecture is based on Hoffman et al. [6]. The generator consists of two stride-2 convolutional layers followed by two residual blocks and then two deconvolution layers with stride $\frac{1}{2}$. The discriminator network is a PatchGAN [16] with 5 convolutional layers. For all digit experiments, we use a variant of the LeNet [43] architecture with 2 convolutional layers and 2 fully connected layers for 32×32 pixel images.
IV-B2 Training details
For $\mathcal{L}_{adv}$, we replace the binary cross-entropy loss with a least-squares loss [35] to stabilize GAN optimization, as per [18]. For all the experiments, we use the Adam solver [49] to optimize the objective with a learning rate of 0.0002 on the generators and 0.0001 on the discriminators, and exponential decay rates of 0.5 (0.999) for the first (second) moment estimates. We train for the first 10 epochs at these learning rates and then linearly decay the learning rate to zero over 20 epochs. $\lambda_{real}$ is set to 1, and $\lambda_{fake}$ and $\lambda_{rec}$ linearly increase from zero to half of $\lambda_{real}$, because higher-quality and more diversified samples are guaranteed in the latter part of training. We set the magnitude of cycle-consistency to 10 in MNIST and 0.1 in MNIST-M and SVHN. By default, random crop is adopted as the stochastic data augmentation function.
Table III: Comparison of data augmentation functions on MNIST ↔ MNIST-M.
Data Augmentation | CR-CycleGAN | ACCR-CycleGAN (Ours)
Direction | MNIST → MNIST-M | MNIST-M → MNIST | MNIST → MNIST-M | MNIST-M → MNIST
(1) Random Crop
(2) Random Rotation
(3) Random Crop & Rotation
(4) Cutout [47]
(5) Random Erasing [48]
(6) Color Jitter
(7) Crop & Rotation & Jitter
Table IV: Results with the RCAL constraint on MNIST ↔ SVHN.
Model | MNIST → SVHN | SVHN → MNIST
RCAL
CR-RCAL
ACCR-RCAL (Ours)
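The learning-rate schedule and the ramp of the regularization weights described in the training details can be sketched as two small functions. The epoch indexing and the assumption that the weights for the fake and reconstructed terms ramp linearly over the whole 30-epoch run are illustrative choices; the function and constant names are hypothetical.

```python
def learning_rate(epoch, base_lr, constant_epochs=10, decay_epochs=20):
    """Constant for the first `constant_epochs`, then linear decay to zero
    over `decay_epochs` (epoch indexing is an assumption)."""
    if epoch < constant_epochs:
        return base_lr
    progress = (epoch - constant_epochs) / decay_epochs
    return base_lr * max(0.0, 1.0 - progress)


def ramp_weight(epoch, total_epochs, lambda_real=1.0):
    """Linear ramp from 0 to lambda_real / 2, assumed to span the whole run."""
    return 0.5 * lambda_real * min(1.0, epoch / total_epochs)


G_LR, D_LR = 2e-4, 1e-4   # generator and discriminator learning rates
TOTAL_EPOCHS = 30         # 10 constant epochs + 20 decay epochs

# (epoch, generator lr, discriminator lr, weight of fake/rec terms)
schedule = [
    (e, learning_rate(e, G_LR), learning_rate(e, D_LR),
     ramp_weight(e, TOTAL_EPOCHS))
    for e in (0, 10, 20, 30)
]
```

Ramping the weights keeps the fake and reconstructed terms small early in training, when the generated samples are least reliable.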
IV-B3 Evaluation details
For evaluation of all digit translations, we train revised LeNets [43] on MNIST, MNIST-M, and SVHN, which reach classification accuracies of 99.2%, 97.5%, and 91.0%, respectively. We fix these classifiers for the tests, run each experiment 5 times with different random seeds, and report classification accuracies (%) on fake samples.
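The evaluation protocol can be sketched as follows: a fixed classifier labels the fake samples, accuracy is computed against the labels of the source images, and the result is aggregated over seeds. Averaging over runs is an assumption about how the repeated runs are summarized, and the labels and predictions below are made-up toy values, not experimental results.

```python
from statistics import mean


def accuracy(predictions, labels):
    """Classification accuracy in percent."""
    correct = sum(int(p == t) for p, t in zip(predictions, labels))
    return 100.0 * correct / len(labels)


def mean_accuracy(runs, labels):
    """Average accuracy over several runs (one prediction list per seed)."""
    return mean([accuracy(preds, labels) for preds in runs])


# Toy values for illustration only.
source_labels = [3, 1, 4, 1]
predictions_per_seed = [
    [3, 1, 4, 1],   # seed 0: all fake samples keep their digit identity
    [3, 1, 4, 7],   # seed 1: one fake sample is misclassified
]
per_seed = [accuracy(p, source_labels) for p in predictions_per_seed]
overall = mean_accuracy(predictions_per_seed, source_labels)
```

A high accuracy indicates that translation preserves the digit identity, since the classifier is trained on real target-domain data and kept fixed.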
IV-C Quantitative Analysis
Our proposed method is compared against CR-GAN [40] in Table I. We conduct experiments with CycleGAN [18] as a baseline, CR-CycleGAN (a CycleGAN with consistency regularization using real samples), and our ACCR-CycleGAN on MNIST ↔ MNIST-M and MNIST ↔ SVHN. ACCR-CycleGAN outperforms CycleGAN and CR-CycleGAN in all translations. To quantify the sensitivity of the discriminators to the augmented data, we calculate the mean squared error (MSE) in the feature space between the real and augmented data, as shown in Table II. Both CR and ACCR decrease the distance relative to the baseline, and ACCR has the greater effect, especially on MNIST-M. Therefore, ACCR strengthens the effect of consistency regularization and exhibits superior performance.
IV-D Qualitative Analysis
Unpaired I2I problems are innately ill-posed and thus could have infinitely many solutions. Here we show generated samples in Fig. 3. It seems impossible to determine only one mapping from a grayscale to a color background in the translation from real to fake on MNIST → MNIST-M (Fig. 2(a)) and the reconstruction on MNIST-M → MNIST (Fig. 2(b)). However, we leverage this stochastic property as a source of diversified samples for consistency regularization. Indeed, the property is pronounced in the MNIST-M domain, and accordingly ACCR decreases the feature distance to the greatest extent there (Table II).
Furthermore, CR-GAN [40] reports that consistency regularization on generated samples (CR-Fake) does not always lead to improvements. By investigating this limitation, we found that standard GANs fail to produce recognizable samples at the initial and end steps because, respectively, the GANs are unable to fully capture the data distribution (Fig. 3(a)) and may suffer mode collapse (Fig. 3(b)). However, unpaired I2I translation suffers from these problems to a lesser extent (Fig. 4) owing to image conditioning and the constraint term $\mathcal{L}_{cst}$. Hence, I2I models can preserve semantics even at the first and end epochs, and this justifies using fake and reconstructed images for consistency regularization.
IV-E Ablation Studies
IV-E1 Comparison with CR-Fake, CR-Rec, and ACCR
To explore the effect of CR-Fake, CR-Rec, and ACCR, we compare each model on MNIST ↔ MNIST-M and MNIST ↔ SVHN, as shown in Table I. CR-Fake and CR-Rec are sometimes inferior to CR, but ACCR is superior in every case.
IV-E2 Comparison with other data augmentation
IV-E3 Comparison with other cycle-consistent constraints
Table IV also shows the results of an experiment on CycleGAN with Relaxed Cyclic Adversarial Learning (RCAL), which is a much looser constraint than consistency in the pixel space, to verify our regularization with feature-level cycle-consistent constraints. RCAL is a naive extension of CycleGAN to semantic-aware cycles using task-specific classifiers. ACCR-RCAL surpasses the RCAL baseline and CR-RCAL. Therefore, ACCR is not limited to constraints in pixel space; it is also compatible with feature-wise cycle-consistent models.
Table V: Update speed of the discriminators.
Method | W/O | GP | CR | ACCR (Ours)
Speed (step/s)
IV-E4 Training speed
In terms of computational cost, we measure the actual update speeds of the discriminators for ACCR-CycleGAN on an NVIDIA Tesla P100 in Table V. ACCR marginally increases the cost of the discriminators' forward pass compared with CR. ACCR-CycleGAN is around 1.5 times faster than CycleGAN with Gradient Penalty [37]. We also observe that CycleGAN with Gradient Penalty sometimes performs worse than the baseline, as reported in [36, 40].
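A steps-per-second figure of this kind can be obtained with a simple wall-clock measurement, sketched below. The helper `steps_per_second` and the dummy step are hypothetical; on a GPU, the device would additionally need to be synchronized before reading the timer, since kernels are launched asynchronously.

```python
import time


def steps_per_second(step_fn, n_steps=100):
    """Run `step_fn` n_steps times and return the measured update speed."""
    start = time.perf_counter()
    for _ in range(n_steps):
        step_fn()
    elapsed = time.perf_counter() - start
    return n_steps / max(elapsed, 1e-12)  # guard against a zero interval


def dummy_discriminator_step():
    """Stand-in for one discriminator update (illustrative workload only)."""
    return sum(i * i for i in range(1000))


speed = steps_per_second(dummy_discriminator_step, n_steps=50)
```

Comparing this number across regularizers, with the same hardware and batch size, gives the relative overhead of each method.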
V Conclusion
In this paper, we propose a novel, simple, and effective training stabilizer, ACCR, for unpaired I2I translation. We demonstrate the effectiveness of adding consistency regularization using both fake and reconstructed data. In experiments, our ACCR outperforms the baseline and the CR-GAN method in several digit translations. Furthermore, the proposed method surpasses CR-GAN in various situations where the cycle-consistent constraint and the data augmentation function are different.
Acknowledgments
This work was partially supported by JSPS KAKENHI Grant Number 19K22865 and a hardware donation from Yu Darvish, a Japanese professional baseball player for the Chicago Cubs of Major League Baseball.
References
 [1] D. Pathak, P. Krahenbuhl, J. Donahue, T. Darrell, and A. A. Efros. Context Encoders: Feature Learning by Inpainting. Proc. CVPR, 2014.
 [2] C. Dong, C. C. Loy, K. He, and X. Tang. Image Super-Resolution Using Deep Convolutional Networks. Proc. TPAMI, 2016.
 [3] J. Kim, J. K. Lee, and K. M. Lee. Accurate Image Super-Resolution Using Very Deep Convolutional Networks. Proc. CVPR, 2016.
 [4] R. Zhang, P. Isola, and A. A. Efros. Colorful Image Colorization. Proc. ECCV, 2016.

 [5] L. A. Gatys, A. S. Ecker, and M. Bethge. Image Style Transfer Using Convolutional Neural Networks. Proc. CVPR, 2016.
 [6] J. Hoffman, E. Tzeng, T. Park, J.-Y. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell. CyCADA: Cycle-Consistent Adversarial Domain Adaptation. Proc. ICML, 2018.
 [7] Z. Murez, S. Kolouri, D. Kriegman, R. Ramamoorthi, and K. Kim. Image to Image Translation for Domain Adaptation. Proc. CVPR, 2018.
 [8] W. Hong, Z. Wang, M. Yang, and J. Yuan. Conditional Generative Adversarial Network for Structured Domain Adaptation. Proc. CVPR, 2018.
 [9] P. Russo, F. M. Carlucci, T. Tommasi, and B. Caputo. From source to target and back: symmetric bidirectional adaptive GAN. Proc. CVPR, 2018.
 [10] Z. Zhong, L. Zheng, Z. Zheng, S. Li, and Y. Yang. Camera style adaptation for person re-identification. Proc. CVPR, 2018.
 [11] W. Deng, L. Zheng, G. Kang, Y. Yang, Q. Ye, and J. Jiao. Image-image domain adaptation with preserved self-similarity and domain-dissimilarity for person re-identification. Proc. CVPR, 2018.
 [12] L. Wei, S. Zhang, W. Gao, and Q. Tian. Person Transfer GAN to Bridge Domain Gap for Person Re-Identification. Proc. CVPR, 2018.
 [13] Z. Zhong, L. Zheng, Z. Luo, S. Li, and Y. Yang. Invariance matters: Exemplar memory for domain adaptive person re-identification. Proc. CVPR, 2019.
 [14] Z. Zheng, X. Yang, Z. Yu, L. Zheng, Y. Yang, and J. Kautz. Joint Discriminative and Generative Learning for Person Re-identification. Proc. CVPR, 2019.
 [15] I. J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative Adversarial Networks. Proc. NeurIPS, 2014.
 [16] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros. Image-to-image translation with conditional adversarial networks. Proc. CVPR, 2017.
 [17] J.-Y. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman. Toward Multimodal Image-to-Image Translation. Proc. NeurIPS, 2017.
 [18] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros. Unpaired image-to-image translation using cycle-consistent adversarial networks. Proc. ICCV, 2017.
 [19] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim. Learning to Discover Cross-Domain Relations with Generative Adversarial Networks. Proc. ICML, 2017.
 [20] M. Y. Liu, T. Breuel, and J. Kautz. Unsupervised Image-to-Image Translation Networks. Proc. NeurIPS, 2017.
 [21] A. Almahairi, S. Rajeswar, A. Sordoni, P. Bachman, and A. Courville. Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data. Proc. ICML, 2018.
 [22] X. Huang, M. Y. Liu, S. Belongie, and J. Kautz. Multimodal Unsupervised Image-to-Image Translation. Proc. ECCV, 2018.
 [23] L. Ma, X. Jia, S. Georgoulis, T. Tuytelaars, and L. V. Gool. Exemplar Guided Unsupervised Image-to-Image Translation with Semantic Consistency. Proc. ICLR, 2019.
 [24] H. Y. Lee, H. Y. Tseng, J. B. Huang, M. K. Singh, and M. H. Yang. Diverse Image-to-Image Translation via Disentangled Representations. Proc. ECCV, 2018.
 [25] D. Ulyanov, A. Vedaldi, and V. S. Lempitsky. Instance Normalization: The Missing Ingredient for Fast Stylization. arXiv preprint arXiv:1607.08022, 2016.
 [26] V. Dumoulin, J. Shlens, and M. Kudlur. A Learned Representation for Artistic Style. Proc. ICLR, 2016.
 [27] X. Huang and S. Belongie. Arbitrary Style Transfer in Real-time with Adaptive Instance Normalization. Proc. ICCV, 2017.
 [28] H. Nam and H. E. Kim. Batch-Instance Normalization for Adaptively Style-Invariant Neural Networks. Proc. NeurIPS, 2018.
 [29] T. Park, M. Y. Liu, T. C. Wang, and J.-Y. Zhu. Semantic Image Synthesis with Spatially-Adaptive Normalization. Proc. CVPR, 2019.
 [30] J. Kim, M. Kim, H. Kang, and K. Lee. U-GAT-IT: Unsupervised Generative Attentional Networks with Adaptive Layer-Instance Normalization for Image-to-Image Translation. Proc. ICLR, 2020.
 [31] W. Wu, K. Cao, C. Li, C. Qian, and C. C. Loy. TransGaGa: Geometry-Aware Unsupervised Image-to-Image Translation. Proc. CVPR, 2019.
 [32] S. Ioffe and C. Szegedy. Batch normalization: Accelerating deep network training by reducing internal covariate shift. arXiv preprint arXiv:1502.03167, 2015.
 [33] J. Ba, J. R. Kiros, and G. E. Hinton. Layer Normalization. arXiv preprint arXiv:1607.06450, 2016.
 [34] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida. Spectral Normalization for Generative Adversarial Networks. Proc. ICLR, 2018.
 [35] X. Mao, Q. Li, H. Xie, R. Y. K. Lau, Z. Wang, and S. P. Smolley. Least Squares Generative Adversarial Networks. Proc. ICCV, 2017.
 [36] K. Kurach, M. Lucic, X. Zhai, M. Michalski, and S. Gelly. A Large-Scale Study on Regularization and Normalization in GANs. Proc. ICML, 2019.
 [37] I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. Courville. Improved Training of Wasserstein GANs. Proc. NeurIPS, 2016.
 [38] N. Kodali, J. Abernethy, J. Hays, and Z. Kira. On Convergence and Stability of GANs. arXiv preprint arXiv:1705.07215, 2017.
 [39] K. Roth, A. Lucchi, S. Nowozin, and T. Hofmann. Stabilizing Training of Generative Adversarial Networks through Regularization. Proc. NeurIPS, 2017.
 [40] H. Zhang, Z. Zhang, A. Odena, and H. Lee. Consistency Regularization for Generative Adversarial Networks. Proc. ICLR, 2020.
 [41] A. Radford, L. Metz, and S. Chintala. Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks. Proc. ICLR, 2016.
 [42] E. H. Asl, Y. Zhou, C. Xiong, and R. Socher. Augmented Cyclic Adversarial Learning for Low Resource Domain Adaptation. Proc. ICLR, 2019.
 [43] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proc. of the IEEE, 86(11):2278-2324, 1998.
 [44] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-Adversarial Training of Neural Networks. Proc. JMLR, 2015.
 [45] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour Detection and Hierarchical Image Segmentation. Proc. PAMI, 2011.
 [46] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, and A. Y. Ng. Reading Digits in Natural Images with Unsupervised Feature Learning. Proc. NeurIPSW, 2011.
 [47] T. DeVries, and G. W. Taylor. Improved Regularization of Convolutional Neural Networks with Cutout. arXiv preprint arXiv:1708.04552, 2017.
 [48] Z. Zhong, L. Zheng, G. Kang, S. Li, and Y. Yang. Random Erasing Data Augmentation. arXiv preprint arXiv:1708.04896, 2017.
 [49] D. P. Kingma, and J. Ba. Adam: A Method for Stochastic Optimization. Proc. ICLR, 2015.
 [50] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. Dropout: a simple way to prevent neural networks from overfitting. The journal of machine learning research, 2014.
 [51] M. Sajjadi, M. Javanmardi, and T. Tasdizen. Regularization with stochastic transformations and perturbations for deep semi-supervised learning. Proc. NeurIPS, 2016.
 [52] S. Laine and T. Aila. Temporal Ensembling for Semi-Supervised Learning. Proc. ICLR, 2017.
 [53] W. Hu, T. Miyato, S. Tokui, E. Matsumoto, and M. Sugiyama. Learning Discrete Representations via Information Maximizing Self-Augmented Training. Proc. ICML, 2017.
 [54] T. Miyato, S. Maeda, M. Koyama, and S. Ishii. Virtual Adversarial Training: A Regularization Method for Supervised and Semi-Supervised Learning. Proc. TPAMI, 2018.