
Adversarial Self-Defense for Cycle-Consistent GANs

The goal of unsupervised image-to-image translation is to map images from one domain to another without ground truth correspondence between the two domains. State-of-the-art methods learn the correspondence from large numbers of unpaired examples from both domains and are based on generative adversarial networks. To preserve the semantics of the input image, the adversarial objective is usually combined with a cycle-consistency loss that penalizes incorrect reconstruction of the input image from the translated one. However, if the target mapping is many-to-one, e.g. aerial photos to maps, such a restriction forces the generator to hide information in low-amplitude structured noise that is undetectable by the human eye or by the discriminator. In this paper, we show how such self-attacking behavior of unsupervised translation methods affects their performance and propose two defense techniques. We perform a quantitative evaluation of the proposed techniques and show that making the translation model more robust to the self-adversarial attack increases its generation quality and reconstruction reliability and makes the model less sensitive to low-amplitude perturbations.





1 Introduction

Figure 1: Results of translation of GTA [26] frames to semantic segmentation maps using CycleGAN, UNIT, and CycleGAN with our two proposed defense methods, additive noise and guess loss. The last column shows the reconstruction of the input image when high-frequency noise (zero-mean Gaussian noise with a small standard deviation, measured in intensity levels) is added to the output map. Both of the proposed self-adversarial defense techniques (Section 4) make the CycleGAN model more robust to the random noise and make it rely more on the translation result than on the adversarial structured noise that the original CycleGAN and UNIT rely on. More translation examples can be found in Section 3 of the supplementary material. Best viewed in color.

Generative adversarial networks (GANs) [7] have enabled many recent breakthroughs in image generation, such as changing visual attributes like hair color or gender in an impressively realistic way, and even generating highly realistic faces of people who do not exist [13; 31; 14]. Conditional GANs designed for unsupervised image-to-image translation can map images from one domain to another without pairwise correspondence or ground truth labels, and are widely used for tasks such as semantic segmentation, colorization, style transfer, and quality enhancement of images [34; 10; 20; 3; 11; 35; 4] and videos [2; 1]. These models learn the cross-domain mapping by ensuring that the translated image both looks like a true representative of the target domain and preserves the semantics of the input image, e.g. the shape and position of objects and the overall layout. Semantic preservation is usually achieved by enforcing cycle-consistency [34], i.e. a small error between the source image and its reconstruction from the translated target image.

Despite the success of cycle-consistent GANs, they have a major flaw. The reconstruction loss forces the generator network to hide the information necessary to faithfully reconstruct the input image inside tiny perturbations of the translated image [5]. The problem is particularly acute in many-to-one mappings, such as photos to semantic labels, where the model must reconstruct textures and colors lost during translation to the target domain. For example, Figure 1’s top row shows that even when the car is mapped incorrectly to semantic labels of building (gray) and tree (green), CycleGAN is still able to “cheat” and perfectly reconstruct the original car from hidden information. It also reconstructs road textures lost in the semantic map. This behavior is essentially an adversarial attack that the model is performing on itself, so we call it a self-adversarial attack.

In this paper, we extend the analysis of self-adversarial attacks provided in [5] and show that the problem is present in recent state-of-the-art methods that incorporate cycle consistency. We provide two defense mechanisms against the attack that resemble the adversarial training technique widely used to increase the robustness of deep neural networks to adversarial attacks [8; 16; 32]. We also introduce quantitative evaluation metrics for translation quality and reconstruction “honesty” that help to detect self-adversarial attacks and provide a better understanding of the learned cross-domain mapping. We show that, due to the presence of hidden embeddings, state-of-the-art translation methods are highly sensitive to high-frequency perturbations, as illustrated in Figure 1. In contrast, our defense methods substantially decrease the amount of self-adversarial structured noise and thus make the mapping more reliant on the input image, which results in more interpretable translation and reconstruction and increased translation quality. Importantly, robustifying the model against the self-adversarial attack also makes it less susceptible to high-frequency perturbations, which makes it less likely to converge to a non-optimal solution.

2 Related Work

Unsupervised image-to-image translation is one of the domain adaptation tasks that have received a lot of attention in recent years. Current state-of-the-art methods [34; 19; 11; 15; 4; 10] solve this task using generative adversarial networks [9], which usually consist of a generator and a discriminator network trained in a min-max fashion to, respectively, generate realistic images from the target domain and correctly classify real and fake images.

The goal of image-to-image translation methods is to map the image from one domain to another in such a way that the output image both looks like a real representative of the target domain and contains the semantics of the input image. In the supervised setting, semantic consistency is enforced by ground truth labels or pairwise correspondence. When there is no supervision, however, there is no such ground truth guidance, so a regular GAN often produces realistic-looking but unreliable translations. To overcome this problem, current state-of-the-art unsupervised translation methods incorporate the cycle-consistency loss first introduced in [34], which forces the model to learn a mapping from which the input image can be reconstructed.

Recently, various methods have been developed for unimodal (CycleGAN [34], UNIT [19], CoGAN [21] etc.) and multimodal (MUNIT [11], StarGAN [4], BicycleGAN [35]) image-to-image translation. In this paper, we explore the problem of self-adversarial attacks in three of them: CycleGAN, UNIT and MUNIT. CycleGAN is a unimodal translation method that consists of two domain discriminators and two generator networks; the generators are trained to produce realistic images from the corresponding domains, while the discriminators aim to distinguish in-domain real images from the generated ones. The generator-discriminator pairs are trained in a min-max fashion both to produce realistic images and to satisfy the cycle-consistency property. The main idea behind UNIT is that both domains share some common semantics and thus can be encoded into a shared latent space. It consists of two encoder-decoder pairs that map images to the latent space and back; the cross-domain translation is then performed by encoding the image from the source domain into the latent space and decoding it with the decoder for the target domain. MUNIT is a multimodal extension of UNIT that disentangles domain-specific (style space) and domain-agnostic (content space) features. While the original MUNIT does not use an explicit cycle-consistency loss, we found that a cycle-consistency penalty significantly increases the quality of translation and helps the model learn more reliable content disentanglement (see Figure 2). Thus, we used MUNIT with a cycle-consistency loss in our experiments.

As illustrated in Figure 2, adding a cycle-consistency loss indeed helps to disentangle domain-agnostic information and enhances translation quality and reliability. However, such a pixelwise penalty was shown [5] to force the generator to hide the domain-specific information that cannot be explicitly reconstructed from the translated image (e.g. shadows or the color of buildings in the maps-to-photos example) in such a way that it cannot be detected by the discriminator.

It is well known that deep neural networks [17], while providing higher accuracy in the majority of machine learning problems, are highly susceptible to adversarial attacks [24; 29; 16; 23]. There exist multiple defense techniques that make neural networks more robust to adversarial examples, such as adding adversarial examples to the training set, i.e. adversarial training [24; 22], distillation [25], ensemble adversarial training [30], denoising [18], and many more. Moreover, [33] showed that defending the discriminator in a GAN setting increases generation quality and prevents the model from converging to a non-optimal solution. However, most adversarial defense techniques are developed for the classification task and are very hard to adapt to the generative setting.

3 Self-Adversarial Attack in Cyclic Models

Suppose we are given a number of samples from two image domains, $x \sim p_A$ and $y \sim p_B$. The goal is to learn two mappings, $G_{AB}: A \to B$ and $G_{BA}: B \to A$. To learn the distributions $p_A$ and $p_B$, two discriminators $D_A$ and $D_B$ are trained to classify whether the input image is a true representative of the corresponding domain or was generated by $G_{BA}$ or $G_{AB}$, respectively. The cross-distribution mapping is learned using the cycle-consistency property, in the form of a loss based on the pixelwise distance between the input image and its reconstruction. The cycle-consistency loss is usually defined as follows:

$$\mathcal{L}_{cyc}(G_{AB}, G_{BA}) = \mathbb{E}_{x \sim p_A}\big[\|x - G_{BA}(G_{AB}(x))\|_1\big] + \mathbb{E}_{y \sim p_B}\big[\|y - G_{AB}(G_{BA}(y))\|_1\big] \tag{1}$$
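As a concrete reading of Eq. (1), the following minimal NumPy sketch estimates the loss on one batch from each domain. It assumes images are float arrays of shape (N, H, W, C) and that the generators are arbitrary callables; the names `cycle_consistency_loss`, `g_ab` and `g_ba` are hypothetical and not taken from any released code. A real implementation would compute the same quantity on framework tensors so that gradients reach both generators.

```python
import numpy as np

def l1(a, b):
    """Mean absolute (L1) pixelwise distance between two image batches."""
    return float(np.mean(np.abs(a - b)))

def cycle_consistency_loss(x_batch, y_batch, g_ab, g_ba):
    """Monte Carlo estimate of Eq. (1) over one batch from each domain.

    x_batch: images from domain A, shape (N, H, W, C)
    y_batch: images from domain B, shape (N, H, W, C)
    g_ab, g_ba: callables standing in for the generators G_AB and G_BA
    """
    loss_a = l1(x_batch, g_ba(g_ab(x_batch)))  # A -> B -> A
    loss_b = l1(y_batch, g_ab(g_ba(y_batch)))  # B -> A -> B
    return loss_a + loss_b
```

With generators that are exact inverses of each other (for instance, scaling by 0.5 and by 2.0) the loss is zero; a many-to-one $G_{BA}$ can only reach such a low value by smuggling the lost information through its output.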


However, when domain $B$ is richer than $A$, the mapping $G_{BA}$ is many-to-one (i.e. for one image $x \in A$ there are multiple correct correspondences $y \in B$), and the cycle-consistency loss still forces the generators to perfectly reconstruct the input $y$, even though some of the information in $y$ is lost after translation to domain $A$. As shown in [5], such behavior of a CycleGAN can be described as an adversarial attack; in fact, for any given image it is possible to generate structured noise that leads to the reconstruction of a chosen target image [5].

In practice, CycleGAN and other methods that use a cycle-consistency loss add a very low-amplitude signal to the translation that is invisible to the human eye. Adding such a signal is enough to reconstruct the information of the image $y$ that should not be present in $G_{BA}(y)$. This makes methods that incorporate a cycle-consistency loss sensitive to low-amplitude high-frequency noise, since that noise can destroy the hidden signal (shown in Figure 3). In addition, such behavior can force the model to converge to a non-optimal solution or even diverge, since by adding structured noise the model "cheats" to minimize the reconstruction loss instead of learning the correct mapping.
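The mechanism can be illustrated with a toy caricature that is not how CycleGAN actually encodes information. A hypothetical many-to-one "generator" `g_ba` outputs a coarse integer-valued map (standing in for a semantic label map) plus the residual it cannot represent, scaled down by an assumed factor `HIDE_SCALE`; a hypothetical `g_ab` reads that residual back out. Both functions and the constant are illustrative assumptions.

```python
import numpy as np

HIDE_SCALE = 1e-3  # hypothetical amplitude of the hidden signal

def g_ba(y):
    """Toy many-to-one map B -> A: coarse map plus a tiny hidden residual."""
    coarse = np.round(y)
    return coarse + HIDE_SCALE * (y - coarse)

def g_ab(a):
    """Toy decoder A -> B that amplifies the hidden residual back."""
    coarse = np.round(a)
    return coarse + (a - coarse) / HIDE_SCALE

rng = np.random.default_rng(0)
y = rng.uniform(0.0, 3.0, size=(64, 64))  # "rich" domain-B image
a = g_ba(y)

print(np.abs(a - np.round(y)).max() < HIDE_SCALE)  # True: hidden signal is tiny
print(np.abs(g_ab(a) - y).max() < 1e-9)  # True: near-perfect reconstruction
noisy = a + rng.normal(0.0, 0.01, size=a.shape)  # low-amplitude noise
print(np.abs(g_ab(noisy) - y).mean())  # large error: the embedding is destroyed
```

Noise with a standard deviation of 0.01 leaves the visible coarse map unchanged, but the decoder amplifies it by a factor of `1 / HIDE_SCALE`, so the reconstruction error becomes much larger than the noise itself.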

4 Defense techniques

4.1 Adversarial training with noise

One approach to defending the model from a self-adversarial attack is to train it to be resistant to perturbations similar in nature to the one produced by the hidden embedding. Unfortunately, it is impossible to separate the pure structured noise from the translated image, so classic adversarial defense training cannot be used in this scenario. However, it is possible to prevent the model from learning to embed by adding perturbations to the translated image before reconstruction. The intuition behind this approach is that adding random noise of amplitude similar to that of the hidden signal disturbs the embedded message. This results in a high reconstruction error, so the generator cannot rely on the embedding. The modified noisy cycle-consistency loss can be described as follows:

$$\mathcal{L}^{noisy}_{cyc}(G_{AB}, G_{BA}, \theta) = \mathbb{E}_{x \sim p_A}\big[\|x - G_{BA}(f_\theta(G_{AB}(x)))\|_1\big] + \mathbb{E}_{y \sim p_B}\big[\|y - G_{AB}(f_\theta(G_{BA}(y)))\|_1\big] \tag{2}$$

where $f_\theta$ is some high-frequency perturbation function with parameters $\theta$. In our experiments, we used low-amplitude zero-mean Gaussian noise, i.e. $f_\theta(z) = z + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2 I)$ and $\theta = \sigma$. Such a simplistic defense approach is very similar to the one proposed in [33], where the discriminator is defended from the generator attack by regularizing the discriminator objective using adversarial vectors. In our setting, however, the attack targets both the discriminator and the generator of the opposite domain, which makes it harder to find the exact adversarial vector. This is why we regularize both the discriminator and the generator using random noise. Since adding noise to the input is approximately equivalent to penalizing large magnitudes of the gradients of the loss function, this also forces the model to learn smoother boundaries and prevents it from overfitting.
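A minimal sketch of Eq. (2) with the Gaussian choice of $f_\theta$ follows, under the same assumptions as a plain reading of Eq. (1): NumPy float arrays, generators as arbitrary callables, and hypothetical function names (`gaussian_perturbation`, `noisy_cycle_loss`) that do not come from the authors' code.

```python
import numpy as np

def gaussian_perturbation(images, sigma, rng):
    """f_theta with theta = sigma: add zero-mean Gaussian noise of std sigma."""
    return images + rng.normal(0.0, sigma, size=images.shape)

def noisy_cycle_loss(x_batch, y_batch, g_ab, g_ba, sigma, rng):
    """Monte Carlo estimate of the noisy cycle-consistency loss, Eq. (2).

    Each translation is perturbed before it reaches the opposite generator,
    so a reconstruction that depends on a hidden low-amplitude signal is penalized.
    """
    x_rec = g_ba(gaussian_perturbation(g_ab(x_batch), sigma, rng))  # A -> B -> A
    y_rec = g_ab(gaussian_perturbation(g_ba(y_batch), sigma, rng))  # B -> A -> B
    return float(np.mean(np.abs(x_batch - x_rec)) + np.mean(np.abs(y_batch - y_rec)))
```

The noise is drawn afresh on every call, so the generators cannot learn to anticipate a fixed perturbation; `sigma` plays the role of $\theta$ and would be chosen on the order of the hidden signal's amplitude.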

4.2 Guess Discriminator

Ideally, the self-adversarial attack should be detected by the discriminator, but this might be too hard for it, since it never sees real and fake examples of the same content. In the supervised setting, this problem is naturally solved by conditioning the outputs on the ground truth labels. For example, a self-adversarial attack does not occur in Conditional GANs, because the discriminator is conditioned on the ground truth class labels and is provided with real and fake examples of each class. In the unsupervised setting, however, there is no such information about the class labels, and the discriminator only receives unpaired real and fake examples from the domain. This task is significantly harder for the discriminator, as it has to learn the distribution of the whole domain. One widely used defense strategy is adding adversarial examples to the training set. While it is possible to model the adversarial attack of the generator, doing so is very time- and memory-consuming, as it requires training an additional network that generates such examples at each step of GAN training. However, since the cycle-consistency loss forces the model to minimize the difference between the input and reconstructed images, we can use the reconstruction as the fake counterpart of the real input image, i.e. as an approximation of the adversarial example.

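Thus, the defense during training can be formulated in terms of an additional guess discriminator $D^{guess}$ that is very similar to the original GAN discriminator, but receives as input two images, the input and its reconstruction, in random order, and "guesses" which of the images is fake. As with the original discriminator, the guess discriminator is trained to minimize its error, while the generators aim to produce images that maximize it. For domain $A$ (and analogously for domain $B$), the guess discriminator loss, or guess loss, can be described as:

$$\mathcal{L}_{guess}(G_{AB}, G_{BA}, D^{guess}) = -\mathbb{E}_{x \sim p_A,\, u}\Big[u \log D^{guess}(x, \hat{x}) + (1 - u)\log\big(1 - D^{guess}(\hat{x}, x)\big)\Big] \tag{3}$$

where $\hat{x} = G_{BA}(G_{AB}(x))$ is the reconstruction, $u \sim \mathrm{Bernoulli}(1/2)$ determines the order of the pair, and $D^{guess}(a, b)$ is the predicted probability that $a$ is the real input. This loss resembles class label conditioning in the Conditional GAN in the sense that the guess discriminator receives real and fake examples that are presumably of the same content; therefore, the embedding detection task is significantly simplified.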
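The sketch below evaluates Eq. (3) for one batch. It assumes a hypothetical `d_guess(first, second)` callable that returns, per example, the probability that `first` is the real input; how the pair would be fed to an actual network (for instance, concatenated along channels) is left open. The function name `guess_loss` is illustrative.

```python
import numpy as np

def guess_loss(x_batch, g_ab, g_ba, d_guess, rng, eps=1e-7):
    """Monte Carlo estimate of the guess loss, Eq. (3), for domain A.

    d_guess(first, second) returns the probability that `first` is the real input.
    The guess discriminator minimizes this value; the generators maximize it.
    """
    x_rec = g_ba(g_ab(x_batch))  # reconstruction x_hat
    u = rng.integers(0, 2, size=x_batch.shape[0])  # random order, u ~ Bernoulli(1/2)
    mask = u.reshape((-1,) + (1,) * (x_batch.ndim - 1)).astype(bool)
    first = np.where(mask, x_batch, x_rec)   # u = 1: (x, x_hat); u = 0: (x_hat, x)
    second = np.where(mask, x_rec, x_batch)
    p_first_real = np.clip(d_guess(first, second), eps, 1.0 - eps)
    # Binary cross-entropy with target u (1 means the real image came first).
    log_lik = u * np.log(p_first_real) + (1 - u) * np.log(1.0 - p_first_real)
    return float(-np.mean(log_lik))
```

A guess discriminator that can only output 0.5 yields a loss of $\log 2$; a reconstruction that differs visibly from the input lets it drive the loss toward zero, which is what the generators are pushed to prevent.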

In addition to the defense approaches described above, it is beneficial to use the fact that the relationship between the domains is one-to-many. One naive way to add such prior knowledge is to assign a smaller weight to the reconstruction loss of the "richer" domain (e.g. photos in the maps-to-photos experiment). The results of our experiments show a substantial improvement in generation quality when such a domain relation prior is used.

5 Experiments and results

Given the abundance of GAN-based methods for unsupervised image translation, we limited our analysis to three popular state-of-the-art models that cover both the unimodal and multimodal translation cases: CycleGAN [34], UNIT [19] and MUNIT [11]. Details on the model architectures and the choice of hyperparameters used in our experiments can be found in the supplementary material.

Figure 2: Comparison of translation results produced by the original MUNIT method and MUNIT with an additional cycle-consistency loss. Columns 2 and 3 show the translation results with two different randomly generated style vectors. While both methods incorrectly disentangle style and content information, the cycle-consistency loss forces the model to preserve the overall scene layout and produce more reliable translations in general. Column 5 shows the reconstruction of the input image from the maps with the first random style (column 2). More examples of Google Maps translation can be found in the supplementary material. Best viewed in color.

5.1 Datasets

To provide empirical evidence for our claims, we performed a series of experiments on three publicly available image-to-image translation datasets. Although all three datasets are paired, and hence the ground truth correspondence is known, the models we used cannot exploit the ground truth alignment by design and were thus trained in an unsupervised manner.

Google Aerial Photo to Maps dataset, consisting of 3292 pairs of aerial photos and corresponding maps. In our experiments, we resized the images, using one resolution for MUNIT and UNIT and another for CycleGAN, and randomly cropped them during training. The dataset is available at [6]. We used 1098 images for training and 1096 images for testing.

Playing for Data (GTA) [26] dataset, consisting of 24966 pairs of image frames and their semantic segmentation maps. We used a subset of 10000 frames with day-time lighting (7500 images for training, 2500 images for testing), which were resized and then randomly cropped.

SynAction [28] synthetic human action dataset, consisting of a set of 20 possible actions performed by 10 different human renders. For our experiments, we used two actors and all existing actions to perform the translation from one actor to another; all other conditions, such as background, lighting and viewpoint, are the same for both domains. We used this dataset to test whether the self-adversarial attack is present in the one-to-one setting. The original images were resized and cropped. We split the data into 1561 images in each domain for training and 357 images for testing.

5.2 Metrics

Translation quality. The choice of aligned datasets was dictated by the need to quantitatively evaluate translation quality, which is impossible when the ground truth correspondence is unknown. However, even having the ground truth pairs does not solve the issue of quality evaluation in the one-to-many case, since for one input image there exists a large (possibly infinite) number of correct translations, so a pixelwise comparison of the ground truth image and the output of the model does not provide a correct measure of translation quality.

Figure 3: Actor translation example with CycleGAN, CycleGAN with noise and CycleGAN with guess loss.

To overcome this issue, we adopted the idea behind the Inception Score [27] and trained the supervised Pix2pix [12] model to perform the many-to-one mapping as an intermediate step in the evaluation. Considering the GTA dataset example, to evaluate the unsupervised mapping from segmentation maps to real frames (later on, segmentation to real), we train the Pix2pix model to translate from real to segmentation; we then feed it the output of the unsupervised model to perform an "honest" reconstruction of the input segmentation map, and compute the Intersection over Union (IoU) and mean class-wise accuracy between the outputs of Pix2pix when given a ground truth example and when given the output of the one-to-many translation model. For any ground truth pair $(x_i, y_i)$, the one-to-many translation quality is computed as $\mathrm{IoU}\big(P_{BA}(G_{AB}(x_i)),\, P_{BA}(y_i)\big)$, where $P_{BA}$ is the translation with Pix2pix from $B$ to $A$. The "honest reconstruction" is compared with the Pix2pix translation of the ground truth image instead of the ground truth image itself in order to take into account the error produced by the Pix2pix translation.
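The comparison can be sketched with plain NumPy, assuming the Pix2pix output has already been converted to integer class-label maps (for example, by matching its colors to the segmentation palette). Here `p_ba` stands in for the trained Pix2pix model and `g_ab` for the unsupervised generator; both, like the function names, are hypothetical placeholders.

```python
import numpy as np

def mean_iou(pred, ref, num_classes):
    """Mean class-wise IoU between two integer label maps; classes absent from both are skipped."""
    ious = []
    for c in range(num_classes):
        p, r = pred == c, ref == c
        union = np.logical_or(p, r).sum()
        if union > 0:
            ious.append(np.logical_and(p, r).sum() / union)
    return float(np.mean(ious))

def mean_class_accuracy(pred, ref, num_classes):
    """Mean over classes present in `ref` of the fraction of their pixels labeled correctly."""
    accs = [np.mean(pred[ref == c] == c) for c in range(num_classes) if np.any(ref == c)]
    return float(np.mean(accs))

def translation_quality(x, y, g_ab, p_ba, num_classes):
    """Compare P_BA(G_AB(x)) with P_BA(y) for one ground truth pair (x, y)."""
    honest = p_ba(g_ab(x))  # Pix2pix segmentation of the translated frame
    reference = p_ba(y)     # Pix2pix segmentation of the real frame
    return (mean_iou(honest, reference, num_classes),
            mean_class_accuracy(honest, reference, num_classes))
```

Because both arguments pass through the same Pix2pix model, its own segmentation errors largely cancel out of the comparison, which is the motivation given above.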

Figure 4: Illustration of sensitivity (Eq. 5) of cycle-consistent translation methods to high-frequency perturbations in one-to-many (left) and in many-to-one (right) cases. Here the domains A and B are segmentation maps and GTA video frames respectively.
Method MSE SN
CycleGAN 32.547 6.5 2.2
CycleGAN+noise* 22.182 1.1 0.1
CycleGAN+guess* 23.565 2.4 0.2
Table 1: Results on SynAction dataset: mean square error of the translation and sensitivity to noise.

Reconstruction honesty. Since it is impossible to acquire the structured noise produced as a result of a self-adversarial attack, there is no direct way to either detect the attack or measure the amount of information hidden in the embedding.

To evaluate the presence of a self-adversarial attack, we developed a metric that we call quantized reconstruction honesty. The intuition behind this metric is that, ideally, the reconstruction error of the image of the richer domain should be the same as the one-to-many translation error given the same input image from the poorer domain. To measure whether the model is independent of the origin of the input image, we quantize the many-to-one translation results so that they only contain colors from the domain-specific palette. In our experiments, we approximate the quantized maps by replacing the color of each pixel with the closest one from the palette. We then feed those quantized images to the model to acquire the "honest" reconstruction error and compare it with the reconstruction error without quantization. The honesty metric for a one-to-many reconstruction can be described as follows:

$$\mathrm{RH}(Q, G_{AB}, G_{BA}) = \frac{1}{N}\sum_{i=1}^{N}\big\|G_{AB}\big(Q(G_{BA}(y_i))\big) - G_{AB}(x_i)\big\|_2 \tag{4}$$

where $Q$ is a quantization operation, $G_{BA}$ is a many-to-one mapping, and $(x_i, y_i)$ is a ground truth pair of examples from domains $A$ and $B$.
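The quantization step $Q$ and Eq. (4) can be sketched as follows, assuming RGB float images of shape (H, W, 3), a palette given as a (K, 3) array of segmentation colors, Euclidean nearest-color matching and the L2 norm over all pixels. The names `quantize_to_palette` and `reconstruction_honesty` and the generator callables are illustrative assumptions.

```python
import numpy as np

def quantize_to_palette(image, palette):
    """Q: replace every pixel color with the closest palette color (Euclidean in RGB)."""
    diffs = image[:, :, None, :] - palette[None, None, :, :]  # (H, W, K, 3)
    nearest = np.argmin(np.linalg.norm(diffs, axis=-1), axis=-1)  # (H, W) palette indices
    return palette[nearest]

def reconstruction_honesty(xs, ys, g_ab, g_ba, palette):
    """Sketch of Eq. (4): mean L2 distance between G_AB(Q(G_BA(y))) and G_AB(x)."""
    errors = [
        np.linalg.norm(g_ab(quantize_to_palette(g_ba(y), palette)) - g_ab(x))
        for x, y in zip(xs, ys)
    ]
    return float(np.mean(errors))
```

Quantization removes any low-amplitude signal hidden between palette colors, so a model that relies on such a signal reconstructs differently from the quantized map than from the ground truth map and receives a larger value.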

Figure 5: Quantized reconstruction results of the original CycleGAN, CycleGAN with the noise defense and CycleGAN with the guess loss defense. After translating the input GTA frame to a semantic segmentation map, we performed quantization such that the resulting translation would only contain the colors present in the real segmentation maps. We then fed the quantized translation results to reconstruct the input image (column 5). The last column shows the translation from the corresponding ground truth semantic segmentation map to a real frame for comparison. Comparison with the non-quantized reconstruction reveals the degree of embedding in the many-to-one mapping. For example, CycleGAN with the guess loss relies more on the input segmentation map than the original CycleGAN, although it still tends to embed information about road markings. More quantized translation examples can be found in the supplementary material. Best viewed in color.

Sensitivity to noise. Aside from the obvious consequences of the self-adversarial attack, such as convergence of the generator to a suboptimal solution, it has one more significant side effect: extreme sensitivity to perturbations. Figure 1 shows how the addition of low-amplitude Gaussian noise effectively destroys the hidden embedding, making a model that uses a cycle-consistency loss unable to correctly reconstruct the input image. To estimate the sensitivity of the model, we add zero-mean Gaussian noise to the translation result before reconstruction and compute the reconstruction error. For the cycle $B \to A \to B$, the sensitivity to noise of amplitude $\sigma$ for a set of $N$ images is computed as:

$$S(\sigma) = \frac{1}{N}\sum_{i=1}^{N} \mathrm{MSE}\big(y_i,\, G_{AB}(G_{BA}(y_i) + \varepsilon)\big), \qquad \varepsilon \sim \mathcal{N}(0, \sigma^2 I) \tag{5}$$

where MSE is the mean squared pixelwise error; the cycle $A \to B \to A$ is treated analogously. The overall sensitivity of a method is then computed as the area under the curve of $S(\sigma)$. In our experiments, the range of noise amplitudes was set separately for the Google Maps and GTA experiments and for the SynAction experiment. When there is no structured noise in the translation, the reconstruction error should be proportional to the amplitude of the added noise, which is what we observe for the one-to-many mapping using MUNIT and CycleGAN. Surprisingly, UNIT translation is highly sensitive to noise even in the one-to-many case. The many-to-one mapping result (Figure 4, right), in contrast, suggests that structured noise is present, since the reconstruction error increases rapidly and quickly saturates as the noise amplitude grows. The results of one-to-many and many-to-one noisy reconstruction show that both the noisy CycleGAN and the guess loss defense make the CycleGAN model more robust to high-frequency perturbations than the original CycleGAN.
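The sketch below computes $S(\sigma)$ on a grid of noise amplitudes and approximates the area under the curve with the trapezoidal rule. The grid, the callables `g_ab` and `g_ba`, the function names, and the choice of trapezoidal integration are all assumptions for illustration.

```python
import numpy as np

def noise_sensitivity_curve(ys, g_ab, g_ba, sigmas, rng):
    """S(sigma) from Eq. (5) for the cycle B -> A -> B, one value per sigma."""
    curve = []
    for sigma in sigmas:
        errors = []
        for y in ys:
            translated = g_ba(y)
            noise = rng.normal(0.0, sigma, size=translated.shape)  # zero-mean Gaussian
            reconstructed = g_ab(translated + noise)
            errors.append(np.mean((reconstructed - y) ** 2))  # pixelwise MSE
        curve.append(float(np.mean(errors)))
    return np.array(curve)

def area_under_curve(sigmas, values):
    """Trapezoidal approximation of the area under S(sigma), i.e. the overall SN."""
    sigmas = np.asarray(sigmas, dtype=float)
    values = np.asarray(values, dtype=float)
    return float(np.sum(0.5 * (values[1:] + values[:-1]) * np.diff(sigmas)))
```

For an identity cycle, $S(\sigma)$ simply tracks the variance of the added noise; a model that relies on a hidden embedding instead shows a curve that rises steeply already at small $\sigma$, which inflates the area under the curve.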

5.3 Results

The results of our experiments show that the problem of self-adversarial attacks is present in all three cycle-consistent methods we examined. Surprisingly, the results on the SynAction dataset show that the self-adversarial attack appears even if the learned mapping is one-to-one (Table 1). Both defense techniques proposed in Section 4 make CycleGAN more robust to random noise and increase its translation quality (see Tables 1, 2 and 3). The noise-regularization defense helps the CycleGAN model become more robust both to small perturbations and to the self-adversarial attack. The guess loss approach, on the other hand, while allowing the model to hide a small portion of information about the input image (for example, road markings in the GTA experiment), produces more interpretable and reliable reconstructions. Since both defense techniques force the generators to rely more on the input image than on the structured noise, their results are more interpretable and provide a deeper understanding of the methods' "reasoning". For example, since the training set did not contain any examples of a truck colored white and green, at test time the guess-loss CycleGAN approximated the green part of the truck with the "vegetation" class color and the white part with the building class color (see Section 3 of the supplementary material); the reconstructed frame looked like a rough approximation of the truck despite the fact that the semantic segmentation map was wrong. This can hint at the limitations of the given training set.

Method acc. segm IoU segm IoU p2p RH SN
CycleGAN 0.226 0.157 0.203 27.434 6.138 446.924
CycleGAN + noise* 0.240 0.167 0.230 9.166 7.366 94.150
CycleGAN + guess* 0.237 0.169 0.208 11.380 7.026 212.589
UNIT 0.075 0.044 0.063 6.373 11.685 361.521
MUNIT + cycle 0.126 0.084 0.173 2.498 8.859 244.950
pix2pix (supervised) 0.404 0.337
Table 2: Results on the GTA V dataset. acc. segm and IoU segm represent the mean class-wise segmentation accuracy and IoU; IoU p2p is the mean IoU of the pix2pix segmentation of the segmentation-to-frame mapping; RH (Eq. 4) and SN (Eq. 5) are the quantized reconstruction honesty and sensitivity to noise of the many-to-one mapping (B2A2B), respectively. * – our proposed defense methods. The reconstruction error distribution plots can be found in the supplementary material (Section 2).
Method acc. segm IoU segm IoU p2p RH SN
CycleGAN 0.233 0.175 0.210 21.775 5.164 251.192
CycleGAN + noise* 0.242 0.187 0.218 12.266 4.415 222.176
CycleGAN + guess* 0.241 0.184 0.224 7.467 2.381 235.432
UNIT 0.212 0.153 0.124 19.631 6.070 528.223
MUNIT + cycle 0.153 0.094 0.124 21.425 7.855 687.276
pix2pix (supervised) 0.301 0.234
Table 3: Results on the Google Maps dataset. The notation is the same as in Table 2.

6 Conclusion

In this paper, we introduced the self-adversarial attack phenomenon of unsupervised image-to-image translation methods: the hidden embedding performed by the model itself in order to reconstruct the input image with high precision. We empirically showed that the self-adversarial attack appears in models when the cycle-consistency property is enforced and the target mapping is many-to-one. We provided evaluation metrics that help to indicate the presence of a self-adversarial attack, and a translation quality metric for one-to-many mappings. We also developed two adversarial defense techniques that significantly reduce the hidden embedding and force the model to produce more "honest" results, which, in turn, increases its translation quality.


  • [1] A. Bansal, S. Ma, D. Ramanan, and Y. Sheikh (2018) Recycle-gan: unsupervised video retargeting. In

    Proceedings of the European Conference on Computer Vision (ECCV)

    pp. 119–135. Cited by: §1.
  • [2] D. Bashkirova, B. Usman, and K. Saenko (2018) Unsupervised video-to-video translation. arXiv preprint arXiv:1806.03698. Cited by: §1.
  • [3] Q. Chen and V. Koltun (2017) Photographic image synthesis with cascaded refinement networks. In The IEEE International Conference on Computer Vision (ICCV), Vol. 1. Cited by: §1.
  • [4] Y. Choi, M. Choi, M. Kim, J. Ha, S. Kim, and J. Choo (2018) Stargan: unified generative adversarial networks for multi-domain image-to-image translation. In

    Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition

    pp. 8789–8797. Cited by: §1, §2, §2.
  • [5] C. Chu, A. Zhmoginov, and M. Sandler (2017) Cyclegan, a master of steganography. arXiv preprint arXiv:1712.02950. Cited by: §1, §1, §2, §3.
  • [6] A. Efros (2017) Google aerial photos and maps dataset. External Links: Link Cited by: §5.1.
  • [7] I. Goodfellow, Y. Bengio, A. Courville, and Y. Bengio (2016) Deep learning. Vol. 1, MIT press Cambridge. Cited by: §1.
  • [8] I. J. Goodfellow, J. Shlens, and C. Szegedy (2014) Explaining and harnessing adversarial examples. arXiv preprint arXiv:1412.6572. Cited by: §1.
  • [9] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §2.
  • [10] J. Hoffman, E. Tzeng, T. Park, J. Zhu, P. Isola, K. Saenko, A. A. Efros, and T. Darrell (2017) CyCADA: cycle-consistent adversarial domain adaptation. CoRR abs/1711.03213. External Links: Link, 1711.03213 Cited by: §1, §2.
  • [11] X. Huang, M. Liu, S. Belongie, and J. Kautz (2018) Multimodal unsupervised image-to-image translation. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 172–189. Cited by: §1, §2, §2, §5.
  • [12] P. Isola, J. Zhu, T. Zhou, and A. A. Efros (2017)

    Image-to-image translation with conditional adversarial networks

    In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 1125–1134. Cited by: §5.2.
  • [13] T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of gans for improved quality, stability, and variation. arXiv preprint arXiv:1710.10196. Cited by: §1.
  • [14] T. Karras, S. Laine, and T. Aila (2018) A style-based generator architecture for generative adversarial networks. arXiv preprint arXiv:1812.04948. Cited by: §1.
  • [15] T. Kim, M. Cha, H. Kim, J. K. Lee, and J. Kim (2017) Learning to discover cross-domain relations with generative adversarial networks. In Proceedings of the 34th International Conference on Machine Learning-Volume 70, pp. 1857–1865. Cited by: §2.
  • [16] A. Kurakin, I. Goodfellow, and S. Bengio (2016) Adversarial examples in the physical world. arXiv preprint arXiv:1607.02533. Cited by: §1, §2.
  • [17] Y. LeCun, Y. Bengio, and G. Hinton (2015) Deep learning. nature 521 (7553), pp. 436. Cited by: §2.
  • [18] F. Liao, M. Liang, Y. Dong, T. Pang, X. Hu, and J. Zhu (2018) Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1778–1787. Cited by: §2.
  • [19] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. In Advances in Neural Information Processing Systems, pp. 700–708. Cited by: §2, §2, §5.
  • [20] M. Liu, T. Breuel, and J. Kautz (2017) Unsupervised image-to-image translation networks. CoRR abs/1703.00848. Cited by: §1.
  • [21] M. Liu and O. Tuzel (2016) Coupled generative adversarial networks. In Advances in neural information processing systems, pp. 469–477. Cited by: §2.
  • [22] A. Madry, A. Makelov, L. Schmidt, D. Tsipras, and A. Vladu (2017) Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083. Cited by: §2.
  • [23] S. Moosavi-Dezfooli, A. Fawzi, and P. Frossard (2016) Deepfool: a simple and accurate method to fool deep neural networks. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp. 2574–2582. Cited by: §2.
  • [24] N. Papernot, P. McDaniel, S. Jha, M. Fredrikson, Z. B. Celik, and A. Swami (2016) The limitations of deep learning in adversarial settings. In 2016 IEEE European Symposium on Security and Privacy (EuroS&P), pp. 372–387. Cited by: §2.
  • [25] N. Papernot, P. McDaniel, X. Wu, S. Jha, and A. Swami (2016) Distillation as a defense to adversarial perturbations against deep neural networks. In 2016 IEEE Symposium on Security and Privacy (SP), pp. 582–597. Cited by: §2.
  • [26] S. R. Richter, V. Vineet, S. Roth, and V. Koltun (2016) Playing for data: ground truth from computer games. In European Conference on Computer Vision, pp. 102–118. Cited by: Figure 1, §5.1.
  • [27] T. Salimans, I. Goodfellow, W. Zaremba, V. Cheung, A. Radford, and X. Chen (2016) Improved techniques for training gans. In Advances in neural information processing systems, pp. 2234–2242. Cited by: §5.2.
  • [28] X. Sun, H. Xu, and K. Saenko (2018) A two-stream variational adversarial network for video generation. arXiv preprint arXiv:1812.01037. Cited by: §5.1.
  • [29] C. Szegedy, W. Zaremba, I. Sutskever, J. Bruna, D. Erhan, I. Goodfellow, and R. Fergus (2013) Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199. Cited by: §2.
  • [30] F. Tramèr, A. Kurakin, N. Papernot, I. Goodfellow, D. Boneh, and P. McDaniel (2017) Ensemble adversarial training: attacks and defenses. arXiv preprint arXiv:1705.07204. Cited by: §2.
  • [31] T. Wang, M. Liu, J. Zhu, A. Tao, J. Kautz, and B. Catanzaro (2018) High-resolution image synthesis and semantic manipulation with conditional gans. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8798–8807. Cited by: §1.
  • [32] Z. Yan, Y. Guo, and C. Zhang (2018) Deep defense: training dnns with improved adversarial robustness. In Advances in Neural Information Processing Systems, pp. 419–428. Cited by: §1.
  • [33] B. Zhou and P. Krähenbühl (2018) Don’t let your discriminator be fooled. Cited by: §2, §4.1.
  • [34] J. Zhu, T. Park, P. Isola, and A. A. Efros (2017) Unpaired image-to-image translation using cycle-consistent adversarial networks. arXiv preprint arXiv:1703.10593. Cited by: §1, §2, §2, §2, §5.
  • [35] J. Zhu, R. Zhang, D. Pathak, T. Darrell, A. A. Efros, O. Wang, and E. Shechtman (2017) Toward multimodal image-to-image translation. In Advances in Neural Information Processing Systems, pp. 465–476. Cited by: §1, §2.

7 Model description and parameters

In our experiments, we used the implementation of CycleGAN provided at . For all CycleGAN models we used (original, noisy, and guess-loss based), we set all CycleGAN parameters to the defaults provided in the implementation, except for the weights of the cycle-consistency loss.

The CycleGAN parameters used in our experiments are:

  • Generator architecture – ResNet with 9 residual blocks

  • Discriminator architecture – 3-layer PatchGAN with patch size .

  • Weight initialization – Gaussian

  • Instance normalization

  • GAN objective – LSGAN

  • Optimizer – Adam with momentum 0.5

  • Learning rate – 0.0002 with a linear decay policy

  • Trained for 200 epochs.
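As an illustration of two of the settings above, the following plain-Python sketch writes out the "linear" learning-rate policy and the LSGAN objective. It assumes the 200 epochs are split into 100 epochs at the constant rate of 0.0002, followed by 100 epochs of linear decay toward zero. This split is an assumption, since only the policy name and the total number of epochs are given. The function names are hypothetical, and discriminator outputs are represented as plain lists of floats rather than tensors.

```python
def linear_lr(epoch, base_lr=0.0002, n_constant=100, n_decay=100):
    """Learning rate for a 0-indexed epoch under an assumed linear policy:
    constant for n_constant epochs, then decayed linearly toward zero."""
    if epoch < n_constant:
        return base_lr
    remaining = max(0.0, 1.0 - (epoch - n_constant) / n_decay)
    return base_lr * remaining


def _mean(values):
    return sum(values) / len(values)


def lsgan_discriminator_loss(d_real, d_fake):
    """LSGAN discriminator loss: squared distance of outputs on real images
    from 1 and of outputs on generated images from 0, averaged over both terms."""
    real_term = _mean([(d - 1.0) ** 2 for d in d_real])
    fake_term = _mean([d ** 2 for d in d_fake])
    return 0.5 * (real_term + fake_term)


def lsgan_generator_loss(d_fake):
    """LSGAN generator loss: squared distance of outputs on generated images from 1."""
    return _mean([(d - 1.0) ** 2 for d in d_fake])
```

The least-squares form penalizes generated samples in proportion to how far the discriminator output is from the target label. This is the main difference from the log-loss of the original GAN objective.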

The parameters specific to the proposed defense techniques are:

  • For training with additive noise: the standard deviation of the noise, which should lie in the interval . The higher this value, the harder it is for the model to perform the self-adversarial attack. We chose the minimal value for which the reconstruction lacks the high-frequency details that should be lost during translation, such as road texture or color.

  • For the guess loss – the weight of the guess loss. We chose this weight and the two cycle-consistency loss weights so that the corresponding loss values are of similar magnitude during training. In other words, all weighted terms lie within one range and none of them dominates the overall loss.
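The two defense-specific settings can be illustrated with a minimal sketch that rests on several assumptions. The noise is assumed to be added to the translated image before it is passed to the reverse generator. The cycle-consistency term is assumed to be an L1 distance, and images are flat lists of floats. `hide` and `reveal` are hypothetical toy generators that encode the input in a low-amplitude signal, which is the self-adversarial behavior the noise is meant to break. The helper `loss_shares` is also hypothetical: it reports how much each weighted term contributes to the total, which is one way to check that no term dominates.

```python
import random


def l1(a, b):
    """Mean absolute difference between two equally sized flat images."""
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def noisy_cycle_loss(x, g_ab, g_ba, sigma, rng):
    """Cycle-consistency loss with Gaussian noise (std sigma) added to the
    translation G_AB(x) before it is reconstructed with G_BA."""
    translated = g_ab(x)
    noisy = [v + rng.gauss(0.0, sigma) for v in translated]
    return l1(x, g_ba(noisy))


def loss_shares(weighted_losses):
    """Fraction of the overall objective contributed by each weighted term."""
    total = sum(weighted_losses.values())
    return {name: value / total for name, value in weighted_losses.items()}


# Toy generators: the "translation" is an almost constant gray image that still
# carries the input in a 1e-3 amplitude signal, which G_BA amplifies back.
def hide(img):
    return [0.5 + 1e-3 * (v - 0.5) for v in img]


def reveal(img):
    return [0.5 + 1e3 * (v - 0.5) for v in img]


rng = random.Random(0)
x = [0.2, 0.5, 0.9]
loss_clean = noisy_cycle_loss(x, hide, reveal, sigma=0.0, rng=rng)   # near zero
loss_noisy = noisy_cycle_loss(x, hide, reveal, sigma=0.05, rng=rng)  # noise amplified 1000x
```

Without noise, the hidden signal gives an almost perfect reconstruction. With even small noise, the decoder amplifies the perturbation, so minimizing the noisy cycle loss discourages this kind of encoding.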

For the GTA dataset, the defense-specific parameters are:

  • CycleGAN: . We also performed experiments on CycleGAN with smaller cycle-consistency weights, chosen proportionally to the cross-domain relation as in the guess loss approach (e.g. and ), which resulted in unreliable translation.

  • CycleGAN + noise: , , .

  • CycleGAN + guess loss: .

For the SynAction dataset, the defense-specific parameters are:

  • CycleGAN: .

  • CycleGAN + noise: , , .

  • CycleGAN + guess loss: .

For the Google Maps dataset, we used the following parameters:

  • CycleGAN: .

  • CycleGAN + noise: , , .

  • CycleGAN + guess loss: .

We based our experiments with the UNIT and MUNIT models on their original implementations.

UNIT architecture and parameters are:

  • Optimizer – Adam with momentum 0.5 and second momentum 0.999

  • Initialization – Kaiming

  • Learning rate – 0.0001 with step decay policy (decay weight 0.5, step size 10000 iterations)

  • Weight of the image reconstruction loss – 10

  • Weight of the cycle-consistency loss – 10

  • Weight of the KL loss for cycle consistency – 0.01

  • Discriminator – 4-layer multiscale LSGAN with leaky ReLU activation function and 3 scales.

  • Generator – VAE with ReLU activations, with 64 filters in the first layer, 2 downsampling layers and 4 residual blocks for the content encoder and decoder.

  • Padding – reflect.
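The UNIT learning-rate schedule and loss weighting above can be sketched as follows. `step_lr` halves the rate every 10000 iterations, matching the stated step decay. `weighted_objective` combines per-term losses with the weights listed above. The term names and the adversarial weight of 1 are assumptions made for illustration, since the adversarial weight is not listed. The loss values in any call are placeholders, not measured numbers. The MUNIT list below states the same schedule.

```python
def step_lr(iteration, base_lr=1e-4, decay=0.5, step_size=10000):
    """Step decay: multiply the learning rate by `decay` every `step_size` iterations."""
    return base_lr * decay ** (iteration // step_size)


# Weights from the UNIT list above; the "gan" weight of 1.0 is an assumption.
UNIT_WEIGHTS = {
    "gan": 1.0,
    "recon_image": 10.0,
    "cycle": 10.0,
    "kl_cycle": 0.01,
}


def weighted_objective(losses, weights):
    """Weighted sum of the named loss terms; every term must have a weight."""
    return sum(weights[name] * value for name, value in losses.items())
```

For example, with placeholder values of 0.5, 0.1, 0.2, and 1.0 for the four terms, the weighted total is 0.5 + 1.0 + 2.0 + 0.01 = 3.51.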

MUNIT parameters are:

  • Optimizer – Adam with momentum 0.5 and second momentum 0.999

  • Initialization – Kaiming

  • Learning rate – 0.0001 with step decay policy (decay weight 0.5, step size 10000 iterations)

  • Weight of the image reconstruction loss – 10

  • Weight of the explicit cycle-consistency loss – 1

  • Weight of the KL loss for cycle consistency – 0.01

  • Discriminator – 4-layer multiscale LSGAN with leaky ReLU activation function and 3 scales.

  • Generator – VAE with ReLU activations, with 64 filters in the first layer, 256 filters in the MLP, 2 downsampling layers and 4 residual blocks for the content encoder and decoder.

  • Padding – reflect.

  • Length of style code – 8
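In MUNIT, a translation combines the content code of the input with a style code. At test time, style codes are drawn from a standard normal prior, so sampling several of them yields several different translations of the same input. The sketch below illustrates only the sampling step, using the style-code length of 8 listed above. The function name is hypothetical, and the encoders and decoder are omitted.

```python
import random

STYLE_DIM = 8  # "Length of style code" in the MUNIT list above


def sample_style_codes(n, dim=STYLE_DIM, seed=None):
    """Draw n style codes, each a vector of `dim` samples from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


# Three codes for one input would give three different translations once each
# is passed, together with the shared content code, to the MUNIT decoder.
codes = sample_style_codes(3, seed=0)
```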

The code for the guess-loss CycleGAN and the noisy CycleGAN can be found in the files "" and "", respectively. To train or test these models, add the files to the "models" folder of the original CycleGAN project and set the model parameter to "cycle_gan_guess" or "cycle_gan_noisy" instead of "cycle_gan".

8 Statistics

Figure 6: GTA. Left: Difference in the error distribution of the non-quantized vs quantized reconstructions, right: Reconstruction Honesty distributions.
Figure 7: Google Maps. Left: Difference in the error distribution of the non-quantized vs quantized reconstructions, right: Reconstruction Honesty distributions.
Figure 8: Sensitivity to noise on the Google Maps dataset. Left: translation from map to photo to map, right: translation from photo to map to photo.
Figure 9: Sensitivity to noise on the SynAction dataset. Left: translation from actor A to actor B, right: translation from actor B to actor A.
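The quantities plotted in Figures 6 to 9 can be approximated with the following sketch, which relies on assumptions not stated in this appendix. Images are flat lists of values in [0, 1]. "Quantized" is taken to mean rounding the translation to 8-bit levels before reconstruction. Noise sensitivity is taken to be the mean absolute reconstruction error after adding Gaussian noise of a given standard deviation to the translation. Reconstruction Honesty is not covered, because its definition is not given here. The toy generators `hide` and `reveal` are hypothetical stand-ins for a translator that hides its input in a low-amplitude signal.

```python
import random


def mean_abs_error(a, b):
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def quantize_8bit(img):
    """Clip to [0, 1] and round to the 256 levels of an 8-bit image."""
    return [round(min(max(v, 0.0), 1.0) * 255) / 255 for v in img]


def reconstruction_errors(x, g_ab, g_ba):
    """Reconstruction error from the raw and from the 8-bit-quantized translation."""
    translated = g_ab(x)
    raw = mean_abs_error(x, g_ba(translated))
    quantized = mean_abs_error(x, g_ba(quantize_8bit(translated)))
    return raw, quantized


def noise_sensitivity(x, g_ab, g_ba, sigmas, seed=0):
    """Reconstruction error for each std of Gaussian noise added to the translation."""
    rng = random.Random(seed)
    errors = []
    for sigma in sigmas:
        noisy = [v + rng.gauss(0.0, sigma) for v in g_ab(x)]
        errors.append(mean_abs_error(x, g_ba(noisy)))
    return errors


# Hypothetical toy generators that hide the input in a low-amplitude signal.
def hide(img):
    return [0.5 + 1e-3 * (v - 0.5) for v in img]


def reveal(img):
    return [0.5 + 1e3 * (v - 0.5) for v in img]


x = [0.2, 0.5, 0.9]
raw_err, quantized_err = reconstruction_errors(x, hide, reveal)
curve = noise_sensitivity(x, hide, reveal, sigmas=[0.0, 0.001, 0.01])
```

For a translator that relies on hidden low-amplitude information, quantization alone destroys most of the hidden signal, and the error grows quickly with the noise level. A large gap between the raw and quantized errors is therefore a symptom of the self-adversarial attack.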

9 Translation Results Figures

Figure 10: Results of translation of GTA frames to semantic segmentation maps.
Figure 11: Example of translation and reconstruction with CycleGAN + guess loss.
Figure 12: Noisy reconstruction.
Figure 13: Truck translation and reconstruction example with CycleGAN + guess loss.
Figure 14: Noisy reconstruction.
Figure 15: Quantized reconstruction of CycleGAN, UNIT and MUNIT.
Figure 16: Quantized reconstruction of CycleGAN, CycleGAN + noise and CycleGAN + guess loss.
Figure 17: Translation result with the proposed defense techniques.
Figure 18: Noisy reconstruction result.
Figure 19: Noisy reconstruction.
Figure 20: Results of translation of SynAction actors.