Cartoon-to-real: An Approach to Translate Cartoon to Realistic Images using GAN

by   K M Arefeen Sultan, et al.
University of Calgary

We propose a method to translate cartoon images to real-world images using a Generative Adversarial Network (GAN). Existing GAN-based image-to-image translation methods that are trained on paired datasets are impractical because such data is difficult to collect. Therefore, in this paper we exploit the Cycle-Consistent Adversarial Networks (CycleGAN) method for image translation, which requires only an unpaired dataset. By applying CycleGAN we show that our model is able to generate meaningful real-world images from cartoon images. We also implement another state-of-the-art technique, Deep Analogy, to compare against the performance of our approach.




1 Introduction

What if we could see real images of one of the most famous cartoon movies, Spirited Away (2001)? How would it feel to see a real-life version of the protagonist, Chihiro? Isn't this what most cartoon lovers have dreamt of while watching cartoon movies?

In this paper, we present a method, Cartoon-to-Real, to realize this desire by performing cartoon to real-world image translation. Because creating a sufficiently large paired dataset is extremely time-consuming and tedious, we develop an unpaired one. We extracted cartoon images from different cartoon movies and real images from the internet (i.e. Flickr), so that a cartoon image and a realistic image drawn together have no correspondence with each other. Using Cartoon-to-Real, we achieve significant results in translating cartoon images to realistic ones.

2 Background and Related Works

Recently, Generative Adversarial Networks (GANs) [1] have achieved astounding results in image synthesis tasks such as text-to-image translation [2], image inpainting, super-resolution [4], etc. Moreover, GANs are widely used in image-to-image translation; for example, CycleGAN [5] uses unpaired training data. It trains two GANs to map class $R \rightarrow C$ and class $C \rightarrow R$, respectively. Recently, CartoonGAN [6] was proposed to translate real-world images to cartoon images; it converges faster than CycleGAN [5] and performs satisfactorily (see Figure 1).

(a) Input image
(b) Output image
Fig. 1: Results of the CartoonGAN [6] approach. Here, a real-world image (a) is translated into a cartoon image (b).

3 Proposed Methodology

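The main goal of our Cartoon-to-Real is to perform the reverse of CartoonGAN (Cartoon $\rightarrow$ Real), and we exploit the CycleGAN [5] technique for this purpose. The model contains two mapping functions $G: C \rightarrow R$ and $F: R \rightarrow C$, where $C$ denotes the cartoon domain and $R$ denotes the real domain. There are discriminators ($D_C$, $D_R$) and generators ($G$, $F$) for the translation process. While performing $G: C \rightarrow R$, $D_R$ tries to enforce the translation to domain $R$, and vice versa for $F$ and $D_C$. For regularization, we implement the two cycle consistency losses of [5], whose authors proposed that the learned mapping functions should be cycle-consistent, so that the generators cannot satisfy the adversarial loss by mapping inputs to arbitrary images in the target distribution. The loss is written as

$$\mathcal{L}_{cyc}(G, F) = \mathbb{E}_{c \sim p_{data}(c)}\left[\lVert F(G(c)) - c \rVert_1\right] + \mathbb{E}_{r \sim p_{data}(r)}\left[\lVert G(F(r)) - r \rVert_1\right]$$

Hence, the full objective is

$$\mathcal{L}(G, F, D_C, D_R) = \mathcal{L}_{GAN}(G, D_R, C, R) + \mathcal{L}_{GAN}(F, D_C, R, C) + \lambda \, \mathcal{L}_{cyc}(G, F)$$

where $\lambda$ is the weight or relative importance of the two objectives. Therefore, our aim can be described as

$$G^*, F^* = \arg\min_{G, F} \max_{D_C, D_R} \mathcal{L}(G, F, D_C, D_R)$$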
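The cycle consistency term described above can be illustrated with a minimal, framework-free Python sketch. It is not the authors' PyTorch implementation. The names `G` (cartoon to real), `F` (real to cartoon), `l1_distance`, `cycle_consistency_loss` and `full_objective` are hypothetical, images are plain lists of floats, the L1 term is averaged per pixel (a constant factor away from the norm), and the default weight `lam=10.0` is the value used in the CycleGAN paper [5], assumed here because this text does not state it.

```python
from typing import Callable, List, Sequence

Image = List[float]  # a flattened image; real code would use tensors


def l1_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Mean absolute difference between two equally sized images."""
    if len(a) != len(b):
        raise ValueError("images must have the same number of pixels")
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def cycle_consistency_loss(
    G: Callable[[Image], Image],  # hypothetical generator: cartoon -> real
    F: Callable[[Image], Image],  # hypothetical generator: real -> cartoon
    cartoons: Sequence[Image],
    reals: Sequence[Image],
) -> float:
    """Forward cycle c -> G(c) -> F(G(c)) plus backward cycle r -> F(r) -> G(F(r))."""
    forward = sum(l1_distance(F(G(c)), c) for c in cartoons) / len(cartoons)
    backward = sum(l1_distance(G(F(r)), r) for r in reals) / len(reals)
    return forward + backward


def full_objective(adv_G: float, adv_F: float, cyc: float, lam: float = 10.0) -> float:
    """Combine the two adversarial terms with the weighted cycle term."""
    return adv_G + adv_F + lam * cyc


# Toy "generators": shifting brightness up and down are exact inverses.
brighten = lambda img: [p + 0.5 for p in img]
darken = lambda img: [p - 0.5 for p in img]
identity = lambda img: list(img)

cartoons = [[0.0, 0.25], [0.5, 0.75]]
reals = [[1.0, 0.5]]

# Inverse mappings reconstruct every input, so the cycle loss vanishes.
perfect = cycle_consistency_loss(brighten, darken, cartoons, reals)
# A real -> cartoon mapping that ignores what G did leaves a reconstruction error.
broken = cycle_consistency_loss(brighten, identity, cartoons, reals)
print(perfect, broken)
```

In the toy example the loss is zero only when the two mappings undo each other, which is the property the regularizer rewards. In PyTorch the per-pixel term would typically be computed with `torch.nn.L1Loss` on image tensors.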

Fig. 2: Comparison between cartoon-to-real translation using Deep Analogy (DA) and using the cycle consistency loss (CCL). The left column shows the input, the middle column shows the output of Deep Analogy (with the style image in the top corner), and the right column shows the results of using the cycle consistency loss. Cartoon-to-real translation with the cycle consistency loss [5] visibly yields the better translation.

4 Experiment Results

We developed two unpaired datasets to train our network. For the cartoon domain, we collected almost K images scraped from various movies, e.g. Pokemon, My Neighbour Totoro and Kiki’s Delivery Service. We used the Flickr dataset for the real image domain. Images are resized to a fixed resolution. For implementation we used PyTorch, and for hardware we used an Nvidia GTX GPU. We compared our results with the outputs of another state-of-the-art method, Deep Analogy [7]. Our results, along with the outputs from Deep Analogy, are presented in Figure 2. They show that outputs produced with the cycle consistency loss are more realistic than those produced with Deep Analogy.
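Because the two datasets are unpaired, a training step can draw one cartoon image and one real image independently of each other. The following is a minimal sketch of that sampling, assuming the images are available as two lists of file paths; the function name `unpaired_pairs` and the file names are hypothetical, and image loading and resizing are left out.

```python
import random
from typing import Iterator, Sequence, Tuple


def unpaired_pairs(
    cartoon_paths: Sequence[str],
    real_paths: Sequence[str],
    steps: int,
    seed: int = 0,
) -> Iterator[Tuple[str, str]]:
    """Yield (cartoon, real) paths drawn independently from each domain."""
    if not cartoon_paths or not real_paths:
        raise ValueError("both domains need at least one image")
    rng = random.Random(seed)  # seeded so that runs are reproducible
    for _ in range(steps):
        # No correspondence is assumed between the two draws.
        yield rng.choice(cartoon_paths), rng.choice(real_paths)


# Hypothetical file names, for illustration only.
cartoon_files = ["cartoon_0001.png", "cartoon_0002.png", "cartoon_0003.png"]
real_files = ["real_0001.jpg", "real_0002.jpg"]

pairs = list(unpaired_pairs(cartoon_files, real_files, steps=5, seed=1))
for cartoon, real in pairs:
    print(cartoon, real)
```

The two domains may have different sizes, since no image in one set needs a counterpart in the other.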

5 Conclusion

In this paper, we performed image translation from cartoons to real-world images. We used the cycle consistency loss so that a generated image is not mapped directly onto an arbitrary part of the target domain's distribution. Our research is still in progress: we observed that our results are not completely satisfactory, and our next goal is to reduce these limitations. In future work, we want to investigate preserving the content of the input cartoon image for better translation.