Guiding the One-to-one Mapping in CycleGAN via Optimal Transport

11/15/2018 ∙ by Guansong Lu, et al. ∙ Shanghai Jiao Tong University

CycleGAN is capable of learning a one-to-one mapping between two data distributions without paired examples, achieving the task of unsupervised data translation. However, there is no theoretical guarantee on the properties of the learned one-to-one mapping in CycleGAN. In this paper, we experimentally find that, under some circumstances, the one-to-one mapping learned by CycleGAN is just a random one within the large feasible solution space. Based on this observation, we explore adding extra constraints such that the one-to-one mapping is controllable and satisfies more properties related to specific tasks. We propose to solve for an optimal transport mapping constrained by a task-specific cost function that reflects the desired properties, and use the barycenters of the optimal transport mapping as references for CycleGAN. Our experiments indicate that the proposed algorithm is capable of learning a one-to-one mapping with the desired properties.


Introduction

Image-to-image translation aims at learning a mapping between a source distribution and a target distribution, which can transform an image from the source distribution into one from the target distribution. It covers a variety of computer vision problems including image denoising [Buades, Coll, and Morel2005], segmentation [Long, Shelhamer, and Darrell2015], and saliency detection [Goferman, Zelnik-Manor, and Tal2012]. Along with the recent popularity of deep supervised learning, many algorithms based on paired training data and deep convolutional neural networks have been proposed for specific image-to-image translation tasks. Among them, Pix2pix [Isola et al.2016] proposed an image-to-image translation framework that utilizes adversarial training to force the translation results to be indistinguishable from the target distribution.

In practice, it is usually difficult to collect a large amount of paired training data, while unpaired data can usually be obtained more easily; hence unsupervised learning algorithms have also been widely studied. In particular, generative adversarial networks (GANs; [Goodfellow et al.2014]) and dual consistency [He et al.2016] are extensively studied in image-to-image translation. CycleGAN [Zhu et al.2017], DiscoGAN [Kim et al.2017] and DualGAN [Yi et al.2017] adopt these two techniques for solving unsupervised image-to-image translation, where the GAN loss is used to ensure the generated images are indistinguishable from real images and the cycle consistency loss helps to establish a one-to-one mapping between the source and target distributions. In this paper, to simplify the terminology, we will use CycleGAN as a representative for these three similar frameworks combining GANs and the idea of cycle consistency.

CycleGAN can establish a one-to-one mapping between two data distributions unsupervisedly with the help of the cycle consistency losses in both directions. However, there is no theoretical guarantee on the detailed properties of the mapping established by CycleGAN, which results in a large feasible solution space. Consequently, without meticulously designed networks and hyper-parameters, the one-to-one mapping learned by CycleGAN will be a random one within this large space.

For many cross-domain translation tasks, people actually have expected properties of the learned mapping; e.g., in a language translation task, one would expect the semantic meaning to remain unchanged. Hence, it would be more satisfactory if we could add explicit constraints on the one-to-one mapping within CycleGAN to control the mapping's properties, so as to meet the requirements of specific tasks.

Among the many potential feasible maps between two data distributions, it is more promising to find the optimal one according to some measure. Optimal transport (OT) aims at finding a transportation plan [Kantorovich1942] that incurs the least cost of transporting the source distribution to the target distribution, given a cost function that specifies the transportation cost between any pair of samples from the two distributions.

It is worth mentioning that the cost function in optimal transport is very flexible. For specific tasks, it is possible to define a cost function that reflects the underlying expectation of the desired mapping properties. For example, given a set of handbags and shoes, if one would like to pair the handbags with the shoes such that they have matched colors, one can specify the cost function to be the distance between their color histograms, and then optimal transport would find the mapping that has the least overall difference in color distribution.
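The handbag-shoe example can be sketched numerically. Below is a minimal illustration (not from the paper) of such a task-specific cost, using an L1 distance between coarse RGB histograms; the bin count and the L1 choice are our assumptions — the paper's shoes-to-handbags experiment instead uses a Wasserstein distance between Lab-space histograms.

```python
import numpy as np

def color_histogram(img, bins=8):
    """Flattened, normalized RGB histogram of an HxWx3 uint8 image."""
    hist, _ = np.histogramdd(
        img.reshape(-1, 3), bins=(bins,) * 3, range=((0, 256),) * 3
    )
    return hist.ravel() / hist.sum()

def color_cost(img_a, img_b, bins=8):
    """Task-specific cost: L1 distance between the two color histograms.
    0 for identical color distributions, up to 2 for disjoint ones."""
    return np.abs(color_histogram(img_a, bins) - color_histogram(img_b, bins)).sum()
```

Feeding such a pairwise cost into the OT problem described later yields a plan that matches items by color distribution rather than at random.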

In summary, CycleGAN lacks the control of the one-to-one mapping, while optimal transport holds the ability to establish a mapping towards the desired property. However, the optimal transport mapping, i.e., transportation plan, is usually not a one-to-one mapping, but many-to-many instead; that is, we cannot directly use optimal transport to build a desired one-to-one mapping. We thus propose to use optimal transport as a reference to endow CycleGAN with the ability of learning a one-to-one mapping with desired properties.

The contributions of this paper are summarized below.

  • We study the properties of the one-to-one mapping learned by CycleGAN and verify that under some circumstances the one-to-one mapping learned by CycleGAN is just a random one within the large feasible solution space, which is due to the lack of constraint on the one-to-one mapping established by CycleGAN.

  • We propose to use the optimal transport with respect to a task-specific metric to guide CycleGAN on learning a one-to-one mapping with desired properties. Our experiments on several datasets demonstrate the effectiveness of the proposed algorithm on learning a desired one-to-one mapping.

Related Work

Generative Adversarial Networks (GANs), consisting of a generator network and a discriminator network, were originally proposed as a generative model to match the distribution of generated samples to the real distribution, where the discriminator is trained to distinguish generated samples from real ones while the generator learns to generate samples that fool the discriminator. Considerable effort has gone into improving the stability of training and exploiting the capacity of GANs for various computer vision tasks. For instance, [Radford, Metz, and Chintala2015] proposes a deep convolutional architecture that stabilizes the training; WGAN [Arjovsky, Chintala, and Bottou2017] proposes to utilize the Wasserstein-1 distance (or Earth Mover's distance/EMD) as an alternative metric.

Conditional GANs (cGANs; [Mirza and Osindero2014, Odena, Olah, and Shlens2016, Zhou et al.2017]) extend GANs to a conditional model by conditioning both the generator and the discriminator on some extra information, such as class labels, so that images can be generated conditioned on class labels and so on. [Reed et al.2016] extends cGANs with the conditional information being text features. Pix2pix [Isola et al.2016] proposed a unified image-to-image translation framework based on conditional GANs, with the conditional information being images.

In practice, it is often hard to collect a large amount of paired training data, while unpaired data can usually be obtained more easily. In order to make better use of unpaired data in the real world, CycleGAN [Zhu et al.2017], DiscoGAN [Kim et al.2017] and DualGAN [Yi et al.2017] adopt the idea of dual consistency, first proposed in language translation [He et al.2016], together with GANs: they simultaneously train a pair of generators and discriminators for translation in both directions and apply a cycle consistency loss on both data distributions, which forces the mapping to be a one-to-one mapping. However, theoretically, there is no explicit constraint on the properties of the one-to-one mapping within CycleGAN, which results in a large feasible solution space and the learned one-to-one mapping being a random one within this space.

Optimal transport [Villani2008] aims to find a mapping between two distributions that can transport the source distribution to the target distribution with the least transportation cost. In many cases, a mapping between two distributions in which each source point maps to only a single target point (Monge's problem) does not exist. The modern approach to optimal transport relaxes Monge's problem by optimizing over plans, i.e., distributions over the product space of the source distribution space and the target distribution space. [Cuturi2013] proposes to introduce an entropic regularization term into the OT problem, which turns it into an easier optimization problem that can be solved efficiently by the Sinkhorn-Knopp algorithm. [Seguy et al.2017] proposed a stochastic approach for solving large-scale regularized OT and estimating a Monge mapping as a deep neural network approximating the barycentric mapping of the OT plan.

Method

Figure 1: The framework of the proposed method. We use the barycentric mapping of optimal transport, which minimizes the cost of mismatching of a task-specific property, to guide the CycleGAN on learning a one-to-one mapping with the desired properties.

Given two sets of unpaired images drawn respectively from domain U and domain V, the primal task of unsupervised image-to-image translation is to learn a generator G that maps an image u ∈ U to an image G(u) ∈ V. The modern techniques [Zhu et al.2017, Yi et al.2017, Kim et al.2017] of unsupervised image-to-image translation introduce an extra generator F that maps an image v ∈ V to an image F(v) ∈ U, and a cycle consistency loss, i.e., F(G(u)) ≈ u and G(F(v)) ≈ v, is introduced to regularize the mapping between U and V. As a result, the learned mapping would be a bijection, i.e., a one-to-one mapping. However, as we will discuss later in this section, the cycle consistency loss, though it helps build a one-to-one mapping, has no control over the properties of the learned one-to-one mapping. In this section, we will also discuss how to add extra constraints on the learning of the one-to-one mapping to pursue desired properties.

Preliminary: CycleGAN

In CycleGAN, besides the above-mentioned two coupled generators G and F that translate images across domains U and V and the cycle consistency losses that regularize the learned mapping to be a bijection, an adversarial loss is also introduced for each generator to ensure that translated images are valid samples. More precisely, by playing a minimax game with the discriminator, the adversarial loss forces the generator to match the distribution of generated images with the distribution of real images in the target domain.

Adversarial Loss

In the original GAN [Goodfellow et al.2014], the discriminator was formulated as a binary classifier outputting a probability. Given a real image distribution P_r and the fake image distribution P_g formed by generated samples G(z) with z ∼ p(z), the loss function of the original GAN is defined as:

L_GAN(G, D) = E_{y∼P_r}[log D(y)] + E_{z∼p(z)}[log(1 − D(G(z)))]   (1)

The discriminator learns to maximize L_GAN, that is, to distinguish the real samples from the fake samples, while the generator learns to minimize L_GAN so as to make the generated samples have a low probability of being classified as fake by the discriminator. When D is assumed to be optimal, the objective of the generator is to minimize the Jensen-Shannon divergence between P_r and P_g, and the minimum is achieved if and only if P_g = P_r.
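As a concrete check of Eq. (1), the following minimal numpy sketch (the function name and batch interface are ours, not the paper's) evaluates the minimax objective for given discriminator outputs:

```python
import numpy as np

def gan_loss(d_real, d_fake):
    """Value of the minimax objective in Eq. (1), given discriminator
    outputs d_real = D(y) on real samples and d_fake = D(G(z)) on fakes.
    All outputs are probabilities in (0, 1)."""
    d_real = np.asarray(d_real, dtype=float)
    d_fake = np.asarray(d_fake, dtype=float)
    return np.mean(np.log(d_real)) + np.mean(np.log(1.0 - d_fake))
```

A maximally confused discriminator (D(·) = 0.5 everywhere) gives the value 2·log(0.5) ≈ −1.386, the global minimum over discriminators, consistent with the Jensen-Shannon interpretation above.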

Although GANs have achieved great success in realistic image generation, training the original GAN turns out to be very difficult, and one has to carefully balance the abilities of the generator and the discriminator. It was shown in [Arjovsky and Bottou2017, Arjovsky, Chintala, and Bottou2017] that the Jensen-Shannon divergence is ill-defined when the supports of the two distributions do not overlap. The Wasserstein distance is thus introduced [Arjovsky, Chintala, and Bottou2017] as an alternative metric for evaluating the distance between the real and fake distributions. The Wasserstein distance is defined as the minimal cost of transporting distribution P_g into P_r. In its primal form, it is formally defined as:

W(P_r, P_g) = inf_{γ∈Π(P_r, P_g)} E_{(u,v)∼γ}[‖u − v‖]   (2)

where Π(P_r, P_g) denotes the collection of all probability measures γ on U × V with marginals P_r on U and P_g on V.

Since the infimum in Eq. (2) is highly intractable, in WGAN [Arjovsky, Chintala, and Bottou2017] the discriminator (critic) is designed to estimate the Wasserstein distance by solving its dual form, with the corresponding objective defined as:

W(P_r, P_g) = sup_{‖D‖_L≤1} E_{v∼P_r}[D(v)] − E_{v∼P_g}[D(v)]   (3)

where the discriminator D is constrained to be a 1-Lipschitz function. The problem of how to properly enforce the 1-Lipschitz constraint has spawned a set of investigations [Gulrajani et al.2017, Miyato et al.2018, Petzka, Fischer, and Lukovnicov2017]. In our experiments, these solutions show very similar results, and we choose the Gradient-Penalty loss [Gulrajani et al.2017] as the running example throughout the paper, i.e.,

L_GP = E_{v̂∼P_v̂}[(‖∇_v̂ D(v̂)‖_2 − 1)²]   (4)

where P_v̂ is the distribution of uniformly distributed linear interpolations of real samples v ∼ P_r and generated samples ṽ ∼ P_g.

Cycle Consistency Loss

Training with respect to the adversarial loss forces the distribution of G(u) to match the distribution of domain V. However, this alone does not build any relationship between the source domain and the target domain. Without paired data, traditional approaches build the relationship between the domains via a predefined similarity function [Bousmalis et al., Shrivastava et al.2017, Taigman, Polyak, and Wolf2016] or by assuming a shared low-dimensional embedding space [Liu, Breuel, and Kautz2017, Aytar et al.2017]. In the CycleGAN series [Zhu et al.2017, Kim et al.2017, Yi et al.2017], a dual task of translating data from domain V to domain U is introduced and cycle consistency is encouraged as a regularization.

Specifically, cycle consistency requires that any image u in domain U can be reconstructed after applying G and F on u in turn, and that any image v in domain V can be reconstructed after applying F and G in the reverse direction. That is, F(G(u)) ≈ u and G(F(v)) ≈ v. The cycle consistency loss can be formulated as follows:

L_cyc(G, F) = E_{u∼U}[‖F(G(u)) − u‖_1] + E_{v∼V}[‖G(F(v)) − v‖_1]   (5)

in which we adopt the L1 distance to measure the distance between the original image and the reconstructed image.
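Eq. (5) can be sketched directly; the helper below is illustrative and assumes G and F are callables on numpy arrays:

```python
import numpy as np

def cycle_loss(x, y, G, F):
    """L1 cycle-consistency loss of Eq. (5):
    mean |F(G(x)) - x| + mean |G(F(y)) - y|, averaged per element.
    G maps domain U -> V, F maps domain V -> U."""
    return (np.mean(np.abs(F(G(x)) - x))
            + np.mean(np.abs(G(F(y)) - y)))
```

When F is the exact inverse of G the loss vanishes; any deviation from invertibility is penalized in both directions.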

The One-to-one Mapping in CycleGAN

In CycleGAN, the adversarial losses applied to the two generators help to establish the mappings between domains U and V in both directions, as they force the generated images to lie within the target domain. Meanwhile, the cycle consistency losses help to relate these two mappings and force them to be two one-to-one mappings, as they force different samples in the source domain to be mapped to different samples in the target domain (otherwise, the consistency loss would be large). Therefore, CycleGAN establishes a bijective mapping between domains U and V, which is also mentioned in DiscoGAN [Kim et al.2017] and CycleGAN [Zhu et al.2017].

It is promising that CycleGAN can find a one-to-one mapping between two data distributions unsupervisedly. But theoretically, there exists a large number of one-to-one mappings between two data distributions. For example, the number of possible one-to-one mappings between two discrete data distributions, each containing n discrete data points, is the factorial of n, i.e. n!. And all these one-to-one mappings are perfect in terms of CycleGAN's objective.

Since there is no extra control on the properties of the mapping, as long as it is one-to-one, the learned one-to-one mapping with CycleGAN would theoretically be a random one in this large feasible solution space.

(a) Datasets Samples
(b) CycleGAN Mapping
(c) Optimal Transport Mapping
Figure 2: Synthetic experiments: CycleGAN learns a one-to-one mapping between datasets A and B; however, the learned mapping is out of order. Defining the cost function to be the squared Euclidean distance between the locations of their vertical lines, optimal transport is capable of mapping the images in dataset A to B in sequence. This illustrates the randomness of the one-to-one mapping established via CycleGAN and at the same time shows the ability of optimal transport to build a desired mapping, given a task-specific cost function. The x- and y-axis ticks in sub-figures (b) and (c) indicate the images with the specified locations of the vertical line in domains A and B respectively.

For verification, we conducted an experiment across two synthetic datasets A and B, each consisting of 32 images at a resolution of 64x64, with each image containing one vertical line at a different position, as shown in Figure (2(a)). The resulting mapping learned with CycleGAN is shown in Figure (2(b)). As we can see, images with the vertical line at different positions in A are mapped to images in B without any order. Furthermore, this one-to-one mapping changes given different initializations and hyper-parameters.
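The synthetic datasets can be reproduced in a few lines; the exact column spacing and grayscale convention below are our assumptions, since the paper only specifies 32 images of size 64x64 with one vertical line each:

```python
import numpy as np

def vertical_line_dataset(n=32, size=64):
    """n grayscale images of shape (size, size), each containing a single
    white (value 1.0) vertical line; line i sits at column i * (size // n),
    so every image has the line at a distinct position."""
    imgs = np.zeros((n, size, size), dtype=np.float32)
    step = size // n
    for i in range(n):
        imgs[i, :, i * step] = 1.0
    return imgs
```

Dataset B can be generated the same way (e.g. with a different offset), and the cost between two images reduces to the squared distance between their line columns, as used in Figure 2(c).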

Guiding CycleGAN with Optimal Transport

As discussed above, the one-to-one mapping learned by CycleGAN can be random within the large feasible solution space. However, in many practical applications, we would expect certain features to be matched by the learned mapping. For example, when the two domains are different languages, one may expect the semantic information of characters to remain unchanged after translation. Without any additional control, the one-to-one mapping function learned by CycleGAN will, in theory, fail to achieve this with very high probability (approaching one as the number of samples increases).

Here we propose to make use of the controllability of optimal transport to endow CycleGAN with the ability to learn a one-to-one mapping with desired properties.

Optimal Transport (OT)

According to the Kantorovich formulation [Kantorovitch1958], the typical optimal transport problem can be defined as finding a mapping between two distributions μ and ν that is optimal with respect to a cost function c, and it can be formulated as follows:

γ* = argmin_{γ∈Π(μ,ν)} E_{(u,v)∼γ}[c(u, v)]   (6)

where Π(μ, ν) denotes the collection of all probability measures γ on U × V with marginals μ on U and ν on V, as in the primal form of the Wasserstein distance. In fact, the Wasserstein distance is a special form of optimal transport with the cost function required to be a distance (a proper metric), while in optimal transport, c can be any cost function. Another difference is that, as an adversarial objective, the Wasserstein distance is computed between the distribution formed by G(u) and the target distribution, while the optimal transport here is computed between the source distribution μ and the target distribution ν. And here we focus more on the optimal transport plan γ*, instead of the optimal cost.
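For discrete distributions, the Kantorovich problem of Eq. (6) is a finite linear program and can be solved directly. A minimal sketch with scipy follows (the paper itself uses the network simplex algorithm, as noted in the Optimization Details section; the function name is ours):

```python
import numpy as np
from scipy.optimize import linprog

def optimal_transport_plan(mu, nu, C):
    """Solve the discrete Kantorovich problem as a linear program:
    minimize <P, C> over plans P >= 0 whose row sums equal mu and
    whose column sums equal nu. C is the m x n cost matrix."""
    m, n = C.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):
        A_eq[i, i * n:(i + 1) * n] = 1.0   # row marginal: sum_j P_ij = mu_i
    for j in range(n):
        A_eq[m + j, j::n] = 1.0            # column marginal: sum_i P_ij = nu_j
    b_eq = np.concatenate([mu, nu])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.x.reshape(m, n)
```

With identical supports and a squared-distance cost, the optimal plan concentrates all mass on the diagonal, matching each point to itself.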

Reflecting the Desired Properties with OT

Given two distributions μ and ν, CycleGAN builds a one-to-one mapping between them. As we discussed previously, this one-to-one mapping might be a random one in the feasible solution space. However, in specific tasks, people actually have expectations on the outcome of the learned one-to-one mapping, e.g. the pixel-level distance or average hue difference is expected to be low, the outline or semantic meaning is expected to be unchanged, and so on.

One way to model the expectation is to define a task-specific cost function c. Then the degree to which the expectation is satisfied, if defined as the average over all pairings, can be modeled as the transport cost E_{(u,v)∼γ}[c(u, v)] of Eq. (6). It follows that, given a task-specific cost function c, in terms of optimal transport, the best mapping is the optimal transport plan γ*.

We thus propose to solve the optimal transport problem under the task-specific cost function and use the optimal transport plan as a reference to build the one-to-one mapping in CycleGAN.

Optimal Transport Plan as Reference

Given an arbitrary cost function, the optimal transport plan is usually a many-to-many mapping, i.e. the conditional distribution γ*(·|u) is usually not a Dirac delta distribution. Therefore, it is not directly usable in cross-domain translation tasks, and some previous works [Perrot et al.2016, Seguy et al.2017] attempt to use the barycenter instead. The barycenter of a sample u in the source distribution is defined to be the sample in the target domain V that has the minimal transport cost to the transport targets γ*(·|u) of u:

B(u) = argmin_{v∈V} E_{v′∼γ*(·|u)}[c(v, v′)]   (7)

However, the barycenter is not guaranteed to lie in the distribution ν, which in practice manifests as blurred images.

We thus propose that, instead of directly using the optimal transport plan or the barycenter, we train a CycleGAN and use the barycenters of the optimal transport plan as references to guide the establishment of its one-to-one mapping. Given a proper weight on this regularization, CycleGAN will be able to learn a one-to-one mapping that basically follows the optimal transport plan, while at the same time making each translated sample lie in the target distribution under the supervision of the adversarial loss. Our algorithm can then be separated into two steps:

  • Firstly, given the two distributions and a task-specific cost function, we learn an optimal transport plan between the two distributions, and we evaluate the barycenters B_V(u) and B_U(v) for each sample in the two distributions.

  • Secondly, we train a CycleGAN model using these barycenters as references for the two cross-domain generators. The corresponding reference loss is defined as follows:

    L_ref(G, F) = E_{u∼U}[‖G(u) − B_V(u)‖_1] + E_{v∼V}[‖F(v) − B_U(v)‖_1]   (8)

    where B_V(u) and B_U(v) denote the barycenters of u and v under the optimal transport plans in the two directions.
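The two steps above can be sketched for discrete samples. As noted in the Optimization Details section, under the L2 barycenter the barycentric mapping simplifies to a weighted sum of mapped samples, i.e. a row-normalized weighted average of the targets; the reference loss then mirrors Eq. (8). Function names are ours:

```python
import numpy as np

def barycentric_mapping(P, targets):
    """L2 barycenter of each source sample under transport plan P (step 1):
    B(u_i) = sum_j P_ij v_j / sum_j P_ij, the weighted mean of the targets
    that u_i's mass is transported to. `targets` has shape (n, d)."""
    weights = P / P.sum(axis=1, keepdims=True)
    return weights @ targets

def reference_loss(G_x, bary_x):
    """L1 reference loss (step 2) pulling each translated sample G(x)
    toward its OT barycenter, as in Eq. (8)."""
    return np.mean(np.abs(G_x - bary_x))
```

A source sample whose mass is split evenly between two targets is pulled toward their midpoint, which is exactly why the raw barycenter can fall off the target distribution and why the adversarial loss is still needed.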

The full objective of our algorithm can be formulated as:

L(G, F, D_U, D_V) = L_GAN(G, D_V) + L_GAN(F, D_U) + λ_cyc L_cyc(G, F) + λ_ref L_ref(G, F)   (9)

where G and F are optimized to minimize the objective, while D_U and D_V are optimized to maximize it. We will later refer to this model as OT-CycleGAN.

Discussions

                 CycleGAN      Optimal Transport   Nearest Neighbor
Controlling      N             Y                   Y
Mapping          One-to-One    Many-to-Many        N/A
Generalization   Y             N                   N

Table 1: Comparison among CycleGAN, optimal transport, and nearest neighbor. Nearest neighbor and optimal transport are capable of controlling the mapping with respect to a given metric between two samples. However, the mapping built via nearest neighbor does not form a joint distribution, i.e. it may collapse to a subset, and optimal transport usually builds a many-to-many mapping, which is not adequate for cross-domain translation. Also, the optimal transport plan does not generalize to out-of-distribution samples.

As discussed in the previous sections, in the sense of establishing a mapping between two data distributions, CycleGAN and optimal transport both have strengths and weaknesses. This motivates us to use the barycenters of optimal transport mapping to serve as the references of CycleGAN, so as to combine the strengths of the two models to establish a one-to-one mapping with (mostly) minimized mismatching cost over task-specific properties between two data distributions.

Another difference between CycleGAN and optimal transport is that optimal transport establishes a mapping only between the given samples of the two datasets. Under the circumstance of two discrete datasets, it cannot generalize to out-of-distribution samples. In contrast, CycleGAN learns the mapping function between the two distributions via two neural networks and thus has the ability to generalize to out-of-distribution samples. When the two discrete datasets hold the same number of unduplicated samples, a perfect one-to-one mapping may actually also exist in optimal transport. Under such conditions, CycleGAN helps optimal transport generalize to out-of-distribution samples.

Besides optimal transport, the nearest neighbor algorithm might also come to mind for controlling the mapping to have matched properties. With the nearest neighbor algorithm, every sample in the source distribution is mapped to the nearest one in the target distribution. However, nearest neighbor is a local algorithm, and without considering the global status, the mapping established via nearest neighbor might collapse to a subset of the target domain or even a single point. For example, suppose the source domain is a set of real numbers in the range [0, 31], the target domain lies in the range [32, 63], and the cost function is specified as the squared difference. In this case, nearest neighbor would map all samples in the source domain to the 'leftmost' one in the target domain, i.e. 32. In comparison, optimal transport will map the whole source domain to the whole target domain in sequence.
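This toy example is easy to verify numerically (shrunk here to 8 points per domain). For a convex one-dimensional cost such as the squared difference, the optimal plan between two uniform discrete distributions is the monotone (sorted) matching, which we use in place of a full LP solve:

```python
import numpy as np

source = np.arange(0, 8, dtype=float)         # stand-in for the range [0, 31]
target = np.arange(32, 40, dtype=float)       # stand-in for the range [32, 63]
C = (source[:, None] - target[None, :]) ** 2  # squared-difference cost matrix

# Nearest neighbor: each source point independently picks its closest target.
# Every source value is below 32, so all of them pick target index 0 (value 32).
nn_match = C.argmin(axis=1)

# Optimal transport with a convex 1D cost: monotone matching by rank,
# which pairs the whole source domain with the whole target domain in order.
ot_match = np.argsort(np.argsort(source))
```

The nearest-neighbor assignment collapses onto a single target point, while the OT matching covers every target exactly once, illustrating the table above.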

We summarize the discussion among CycleGAN, optimal transport and nearest neighbor in Table (1).

Experiments

In order to demonstrate the effectiveness of our proposed algorithm for learning a one-to-one mapping between two data distributions with desired properties, we conduct several image-to-image translation experiments between different datasets, and we compare the translation results of our algorithm with CycleGAN. Details of our experimental setting are as follows.

Network Architecture

In our experiments, we adopted an auto-encoder architecture [Hinton and Salakhutdinov2006] for both of our generators. The encoder is composed of a set of stride-2 convolution layers with 4x4 filters, while the decoder is composed of several stride-2 deconvolution layers with 4x4 filters. Each convolution layer in the encoder and deconvolution layer in the decoder is followed by a normalization layer, except the first and the last ones. We use the WGAN-GP loss instead of the original GAN loss in our experiments. The architecture of the discriminator (critic) is designed to be similar to that of the decoder, except that we eliminate all normalization layers.

Optimization Details

We use the network simplex algorithm [Damian, Comm, and Garret1991] for solving the optimal transport problem between the two data distributions as a linear program. Due to limited computation power, we use the L2 barycenter instead of the exact barycenter to obtain the barycentric mapping from the previously-obtained optimal transport plan, which simplifies to the weighted sum of mapped samples. We use the Adam optimizer [Kingma and Ba2014] and train our model for 3000 epochs with an initial learning rate of 0.0002, linearly decayed to zero. The gradient-penalty weight is set to 10, λ_cyc is set in the range of [100, 800], and λ_ref is set in the range of [50, 300]. We train the critic for 5 steps and the generator for 1 step in turn.

                     CycleGAN   OT-CycleGAN
Mismatching Degree   1.026      0.5634   0.3393   0.2788   0.2865   0.3023

Table 2: Comparison between CycleGAN and OT-CycleGAN in terms of mismatching degree; the OT-CycleGAN columns correspond to different settings of the reference weight.

Experiment: Car-to-Chair

We conduct our first experiment between a car dataset [Fidler, Dickinson, and Urtasun2012] and a chair dataset [Aubry et al.2014]. Both datasets consist of images of 3D rendered objects with varying azimuth angles, and the azimuth angle of each image is provided by the dataset. Figure (3(b)) shows the translation results of CycleGAN between these two datasets. As we can see, while the images of cars vary in azimuth angle in order, the translation results are random samples in the target domain.

OT Barycenter

By using the azimuth angle of each image provided by each dataset and specifying the cost function between images to be the squared difference of azimuth angles, we are able to find an optimal transport plan that transports the car distribution to the chair distribution with the least overall azimuth-angle difference. Additionally, as there is more than one image at each azimuth angle, we further use the Euclidean distance between the average RGB colors of the images (excluding the white background) as a subsidiary cost function, so as to find an optimal transport plan that further minimizes the overall color difference. In summary, the task-specific cost function in this experiment is formulated as:

c(u, v) = (θ_u − θ_v)² + λ_color ‖c̄_u − c̄_v‖_2   (10)

where θ_u denotes the azimuth angle of image u, c̄_u its average RGB color, and λ_color weights the subsidiary color term. Samples of the resulting barycentric mapping are illustrated in Figure (3(a)).
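Assuming the cost combines the squared azimuth-angle difference with a weighted Euclidean distance between average RGB colors (the relative weighting `lam` is our assumption, not specified in the paper), the full cost matrix between the two datasets can be computed as:

```python
import numpy as np

def car_chair_cost(angles_u, colors_u, angles_v, colors_v, lam=1.0):
    """Pairwise cost matrix: squared azimuth-angle difference plus lam
    times the Euclidean distance between average RGB colors.
    angles_* have shape (m,) / (n,), colors_* have shape (m, 3) / (n, 3)."""
    angle_cost = (angles_u[:, None] - angles_v[None, :]) ** 2
    color_cost = np.linalg.norm(
        colors_u[:, None, :] - colors_v[None, :, :], axis=-1
    )
    return angle_cost + lam * color_cost
```

This matrix is then fed to the OT solver of Eq. (6); entries are zero only for pairs that agree in both azimuth angle and average color.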

(a) Barycentric Mapping
(b) Result Comparison
Figure 3: Car-to-Chair Experiments.

OT-CycleGAN Result

Figure (3(b)) shows the translation results of our algorithm. The resulting mapping of OT-CycleGAN successfully matches the azimuth angles and colors of the generator's input and output. We also evaluate the mismatching degree for each method. As listed in Table (2), OT-CycleGAN achieves a much lower mismatching degree.

Azimuth-Angle Mapping Analysis

We plot the overall azimuth-angle mapping to provide a global comparison between CycleGAN and OT-CycleGAN. As we can see in Figure (4), the resulting azimuth-angle mapping with CycleGAN is fairly random, while OT-CycleGAN mostly matches the azimuth angles of input and output. The azimuth angle of each translated image is obtained by finding its nearest neighbor in the training set. It is worth mentioning that here we ignored the color attribute; therefore, the result is a superposition over images of different colors.

(a) Car-to-Chair: CycleGAN
(b) Car-to-Chair: OT-CycleGAN
(c) Chair-to-Car: CycleGAN
(d) Chair-to-Car: OT-CycleGAN
Figure 4: Azimuth angle mapping of Car-to-Chair.

Experiment: Shoes-to-Handbags

(a) Barycentric Mapping
(b) Result Comparison
Figure 5: Shoes-to-Handbags experiments.

In this experiment, we performed image-to-image translation between a shoes dataset [Yu and Grauman2014] and a handbags dataset [Zhu et al.2016]. Figure (5(b)) shows the translation results of CycleGAN between these two datasets. As we can see, the translation results show an obvious color difference from the source samples.

OT Barycenter

In this experiment, we would like to establish a one-to-one mapping that matches the colors of the handbags with the colors of the shoes. As the colors in each image of these two datasets are much more complex than in the previously used car and chair datasets, it would be inaccurate to use the average color to represent the color information of each image. We thus adopted a color histogram to represent the color information of each image, and use the Wasserstein distance between two histograms as the cost function, with the ground cost being the Euclidean distance between two color bins in Lab color space. Samples of the resulting barycentric mapping are shown in Figure (5(a)).
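A simplified version of this histogram cost can be sketched with scipy. The paper uses 3D Lab-space histograms; the example below is a single-channel (one-dimensional) simplification for clarity, with the bin-center distance as the ground metric:

```python
import numpy as np
from scipy.stats import wasserstein_distance

def histogram_cost(hist_a, hist_b, bin_centers):
    """1D simplification of the paper's color cost: Wasserstein distance
    between two histograms, using |bin_i - bin_j| as the ground metric.
    hist_a and hist_b are (possibly unnormalized) bin weights over the
    shared bin_centers."""
    return wasserstein_distance(bin_centers, bin_centers, hist_a, hist_b)
```

Moving all histogram mass by one bin of unit spacing costs exactly 1, so the cost grows with how far color mass must be shifted, which is the property the experiment relies on.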

OT-CycleGAN Result

Figure (5(b)) illustrates the mapping function learned by our method (OT-CycleGAN). Compared with the original CycleGAN, the mapping established by our algorithm is significantly better in terms of how well the color distributions match each other, both visually and in the quantitative metric.

Reference Weight

One important parameter in OT-CycleGAN is λ_ref, i.e. the weight of the OT reference loss in CycleGAN. Ideally, if λ_ref is extremely large, the resulting mapping will be identical to the barycentric mapping of OT, while if λ_ref is extremely small, the reference loss will not take effect and the result will be similar to CycleGAN, which is evidenced in Figure 6. More results are summarized in Table (2), and we can see there exists a fairly large range of λ_ref for which OT-CycleGAN is able to learn a satisfactory mapping.

Figure 6: Tuning the parameter λ_ref.

Discussion

The U-Net architecture is widely used in image-to-image translation tasks. Though it tends to connect pixel information between the input and output and has achieved many satisfactory results, it does not theoretically guarantee the relationship between source and target, and thus may require extensive tuning if a special property is wanted. Our method, in contrast, can directly specify which properties are to be matched.

Conclusion and Future Work

We have presented OT-CycleGAN, where an optimal transport mapping is used to guide the one-to-one mapping established by CycleGAN. With the proposed algorithm, one can control the learned one-to-one mapping in CycleGAN by defining a task-specific cost function that reflects the desired mapping properties.

Specifically, we demonstrate that there is no controllability over the properties of the learned one-to-one mapping in CycleGAN, and that optimal transport can provide a mapping that minimizes the overall cost of mismatching of the expected properties, given a task-specific cost function. Since the optimal transport mapping is usually not one-to-one, we propose to use the barycenters of the learned mapping as references to guide the training of CycleGAN to form a one-to-one mapping with the desired properties.

Experiments conducted on several benchmark datasets have shown that the mapping function learned by vanilla CycleGAN can be quite messy and that the guidance of optimal transport can significantly improve the mapping in terms of the task-specific properties.

In the main body and experiments, we mainly focused on image-to-image translation, as it is the most successful application of CycleGAN. We hope the detailed analysis of the properties of CycleGAN and optimal transport will also benefit further investigation of cycle consistency loss and unsupervised cross-domain translation. OT-CycleGAN is a general framework for establishing a one-to-one mapping with desired properties, and we plan to investigate more related tasks in the future.

References