
Continuous Methods : Hamiltonian Domain Translation

07/08/2022
by   Emmanuel Menier, et al.

This paper proposes a novel approach to domain translation. Leveraging established parallels between generative models and dynamical systems, we propose a reformulation of the Cycle-GAN architecture. By embedding our model with a Hamiltonian structure, we obtain a continuous, expressive and most importantly invertible generative model for domain translation.


1 Introduction

Domain translation is the process of transforming elements from one domain to another. One can think of applications such as neural style transfer (Gatys et al., 2016), which is for example used to apply a certain painter's style to photo-realistic images. A common problem encountered in domain translation applications is that, in many cases, paired data is not available during training, which means that the problem has to be formulated in an unsupervised setting. Unsupervised learning is very common in the field of generative modeling, and several architectures have been proposed to deal with the problem of Unsupervised Domain Translation. In this work, we focus on the Cycle-GAN (Zhu et al., 2017) architecture, which has proved successful in various applications of Unsupervised Domain Translation (see the CycleGAN project page, https://junyanz.github.io/CycleGAN/).

Despite its success, the formulation of the Cycle-GAN method has been questioned and shown to be ill-posed. Using results from Chen & Gopinath (2000), it can be shown that, when considering two distinct domains, there exists an infinity of pairings between the two domains which satisfy the Cycle-GAN objective. This is an issue, as the model can get stuck learning wildly inefficient mappings, leading to unsatisfactory optima. This conditioning problem has been explored in depth by de Bézenac et al. (2021), who proposed to use a regularized residual network to learn the mapping between two given domains. Borrowing ideas from optimal transport and dynamical systems, they showed that pushing the training towards simple, low-energy transformations in latent space leads to learning a sensible and trivially invertible mapping between the two domains of interest.

The study of the links between dynamical systems theory and deep learning remains a major topic of interest. One can for example cite the identification of residual networks as first-order approximations of a time-continuous process, which has led to the development of ground-breaking approaches such as neural ordinary differential equations (Neural ODEs, Chen et al., 2018) or invertible neural networks (Behrmann et al., 2019).

Building on this existing connection, as well as on the work of de Bézenac et al. (2021), we propose a formulation of unsupervised domain translation as a continuous-time process with conservation guarantees which ensure invertibility by construction. The proposed architecture learns the dynamics of the transformation as a Hamiltonian dynamical system. Hamiltonian systems are typically used in classical mechanics to describe the evolution of conservative systems: they preserve a quantity, called the Hamiltonian, along their trajectories. Using neural networks to learn Hamiltonian dynamics was previously proposed by Greydanus et al. (2019); here, however, we use them to ensure invertibility of the generative process, a desirable property that makes the domain translation problem well-posed. Learning conservative transformations is in fact critical to other generative modeling approaches, such as normalizing flows (Rezende & Mohamed, 2015).

2 Method

2.1 Invertibility and CycleGAN

Formally, we can look at the two domains as two separate sets $A, B \subset \mathbb{R}^d$, where $d$ is the dimension of the space, i.e. the pixel space for images, or any latent representation space. The goal of unsupervised domain translation is to learn the forward mapping $T: A \to B$ as well as the reverse map $T^{-1}: B \to A$ so that the pair $(T, T^{-1})$ generates semantically meaningful samples of each domain. That is to say, the generated samples should be indistinguishable from samples in the target domain, while remaining coherent with their corresponding sample in the original domain.

CycleGAN proposes to enforce these constraints by using a combined loss $\mathcal{L} = \mathcal{L}_{adv} + \lambda \mathcal{L}_{cyc}$. The first term, $\mathcal{L}_{adv}$, is an adversarial loss which measures the distance between the generated samples and the target domain, ensuring that generated samples are indistinguishable from the target domain. In CycleGAN, $\mathcal{L}_{adv}$ is implemented using Generative Adversarial Networks (Goodfellow et al., 2014).

The second term in the loss is the cyclic loss $\mathcal{L}_{cyc} = \mathbb{E}_{x \in A}\left[\lVert T^{-1}(T(x)) - x \rVert_1\right]$. This term promotes transformations that are invertible, i.e. such that $T^{-1}(T(x)) \approx x$. Intuitively, this pushes the CycleGAN architecture towards learning minimal transformations of the samples, so as to retain a maximum of information from the initial sample and simplify the reconstruction $T^{-1}(T(x))$. This second term is used to ensure coherence between the translated and initial samples. In addition, learning an invertible (thus bijective) map between the two domains is critical at the conceptual level: one sample from a given domain should not map to multiple samples in the target domain, as only one sample in the target domain can optimally satisfy the trade-off between coherence with the original sample and similarity with the target domain.
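To make the structure of this objective concrete, the snippet below sketches the combined loss for one translation direction in PyTorch. It is illustrative only, not the authors' implementation; the generator names G_ab and G_ba, the discriminator D_b and the weighting factor lam are hypothetical placeholders.

```python
import torch
import torch.nn.functional as F

def cyclegan_loss(x_a, G_ab, G_ba, D_b, lam=10.0):
    """Illustrative combined CycleGAN objective for the A -> B direction:
    adversarial term + cycle-consistency (cyclic) term."""
    fake_b = G_ab(x_a)                  # translate a sample from A to B
    logits = D_b(fake_b)                # discriminator score of the generated sample
    # adversarial loss: generated samples should be classified as real B samples
    adv = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    # cyclic loss: mapping back to A should reconstruct the original sample
    cyc = F.l1_loss(G_ba(fake_b), x_a)
    return adv + lam * cyc
```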

2.2 Continuous models and Hamiltonian Neural Networks

The previous paragraph outlined the importance of ensuring the translation map is invertible to relax the learning problem. In fact, this is not specific to the domain translation problem, as the invertibility of learned maps has been linked to classical deep learning problems such as vanishing/exploding gradients in recurrent neural networks (Pascanu et al., 2012), or the training of other generative models like normalizing flows (Rezende & Mohamed, 2015). Several approaches have been proposed to push learned models towards invertibility (Miyato et al., 2018; Rosenblatt, 1952); however, they often impose significant constraints on the structure and expressivity of the models, leading to substantial training costs.

In this work, we propose a natural formulation for invertible transformations. Exploiting the parallel between the residual networks used in numerous image processing approaches and ordinary differential equations, we define domain translation as a continuous system. Starting at $t=0$ from a sample of one domain, $z(0) = x \in A$, we learn a transport flow $f$ so that, at $t=1$, the transported state $z(1)$ lies in $B$:

$$z(1) = z(0) + \int_0^1 f(z(t))\, dt \in B. \tag{1}$$

This formulation is not enough to ensure invertibility of the transformation, as the flow could be dissipative, or even unstable. To enforce invertibility, we express the flow as a conservative operator using Hamiltonian neural networks inspired by Greydanus et al. (2019). To do so, the samples are divided into two vectors of equal length, $z = (q, p)$ with $q, p \in \mathbb{R}^{d/2}$ (we assume $d$ to be even as a modeling choice). In classical mechanics, $q$ and $p$ would respectively describe the position and momentum of the studied entities. In our setting, their significance is more abstract and is defined by another function, called the Hamiltonian $H(q, p)$, which we parameterize using a neural network, hence:

$$\frac{dq}{dt} = \frac{\partial H}{\partial p}, \qquad \frac{dp}{dt} = -\frac{\partial H}{\partial q}. \tag{2}$$
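A minimal sketch of such a conservative vector field is given below, in PyTorch. The Hamiltonian is a small MLP mapping the latent vector to a scalar, and the flow returns $(\partial H / \partial p, -\partial H / \partial q)$ computed by automatic differentiation; the hidden width and Tanh activations are assumptions made for illustration, not details from the paper.

```python
import torch
import torch.nn as nn

class HamiltonianFlow(nn.Module):
    """Conservative flow f(q, p) = (dH/dp, -dH/dq) defined by a learned Hamiltonian H."""

    def __init__(self, dim, hidden=256):
        super().__init__()
        assert dim % 2 == 0, "latent dimension must be even to split z into (q, p)"
        # H: R^dim -> R, parameterized by an MLP with 3 hidden layers
        self.H = nn.Sequential(
            nn.Linear(dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1),
        )

    def forward(self, t, z):
        # z concatenates (q, p); t is unused since the system is autonomous
        with torch.enable_grad():
            z = z.requires_grad_(True)
            dH = torch.autograd.grad(self.H(z).sum(), z, create_graph=True)[0]
        dHdq, dHdp = dH.chunk(2, dim=-1)
        # Hamilton's equations (Eq. 2): dq/dt = dH/dp, dp/dt = -dH/dq
        return torch.cat([dHdp, -dHdq], dim=-1)
```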

Using Neural ODEs and automatic differentiation, the function $H$ can be trained to satisfy the transport objective, i.e. $z(1) \in B$ given $z(0) \in A$. Moreover, this formulation is invertible by design, as it preserves the quantity $H$ along its trajectory. We show below that learning the transformation with this formulation allows for the generation of semantically correct samples without using the cyclic loss required in CycleGAN. Thanks to the conservation properties of the flow $f$, the inverse map is trivially obtained by integrating the flow backward in time:

$$z(0) = z(1) + \int_1^0 f(z(t))\, dt = z(1) - \int_0^1 f(z(t))\, dt. \tag{3}$$
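With such a flow, both the forward translation (Eq. 1) and its inverse (Eq. 3) amount to integrating the same vector field in opposite time directions. The sketch below assumes the torchdiffeq package from the Neural ODE line of work; the latent dimension and variable names are illustrative.

```python
import torch
from torchdiffeq import odeint  # odeint_adjoint provides the optimise-then-discretise variant

flow = HamiltonianFlow(dim=512)      # hypothetical latent dimension
z_a = torch.randn(16, 512)           # placeholder batch of domain-A encodings

def translate(z, forward=True):
    """Integrate the flow from t=0 to t=1 (A -> B) or from t=1 to t=0 (B -> A, Eq. 3)."""
    t = torch.tensor([0.0, 1.0]) if forward else torch.tensor([1.0, 0.0])
    return odeint(flow, z, t)[-1]    # odeint returns the state at each time in t

z_b = translate(z_a)                     # forward map, A -> B
z_a_rec = translate(z_b, forward=False)  # inverse map, recovered without any cyclic loss
```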

3 Results

3.1 Generative results

We apply our Hamiltonian domain translation approach to image generation tasks. As proposed in de Bézenac et al. (2021), we train an encoder and a decoder to map images from both domains to a low-dimensional latent space. This is common practice in image processing, as the intrinsic dimension of a given problem is often much lower than the dimension of the pixel space. Encoding images to a low-dimensional latent space thus reduces the complexity of the domain translation problem, as well as training costs.

Once the encoder-decoder pair is trained, it is used to generate low-dimensional encoded vectors of the images of the dataset at hand. We then use our approach to learn the transport flow $f$. The Hamiltonian $H$ and the discriminator are implemented as multi-layer perceptrons with 3 hidden layers. The continuous flow is learned using the optimise-then-discretise version of Neural ODEs. We apply the architecture to the task of translating male samples of the celebA (Liu et al., 2015) dataset to female samples. Figure 1 presents samples generated with this approach.
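A possible latent-space training loop is sketched below. The encoder E, the discriminator D, the data loaders and the non-saturating GAN loss are assumptions made for illustration; the only aspect taken from the method itself is the absence of a cyclic term, since invertibility is built into the flow.

```python
import torch
import torch.nn.functional as F

# flow and translate() come from the previous sketches; E, D and the loaders are hypothetical
opt_f = torch.optim.Adam(flow.parameters(), lr=1e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-4)

for x_male, x_female in zip(loader_male, loader_female):
    z_src = E(x_male)          # encode source-domain images
    z_gen = translate(z_src)   # transport encodings to the target domain
    z_tgt = E(x_female)        # encodings of real target-domain images

    # discriminator step: separate transported encodings from real target encodings
    real_logits, fake_logits = D(z_tgt), D(z_gen.detach())
    d_loss = F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits)) \
           + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # flow step: fool the discriminator; no cyclic loss is computed
    gen_logits = D(z_gen)
    g_loss = F.binary_cross_entropy_with_logits(gen_logits, torch.ones_like(gen_logits))
    opt_f.zero_grad(); g_loss.backward(); opt_f.step()
```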

Figure 1: Selected samples of the male to female transport process using the proposed continuous domain translation approach. The transported encodings are decoded at regular time intervals, to illustrate the transformation applied by the model.
Figure 2: Selected samples of the reverse, female to male, generation process. NB: these samples are generated using equation (3), as the transport flow was solely trained to map males to females.

As shown in Figure 1, decoding the transported samples along the transformation trajectory shows that the flow progressively transforms the male samples into female ones. As expected, the conservative nature of the model promotes transformations that retain non gender-specific features: attributes such as pose, skin tone, face shape and background are preserved during the transformation. Figure 2 demonstrates an additional benefit of the Hamiltonian architecture, as we are able to generate males from female samples by simply integrating the flow backward (see Eq. 3). One should note that these results were obtained without ever training the model to map females to males, since we do not compute the cyclic loss used in CycleGAN. These results are comparable to those of de Bézenac et al. (2021), although we do not penalize the magnitude of the flow applied by the model; invertibility is enforced by construction instead.

Figure 3: Results of excessive integration of the transport flow. A model trained to map males to females in one time unit is integrated backward for more than twice the map horizon. We observe that the generated samples retain semantic sense for up to about one and a half times the training horizon.

3.2 Excessive Integration

An interesting feature of using a continuous flow to carry out domain translation is that one can gain some insight into the way the model transforms samples. If a flow has been trained to map two domains in one time unit (t.u.), i.e. over $t \in [0, 1]$, it can be integrated for a longer period, pushing the transformation further. This is one of the major differences between learning continuous transformations and discrete residual blocks: while residual blocks approximate the flow in specific regions of the latent space, the continuous flow is defined over the whole space. Any trained model starts losing performance once it drifts too far from its training conditions, but we observed interesting results when integrating our model for several t.u. Figure 3 shows that transported samples retain semantic meaning for up to about one and a half t.u., as the model progressively adds more and more gender-related features such as beards, wider jaws, shorter hair, etc. This generalisation performance can be linked to the conservative architecture of the model, which prevents it from diverging to unknown conditions. It also adds to the interest of the approach, as it supports the idea that the model is consistent with the structure of the latent space.
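In practice, pushing the transformation further only requires requesting solution times beyond the training horizon when integrating the flow. The short sketch below illustrates this; the decoder Dec and the specific time grid are illustrative assumptions.

```python
import torch
from torchdiffeq import odeint

# integrate past the training horizon (t = 1) and decode intermediate states
t = torch.linspace(0.0, 1.5, steps=7)    # regular time points, up to 1.5 t.u.
trajectory = odeint(flow, E(x_male), t)  # shape: (len(t), batch, latent_dim)
frames = [Dec(z) for z in trajectory]    # decoded images along the transformation
```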

3.3 Training

It should be noted that, once the autoencoder is trained, learning the flow $f$ is very inexpensive. The model starts generating semantically coherent samples after a single epoch, and does not require fine-tuning between the training of the discriminator and the flow $f$. More formal benchmarking against comparable domain translation methods is planned for the future.

4 Conclusion

This work proposes a novel and improved formulation for domain translation. By using a time-continuous approach, we are able to leverage results from classical mechanics to obtain a model that is invertible by construction. We show that this model can quickly learn to map two domains of interest, even in a latent space learned prior to training the domain translation architecture.

With the recent success of diffusion models (Nichol et al., 2021; Rombach et al., 2021), which are based on successive transformations in a pre-defined space, the analogy between generative models and dynamical systems becomes increasingly relevant. In this context, comparing our proposed method with existing approaches such as normalizing flows and latent diffusion models constitutes the next step of this work, for instance by exploring the extension of our continuous generative approach to stochastic differential equations (Li et al., 2020).

References