Unsupervised learning has emerged as one of the most important facets of machine learning research. With the advent of Generative Adversarial Networks (GANs) (goodfellow2014generative) it has become possible to harness large amounts of unlabeled data in the form of a generative model that can produce highly plausible images (radford2015unsupervised). If we are to target superhuman intelligence, we have to create networks that not only learn from large quantities of data but also interact among themselves in order to learn from, or even compete with, each other. This work presents one of the first approaches to engaging multiple agents in learning deep unsupervised representations. Note that multiple agents have very recently been explored by sukhbaatar2016learning and foerster2016learning, who employ a deep reinforcement learning formulation to achieve a shared utility.
Generative Adversarial Networks have recently seen applications in image inpainting (pathak2016context), interactive image generation from just a few brushstrokes (zhu2016generative), image super-resolution (ledig2016photo), and abstract reasoning diagram generation (ghosh2017contextual). GANs have been augmented in several ways to extract structure from the learned representations, most notably by chen2016infogan, liu2016coupled, and dumoulin2016adversarially. While our multi-agent generator framework is general enough to be applicable in most of the above applications, we demonstrate it on the task of unsupervised image generation.
Our work bears close resemblance to the work on Adversarial Neural Cryptography (abadi2016learning), where the cryptographic system is automatically learned based on the differing objectives of the three agents, Alice, Bob, and Eve. Our conceding and competing objectives are based on these ideas. Multi-agent systems with message passing were first employed by foerster2016learning, and our model of harnessing the messages received from the other generator is based on similar ideas. lazaridou2016multi introduced a message passing model between different agents that are forced to co-operate via the introduction of a bottleneck; the clustering of the image features shows that the messages for images of the same category are usually the same.
This work presents one of the first forays into this subject, and in a more traditional deep unsupervised learning setting rather than a deep reinforcement learning setting, where the reward structure is discrete and training is more difficult. We present a multi-generator Generative Adversarial Network with a competing objective function that encourages the two generators to compete with each other in addition to trying to maximally fool the discriminator. We also analyze a conceding objective, in which each generator tries to make the other generator better than itself. Finally, we introduce a message passing model that makes each generator aware of the generations the other generator is targeting, and hence helps it learn to generate better images.
With the message passing model, we found that a bottleneck has to be added for the message generator to learn meaningful message representations. We therefore demonstrate the performance of the message passing model in the presence of three bottlenecks. The first is passing the two generators samples from different noise distributions: one generator is provided with samples from Normal(0,1) and the other with samples from Uniform(-1,1). The message passing model was also analyzed under the two objectives introduced above, the competing objective and the conceding objective, to understand the messages and the generations each network produces in such situations.
The models yielded some interesting results, as seen in Fig. 1 and Fig. 2: without any explicit formulation, one of the generators generated images with much more facial detail, while the other generated images focused on overall content, even producing obscure subjects such as the last image in Fig. 2, which depicts a woman whose eyes are covered by the cap she is wearing. Another interesting observation is in Fig. 3, where the message interpolation results between the two generators resemble the process an artist follows for an artistic creation.
In summary, our main contributions in this paper are:
Presenting a novel framework of Multi-Agent GANs that comprises multiple generators.
Introducing an objective that promotes competition among the generators, and another objective in which each generator tries to make the other generator better than itself.
Introducing a novel message passing model, with messages passed between the generators in order to better explore the modes of the data distribution.
2 Related Work
Unsupervised learning with generative models has made immense progress within a remarkably short time, pioneered most notably by two major directions: Variational Autoencoders (kingma2013auto) and Generative Adversarial Networks (goodfellow2014generative). Efforts have been made to unify the two methods using Adversarial Autoencoders (makhzani2015adversarial). Since Variational Autoencoder based models optimize a maximum likelihood objective, some of the modes may remain unexplored.
Generative Adversarial Networks (GANs) have received tremendous interest in recent times, especially after radford2015unsupervised showed several interesting interpolation-based generations and even arithmetic properties that exist in the latent space. Several applications such as video generation (vondrick2016generating), image manipulation (zhu2016generative), and 3-D object generation (wu2016learning) use GANs as the underlying generative model. Several variants of the GAN training objective have also been proposed in order to stabilize training, such as salimans2016improved and arjovsky2016towards. Other objective functions minimize a divergence different from the Jensen-Shannon divergence of goodfellow2014generative; for instance, nowozin2016f experiment with various divergences and show improved results.
Our technical approach is closely related to the Conditional GANs of mirza2014conditional, which generate images based on class-specific information; reed2016generative, which condition the generation on text; ghosh2017contextual, which condition the generation on all previous inputs via an RNN; and chen2016infogan, which learn special representations of the latent variables for an interpretable conditional GAN based model. durugkar2016generative also considered multi-agent GANs, but their model is based on multiple discriminators rather than multiple generators, and on ensemble principles rather than a message passing objective. liu2016coupled learn a joint distribution of images by coupling a pair of GANs, i.e., jointly training a pair of generator-discriminator pairs such that some of the initial layers of the generators share weights and, similarly, some of the last layers of the discriminators share weights.
Message Passing Models and Co-operating Agents
Belief propagation (weiss2001optimality) based message passing has been one of the major learning algorithms employed as the principal training procedure in probabilistic graphical models. The paradigm of co-operating agents has also been studied in game theory (cai2011minmax). foerster2016learning and sukhbaatar2016learning introduce formulations of co-operating agents with a message passing model and a common communication channel respectively. lazaridou2016multi recently introduced a framework in which networks work co-operatively, with a bottleneck that forces the networks to pass messages that are even interpretable by humans.
Although the Generative Adversarial Network of goodfellow2014generative is itself modeled as an adversarial game between two agents, introducing a competing objective between the generators as well makes them venture into slightly different modes of the underlying noise space, exploring more modes of the data. lee2016stochastic incorporate competition between deep ensembles by passing the gradient only to the best network. abadi2016learning formulated a neural cryptography framework in which Eve is an adversary while Alice and Bob work co-operatively to hide sensitive information from Eve.
With the introduction of multiple generators, we add another set of objectives that helps us understand the dynamics of the system. We also introduce a version of Message Passing Generative Adversarial Networks, with several variations, in which messages are passed in order to improve the generations of both generators. The message passing model is augmented with several bottlenecks that encourage the generators to pass meaningful messages.
The competing objective that we introduce is based on the principle that the generators also compete with each other to get better scores for their generations from the discriminator. The minimization objective function for generator G_1 is:

\min_{G_1} \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G_1(z)))] + \mathbb{E}_{z \sim p_z}[f(D(G_2(z)) - D(G_1(z)))]

while the minimization objective function for generator G_2 is:

\min_{G_2} \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G_2(z)))] + \mathbb{E}_{z \sim p_z}[f(D(G_1(z)) - D(G_2(z)))]

where f(x) = \max(x, 0), so that the optimization objective for G_1 pushes it to get better scores from the discriminator than G_2, and vice versa for G_2.
The principle behind the introduction of this objective is that the two generators try to guide each other so that the other obtains better scores for its generations from the discriminator. This model is identical in structure to the competing objective; the crucial difference is the function f used. Here, the minimization objective function for generator G_1 is:

\min_{G_1} \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G_1(z)))] + \mathbb{E}_{z \sim p_z}[f(D(G_2(z)) - D(G_1(z)))]

while the minimization objective function for generator G_2 is:

\min_{G_2} \; \mathbb{E}_{z \sim p_z}[\log(1 - D(G_2(z)))] + \mathbb{E}_{z \sim p_z}[f(D(G_1(z)) - D(G_2(z)))]

where f(x) = \max(-x, 0), so that the generations of G_2 are pushed to score better than those of G_1, and vice versa for G_2's objective.
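The two objectives differ only in how the discriminator score difference between the generators is penalized. As a sketch, one plausible choice of f is a hinge on that score difference (an assumption for illustration; the paper's exact f is not reproduced here):

```python
import numpy as np

def competing_loss_g1(d_g1, d_g2):
    """Competing objective for G1 (sketch): standard GAN term plus a hinge
    penalty whenever G2's samples score higher at the discriminator."""
    gan_term = np.mean(np.log(1.0 - d_g1))
    f_term = np.mean(np.maximum(d_g2 - d_g1, 0.0))  # f(x) = max(x, 0)
    return gan_term + f_term

def conceding_loss_g1(d_g1, d_g2):
    """Conceding objective for G1 (sketch): penalized whenever its own samples
    score higher, nudging the discriminator scores of G2 ahead of its own."""
    gan_term = np.mean(np.log(1.0 - d_g1))
    f_term = np.mean(np.maximum(d_g1 - d_g2, 0.0))  # f(x) = max(-x, 0)
    return gan_term + f_term
```

When G1's samples score higher than G2's, the competing penalty vanishes while the conceding penalty grows, which is exactly the intended asymmetry between the two objectives.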
Our message passing model is based on the principle that the messages passed between the two generators make them explore different subspaces of the image manifold, and also provide better training for the discriminator as a regularizer, by exposing the discriminator to different types of images.
Message Passing Model
Each generator generates images conditioned on the message it receives from the other generator and on noise sampled from the noise distribution. After both generators have generated their respective images, a common message generator with shared parameters takes each image as input and produces a message; the message generated from each generator's image is passed to the other generator in the next iteration. We also experimented with an individual message generator for each generator, but a common message generator works better: since the messages are transferred between the two generators, meaningful messages can only be produced if the same network can gauge the generations of both.
The minimization objective function for generator G_1 is

\min_{G_1} \; \mathbb{E}_{x}[\log(1 - D(G_1(x)))]

where x is composed of noise z_1 obtained from the distribution p_{z_1} and the message m_1 passed by G_2. The message is initialized from the same distribution as the noise; m_1 is the message for generator G_1 created by the shared message generator (from G_2's image) in the previous iteration.
Similarly, the minimization objective function for generator G_2 is

\min_{G_2} \; \mathbb{E}_{x}[\log(1 - D(G_2(x)))]
The discriminator is trained such that the generations of both G_1 and G_2 are labeled as fake.
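One iteration of the message exchange described above can be sketched as follows, with single-layer numpy stand-ins for the networks (the real models are deep convolutional networks, and the discriminator and gradient updates are omitted; the 50-d message matches the dimension reported in the experimental section, while the other sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
MSG_DIM, NOISE_DIM, IMG_DIM = 50, 100, 64

def generator(noise, message, w):   # stand-in for G_i: input is noise + message
    return np.tanh(w @ np.concatenate([noise, message]))

def message_generator(image, w_m):  # shared message network with one set of weights
    return np.tanh(w_m @ image)

w1 = rng.normal(size=(IMG_DIM, NOISE_DIM + MSG_DIM)) * 0.1
w2 = rng.normal(size=(IMG_DIM, NOISE_DIM + MSG_DIM)) * 0.1
w_m = rng.normal(size=(MSG_DIM, IMG_DIM)) * 0.1

# Messages are initialized from the same distribution as the noise.
msg_to_g1 = rng.normal(size=MSG_DIM)
msg_to_g2 = rng.normal(size=MSG_DIM)

for step in range(3):
    z1, z2 = rng.normal(size=NOISE_DIM), rng.normal(size=NOISE_DIM)
    img1 = generator(z1, msg_to_g1, w1)
    img2 = generator(z2, msg_to_g2, w2)
    # The shared message generator encodes each image; each encoding is
    # delivered to the OTHER generator on the next iteration.
    msg_to_g1, msg_to_g2 = message_generator(img2, w_m), message_generator(img1, w_m)
```

The crossed assignment in the last line is the crux: each generator's next input carries a summary of what its counterpart just produced.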
Conditioned Message Passing Model
Each generated image is passed to the message generator, which creates an output. This output, along with the generator's input, is encoded using a multi-layer perceptron (the encoder) to create the message. Since the message is conditioned on both the generation and the input of the generator, the encoder can create much better messages, as it knows which factors led to the generation.
The objective being minimized by generator G_1 is

\min_{G_1} \; \mathbb{E}_{x}[\log(1 - D(G_1(x)))]

where x is composed of noise z_1 obtained from the distribution p_{z_1} and the message m_1 passed by G_2; the message is initialized from the same distribution as the noise. Analogously, the objective being minimized by generator G_2 is

\min_{G_2} \; \mathbb{E}_{x}[\log(1 - D(G_2(x)))]
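The conditioned message can be sketched as an encoder over both the message generator's output and the input that produced the image (the single-layer encoder and the feature sizes are illustrative assumptions; the paper uses a multi-layer perceptron):

```python
import numpy as np

rng = np.random.default_rng(1)
MSG_DIM, IN_DIM, FEAT_DIM = 50, 150, 64  # message, generator input, image features

def conditioned_message(img_feats, gen_input, w_enc):
    """Encoder (sketched as one layer): conditions the outgoing message on the
    message generator's encoding of the image AND the noise-plus-message input
    the generator used, so the message reflects what led to the generation."""
    return np.tanh(w_enc @ np.concatenate([img_feats, gen_input]))

w_enc = rng.normal(size=(MSG_DIM, FEAT_DIM + IN_DIM)) * 0.1
msg = conditioned_message(rng.normal(size=FEAT_DIM), rng.normal(size=IN_DIM), w_enc)
```

Compared with the plain message passing model, the only structural change is the extra `gen_input` argument entering the encoder.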
The plain message passing model is oblivious to the input the generator received in order to generate its images and hence does not give good generations; the conditioned message generation gives much better generations because the messages are conditioned on both the input and the output of the generators.
We consider three different bottlenecks in order to force the messages to be meaningful:
Different Noise Distributions
The noise z_1 and z_2 that the generators G_1 and G_2 receive are sampled from different distributions. The principle behind this bottleneck is that each generator can master the modes of its own noise distribution, and additionally that the messages are forced to differ from merely mirroring the trivial noise distribution they started with. More concretely, Normal(0,1) and Uniform(-1,1) were used for training the pair of generators.
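This bottleneck amounts to nothing more than drawing the two noise batches from different samplers (batch size and noise dimensionality here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
z1 = rng.normal(0.0, 1.0, size=(64, 100))    # noise for G1: Normal(0, 1)
z2 = rng.uniform(-1.0, 1.0, size=(64, 100))  # noise for G2: Uniform(-1, 1)
# Uniform(-1, 1) samples are bounded while Normal(0, 1) samples are not, so the
# two generators start from measurably different input statistics.
```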
In order for the generators to co-operate, pass meaningful messages, and make each other better, we provide a model in which each generator's objective function tries to make the other generator's generations get better scores from the discriminator, passing messages accordingly.
The objective function minimized by generator G_1 is:

\min_{G_1} \; \mathbb{E}[\log(1 - D(G_1(x_1)))] + \mathbb{E}[f(D(G_2(x_2)) - D(G_1(x_1)))], \quad f(x) = \max(-x, 0)

where x_i is composed of generator G_i's noise and incoming message, and the analogous objective holds for generator G_2.
In order to see the effects of the competing objective combined with the message passing model, and whether rogue messages are passed so that a generator can obtain better scores for its own generations from the discriminator, we provide a model of message passing GANs that compete with each other while passing messages.
The structure is the same as the conditioned message passing model. The objective function minimized by generator G_1 is:

\min_{G_1} \; \mathbb{E}[\log(1 - D(G_1(x_1)))] + \mathbb{E}[f(D(G_2(x_2)) - D(G_1(x_1)))], \quad f(x) = \max(x, 0)

and the analogous objective holds for generator G_2.
In the simplest version (when the generators do not pass messages), the objective function maximized by the discriminator is:

\max_{D} \; \mathbb{E}_{x \sim p_{\mathrm{data}}}[\log D(x)] + \mathbb{E}_{z_1 \sim p_{z_1}}[\log(1 - D(G_1(z_1)))] + \mathbb{E}_{z_2 \sim p_{z_2}}[\log(1 - D(G_2(z_2)))]
When the generators pass messages, only the inputs of the generators change to include the messages as well.
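Since both generators' outputs are labeled fake, the quantity the discriminator maximizes over a batch can be sketched directly:

```python
import numpy as np

def discriminator_objective(d_real, d_fake_g1, d_fake_g2):
    """Quantity the discriminator maximizes in the no-message setting:
    log D on real data plus log(1 - D) on samples from BOTH generators,
    i.e. both generators' outputs are treated as fake."""
    return (np.mean(np.log(d_real))
            + np.mean(np.log(1.0 - d_fake_g1))
            + np.mean(np.log(1.0 - d_fake_g2)))
```

A discriminator that confidently separates real from both sets of fakes scores higher than one that outputs 0.5 everywhere, as expected.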
4 Experimental Setup
Model Architecture Details
The generator and discriminator architectures were unaltered from radford2015unsupervised; the only change was the introduction of the message generator, which has an almost identical architecture to the discriminator, except that the number of filters of the final output is changed to the message dimension. After extensive experimentation with different message dimensions, the best results were produced with a message dimension of 50. The experiments performed are:
The representation of the image obtained by passing the real images through the discriminator, as employed by radford2015unsupervised, was used alongside a novel feature representation enabled by our formulation of the message generator. An interesting aspect of the message generator is that it never sees the real images; it only sees the images generated by the two generators, and yet its feature representation still gives interesting results. The dataset used for the classification experiments is the Street View House Numbers dataset (goodfellow2013multi), which was used by radford2015unsupervised and salimans2016improved to evaluate their techniques. Ablation studies were performed to identify the benefits of the discriminator representation and the message representation individually.
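The combined "Msg+Disc" representation used in the ablation can be sketched as a plain feature concatenation before fitting any off-the-shelf classifier (the feature dimensions below are illustrative assumptions, not the paper's):

```python
import numpy as np

def msg_plus_disc_representation(disc_feats, msg_feats):
    """'Msg+Disc' representation: concatenate the discriminator's features with
    the message generator's features for each image before classification."""
    return np.concatenate([disc_feats, msg_feats], axis=1)

# e.g. 4 images, 512-d discriminator features, 50-d message features
combined = msg_plus_disc_representation(np.zeros((4, 512)), np.zeros((4, 50)))
```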
The celebrity faces dataset (liu2015faceattributes) was used to partition the faces by hair type into five categories: bald, black, brown, blond, and gray. The images in these partitions were passed through the message generator to obtain a representation of each image. The representations were then reduced to two dimensions using t-SNE and plotted with different colors. Somewhat meaningful clusters emerge from this exercise.
With the introduction of the message passing mechanism, we visualize the manifold learnt by the pair of generators by varying either the messages or the noise.
Message interpolation: keeping the noise constant between the two generators, we can understand the structure of the learnt messages by varying the message used for generation.
Noise interpolation: keeping the messages constant between the two generators, we can understand the impact of the noise by interpolating between the two noise vectors.
A very interesting insight emerged: interpolating between the messages changed the major content of the image, while interpolating between the noise produced texture changes. This phenomenon is elucidated further in the results and analysis section.
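Both visualization experiments reduce to walking a straight line in one input while freezing the other. A minimal sketch (dimensions are illustrative; the 50-d message matches the reported message dimension):

```python
import numpy as np

def interpolate(a, b, steps=8):
    """Linearly interpolate between two vectors (two messages or two noises)."""
    alphas = np.linspace(0.0, 1.0, steps)[:, None]
    return (1.0 - alphas) * a + alphas * b

rng = np.random.default_rng(0)
z = rng.normal(size=100)                            # fixed noise vector
m1, m2 = rng.normal(size=50), rng.normal(size=50)   # two 50-d messages
# Message interpolation: feed each row below to the generator with z held fixed.
# Noise interpolation swaps the roles: fix the message, interpolate the noise.
message_path = interpolate(m1, m2)
```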
5 Results and Analysis
Classification Results on SVHN
As shown in Table 1, all of the models' discriminator representations improved over the discriminator representation of DCGAN (radford2015unsupervised), showing that the proposed models regularize the training of the discriminator. The non-trivial accuracy obtained by the message representation, which never saw the real images, is an interesting phenomenon, and the improvement when the message features are used alongside the discriminator features shows that the message representation learns complementary features that help the overall classification task. The conceding objective performs better than the competing objective in the absence of message passing, but lags behind it once message passing is introduced. Plain message passing does not perform as well as conditioned message passing in the experiment where the generators receive noise from different distributions; hence the remaining message passing experiments were conducted with the conditioned message generator architecture.
| Model | Discriminator Rep | Message Rep | Msg+Disc Rep |
|---|---|---|---|
| Improved GANs (salimans2016improved) | 8.11 ± 1.3% | NA | NA |
| Different Noise MP | 20.1% | 53.48% | 18.7% |
| Different Noise CMP | 17.1% | 54.21% | 15.2% |
As described in the experimental section, clustering was performed on the messages and visualized using t-SNE in a 2-D space. Some clusters emerge from the messages based on the disjoint division of hair styles. As evident from Fig. 6, the messages for the bald style separate completely from the rest; black and brown, being somewhat subjective, are similar in the message space, although some completely pure clusters of black hair emerge. Gray hair also separates quite clearly from the rest.
Interpolation results for competing objective.
Interpolation results for conceding objective.
We consider three bottlenecks:
Different Noise Distribution
As evident from Table 1, in the case of different noise distributions the conditioned variant performs better, so we consider only conditioned message passing for the next two bottlenecks:
Interpolation results for competing objective with conditioned message.
Interpolation results for conceding objective with conditioned message.
We presented several novel architectures and objectives for training multi-agent GANs, along with bottlenecks such as the generators receiving noise from different distributions, competing generators that compete with each other, and conceding generators that encourage the other generator to perform better than themselves. As evident from the experiments, the models learn meaningful representations. The introduced architecture regularizes the training of the discriminator, as evident from the discriminator's improved results. The representations obtained from the message generator are valuable in themselves, as evident from the high accuracy obtained from a representation that was never shown the real images.