AuthorGAN: Improving GAN Reproducibility using a Modular GAN Framework

11/26/2019 · by Raunak Sinha, et al. · IBM, IIIT Delhi

Generative models are becoming increasingly popular in the literature, with Generative Adversarial Networks (GAN) being the most successful variant to date. With this increasing demand and popularity, it is becoming equally challenging to implement and consume GAN models. A qualitative user survey conducted across 47 practitioners shows that expert-level skill is required to use a GAN model for a given task, despite the presence of various open-source libraries. In this research, we propose a novel system called AuthorGAN, aiming to achieve true democratization of GAN authoring. A highly modularized, library-agnostic representation of GAN models is defined to enable interoperability of GAN architectures across different libraries such as Keras, TensorFlow, and PyTorch. An intuitive drag-and-drop visual designer is built using the Node-RED platform to enable custom architecture design without the need for writing any code. Five different GAN models are implemented as part of this framework, and their performance is shown using the benchmark MNIST dataset.







1 Introduction

Automated generative models have progressed substantially over the last couple of years Goodfellow (2016). Many variants of generative models have been proposed and discussed in the literature, such as Gaussian Mixture Models (GMM) Reynolds (2015), Hidden Markov Models (HMM) Rabiner and Juang (1986); Sonnhammer et al. (1998), Latent Dirichlet Allocation (LDA) Blei et al. (2003), and Restricted Boltzmann Machines (RBM) Smolensky (1986); Hinton (2012). A recent successful formulation of generative models is the Generative Adversarial Network (GAN). GAN models contain a generative module and a discriminative module competing with each other. When equilibrium between these two learning modules is achieved, the generator has approximated the distribution of the given data.

The initial research on GANs was published in 2014 Goodfellow et al. (2014), and the field has rapidly grown in sophistication and complexity ever since, with a large number of named GANs in the literature. Currently, research works such as Progressive GAN Karras et al. (2017) can generate high-resolution, almost real-looking images from nothing but random numbers as input. With the advent of so many GAN models, reproducibility and easy consumability become a challenge. "How easy is it to use a known GAN model for my task?" and "What are the coding and technical prerequisites required to work on GANs?" are two of the many open challenges in using GAN models. Despite the presence of GAN zoos such as TFGAN, Keras-GAN, and PyTorch-GAN, building custom GAN models is challenging for the following reasons: (i) implementing from scratch requires expert-level understanding of libraries such as PyTorch, Keras, or TensorFlow; (ii) modifying an existing GAN is challenging, as the entire GAN model is tightly coupled at the code level, and modifying a small component such as only the generator, the discriminator, or a loss function requires tinkering with the entire code; (iii) mix-and-match of different GAN components is not possible, as there is no universal GAN architecture and models implemented in different libraries are not interoperable with each other. Due to these challenges, there is a huge learning curve for software engineers to start implementing and experimenting with GAN models. Researchers also struggle to reproduce and compare results of different GAN models for their research papers.

Authoring GAN models is a frustrating and time-consuming experience for both novice and expert users. There is a need for an easy-to-use authoring system that is agnostic of the underlying library or language and gives users a very intuitive interface to experiment with GAN models. Thus, in this paper we propose AuthorGAN (the AuthorGAN system will be made available as open source upon the acceptance of this paper), a GAN authoring system with the following research contributions:

  1. A library-agnostic, abstract, modularized representation of GAN models that breaks the GAN architecture into reusable components,

  2. A highly extensible, no-code system implementing GAN models, with an intuitive visual interface for authoring GAN models from scratch without the need for writing code in any library. The toolkit is available here:

  3. A qualitative user survey detailing the challenges faced in developing and implementing GAN models by different kinds of users,

  4. Experimental results quantifying the performance of the different GAN models developed using our system on the benchmark MNIST dataset.

2 Preliminary User Study

Figure 1: (On the left) The distribution of the participants based on their expertise with GAN models in the qualitative user survey conducted. (On the right) The different challenges faced by the user survey participants in authoring and consuming GAN models.

The initial step is to understand the challenges involved in developing and consuming GAN models from the different kinds of users. A survey was conducted across developers, software engineers, and researchers from various organizations and academic institutions. A majority of the participants were male, with a very diverse age distribution. The survey was conducted among participants who had at least some knowledge of GANs, a fraction of whom rated themselves as experts, as shown in Figure 1, and who had hands-on experience with GANs. It was quite interesting to note that the largest share of participants learnt about GANs from research papers, while fewer learnt from basic course material and tutorials. In fact, in free-text feedback, many participants requested a MOOC or a structured course for learning GANs.

An overwhelming majority of the participants expressed that they faced challenges. As shown in Figure 1, of the various challenges expressed, the most popular one was "Coding is very difficult", while other challenges included "Enough data is not available" and "I do not understand GAN". This motivates the need for a no-coding system for developing GAN models, as developers and researchers find it non-intuitive to code GAN models with minimal learning. Additionally, the system should be intuitive to use, with detailed documentation for consumers who find it difficult to understand GANs. It is also interesting to note that even participants who rated themselves as "competent" or "expert" voted that it was not easy for them to code a GAN model, or that they eventually gave up. Some of the free-text feedback that the participants provided is shown here:

  • "An easy to use interface in which we can specify model parameters and architecture quickly without much coding."

  • "Easy to use Interface to stitch components of GAN’s with easy to edit logic/parameters"

  • "Easily and quickly change the architecture and hyperparameters to experiment models."

  • "Given dimensionality/ modality of the input and the output, if the system could predict the configuration of GAN to be used, it would be really great!"

Thus, there is a need for an intuitive system that could democratize GAN authoring and make GANs consumable by users of different technical prowess.

Figure 2: The different mix-and-match GAN models that could be implemented using our modular GAN framework.

3 System Architecture

3.1 Modularization of GAN Architecture

After studying different named GAN variants, we identified that a standard GAN model contains six different modules, as follows:

  1. Real training data: This is the input data whose distribution is to be learnt and regenerated. Some examples include MNIST dataset, CIFAR-10 dataset, or movie review text dataset.

  2. Generator:

    It is a generative function that takes a random noise vector (latent space) as input and generates an output that is of the same modality and dimensions of the real training data. Some examples include Deconvolutional network, or recurrent neural network.

  3. Discriminator:

    This is typically a classifier which learns from the real training data and the data generated by the generator (fake data). The primary task of the discriminator is to distinguish between the real data and the fake data. Some examples include a multi-layer perceptron, a convolutional neural network, or regression.

  4. Loss function: Two different loss functions are used: one for computing the generator's loss and the other for the discriminator's. Some standard loss functions include Euclidean loss, cross-entropy loss, and hinge loss.

  5. Optimizer function:

    Two different optimizers are used to learn the generator and the discriminator, separately. Some standard optimizers are SGD and RMSProp.

  6. Training process:

    If both the generator and the discriminator adopt a network-based architecture, standard back-propagation techniques can be used for model updates. However, certain GANs require more sophisticated training processes, such as reinforcement learning. The overall learning process and its parameters are detailed in this module.
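The six modules above can be captured in a minimal, library-agnostic interface. The Python sketch below is purely illustrative: the class and field names are our assumptions, not the actual AuthorGAN API.

```python
from dataclasses import dataclass
from typing import Any, Callable, Iterable

@dataclass
class GANSpec:
    """Hypothetical container mirroring the six GAN modules above."""
    data: Iterable[Any]              # 1. real training data (e.g. MNIST)
    generator: Callable              # 2. maps a noise vector to a fake sample
    discriminator: Callable          # 3. scores a sample as real or fake
    g_loss: Callable                 # 4a. generator loss function
    d_loss: Callable                 # 4b. discriminator loss function
    g_optimizer: str                 # 5a. e.g. "SGD"
    d_optimizer: str                 # 5b. e.g. "RMSProp"
    train_process: str = "backprop"  # 6. overall training procedure

def describe(spec: GANSpec) -> str:
    """Summarize a GAN configuration for logging."""
    return f"{spec.g_optimizer}/{spec.d_optimizer} via {spec.train_process}"
```

Because each module is an independent field, swapping one component (say, the discriminator) leaves the rest of the specification untouched.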

The primary idea of GAN modularization is that the different modules could be combined seamlessly and a mix-and-match GAN architecture could be defined easily. For example, the generator of the DCGAN can be combined with the discriminator of a WGAN with the loss functions and optimizers from a CGAN to build a novel GAN architecture.

3.2 Library Agnostic Abstract GAN Representation

Consider the example of the popular Deep Convolutional GAN (DCGAN) model, as shown in Figure 2. A from-scratch PyTorch or TensorFlow implementation of the model requires a substantial amount of code, along with the expertise needed for model customization. However, we propose a simple JSON representation for defining a GAN model, extending the modules explained in the previous section. The most simplistic realization of the DCGAN architecture is shown below:
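A minimal sketch of such a configuration, with illustrative field names (the exact schema used by AuthorGAN may differ):

```json
{
  "GAN_model": {
    "data": {"name": "MNIST"},
    "generator": {"choice": "DCGAN"},
    "discriminator": {"choice": "DCGAN"},
    "loss": {"choice": "cross-entropy"},
    "optimizer": {"choice": "Adam", "learning_rate": 0.0002},
    "training": {"epochs": 25, "batch_size": 64}
  }
}
```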


Providing this configuration file as the only input to our AuthorGAN system is enough to train a GAN model and obtain performance results; there is no need to write a single line of code. Thus, defining this JSON object does not require any expertise in Python, PyTorch, or GANs. Additionally, this JSON representation is library agnostic: as shown in Figure 2, multiple library drivers (Keras/PyTorch) can be written to parse the JSON object into the respective static computational graphs. Moreover, the proposed abstract representation provides enough flexibility to define every configurable parameter of the DCGAN model. Thus, the abstract representation not only offers ease of authoring for novice users but also the flexibility to specify detailed parameters for expert users.
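A backend driver of the kind described above essentially dispatches on the `choice` fields of the JSON object. The following is a hypothetical sketch of that dispatch logic; the registry contents and the builder return values are stand-ins for real Keras/PyTorch graph construction.

```python
import json

# Hypothetical registry mapping config choices to builder functions.
# A real driver would build Keras or PyTorch computational graphs here.
REGISTRY = {
    "generator": {
        "DCGAN": lambda cfg: f"deconv-net({cfg.get('layers', 4)} layers)",
    },
    "discriminator": {
        "DCGAN": lambda cfg: "conv-net",
    },
}

def build_from_config(config_text: str) -> dict:
    """Parse a JSON GAN spec and instantiate each module via the registry."""
    spec = json.loads(config_text)["GAN_model"]
    model = {}
    for part in ("generator", "discriminator"):
        choice = spec[part]["choice"]
        model[part] = REGISTRY[part][choice](spec[part])
    return model
```

Adding support for a new library then amounts to supplying a new registry of builders, while the JSON spec stays unchanged.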

4 Visual Authoring

Figure 3: An example illustrating the power of the visual authoring capability of AuthorGAN to design the DCGAN model Radford et al. (2015) from scratch.

The JSON-based GAN model representation offers abstraction over multiple libraries and enables cross-library interoperability. However, true democratization of GAN authoring requires an intuitive authoring interface for entry-level users to adopt GAN models. Thus, as shown in Figure 3, we developed a drag-and-drop interface to author GAN models, built over the powerful open-source Node-RED platform. All the required functionality is available as a palette of nodes, from which users can choose and design an architecture in an intuitive fashion. Each node (function) can be parameterized through the user interface, avoiding the task of creating the abstract JSON object from scratch. The Node-RED based user interface internally generates the JSON object, which is used by the backend library drivers for model creation and training.

Further, customizing GAN models and authoring them from scratch is straightforward using the AuthorGAN user interface. As shown in Figure 3, the basic layers of a neural network architecture are provided in the node palette. The available layers are grouped under seven categories: (i) convolutional layers, (ii) recurrent layers, (iii) core layers, (iv) activation layers, (v) loss layers, (vi) optimization layers, and (vii) normalization layers. GAN architectures can be designed from scratch, and the backend drivers can additionally read the custom generator or discriminator architecture and create the static computational graphs in the respective libraries. Thus, GAN models can be visually authored using the intuitive user interface of the AuthorGAN system. To the best of our knowledge, this is the first visual authoring system for designing and building GAN models.

5 Experimental Analysis

The performance of the different implemented GAN models is shown using the benchmark MNIST handwritten-digit dataset. The task of each GAN is to generate digit images similar to those in the MNIST dataset. We implement five different GAN models using our AuthorGAN framework: Vanilla GAN (GAN) Goodfellow et al. (2014), Conditional GAN (CGAN) Mirza and Osindero (2014), Deep Convolutional GAN (DCGAN) Radford et al. (2015), Wasserstein GAN (WGAN) Arjovsky et al. (2017), and Wasserstein GAN with Gradient Penalty (WGAN-GP) Gulrajani et al. (2017). Additionally, to show the flexibility of the proposed modularized AuthorGAN system, different mix-and-match GAN models are formed and their experimental performance is shown in this section.

A total of 16 different GAN configurations are created, each defined as a (generator, discriminator) pair: 1. GAN, GAN; 2. GAN, DCGAN; 3. GAN, WGAN; 4. GAN, WGAN_GP; 5. WGAN, GAN; 6. WGAN, DCGAN; 7. WGAN, WGAN; 8. WGAN, WGAN_GP; 9. WGAN_GP, GAN; 10. WGAN_GP, DCGAN; 11. WGAN_GP, WGAN; 12. WGAN_GP, WGAN_GP; 13. DCGAN, GAN; 14. DCGAN, DCGAN; 15. DCGAN, WGAN; 16. DCGAN, WGAN_GP.
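The 16 pairings listed above are simply the Cartesian product of the four base models' generators and discriminators, which can be enumerated in a few lines (illustrative code, not part of AuthorGAN):

```python
from itertools import product

# The four base models whose generators and discriminators are mixed.
MODELS = ["GAN", "WGAN", "WGAN_GP", "DCGAN"]

# Every (generator, discriminator) pairing evaluated in the experiments.
configs = list(product(MODELS, MODELS))  # 4 x 4 = 16 configurations
```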

Figure 4: (On the left) Generator loss, (On the middle) Discriminator loss, (On the right) Average training time per epoch in seconds; obtained for different GAN combinations implemented using the proposed system.

To compare the performance of the different configurations, a confusion-matrix-style heatmap of the final generator and discriminator losses is shown in Figure 4. Using other GAN libraries, only the diagonal elements of this matrix could be obtained, while obtaining the off-diagonal elements is not straightforward. This demonstrates both the performance and the flexibility of the AuthorGAN system.

In terms of training time, the average time required to train one epoch (in seconds) is shown in Figure 4. It can be observed that configurations using DCGAN's generator or discriminator train much slower than the other configurations. This benchmark helps compare different GAN models not only in terms of accuracy but also in terms of efficiency.

6 Discussion

As a testimonial to the importance of this problem and the need for easy-to-use systems, there are a few GAN libraries in the literature, such as PyTorch-GAN, TFGAN, and Keras-GAN. These open-source libraries provide collections of existing GAN implementations in their respective frameworks. However, compared to these existing libraries, the proposed AuthorGAN system provides the following advantages:

  1. A highly modularized representation of GAN models for easy mix-and-match of components across architectures. For instance, one can use the generator component from DCGAN and the discriminator component from CGAN, with the training process of WGAN. While this could be done in other libraries, it is typically coding intensive and requires expert-level knowledge.

  2. An abstract representation of GAN architecture to provide multi-library support. Currently, we provide backend PyTorch and Keras support for the JSON object, while in the future we plan to support TensorFlow, Caffe, and other libraries as well. Thus, the abstract representation is library agnostic.

  3. Coding free, visual designing of GAN models. A highly intuitive drag-and-drop based visual designer is provided to author GANs and there is no need for writing any code to train the GAN model.

During the process of building this authoring system, there were a few interesting observations that we made and are discussed as follows:

  1. The quantitative user survey additionally points out that evaluation metrics for GAN models and transfer learning to speed up training are not widely known among GAN users. Thus, successive versions of the system should focus on these aspects.

  2. The participants of the user survey showed equal interest in using three different libraries: PyTorch, Keras, and TensorFlow. This suggests a requirement for stronger knitting between these three libraries, and the proposed AuthorGAN system could act as an entry point to models in any of them.

  3. It is to be understood that this is a continuously evolving system. Its usability will improve as more state-of-the-art GAN models are added. Thus, we plan to make the whole system and framework open source, to benefit the entire GAN community.

7 Conclusion and Future Directions

Thus, in this research, the AuthorGAN system is proposed and developed to achieve true democratization of authoring GAN models. A highly modular and abstract GAN representation was defined to allow interoperability across different dimensions. An easy-to-use, intuitive visual designer was developed to allow novice users to construct custom GAN architectures. Benchmark results of the various GAN models implemented using our system were shown on the MNIST dataset.

As immediate extensions to this system, the following future directions are identified: (i) implement more GAN models as part of this system by modularizing every GAN architecture, (ii) extend the framework to support different training processes such as reinforcement learning, and (iii) support other data types such as text and speech to increase the scope of usage.


  • M. Arjovsky, S. Chintala, and L. Bottou (2017) Wasserstein gan. arXiv preprint arXiv:1701.07875. Cited by: §5.
  • D. M. Blei, A. Y. Ng, and M. I. Jordan (2003) Latent dirichlet allocation. Journal of Machine Learning Research 3 (Jan), pp. 993–1022. Cited by: §1.
  • I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in neural information processing systems, pp. 2672–2680. Cited by: §1, §5.
  • I. Goodfellow (2016) NIPS 2016 tutorial: generative adversarial networks. arXiv preprint arXiv:1701.00160. Cited by: §1.
  • I. Gulrajani, F. Ahmed, M. Arjovsky, V. Dumoulin, and A. C. Courville (2017) Improved training of wasserstein gans. In Advances in Neural Information Processing Systems, pp. 5767–5777. Cited by: §5.
  • G. E. Hinton (2012) A practical guide to training restricted boltzmann machines. In Neural networks: Tricks of the trade, pp. 599–619. Cited by: §1.
  • T. Karras, T. Aila, S. Laine, and J. Lehtinen (2017) Progressive growing of GANs for improved quality, stability, and variation. arXiv preprint. Cited by: §1.
  • M. Mirza and S. Osindero (2014) Conditional generative adversarial networks. Manuscript: https://arxiv.org/abs/1709.02023. Cited by: §5.
  • L. R. Rabiner and B. Juang (1986) An introduction to hidden markov models. IEEE ASSP Magazine 3 (1), pp. 4–16. Cited by: §1.
  • A. Radford, L. Metz, and S. Chintala (2015) Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv preprint arXiv:1511.06434. Cited by: Figure 3, §5.
  • D. Reynolds (2015) Gaussian mixture models. Encyclopedia of biometrics, pp. 827–832. Cited by: §1.
  • P. Smolensky (1986) Information processing in dynamical systems: foundations of harmony theory. Technical report, University of Colorado at Boulder, Dept. of Computer Science. Cited by: §1.
  • E. L. Sonnhammer, G. Von Heijne, A. Krogh, et al. (1998) A hidden markov model for predicting transmembrane helices in protein sequences.. In Ismb, Vol. 6, pp. 175–182. Cited by: §1.