Transforming the output of GANs by fine-tuning them with features from different datasets

by Terence Broad, et al.
Goldsmiths' College

In this work we present a method for fine-tuning pre-trained GANs with features from different datasets, transforming the output distribution into a new distribution with novel characteristics. The weights of the generator are updated using the weighted sum of the losses from a cross-dataset classifier and the frozen weights of the pre-trained discriminator. We discuss details of the technical implementation and share some of the visual results from this training process.






1 Introduction

The motivation for this work was to find a way of transforming a generative model trained on one distribution so that it outputs a completely new distribution of images, one that does not model any existing dataset. We approached this by taking the generator from a pre-trained generative adversarial network (GAN) Goodfellow et al. (2014) trained on one dataset (in this case ImageNet Deng et al. (2009)) and fine-tuning it with features from another dataset, using a classifier trained on data from both datasets.

With this approach we hoped not simply to model the distribution of images in the new dataset, but to transform the generator so it outputs a new distribution of images that fuses visual features from both datasets, resulting in a distribution with novel characteristics. By starting from a pre-trained model with good initial weights, we hoped to preserve some aspects of the original distribution, such as the spatial structure of the images, while instilling it with new characteristics from the other dataset.

2 Method

We created a dataset of approximately 14k images from Pinterest boards with the title a e s t h e t i c (see Figure 2 in the Appendix for samples). Images from these boards can usually be characterised by distinct, washed-out colour palettes (often with only one dominant colour in the image), and the photographs are often framed with no particular subject in focus.

We trained a binary classifier to distinguish the a e s t h e t i c images from images in the ImageNet dataset Deng et al. (2009). (We also trained classifiers for other datasets with prominent aesthetic characteristics, but for brevity we discuss only the results from fine-tuning with the classifiers trained on the a e s t h e t i c dataset.) To train the classifier we fine-tuned a pre-trained ResNet He et al. (2016) model that had been weakly supervised on Instagram hashtags and then trained on ImageNet Mahajan et al. (2018). In addition to training the classifier to treat a e s t h e t i c images and ImageNet images as separate classes (contrastive features), we also, initially by accident, trained a classifier that assigns them to the same class (joint features), which led to significantly better results when used for fine-tuning the generator (see Section 3 for further discussion).
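The classifier fine-tuning step can be sketched as a standard binary fine-tuning loop. This is a minimal, runnable sketch only: the tiny CNN below stands in for the Instagram-hashtag pre-trained ResNet used in the paper, and the labelling convention is an assumption about how the contrastive and joint variants might be encoded.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in backbone: the paper fine-tunes a ResNet pre-trained on
# Instagram hashtags and then ImageNet (Mahajan et al., 2018); a tiny
# CNN keeps this sketch self-contained and runnable.
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
)
model = nn.Sequential(backbone, nn.Linear(8, 1))  # single binary logit

opt = torch.optim.Adam(model.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def finetune_step(images, labels):
    # 'Contrastive features': aesthetic = 1, ImageNet = 0.
    # 'Joint features': both datasets would share the same target label.
    opt.zero_grad()
    loss = loss_fn(model(images).squeeze(1), labels)
    loss.backward()
    opt.step()
    return loss.item()

# Toy batch: four random images, two labelled per dataset.
images = torch.randn(4, 3, 64, 64)
labels = torch.tensor([1.0, 1.0, 0.0, 0.0])
loss_value = finetune_step(images, labels)
```

In practice only the batches of real images and their dataset labels change between the contrastive and joint variants; the optimisation loop is identical.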

After training the cross-dataset classifier, we used this model to fine-tune the weights of a pre-trained BigGAN Brock et al. (2019) generator trained on the ImageNet dataset at a resolution of 128×128 pixels. (For this we used ‘The author’s officially unofficial PyTorch BigGAN implementation’, and we thank the authors of the repository, Andrew Brock and Alex Andonian, for releasing the model weights for the discriminator as well as the generator, without which this work would not have been possible.) We also used the frozen weights of the discriminator in the fine-tuning procedure, updating the weights of the generator based on a weighted sum of the losses from the discriminator and the cross-dataset classifier (see Figure 1 for details). During this fine-tuning process, the networks are not exposed to any new training data; all samples and losses are produced using only the pre-trained networks.
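The fine-tuning procedure can be sketched as follows. The tiny linear networks, loss weights, and the non-saturating BCE losses below are illustrative assumptions standing in for the actual BigGAN generator/discriminator and ResNet classifier; only the generator's parameters receive gradients.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny stand-ins for the pre-trained BigGAN generator/discriminator
# and the cross-dataset classifier; sizes are illustrative only.
z_dim, img_dim = 16, 3 * 8 * 8
G = nn.Sequential(nn.Linear(z_dim, 32), nn.ReLU(), nn.Linear(32, img_dim))
D = nn.Linear(img_dim, 1)  # frozen discriminator (single logit)
C = nn.Linear(img_dim, 1)  # frozen cross-dataset classifier

for net in (D, C):          # freeze everything except the generator
    for p in net.parameters():
        p.requires_grad_(False)

opt = torch.optim.Adam(G.parameters(), lr=2e-4)
w_disc, w_cls = 1.0, 1.0    # loss weights, tuned manually

def finetune_step(batch_size=9):
    opt.zero_grad()
    fake = G(torch.randn(batch_size, z_dim))
    target = torch.ones(batch_size, 1)
    # The generator is pushed to look "real" to the frozen
    # discriminator and to score highly under the classifier.
    loss_d = F.binary_cross_entropy_with_logits(D(fake), target)
    loss_c = F.binary_cross_entropy_with_logits(C(fake), target)
    loss = w_disc * loss_d + w_cls * loss_c
    loss.backward()
    opt.step()
    return loss.item()

losses = [finetune_step() for _ in range(20)]
```

Note that no real images appear anywhere in the loop: every batch is sampled from the generator itself, consistent with the statement that all samples and losses come from the pre-trained networks alone.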

Training converges very rapidly. Usually within 1000 iterations (using a batch size of 9) the generator has converged to a configuration of weights that satisfies both the cross-dataset classifier and the discriminator. However, we found that the best results were achieved with early stopping; the most interesting visual results often occurred when training was stopped after 300-600 iterations. Because training is so quick, it is trivial to try multiple configurations of the loss weighting and manually compare the visual results.
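Because each run is cheap, the manual search described above amounts to a small grid over loss weightings and stopping points. The sketch below is hypothetical: `run_finetuning` is a placeholder for a short fine-tuning job, and the particular weight values are assumptions, not the paper's settings.

```python
import itertools

# Placeholder for a short fine-tuning run that returns a checkpoint
# identifier for later visual inspection (not the paper's code).
def run_finetuning(w_disc, w_cls, max_iters):
    return {"w_disc": w_disc, "w_cls": w_cls, "iters": max_iters}

weights = [0.5, 1.0, 2.0]                  # assumed candidate weights
checkpoints = [
    run_finetuning(w_d, w_c, iters)
    for w_d, w_c in itertools.product(weights, repeat=2)
    for iters in (300, 600)                # early-stopping points
]
# 9 weight configurations x 2 stopping points = 18 candidates
```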


Figure 1: Diagram of training process: Batches of images are sampled from the pre-trained generator, which are fed to the cross-dataset classifier and the pre-trained discriminator (both of which have their weights frozen). The weights of the generator are updated based on a weighted sum of the losses from the classifier and discriminator.

3 Discussion and Conclusion

In the course of this work we happened upon a number of surprising results. The first is the manner in which features from the different datasets get combined, which was highly unexpected: fine-tuning with neither the contrastive features classifier nor the joint features classifier produced images that resemble those in either the ImageNet or a e s t h e t i c datasets.

The second surprising result is that when fine-tuning with the joint features classifier, the visual results were much richer and more varied (almost dreamlike in nature) than the results from fine-tuning with the contrastive features classifier (see Figures 4 and 5 in the Appendix for a detailed comparison). We speculate that the contrastive features classifier discards many important features from the ImageNet distribution, so when the generator is fine-tuned there are fewer combinations of features available and the resulting distribution has far less variety.

In future research, we hope to find ways to control which characteristics from the different datasets get combined in the fine-tuning process, be they characteristics relating to aesthetic qualities, the structure and form of the images, or the stylistic qualities of a given dataset. We also hope to apply these techniques to higher-resolution GAN models; however, without access to pre-trained discriminators, it is currently not possible to apply them to the publicly available higher-resolution generative models without retraining those models from scratch.


Acknowledgements

This work has been supported by the UK’s EPSRC Centre for Doctoral Training in Intelligent Games and Game Intelligence (IGGI; grant EP/L015846/1).


References

  • [1] A. Brock, J. Donahue, and K. Simonyan (2019) Large scale GAN training for high fidelity natural image synthesis. In International Conference on Learning Representations. Cited by: §2.
  • [2] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei (2009) ImageNet: a large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. Cited by: §1, §2.
  • [3] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio (2014) Generative adversarial nets. In Advances in Neural Information Processing Systems, pp. 2672–2680. Cited by: §1.
  • [4] K. He, X. Zhang, S. Ren, and J. Sun (2016) Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778. Cited by: §2.
  • [5] D. Mahajan, R. Girshick, V. Ramanathan, K. He, M. Paluri, Y. Li, A. Bharambe, and L. van der Maaten (2018) Exploring the limits of weakly supervised pretraining. In Proceedings of the European Conference on Computer Vision (ECCV), pp. 181–196. Cited by: §2.