Invariant Representations from Adversarially Censored Autoencoders

05/21/2018 ∙ by Ye Wang, et al. ∙ Northeastern University, MERL

We combine conditional variational autoencoders (VAE) with adversarial censoring in order to learn invariant representations that are disentangled from nuisance/sensitive variations. In this method, an adversarial network attempts to recover the nuisance variable from the representation, which the VAE is trained to prevent. Conditioning the decoder on the nuisance variable enables clean separation of the representation, since the representation and the nuisance variable are recombined for model learning and data reconstruction. We show this natural approach is theoretically well-founded with information-theoretic arguments. Experiments demonstrate that this method achieves invariance while preserving model learning performance, and results in visually improved performance for style transfer and generative sampling tasks.




1 Introduction

We consider the problem of learning data representations that are invariant to nuisance variations and/or sensitive features. Such representations could be useful for fair/robust classification zemel2013-fairrep; louppe2016-pivot; xie2017controllable, domain adaptation tzeng2017-AdvDomAdapt; shen2017-AdvRepLearn, privacy preservation hamm2016-minimax; iwasawa2017privacy, and style transfer mathieu2016disentangling. We investigate how this problem can be addressed by extensions of the variational autoencoder (VAE) model introduced by kingma2013-VAE, where a generative model is learned as a pair of neural networks: an encoder that produces a representation $z$ from data $x$, and a decoder that reconstructs the data $x$ from the representation $z$.

A conditional VAE sohn2015learning can be trained while conditioned on the nuisance/sensitive variable $s$ (i.e., the encoder and decoder each have $s$ as an additional input). In principle, this should yield an encoder that extracts representations $z$ that are invariant to $s$, since the corresponding generative model (decoder) implicitly enforces independence between $z$ and $s$. Intuitively, an efficient encoder should learn to exclude information about $s$ from $z$, since $s$ is already provided directly to the decoder. However, as we demonstrate in our experiments, invariance is not sufficiently achieved in practice, possibly due to approximations arising from imperfect optimization and parametric models. The adversarial feature learning approach of edwards2015-censor proposes training an unconditioned autoencoder along with an adversarial network that attempts to recover a binary sensitive variable $s$ from the representation $z$. However, this approach results in a challenging tradeoff between enforcing invariance and preserving enough information in the representation to allow decoder reconstruction and generative model learning.

Our work proposes and investigates the natural combination of adversarial censoring with a conditional VAE, while also generalizing to allow categorical (non-binary) or continuous $s$. Although an adversary is used to enforce invariance between $z$ and $s$, the decoder is given both $z$ and $s$ as inputs, enabling data reconstruction and model learning. This approach disentangles the representation $z$ from the nuisance variations $s$, while still preserving enough information in $z$ to recover the data $x$ when recombined with $s$. In Section 2.2, we present a theoretical interpretation of adversarial censoring as reinforcement of the representation invariance that is already implied by the generative model of a conditional VAE. Our experiments in Section 3 quantitatively and qualitatively show that adversarial censoring of a conditional VAE can achieve representation invariance while limiting degradation of model learning performance. Further, the performance in style transfer and generative sampling tasks appears visually improved by adversarial censoring (see Figures 3 and 4).

1.1 Further Discussion of Related Work

The variational fair autoencoder of louizos2015-VFAE extends the conditional VAE by introducing an invariance-enforcing penalty term based on maximum mean discrepancy (MMD). However, this approach is not readily extensible to non-binary or continuous $s$.

Generative adversarial networks (GAN) and the broader concept of adversarial training were introduced by goodfellow2014GAN. The work of mathieu2016disentangling also combines adversarial training with VAEs to disentangle nuisance variations $s$ from the representation $z$. However, their approach instead attaches the adversary to the output of the decoder, which requires a more complicated training procedure that handles sample triplets and swaps representations, but which also incorporates the learned similarity concept of larsen2016autoencoding. Our approach is much simpler to train, since the adversary is attached to the encoder, directly enforcing representation invariance.

Addressing the problem of learning fair representations zemel2013-fairrep, further work on adversarial feature learning louppe2016-pivot; xie2017controllable; hamm2016-minimax; iwasawa2017privacy; tzeng2017-AdvDomAdapt; shen2017-AdvRepLearn has used adversarial training to learn invariant representations tailored to classification tasks (i.e., in comparison to our work, they replace the decoder with a classifier). Note, however, that in louppe2016-pivot the adversary is instead attached to the output of the classifier. Besides fairness/robustness, domain adaptation tzeng2017-AdvDomAdapt; shen2017-AdvRepLearn and privacy hamm2016-minimax; iwasawa2017privacy are also addressed. By considering invariance in the context of a VAE, our approach instead aims to produce general-purpose representations and does not require additional class labels.

GANs have also been combined with VAEs in many other ways, although not with the aim of producing invariant representations. However, the following concepts could be combined in parallel with adversarial censoring. As mentioned earlier, in larsen2016autoencoding, an adversary attached to the decoder learns a similarity metric to enhance VAE training. In makhzani2015-adversarial; mescheder2017-adversarial, an adversary is used to approximate the Kullback–Leibler (KL)-divergence in the VAE training objective, allowing for more general encoder architectures and latent representation priors. Both donahue2017adversarial and dumoulin2017adversarially independently propose a method to train an autoencoder using an adversary that tries to distinguish between pairs of data samples and extracted representations versus synthetic samples and the latent representations from which they were generated.

2 Formulation

In Section 2.1, we review the formulation of conditional VAEs as developed by kingma2013-VAE; sohn2015learning. Sections 2.2 and 2.3 propose techniques to enforce invariant representations via adversarial censoring and via increased KL-divergence regularization.

Figure 1: Training setup for adversarially censored VAE. The encoder and decoder are trained to maximize the sum of the objective terms (in three dotted boxes), while the adversary is trained to minimize its objective.

2.1 Conditional Variational Autoencoders

The generative model for the data $x$ involves an observed variable $s$ and a latent variable $z$. The nuisance (or sensitive) variations are modeled by $s$, while $z$ captures other remaining information. Since the aim is to extract a latent representation $z$ that is free of the nuisance variations in $s$, these variables are modeled by the joint distribution $p_\theta(x \mid z, s)\,p(z)\,p(s)$, where $z$ and $s$ are explicitly made independent. The generative model $p_\theta(x \mid z, s)$ is from a parametric family of distributions that is appropriate for the data. The latent prior $p(z)$ can be chosen to have a convenient form, such as the standard multivariate normal distribution $\mathcal{N}(0, I)$. No knowledge or assumptions about the nuisance variable prior $p(s)$ are needed, since it is not directly used in the learning procedure.

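The method for learning this model involves maximizing the log-likelihood for a set of training samples $\{(x_i, s_i)\}_{i=1}^{n}$ with respect to the conditional distribution $p_\theta(x \mid s) = \int p_\theta(x \mid z, s)\,p(z)\,dz$, that is,

$$\sum_{i=1}^{n} \log p_\theta(x_i \mid s_i).$$

This objective is analogous to minimizing the KL-divergence between the true conditional distribution of the data and the model, since

$$\mathbb{E}\Big[D_{\mathrm{KL}}\big(p(x \mid S)\,\|\,p_\theta(x \mid S)\big)\Big] = -H(X \mid S) - \mathbb{E}\big[\log p_\theta(X \mid S)\big],$$

where the expectation is with respect to $(X, S) \sim p(x, s)$, the conditional entropy $H(X \mid S)$ does not depend on $\theta$, and $\mathbb{E}[\log p_\theta(X \mid S)]$ can be approximated by $\frac{1}{n}\sum_{i=1}^{n} \log p_\theta(x_i \mid s_i)$.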
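As a quick numerical sanity check of this identity, the sketch below uses small discrete distributions with made-up probabilities (not from the paper) and confirms that the expected KL-divergence equals $-H(X \mid S)$ minus the expected model log-likelihood, so maximizing the latter minimizes the former.

```python
import math

# Hypothetical toy distributions: s in {0, 1}, x in {0, 1, 2}.
p_s = [0.4, 0.6]                                  # p(s)
p_true = [[0.2, 0.5, 0.3], [0.6, 0.1, 0.3]]       # true p(x|s), one row per s
p_model = [[0.3, 0.4, 0.3], [0.5, 0.2, 0.3]]      # model p_theta(x|s)

def expected_kl(p_s, p, q):
    """E_{s ~ p(s)}[ KL( p(x|s) || q(x|s) ) ]."""
    return sum(w * sum(a * math.log(a / b) for a, b in zip(p[s], q[s]))
               for s, w in enumerate(p_s))

def cond_entropy(p_s, p):
    """H(X|S) under the true distribution; it does not depend on the model."""
    return -sum(w * sum(a * math.log(a) for a in p[s]) for s, w in enumerate(p_s))

def expected_loglik(p_s, p, q):
    """E_{(x,s) ~ p}[ log q(x|s) ], estimated in practice by (1/n) sum_i log p_theta(x_i|s_i)."""
    return sum(w * sum(a * math.log(b) for a, b in zip(p[s], q[s]))
               for s, w in enumerate(p_s))

lhs = expected_kl(p_s, p_true, p_model)
rhs = -cond_entropy(p_s, p_true) - expected_loglik(p_s, p_true, p_model)
print(abs(lhs - rhs) < 1e-12)   # → True
```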

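Using a variational posterior $q_\phi(z \mid x, s)$ to approximate the actual posterior $p_\theta(z \mid x, s)$, the log-likelihood can be lower bounded by

$$\log p_\theta(x \mid s) \geq \mathbb{E}\big[\log p_\theta(x \mid z, s)\big] - D_{\mathrm{KL}}\big(q_\phi(z \mid x, s)\,\|\,p(z)\big) =: \mathcal{L}(\theta, \phi; x, s), \qquad (1)$$

where $z \sim q_\phi(z \mid x, s)$ in the expectation. The quantity given by (1) is known as the variational or evidence lower bound (ELBO). The inequality in (1) follows since

$$\log p_\theta(x \mid s) - \mathcal{L}(\theta, \phi; x, s) = D_{\mathrm{KL}}\big(q_\phi(z \mid x, s)\,\|\,p_\theta(z \mid x, s)\big) \geq 0.$$

Thus, by optimizing both $\theta$ and $\phi$ to maximize the lower bound $\sum_{i=1}^{n} \mathcal{L}(\theta, \phi; x_i, s_i)$, while $p_\theta(x \mid s)$ is trained toward the true conditional distribution of the data, $q_\phi(z \mid x, s)$ is trained toward the corresponding posterior $p_\theta(z \mid x, s)$.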
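The bound (1) and its gap can likewise be checked on a toy model. The sketch assumes a discrete latent $z$ with three values and hypothetical probabilities, purely so that $\log p_\theta(x \mid s)$ and the posterior can be computed exactly; the paper itself uses a continuous Gaussian latent.

```python
import math

# Hypothetical toy model for one observed pair (x, s); latent z takes values 0, 1, 2.
prior = [0.5, 0.3, 0.2]        # p(z)
lik = [0.1, 0.6, 0.3]          # p_theta(x | z, s) evaluated at the observed x and s
q = [0.2, 0.5, 0.3]            # an arbitrary variational posterior q_phi(z | x, s)

def elbo(q):
    """(1): E_q[log p_theta(x|z,s)] - KL(q || p(z))."""
    recon = sum(qz * math.log(l) for qz, l in zip(q, lik))
    kl_prior = sum(qz * math.log(qz / pz) for qz, pz in zip(q, prior))
    return recon - kl_prior

log_px = math.log(sum(pz * l for pz, l in zip(prior, lik)))           # log p_theta(x|s)
posterior = [pz * l / math.exp(log_px) for pz, l in zip(prior, lik)]  # p_theta(z|x,s)
kl_post = sum(qz * math.log(qz / pz) for qz, pz in zip(q, posterior))

print(abs((log_px - elbo(q)) - kl_post) < 1e-12)   # → True: the gap is KL(q || posterior)
print(abs(elbo(posterior) - log_px) < 1e-12)       # → True: tight at the true posterior
```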

In the VAE architecture, the generative model $p_\theta(x \mid z, s)$ (decoder) and variational posterior $q_\phi(z \mid x, s)$ (encoder) are realized as neural networks that take as input $(z, s)$ and $(x, s)$, respectively, as illustrated in Figure 1, and output the parameters of their respective distributions. This architecture is specifically a conditional VAE, since the encoding and decoding are conditioned on the nuisance variable $s$.

When the encoder is realized as conditionally Gaussian:

$$q_\phi(z \mid x, s) = \mathcal{N}\big(z;\, \mu_\phi(x, s),\, \Sigma_\phi(x, s)\big), \qquad (2)$$

where the mean vector $\mu_\phi(x, s)$ and diagonal covariance matrix $\Sigma_\phi(x, s)$ are determined as a function of $(x, s)$, and the latent variable distribution $p(z)$ is set to the standard Gaussian $\mathcal{N}(0, I)$, the KL-divergence term in (1) can be analytically derived and differentiated kingma2013-VAE. However, the remaining expectation in (1) must be estimated by sampling.

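Hence, the learning procedure maximizes a sampled approximation of the ELBO, given by

$$\hat{\mathcal{L}}(\theta, \phi) := \sum_{i=1}^{n} \hat{\mathcal{L}}_i(\theta, \phi), \qquad (3)$$

where, for $i = 1, \dots, n$,

$$\hat{\mathcal{L}}_i(\theta, \phi) := \frac{1}{m} \sum_{j=1}^{m} \log p_\theta(x_i \mid z_{i,j}, s_i) - D_{\mathrm{KL}}\big(q_\phi(z \mid x_i, s_i)\,\|\,p(z)\big), \qquad (4)$$

which approximates the expectation in (1) by sampling $z_{i,j} \sim q_\phi(z \mid x_i, s_i)$ for $j = 1, \dots, m$.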
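A minimal sketch of one term $\hat{\mathcal{L}}_i$ of (4). It assumes the encoder outputs log-variances, uses the well-known closed form $\frac{1}{2}\sum_k(\mu_k^2 + \sigma_k^2 - \log\sigma_k^2 - 1)$ for the KL term, and draws samples with the reparameterization $z = \mu + \sigma \odot \epsilon$, $\epsilon \sim \mathcal{N}(0, I)$, from kingma2013-VAE. The `encode` and `decoder_loglik` callables are hypothetical placeholders for the networks; a real implementation would use a deep learning framework so that gradients flow through the sample.

```python
import math
import random

def gaussian_kl_to_standard(mu, logvar):
    """Closed-form KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over dimensions."""
    return 0.5 * sum(mu_k * mu_k + math.exp(lv) - lv - 1.0 for mu_k, lv in zip(mu, logvar))

def sampled_elbo_term(x, s, encode, decoder_loglik, m=1, rng=random):
    """Estimate of one term of (4) for a single training pair (x, s).

    encode(x, s) -> (mu, logvar) of q_phi(z | x, s)   [hypothetical callable]
    decoder_loglik(x, z, s) -> log p_theta(x | z, s)  [hypothetical callable]
    """
    mu, logvar = encode(x, s)
    recon = 0.0
    for _ in range(m):
        # Reparameterized draw: z = mu + sigma * eps, eps ~ N(0, I)
        z = [mu_k + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for mu_k, lv in zip(mu, logvar)]
        recon += decoder_loglik(x, z, s)
    return recon / m - gaussian_kl_to_standard(mu, logvar)

print(gaussian_kl_to_standard([0.0, 0.0], [0.0, 0.0]))  # → 0.0 (q equals the prior)
print(gaussian_kl_to_standard([1.0], [0.0]))            # → 0.5
```

With $m = 1$, as used in Section 3.1, the loop draws a single sample per training pair and relies on averaging over the mini-batch for a low-variance estimate.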

2.2 Representation Invariance via Adversarial Censoring

In principle, optimal training with ideal parametric approximations should result in an encoder that accurately approximates the true posterior $p_\theta(z \mid x, s)$, for which $z$ and $s$ are independent by construction. Thus, the theoretically optimal encoder should produce a representation $z$ that is independent of the nuisance variable $s$. In practice, however, since the encoder is realized as a parametric approximation and globally optimal convergence cannot be guaranteed, we often observe that the representation $z$ produced by the trained encoder is significantly correlated with the nuisance variable $s$. Further, one may wish to train an encoder that does not use $s$ as an input, to allow the representation to be generated from the data $x$ alone. However, this additional restriction on the encoder may increase the challenge of extracting invariant representations.

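Invariance could be enforced by minimizing the mutual information $I(S; Z)$, where $Z$ is the latent representation generated by the encoder. Mutual information can be subtracted from the lower bound of (1), yielding

$$\mathbb{E}\big[\log p_\theta(X \mid S)\big] \geq \mathbb{E}\big[\mathcal{L}(\theta, \phi; X, S)\big] \geq \mathbb{E}\big[\mathcal{L}(\theta, \phi; X, S)\big] - I(S; Z),$$

where equality is still met for $q_\phi(z \mid x, s) = p_\theta(z \mid x, s)$, since (for a model that matches the data) the resulting representation is independent of $s$. Thus, incorporating a mutual information penalty term into the lower bound does not, in principle, change the theoretical maximum. However, since computing mutual information is generally intractable, we apply the approximation technique of barber2003-IMalgorithm, which utilizes a variational posterior $q_\psi(s \mid z)$ and the lower bound

$$I(S; Z) = H(S) - H(S \mid Z) \geq H(S) + \mathbb{E}\big[\log q_\psi(S \mid Z)\big], \qquad (5)$$

where equality is met when $q_\psi(s \mid z)$ equals the actual posterior $p(s \mid z)$ with respect to which the expectation and entropies are defined. Hence, maximizing (5) over the variational posterior $q_\psi(s \mid z)$, which can also be realized as a neural network, yields an approximation of $I(S; Z)$. The entropy $H(S)$, although generally unknown, is constant with respect to the optimization variables. Incorporating this variational approximation of the mutual information penalty into (3), modulo dropping the constant $H(S)$, results in the adversarial training objective

$$\max_{\theta, \phi}\, \min_{\psi}\; \sum_{i=1}^{n} \Big[\hat{\mathcal{L}}_i(\theta, \phi) - \frac{\lambda}{m} \sum_{j=1}^{m} \log q_\psi(s_i \mid z_{i,j})\Big], \qquad (6)$$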


where $z_{i,j}$ are the same samples used for $\hat{\mathcal{L}}_i$ as given by (4), and $\lambda \geq 0$ is a parameter that controls the emphasis on invariance. Note that when $s$ is a categorical variable (e.g., a class label), the additional adversarial network that realizes the variational posterior $q_\psi(s \mid z)$ is essentially just a classifier trained (by minimizing cross-entropy loss) to recover $s$ from the representation $z$ generated by the encoder. In this approach, the VAE is adversarially trained to maximize the cross-entropy loss of this classifier combined with the original objective given by (3). Figure 1 illustrates the overall VAE training framework including adversarial censoring.
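The sketch below illustrates both steps under stated assumptions. Part 1 checks the bound (5) on a small discrete joint distribution of $(S, Z)$ with hypothetical probabilities: the true posterior recovers $I(S; Z)$ exactly, while a uniform guess gives a smaller value. Part 2 computes the two scalar objectives of (6) for a batch with $m = 1$ and categorical $s$, from hypothetical precomputed ELBO terms and adversary logits; it only evaluates values, whereas in training a framework would differentiate the first with respect to $(\theta, \phi)$ and the second with respect to $\psi$.

```python
import math

# Part 1: numerical check of the variational bound (5) on a hypothetical joint p(s, z)
joint = [[0.20, 0.15, 0.05],    # p(s=0, z) for z = 0, 1, 2
         [0.10, 0.15, 0.35]]    # p(s=1, z)
p_s = [sum(row) for row in joint]
p_z = [joint[0][z] + joint[1][z] for z in range(3)]
H_S = -sum(p * math.log(p) for p in p_s)
mi = sum(joint[s][z] * math.log(joint[s][z] / (p_s[s] * p_z[z]))
         for s in range(2) for z in range(3))

def ba_bound(q):
    """H(S) + E[log q(S|Z)] for a variational posterior given as q[z][s]."""
    return H_S + sum(joint[s][z] * math.log(q[z][s]) for s in range(2) for z in range(3))

true_posterior = [[joint[s][z] / p_z[z] for s in range(2)] for z in range(3)]
print(abs(ba_bound(true_posterior) - mi) < 1e-12)   # → True (tight)
print(ba_bound([[0.5, 0.5]] * 3) < mi)              # → True (looser for other q)

# Part 2: the two scalar objectives of (6) for a batch with m = 1 and categorical s
def log_softmax(logits):
    mx = max(logits)
    lse = mx + math.log(sum(math.exp(l - mx) for l in logits))
    return [l - lse for l in logits]

def adversarial_objectives(elbo_terms, adv_logits, labels, lam):
    """Return (VAE objective to maximize, adversary loss to minimize).

    elbo_terms[i]: sampled ELBO term from (4); adv_logits[i]: adversary logits for
    q_psi(s | z_i); labels[i]: nuisance class s_i; lam: invariance weight lambda >= 0.
    """
    ce = [-log_softmax(lg)[s] for lg, s in zip(adv_logits, labels)]  # -log q_psi(s_i|z_i)
    vae_objective = sum(l + lam * c for l, c in zip(elbo_terms, ce))
    adversary_loss = sum(ce)
    return vae_objective, adversary_loss
```

Scaling the adversary loss by $\lambda$ would not change its minimizer for $\lambda > 0$, so the sketch leaves it unscaled.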

2.3 Invariance via KL-divergence Censoring

Another approach to enforce invariance is to introduce a hyperparameter $\beta$ to increase the weight of the KL-divergence terms in (4), yielding the alternative objective terms

$$\hat{\mathcal{L}}_i^{\beta}(\theta, \phi) := \frac{1}{m} \sum_{j=1}^{m} \log p_\theta(x_i \mid z_{i,j}, s_i) - \beta\, D_{\mathrm{KL}}\big(q_\phi(z \mid x_i, s_i)\,\|\,p(z)\big), \qquad (7)$$

for which (4) is the special case $\beta = 1$. The KL-divergence terms can be interpreted as regularizing the variational posterior toward the latent prior, which encourages the encoder to generate representations that are invariant not only to $s$ but also to the data $x$. While increasing $\beta$ further encourages invariant representations, it potentially disrupts model learning, since the overall dependence on the data is affected.
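A sketch of one objective term (7), assuming the same diagonal Gaussian encoder (parameterized by means and log-variances) and the closed-form KL-divergence to $\mathcal{N}(0, I)$; setting `beta=1.0` recovers (4).

```python
import math

def kl_censored_term(recon_samples, mu, logvar, beta=1.0):
    """Objective term (7): mean sampled log-likelihood minus beta * KL(q || N(0, I)).

    recon_samples : list of log p_theta(x_i | z_ij, s_i) for j = 1..m
    mu, logvar    : parameters of q_phi(z | x_i, s_i)
    beta          : KL weight; beta = 1 recovers (4), beta > 1 pushes q toward the prior
    """
    kl = 0.5 * sum(mu_k * mu_k + math.exp(lv) - lv - 1.0 for mu_k, lv in zip(mu, logvar))
    return sum(recon_samples) / len(recon_samples) - beta * kl

print(kl_censored_term([-3.0, -5.0], [1.0], [0.0], beta=1.0))  # → -4.5
print(kl_censored_term([-3.0, -5.0], [1.0], [0.0], beta=4.0))  # → -6.0
```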

3 Experiments

(a) Adversary Accuracy vs ELBO
(b) Mutual Information vs ELBO
Figure 2: Quantitative performance comparison. Smaller values along the x-axes correspond to better invariance. Larger values along the y-axis (ELBO) correspond to better model learning.

We evaluate the performance of various VAEs for learning invariant representations, under several scenarios for conditioning the encoder and/or decoder on the sensitive/nuisance variable $s$:

  • Full: Both the encoder and decoder are conditioned on $s$. In this case, the decoder is the generative model $p_\theta(x \mid z, s)$ and the encoder is the variational posterior $q_\phi(z \mid x, s)$, as described in Section 2.1.

  • Partial: Only the decoder is conditioned on $s$. This case is similar to the previous one, except that the encoder approximates the variational posterior $q_\phi(z \mid x)$ without $s$ as an input.

  • Basic (unconditioned): Neither the encoder nor the decoder is conditioned on $s$. This baseline case is the standard, unconditioned VAE, where $s$ is not used as an input.

In combination with these VAE scenarios, we also examine several approaches for encouraging invariant representations:

  • Adversarial Censoring: This approach, as described in Section 2.2, introduces an additional network that attempts to recover $s$ from the representation $z$. The VAE and this additional network are adversarially trained according to the objective given by (6).

  • KL Censoring: This approach, as described in Section 2.3, increases the weight on the KL-divergence terms, using the alternative objective terms given by (7).

  • Baseline (none): As a baseline, the VAE is trained according to the original objective given by (3) without any additional modifications to enforce invariance.

(a) Partial – Baseline
(b) Partial – Adversarial censoring
(c) Partial – KL censoring
(d) Full – Baseline
(e) Full – Adversarial censoring
(f) Full – KL censoring
Figure 3: Style transfer with conditional VAEs. The top row within each image shows the original test set examples input to the encoder, while the other rows show the corresponding output of the decoder when conditioned on different digit classes $s$.

3.1 Dataset and Network Details

We use the MNIST dataset, which consists of 70,000 grayscale, $28 \times 28$ pixel images of handwritten digits and corresponding labels in $\{0, \dots, 9\}$. We treat the vectorized images in $[0, 1]^{784}$ as the data $x$, while the digit labels serve as the nuisance variable $s$. Thus, our objective is to train VAE models that learn representations $z$ that capture features (i.e., handwriting style) invariant to the digit class $s$.

We use basic multilayer perceptron architectures to realize the VAE (similar to the architecture used in kingma2013-VAE) and the adversarial network. This allows us to illustrate how the performance of even very simple VAE architectures can be improved with adversarial censoring. We choose the latent representation $z$ to have 20 dimensions, with its prior set as the standard Gaussian, i.e., $p(z) = \mathcal{N}(0, I_{20})$. The encoder, decoder, and adversarial networks each use a single hidden layer of 500 nodes with the activation function. In the scenarios where the encoder (or decoder) is conditioned on the nuisance variable, the one-hot encoding of $s$ is concatenated with $x$ (or $z$, respectively) to form the input. The adversarial network uses a 10-dimensional softmax output layer to produce the variational posterior $q_\psi(s \mid z)$.
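To make the conditioning scenarios concrete, the sketch below builds network inputs by concatenating a one-hot encoding of $s$ and lists the layer widths implied by the text (784-pixel inputs, 10 classes, 20-dimensional $z$, one 500-unit hidden layer, 40 encoder outputs). The helper names and the scenario strings are illustrative assumptions, and the hidden-layer activation is deliberately left out.

```python
NUM_PIXELS, NUM_CLASSES, LATENT_DIM, HIDDEN = 784, 10, 20, 500

def one_hot(s, num_classes=NUM_CLASSES):
    """One-hot encoding of the digit class s."""
    vec = [0.0] * num_classes
    vec[s] = 1.0
    return vec

def with_condition(v, s=None):
    """Concatenate one_hot(s) to v when conditioning on s; return a copy of v otherwise."""
    return list(v) + (one_hot(s) if s is not None else [])

def layer_sizes(scenario):
    """(input, hidden, output) widths for 'full', 'partial', or 'basic' conditioning."""
    enc_in = NUM_PIXELS + (NUM_CLASSES if scenario == "full" else 0)
    dec_in = LATENT_DIM + (NUM_CLASSES if scenario in ("full", "partial") else 0)
    return {
        "encoder": (enc_in, HIDDEN, 2 * LATENT_DIM),     # mean and log-variance
        "decoder": (dec_in, HIDDEN, NUM_PIXELS),         # sigmoid outputs
        "adversary": (LATENT_DIM, HIDDEN, NUM_CLASSES),  # softmax over digit classes
    }

print(layer_sizes("full")["encoder"])    # → (794, 500, 40)
print(layer_sizes("basic")["decoder"])   # → (20, 500, 784)
```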

We use the encoder to realize the conditionally Gaussian variational posterior given by (2). The encoder network produces a 40-dimensional vector (with no activation function applied) that represents the mean vector $\mu_\phi(x, s)$ concatenated with the log of the diagonal of the covariance matrix $\Sigma_\phi(x, s)$. This allows us to compute the KL-divergence terms in (4) analytically as given by kingma2013-VAE.
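A minimal sketch of how the 40-dimensional linear output might be split into $\mu_\phi$ and the log-variances and then used to draw a reparameterized sample. The first-half/second-half ordering is an assumption; the text only states that the two parts are concatenated.

```python
import math
import random

LATENT_DIM = 20

def split_encoder_output(h):
    """Split the 40-dim linear encoder output into (mean, log-variance) of q_phi."""
    if len(h) != 2 * LATENT_DIM:
        raise ValueError("expected a vector of length 2 * LATENT_DIM")
    return h[:LATENT_DIM], h[LATENT_DIM:]

def sample_z(mu, logvar, rng=random):
    """Reparameterized draw z = mu + exp(logvar / 2) * eps with eps ~ N(0, I)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
```

Predicting the log-variance rather than the variance keeps $\Sigma_\phi$ positive without any output activation, which is consistent with the linear output layer described above.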

The output layer of the decoder network has 784 nodes and applies the sigmoid activation function, matching the size and scale of the images. We treat the decoder output, denoted by $y \in (0, 1)^{784}$, as the parameters of a generative model given by

$$p_\theta(x \mid z, s) = \prod_{k=1}^{784} y_k^{x_k} (1 - y_k)^{1 - x_k},$$

where $x_k$ and $y_k$ are the components of $x$ and $y$, respectively. Although not strictly binary, the MNIST images are nearly black and white, allowing this Bernoulli generative model to be a reasonable approximation. We directly display $y$ to generate the example output images.
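The log of this Bernoulli likelihood is the reconstruction term inside (4). A plain-Python sketch, assuming sigmoid outputs $y_k$ and clipping them away from 0 and 1 (the clipping constant is an added assumption to avoid $\log 0$):

```python
import math

def bernoulli_loglik(x, y, eps=1e-7):
    """log p_theta(x | z, s) = sum_k [x_k log y_k + (1 - x_k) log(1 - y_k)].

    x   : pixel intensities in [0, 1] (not strictly binary for MNIST)
    y   : sigmoid decoder outputs in (0, 1)
    eps : clipping constant (assumption) keeping log() finite
    """
    total = 0.0
    for xk, yk in zip(x, y):
        yk = min(max(yk, eps), 1.0 - eps)
        total += xk * math.log(yk) + (1.0 - xk) * math.log(1.0 - yk)
    return total
```

Deep learning frameworks typically offer a numerically stabler version that takes the pre-sigmoid decoder output; in Chainer, `chainer.functions.bernoulli_nll` returns the corresponding negative log-likelihood.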

We implemented these experiments with the Chainer deep learning framework. The networks were trained on the 60,000-image training set for 100 epochs with 100 images per batch, while evaluation and example generation were performed on the 10,000-image test set. The adversarial and VAE networks were each updated alternately, once per batch, with Adam kingma2014adam. Relying on stochastic estimation over each batch, we set the sampling parameter $m = 1$ in (4), (6), and (7).
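The update schedule can be sketched as a loop that shuffles the training set each epoch and performs one adversary update followed by one VAE update per mini-batch. Both step functions are hypothetical callables standing in for Adam updates, and updating the adversary first within each batch is an assumption; the text only states that the updates alternate once per batch.

```python
import random

def make_batches(num_examples, batch_size, rng):
    """Shuffle example indices and cut them into consecutive mini-batches."""
    idx = list(range(num_examples))
    rng.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, num_examples, batch_size)]

def train(num_examples, adversary_step, vae_step, epochs=100, batch_size=100, seed=0):
    """Alternate one adversary update and one VAE update on every mini-batch.

    adversary_step(batch): hypothetical callable, e.g. one Adam step minimizing the
                           adversary's cross-entropy
    vae_step(batch):       hypothetical callable, e.g. one Adam step maximizing (6)
    Returns the number of alternating update pairs performed.
    """
    rng = random.Random(seed)
    pairs = 0
    for _ in range(epochs):
        for batch in make_batches(num_examples, batch_size, rng):
            adversary_step(batch)
            vae_step(batch)
            pairs += 1
    return pairs
```

With 60,000 training images, `batch_size=100`, and `epochs=100`, this performs 600 update pairs per epoch, or 60,000 in total.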

(a) Partial – Baseline
(b) Partial – Adversarial censoring
(c) Partial – KL censoring
(d) Full – Baseline
(e) Full – Adversarial censoring
(f) Full – KL censoring
Figure 4: Generative sampling with conditional VAEs. Latent representations $z$ are sampled from $p(z) = \mathcal{N}(0, I)$ and input to the decoder to generate synthetic images, with the decoder conditioned on selected digit classes $s \in \{0, \dots, 9\}$.
(a) MNIST Examples
(b) Basic – Adversarial censoring
(c) Basic – Adversarial censoring
(d) Basic – Baseline
(e) Basic – KL censoring
(f) Basic – KL censoring
Figure 5: Generative sampling with unconditioned (“basic”) VAEs. Attempting to censor an unconditioned VAE results in severely degraded model performance.

3.2 Evaluation Methods

We quantitatively evaluate the trained VAEs for how well they:

  • Learn the data model: We measure this with the ELBO score, estimated by computing $\hat{\mathcal{L}}(\theta, \phi)$ over the test data set (see (3) and (4)).

  • Produce invariant representations: We measure this via the adversarial approach described in Section 2.2. Even when not using adversarial censoring, we still train an adversarial network in parallel (i.e., its loss gradients are not fed back into the main VAE training) that attempts to recover the sensitive variable $s$ from the representation $z$. The classification accuracy and cross-entropy loss of the adversarial network provide measures of invariance. Since the digit class $s$ is uniformly distributed over $\{0, \dots, 9\}$, the entropy $H(S)$ is equal to $\log 10$ and can be combined with the cross-entropy loss (see (5) and barber2003-IMalgorithm) to yield an estimate of the mutual information $I(S; Z)$, which we report instead of the raw cross-entropy.
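Following (5), the reported quantity amounts to subtracting the adversary's average cross-entropy from $H(S) = \log 10$. A minimal sketch, assuming the cross-entropy is measured in nats:

```python
import math

def mi_estimate(mean_cross_entropy, num_classes=10):
    """Estimate I(S;Z) as H(S) - CE for a uniformly distributed class label S.

    mean_cross_entropy: adversary's average cross-entropy (nats) on held-out data.
    Because the adversary's expected cross-entropy upper-bounds H(S|Z), this is, in
    expectation, a lower bound on I(S;Z); on a finite test set it can even be negative.
    """
    entropy_s = math.log(num_classes)   # H(S) = log 10 for 10 equiprobable digits
    return entropy_s - mean_cross_entropy

print(mi_estimate(math.log(10)))  # → 0.0 (adversary no better than chance)
```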

The VAEs are also qualitatively evaluated with the following visual tasks:

  • Style Transfer (Digit Change): An image $x$ from the test set is input to the encoder to produce a representation $z$ by sampling from $q_\phi(z \mid x, s)$. Then, the decoder is applied to produce the image $y$, while changing the digit class to $s'$.

  • Generative Model Sampling: A synthetic image is generated by first sampling a latent variable $z$ from the prior $p(z)$, and then applying the decoder to produce the image $y$ for a selected digit class $s$.
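Both visual tasks can be sketched with hypothetical `encode` and `decode` callables standing in for the trained networks (for the partially conditioned scenario, `encode` would simply ignore $s$). The decoder output is returned directly as the displayed image, matching the treatment of $y$ in Section 3.1.

```python
import math
import random

def style_transfer(x, s, new_s, encode, decode, rng=random):
    """Encode (x, s), then decode the same representation z under a different class new_s.

    encode(x, s) -> (mu, logvar) of q_phi(z | x, s)   [hypothetical callable]
    decode(z, s) -> y, the Bernoulli means shown as the output image  [hypothetical]
    """
    mu, logvar = encode(x, s)
    z = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]
    return decode(z, new_s)

def generate(s, decode, latent_dim=20, rng=random):
    """Sample z from the prior N(0, I) and decode it for the selected digit class s."""
    z = [rng.gauss(0.0, 1.0) for _ in range(latent_dim)]
    return decode(z, s)
```

Rendering a grid such as Figure 3 would amount to calling `style_transfer` for each test image and each target class $s' \in \{0, \dots, 9\}$.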

3.3 Results and Discussion

Figure 2 presents the quantitative performance comparison for the various combinations of VAEs with full, partial, or no conditioning on $s$, and with invariance encouraged by adversarial censoring (red curves), KL censoring (blue curves), or nothing (black points). Each pair of red and blue curves represents varying emphasis on enforcing invariance (as the parameters $\lambda$ and $\beta$, respectively, are changed), and the two curves meet at a black point corresponding to the baseline (no censoring) case, where $\lambda = 0$ and $\beta = 1$.

Unsurprisingly, the baseline, unconditioned VAE produces a representation $z$ that readily reveals the digit class $s$ (see the adversary accuracies in Figure 2(a)), since otherwise image reconstruction by the decoder would be difficult. However, even when partially or fully conditioned on $s$, the baseline VAEs still significantly reveal $s$. Both adversarial and KL censoring are effective at enforcing invariance, with adversary accuracy approaching chance (10% for ten equiprobable digit classes) and mutual information approaching zero as $\lambda$ and $\beta$, respectively, are increased. However, adversarial censoring has less impact on model learning performance (as measured by the ELBO score). With conditional VAEs, adversarial censoring achieves invariance while having only a small impact on the ELBO score, and appears to visually improve performance (particularly for the partially conditioned case) in the style transfer and sampling tasks, as shown in Figures 3 and 4. The worse model learning performance with KL censoring seems to result in blurrier (although seemingly cleaner) images, as also shown in Figures 3 and 4. Attempting to censor a basic (unconditioned) autoencoder (as proposed by edwards2015-censor) rapidly degrades model learning performance, which manifests as severely degraded sampling performance, as shown in Figure 5.

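The results in Figures 3 and 4 correspond to specific points in Figure 2 as follows: (a) the partially conditioned baseline, (b) the left-most partially conditioned point with adversarial censoring, (c) the left-most partially conditioned point with KL censoring, (d) the fully conditioned baseline, (e) the left-most fully conditioned point with adversarial censoring, and (f) the left-most fully conditioned point with KL censoring. The Figure 5 results correspond to points in Figure 2 as follows: (a) MNIST test examples, (b-c) the two left-most unconditioned points with adversarial censoring, (d) the unconditioned baseline, and (e-f) the two left-most unconditioned points with KL censoring. Note that larger values of the $\lambda$ and $\beta$ parameters were required for the unconditioned VAEs to achieve levels of invariance similar to the conditioned cases.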

4 Conclusion

The natural combination of conditional VAEs with adversarial censoring is a theoretically well-founded method to generate invariant representations that are disentangled from nuisance variations. Conditioning the decoder on the nuisance variable $s$ allows the representation $z$ to be cleanly separated and model learning performance to be preserved, since $z$ and $s$ are both used to reconstruct the data $x$. Training VAEs with adversarial censoring visually improved performance in style transfer and generative sampling tasks.


  • (1) D. Barber and F. Agakov. The IM algorithm: a variational approach to information maximization. In Advances in Neural Information Processing Systems (NIPS), pages 201–208, 2003.
  • (2) J. Donahue, P. Krähenbühl, and T. Darrell. Adversarial feature learning. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
  • (3) V. Dumoulin, I. Belghazi, B. Poole, O. Mastropietro, A. Lamb, M. Arjovsky, and A. Courville. Adversarially learned inference. In Proceedings of the International Conference on Machine Learning (ICML), 2017.
  • (4) H. Edwards and A. Storkey. Censoring representations with an adversary. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • (5) I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NIPS), pages 2672–2680, 2014.
  • (6) J. Hamm. Minimax filter: Learning to preserve privacy from inference attacks. Journal of Machine Learning Research, 18(129):1–31, 2017.
  • (7) Y. Iwasawa, K. Nakayama, I. E. Yairi, and Y. Matsuo. Privacy issues regarding the application of DNNs to activity-recognition using wearables and its countermeasures by use of adversarial training. In Proceedings of the International Joint Conferences on Artificial Intelligence (IJCAI), pages 1930–1936, 2017.
  • (8) D. Kingma and J. Ba. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), 2015.
  • (9) D. P. Kingma and M. Welling. Auto-encoding variational bayes. In Proceedings of the International Conference on Learning Representations (ICLR), 2014.
  • (10) A. B. L. Larsen, S. K. Sønderby, H. Larochelle, and O. Winther. Autoencoding beyond pixels using a learned similarity metric. In Proceedings of the International Conference on Machine Learning (ICML), pages 1558–1566, 2016.
  • (11) C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair autoencoder. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • (12) G. Louppe, M. Kagan, and K. Cranmer. Learning to pivot with adversarial networks. In Advances in Neural Information Processing Systems (NIPS), pages 982–991, 2017.
  • (13) A. Makhzani, J. Shlens, N. Jaitly, I. Goodfellow, and B. Frey. Adversarial autoencoders. In Proceedings of the International Conference on Learning Representations (ICLR), 2016.
  • (14) M. F. Mathieu, J. J. Zhao, J. Zhao, A. Ramesh, P. Sprechmann, and Y. LeCun. Disentangling factors of variation in deep representation using adversarial training. In Advances in Neural Information Processing Systems (NIPS), pages 5040–5048, 2016.
  • (15) L. Mescheder, S. Nowozin, and A. Geiger. Adversarial variational bayes: Unifying variational autoencoders and generative adversarial networks. In Proceedings of the International Conference on Machine Learning (ICML), pages 2391–2400, 2017.
  • (16) J. Shen, Y. Qu, W. Zhang, and Y. Yu. Adversarial representation learning for domain adaptation. arXiv preprint arXiv:1707.01217, 2017.
  • (17) K. Sohn, H. Lee, and X. Yan. Learning structured output representation using deep conditional generative models. In Advances in Neural Information Processing Systems (NIPS), pages 3483–3491, 2015.
  • (18) S. Tokui, K. Oono, S. Hido, and J. Clayton. Chainer: a next-generation open source framework for deep learning. In Proceedings of Workshop on Machine Learning Systems (LearningSys) in The Twenty-ninth Annual Conference on Neural Information Processing Systems (NIPS), 2015.
  • (19) E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In Computer Vision and Pattern Recognition (CVPR), 2017.
  • (20) Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig. Controllable invariance through adversarial feature learning. arXiv preprint arXiv:1705.11122, 2017.
  • (21) R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In Proceedings of the International Conference on Machine Learning (ICML), pages 325–333, 2013.