1 Introduction
Present-day machine learning algorithms based on deep neural networks have demonstrated impressive progress across multiple domains, such as image classification and speech recognition. By stacking together multiple layers of linear and nonlinear operations, deep neural networks can learn and identify complex patterns in data. As a byproduct of this capability, deep neural networks have also become powerful enough to inadvertently identify sensitive information or features of data, even in the absence of any additional side information. For example, consider a scenario where a user enrolls their facial image in a face recognition system for the purpose of access control. During enrollment, a feature vector is extracted from the image and stored in a database. Apart from the identity of the user, this feature vector potentially contains information that is sensitive to the user, such as age, information that the user may never have expressly consented to provide. More generally, learned data representations can leak auxiliary information that the participants never intended to release. Information obtained in this manner can be used to compromise the privacy of the user or to treat the user in a biased and unfair manner. Therefore, it is imperative to develop representation learning algorithms that can
intentionally and permanently obscure sensitive information while retaining task-dependent information. Addressing this problem is the central aim of this paper. A few recent attempts have been made to study related problems, such as learning censored [3], fair [14], or invariant [18] representations of data. The central idea of these approaches, collectively referred to as Adversarial Representation Learning
(ARL), is to learn a representation of data in an adversarial setting. These approaches couple together (i) an adversarial network that seeks to classify and extract sensitive information from a given representation, and (ii) an embedding network that is tasked with extracting a compact representation of data while preventing the adversarial network from leaking sensitive information. To achieve their respective goals, the adversary is optimized to maximize the likelihood of the sensitive information, while the encoder is optimized to minimize that same likelihood, thereby leading to a zero-sum game. We will henceforth refer to this formulation as
Maximum Likelihood Adversarial Representation Learning (ML-ARL). The zero-sum game formulation of optimizing the likelihood, however, is practically suboptimal from the perspective of preventing information leakage. As an illustration, consider a problem where the sensitive attribute has three categories. Let there be two instances where the adversary's probability distribution over the sensitive labels is (0.33, 0.17, 0.5) and (0.33, 0.33, 0.33), respectively, and let the correct label be class 1 in both cases. In both instances the likelihood the discriminator assigns to the true label is the same, namely 0.33, but the former instance is more informative than the latter: it still ranks the remaining classes. Moreover, the potential of this formulation to prevent information leakage is predicated upon (i) the existence of an equilibrium, and (ii) the ability of practical optimization procedures to converge to such an equilibrium. As we will show, in practice, the conditions necessary for convergence may not be satisfied. When the optimization does not reach the equilibrium, a probability distribution with the minimum likelihood on the true label is the distribution that is most certain about the remaining labels, and hence has the potential to leak the most information. In contrast, the second instance above is a uniform distribution over the sensitive labels and provides no information to the adversary. This solution corresponds to the maximum entropy distribution over the sensitive labels.
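The likelihood/entropy gap in this three-class example can be checked numerically. A minimal sketch (NumPy; the two distributions are the ones from the text):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in nats."""
    p = np.asarray(p, dtype=float)
    return float(-np.sum(p * np.log(p)))

# Two adversary output distributions over 3 sensitive classes;
# the correct sensitive label is class 1 (index 0) in both cases.
p_a = np.array([0.33, 0.17, 0.50])   # still ranks the remaining classes
p_b = np.ones(3) / 3                 # uniform: maximum entropy, ln(3)

# The likelihood assigned to the true label is essentially the same,
print(p_a[0], p_b[0])                # 0.33 vs 0.333...
# but the entropies differ: only p_b is completely uninformative.
print(entropy(p_a), entropy(p_b))    # ~1.01 vs ~1.10 nats
```

The uniform distribution attains the entropy maximum ln(3) ≈ 1.10 nats, while the first distribution sits strictly below it even though both assign the same probability to the correct label.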
Contributions: Building on the observations above, we propose a framework, dubbed Maximum Entropy Adversarial Representation Learning (MaxEnt-ARL), which optimizes an image representation with two major objectives: (i) maximally retain information pertinent to a given target attribute, and (ii) minimize information leakage about a given sensitive attribute. We pose the learning problem in an adversarial setting as a non-zero-sum three-player game between an encoder, a predictor, and a discriminator (a proxy adversary), where the encoder seeks to maximize the entropy of the discriminator over the sensitive attribute while maximizing the likelihood of the predictor on the target attribute.
We analyze the equilibrium and convergence properties of ML-ARL as well as the proposed MaxEnt-ARL formulation using tools from nonlinear systems theory. We compare and evaluate the numerical performance of ML-ARL and MaxEnt-ARL on fair classification tasks on two UCI datasets, on illumination-invariant classification on the Extended Yale B dataset, and on two fabricated tasks on the CIFAR-10 and CIFAR-100 datasets. On a majority of these tasks, MaxEnt-ARL outperforms all other baselines.
2 Related Work
Adversarial Representation Learning: In the context of image classification, adversarial learning has been utilized to learn representations that are invariant across domains [4, 5, 17], thereby enabling classifiers trained on a source domain to be deployed on a target domain.
The body of work devoted to learning fair and unbiased representations of data shares many similarities with the adversarial representation learning problem. Early work on this topic did not involve an explicit adversary but shared the goal of learning representations with competing objectives. The concept of learning fair representations was first introduced by Zemel et al. [19], where the goal was to learn a representation of data by “fair clustering” while maintaining the discriminative features of the prediction task. Building upon this work, many approaches have been proposed to learn an unbiased representation of data while retaining its effectiveness for a prediction task. To remove the influence of “nuisance variables”, Louizos et al. [14]
proposed the variational fair autoencoder (VFAE), a joint optimization framework for learning an invariant representation along with a prediction task. To improve fairness of the representation, they regularized the marginal distribution of the latent representation through Maximum Mean Discrepancy (MMD). More recent approaches [3, 20, 1, 18] have used explicit adversarial networks to measure the information content of sensitive attributes. These problems are set up as a minimax game between the encoder and the adversary. The encoder attempts to achieve fairness by maximizing the loss of the adversary, i.e., minimizing the likelihood of the sensitive variables as measured by the adversary. Among these approaches, our proposed MaxEnt-ARL formulation is most directly related to the adversarial invariant feature learning of Xie et al. [18].
Optimization Theory for Adversarial Learning:
The formulation of adversarial representation learning poses unique challenges from an optimization perspective. The parameters of the models in ARL are typically optimized through stochastic gradient descent, either jointly [3, 15] or alternately [4]. The former, simultaneous gradient descent, is more commonly used in practice and is a generalization of gradient descent. While the convergence properties of gradient descent and its variants are well understood, there is relatively little work on the convergence and stability of simultaneous gradient descent in adversarial minimax problems. Recently, Mescheder et al. [15] and Nagarajan et al. [16] both leveraged tools from nonlinear systems theory [9] to analyze the convergence properties of simultaneous gradient descent in the context of GANs. They show that, without the introduction of additional regularization terms to the objective of the zero-sum game, simultaneous gradient descent does not converge. Our convergence analysis of ML-ARL and MaxEnt-ARL leverages the same tools from nonlinear systems theory and establishes the conditions under which these formulations converge.

3 Adversarial Representation Learning
The adversarial representation learning setup involves an observational input x, a target attribute y with K_y classes, and a sensitive attribute s with K_s classes. In this paper, we restrict ourselves to attributes over a discrete space with multiple labels. Our goal is to learn an embedding function E that maps x to a representation z = E(x) from which we can predict the target attribute y, while also minimizing information leakage about the known sensitive attribute, i.e., the class labels of s.
3.1 Problem Setting
The adversarial representation learning problem is formulated as a game among three players: an encoder E, a target predictor T, and a discriminator D that serves as a proxy for an unknown adversary A. After E is learned and fixed, we train and evaluate an adversary A with the aim of leaking the information about the sensitive attribute that we sought to protect. Since the adversary A is unknown to the encoder at training time, the encoder is trained against the discriminator D, which thereby acts as a proxy for the unknown A. An illustration of this setting is shown in Fig. 1. The encoder is modeled as a deterministic function z = E(x; θ_E), the target predictor models the conditional distribution q_T(y | z; θ_T), and the discriminator models the conditional distribution q_D(s | z; θ_D), where y and s are the ground truth target and sensitive labels, respectively.
3.2 Background
In existing formulations of ARL, the goal of the encoder is to maximize the likelihood of the target attribute, as measured by the target predictor T, while minimizing the likelihood of the sensitive attribute, as measured by the discriminator D. This problem (henceforth referred to as ML-ARL) was formally defined by Xie et al. [18] as a three-player zero-sum minimax game:
  min_{θ_E, θ_T} max_{θ_D}  E_{x,y,s} [ −log q_T(y | E(x; θ_E); θ_T) + α log q_D(s | E(x; θ_E); θ_D) ]    (1)

where α ≥ 0 is a parameter that allows us to trade off between the two competing objectives for the encoder, and where the expectation terms reduce to the log-likelihoods of the respective labels when the label distributions are ideal categorical (one-hot) distributions.
3.3 Maximum Entropy Adversarial Representation Learning
In the MaxEnt-ARL formulation, the goal of the encoder is to maximize the likelihood of the target attribute, as measured by the target predictor, while maximizing the uncertainty of the sensitive attribute, as measured by the entropy of the discriminator's prediction. Formally, we define the MaxEnt-ARL optimization problem as a three-player non-zero-sum game:
  min_{θ_T}  E_{x,y} [ −log q_T(y | E(x; θ_E); θ_T) ]
  min_{θ_D}  E_{x,s} [ −log q_D(s | E(x; θ_E); θ_D) ]
  min_{θ_E}  E_{x,y} [ −log q_T(y | E(x; θ_E); θ_T) ] − α E_x [ H(q_D(s | E(x; θ_E); θ_D)) ]    (2)

where α ≥ 0 allows us to trade off between the two competing objectives for the encoder, and H(·) denotes the Shannon entropy, which is maximized by U, the uniform distribution over the K_s sensitive classes. The crucial difference between the MaxEnt-ARL and ML-ARL formulations is that, although the encoder and the discriminator have competing objectives in both, in ML-ARL they compete directly on the same metric (the likelihood of the sensitive attribute), whereas in MaxEnt-ARL they optimize competing metrics that are related but not identical.
Optimizing the embedding function to maximize the entropy of the discriminator, instead of minimizing its likelihood, has one crucial practical advantage: entropy maximization inherently does not need class labels for training. This is advantageous in settings where either (i) it is undesirable for the embedding function to have access to the sensitive label, potentially for privacy reasons, or (ii) the sensitive labels of the data points are unknown. For instance, consider a semi-supervised scenario where only the desired target label is known while the sensitive label is unknown. The embedding function can learn from such data by obtaining gradients from the entropy of the discriminator.
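The label-free property can be made concrete: the encoder's entropy term is computable from the discriminator's outputs alone, whereas the ML-ARL term needs the true sensitive labels. A minimal NumPy sketch (function names are illustrative, not from the paper's code):

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def ml_arl_encoder_term(disc_logits, s_labels):
    """ML-ARL encoder term: mean log-likelihood of the TRUE sensitive
    labels under the discriminator (the encoder minimizes this, so the
    sensitive labels s must be available)."""
    p = softmax(disc_logits)
    return float(np.mean(np.log(p[np.arange(len(s_labels)), s_labels])))

def maxent_arl_encoder_term(disc_logits):
    """MaxEnt-ARL encoder term: negative mean entropy of the
    discriminator's prediction (minimized when the prediction is
    uniform) -- no sensitive labels needed."""
    p = softmax(disc_logits)
    return float(np.mean(np.sum(p * np.log(p), axis=-1)))

logits = np.array([[2.0, 0.5, -1.0], [0.0, 0.0, 0.0]])
print(maxent_arl_encoder_term(logits))          # label-free
print(ml_arl_encoder_term(logits, [0, 1]))      # needs s = [0, 1]
```

The entropy-based term reaches its minimum of −ln(K_s) exactly when the discriminator's prediction is uniform, which is the behavior the encoder is driving toward.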
4 Theoretical Analysis
In this section we analyze the properties of the MaxEnt-ARL formulation and compare it to the ML-ARL formulation, both in terms of equilibria and in terms of convergence dynamics under simultaneous gradient descent.
4.1 Equilibrium
Theorem 1.
Given a fixed encoder E, the optimal discriminator is q*_D(s | z) = p(s | z) and the optimal predictor is q*_T(y | z) = p(y | z).
Proof.
The proof uses the fact that, given a fixed encoder E, the objective is convex w.r.t. each distribution. Thus we can obtain the stationary points of q_D and q_T as functions of θ_E. The detailed proof is included in the supplementary material. ∎
Therefore, both the optimal distributions q*_D and q*_T are functions of the encoder parameters θ_E. The objective for optimizing the encoder now reduces to:

  min_{θ_E}  E_{x,y} [ −log p(y | E(x; θ_E)) ] − α E_x [ H(p(s | E(x; θ_E))) ]

where the first term minimizes the uncertainty (negative log-likelihood) of the true target attribute label and the second term maximizes unpredictability (as measured by entropy) across all the classes of the discriminator distribution, thereby preventing leakage of any information about the sensitive attribute label. In contrast, the corresponding encoder objective of the ML-ARL problem is [18]

  min_{θ_E}  E_{x,y} [ −log p(y | E(x; θ_E)) ] + α E_{x,s} [ log p(s | E(x; θ_E)) ]

where the first term again minimizes the uncertainty (negative log-likelihood) of the true target attribute label, while the second term minimizes the likelihood of only the true sensitive attribute label. By doing so, however, the encoder inadvertently becomes more certain about the other labels, and the representation can still be informative to an adversary.
Equilibrium when y and s are independent: When the target and sensitive attributes are independent of each other (e.g., age and gender), the two terms in the encoder objective can reach their optima simultaneously. Furthermore, the problem reduces to a non-zero-sum two-player game between the encoder and the discriminator in the MaxEnt-ARL case, and to a zero-sum two-player game between the same players in the ML-ARL case.
Corollary 1.1.
When y and s are independent, let the optimal discriminator and predictor for an encoder E be q*_D and q*_T, respectively. The optimal encoder in the MaxEnt-ARL formulation induces a uniform distribution in the discriminator over the K_s classes of the sensitive attribute.
Proof.
The proof uses the fact that, given a fixed optimal discriminator q*_D, the predictor's objective is independent of the discriminator's when y and s are independent. The detailed proof is included in the supplementary material. ∎
Equilibrium when y and s are dependent: When the target and sensitive attributes are related to each other (e.g., beard and gender), the two terms in the encoder objective cannot reach their optima simultaneously. In both formulations, ML-ARL and MaxEnt-ARL, the relative optimality of the two objectives depends on the trade-off parameter α.
4.2 Convergence Dynamics
We analyze the standard algorithm (simultaneous stochastic gradient descent) for finding the equilibrium solution of such adversarial games. That is, we take simultaneous gradient steps in θ_E, θ_T, and θ_D, which can be expressed as differential equations of the form:

  dθ_E/dt = −∇_{θ_E} J_E,  dθ_T/dt = −∇_{θ_T} J_T,  dθ_D/dt = −∇_{θ_D} J_D    (3)

where J_E, J_T, and J_D are the respective players' objectives and the gradients define a vector field over θ = (θ_E, θ_T, θ_D).
The qualitative behavior of the aforementioned nonlinear system near any equilibrium point can be determined via linearization with respect to that point [9]. Restricting our attention to a sufficiently small neighborhood of the equilibrium point, the nonlinear state equations in (3) can be approximated by a linear state equation:
  d(δθ)/dt = J(θ*) δθ    (4)

where J(θ*) is the Jacobian of the vector field evaluated at the chosen equilibrium point θ*. In a small neighborhood around the equilibrium, the trajectories of the nonlinear system in (3) are expected to be “close” to the trajectories of the linear approximation in (4).
Theorem 2 (Linearization).
Let x = 0 be an equilibrium point for the nonlinear system dx/dt = f(x), where f : D → R^n is continuously differentiable and D is a neighborhood of the origin. Let A = ∂f/∂x evaluated at x = 0. Then,

1. The origin is asymptotically stable if Re(λ_i) < 0 for all eigenvalues λ_i of A.
2. The origin is unstable if Re(λ_i) > 0 for one or more of the eigenvalues of A.
Proof.
See Theorem 4.7 of [9]. ∎
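In practice, Theorem 2 can be applied by estimating the Jacobian of the game's gradient vector field at a candidate equilibrium and inspecting the real parts of its eigenvalues. A minimal sketch using a finite-difference Jacobian (the vector fields below are illustrative toy examples, not the paper's game):

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Finite-difference Jacobian of a vector field f at the point x."""
    x = np.asarray(x, dtype=float)
    fx = np.asarray(f(x))
    J = np.zeros((len(fx), len(x)))
    for i in range(len(x)):
        xp = x.copy()
        xp[i] += eps
        J[:, i] = (np.asarray(f(xp)) - fx) / eps
    return J

def locally_stable(f, x_star):
    """Theorem 2: Re(lambda) < 0 (within a small tolerance) for every
    eigenvalue of the Jacobian implies asymptotic stability."""
    eigs = np.linalg.eigvals(numerical_jacobian(f, x_star))
    return bool(np.all(eigs.real < -1e-8))

damped = lambda x: np.array([-0.1 * x[0] - x[1], x[0] - 0.1 * x[1]])
pure_rotation = lambda x: np.array([-x[1], x[0]])

print(locally_stable(damped, [0.0, 0.0]))         # True: spiral sink
print(locally_stable(pure_rotation, [0.0, 0.0]))  # False: Re(lambda) = 0
```

The pure-rotation field has eigenvalues on the imaginary axis, the marginal case in which linearization is inconclusive; this is exactly the cycling behavior associated with non-convergent adversarial dynamics.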
5 Numerical Experiments
In this section we will evaluate the efficacy of the proposed Maximum Entropy Adversarial Representation Learning model and compare it with other Adversarial Representation Learning baselines.
5.1 Three Player Game: Linear Case
As an illustrative example, we analyze the convergence of both ML-ARL and MaxEnt-ARL in the same setting. The encoder, discriminator, and predictor are linear models with scalar multiplicative weights w_E, w_D, and w_T, respectively. We limit our model to this three-variable setting for ease of analysis and visualization. Both the predictor and the discriminator optimize a cross-entropy loss on binary labels. To observe the game between the three players, we provide the same data sample with all four combinations of binary target and sensitive labels, i.e., 4 samples in total. The loss is computed as the average over all samples, and the corresponding vector field values are computed from it. The stationary point of this game, for both ML-ARL and MaxEnt-ARL, is the point at which the gradients of the loss functions are zero. We consider a small grid neighborhood around the stationary point and visualize trajectories by following the vector field of the game.

Figure 3 shows streamline plots of the vector field around the stationary point for a trajectory starting at the green location. In the ML-ARL case, we observe that when the predictor weight is held fixed at its stationary value, the trajectory of the encoder and the discriminator does not converge and instead rotates around the stationary point. In contrast, MaxEnt-ARL converges to the stationary point. When all three players are updated, the streamlines of both ML-ARL and MaxEnt-ARL converge to the stationary point. For an alternate parameterization of the discriminator, we found convergent behavior for both ML-ARL and MaxEnt-ARL.
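The rotation seen in the ML-ARL streamlines is the textbook behavior of simultaneous gradient descent on a two-player zero-sum game. The simplest instance is min_x max_y f(x, y) = xy, whose unique stationary point is the origin; a minimal sketch:

```python
# Simultaneous gradient descent on the zero-sum game f(x, y) = x * y:
# x descends on f while y ascends on f. Each discrete step multiplies
# the distance to the equilibrium (0, 0) by sqrt(1 + lr^2), so the
# iterates rotate around the origin and slowly spiral outward.
def simultaneous_gd(x, y, lr=0.1, steps=200):
    for _ in range(steps):
        gx, gy = y, x              # df/dx = y, df/dy = x
        x, y = x - lr * gx, y + lr * gy
    return x, y

x0, y0 = 0.5, 0.5
x, y = simultaneous_gd(x0, y0)
print((x * x + y * y) ** 0.5)      # radius has grown from ~0.707
```

The continuous-time flow of this game circles the equilibrium forever; discretizing it with any fixed step size turns the circles into an outward spiral, which is the non-convergence mode that regularization or a different objective (such as the entropy objective) must break.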
5.2 Mixture of Gaussians
In this experiment we visualize and compare the representations learned by MaxEnt-ARL and ML-ARL. We consider a mixture of 4 Gaussians, each component with the same variance. Each data sample has two attributes, color and shape. We set up the ARL problem with shape as the target attribute and color as the sensitive attribute. The encoder is a neural network with one hidden layer of two neurons, mapping the 2-D input into another 2-D embedding, and both the predictor and the discriminator are logistic regression classifiers. The trade-off parameter α is fixed, and the parameters are learned using the Adam optimizer. After learning the embedding function, we freeze its parameters and train a logistic regression classifier as the adversary. The test accuracy of the adversary is 63% for MaxEnt-ARL and 70% for ML-ARL. Therefore, by optimizing the entropy instead of the likelihood, MaxEnt-ARL leaks less information about the sensitive label than ML-ARL. Figure 4 shows the data and the learned embeddings.

5.3 Fair Classification
We consider the setting of fair classification on two datasets from the UCI ML repository [2]: (a) the German credit dataset, with 20 attributes for 1,000 instances, where the target label is whether a bank account holder has good or bad credit and gender is the sensitive attribute; (b) the Adult income dataset, with 45,222 instances and 14 attributes, where the target is a binary label indicating an annual income of more or less than $50,000, and gender is again the sensitive attribute. For both ML-ARL and MaxEnt-ARL, the encoder is a neural network with one hidden layer, the discriminator is a neural network with 2 hidden layers, and the target predictor is linear logistic regression. Following ML-ARL [18], we choose 64 units in each hidden layer. We compare both ARL formulations with the state-of-the-art baselines LFR (Learning Fair Representations [19]), VAE (Variational Autoencoder [11]), and VFAE (Variational Fair Autoencoder [14]). For MaxEnt-ARL, after learning the embedding, we again train an adversary to extract the sensitive attribute.
Figure 5 shows the results for the German and Adult datasets, for both the target and sensitive attributes. For the German dataset, MaxEnt-ARL's target prediction accuracy is 86.33%, close to that of the original data (87%). The other models, LFR, VAE, VFAE, and ML-ARL, have target accuracies of 72.3%, 72.5%, 72.7%, and 74.4%, respectively. On the sensitive attribute, the MaxEnt-ARL adversary's accuracy is 72.7%, while the other representations reveal much more information, with adversary accuracies of 80%, 80.5%, 79.5%, 79.7%, and 80.2% for the original data, LFR, VAE, VFAE, and ML-ARL, respectively. For the Adult income dataset, the target accuracy on the original data, ML-ARL, and MaxEnt-ARL is 85%, 84.4%, and 84.6%, respectively, while the adversary's accuracy on the sensitive attribute is 67.7% for ML-ARL and 65.5% for MaxEnt-ARL.
5.4 Illumination Invariant Face Classification
We consider the task of face classification under different illumination conditions, using the Extended Yale B dataset [6], which comprises face images of 38 people under five lighting conditions (directions of the light source): upper right, lower right, lower left, upper left, or front. The target task is to identify one of the 38 people, with the direction of the light source being the sensitive attribute. We follow the experimental setup of Xie et al. [18] and Louizos et al. [14], using the same train/test split strategy and no validation set; a small subset of samples is used for training and the remaining 1,096 samples are used for testing. Following the model setup in [18], the encoder is a one-layer neural network, the target predictor is a linear layer, and the discriminator has two hidden layers of 100 units each. The parameters are trained using Adam [10] with weight decay.
Method | Adversary accuracy (lighting) | Target accuracy (identity)
LR | 96 | 78
NN + MMD [13] | – | 82
VFAE [14] | 57 | 85
ML-ARL [18] | 57 | 89
MaxEnt-ARL | 40 | 89
We report the baseline results [13, 14, 18] for this experiment in Table 1 and compare them with the proposed MaxEnt-ARL framework. Louizos et al. [14] regularize their neural networks via Maximum Mean Discrepancy to remove lighting conditions from the data, whereas Xie et al. [18] use the ML-ARL framework. MaxEnt-ARL achieves an accuracy of 89% for identity classification (same as ML-ARL) while outperforming MMD (82%) and VFAE (85%). In terms of protecting the sensitive attribute, i.e., the illumination direction, the adversary's classification accuracy drops from 57% for ML-ARL to 40.2% for MaxEnt-ARL. It is clear from the table that MaxEnt-ARL removes more of the sensitive information from the image representation than the baselines.
5.5 CIFAR-10
We create a new binary target classification problem on the CIFAR-10 dataset [12]. CIFAR-10 consists of 10 basic classes: ‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, and ‘truck’. We divide the classes into two groups, living and non-living objects; we expect the living objects to have visually discriminative properties, such as smooth shapes, compared to the regular geometric shapes of non-living objects. The target task is binary classification of an image into these two supersets, with the underlying class label being the sensitive attribute. For example, classifying an object as living (‘dog’ or ‘cat’) or non-living (‘ship’ or ‘truck’) should not reveal any information about its underlying identity (‘dog’, ‘cat’, ‘truck’, or ‘ship’). As we will see, however, this is a challenging problem, and the image representation might not be able to prevent leakage of the sensitive label.
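The living/non-living relabeling is a simple mapping over the ten class names (class order as in the standard CIFAR-10 label set); a sketch:

```python
CIFAR10_CLASSES = ['airplane', 'automobile', 'bird', 'cat', 'deer',
                   'dog', 'frog', 'horse', 'ship', 'truck']
LIVING = {'bird', 'cat', 'deer', 'dog', 'frog', 'horse'}

def to_binary_target(fine_label):
    """Map a CIFAR-10 class index to the living (1) / non-living (0)
    super-label; the fine label itself is the sensitive attribute."""
    return int(CIFAR10_CLASSES[fine_label] in LIVING)

print([to_binary_target(i) for i in range(10)])
# [0, 0, 1, 1, 1, 1, 1, 1, 0, 0]
```

Each image thus carries a binary target label and a 10-way sensitive label, which is the pairing the ARL game is trained on.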
Implementation Details: We adopt the ResNet-18 [8] architecture as the encoder; the discriminator and adversary are 2-layer neural networks with 256 and 64 neurons, respectively. The encoder and the target predictor are trained for the prediction task using SGD with a momentum of 0.9 and weight decay. The discriminator and the adversary are trained using Adam with weight decay for 300 epochs.
Experimental Results: We evaluate the performance of the predictor and the adversary as we vary the trade-off parameter α. We first note that, ideally, the desired predictor accuracy is 100%, the adversary accuracy is 10% (random chance for 10 classes), and the adversary entropy is 2.3 nats (uniform distribution over 10 classes). Figure 6 (a)-(b) shows the trade-off achieved between the predictor and the adversary, along with the corresponding normalized hypervolume (HV). For predictor accuracy versus adversary accuracy, the HV corresponds to the area above the trade-off curve, while for predictor accuracy versus adversary entropy the HV is the area under the curve.
We obtain these results by repeating all experiments five times and retaining the non-dominated solutions, i.e., solutions that are no worse than any other solution in both objectives. From these results, we observe that without privacy considerations the representation achieves the best target accuracy but also leaks significant information. In contrast, adversarial learning of the representation achieves a better trade-off between utility and information leakage. Among the ARL approaches, MaxEnt-ARL obtains a better trade-off than ML-ARL. Furthermore, among all solutions, MaxEnt-ARL achieves the one closest to the ideal operating point.
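Retaining the non-dominated solutions across repeated runs is a standard Pareto filter. A minimal sketch, where higher target accuracy and lower adversary accuracy are both preferred (the numbers are illustrative, not the paper's results):

```python
def non_dominated(points):
    """Keep (target_acc, adversary_acc) pairs not dominated by any
    other point: a point is dominated if some other point has >= target
    accuracy and <= adversary accuracy, with at least one strict."""
    keep = []
    for i, (t_i, a_i) in enumerate(points):
        dominated = any(
            (t_j >= t_i and a_j <= a_i) and (t_j > t_i or a_j < a_i)
            for j, (t_j, a_j) in enumerate(points) if j != i
        )
        if not dominated:
            keep.append((t_i, a_i))
    return keep

runs = [(0.95, 0.60), (0.90, 0.40), (0.85, 0.45), (0.80, 0.30)]
print(non_dominated(runs))
# (0.85, 0.45) is dropped: (0.90, 0.40) dominates it in both objectives
```

The surviving points form the trade-off front over which the hypervolume is then computed.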
5.6 CIFAR-100
Superclass  Main Class 

aquatic mammals  beaver, dolphin, otter, seal, whale 
fish  aquarium fish, flatfish, ray, shark, trout 
flowers  orchids, poppies, roses, sunflowers, tulips 
food containers  bottles, bowls, cans, cups, plates 
fruit and vegetables  apples, mushrooms, oranges, pears, sweet peppers 
household electrical devices  clock, computer keyboard, lamp, telephone, television 
household furniture  bed, chair, couch, table, wardrobe 
insects  bee, beetle, butterfly, caterpillar, cockroach 
large carnivores  bear, leopard, lion, tiger, wolf 
large man-made outdoor things  bridge, castle, house, road, skyscraper 
large natural outdoor scenes  cloud, forest, mountain, plain, sea 
large omnivores and herbivores  camel, cattle, chimpanzee, elephant, kangaroo 
medium-sized mammals  fox, porcupine, possum, raccoon, skunk 
non-insect invertebrates  crab, lobster, snail, spider, worm 
people  baby, boy, girl, man, woman 
reptiles  crocodile, dinosaur, lizard, snake, turtle 
small mammals  hamster, mouse, rabbit, shrew, squirrel 
trees  maple, oak, palm, pine, willow 
vehicles 1  bicycle, bus, motorcycle, pickup truck, train 
vehicles 2  lawnmower, rocket, streetcar, tank, tractor 
We formulate a new privacy problem on the CIFAR-100 dataset. The dataset consists of 100 classes grouped into 20 superclasses (Table 2). Each image has a “fine” label (the class to which it belongs) and a “coarse” label (the superclass to which it belongs). We treat the “coarse” (superclass) and “fine” (class) labels as the target and sensitive attributes, respectively. The encoder is thus tasked with learning features of the superclasses while not revealing information about the underlying classes. We adopt ResNet-18 as the encoder, while the predictor, discriminator, and adversary are all 2-layer fully connected networks. The adversarial game is trained for 150 epochs, followed by training the adversary for 100 epochs with the parameters of the encoder frozen.
Just as in the CIFAR-10 case, we report the trade-off achieved between the predictor and the adversary, along with the corresponding normalized hypervolume (HV), in Fig. 6 (c)-(d). Here we note that, ideally, we desire a predictor accuracy of 100%, an adversary accuracy of 1% (random chance for 100 classes), and an adversary entropy of about 4.6 nats (uniform distribution over 100 classes). We make the following observations from the results. First, the performance of the different approaches suggests that this task is significantly harder than the CIFAR-10 task, with much lower achievable target accuracy and much higher adversary accuracy. Second, representation learning without privacy considerations leaks a significant amount of information. Third, MaxEnt-ARL significantly outperforms ML-ARL on this task, achieving trade-off solutions that are far better both in terms of adversary accuracy and adversary entropy.
6 Conclusion
This paper introduced a new formulation of adversarial representation learning, Maximum Entropy Adversarial Representation Learning (MaxEnt-ARL), for mitigating information leakage from learned representations in an adversarial setting. In this model, the encoder is optimized to maximize the entropy of the adversary's distribution over the sensitive attribute, as opposed to minimizing the likelihood of the true sensitive label (ML-ARL). We analyzed the equilibrium and convergence properties of both ML-ARL and MaxEnt-ARL. Numerical experiments on multiple datasets suggest that MaxEnt-ARL is a promising framework for preventing information leakage from image representations, outperforming the baseline likelihood-based objective.
7 Appendix
In this appendix we include the proof of Theorem 1 (Section 7.1) and of Corollary 1.1 (Section 7.2), and provide the numerical values of the trade-off fronts for the CIFAR-10 and CIFAR-100 experiments (Section 7.3).
7.1 Proof of Theorem 1
Theorem 3.
Given a fixed encoder E, the optimal discriminator is q*_D(s | z) = p(s | z) and the optimal predictor is q*_T(y | z) = p(y | z).
Proof.
Let z be the fixed encoder output for input x, i.e., z = E(x). Let p(x, y, s) be the true joint distribution of the variables, i.e., input x, target label y, and sensitive label s. The fixed encoder is a deterministic transformation of x and generates an implicit distribution p(z, y, s).

Discriminator: The objective of the discriminator is

  max_{q_D}  E_{z,s} [ log q_D(s | z) ]    (5)
  s.t.  Σ_s q_D(s | z) = 1  for all z.

The Lagrangian of the problem can be written as

  L(q_D, λ) = Σ_{z,s} p(z, s) log q_D(s | z) + λ(z) ( 1 − Σ_s q_D(s | z) ).

Taking the partial derivative of L w.r.t. q_D(s | z) and setting it to zero, the optimal discriminator satisfies

  p(z, s) / q_D(s | z) − λ(z) = 0,  i.e.,  q_D(s | z) = p(z, s) / λ(z),    (6)

where we used the fact that ∂[p log q]/∂q = p / q. Summing both sides of the last equation over s and using the fact that Σ_s q_D(s | z) = 1, we get λ(z) = Σ_s p(z, s) = p(z). Substituting λ(z) = p(z), we obtain the solution for the optimal discriminator:

  q*_D(s | z) = p(z, s) / p(z) = p(s | z).    (7)

Therefore, q*_D(s | z) = p(s | z).
Target Predictor: The objective of the predictor is

  max_{q_T}  E_{z,y} [ log q_T(y | z) ]    (8)
  s.t.  Σ_y q_T(y | z) = 1  for all z.

The Lagrangian of the problem can be written as

  L(q_T, λ) = Σ_{z,y} p(z, y) log q_T(y | z) + λ(z) ( 1 − Σ_y q_T(y | z) ).

Taking the partial derivative of L w.r.t. q_T(y | z) and setting it to zero, the optimal predictor satisfies

  p(z, y) / q_T(y | z) − λ(z) = 0,  i.e.,  q_T(y | z) = p(z, y) / λ(z),    (9)

where we used the fact that ∂[p log q]/∂q = p / q. Summing both sides over y and using Σ_y q_T(y | z) = 1, we get λ(z) = p(z). Substituting λ(z) = p(z), we obtain the solution for the optimal predictor:

  q*_T(y | z) = p(z, y) / p(z) = p(y | z).    (10)

Therefore, q*_T(y | z) = p(y | z).
∎
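The closed forms above can be sanity-checked empirically: over discrete data, the conditional distribution that maximizes the average log-likelihood is exactly the empirical posterior. A minimal NumPy sketch (binary z and s, probabilities chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
# Discrete toy data: z in {0, 1}, s in {0, 1}, with p(s=1|z=0) = 0.2
# and p(s=1|z=1) = 0.7.
z = rng.integers(0, 2, size=5000)
s = (rng.random(5000) < np.where(z == 0, 0.2, 0.7)).astype(int)

def avg_loglik(q):
    """Average log-likelihood of s under q, where q[v] = q(s=1 | z=v)."""
    p1 = q[z]
    return float(np.mean(np.where(s == 1, np.log(p1), np.log(1 - p1))))

# Empirical posterior p(s=1 | z)
posterior = np.array([s[z == v].mean() for v in (0, 1)])

# The empirical posterior beats any perturbed conditional.
print(avg_loglik(posterior) >= avg_loglik(posterior + 0.05))  # True
print(posterior)  # close to [0.2, 0.7]
```

This is the discrete analogue of equations (5)-(7): the likelihood-maximizing discriminator is the posterior of the sensitive label given the representation.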
7.2 Proof of Corollary 1.1
Corollary 3.1.
When y and s are independent, let the optimal discriminator and predictor for an encoder E be q*_D and q*_T, respectively. The optimal encoder in the MaxEnt-ARL formulation induces a uniform distribution in the discriminator over the K_s classes of the sensitive attribute.
Proof.
Here we prove that, when the discriminator is fixed at its optimum, the encoder learns a representation of the data such that q*_D(s | z) = 1/K_s for every s. First note that, although the discriminator parameters are fixed, the discriminator probability q*_D(s | z) can still change through the encoder parameters θ_E. The optimization of the encoder in MaxEnt-ARL is formulated as:

  max_{θ_E}  H(q*_D(s | z))  s.t.  Σ_s q*_D(s | z) = 1.    (11)

The Lagrangian of the problem can be written as

  L = −Σ_s q*_D(s | z) log q*_D(s | z) + λ ( 1 − Σ_s q*_D(s | z) ).

Here λ is a Lagrange multiplier and is assumed to be a constant in the absence of any further information. Since y and s are independent, the target likelihood term is independent of q*_D given z by Theorem 3, and can be dropped from the encoder's objective. Taking the derivative of L w.r.t. q*_D(s | z) and setting it to zero, we have:

  −log q*_D(s | z) − 1 − λ = 0,  i.e.,  q*_D(s | z) = e^{−(1+λ)}.    (12)

Thus q*_D(s | z) is the same constant for every s. Using the first (non-trivial) normalization constraint Σ_s q*_D(s | z) = 1, we obtain

  q*_D(s | z) = 1/K_s.

Hence, after the encoder's parameters are optimized, the probability distribution of the discriminator is uniform. Thus, when the optimal discriminator parameters are fixed, the encoder optimizes the representation such that the discriminator does not leak any information, i.e., it induces a uniform distribution. ∎
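The conclusion of Corollary 1.1 can be checked directly: gradient ascent on the entropy of a softmax-parameterized distribution drives it to the uniform distribution. A minimal sketch (the gradient of H(softmax(v)) w.r.t. the logits v is derived by the chain rule):

```python
import numpy as np

def softmax(v):
    e = np.exp(v - v.max())
    return e / e.sum()

def entropy_grad(v):
    """Gradient of H(softmax(v)) w.r.t. logits v:
    dH/dv_i = -p_i * (log p_i + H)."""
    p = softmax(v)
    logp = np.log(p)
    H = -np.sum(p * logp)
    return -p * (logp + H)

v = np.array([2.0, -1.0, 0.5])       # arbitrary starting logits
for _ in range(5000):
    v = v + 0.1 * entropy_grad(v)    # ascend the entropy
p = softmax(v)
print(p)  # approaches the uniform distribution [1/3, 1/3, 1/3]
```

The only stationary point of this ascent is log p_i + H = 0 for every i, i.e., p_i = e^{−H}, the uniform distribution, in agreement with equation (12).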
7.3 CIFAR Trade-Off
We report the numerical values of the target accuracy and adversary accuracy trade-off results on the CIFAR-10 and CIFAR-100 experiments in Table 3 and Table 5, respectively. Similarly, we report the numerical values of the target accuracy and adversary entropy trade-off results on the CIFAR-10 and CIFAR-100 experiments in Table 4 and Table 6, respectively.
References
 [1] A. Beutel, J. Chen, Z. Zhao, and E. H. Chi. Data decisions and theoretical implications when adversarially learning fair representations. arXiv preprint arXiv:1707.00075, 2017.
 [2] D. Dua and C. Graff. UCI machine learning repository, 2017.
 [3] H. Edwards and A. J. Storkey. Censoring representations with an adversary. In International Conference on Learning Representations (ICLR), 2016.

 [4] Y. Ganin and V. Lempitsky. Unsupervised domain adaptation by backpropagation. In International Conference on Machine Learning (ICML), 2015.
 [5] Y. Ganin, E. Ustinova, H. Ajakan, P. Germain, H. Larochelle, F. Laviolette, M. Marchand, and V. Lempitsky. Domain-adversarial training of neural networks. The Journal of Machine Learning Research, 17(1):2096–2030, 2016.
 [6] A. S. Georghiades, P. N. Belhumeur, and D. J. Kriegman. From few to many: Illumination cone models for face recognition under variable lighting and pose. IEEE Transactions on Pattern Analysis & Machine Intelligence, (6):643–660, 2001.
 [7] I. Goodfellow, J. PougetAbadie, M. Mirza, B. Xu, D. WardeFarley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. In Advances in Neural Information Processing Systems (NeurIPS), pages 2672–2680, 2014.

 [8] K. He, X. Zhang, S. Ren, and J. Sun. Identity mappings in deep residual networks. In European Conference on Computer Vision (ECCV), pages 630–645. Springer, 2016.
 [9] H. K. Khalil. Nonlinear Systems. Prentice-Hall, 1996.
 [10] D. P. Kingma and J. Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
 [11] D. P. Kingma and M. Welling. Autoencoding variational bayes. arXiv preprint arXiv:1312.6114, 2013.
 [12] A. Krizhevsky and G. Hinton. Learning multiple layers of features from tiny images. Technical report, Citeseer, 2009.
 [13] Y. Li, K. Swersky, and R. Zemel. Learning unbiased features. arXiv preprint arXiv:1412.5244, 2014.
 [14] C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair autoencoder. In International Conference on Learning Representations (ICLR), 2016.
 [15] L. Mescheder, S. Nowozin, and A. Geiger. The numerics of gans. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
 [16] V. Nagarajan and J. Z. Kolter. Gradient descent gan optimization is locally stable. In Advances in Neural Information Processing Systems (NeurIPS), pages 5585–5595, 2017.

 [17] E. Tzeng, J. Hoffman, K. Saenko, and T. Darrell. Adversarial discriminative domain adaptation. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.
 [18] Q. Xie, Z. Dai, Y. Du, E. Hovy, and G. Neubig. Controllable invariance through adversarial feature learning. In Advances in Neural Information Processing Systems (NeurIPS), 2017.
 [19] R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning (ICML), 2013.
 [20] B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. In AAAI/ACM Conference on AI, Ethics, and Society, 2018.