Learning Controllable Fair Representations

12/11/2018 ∙ by Jiaming Song, et al. ∙ 12

Learning data representations that are transferable and fair with respect to certain protected attributes is crucial to reducing unfair decisions made downstream, while preserving the utility of the data. We propose an information-theoretically motivated objective for learning maximally expressive representations subject to fairness constraints. We demonstrate that a range of existing approaches optimize approximations to the Lagrangian dual of our objective. In contrast to these existing approaches, our objective provides the user control over the fairness of representations by specifying limits on unfairness. We introduce a dual optimization method that optimizes the model as well as the expressiveness-fairness trade-off. Empirical evidence suggests that our proposed method can account for multiple notions of fairness and achieves higher expressiveness at a lower computational cost.




1 Introduction

Statistical learning systems are increasingly being used to assess individuals, influencing consequential decisions such as bank loans, college admissions, and criminal sentences. This yields a growing demand for systems guaranteed to output decisions that are fair with respect to sensitive attributes such as gender, race, and disability.

In the typical classification and regression settings with fairness and privacy constraints, one is concerned about performing a single, specific task. However, situations arise where a data owner needs to release data to downstream users without prior knowledge of the tasks that will be performed (Madras et al., 2018). In such cases, it is crucial to find representations of the data that can be used on a wide variety of tasks while preserving fairness (Calmon et al., 2017).

This gives rise to two desiderata. On the one hand, the representations need to be expressive, so that they can be used effectively for as many tasks as possible. On the other hand, the representations also need to satisfy certain fairness constraints to protect sensitive attributes. Further, many notions of fairness are possible, and it may not be possible to simultaneously satisfy all of them (Kleinberg et al., 2016; Chouldechova, 2017). Therefore, the ability to effectively trade off multiple notions of fairness is crucial to fair representation learning.

To this end, we present an information-theoretically motivated constrained optimization framework (Section 2). The goal is to maximize the expressiveness of representations while satisfying certain fairness constraints. We represent expressiveness as well as three dominant notions of fairness (demographic parity (Zemel et al., 2013), equalized odds, equalized opportunity (Hardt et al., 2016)) in terms of mutual information, obtain tractable upper/lower bounds of these mutual information objectives, and connect them with existing objectives such as maximum likelihood, adversarial training (Goodfellow et al., 2014), and variational autoencoders (Kingma and Welling, 2013; Rezende and Mohamed, 2015).

As we demonstrate in Section 3, this serves as a unifying framework for existing work (Zemel et al., 2013; Louizos et al., 2015; Edwards and Storkey, 2015; Madras et al., 2018) on learning fair representations. A range of existing approaches to learning fair representations, which do not draw connections to information theory, optimize an approximation of the Lagrangian dual of our objective with fixed values of the Lagrange multipliers. These thus require the user to obtain different representations for different notions of fairness as in Madras et al. (2018).

Instead, we consider a dual optimization approach (Section 4), in which we optimize the model as well as the Lagrange multipliers during training (Zhao et al., 2018), thereby also learning the trade-off between expressiveness and fairness. We further show that our proposed framework is strongly convex in distribution space.

Our work is the first to provide direct user control over the fairness of representations through fairness constraints that are interpretable by non-expert users. Empirical results in Section 5 demonstrate that our notions of expressiveness and fairness based on mutual information align well with existing definitions, our method encourages representations that satisfy the fairness constraints while being more expressive, and that our method is able to balance the trade-off between multiple notions of fairness with a single representation and a significantly lower computational cost.

2 An Information-Theoretic Objective for Controllable Fair Representations

We are given a dataset $\mathcal{D} = \{(x_i, u_i)\}$ containing pairs of observations $x$ and sensitive attributes $u$. We assume the dataset is sampled i.i.d. from an unknown data distribution $q(x, u)$. Our goal is to transform each data point $(x, u)$ into a new representation $z$ that is (1) transferable, i.e., it can be used in place of $(x, u)$ by multiple unknown vendors on a variety of downstream tasks, and (2) fair, i.e., the sensitive attributes $u$ are protected. For conciseness, we focus on the demographic parity notion of fairness (Calders et al., 2009; Zliobaite, 2015; Zafar et al., 2015), which requires the decisions made by a classifier over $z$ to be independent of the sensitive attributes $u$. We discuss in Appendix D how our approach can be extended to control other notions of fairness simultaneously, such as the equalized odds and equalized opportunity notions of fairness (Hardt et al., 2016).

We assume the representations $z$ of $(x, u)$ are obtained by sampling from a conditional probability distribution $q_\phi(z \mid x, u)$ parameterized by $\phi \in \Phi$. The joint distribution of $(x, z, u)$ is then given by $q_\phi(x, z, u) = q(x, u)\, q_\phi(z \mid x, u)$. We formally express our desiderata for learning a controllable fair representation $z$ through the concept of mutual information:

  1. Fairness: $z$ should have low mutual information with the sensitive attributes $u$.

  2. Expressiveness: $z$ should have high mutual information with the observations $x$, conditioned on $u$ (in expectation over possible values of $u$).

The first condition encourages $z$ to be independent of $u$; if this is indeed the case, the downstream vendor cannot learn a classifier over the representations $z$ that discriminates based on $u$. Intuitively, the mutual information $I_q(z; u)$ is related to the optimal predictor of $u$ given $z$. If $I_q(z; u)$ is zero, then no such predictor can perform better than chance; if $I_q(z; u)$ is large, vendors in downstream tasks could utilize $z$ to predict the sensitive attributes and make unfair decisions.

The second condition encourages $z$ to contain as much information as possible from $x$ conditioned on the knowledge of $u$. By conditioning on $u$, we ensure we do not encourage information in $x$ that is correlated with $u$ to leak into $z$. The two desiderata allow $z$ to encode non-sensitive information from $x$ (expressiveness) while excluding information in $u$ (fairness).

Our goal is to choose parameters $\phi \in \Phi$ for $q_\phi(z \mid x, u)$ that meet both these criteria. (Simply ignoring $u$ as an input is insufficient, as $x$ may still contain information about $u$.) Because we wish to ensure our representations satisfy fairness constraints even at the cost of using less expressive $z$, we synthesize the two desiderata into the following constrained optimization problem:

$$\max_{\phi \in \Phi} \; I_q(x; z \mid u) \quad \text{s.t.} \quad I_q(z; u) < \epsilon \tag{1}$$

where $I_q(x; z \mid u)$ denotes the mutual information of $x$ and $z$ conditioned on $u$, $I_q(z; u)$ denotes mutual information between $z$ and $u$, and the hyperparameter $\epsilon > 0$ controls the maximum amount of mutual information allowed between $z$ and $u$. The motivation of our “hard” constraint on $I_q(z; u)$, as opposed to a “soft” regularization term, is that even at the cost of learning less expressive $z$ and losing some predictive power, we view as important ensuring that our representations are fair to the extent dictated by $\epsilon$.

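Both mutual information terms in Equation 1 are difficult to compute and optimize. In particular, the optimization objective in Equation 1 can be expressed as the following expectation:

$$I_q(x; z \mid u) = \mathbb{E}_{q_\phi(x, z, u)}\left[\log q_\phi(x, z \mid u) - \log q(x \mid u) - \log q_\phi(z \mid u)\right]$$

while the constraint on $I_q(z; u)$ involves the following expectation:

$$I_q(z; u) = \mathbb{E}_{q_\phi(z, u)}\left[\log q_\phi(z \mid u) - \log q_\phi(z)\right]$$

Even though $q_\phi(z \mid x, u)$ is known analytically and assumed to be easy to evaluate, both mutual information terms are difficult to estimate and optimize.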
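For intuition, both quantities can be computed exactly when every variable is discrete and small. The following is a minimal sketch rather than the authors' code: the toy distribution `q_xu`, the two encoders, and all helper names are assumptions made for illustration. It contrasts an encoder that copies $x$ (expressive but unfair when $x$ correlates with $u$) with one that outputs noise (fair but uninformative).

```python
import math
from collections import defaultdict

def marginal(joint, keep):
    """Marginalize a joint {(x, z, u): prob} onto the index positions in `keep`."""
    out = defaultdict(float)
    for key, p in joint.items():
        out[tuple(key[i] for i in keep)] += p
    return out

def mutual_information(joint, a, b):
    """I(A;B) in nats; `a` and `b` are tuples of index positions into the key."""
    pab, pa, pb = marginal(joint, a + b), marginal(joint, a), marginal(joint, b)
    total = 0.0
    for key, p in pab.items():
        if p > 0:
            ka, kb = key[:len(a)], key[len(a):]
            total += p * math.log(p / (pa[ka] * pb[kb]))
    return total

def conditional_mutual_information(joint, a, b, c):
    """I(A;B|C) via the chain rule I(A;B,C) - I(A;C)."""
    return mutual_information(joint, a, b + c) - mutual_information(joint, a, c)

def build_joint(q_xu, encoder):
    """q(x, z, u) = q(x, u) * q(z | x, u); `encoder(x, u)` returns {z: prob}."""
    joint = {}
    for (x, u), p in q_xu.items():
        for z, pz in encoder(x, u).items():
            joint[(x, z, u)] = joint.get((x, z, u), 0.0) + p * pz
    return joint

# Hypothetical toy data distribution in which x is correlated with u.
q_xu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
copy_x = lambda x, u: {x: 1.0}           # z = x
noise = lambda x, u: {0: 0.5, 1: 0.5}    # z independent of (x, u)

for name, encoder in [("copy x", copy_x), ("noise", noise)]:
    joint = build_joint(q_xu, encoder)
    expressiveness = conditional_mutual_information(joint, (0,), (1,), (2,))  # I(x;z|u)
    unfairness = mutual_information(joint, (1,), (2,))                        # I(z;u)
    print(name, "I(x;z|u) =", expressiveness, "I(z;u) =", unfairness)
```

In this toy setting the copying encoder violates any budget $\epsilon$ smaller than $I_q(x; u)$, which is the situation the constraint in Equation 1 is meant to rule out.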

To offset the challenge in estimating mutual information, we introduce upper and lower bounds with tractable Monte Carlo gradient estimates. We introduce the following lemmas, with the proofs provided in Appendix A. We note that similar bounds have been proposed in Alemi et al. (2016, 2017); Zhao et al. (2018); Grover et al. (2019).

2.1 Tractable Lower Bound for $I_q(x; z \mid u)$

We begin with a (variational) lower bound on the objective function related to expressiveness which we would like to maximize in Equation 1.

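Lemma 1.

For any conditional distribution $p_\theta(x \mid z, u)$ (parametrized by $\theta$)

$$I_q(x; z \mid u) = \mathbb{E}_{q_\phi(x, z, u)}\left[\log p_\theta(x \mid z, u)\right] + H_q(x \mid u) + \mathbb{E}_{q_\phi(z, u)} D_{\mathrm{KL}}\left(q_\phi(x \mid z, u) \,\|\, p_\theta(x \mid z, u)\right)$$

where $H_q(x \mid u)$ is the entropy of $x$ conditioned on $u$, and $D_{\mathrm{KL}}$ denotes KL-divergence.

Since entropy and KL divergence are non-negative, the above lemma implies the following lower bound:

$$I_q(x; z \mid u) \geq \mathbb{E}_{q_\phi(x, z, u)}\left[\log p_\theta(x \mid z, u)\right] := -L_r \tag{2}$$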


2.2 Tractable Upper Bound for $I_q(z; u)$

Next, we provide an upper bound for the constraint term that specifies the limit on unfairness. In order to satisfy this fairness constraint, we wish to implicitly minimize this term.

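Lemma 2.

For any distribution $p(z)$, we have:

$$I_q(z; u) \leq I_q(z; x, u) = \mathbb{E}_{q(x, u)} D_{\mathrm{KL}}\left(q_\phi(z \mid x, u) \,\|\, p(z)\right) - D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right) \tag{3}$$

Again, using the non-negativity of KL divergence, we obtain the following upper bound:

$$I_q(z; u) \leq \mathbb{E}_{q(x, u)} D_{\mathrm{KL}}\left(q_\phi(z \mid x, u) \,\|\, p(z)\right) := C_1 \tag{4}$$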


In summary, Equation 2 and Equation 4 imply that we can compute tractable Monte Carlo estimates for the lower and upper bounds to $I_q(x; z \mid u)$ and $I_q(z; u)$ respectively, as long as the variational distributions $p_\theta(x \mid z, u)$ and $p(z)$ can be evaluated tractably, e.g., Bernoulli and Gaussian distributions. Note that the distribution $q_\phi(z \mid x, u)$ is assumed to be tractable.
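As an illustration of why these bounds are tractable, the sketch below assumes a diagonal Gaussian encoder $q_\phi(z \mid x, u)$, a standard normal prior $p(z)$, and a Bernoulli decoder $p_\theta(x \mid z, u)$. These modelling choices and all function names are assumptions for the example, not details confirmed by the text. For a single data point it compares the closed-form KL term inside $C_1$ with a Monte Carlo estimate of the same quantity; averaging over data points would give the estimate of $C_1$.

```python
import math
import random

def gaussian_kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ) in nats."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))

def gaussian_kl_monte_carlo(mu, log_var, num_samples, rng):
    """Monte Carlo estimate of the same KL: mean of log q(z|x,u) - log p(z), z ~ q."""
    total = 0.0
    for _ in range(num_samples):
        for m, lv in zip(mu, log_var):
            eps = rng.gauss(0.0, 1.0)
            z = m + math.exp(0.5 * lv) * eps  # reparameterized sample
            log_q = -0.5 * (math.log(2.0 * math.pi) + lv + eps * eps)
            log_p = -0.5 * (math.log(2.0 * math.pi) + z * z)
            total += log_q - log_p
    return total / num_samples

def bernoulli_log_likelihood(x, probs):
    """log p_theta(x | z, u) for binary features under independent Bernoulli outputs."""
    return sum(math.log(p) if xi == 1 else math.log(1.0 - p) for xi, p in zip(x, probs))

rng = random.Random(0)
mu, log_var = [0.5, -1.0], [0.0, -0.5]   # hypothetical encoder outputs for one (x, u)
exact_kl = gaussian_kl_to_standard_normal(mu, log_var)
sampled_kl = gaussian_kl_monte_carlo(mu, log_var, 20000, rng)
print("closed form:", exact_kl, "Monte Carlo:", sampled_kl)

# One-sample contribution to -L_r for a hypothetical decoder output.
print("log-likelihood:", bernoulli_log_likelihood([1, 0, 1], [0.9, 0.2, 0.7]))
```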

2.3 A Tighter Upper Bound to $I_q(z; u)$ via Adversarial Training

It would be tempting to use $C_1$, the tractable upper bound from Equation 4, as a replacement for $I_q(z; u)$ in the constraint of Equation 1. However, note from Equation 3 that $C_1$ is also an upper bound to $I_q(x; z \mid u)$, which is the objective function (expressiveness) we would like to maximize in Equation 1. If this were constrained too tightly, we would constrain the expressiveness of our learned representations. Therefore, we introduce a tighter bound via the following lemma.

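Lemma 3.

For any distribution $p(u)$, we have:

$$I_q(z; u) = \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p(u)\right) - D_{\mathrm{KL}}\left(q(u) \,\|\, p(u)\right) \tag{5}$$

Using the non-negativity of KL divergence as before, we obtain the following upper bound on $I_q(z; u)$:

$$I_q(z; u) \leq \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p(u)\right) := C_2 \tag{6}$$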



As $u$ is typically low-dimensional (e.g., a binary variable, as in Hardt et al. (2016); Zemel et al. (2013)), we can choose $p(u)$ in Equation 5 to be a kernel density estimate based on the dataset $\mathcal{D}$. By making $D_{\mathrm{KL}}(q(u) \,\|\, p(u))$ as small as possible, our upper bound gets closer to $I_q(z; u)$.

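While $C_2$ is a valid upper bound to $I_q(z; u)$, the term $q_\phi(u \mid z)$ appearing in $C_2$ is intractable to evaluate, requiring an integration over $x$. Our solution is to approximate $q_\phi(u \mid z)$ with a parametrized model $p_\psi(u \mid z)$ with parameters $\psi \in \Psi$ obtained via the following objective:

$$\min_{\psi} \; \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p_\psi(u \mid z)\right)$$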


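Note that the above objective corresponds to maximum likelihood prediction with inputs $z$ and labels $u$ using $p_\psi(u \mid z)$. In contrast to $q_\phi(u \mid z)$, the distribution $p_\psi(u \mid z)$ is tractable and implies the following lower bound to $C_2$:

$$\mathbb{E}_{q_\phi(z, u)}\left[\log p_\psi(u \mid z) - \log p(u)\right] = C_2 - \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p_\psi(u \mid z)\right) \leq C_2$$

It follows that we can approximate $I_q(z; u)$ through the following adversarial training objective:

$$\min_{\phi} \max_{\psi} \; \mathbb{E}_{q_\phi(z, u)}\left[\log p_\psi(u \mid z) - \log p(u)\right]$$

Here, the goal of the adversary $p_\psi$ is to minimize the difference between the tractable approximation given by $\mathbb{E}_{q_\phi(z, u)}[\log p_\psi(u \mid z) - \log p(u)]$ and the intractable true upper bound $C_2$. We summarize this observation in the following result: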

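Corollary 4.

If $\mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p_\psi(u \mid z)\right) \leq \ell$, then

$$I_q(z; u) \leq \mathbb{E}_{q_\phi(z, u)}\left[\log p_\psi(u \mid z) - \log p(u)\right] + \ell$$

for any distribution $p(u)$.

It immediately follows that when $\ell \to 0$, i.e., the adversary approaches global optimality, we obtain the true upper bound. For any other finite value of $\ell$, we have:

$$I_q(z; u) \leq \mathbb{E}_{q_\phi(z, u)}\left[\log p_\psi(u \mid z) - \log p(u)\right] + \ell$$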
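The role of the adversary can be checked numerically in a small discrete setting. The sketch below is an illustrative assumption (toy joint `joint_zu`, hand-picked adversaries, hypothetical helper names), not the authors' implementation. It evaluates the tractable quantity $\mathbb{E}_{q_\phi(z,u)}[\log p_\psi(u \mid z) - \log p(u)]$ for a suboptimal and an optimal adversary, with $p(u)$ set to the true marginal $q(u)$ so that the bound in Lemma 3 is tight.

```python
import math

def adversarial_estimate(joint_zu, predictor, prior_u):
    """E_{q(z,u)}[ log p_psi(u|z) - log p(u) ] for discrete z and u."""
    return sum(p * (math.log(predictor[z][u]) - math.log(prior_u[u]))
               for (z, u), p in joint_zu.items() if p > 0)

def true_posterior(joint_zu):
    """q(u|z), obtained by normalizing the joint over u for each z."""
    q_z = {}
    for (z, u), p in joint_zu.items():
        q_z[z] = q_z.get(z, 0.0) + p
    posterior = {}
    for (z, u), p in joint_zu.items():
        posterior.setdefault(z, {})[u] = p / q_z[z]
    return posterior

def adversary_gap(joint_zu, predictor):
    """ell = E_{q(z)} KL( q(u|z) || p_psi(u|z) ), the adversary's suboptimality."""
    posterior = true_posterior(joint_zu)
    return sum(p * (math.log(posterior[z][u]) - math.log(predictor[z][u]))
               for (z, u), p in joint_zu.items() if p > 0)

# Hypothetical joint q(z, u) with binary z and u, and the matching marginal q(u).
joint_zu = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}
prior_u = {0: 0.5, 1: 0.5}

optimal = true_posterior(joint_zu)
weak = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.4, 1: 0.6}}  # an under-trained adversary

mi = adversarial_estimate(joint_zu, optimal, prior_u)   # equals I(z;u) here
weak_estimate = adversarial_estimate(joint_zu, weak, prior_u)
ell = adversary_gap(joint_zu, weak)
print("I(z;u):", mi, "weak adversary:", weak_estimate, "gap ell:", ell)
```

A weak adversary underestimates the unfairness by exactly its gap $\ell$ in this example, which is why the estimate is only guaranteed to be an upper bound once $\ell$ is accounted for.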


2.4 A practical objective for controllable fair representations

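Recall that our goal is to find tractable estimates to the mutual information terms in Equation 1 to make the objective and constraints tractable. In the previous sections, we have derived a lower bound for $I_q(x; z \mid u)$ (which we want to maximize) and upper bounds for $I_q(z; u)$ (which we want to implicitly minimize to satisfy the constraint). Therefore, by applying these results to the optimization problem in Equation 1, we obtain the following constrained optimization problem:

$$\begin{aligned}
\min_{\theta, \phi} \max_{\psi} \quad & L_r = -\mathbb{E}_{q_\phi(x, z, u)}\left[\log p_\theta(x \mid z, u)\right] \\
\text{s.t.} \quad & C_1 = \mathbb{E}_{q(x, u)} D_{\mathrm{KL}}\left(q_\phi(z \mid x, u) \,\|\, p(z)\right) < \epsilon_1 \\
& C_2 = \mathbb{E}_{q_\phi(z, u)}\left[\log p_\psi(u \mid z) - \log p(u)\right] < \epsilon_2
\end{aligned} \tag{10}$$

where $L_r$, $C_1$, and $C_2$ are introduced in Equations 2, 4 and 6 respectively, with $C_2$ evaluated through the adversary $p_\psi$.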

Both $C_1$ and $C_2$ provide a way to limit $I_q(z; u)$. $C_1$ is guaranteed to be an upper bound to $I_q(z; u)$ but also upper-bounds $I_q(x; z \mid u)$ (which we would like to maximize), so it is more suitable when we value true guarantees on fairness over expressiveness. $C_2$ may more accurately approximate $I_q(z; u)$ but is guaranteed to be an upper bound only in the case of an optimal adversary. Hence, it is more suited for scenarios where the user is satisfied with guarantees on fairness in the limit of adversarial training, and we wish to learn more expressive representations. Depending on the underlying application, the user can effectively remove either of the constraints $C_1$ or $C_2$ (or even both) by setting the corresponding $\epsilon$ to infinity.

3 A Unifying Framework for Related Work

Multiple methods for learning fair representations have been proposed in the literature. Zemel et al. (2013) propose a method for clustering individuals into a small number of discrete fair representations. Discrete representations, however, lack the representational power of distributed representations, which vendors desire. In order to learn distributed fair representations, Edwards and Storkey (2015), Eissman et al. (2018) and Madras et al. (2018) each propose adversarial training, where the latter (LAFTR) connects different adversarial losses to multiple notions of fairness. Louizos et al. (2015) propose VFAE for learning distributed fair representations by using a variational autoencoder architecture with additional regularization based on Maximum Mean Discrepancy (MMD) (Gretton et al., 2007). Each of these methods is limited to the case of a binary sensitive attribute because their measurements of fairness are based on statistical parity (Zemel et al., 2013), which is defined only for two groups.

Interestingly, each of these methods can be viewed as optimizing an approximation of the Lagrangian dual of our objective in Equation 10, with particular fixed settings of the Lagrangian multipliers:

$$\min_{\theta, \phi} \max_{\psi} \; L_r + \lambda_1 (C_1 - \epsilon_1) + \lambda_2 (C_2 - \epsilon_2) \tag{11}$$

where $L_r$, $C_1$ and $C_2$ are defined as in Equation 10, and the multipliers $\lambda_1, \lambda_2 \geq 0$ are hyperparameters controlling the relative strengths of the constraints (which now act as “soft” regularizers).

We use “approximation” to suggest these objectives are not exactly the same as ours, as ours can deal with more than two groups in the fairness criterion and theirs cannot. However, all the fairness criteria achieve $z \perp u$ at a global optimum; in the following discussions, for brevity we use $C_2$ to indicate their objectives, even when they are not identical to ours. (We also have not included the task classification error in their methods, as we do not assume a single, specific task or assume access to labels in our setting.)

Here, the values of $\epsilon$ do not affect the final solution. Therefore, if we wish to find representations that satisfy specific constraints, we would have to search over the hyperparameter space to find feasible solutions, which could be computationally inefficient. We call this class of approaches Mutual Information-based Fair Representations (MIFR, pronounced “Mipha”). In Table 1, we summarize these existing methods.

Zemel et al. (2013) 0
Edwards and Storkey (2015) 0
Madras et al. (2018) 0
Louizos et al. (2015) 1
Table 1: Summarizing the components in existing methods. The hyperparameters (e.g. , , ) are from the original notations of the corresponding methods.
  • Zemel et al. (2013) consider as well as minimizing statistical parity (Equation 4 in their paper); they assume is discrete, bypassing the need for adversarial training. Their objective is equivalent to Equation 11 with .

  • Edwards and Storkey (2015) consider (where is Gaussian) and adversarial training where the adversary tries to distinguish the representations from two groups (Equation 9). Their objective is equivalent to Equation 11 with .

  • Madras et al. (2018) consider and adversarial training, which optimizes over surrogates to the demographic parity distance between two groups (Equation 4). Their objective is equivalent to Equation 11 with .

  • Louizos et al. (2015) consider , with and the maximum mean discrepancy between two sensitive groups () (Equation 8). However, as is the VAE objective, their solution does not prefer high mutual information between $x$ and $z$ (referred to as the “information preference” property (Chen et al., 2016; Zhao et al., 2017b, a, 2018)). Their objective is equivalent to Equation 11 with .

All of the above methods require hand-tuning $\lambda$ to govern the trade-off between the desiderata. Because each of these approaches optimizes the dual with fixed multipliers instead of optimizing the multipliers to satisfy the fairness constraints, $\epsilon$ is ignored, so these approaches cannot ensure that the fairness constraints are satisfied. Using any of these approaches to empirically achieve a desirable limit on unfairness requires manually tuning the multipliers (e.g., increasing some $\lambda_i$ until the corresponding constraint is satisfied) over many experiments, and is additionally difficult because there is no interpretable relationship between the multipliers and a limit on unfairness.

Our method is also related to other works on fairness and information theory. Komiyama et al. (2018) solve least-squares regression under multiple fairness constraints. Calmon et al. (2017) transform the dataset to prevent discrimination on specific classification tasks. Zhao et al. (2018) discuss information-theoretic constraints in the context of learning latent variable generative models, but do not discuss fairness.

4 Dual Optimization for Controllable Fair Representations

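In order to exactly solve the dual of our practical objective from Equation 10 and guarantee that the fairness constraints are satisfied, we must optimize the model parameters as well as the Lagrangian multipliers, which we do using the following dual objective:

$$\max_{\lambda \geq 0} \min_{\theta, \phi} \max_{\psi} \; \mathcal{L} = L_r + \lambda^\top (C - \epsilon) \tag{12}$$

where $\lambda = [\lambda_1, \lambda_2]$ are the multipliers and $\epsilon = [\epsilon_1, \epsilon_2]$ and $C = [C_1, C_2]$ represent the constraints.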

If we assume we are optimizing in the distribution space (i.e. corresponds to the set of all valid distributions ), then we can show that strong duality holds (our primal objective from Equation 10 equals our dual objective from Equation 12).

Theorem 5.

If , then strong duality holds for the following optimization problem over distributions and :


where denotes and denotes and .

We show the complete proof in Appendix A.4. Intuitively, we utilize the convexity of KL divergence (over the pair of distributions) and mutual information (over the conditional distribution) to verify that Slater’s conditions hold for this problem.

In practice, we can perform standard iterative gradient updates in the parameter space: standard gradient descent over $\theta, \phi$, gradient ascent over $\psi$ (which parameterizes only the adversary), and gradient ascent over $\lambda$. Intuitively, the gradient ascent over $\lambda$ corresponds to a multiplier increasing when its constraint is not being satisfied, encouraging the representations to satisfy the fairness constraints even at a cost to representation expressiveness. Empirically, we show that this scheme is effective despite non-convexity in the parameter space.
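The multiplier update can be illustrated on a one-dimensional toy problem. The sketch below is not the paper's model: the scalar functions standing in for $L_r$ and a single constraint $C$, the learning rate, and the step count are all assumptions chosen so that the solution is easy to verify by hand. It performs simultaneous gradient descent on the model parameter and projected gradient ascent on the multiplier.

```python
def reconstruction_loss(theta):
    """Toy stand-in for L_r: minimized at theta = 2 when unconstrained."""
    return (theta - 2.0) ** 2

def constraint_value(theta):
    """Toy stand-in for a fairness term C(theta) that must stay below epsilon."""
    return theta ** 2

def dual_optimization(epsilon, steps=2000, lr=0.05):
    """Gradient descent on theta and projected gradient ascent on lambda >= 0
    for the Lagrangian L_r(theta) + lambda * (C(theta) - epsilon)."""
    theta, lam = 0.0, 0.0
    for _ in range(steps):
        grad_theta = 2.0 * (theta - 2.0) + lam * 2.0 * theta
        grad_lam = constraint_value(theta) - epsilon  # positive when violated
        theta -= lr * grad_theta
        lam = max(0.0, lam + lr * grad_lam)           # multiplier grows if violated
    return theta, lam

tight_theta, tight_lam = dual_optimization(epsilon=1.0)
loose_theta, loose_lam = dual_optimization(epsilon=9.0)
print("tight budget:", tight_theta, tight_lam, reconstruction_loss(tight_theta))
print("loose budget:", loose_theta, loose_lam, reconstruction_loss(loose_theta))
```

With the tight budget the multiplier settles at a positive value and the parameter sits on the constraint boundary; with the loose budget the constraint is inactive, the multiplier stays at zero, and the unconstrained optimum is recovered.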

Note that given finite model capacity, an $\epsilon$ that is too small may correspond to no feasible solutions in the parameter space; that is, it may be impossible for the model to satisfy the specified fairness constraints. Here we introduce heuristics to estimate the minimum feasible $\epsilon$

. The minimum feasible and can be estimated by running the standard conditional VAE algorithm on the same model and estimating the value of each divergence. Feasible can be approximated by , since ; This can easily be estimated empirically when is binary or discrete.
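One quantity of this kind that is easy to estimate for discrete sensitive attributes is the empirical entropy of $u$, which upper-bounds $I_q(z; u)$ for any representation. The sketch below assumes that this entropy is the reference value intended here, which the text above does not state explicitly; the function name and the example attribute lists are illustrative.

```python
import math
from collections import Counter

def empirical_entropy(values):
    """Plug-in entropy (in nats) of a discrete attribute from its observed values."""
    counts = Counter(values)
    n = len(values)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

# Hypothetical binary attribute with a 70/30 split.
binary_u = [0] * 70 + [1] * 30
print("H(u), binary:", empirical_entropy(binary_u))

# Hypothetical attribute with 18 equally frequent configurations.
multi_u = [(age, gender) for age in range(9) for gender in range(2)] * 10
print("H(u), 18 groups:", empirical_entropy(multi_u))
```

A budget larger than this entropy cannot be violated by any representation, so it places no restriction on the mutual information itself.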

5 Experiments

We aim to experimentally answer the following:

  • Do our information-theoretical objectives align well with existing notions of fairness?

  • Do our constraints achieve their intended effects?

  • How do MIFR and L-MIFR compare when learning controllable fair representations?

  • How are the learned representations affected by other hyperparameters, such as the number of iterations used for adversarial training in $C_2$?

  • Does L-MIFR have the potential to balance different notions of fairness?

5.1 Experimental Setup

We evaluate our results on three datasets (Zemel et al., 2013; Louizos et al., 2017; Madras et al., 2018). The first is the UCI German credit dataset (https://archive.ics.uci.edu/ml/datasets), which contains information about 1000 individuals, with a binary sensitive feature being whether the individual’s age exceeds a threshold. The downstream task is to predict whether the individual is offered credit or not. The second is the UCI Adult dataset (https://archive.ics.uci.edu/ml/datasets/adult), which contains information on over 40,000 adults from the 1994 US Census. The downstream task is to predict whether an individual earns more than $50K/year. We consider the sensitive attribute to be gender, which is pre-processed to be a binary value. The third is the Heritage Health dataset (https://www.kaggle.com/c/hhp), which contains information on over 60,000 patients. The downstream task is to predict whether the Charlson Index (an estimation of patient mortality) is greater than zero. Diverging from previous work (Madras et al., 2018), we consider sensitive attributes to be age and gender, where there are 9 possible age values and 2 possible gender values; hence the sensitive attributes have 18 configurations. This prevents VFAE (Louizos et al., 2015) and LAFTR (Madras et al., 2018) from being applied, as both methods rely on some statistical distance between two groups, which is not defined when there are 18 groups in question. ($\Delta_{DP}$ is only defined for binary sensitive variables in Madras et al. (2018).)

We assume that the model does not have access to labels during training; instead, it supplies its representations to an unknown vendor’s classifier, whose task is to achieve high prediction performance using the labels. We compare the performance of MIFR, the model with fixed multipliers, and L-MIFR, the model using the Lagrangian dual optimization method. We provide details of the experimental setup in Appendix B. Specifically, we consider the simpler form for $p(z)$ commonly used in VAEs, where $p(z)$ is a fixed prior; the use of other more flexible parametrized forms of $p(z)$, such as normalizing flows (Dinh et al., 2016; Rezende and Mohamed, 2015) and autoregressive models (Kingma et al., 2016; van den Oord et al., 2016), is left as future work.

We estimate the mutual information values $I_q(x; z \mid u)$ and $I_q(z; u)$ on the test set using the following equations:

where is estimated via the empirical statistics over the training set, and is estimated via kernel density estimation over samples from with sampled from the training set. Kernel density estimates are reasonable since both and are low dimensional (for example, Adult considers a 10-dimension for 40,000 individuals). However, computing or requires a summation over the training set, so we only compute these mutual information quantities during evaluation. We include our implementations in https://github.com/ermongroup/lag-fairness.
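A kernel-density-based estimate of this kind can be sketched for a scalar representation and a discrete sensitive attribute. The code below is an illustrative assumption and not the released implementation: it uses a one-dimensional Gaussian kernel with a normal-reference bandwidth, estimates $q(z \mid u)$ per group, forms $q(z)$ as the mixture of the group densities, and averages $\log q(z \mid u) - \log q(z)$ over the samples. The synthetic data and all names are hypothetical.

```python
import math
import random
import statistics

def gaussian_kde(points):
    """1-D Gaussian kernel density estimate with a normal-reference bandwidth."""
    n = len(points)
    h = 1.06 * statistics.stdev(points) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))
    def density(z):
        return norm * sum(math.exp(-0.5 * ((z - p) / h) ** 2) for p in points)
    return density

def estimate_mi_z_u(z_values, u_values):
    """Plug-in estimate of I(z;u) = E[log q(z|u) - log q(z)] (scalar z, discrete u)."""
    n = len(z_values)
    groups = {}
    for z, u in zip(z_values, u_values):
        groups.setdefault(u, []).append(z)
    cond = {u: gaussian_kde(zs) for u, zs in groups.items()}  # q(z|u)
    prior = {u: len(zs) / n for u, zs in groups.items()}      # empirical q(u)
    total = 0.0
    for z, u in zip(z_values, u_values):
        q_z = sum(prior[v] * cond[v](z) for v in cond)        # q(z) as a mixture
        total += math.log(cond[u](z) / q_z)
    return total / n

rng = random.Random(0)
u = [i % 2 for i in range(400)]
z_dependent = [rng.gauss(1.0 if ui else -1.0, 1.0) for ui in u]  # z leaks u
z_independent = [rng.gauss(0.0, 1.0) for _ in u]                 # z ignores u

mi_dependent = estimate_mi_z_u(z_dependent, u)
mi_independent = estimate_mi_z_u(z_independent, u)
print("leaky z:", mi_dependent, "independent z:", mi_independent)
```

Each density evaluation sums over all samples in a group, so the cost grows quadratically with the dataset size, which is one reason to reserve such estimates for evaluation rather than training.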

5.2 Mutual Information, Prediction Accuracy, and Fairness

Figure 1: The relationship between mutual information and fairness-related quantities. Each dot is the representations from an instance of MIFR with a different set of hyperparameters. The green line represents features obtained via principal component analysis. Increased mutual information between inputs and representations increases task performance (left) and unfairness (right). For Health we do not include $\Delta_{DP}$ since it is not defined for more than two groups.
Figure 2: Corresponding $I_q(z; u)$ values under different $\epsilon_2$ with L-MIFR. After $\epsilon_2$ is fixed, we consider a range of values for the other constraint, leading to a distribution of $I_q(z; u)$ for each $\epsilon_2$ (hence the box plot).

We investigate the relationship between mutual information and prediction performance by considering area under the ROC curve (AUC) for prediction tasks. We also investigate the relationship between mutual information and traditional fairness metrics by considering the $\Delta_{DP}$ fairness metric in Madras et al. (2018), which compares the absolute expected difference in classifier outcomes between two groups. $\Delta_{DP}$ is only defined on two groups of classifier outcomes, so it is not defined for the Health dataset when considering the sensitive attributes to be “age and gender”, which has 18 groups. We use logistic regression classifiers for prediction tasks.
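The two-group metric described above can be written in a few lines. This is a minimal sketch of the verbal definition (absolute difference in expected classifier outcome between two groups); the function name and example predictions are hypothetical, and the exact form used by Madras et al. (2018) may differ in details such as thresholding.

```python
def demographic_parity_gap(predictions, groups):
    """|E[yhat | u = 0] - E[yhat | u = 1]| for predictions and binary group labels."""
    by_group = {0: [], 1: []}
    for yhat, u in zip(predictions, groups):
        by_group[u].append(yhat)
    mean0 = sum(by_group[0]) / len(by_group[0])
    mean1 = sum(by_group[1]) / len(by_group[1])
    return abs(mean0 - mean1)

# Hypothetical classifier outputs (probabilities or hard 0/1 decisions).
groups = [0, 0, 0, 1, 1, 1]
biased_predictions = [0.9, 0.8, 0.7, 0.2, 0.3, 0.1]
balanced_predictions = [0.6, 0.4, 0.5, 0.5, 0.6, 0.4]
print("biased:", demographic_parity_gap(biased_predictions, groups))
print("balanced:", demographic_parity_gap(balanced_predictions, groups))
```

The function indexes exactly two groups, which mirrors why the metric cannot be applied directly to the 18-group Health setting.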

From the results in Figure 1, we show that there are strong positive correlations between $I_q(x; z \mid u)$ and test AUC, and between $I_q(z; u)$ and $\Delta_{DP}$; increases in $I_q(z; u)$ decrease fairness. We also include a baseline in Figure 1 where the features are obtained via the top-$k$ principal components (where $k$ is the dimension of $z$), which has slightly better AUC but significantly worse fairness as measured by $\Delta_{DP}$. Therefore, our information-theoretic notions of fairness/expressiveness align well with existing notions such as $\Delta_{DP}$/test AUC.

5.3 Controlling Representation Fairness with L-MIFR

Keeping all other constraint budgets fixed, any increase in $\epsilon$ for an arbitrary constraint implies an increase in the unfairness budget; consequently, we are able to trade off fairness for more informative representations when desired.

We demonstrate this empirically via an experiment where we note the $I_q(z; u)$ values corresponding to a range of budgets $\epsilon_2$ at a fixed configuration of the other constraint budgets ($\epsilon_1$). From Figure 2, $I_q(z; u)$ increases as $\epsilon_2$ increases, and $I_q(z; u) < \epsilon_2$ holds under different values of the other constraints $\epsilon_1$. This suggests that we can use $\epsilon_2$ to control $I_q(z; u)$ (our fairness criterion) of the learned representations.

We further show the changes in $\Delta_{DP}$ (a traditional fairness criterion) values as we vary $\epsilon_2$ in Figure 3. In Adult, $\Delta_{DP}$ clearly increases as $\epsilon_2$ increases; this is less obvious in German, as $\Delta_{DP}$ is already very low. These results suggest that the L-MIFR user can control the level of fairness of the representations quantitatively via $\epsilon$.

Figure 3: $\Delta_{DP}$ under different levels of $\epsilon_2$ with L-MIFR. $\Delta_{DP}$ generally increases as $\epsilon_2$ increases.

5.4 Improving Representation Expressiveness with L-MIFR

Recall that our goal is to perform controlled fair representation learning, which requires us to learn expressive representations subject to fairness constraints. We compare two approaches that could achieve this: 1) MIFR, which has to consider a range of Lagrange multipliers (e.g. from a grid search) to obtain solutions that satisfy the constraints; 2) L-MIFR, which finds feasible solutions directly by optimizing the Lagrange multipliers.

We evaluate both methods on 4 sets of constraints by modifying the values of $\epsilon_2$ (which is the tighter estimate of $I_q(z; u)$) while keeping $\epsilon_1$ fixed, and we compare the expressiveness of the features learned by the two methods in Figure 4. For MIFR, we perform a grid search over multiple configurations. In contrast, we run one instance of L-MIFR for each $\epsilon$ setting, which takes roughly the same time to run as one instance of MIFR (the only overhead is updating the two scalar values $\lambda_1$ and $\lambda_2$).

Figure 4: Expressiveness vs. $\epsilon_2$. A larger feasible region (as measured by $\epsilon_2$) leads to more expressive representations (as measured by $I_q(x; z \mid u)$).

In terms of representation expressiveness, L-MIFR outperforms MIFR even though MIFR took almost 25x the computational resources. Therefore, L-MIFR is significantly more computationally efficient than MIFR at learning controlled fair representations.

5.5 Ablation Studies

The $C_2$ objective requires adversarial training, which involves iterative training of $(\theta, \phi)$ with $\psi$. We assess the sensitivity of the expressiveness and fairness of the learned representations to the number of iterations for $\psi$ per iteration for $(\theta, \phi)$. Following the practice in Gulrajani et al. (2017) of using more iterations for the critic, we consider 1, 2, 5, and 10 adversary iterations (the settings listed in Table 2), and use the same number of total iterations for training.

1 2 5 10
Table 2: Expressiveness and fairness of the representations from L-MIFR under various .

In Table 2, we evaluate and obtained L-MIFR on Adult () and Health (). This suggests that the final solution of the representations is not very sensitive to , although larger seem to find solutions that are closer to .

5.6 Fair Representations under Multiple Notions

Finally, we demonstrate how L-MIFR could control multiple fairness constraints simultaneously, thereby finding representations that are reasonably fair when there are multiple fairness notions being considered. We consider the Adult dataset, and describe the demographic parity, equalized odds and equalized opportunity notions of fairness in terms of mutual information, which we denote as , , respectively (see details in Appendix D about how and are derived).

For L-MIFR, we set and other values to . For MIFR, we consider a more efficient approach than random grid search. We start by setting every ; then we multiply the value for a particular constraint by until the constraint is satisfied by MIFR; we finish when all the constraints are satisfied. (This allows MIFR to approach the feasible set from outside, so the solution it finds will generally have high expressiveness.) We find that this requires us to update the of , and four times each (so corresponding ); this costs 12x the computational resources needed by L-MIFR.

MIFR 9.34 9.39 0.09 0.10 0.07
L-MIFR 9.94 9.95 0.08 0.09 0.04
Table 3: Learning one representation for multiple notions of fairness on Adult. L-MIFR learns representations that are better than MIFR on all the measurements instead of only . Here for and for other constraints.

We compare the representations learned by L-MIFR and MIFR in Table 3. L-MIFR outperforms MIFR in terms of , , and , while only being slightly worse in terms of . Since , the L-MIFR solution is still feasible. This demonstrates that even with a thoughtfully designed method for tuning , MIFR is still much inferior to L-MIFR in terms of computational cost and representation expressiveness.

6 Discussion

In this paper, we introduced an objective for learning controllable fair representations based on mutual information. This interpretation allows us to unify and explain existing work. In particular, we have shown that a range of existing approaches optimize an approximation to the Lagrangian dual of our objective with fixed multipliers, fixing the trade-off between fairness and expressiveness. We proposed a dual optimization method that allows us to achieve higher expressiveness while satisfying the user-specified limit on unfairness.

In future work, we are interested in formally and empirically extending this framework and the corresponding dual optimization method to other notions of fairness. It is also valuable to investigate alternative approaches to training the adversary (Gulrajani et al., 2017), the usage of more flexible $p(z)$ (Rezende and Mohamed, 2015), and alternative solutions to bounding $I_q(z; u)$.


This research was supported by NSF (#1651565, #1522054, #1733686), ONR (N00014-19-1-2145), AFOSR (FA9550-19-1-0024), and FLI.



Appendix A Proofs

A.1 Proof of Lemma 1
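A derivation consistent with the statement of Lemma 1 can be sketched as follows. The intermediate steps are a reconstruction from the definitions in Section 2 and may differ from the authors' original presentation; the final step drops both the conditional entropy and the KL term, as in Section 2.1.

```latex
\begin{align*}
I_q(x; z \mid u)
&= \mathbb{E}_{q_\phi(x,z,u)}\left[\log q_\phi(x \mid z, u) - \log q(x \mid u)\right] \\
&= \mathbb{E}_{q_\phi(x,z,u)}\left[\log q_\phi(x \mid z, u)\right] + H_q(x \mid u) \\
&= \mathbb{E}_{q_\phi(x,z,u)}\left[\log p_\theta(x \mid z, u)\right] + H_q(x \mid u)
   + \mathbb{E}_{q_\phi(z,u)} D_{\mathrm{KL}}\left(q_\phi(x \mid z, u) \,\|\, p_\theta(x \mid z, u)\right) \\
&\geq \mathbb{E}_{q_\phi(x,z,u)}\left[\log p_\theta(x \mid z, u)\right]
\end{align*}
```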


where the last inequality holds because KL divergence is non-negative. ∎

A.2 Proof of Lemma 2
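A derivation consistent with the statement of Lemma 2 can be sketched as follows. It is a reconstruction from the definitions in Section 2, not necessarily the authors' original steps. The first inequality uses the chain rule $I_q(z; x, u) = I_q(z; u) + I_q(z; x \mid u)$ together with the non-negativity of conditional mutual information.

```latex
\begin{align*}
I_q(z; u) \leq I_q(z; x, u)
&= \mathbb{E}_{q_\phi(x,z,u)}\left[\log q_\phi(z \mid x, u) - \log q_\phi(z)\right] \\
&= \mathbb{E}_{q_\phi(x,z,u)}\left[\log q_\phi(z \mid x, u) - \log p(z)\right]
   - \mathbb{E}_{q_\phi(z)}\left[\log q_\phi(z) - \log p(z)\right] \\
&= \mathbb{E}_{q(x,u)} D_{\mathrm{KL}}\left(q_\phi(z \mid x, u) \,\|\, p(z)\right)
   - D_{\mathrm{KL}}\left(q_\phi(z) \,\|\, p(z)\right)
\end{align*}
```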


A.3 Proof of Lemma 3
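A derivation consistent with the statement of Lemma 3 can be sketched as follows. It is a reconstruction from the definitions in Section 2, not necessarily the authors' original steps.

```latex
\begin{align*}
I_q(z; u)
&= \mathbb{E}_{q_\phi(z,u)}\left[\log q_\phi(u \mid z) - \log q(u)\right] \\
&= \mathbb{E}_{q_\phi(z,u)}\left[\log q_\phi(u \mid z) - \log p(u)\right]
   - \mathbb{E}_{q(u)}\left[\log q(u) - \log p(u)\right] \\
&= \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p(u)\right)
   - D_{\mathrm{KL}}\left(q(u) \,\|\, p(u)\right) \\
&\leq \mathbb{E}_{q_\phi(z)} D_{\mathrm{KL}}\left(q_\phi(u \mid z) \,\|\, p(u)\right)
\end{align*}
```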


Again, the last inequality holds because KL divergence is non-negative. ∎

A.4 Proof of Theorem 5


Let us first verify that this problem is convex.

  • Primal: is affine in , convex in due to the concavity of , and independent of .

  • First condition: is convex in and (because of convexity of KL-divergence), and independent of .

  • Second condition: since and


    Let , . We have

    where we use the convexity of KL divergence in the inequality. Since is independent of , both and are convex in .

Then we show that the problem has a feasible solution by construction. In fact, we can simply let be some fixed distribution over , and for all . In this case, and are independent, so , . This corresponds to the case where is simply random noise that does not capture anything in .

Hence, Slater’s condition holds, which is a sufficient condition for strong duality. ∎

Appendix B Experimental Setup Details

We consider the following setup for our experiments.

  • For MIFR, we modify the weight for reconstruction error , as well as and for the constraints, which creates a total of configurations; values smaller since high values of prefers solutions with low