1 Introduction
As we experience the widespread adoption of machine learning models in automated decision-making, we have witnessed increased reports of instances in which the employed model results in discrimination against certain groups of individuals datta2015automated ; sweeney2013discrimination ; bolukbasi2016man ; machinebias2016 . In this context, discrimination is defined as the unwanted distinction against individuals based on their membership in a particular group. For instance, machinebias2016 presents an example of a computer-based risk assessment model for recidivism that is biased against certain ethnicities. In another example, datta2015automated demonstrates gender discrimination in online advertisements for web pages associated with employment. Beyond its ethical importance, equal treatment of different groups is legally required by many countries civilright1964 . Thus, research on fairness in machine learning has received significant attention in recent years; see calmon2017optimized ; feldman2015certifying ; hardt2016equality ; zhang2018mitigating ; xu2018fairgan ; dwork2018decoupled ; fish2016confidence ; woodworth2017learning ; zafar2017fairness ; zafar2015fairness ; perez2017fair ; bechavod2017penalizing .
Anti-discrimination laws imposed by many countries typically evaluate fairness through the notions of disparate treatment and disparate impact. We say a decision-making process suffers from disparate treatment if its decisions discriminate against individuals of a certain protected group based on their sensitive/protected attribute information. On the other hand, we say it suffers from disparate impact if its decisions adversely affect a protected group of individuals with a certain sensitive attribute zafar2015fairness . In simpler words, disparate treatment is intentional discrimination against a protected group, while disparate impact is an unintentional disproportionate outcome that hurts a protected group. To quantify fairness, several notions have been proposed in the past decade calders2009building ; hardt2016equality . Examples of these notions include demographic parity, equalized odds, and equalized opportunity. The demographic parity condition requires that the model output (for example, the assigned label) be independent of the sensitive attribute. This definition might not be desirable when the base ground-truth outcomes of the two groups are completely different. This shortcoming can be addressed using the equalized odds notion hardt2016equality , which requires that the model output be conditionally independent of the sensitive attribute given the ground-truth label. Finally, equalized opportunity requires equal false positive or false negative rates across protected groups. All of the above fairness notions require (conditional) independence between the model output and the sensitive attribute.
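These fairness notions translate directly into quantities one can estimate from samples. As a small illustration (our own sketch; the function names and the finite-sample estimators below are not from the paper), the demographic parity gap and the equalized odds gap of a set of binary predictions can be computed as:

```python
import numpy as np

def dp_gap(y_pred, s):
    """Demographic parity violation: |P(yhat=1 | s=1) - P(yhat=1 | s=0)|."""
    return abs(y_pred[s == 1].mean() - y_pred[s == 0].mean())

def eo_gap(y_pred, y_true, s):
    """Equalized odds violation: worst-case prediction-rate gap between
    the sensitive groups, conditioned on the true label."""
    gaps = []
    for y in (0, 1):
        m = y_true == y
        gaps.append(abs(y_pred[m & (s == 1)].mean() - y_pred[m & (s == 0)].mean()))
    return max(gaps)
```

A predictor that simply copies the true label has zero equalized odds gap but can still have a nonzero demographic parity gap when the base rates of the two groups differ.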
Machine learning approaches for imposing fairness can be broadly classified into three main categories: pre-processing methods, post-processing methods, and in-processing methods. Pre-processing methods modify the training data to remove discriminatory information before passing it to the decision-making process calders2009building ; feldman2015certifying ; kamiran2010classification ; kamiran2009classifying ; kamiran2012data ; dwork2012fairness ; calmon2017optimized ; ruggieri2014using . These methods map the training data to a transformed space in which the dependencies between the class label and the sensitive attributes are removed edwards2015censoring ; hardt2016equality ; xu2018fairgan ; sattigeri2018fairness ; raff2018gradient ; madras2018learning ; zemel2013learning ; louizos2015variational . Post-processing methods, on the other hand, adjust the output of a trained classifier to remove discrimination while maintaining high classification accuracy fish2016confidence ; dwork2018decoupled ; woodworth2017learning . The third category is the in-processing approach, which enforces fairness by either introducing constraints or adding a regularization term to the training procedure zafar2017fairness ; zafar2015fairness ; perez2017fair ; bechavod2017penalizing ; berk2017convex ; agarwal2018reductions ; celis2019classification ; donini2018empirical ; rezaei2019fair ; kamishima2011fairness ; zhang2018mitigating ; kearns2017preventing ; menon2018cost ; alabi2018unleashing . The Rényi fair inference framework proposed in this paper also belongs to this in-processing category.

Among in-processing methods, many add a regularization term or constraints to impose statistical independence between the classifier output and the sensitive attributes. To do so, various independence proxies have been used, such as mutual information kamishima2011fairness , false positive/negative rates bechavod2017penalizing , equalized odds donini2018empirical , the Pearson correlation coefficient zafar2015fairness ; zafar2017fairness , and the Hilbert-Schmidt independence criterion (HSIC) perez2017fair . As will be discussed in Section 2, many of these methods cannot capture higher-order dependence between random variables or lead to computationally expensive algorithms. Motivated by these limitations, we propose to use Rényi correlation to impose several known group fairness measures. Rényi correlation captures higher-order dependencies between random variables. Moreover, Rényi correlation is a normalized measure and can be computed efficiently in certain instances.
Using the Rényi correlation coefficient as a regularization term, we propose a min-max optimization framework for fair statistical inference. We apply our Rényi framework to the classification problem and show that it can be solved up to first-order stationarity using an iterative procedure. In addition to classification, we apply our regularization framework to the fair K-means clustering problem under the disparate impact doctrine. Many recent works on fair clustering propose a two-phase algorithm for solving the fair K-center and K-median clustering problems chierichetti2017fair ; backurs2019scalable ; rosner2018privacy ; bercea2018cost . In the first phase, data points are partitioned into small subsets, referred to as fairlets, that satisfy the fairness requirements. Then, in the second phase, these fairlets are merged to form clusters by one of the existing clustering algorithms. A similar approach was proposed by schmidt2018fair for the K-means clustering problem. These methods impose fairness as a hard constraint, which can degrade the clustering quality significantly. Moreover, phase one is typically computationally expensive and limits the scalability of the method. In contrast, our Rényi fair K-means clustering method imposes fairness as a soft constraint, and its complexity is no higher than that of the traditional K-means clustering method, also known as Lloyd's algorithm. Finally, we evaluate the performance of our algorithms on the Bank and Adult datasets.

Contributions:
We introduce Rényi correlation as a tool to impose several notions of group fairness. Unlike Pearson correlation and HSIC, Rényi correlation captures higher-order dependence of random variables.
Using Rényi correlation as a regularization term in training, we propose a min-max formulation for fair statistical inference. Unlike methods that use an adversarial neural network to impose fairness, we show that in particular instances (such as binary classification or binary sensitive attributes), it suffices to use a simple quadratic function as the adversarial objective. This observation allows us to develop a simple multi-step gradient ascent-descent algorithm for fair inference and to guarantee its theoretical convergence to first-order stationarity.
Our Rényi correlation framework leads to a natural fair classification method and a novel fair K-means clustering algorithm. For the K-means clustering problem, we show that a sufficiently large regularization coefficient yields perfect fairness under the disparate impact doctrine. Unlike the two-phase methods proposed in chierichetti2017fair ; backurs2019scalable ; rosner2018privacy ; bercea2018cost ; schmidt2018fair , our method does not require a pre-processing step, is scalable, and allows for regulating the trade-off between clustering quality and fairness.
2 Rényi Correlation as a Measure of Dependence
The most widely used notions of group fairness in machine learning are demographic parity, equalized odds, and equalized opportunity. These notions require (conditional) independence between a certain model output and a sensitive attribute. This independence is typically imposed by adding fairness constraints or regularization terms to the training objective function. For instance, kamishima2011fairness added a regularization term based on mutual information. Since estimating the mutual information between the model output and the sensitive variables during training is not computationally tractable, kamishima2011fairness approximates the probability density functions using a logistic regression model. To obtain a tighter estimate, song2019learning used an adversarial approach that estimates the joint probability density function with a parameterized neural network. Although these works start from a well-justified objective function, they end up solving approximations of that objective due to computational barriers. Thus, no fairness guarantee can be provided even when the resulting optimization problems are solved to global optimality in the large sample size regime. A more tractable measure of dependence between two random variables is the Pearson correlation. The Pearson correlation coefficient between two random variables $A$ and $B$ is defined as $\rho_P(A,B) \triangleq \mathrm{Cov}(A,B)/\sqrt{\mathrm{Var}(A)\,\mathrm{Var}(B)}$, where $\mathrm{Cov}$ denotes the covariance and $\mathrm{Var}$ denotes the variance. The Pearson correlation coefficient is used in zafar2015fairness to decorrelate the binary sensitive attribute and the decision boundary of the classifier. A major drawback of the Pearson correlation is that it only captures linear dependencies between random variables. In fact, two random variables $A$ and $B$ may have strong dependence yet zero Pearson correlation. This property raises concerns about using the Pearson correlation to impose fairness. Similar to the Pearson correlation, the HSIC measure proposed in perez2017fair may be zero even when the two variables are strongly dependent. While universal kernels can be used to resolve this issue, this may come at the expense of computational intractability. In addition, HSIC is not a normalized dependence measure gretton2005kernel ; gretton2005measuring , which raises concerns about its appropriateness as a measure of dependence.

In this paper, we suggest using the Hirschfeld-Gebelein-Rényi correlation renyi1959measures ; hirschfeld1935connection ; gebelein1941statistische as a dependence measure between random variables to impose fairness. The Rényi correlation, also known as maximal correlation, between two random variables $A$ and $B$ is defined as
$$\rho_R(A,B) \triangleq \sup_{f,\,g}\; \mathbb{E}[f(A)\,g(B)] \quad \text{s.t.}\;\; \mathbb{E}[f(A)]=\mathbb{E}[g(B)]=0,\;\; \mathbb{E}[f^2(A)]=\mathbb{E}[g^2(B)]=1, \qquad (1)$$
where the supremum is over the set of measurable functions $f$ and $g$ satisfying the stated constraints. Unlike HSIC and Pearson correlation, Rényi correlation is a normalized measure that captures higher-order dependencies between random variables. The Rényi correlation between two random variables is zero if and only if the random variables are independent, and it is one if there is a strict dependence between them renyi1959measures . In addition, as we will discuss in Section 3, Rényi correlation leads to a computationally tractable framework when used to impose fairness in certain cases. These computational and statistical benefits make Rényi correlation a powerful tool in the context of fair inference.
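The contrast with the Pearson correlation noted above is easy to reproduce numerically. In the following sketch (our own example, not from the paper), $B = A^2$ is fully determined by $A$, so the Rényi correlation of the pair is one, yet the sample Pearson correlation vanishes by symmetry:

```python
import numpy as np

# A is symmetric about 0; B is a deterministic (even) function of A.
a = np.linspace(-1.0, 1.0, 1001)
b = a ** 2

# Sample Pearson correlation Cov(A,B)/sqrt(Var(A)Var(B)).
# Cov(A, A^2) = E[A^3] = 0 by symmetry, so this is (numerically) zero
# even though B is fully determined by A.
pearson = np.corrcoef(a, b)[0, 1]
```

A fairness regularizer built on Pearson correlation would therefore see no dependence at all in this pair, while the Rényi correlation equals one.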
3 A General Min-Max Framework for Rényi Fair Inference
Consider a learning task over a given random variable $\mathbf{Z}$. Our goal is to minimize the average inference loss, where the loss function $\ell(\theta, \mathbf{Z})$ is parameterized by the parameter $\theta$. To find the value of $\theta$ with the smallest average loss, we need to solve the optimization problem

$$\min_{\theta}\; \mathbb{E}\big[\ell(\theta, \mathbf{Z})\big],$$

where the expectation is taken over $\mathbf{Z}$. Notice that this formulation is quite general and includes regression, classification, clustering, and dimensionality reduction tasks as special cases.
Assume that, in addition to minimizing the average loss, we are interested in bringing fairness to our learning task. Let $S$ be the sensitive attribute and $\widehat{Y}_\theta$ be a certain output of our inference task using parameter $\theta$. Assume we are interested in reducing the dependence between the random variable $\widehat{Y}_\theta$ and the sensitive attribute $S$. To balance goodness-of-fit and fairness, one can solve the following optimization problem

$$\min_{\theta}\; \mathbb{E}\big[\ell(\theta, \mathbf{Z})\big] + \lambda\, \rho_R^2\big(\widehat{Y}_\theta, S\big), \qquad (2)$$

where $\lambda$ is a positive scalar balancing fairness and goodness-of-fit. Notice that the above framework is quite general. For example, $\widehat{Y}_\theta$ may be the assigned label in a classification task, the assigned cluster in a clustering task, or the output of a regressor in a regression task.
Using the definition of Rényi correlation, we can rewrite the optimization problem (2) as

$$\min_{\theta}\; \sup_{f,\,g}\; \mathbb{E}\big[\ell(\theta, \mathbf{Z})\big] + \lambda\, \big(\mathbb{E}[f(\widehat{Y}_\theta)\, g(S)]\big)^2 \quad \text{s.t.}\;\; \mathbb{E}[f]=\mathbb{E}[g]=0,\;\; \mathbb{E}[f^2]=\mathbb{E}[g^2]=1, \qquad (3)$$

where the supremum is taken over the set of measurable functions. The next natural question is whether this optimization problem can be solved efficiently in practice. This question motivates the discussion in the following subsection.
3.1 Computing Rényi Correlation
The objective function in (3) may be nonconvex in $\theta$ in general. Several algorithms have recently been proposed for solving such nonconvex min-max optimization problems sanjabi2018convergence ; nouiehed2019solving ; jin2019minmax . Most of these methods require solving the inner maximization problem to (approximate) global optimality. More precisely, we need to be able to solve the optimization problem described in (1). While popular heuristic approaches, such as parameterizing the functions $f$ and $g$ with neural networks, can be used to solve (1), we focus on solving this problem in a more rigorous manner. In particular, we narrow our focus to the discrete random variable case. This case covers many practical sensitive attributes, such as gender and race. In what follows, we show that in this case, (1) can be solved "efficiently" to global optimality.

Theorem 3.1 (Restated from witsenhausen1975sequences ).
Let $A$ and $B$ be two discrete random variables. Then the Rényi coefficient $\rho_R(A,B)$ is equal to the second-largest singular value of the matrix $Q = [Q_{ij}]$, where $Q_{ij} = \mathbb{P}(A=i, B=j)\,/\,\sqrt{\mathbb{P}(A=i)\,\mathbb{P}(B=j)}$.

The above theorem provides a computationally tractable approach for computing the Rényi coefficient. This computation can be further simplified when one of the random variables is binary.
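Theorem 3.1 suggests a direct numerical recipe. The following sketch (our own function name; it assumes the joint pmf is given as a matrix) computes the Rényi coefficient as the second-largest singular value of $Q$:

```python
import numpy as np

def renyi_from_joint(P):
    """Rényi (maximal) correlation of two discrete random variables with
    joint pmf matrix P, via Theorem 3.1: the second-largest singular value
    of Q_ij = P_ij / sqrt(P_A(i) P_B(j))."""
    P = np.asarray(P, dtype=float)
    pa = P.sum(axis=1)                       # marginal of A
    pb = P.sum(axis=0)                       # marginal of B
    Q = P / np.sqrt(np.outer(pa, pb))
    sv = np.linalg.svd(Q, compute_uv=False)  # sorted in decreasing order
    return sv[1]                             # sv[0] is always 1
```

For an independent pair, $Q$ has rank one (its only nonzero singular value is the trivial one), so the second singular value is zero; for a strictly dependent pair it equals one.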
Theorem 3.2 (Rephrased from Theorem 2 in farnia2015minimum and Theorem 3 in razaviyayn2015discrete ).
Suppose that $\widehat{Y} \in \{1, \ldots, c\}$ is a discrete random variable and $S \in \{0, 1\}$ is a binary random variable. Let $\widetilde{Y}$ be the one-hot encoded version of $\widehat{Y}$, i.e., $\widetilde{Y} = \mathbf{e}_i$ if $\widehat{Y} = i$, where $\mathbf{e}_i$ is the $i$-th standard unit vector. Let $\widetilde{S} \triangleq (S - \mathbb{E}[S])/\sqrt{\mathrm{Var}(S)}$ denote the standardized sensitive attribute. Then,

$$\rho_R^2(\widehat{Y}, S) = 1 - \min_{\mathbf{w} \in \mathbb{R}^c} \mathbb{E}\big[(\mathbf{w}^\top \widetilde{Y} - \widetilde{S})^2\big], \qquad (4)$$

where the minimization is over $\mathbf{w} \in \mathbb{R}^c$. Equivalently,

$$\rho_R^2(\widehat{Y}, S) = \max_{\mathbf{w} \in \mathbb{R}^c}\; 2\,\mathbb{E}\big[(\mathbf{w}^\top \widetilde{Y})\,\widetilde{S}\big] - \mathbb{E}\big[(\mathbf{w}^\top \widetilde{Y})^2\big]. \qquad (5)$$
Proof.
Consider the random variable $\widehat{Y}$ and its one-hot encoded version $\widetilde{Y}$. Any function $f(\widehat{Y})$ can be equivalently represented as $\mathbf{w}^\top \widetilde{Y}$, where $\mathbf{w} = (f(1), \ldots, f(c))^\top$. Hence, the function is separable in the sense defined in farnia2015minimum . Consequently, Theorem 2 in farnia2015minimum implies the desired result. ∎
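As a sanity check, one can verify numerically that the least-squares characterization in Theorem 3.2 matches the singular-value characterization of Theorem 3.1. The sketch below (our own check, on a hypothetical 3-by-2 joint distribution) uses the closed-form minimizer of the least-squares problem, whose $i$-th entry is the conditional mean of the standardized $S$ given $\widehat{Y} = i$:

```python
import numpy as np

# Hypothetical joint pmf of (yhat, s): rows index yhat in {0,1,2},
# columns index s in {0,1}.
P = np.array([[0.2, 0.1],
              [0.1, 0.2],
              [0.2, 0.2]])
p_y = P.sum(axis=1)                      # marginal of yhat
p_s = P.sum(axis=0)                      # marginal of s

# Theorem 3.1: second-largest singular value of Q.
Q = P / np.sqrt(np.outer(p_y, p_s))
rho_svd = np.linalg.svd(Q, compute_uv=False)[1]

# Theorem 3.2, eq. (4): rho^2 = 1 - min_w E[(w^T ytilde - stilde)^2].
s_mean = p_s[1]
s_std = np.sqrt(s_mean * (1.0 - s_mean))
s_tilde = (np.array([0.0, 1.0]) - s_mean) / s_std   # standardized s
w_star = (P @ s_tilde) / p_y             # minimizer: w_i = E[stilde | yhat=i]
mmse = 1.0 - np.sum(p_y * w_star ** 2)   # E[(w*^T ytilde - stilde)^2]
rho_mmse = np.sqrt(1.0 - mmse)
```

Both routes return the same value, illustrating why a simple quadratic adversary suffices when the sensitive attribute is binary.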
Let us specialize our framework to classification and clustering problems in the next two sections.
4 Rényi Fair Classification
In a typical (multi-class) classification problem, we are given samples from a random variable $\mathbf{Z} = (X, Y)$, and the goal is to predict $Y$ from $X$. Here $X \in \mathbb{R}^d$ is the input feature vector, and $Y \in \{1, \ldots, c\}$ is the class label. Let $\widehat{Y}^\theta$ be the output of our classifier; here $\theta$ is the parameter of the classifier that needs to be tuned. For example, $\widehat{Y}^\theta$ could represent the output of a neural network after the softmax layer, or the soft probability label assigned by a logistic regression model. To find the optimal parameter $\theta$, we need to solve the optimization problem

$$\min_{\theta}\; \mathbb{E}\big[\ell(\theta; X, Y)\big], \qquad (6)$$

where $\ell$ is the loss function and the expectation is taken over the random variable $\mathbf{Z} = (X, Y)$. Let $S$ be the sensitive attribute. We say a model satisfies demographic parity if the assigned label $\widehat{Y}^\theta$ is independent of the sensitive attribute $S$; see dwork2012fairness . Using our regularization framework, to find the optimal parameter balancing classification accuracy and fairness, we need to solve

$$\min_{\theta}\; \mathbb{E}\big[\ell(\theta; X, Y)\big] + \lambda\, \rho_R^2\big(\widehat{Y}^\theta, S\big). \qquad (7)$$
4.1 General Discrete Case
When $S$ is discrete, Theorem 3.1 implies that (7) can be rewritten as

$$\min_{\theta}\; \max_{\|\mathbf{v}\| \le 1,\; \mathbf{v} \perp \mathbf{v}_1}\; \mathbb{E}\big[\ell(\theta; X, Y)\big] + \lambda\, \|Q_\theta \mathbf{v}\|^2. \qquad (8)$$

Here $\mathbf{v}_1$ is the right singular vector corresponding to the largest singular value of $Q_\theta$, the matrix of Theorem 3.1 formed from $\widehat{Y}^\theta$ and $S$. Given training data $\{(x_n, y_n, s_n)\}_{n=1}^N$ sampled from the random variable $\mathbf{Z}$, we can estimate the entries of $Q_\theta$ using the empirical probabilities $\widehat{\mathbb{P}}(\widehat{Y}^\theta = i, S = j)$ and $\widehat{\mathbb{P}}(S = j) = |\mathcal{D}_j|/N$, where $\mathcal{D}_j$ is the set of samples with sensitive attribute $j$. Motivated by the algorithm proposed in jin2019minmax , we present Algorithm 1 for solving (8).
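In practice, the entries of $Q_\theta$ are estimated from empirical frequencies. The following sketch (our own helper; it counts hard labels for illustration, whereas Algorithm 1 would need a soft, differentiable version of these quantities) builds such an empirical estimate:

```python
import numpy as np

def estimate_Q(y_hat, s, c, m):
    """Empirical estimate of Q_ij = P(yhat=i, s=j)/sqrt(P(yhat=i) P(s=j))
    from N samples of hard predictions y_hat in {0..c-1} and sensitive
    attribute s in {0..m-1}."""
    N = len(y_hat)
    joint = np.zeros((c, m))
    for i, j in zip(y_hat, s):
        joint[i, j] += 1.0 / N
    p_y = joint.sum(axis=1)
    p_s = joint.sum(axis=0)
    denom = np.outer(p_y, p_s)
    with np.errstate(divide="ignore", invalid="ignore"):
        Q = np.where(denom > 0, joint / np.sqrt(denom), 0.0)
    return Q
```

Its second singular value then estimates the Rényi correlation of the predictions and the sensitive attribute, per Theorem 3.1.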
To understand the convergence behavior of Algorithm 1 for the nonconvex optimization problem (8), we first need to define an approximate stationary solution. Let $g(\theta)$ denote the value of the inner maximization in (8) at $\theta$. Assuming that $\ell$ has Lipschitz gradient, $g$ is weakly convex; for more details, see rafique2018non . For such a weakly convex function, we say $\theta$ is an $\epsilon$-stationary solution if the gradient of the Moreau envelope of $g$ at $\theta$ has norm at most $\epsilon$. The following theorem demonstrates the convergence of Algorithm 1.
Theorem 4.1 (Rephrased from Theorem 27 in jin2019minmax ).
4.2 Binary Case
When $S$ is binary, we can obtain a more efficient algorithm than Algorithm 1 by exploiting Theorem 3.2. In particular, by a simple scaling of $\mathbf{w}$ and ignoring the constant terms, the optimization problem (7) can be written as

$$\min_{\theta}\; \max_{\mathbf{w} \in \mathbb{R}^c}\; \mathbb{E}\big[\ell(\theta; X, Y)\big] - \lambda\, \mathbb{E}\big[(\mathbf{w}^\top \widetilde{Y}^\theta - S)^2\big], \qquad (9)$$

where $\widetilde{Y}^\theta$ is the one-hot encoded version of the classifier output, as in Theorem 3.2. Defining $\widetilde{y}^\theta_n$ as the classifier's (soft) output vector for the $n$-th sample, the above problem can be rewritten in empirical form. Thus, given training data $\{(x_n, y_n, s_n)\}_{n=1}^N$ sampled from the random variable $\mathbf{Z}$, we need to solve

$$\min_{\theta}\; \max_{\mathbf{w} \in \mathbb{R}^c}\; \frac{1}{N} \sum_{n=1}^{N} \ell(\theta; x_n, y_n) - \frac{\lambda}{N} \sum_{n=1}^{N} \big(\mathbf{w}^\top \widetilde{y}^\theta_n - s_n\big)^2. \qquad (10)$$
Notice that the maximization problem in (10) is concave, separable, and has a closed-form solution. Motivated by nouiehed2019solving , we propose Algorithm 2 for solving (10).
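To illustrate how the closed-form inner maximization can be exploited, the following sketch trains a binary fair logistic regression in the spirit of Algorithm 2: the adversary $\mathbf{w}$ is computed exactly by least squares at each iteration, and $\theta$ takes a gradient step on the empirical objective (10). All function names and hyperparameters here are our own choices, not the paper's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))

def fair_logreg(X, y, s, lam, steps=3000, lr=0.1):
    """Sketch of an Algorithm-2-style alternation for binary labels:
    exact inner max over the quadratic adversary w, gradient step on theta."""
    n, d = X.shape
    theta = np.zeros(d)
    for _ in range(steps):
        p = sigmoid(X @ theta)                 # soft output P(yhat=1 | x)
        Yt = np.column_stack([1.0 - p, p])     # soft one-hot ytilde
        # Inner maximization: w* = argmin_w mean((w^T ytilde - s)^2),
        # available in closed form (tiny ridge only for numerical safety).
        A = Yt.T @ Yt / n + 1e-8 * np.eye(2)
        w = np.linalg.solve(A, Yt.T @ s / n)
        r = Yt @ w - s                         # adversary's residual
        g_loss = X.T @ (p - y) / n             # logistic-loss gradient
        # Gradient of -lam * mean(r^2) w.r.t. theta, with w fixed (Danskin).
        g_fair = -lam * 2.0 * X.T @ (r * (w[1] - w[0]) * p * (1.0 - p)) / n
        theta -= lr * (g_loss + g_fair)
    return theta

def dp_violation(X, s, theta):
    pred = (sigmoid(X @ theta) > 0.5).astype(float)
    return abs(pred[s == 1].mean() - pred[s == 0].mean())
```

Penalizing the negative of the adversary's squared error pushes $\theta$ toward outputs from which $s$ cannot be linearly predicted, which by Theorem 3.2 is exactly small Rényi correlation.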
While the result in Theorem 4.1 can be applied to Algorithm 2, under the following assumption, we can achieve a better convergence rate using the methodology in nouiehed2019solving .
Assumption 4.1.
We assume that there exists a constant scalar $c_0 > 0$ such that the soft output probabilities of the classifier are bounded below by $c_0$.

This assumption is reasonable when softmax is used. This is because we can always assume $\theta$ lies in a compact set in practice, and hence the output of the softmax layer cannot be arbitrarily small.
Theorem 4.2 (Rephrased from nouiehed2019solving ).
Notice that this convergence rate is faster than the one obtained in Theorem 4.1.
Remark 4.3 (Extension to multiple sensitive attributes).
Our discrete Rényi classification framework naturally extends to multiple discrete sensitive attributes by concatenating all attributes into one. For instance, when we have two binary sensitive attributes $S_1$ and $S_2$, we can treat them as a single attribute taking values in the four combinations of $(S_1, S_2)$.
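Concretely, for two binary attributes the concatenation is just a relabeling into four values (a trivial sketch with our own array names):

```python
import numpy as np

s1 = np.array([0, 0, 1, 1, 0, 1])
s2 = np.array([0, 1, 0, 1, 1, 0])
# One discrete attribute over the four combinations of (s1, s2).
s_joint = 2 * s1 + s2
```

The combined attribute can then be fed to the discrete framework of Section 4.1 unchanged.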
Remark 4.4 (Extension to other notions of fairness).
Our proposed framework imposes the demographic parity notion of group fairness. However, other notions of group fairness may also be expressed as (conditional) independence conditions, and in such cases our framework again applies. For example, we say a predictor satisfies the equalized odds condition if the predictor $\widehat{Y}^\theta$ is conditionally independent of the sensitive attribute $S$ given the true label $Y$. Similar to formulation (7), the equalized odds fairness notion can be achieved by the following min-max problem

$$\min_{\theta}\; \mathbb{E}\big[\ell(\theta; X, Y)\big] + \lambda \sum_{y} \rho_R^2\big(\widehat{Y}^\theta, S \,\big|\, Y = y\big). \qquad (11)$$
5 Fair Rényi Clustering
In this section, we apply the proposed fair Rényi framework to the widely used K-means clustering problem. Given a set of data points $\mathbf{x}_1, \ldots, \mathbf{x}_N \in \mathbb{R}^d$, the K-means problem seeks to partition them into $K$ clusters such that the following objective function is minimized:

$$\min_{\mathbf{A}, \mathbf{C}}\; \sum_{n=1}^{N} \sum_{k=1}^{K} a_{nk}\, \|\mathbf{x}_n - \mathbf{c}_k\|^2 \quad \text{s.t.}\;\; \sum_{k=1}^{K} a_{nk} = 1,\;\; a_{nk} \in \{0, 1\}, \qquad (12)$$

where $\mathbf{c}_k$ is the centroid of cluster $k$; the variable $a_{nk} = 1$ if data point $n$ belongs to cluster $k$ and is zero otherwise; and $\mathbf{A} = [a_{nk}]$ and $\mathbf{C} = [\mathbf{c}_1, \ldots, \mathbf{c}_K]$ represent the association matrix and the cluster centroids, respectively. Now suppose that, in addition, we have a sensitive attribute $s_n$ for each of the given data points. To obtain a fair clustering under the disparate impact doctrine, we need to make the cluster assignment independent of the sensitive attribute. Using our framework in (2), we can easily add a regularizer to this problem to impose fairness under the disparate impact doctrine. In particular, for a binary sensitive attribute, using Theorem 3.2 and absorbing the constants into the hyperparameter $\lambda$, we need to solve

$$\min_{\mathbf{A}, \mathbf{C}}\; \max_{\mathbf{w} \in \mathbb{R}^K}\; \sum_{n=1}^{N} \sum_{k=1}^{K} a_{nk}\, \|\mathbf{x}_n - \mathbf{c}_k\|^2 \;-\; \lambda \sum_{n=1}^{N} \big(\mathbf{w}^\top \mathbf{a}_n - s_n\big)^2, \qquad (13)$$

where $\mathbf{a}_n = (a_{n1}, \ldots, a_{nK})^\top$ encodes the clustering assignment of data point $n$, and $s_n$ is the sensitive attribute of data point $n$.
Fixing the assignment matrix $\mathbf{A}$ and the cluster centers $\mathbf{C}$, the vector $\mathbf{w}$ can be updated in closed form. More specifically, at each iteration, $w_k$ equals the current proportion of the privileged group in the $k$-th cluster. Combining this idea with the update rules for the assignments and cluster centers in the standard K-means algorithm, we propose Algorithm 3, a fair K-means algorithm under the disparate impact doctrine.
The main difference between this algorithm and the popular K-means algorithm is Step 6 of Algorithm 3. When $\lambda = 0$, this step is identical to the update of the cluster assignment variables in K-means. When $\lambda > 0$, Step 6 accounts for fairness in the distance used to update the cluster assignments.
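The updates described above can be sketched as follows. The implementation choices here are our own (deterministic initialization, a fixed iteration count, and empty clusters keeping their previous center and proportion); the assignment cost follows the regularized objective (13), with $\lambda = 0$ reducing to plain Lloyd assignments:

```python
import numpy as np

def fair_kmeans(X, s, K, lam, iters=30):
    """Sketch of an Algorithm-3-style fair K-means: fairness-adjusted
    assignments, mean-update of centers, and closed-form update of w_k
    as the share of s = 1 in cluster k."""
    n = len(X)
    # Deterministic init: K quantile points along the first coordinate.
    order = np.argsort(X[:, 0])
    C = X[order[np.linspace(0, n - 1, K).astype(int)]].astype(float)
    w = np.full(K, s.mean())
    labels = np.zeros(n, dtype=int)
    for _ in range(iters):
        d2 = ((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2)
        # Fairness-adjusted assignment cost ||x_n - c_k||^2 - lam*(w_k - s_n)^2.
        cost = d2 - lam * (w[None, :] - s[:, None]) ** 2
        labels = cost.argmin(axis=1)
        for k in range(K):
            mask = labels == k
            if mask.any():                 # empty clusters keep old values
                C[k] = X[mask].mean(axis=0)
                w[k] = s[mask].mean()
    return labels, C, w
```

Larger $\lambda$ trades clustering quality for more balanced group proportions $\mathbf{w}$ across clusters.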
5.1 Numerical Experiments
In this section, we evaluate our fair logistic regression and fair K-means algorithms on the standard Bank (https://archive.ics.uci.edu/ml/datasets/Bank%20Marketing) and Adult (https://archive.ics.uci.edu/ml/datasets/adult) datasets. The Bank dataset contains information on individuals contacted by a Portuguese banking institution. For this dataset, we selected three continuous features: age, balance, and duration; the sensitive attribute is the marital status of the individuals. The Adult dataset contains census information on individuals, including education, gender, and capital gain. We selected five continuous features (age, fnlwgt, capital-gain, education-num, hours-per-week) and sampled a subset of the data points for the fair K-means problem. For the fair logistic regression problem, we ran our algorithm on both datasets using all training samples. For the Adult dataset, we consider the sensitive attribute to be the gender of the individuals. We implemented Algorithms 2 and 3 to solve the fair logistic regression and fair K-means clustering problems on these datasets. The results are summarized in Figure 1.
We use the deviation of the elements of the vector $\mathbf{w}$ as a measure of fairness. The $k$-th element of $\mathbf{w}$ represents the ratio of the number of data points belonging to the privileged group ($s = 1$) in cluster $k$ to the total number of data points in that cluster. The deviation of these elements thus measures how much these ratios vary across clusters. A clustering solution is exactly fair if all entries of $\mathbf{w}$ are the same. In Figure 1, we plot the minimum, maximum, average, and average standard deviation of the entries of the vector $\mathbf{w}$ for different values of $\lambda$. For an exactly fair clustering solution, these values should coincide. As we can see in Figure 1, parts (a) and (b), increasing $\lambda$ yields an exactly fair clustering at the price of a higher clustering loss.
For fair logistic regression, we use the demographic parity (DP) violation as a measure of fairness, defined as the absolute difference between the positive-prediction rates of the two sensitive groups, $|\mathbb{P}(\widehat{Y} = 1 \mid S = 1) - \mathbb{P}(\widehat{Y} = 1 \mid S = 0)|$. A smaller DP violation indicates a fairer solution. Plot (c) shows the DP violation and the training error for different values of $\lambda$. We can see that increasing $\lambda$ yields a fairer solution at the price of a larger training error. On the other hand, plot (d) shows the DP violation and the test error for different values of $\lambda$. Interestingly, we observed that the test error initially decreases as we increase $\lambda$. This may be due to the bias of the unfair logistic regression against the protected group; the fairness term then plays the role of a regularizer that improves generalization. However, if we increase $\lambda$ further, the test performance drops again, indicating that the fairness term dominates for such large values of $\lambda$.
References
 (1) C. R. Act. Civil rights act of 1964, title vii, equal employment opportunities. 1964.
 (2) A. Agarwal, A. Beygelzimer, M. Dudík, J. Langford, and H. Wallach. A reductions approach to fair classification. arXiv preprint arXiv:1803.02453, 2018.
 (3) D. Alabi, N. Immorlica, and A. Kalai. Unleashing linear optimizers for groupfair learning and optimization. In Conference On Learning Theory, pages 2043–2066, 2018.
 (4) J. Angwin, J. Larson, S. Mattu, and L. Kirchner. Machine bias. ProPublica, 2016.
 (5) A. Backurs, P. Indyk, K. Onak, B. Schieber, A. Vakilian, and T. Wagner. Scalable fair clustering. arXiv preprint arXiv:1902.03519, 2019.
 (6) Y. Bechavod and K. Ligett. Penalizing unfairness in binary classification. arXiv preprint arXiv:1707.00044, 2017.
 (7) I. O. Bercea, M. Groß, S. Khuller, A. Kumar, C. Rösner, D. R. Schmidt, and M. Schmidt. On the cost of essentially fair clusterings. arXiv preprint arXiv:1811.10319, 2018.
 (8) R. Berk, H. Heidari, S. Jabbari, M. Joseph, M. Kearns, J. Morgenstern, S. Neel, and A. Roth. A convex framework for fair regression. arXiv preprint arXiv:1706.02409, 2017.
 (9) T. Bolukbasi, K.W. Chang, J. Y. Zou, V. Saligrama, and A. T. Kalai. Man is to computer programmer as woman is to homemaker? debiasing word embeddings. In Advances in neural information processing systems, pages 4349–4357, 2016.
 (10) T. Calders, F. Kamiran, and M. Pechenizkiy. Building classifiers with independency constraints. In 2009 IEEE International Conference on Data Mining Workshops, pages 13–18. IEEE, 2009.
 (11) F. Calmon, D. Wei, B. Vinzamuri, K. N. Ramamurthy, and K. R. Varshney. Optimized preprocessing for discrimination prevention. In Advances in Neural Information Processing Systems, pages 3992–4001, 2017.
 (12) L. E. Celis, L. Huang, V. Keswani, and N. K. Vishnoi. Classification with fairness constraints: A metaalgorithm with provable guarantees. In Proceedings of the Conference on Fairness, Accountability, and Transparency, pages 319–328. ACM, 2019.
 (13) F. Chierichetti, R. Kumar, S. Lattanzi, and S. Vassilvitskii. Fair clustering through fairlets. In Advances in Neural Information Processing Systems, pages 5029–5037, 2017.
 (14) A. Datta, M. C. Tschantz, and A. Datta. Automated experiments on ad privacy settings. Proceedings on privacy enhancing technologies, 2015(1):92–112, 2015.
 (15) M. Donini, L. Oneto, S. BenDavid, J. S. ShaweTaylor, and M. Pontil. Empirical risk minimization under fairness constraints. In Advances in Neural Information Processing Systems, pages 2791–2801, 2018.
 (16) C. Dwork, M. Hardt, T. Pitassi, O. Reingold, and R. Zemel. Fairness through awareness. In Proceedings of the 3rd innovations in theoretical computer science conference, pages 214–226. ACM, 2012.
 (17) C. Dwork, N. Immorlica, A. T. Kalai, and M. Leiserson. Decoupled classifiers for groupfair and efficient machine learning. In Conference on Fairness, Accountability and Transparency, pages 119–133, 2018.
 (18) H. Edwards and A. Storkey. Censoring representations with an adversary. arXiv preprint arXiv:1511.05897, 2015.

 (19) F. Farnia, M. Razaviyayn, S. Kannan, and D. Tse. Minimum HGR correlation principle: From marginals to joint distribution. In 2015 IEEE International Symposium on Information Theory (ISIT), pages 1377–1381. IEEE, 2015.
 (20) M. Feldman, S. A. Friedler, J. Moeller, C. Scheidegger, and S. Venkatasubramanian. Certifying and removing disparate impact. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 259–268. ACM, 2015.
 (21) B. Fish, J. Kun, and Á. D. Lelkes. A confidencebased approach for balancing fairness and accuracy. In Proceedings of the 2016 SIAM International Conference on Data Mining, pages 144–152. SIAM, 2016.
 (22) H. Gebelein. Das statistische Problem der Korrelation als Variations- und Eigenwertproblem und sein Zusammenhang mit der Ausgleichsrechnung. ZAMM-Journal of Applied Mathematics and Mechanics/Zeitschrift für Angewandte Mathematik und Mechanik, 21(6):364–379, 1941.
 (23) A. Gretton, O. Bousquet, A. Smola, and B. Schölkopf. Measuring statistical dependence with Hilbert-Schmidt norms. In International conference on algorithmic learning theory, pages 63–77. Springer, 2005.
 (24) A. Gretton, R. Herbrich, A. Smola, O. Bousquet, and B. Schölkopf. Kernel methods for measuring independence. Journal of Machine Learning Research, 6(Dec):2075–2129, 2005.

 (25) M. Hardt, E. Price, N. Srebro, et al. Equality of opportunity in supervised learning. In Advances in neural information processing systems, pages 3315–3323, 2016.
 (26) H. O. Hirschfeld. A connection between correlation and contingency. In Mathematical Proceedings of the Cambridge Philosophical Society, volume 31, pages 520–524. Cambridge University Press, 1935.
 (27) C. Jin, P. Netrapalli, and M. I. Jordan. Minmax optimization: Stable limit points of gradient descent ascent are locally optimal. arXiv preprint arXiv:1902.00618, 2019.
 (28) F. Kamiran and T. Calders. Classifying without discriminating. In 2009 2nd International Conference on Computer, Control and Communication, pages 1–6. IEEE, 2009.
 (29) F. Kamiran and T. Calders. Classification with no discrimination by preferential sampling. In Proc. 19th Machine Learning Conf. Belgium and The Netherlands, pages 1–6. Citeseer, 2010.
 (30) F. Kamiran and T. Calders. Data preprocessing techniques for classification without discrimination. Knowledge and Information Systems, 33(1):1–33, 2012.
 (31) T. Kamishima, S. Akaho, and J. Sakuma. Fairnessaware learning through regularization approach. In 2011 IEEE 11th International Conference on Data Mining Workshops, pages 643–650. IEEE, 2011.
 (32) M. Kearns, S. Neel, A. Roth, and Z. S. Wu. Preventing fairness gerrymandering: Auditing and learning for subgroup fairness. arXiv preprint arXiv:1711.05144, 2017.
 (33) C. Louizos, K. Swersky, Y. Li, M. Welling, and R. Zemel. The variational fair autoencoder. arXiv preprint arXiv:1511.00830, 2015.
 (34) D. Madras, E. Creager, T. Pitassi, and R. Zemel. Learning adversarially fair and transferable representations. arXiv preprint arXiv:1802.06309, 2018.
 (35) A. K. Menon and R. C. Williamson. The cost of fairness in binary classification. In Conference on Fairness, Accountability and Transparency, pages 107–118, 2018.
 (36) M. Nouiehed, M. Sanjabi, J. D. Lee, and M. Razaviyayn. Solving a class of nonconvex minmax games using iterative first order methods. arXiv preprint arXiv:1902.08297, 2019.
 (37) A. PérezSuay, V. Laparra, G. MateoGarcía, J. MuñozMarí, L. GómezChova, and G. CampsValls. Fair kernel learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 339–355. Springer, 2017.

 (38) E. Raff and J. Sylvester. Gradient reversal against discrimination: A fair neural network learning approach. In 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA), pages 189–198. IEEE, 2018.
 (39) H. Rafique, M. Liu, Q. Lin, and T. Yang. Non-convex min-max optimization: Provable algorithms and applications in machine learning. arXiv preprint arXiv:1810.02060, 2018.
 (40) M. Razaviyayn, F. Farnia, and D. Tse. Discrete rényi classifiers. In Advances in Neural Information Processing Systems, pages 3276–3284, 2015.
 (41) A. Rényi. On measures of dependence. Acta Mathematica Hungarica, 10(3-4):441–451, 1959.
 (42) A. Rezaei, R. Fathony, O. Memarrast, and B. Ziebart. Fair logistic regression: An adversarial perspective. arXiv preprint arXiv:1903.03910, 2019.
 (43) C. Rösner and M. Schmidt. Privacy preserving clustering with constraints. arXiv preprint arXiv:1802.02497, 2018.
 (44) S. Ruggieri. Using tcloseness anonymity to control for nondiscrimination. Trans. Data Privacy, 7(2):99–129, 2014.
 (45) M. Sanjabi, J. Ba, M. Razaviyayn, and J. D. Lee. On the convergence and robustness of training gans with regularized optimal transport. In Advances in Neural Information Processing Systems, pages 7091–7101, 2018.
 (46) P. Sattigeri, S. C. Hoffman, V. Chenthamarakshan, and K. R. Varshney. Fairness gan. arXiv preprint arXiv:1805.09910, 2018.
 (47) M. Schmidt, C. Schwiegelshohn, and C. Sohler. Fair coresets and streaming algorithms for fair kmeans clustering. arXiv preprint arXiv:1812.10854, 2018.
 (48) J. Song, P. Kalluri, A. Grover, S. Zhao, and S. Ermon. Learning controllable fair representations. arXiv preprint arXiv:1812.04218, 2019.
 (49) L. Sweeney. Discrimination in online ad delivery. arXiv preprint arXiv:1301.6822, 2013.
 (50) H. S. Witsenhausen. On sequences of pairs of dependent random variables. SIAM Journal on Applied Mathematics, 28(1):100–113, 1975.
 (51) B. Woodworth, S. Gunasekar, M. I. Ohannessian, and N. Srebro. Learning nondiscriminatory predictors. arXiv preprint arXiv:1702.06081, 2017.
 (52) D. Xu, S. Yuan, L. Zhang, and X. Wu. Fairgan: Fairnessaware generative adversarial networks. In 2018 IEEE International Conference on Big Data (Big Data), pages 570–575. IEEE, 2018.
 (53) M. B. Zafar, I. Valera, M. Gomez Rodriguez, and K. P. Gummadi. Fairness beyond disparate treatment & disparate impact: Learning classification without disparate mistreatment. In Proceedings of the 26th International Conference on World Wide Web, pages 1171–1180. International World Wide Web Conferences Steering Committee, 2017.
 (54) M. B. Zafar, I. Valera, M. G. Rodriguez, and K. P. Gummadi. Fairness constraints: Mechanisms for fair classification. arXiv preprint arXiv:1507.05259, 2015.
 (55) R. Zemel, Y. Wu, K. Swersky, T. Pitassi, and C. Dwork. Learning fair representations. In International Conference on Machine Learning, pages 325–333, 2013.
 (56) B. H. Zhang, B. Lemoine, and M. Mitchell. Mitigating unwanted biases with adversarial learning. In Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society, pages 335–340. ACM, 2018.