Fair and Consistent Federated Learning

by Sen Cui, et al.
Tsinghua University
Cornell University

Federated learning (FL) has gained growing interest for its capability of learning collectively from distributed data sources without accessing the raw data samples held by each source. So far, FL research has mostly focused on improving performance; how algorithmic disparity is affected in a model learned through FL, and how that disparity affects utility inconsistency, remain largely unexplored. In this paper, we propose an FL framework that jointly considers performance consistency and algorithmic fairness across different local clients (data sources). We derive our framework from a constrained multi-objective optimization perspective, in which we learn a model that satisfies fairness constraints on all clients with consistent performance. Specifically, we treat the prediction loss at each local client as an objective and maximize the utility of the worst-performing client under fairness constraints by optimizing a surrogate maximum function that involves all objectives. A gradient-based procedure is employed to achieve Pareto optimality for this optimization problem. Theoretical analysis proves that our method converges to a Pareto solution that achieves min-max performance with fairness constraints on all clients. Comprehensive experiments on synthetic and real-world datasets demonstrate the superiority of our approach over baselines and its effectiveness in achieving both fairness and consistency across all local clients.




1 Introduction

Federated learning (FL) yang2019federated refers to the paradigm of learning from fragmented data without sacrificing privacy. FL has attracted broad interest from diverse disciplines, including high-stakes scenarios such as loan approvals, criminal justice, and healthcare xu2020federated. An increasing concern is whether these FL systems induce disparity at local clients in such cases. For example, ProPublica reported that an algorithm used across the US for predicting a defendant’s risk of future crime assigned higher scores to African-Americans than to Caucasians on average angwin2016machine. This has caused severe public concern about the real-world deployment of data mining models and has made algorithmic fairness an important research theme in recent years.

Existing works on algorithmic fairness in machine learning have mostly focused on individual learning scenarios. There has not been much research on how FL affects model fairness at different local clients (see Footnote 1). Recently, Du et al. du2020fairness proposed a fairness-aware method that considers the global fairness of the model learned through a kernel re-weighting mechanism. However, such a mechanism cannot guarantee fairness at local clients in the FL scenario, since different clients have different distributions across protected groups. For example, suppose we are building a mortality prediction model for COVID-19 patients within a hospital system vaid2021federated, where each individual hospital can be viewed as a local client. Different hospitals will have different patient populations with distinct demographic compositions, including race and gender. In this case, model fairness at each hospital is important because that is where the model will be deployed, and it is unlikely that global model fairness leads to local model fairness.

Footnote 1: We would like to emphasize that the algorithmic fairness studied in this paper is not the federated fairness studied in li2019fair and mohri2019agnostic, as their fairness refers to the performance at different clients, which we call consistency.

Due to the potential trade-off between algorithmic fairness and model utility, attempts to mitigate the algorithmic disparity on local clients can exacerbate the inconsistency of the model performance (i.e., the model performs differently at different local clients). Prior studies li2019fair ; mohri2019agnostic have tried to address this inconsistency without considering algorithmic fairness. In particular, Mohri et al. mohri2019agnostic proposed an agnostic federated learning (AFL) algorithm that maximizes the performance of the worst-performing client. Li et al. li2019fair proposed a q-Fair Federated Learning (q-FFL) approach that weighs local clients differently by taking the $q$-th power of the local empirical loss when constructing the optimization objective of the global model.
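To make the effect of the exponent in q-FFL concrete, the following is a minimal sketch of the q-FFL objective of li2019fair, $\sum_k \frac{p_k}{q+1} F_k^{q+1}$, together with the relative weight $p_k F_k^q$ that each client's gradient receives. The client losses and weights are made-up numbers and the function names are hypothetical; this is an illustration, not the authors' or the q-FFL reference implementation.

```python
def qffl_objective(losses, weights, q):
    """q-FFL objective: sum_k p_k / (q + 1) * F_k^(q + 1)."""
    return sum(p * f ** (q + 1) / (q + 1) for f, p in zip(losses, weights))


def qffl_gradient_shares(losses, weights, q):
    """Normalized weight p_k * F_k^q that each client's gradient receives."""
    raw = [p * f ** q for f, p in zip(losses, weights)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical local losses of three clients with equal weights.
losses = [0.2, 0.4, 0.9]
weights = [1 / 3, 1 / 3, 1 / 3]

for q in (0, 1, 5):
    shares = qffl_gradient_shares(losses, weights, q)
    print(q, [round(s, 3) for s in shares])
```

With $q = 0$ the objective reduces to the usual weighted average of client losses; as $q$ grows, the client with the largest loss dominates the update, which approaches the worst-case focus of AFL.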

In this paper, we consider the problem of enforcing both algorithmic fairness and performance consistency across all local clients in FL. Specifically, suppose we have $N$ local clients, $u_k$ represents the model utility for client $k$, and $g_k$ is the model disparity quantified by some computational fairness metric (e.g., demographic parity dwork2012fairness or equal opportunity hardt2016equality). Following the idea of AFL, we can maximize the utility of the worst-performing client to achieve performance consistency. We also propose to assign each client a "fairness budget" to ensure a certain level of local fairness, i.e., $g_k \le \epsilon_k$ with $\epsilon_k$ being a pre-specified fairness budget for client $k$. Therefore, we can formulate our problem as a constrained multi-objective optimization framework as shown in Figure 1, where each local model utility can be viewed as an optimization objective.

Since models with fairness and min-max performance may not be unique, we also require the model to be Pareto optimal. A model is Pareto optimal if and only if the utility of any client cannot be further improved without degrading that of some other client. A Pareto optimal solution of this problem cannot in general be achieved by existing linear scalarization methods in federated learning (e.g., federated averaging, or FedAve, in mcmahan2017communication), as the non-i.i.d. data distributions across different clients can cause a non-convex Pareto Front of utilities (the Pareto Front is the set of all Pareto solutions). Therefore, we propose FCFL, a new federated learning framework to obtain a fair and consistent model for all local clients. Specifically, we first utilize a surrogate maximum function (SMF) that considers all the utilities involved simultaneously instead of only the single worst one, and then optimize the model to achieve Pareto optimality by controlling the gradient direction without hurting client utilities. Theoretical analysis proves that our method converges to a fairness-constrained Pareto min-max model, and experiments on both synthetic and real-world datasets show that FCFL achieves a Pareto min-max utility distribution with fairness guarantees on each client.

2 Related Work

Algorithmic fairness concerns the disparities in algorithm decisions made across groups formed by protected variables (such as gender and race). Several approaches have been proposed to define mathematically whether an algorithm is fair. For example, demographic parity dwork2012fairness requires the classification results to be independent of group membership, and equal opportunity hardt2016equality seeks equal false negative rates across different groups. Plenty of approaches have been proposed to reduce model disparity. One type of method trains a classifier first and then post-adjusts the predictions by setting different thresholds for different groups hardt2016equality. Other methods optimize fairness metrics during model training through adversarial learning beutel2017data ; louizos2015variational ; madras2018learning ; zemel2013learning ; zhang2018mitigating or regularization kamishima2011fairness ; zafar2015fairness ; beutel2019putting. Du et al. du2020fairness considered algorithmic fairness in the federated learning setting and proposed a regularization method that assigns a reweighting value to each training sample for the loss objective and the fairness regularization to deal with the global disparity. This method cannot account for the discrepancies among the model disparities at different local clients. In this paper, we propose to treat fairness as a constraint and solve a multi-objective optimization problem with fairness constraints for all clients while maximally maintaining the model utility.

As noted in the introduction, the global consensus model learned from federated learning may perform differently on different clients. Existing research tries to address this inconsistency by maximizing the utility of the worst-performing client. In particular, Li et al. li2019fair propose q-FFL to obtain a min-max performance of all clients by empirically adjusting the power of the objectives, which cannot always guarantee a more consistent model utility distribution without a sufficient search for appropriate power values. Mohri et al. mohri2019agnostic propose AFL, a min-max optimization scheme which focuses on the single worst client. However, focusing on the single worst objective can cause another client to perform worse; we therefore propose to take all objectives into account and optimize a surrogate maximum function to achieve a min-max performance distribution.

Multi-objective optimization aims to learn a model that takes all objectives involved into consideration. Optimization methods for multi-objective problems typically involve linear scalarization or its variants, such as those with adaptive weights chen2018gradnorm, but it is challenging for these approaches to handle the competing performance among different clients mahapatra2020multi. Martinez et al. martinez2020minimax proposed a multi-objective optimization framework called Min-Max Pareto Fairness (MMPF) to achieve consistency by inducing a min-max performance of all groups based on a convexity assumption, which is fairly strong as non-convex objectives are ubiquitous. In this paper, we formulate the problem of achieving both fairness and consistency in federated networks through constrained multi-objective optimization. Previous research on solving this problem has mostly focused on gradient-free algorithms such as evolutionary algorithms coello2006evolutionary, as well as physics-based and deterministic approaches evtushenko2014deterministic. Gradient-based methods are still under-explored zerbinati2011comparison. We propose a novel gradient-based method, FCFL, which searches for the desired gradient direction iteratively by solving constrained Linear Programming (LP) problems to achieve fairness and consistency simultaneously in federated networks.

3 Problem Setup

The problem to be solved in this paper is formally defined in this section. Specifically, we will introduce the algorithmic fairness problem, how to extend existing fairness criteria to federated setting, and the consistency issues of model utility in federated learning.

3.1 Preliminaries

Federated Learning. Suppose there are $N$ local clients and each client is associated with a specific dataset $D^k = \{X^k, Y^k\}$, $k \in \{1, \dots, N\}$, where the input space $\mathcal{X}$ and output space $\mathcal{Y}$ are shared across all $N$ clients. There are $n^k$ samples in the $k$-th client and each sample is denoted as $(x_i^k, y_i^k)$. The goal of federated learning is to collaboratively learn a global model $h$ with parameters $\theta$ to predict the label as $\hat{Y}^k$ on each client. Classical federated learning aims to minimize the empirical risk over the samples from all clients, i.e., $\min_{\theta} \frac{1}{\sum_{k=1}^{N} n^k} \sum_{k=1}^{N} \sum_{i=1}^{n^k} l_k(h_\theta(x_i^k), y_i^k)$, where $l_k$ is the loss objective of the $k$-th client.

Fairness. Fairness refers to the disparities in the algorithm decisions made across different groups formed by the sensitive attribute, such as gender and race. If we denote the dataset on the $k$-th client as $D^k = \{X^k, A^k, Y^k\}$, where $A^k$ is the binary sensitive attribute, then we can define multi-client fairness as follows:

Definition 1 (Multi-client fairness (MCF)).

A learned model $h$ achieves multi-client fairness over the $N$ clients if $h$ meets the following condition:

$$\Delta Disp_k(h) \le \epsilon_k, \quad \forall k \in \{1, \dots, N\}, \tag{1}$$

where $\Delta Disp_k(h)$ denotes the disparity induced by the model $h$ and $\epsilon_k$ is the given fairness budget of the $k$-th client. The disparity on the $k$-th client can be measured by Demographic Parity (DP) dwork2012fairness and Equal Opportunity (EO) hardt2016equality as follows:

$$\Delta DP_k = \left| P(\hat{Y}^k = 1 \mid A^k = 0) - P(\hat{Y}^k = 1 \mid A^k = 1) \right|, \qquad \Delta EO_k = \left| P(\hat{Y}^k = 1 \mid A^k = 0, Y^k = 1) - P(\hat{Y}^k = 1 \mid A^k = 1, Y^k = 1) \right|. \tag{2}$$

As data heterogeneity may cause different disparities across clients, the fairness budget $\epsilon_k$ in Definition 1 specifies the tolerance of model disparity at the $k$-th client.
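As an illustration of the two disparity measures and of the multi-client fairness check, the sketch below computes the demographic parity and equal opportunity gaps for one client from binary predictions and a binary sensitive attribute, then compares per-client disparities against per-client budgets. The toy data, the helper names, and the assumption that every group (and every positive-label group) is non-empty are ours.

```python
def positive_rate(y_pred, mask):
    """Fraction of positive predictions among the samples selected by mask."""
    selected = [p for p, m in zip(y_pred, mask) if m]
    return sum(selected) / len(selected)  # assumes the group is non-empty


def dp_gap(y_pred, a):
    """Demographic parity gap |P(Yhat=1 | A=0) - P(Yhat=1 | A=1)|."""
    r0 = positive_rate(y_pred, [s == 0 for s in a])
    r1 = positive_rate(y_pred, [s == 1 for s in a])
    return abs(r0 - r1)


def eo_gap(y_pred, y_true, a):
    """Equal opportunity gap, i.e. the DP gap restricted to samples with Y=1."""
    r0 = positive_rate(y_pred, [s == 0 and y == 1 for s, y in zip(a, y_true)])
    r1 = positive_rate(y_pred, [s == 1 and y == 1 for s, y in zip(a, y_true)])
    return abs(r0 - r1)


def satisfies_mcf(disparities, budgets):
    """Multi-client fairness: every client's disparity is within its budget."""
    return all(d <= e for d, e in zip(disparities, budgets))


# Toy client: eight samples, the first four in group A=0, the rest in A=1.
y_pred = [1, 1, 0, 0, 1, 0, 0, 0]
y_true = [1, 0, 1, 0, 1, 1, 0, 0]
a = [0, 0, 0, 0, 1, 1, 1, 1]

print(dp_gap(y_pred, a), eo_gap(y_pred, y_true, a))
print(satisfies_mcf([dp_gap(y_pred, a), 0.02], [0.3, 0.05]))
```

The same model can pass one metric and fail the other on a given client, which is why the budget is stated per client and per chosen metric.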

Consistency. Due to the discrepancies among data distributions across different clients, the model performance on different clients could be different. Moreover, the inconsistency will be magnified when we adjust the model to be fair on local clients. Existing research tries to improve model consistency by maximizing the utility of the worst-performing client li2019fair ; mohri2019agnostic:

$$\min_{h \in \mathcal{H}} \max_{k \in \{1, \dots, N\}} l_k(h), \tag{3}$$

where $l_k(h)$ is the loss of model $h$ on the $k$-th client and the $\max$ is over the losses across all clients.
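The difference between the classical empirical-risk objective and the worst-client objective can be seen on a small example. The sketch below uses hypothetical per-client mean losses and sample counts; the function names are ours.

```python
def average_risk(client_losses, client_sizes):
    """Sample-weighted average of per-client mean losses (classical FL objective)."""
    total = sum(client_sizes)
    return sum(n * l for l, n in zip(client_losses, client_sizes)) / total


def worst_client_risk(client_losses):
    """Loss of the worst-performing client (the quantity minimized for consistency)."""
    return max(client_losses)


# A large client with a low loss and a small client with a high loss.
client_losses = [0.1, 0.8]
client_sizes = [900, 100]

print(average_risk(client_losses, client_sizes))
print(worst_client_risk(client_losses))
```

The average risk looks acceptable because the large client dominates it, while the worst-client risk exposes the poor performance on the small client.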

3.2 Fair and Consistent Federated Learning (FCFL)

Our goal is to learn a model $h$ which 1) satisfies MCF as defined in Definition 1; 2) maintains consistent performance across all clients. We will use $\Delta DP_k$ defined in Eq.(2) as the measurement of disparity in our main text, while the same derivations can be developed when adopting other metrics, so we have $\Delta Disp_k(h) = g_k(h(X^k), A^k)$, where $g_k$ is the function that calculates model disparity on the $k$-th client. Similarly, the model utility loss $l_k(h(X^k), Y^k)$ can be evaluated by different metrics (such as cross-entropy, hinge loss, and squared loss). In the rest of this paper we will write $l_k(h)$ ($g_k(h)$) for $l_k(h(X^k), Y^k)$ ($g_k(h(X^k), A^k)$) where this causes no confusion. We formulate FCFL as the problem of optimizing the $N$ utility-related objectives to achieve Pareto min-max performance with fairness constraints:

$$\min_{h \in \mathcal{H}} \; [\, l_1(h), l_2(h), \dots, l_N(h) \,] \quad \text{s.t.} \quad g_k(h) \le \epsilon_k, \ \forall k \in \{1, \dots, N\}, \tag{4}$$

where $\epsilon_k$ is the fairness budget of the $k$-th client.
The definitions of Pareto Solution and Pareto Front, which are fundamental concepts in multi-objective optimization, are as follows:

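Definition 2 (Pareto Solution and Pareto Front).

Suppose $\mathbf{l}(h) = [\, l_1(h), l_2(h), \dots, l_N(h) \,]$ represents the utility loss vector on $N$ learning tasks with hypothesis $h \in \mathcal{H}$. We say $h$ is a Pareto Solution if there is no hypothesis $h'$ that dominates $h$ (written $h' \prec h$), i.e.,

$$\nexists\, h' \in \mathcal{H} \ \ \text{s.t.} \ \ \forall k: \ l_k(h') \le l_k(h) \ \ \text{and} \ \ \exists j: \ l_j(h') < l_j(h).$$

All Pareto solutions form the Pareto Front.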
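The dominance relation and the Pareto Front can be sketched for a finite set of candidate loss vectors as follows. The candidate vectors are made-up numbers, and a finite candidate list stands in for the hypothesis set purely for illustration.

```python
def dominates(a, b):
    """True if loss vector a dominates b: no worse everywhere, strictly better somewhere."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(points):
    """Keep the loss vectors that no other candidate dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Hypothetical (client 1 loss, client 2 loss) pairs for four candidate models.
candidates = [(0.2, 0.9), (0.5, 0.5), (0.6, 0.6), (0.9, 0.1)]

front = pareto_front(candidates)
print(front)

# Among Pareto solutions, the min-max choice has the smallest worst-client loss.
print(min(front, key=max))
```

Here three of the four candidates are Pareto solutions, but only one of them also has the smallest worst-client loss, which is the kind of solution a min-max criterion singles out.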

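From Definition 2, for a given hypothesis set $\mathcal{H}$ and the objective vector $\mathbf{l}(h) = [\, l_1(h), \dots, l_N(h) \,]$, a Pareto solution avoids unnecessary harm to client utilities and may not be unique. We prefer a Pareto solution that achieves higher consistency. Following the work in li2019fair ; mohri2019agnostic, we want to obtain a Pareto solution with min-max performance. Figure 1 shows the relationships among different model hypothesis sets, and we explain the notation therein as follows, with $g_k(h)$ the disparity and $\epsilon_k$ the fairness budget of the $k$-th client:

(1) $\mathcal{H}^{F}$ is the set of model hypotheses satisfying MCF, i.e., $\mathcal{H}^{F} = \{ h \in \mathcal{H} : g_k(h) \le \epsilon_k, \ \forall k \in \{1, \dots, N\} \}$.

(2) $\mathcal{H}^{FU}$ is the set of model hypotheses achieving min-max performance (consistency) with MCF:

$$\mathcal{H}^{FU} = \{ h \in \mathcal{H}^{F} : \max_k l_k(h) \le \max_k l_k(h'), \ \forall h' \in \mathcal{H}^{F} \}.$$

(3) $\mathcal{H}^{FP}$ is the set of model hypotheses achieving Pareto optimality with MCF, i.e., the set of $h$ such that

$$h \in \mathcal{H}^{F}, \tag{5a}$$
$$h' \nprec h, \ \forall h' \in \mathcal{H}^{F}, \tag{5b}$$

where $h' \prec h$ means that $h'$ dominates $h$, Eq.(5a) ensures that $h$ meets MCF, and Eq.(5b) ensures that $h$ is a Pareto model with MCF.

(4) $\mathcal{H}^{*}$ is our desired set of model hypotheses achieving Pareto optimality and min-max performance with MCF: $\mathcal{H}^{*} = \mathcal{H}^{FP} \cap \mathcal{H}^{FU}$.

Figure 1: (a) The architecture of our proposed fairness-constrained min-max problem. (b) The relationship of the 5 hypothesis sets involved, where $\mathcal{H}^{*}$ is the desired hypothesis set. (c) Two optimization paths to achieve a fair Pareto min-max model: 1) initial model → $\mathcal{H}^{FP}$ → $\mathcal{H}^{*}$: the gray dotted line represents that the initial model first achieves Pareto optimality with MCF and then achieves min-max performance; 2) initial model → $\mathcal{H}^{FU}$ → $\mathcal{H}^{*}$: the black solid line denotes that the initial model first achieves min-max performance with MCF and then achieves Pareto optimality.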

In summary, our goal is to obtain a fair and consistent model to achieve Pareto optimality and min-max performance with MCF.

4 Fairness-Constrained Min-Max Pareto Optimization

4.1 Preliminary: Gradient-based Multi-Objective Optimization

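Pareto solutions of the multi-objective optimization problem can be reached by gradient descent procedures. Specifically, given the initial hypothesis $h$ with parameter $\theta^0$, we optimize $h_{\theta^t}$ by moving in a specific direction $d$ with a step size $\eta$: $\theta^{t+1} = \theta^t + \eta d$. $d$ is a descent direction if it decreases the objectives $l_k(h_\theta)$. Suppose $\nabla_\theta l_k$ is the gradient of the $k$-th objective with respect to the parameter $\theta$. If we select $d$ which satisfies $d^\top \nabla_\theta l_k \le 0$ for all $k$, then $d$ is a descent direction and, to first order for a sufficiently small $\eta$, no $l_k$ increases after the $t$-th iteration.

If we directly search for the descent direction $d$ to achieve $d^\top \nabla_\theta l_k \le 0$, the computation cost can be tremendous when $d$ is a high-dimensional vector. Désidéri desideri2012multiple proposed to find a descent direction from the convex hull $\bar{G}$ of the gradients of all objectives by searching for an $N$-dimensional vector $\alpha$ (typically $N$ is far smaller than the dimension of $\theta$ in deep models), which is formulated as follows:

$$\bar{G} = \Big\{ \sum_{k=1}^{N} \alpha_k \nabla_\theta l_k \ \Big| \ \alpha_k \ge 0 \ \forall k, \ \sum_{k=1}^{N} \alpha_k = 1 \Big\}, \qquad d = -\sum_{k=1}^{N} \alpha_k \nabla_\theta l_k. \tag{6}$$

Here $d$ is the negative of an element of $\bar{G}$, so that the update $\theta^{t+1} = \theta^t + \eta d$ and the condition $d^\top \nabla_\theta l_k \le 0$ hold together.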
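For two objectives, the search over the convex hull of the gradients has a closed form: the minimum-norm element of the segment between the two gradients. The sketch below implements that special case of Désidéri's multiple-gradient descent idea and negates the result so that it can be used with the update $\theta^{t+1} = \theta^t + \eta d$. It illustrates the preliminary only, not the FCFL direction search, and the gradient values are made up.

```python
def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def min_norm_direction(g1, g2):
    """Return (d, alpha): d is minus the min-norm point alpha*g1 + (1-alpha)*g2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        alpha = 0.5  # identical gradients: any convex combination is the same
    else:
        # Unconstrained minimizer of ||alpha*g1 + (1-alpha)*g2||^2, clipped to [0, 1].
        alpha = dot(g2, [y - x for x, y in zip(g1, g2)]) / denom
        alpha = min(1.0, max(0.0, alpha))
    point = [alpha * x + (1.0 - alpha) * y for x, y in zip(g1, g2)]
    return [-p for p in point], alpha


# Hypothetical gradients of two client losses.
g1 = [1.0, 0.0]
g2 = [0.0, 1.0]
d, alpha = min_norm_direction(g1, g2)
print(d, alpha)
print(dot(d, g1), dot(d, g2))  # both inner products are non-positive
```

When the two gradients point in exactly opposite directions the minimum-norm point is the zero vector, which signals that no common descent direction exists at that parameter.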


4.2 Overview of the Optimization Framework

To obtain a fair Pareto min-max model, there are two optimization paths shown in Figure 1. The gray dotted line denotes the optimization path where we first achieve Pareto optimality with MCF, then we try to achieve min-max performance while keeping Pareto optimality. However, it’s hard to adjust a Pareto model to another mahapatra2020multi . Therefore, we propose to first achieve min-max performance with MCF then achieve Pareto optimality as the black solid line in Figure 1. In particular, we propose a two-stage optimization method for this constrained multi-objective problem: 1) constrained min-max optimization to achieve a fair min-max model; 2) constrained Pareto optimization to continue to optimize the model to achieve Pareto optimality while keeping min-max performance with MCF.

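Constrained Min-Max Optimization. We define a constrained min-max optimization problem on the hypothesis set $\mathcal{H}$:

$$\min_{h \in \mathcal{H}} \max_{k} \, l_k(h) \quad \text{s.t.} \quad \max_{k} \big( g_k(h) - \epsilon_k \big) \le 0, \tag{7}$$

where $l_k(h)$ and $g_k(h)$ are the utility loss and the disparity on the $k$-th client, and $\epsilon_k$ is the given fairness budget on the $k$-th client. By optimizing the constrained objective in Eq.(7), we obtain a model that 1) satisfies MCF; 2) achieves the optimal utility on the worst-performing client.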

Constrained Pareto Optimization. Caring only about the utility of the worst-performing client can lead to unnecessary harm to other clients, since the remaining objectives can be further optimized. Therefore, we then continue to optimize the model to achieve Pareto optimality:


where we optimize the model without hurting the model utility on any client, so that the converged Pareto model of Eq.(8) keeps the min-max performance. Moreover, the converged model satisfies MCF because of the constraint in Eq.(8c), so it is a Pareto min-max model with fairness constraints on all clients.

4.3 Achieving Min-Max Performance with MCF

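Minimizing the current maximum value of all objectives directly in Eq.(7) can make another objective worse and can be computationally hard when faced with a large number of clients. We propose to use a smooth surrogate maximum function (SMF) polak2003algorithms to approximate an upper bound of $\max_k l_k(h)$ and $\max_k (g_k(h) - \epsilon_k)$ as follows:

$$\hat{l}(h, \delta_l) = \delta_l \ln \sum_{k=1}^{N} \exp\!\Big( \frac{l_k(h)}{\delta_l} \Big), \qquad \hat{g}(h, \delta_g) = \delta_g \ln \sum_{k=1}^{N} \exp\!\Big( \frac{g_k(h) - \epsilon_k}{\delta_g} \Big), \qquad \delta_l, \delta_g > 0. \tag{9}$$

For simplicity of expression, we will write $\hat{l}(h)$ ($\hat{g}(h)$) for $\hat{l}(h, \delta_l)$ ($\hat{g}(h, \delta_g)$) where this causes no confusion. It is obvious that $\max_k l_k(h) \le \hat{l}(h, \delta_l) \le \max_k l_k(h) + \delta_l \ln N$ and $\lim_{\delta_l \to 0^+} \hat{l}(h, \delta_l) = \max_k l_k(h)$. For $\hat{g}$, we can get a similar conclusion. The objective in Eq.(7) is approximated as follows:

$$\min_{h \in \mathcal{H}} \ \hat{l}(h, \delta_l) \tag{10a}$$
$$\text{s.t.} \quad \hat{g}(h, \delta_g) \le 0. \tag{10b}$$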
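The behavior of a smooth surrogate maximum can be checked numerically. The sketch below assumes the SMF takes the standard log-sum-exp form $\delta \ln \sum_k \exp(x_k / \delta)$, and it shrinks $\delta$ geometrically by a decay ratio to show the surrogate approaching the true maximum. The loss values, the starting $\delta$, and the decay ratio are made-up numbers.

```python
import math


def smooth_max(values, delta):
    """Log-sum-exp surrogate delta * ln(sum_k exp(v_k / delta)), computed stably."""
    m = max(values)
    # Factoring out exp(m / delta) avoids overflow for small delta.
    return m + delta * math.log(sum(math.exp((v - m) / delta) for v in values))


# Hypothetical client losses.
client_losses = [0.3, 0.5, 0.9]

delta, beta = 1.0, 0.5  # beta: decay ratio in (0, 1), an illustrative choice
gaps = []
for _ in range(5):
    gap = smooth_max(client_losses, delta) - max(client_losses)
    gaps.append(gap)
    print(delta, round(gap, 4))
    delta *= beta
```

The gap to the true maximum stays between $0$ and $\delta \ln N$ and shrinks as $\delta$ decreases, while every client contributes to the gradient of the surrogate, not only the current worst one.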

Property 1.

There always exists an initial trivial model which satisfies the MCF criterion by treating all samples equally (e.g., ).

From Property 1, we can always initialize to satisfy in Eq.(10b). Then we optimize the upper bound of when ensuring MCF. As the hypothesis owns the parameter , we use and to represent the gradient and , respectively. We propose to search for a descent direction for Eq.(10) in the convex hull of in two cases where is defined in Eq.(6). For the -th iteration:
(1) if satisfies the fairness constraint defined in Eq.(10b), we focus on the descent of :


(2) if violates the fairness constraint, we aim to the descent of without causing the ascent of :


If the obtained gradient satisfies , we decrease the parameter as:


where is the decay ratio that . From Eq.(13), we narrow the gap between and by decreasing the parameter as every time the algorithm approaches convergence. From Eq.(11) and Eq.(12), we optimize either or and keep without ascent in each iteration.

4.4 Achieving Pareto Optimality and Min-Max Performance with MCF

As the model obtained from constrained min-max optimization cannot guarantee Pareto optimality, we continue to optimize to be a Pareto model without causing the utility descent on any client. We propose a constrained linear scalarization objective to reach a Pareto model and the -th iteration is formulated as follows:


where and is the convex hull of . The non-positive angle of with each gradient ensures that all objective values decrease. Similarly, if we aim to reach the Pareto solution without causing utility descent only on the worst performing client, the constraint in Eq.(14b) is replaced by .

Different from constrained min-max optimization in Section 4.3 where the objective to be optimized in each iteration depends on whether , in constrained Pareto optimization procedure, as we have achieved MCF, we optimize the objective in Eq.(14) for a dominate model in each iteration. Specifically, we restrict to keep fairness on each client given the reached hypothesis . Meanwhile, we constrain to descend or remain unchanged until any objective cannot be further minimized. Algorithm 1 in the Appendix shows all steps of our method. Moreover, the convergence analysis and the discussion on the time complexity of our framework are in Appendix.

5 Experiments

We intuitively demonstrate the behavior of our method by conducting experiments on synthetic data. For the experiments on two real-world federated datasets with fairness issues, we select two different settings to verify the effectiveness of our method: (1) assign equal fairness budgets for all local clients; (2) assign client-specific fairness budgets. More experimental results and detailed implementation are in Appendix.

5.1 Experimental Setup

Federated Datasets. (1) Synthetic dataset: following the setting in lin2019pareto ; mahapatra2020multi, the synthetic data is generated from two given non-convex objectives; (2) UCI Adult dataset mohri2019agnostic: Adult contains more than 40000 adult records and the task is to predict whether an individual earns more than 50K/year given other features. Following the federated setting in li2019fair ; mohri2019agnostic, we split the dataset into two clients. One is the PhD client, in which all individuals are PhDs, and the other is the non-PhD client. In our experiments, we select race and gender as sensitive attributes, respectively. (3) eICU dataset: we select eICU pollard2018eicu, a clinical dataset that collects patients' admissions to ICUs together with hospital information. Each instance is a specific ICU stay. We follow the data preprocessing procedure in johnson2018generalizability and naturally treat 11 hospitals as 11 local clients in federated networks. We conduct the task of predicting prolonged length of stay (whether the ICU stay is longer than 1 week) and select race as the sensitive attribute.

Evaluation Metrics. (1) Utility metric: we use accuracy to measure the model utility in our experiments; (2) Disparity metrics: our method is compatible with various fairness metrics. In our experiments, we select two metrics, the marginal-based metric Demographic Parity dwork2012fairness and the conditional-based metric Equal Opportunity hardt2016equality, to measure the disparities defined in Eq.(2) (the results for Equal Opportunity are in the Appendix); (3) Consistency: following the work in li2019fair ; mohri2019agnostic, we use the utility on the worst-performing client to measure consistency.

Baselines As we do not find prior works proposed for achieving fairness in each client, we select FA du2020fairness , MMPF martinez2020minimax and build FedAve+FairReg as baselines in our experiments. For all baselines, we try to train the models to achieve the optimal utility with fairness constraints. If the model cannot satisfy the fairness constraints, we keep the minimum of disparities with reasonable utilities. (1) MMPF martinez2020minimax , Martinez et al. develop MMPF which optimizes all objectives on convex assumption to induce a min-max utility of all groups; (2) FA du2020fairness , Du et al. propose FA, a kernel-based model-agnostic method with regularizations for addressing fairness problem on a new unknown client instead of all involved clients; (3) FedAve+FairReg, we build FedAve+FairReg, which optimizes the linear scalarized objective with the fairness regularizations of all clients.

Figure 2: Optimization trajectories of FCFL in the solution space. (a) The initialization violates the fairness constraints. (b) The initialization satisfies the fairness constraints.

5.2 Experiments on Synthetic Dataset

Following the setting in lin2019pareto ; mahapatra2020multi , the synthetic data is from the two non-convex objectives to be minimized in Eq.(15) and the Pareto Front of the two objectives is also non-convex.


Non-convex Pareto Front means that linear scalarization methods (e.g., FedAve) miss any solution in the concave part of the Pareto Front. In this experiment, we optimize under the constraint . Considering the effect of the initialization in our experiment, we conduct experiments when the initialization satisfies the constraints and violates the constraints.
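To see why an equally weighted scalarization misses balanced solutions on a non-convex Pareto Front, the sketch below assumes the two objectives are those of the synthetic example in lin2019pareto, $f_1(x) = 1 - \exp(-\lVert x - \tfrac{1}{\sqrt{n}} \rVert^2)$ and $f_2(x) = 1 - \exp(-\lVert x + \tfrac{1}{\sqrt{n}} \rVert^2)$; that assumption, the dimension, and the evaluation points are ours, since Eq.(15) is not shown here.

```python
import math


def f1(x):
    """Assumed first objective: 1 - exp(-||x - 1/sqrt(n)||^2)."""
    c = 1.0 / math.sqrt(len(x))
    return 1.0 - math.exp(-sum((xi - c) ** 2 for xi in x))


def f2(x):
    """Assumed second objective: 1 - exp(-||x + 1/sqrt(n)||^2)."""
    c = 1.0 / math.sqrt(len(x))
    return 1.0 - math.exp(-sum((xi + c) ** 2 for xi in x))


n = 2
c = 1.0 / math.sqrt(n)
x_balanced = [0.0] * n  # midpoint between the two individual minimizers
x_extreme = [c] * n     # minimizer of f1 alone

for name, x in (("balanced", x_balanced), ("extreme", x_extreme)):
    losses = (f1(x), f2(x))
    print(name, [round(v, 4) for v in losses], round(sum(losses), 4), round(max(losses), 4))
```

The equally weighted sum is smaller at the extreme point, so a linear scalarization prefers it, whereas the worst-case loss is smaller at the balanced point, which is the solution a min-max criterion looks for.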

From the results in Figure 2, when the initialization violates the constraints in Figure 2(a), the objective decreases in each step until it satisfies the constraint and finally FCFL  reaches the constrained optimal . As the initialization satisfies the constraints in Figure 2(b), our method focuses on optimizing until it achieves the optimal with the constraint .

5.3 Experiments on Real-world Datasets with Equal Fairness Budgets

Figure 3: The disparities and accuracies on both clients as (top) and as of on Adult dataset when race is the sensitive attribute.

5.3.1 Income Prediction on Adult Dataset

We show the results with the sensitive attribute being race in our main text and the results when gender is the sensitive attribute are in Appendix. We set the fairness budgets defined in Eq.(1) in two different cases, (1) looser constraint: ; (2) stricter constraint: .

From Figure 3, FCFL  achieves min-max performance on PhD client in the two cases with MCF. MMPF fails to achieve MCF as . FA and FedAve+FairReg achieve MCF with but violate fairness constraint as . From Figure 3, our method achieves a comparable performance on non-PhD client compared to baselines.

5.3.2 Prolonged Length of Stay Prediction on eICU Dataset

The length of the patient’s hospitalization is critical for the arrangement of the medical institutions as it is related to the allocation of limited ICU resources. We conduct experiments on the prediction of prolonged length of stay(whether the ICU stay is longer than 1 week) on eICU dataset. We use race as sensitive attribute and set the fairness budgets defined in Eq.(1) in two cases: (1) looser constraint: ; (2) stricter constraint:

Figure 4: Experiments on the LoS prediction task with race as the sensitive attribute. Panels: (a) MMPF; (b) FA; (c) FedAve+FairReg; (d) and (e) ours under the two fairness budgets. The points in each panel denote the clients, and the X and Y coordinates of the points denote the disparities and the accuracies, respectively.

From Figure 4, our method achieves min-max performance with fairness budget compared to the baselines. When we constrain , all baselines fail to satisfy the constraints and the disparities are about 0.1 while our method significantly decreases the disparities and the maximum of the disparities is . Besides, we maintain comparable utilities on all clients compared to baselines.

5.4 Experiments with Client-Specific Fairness Budgets

Data heterogeneity leads to different levels of disparity on different clients. Uniform fairness budgets can unexpectedly hurt specific clients with severe disparities. We explore the performance of our method given client-specific fairness budgets. Specifically, we first conduct unconstrained min-max experiments and measure the original disparities on all clients, and then constrain the model disparity on each client to a fraction of its original disparity.

Figure 5: Client-specific constraint experiment on Adult and eICU. Panels: (a) Adult, disparities; (b) Adult, utilities; (c) eICU, maximum disparity over clients; (d) eICU, minimum utility over clients.

Figure 5(a) and Figure 5(b) show the effect of the client-specific fairness budgets on model disparities and utilities. A smaller ratio of the fairness budget to the original disparity means stricter constraints on both clients, and FCFL reduces the disparities significantly, as shown in Figure 5(a). With stricter client-specific constraints on both clients, the utilities on both clients decrease only slightly, as shown in Figure 5(b), which implies that FCFL achieves a good balance between the fairness and the utility of all clients. FCFL is compatible with client-specific fairness budgets, which enhances its flexibility and avoids severe harm to specific clients.

The results of the LoS prediction task with client-specific fairness budgets are shown in Figure 5(c) and Figure 5(d). As the ratio of the fairness budget to the original disparity decreases from 1.0 to 0.2, the maximum of all client disparities in Figure 5(c) decreases from 0.2 to 0.05, which means the model becomes fairer on all clients. Figure 5(d) shows that the minimum of the utilities decreases only slightly, from 0.62 to 0.6, so our method achieves an acceptable trade-off between model fairness and utility even as the number of clients increases in this task.

6 Conclusion

In this paper, we investigate the consistency and fairness issues in federated networks, as a learned model deployed on local clients can exhibit inconsistent performance and disparities without careful design. We propose a novel method called FCFL to address the disparity and inconsistency concerns through gradient-based constrained multi-objective optimization. Comprehensive empirical evaluation results measured by quantitative metrics demonstrate the effectiveness, superiority, and reliability of our proposed method.

