1 Introduction
Effective learning of machine learning models over a collaborative network of data clients has drawn considerable interest in recent years. Frequently, due to privacy concerns, we cannot simultaneously access the raw data residing on different clients. Therefore, distributed li2014scaling and federated learning mcmahan2017communication strategies have been proposed, in which model parameters are typically updated locally at each client with its own data, and only the parameter updates, such as gradients, are transmitted and exchanged with other clients. During this process, it is usually assumed that participation in the network comes at no cost, i.e., every client is willing to participate in the collaboration. However, this is not always true in reality. One example is a clinical research network (CRN) involving multiple hospitals fleurence2014launching . Each hospital has its own patient population, and the patient data are sensitive and cannot be shared with other hospitals. If we want to build a risk prediction model with the patient data within this network in a privacy-preserving way, the expectation of each hospital is that participating in the CRN yields a better model than the one built from its own data, which was collected from clinical practice with considerable effort. In this scenario, prior studies have shown that model performance can decrease when collaborating with hospitals that have very distinct patient populations, due to the negative transfer induced by sample distribution discrepancies wang2019characterizing ; pan2009survey .
With these considerations, in this paper we propose a novel learning to collaborate framework. We allow the participating clients in a large collaborative network to form non-overlapping collaboration coalitions. Each coalition includes a subset of clients such that the collaboration among them can benefit their respective model performance. We aim at identifying the collaboration coalitions that lead to a collaboration equilibrium, i.e., a configuration under which no individual client can achieve better model performance under any other coalition setting.
In order to obtain the coalitions that lead to a collaboration equilibrium, we propose a Pareto optimization framework to identify the necessary collaborators for each client in the network to achieve its maximum utility. In particular, we optimize a local model associated with a specific client on the Pareto front of the learning objectives of all clients. Through analysis of the geometric location of such an optimal model on the Pareto front, we can identify the necessary collaborators of each client. The relationships between each client and its necessary collaborators can be encoded in a benefit graph, as exemplified in Figure 1 (a) for a collaborative network with 6 clients. We can then derive the coalitions corresponding to the collaboration equilibrium through an iterative process introduced as follows. Specifically, we define a stable coalition as a minimal set in which every involved client achieves its maximal utility. From the perspective of graph theory, these stable coalitions are the strongly connected components of the benefit graph. For example, the highlighted set in Figure 1 (b) is a stable coalition, as all of its clients achieve their best performance by collaborating within it (compared with collaborating with other clients in the network). By removing the stable coalitions and rebuilding the benefit graph of the remaining clients iteratively, as shown in Figure 1 (b) and (c), we can identify all coalitions as in Figure 1 (d) and prove that the obtained coalitions lead to a collaboration equilibrium.
We empirically evaluate our method on synthetic data, the UCI Adult data set kohavi1996scaling , the classical FL benchmark data set CIFAR10 krizhevsky2009learning , and a real-world electronic health record (EHR) data repository, eICU pollard2018eicu , which includes ICU patient EHR data from multiple hospitals. The results show that our method significantly outperforms existing relevant methods, and the experiments on the eICU data demonstrate that our algorithm can derive a good collaboration strategy for the hospitals.
2 Related Work
2.1 Federated Learning
Federated learning (FL) mcmahan2017communication refers to the paradigm of learning from fragmented data without sacrificing privacy. In a typical FL setting, a global model is learned from the data residing in multiple distinct local clients. However, a single global model may lead to performance degradation on certain clients due to data heterogeneity. Personalized federated learning (PFL) kulkarni2020survey , which aims at learning a customized model for each client in the federation, has been proposed to tackle this challenge. For example, Zhang et al. zhang2020personalized propose to dynamically adjust the weights of the objectives corresponding to all clients; Fallah et al. fallah2020personalized propose a meta-learning based method for achieving an effective shared initialization of all local models followed by a fine-tuning procedure; Shamsian et al. shamsian2021personalized propose to learn a central hypernetwork which generates a customized model for each client. FL assumes all clients are willing to participate in the collaboration, and existing methods have not considered whether the collaboration actually benefits each client. Without benefit, a local client could be reluctant to participate in the collaboration, which is the realistic scenario we investigate in this paper. One specific FL setup that is relevant to our work is clustered federated learning sattler2020clustered ; mansour2020three , which groups the clients with similar data distributions and trains a model for each client group. In contrast, the scenario we consider in this paper forms collaboration coalitions based on the performance gain each client obtains for its own model, rather than on sample distribution similarities.
2.2 MultiTask Learning and Negative Transfer
Multi-task learning (MTL) caruana1997multitask aims at learning shared knowledge across multiple interrelated tasks for mutual benefit. Typical examples include hard model parameter sharing kokkinos2017ubernet , soft parameter sharing lu2017fully , and neural architecture search (NAS) for a shared model architecture real2019regularized . However, sharing representations or model structures cannot guarantee a model performance gain due to the existence of negative transfer, whereas we directly form collaboration coalitions according to individual model performance benefits. In addition, MTL usually assumes the data from all tasks are accessible, while our goal is to learn a personalized model for each client through collaboration with other clients without directly accessing their raw data. It is worth mentioning that there are also clustered MTL approaches standley2020tasks ; zamir2018taskonomy which assume the models for the tasks within the same group are similar to each other, whereas we require that the clients within each coalition benefit each other through collaboration when learning their respective models.
3 Collaboration Learning Problem
The collaboration learning problem to be solved in this paper is formally defined in this section. Specifically, we will first introduce the necessary notations and definitions in Section 3.1, and then define the collaboration equilibrium we aim to achieve in Section 3.2.
3.1 Definitions and Notations
Suppose there are $N$ clients $\mathcal{V} = \{v_1, \dots, v_N\}$ in a collaborative network, and each client $v_i$ is associated with a specific learning task based on its own data, where the input space and the output space may or may not be shared across clients. Each client pursues collaboration with others to learn a personalized model that maximizes its utility (i.e., model performance) without sacrificing data privacy. There is no guarantee that a client always benefits from collaborating with others, and a client would be reluctant to participate if there is no benefit. In the following we describe this through a concrete example.
No benefit, no collaboration.
Suppose the local data owned by the different clients satisfy the following conditions: 1) all local data are drawn from the same distribution; 2) the clients' sample sizes are strictly decreasing, i.e., $m_1 > m_2 > \dots > m_N$. Since $v_1$ holds more data than any other client, it cannot benefit from collaboration with any other clients, so $v_1$ will learn a local model using its own data. Once $v_1$ refuses to collaborate, $v_2$ will also work on its own, as $v_2$ can only improve its utility by collaborating with $v_1$. $v_3$ will learn individually out of the same concern, and so on. Finally, there is no collaboration among any clients.
Due to the discrepancies among the sample distributions of different clients, the best local model for a specific client is very likely to come from collaborating with a subset of clients rather than all of them. Let $u(v_i, \mathcal{S})$ denote the model utility of client $v_i$ when collaborating with the clients in client set $\mathcal{S}$. In the following we define the maximum model utility that $v_i$ can achieve when collaborating with different subsets of a client set $\mathcal{A}$.
Definition 1 (Maximum Achievable Utility (MAU)).
This is the maximum model utility for a specific client $v_i$ to collaborate with different subsets of the client set $\mathcal{A}$:

$\hat{u}(v_i, \mathcal{A}) = \max_{\mathcal{S} \subseteq \mathcal{A},\, v_i \in \mathcal{S}} u(v_i, \mathcal{S})$  (1)
From Definition 1, the MAU satisfies $\hat{u}(v_i, \mathcal{A}_1) \le \hat{u}(v_i, \mathcal{A}_2)$ if $\mathcal{A}_1$ is a subset of $\mathcal{A}_2$. Each client aims to identify its "optimal set" of collaborators from $\mathcal{A}$ to maximize its local utility, which is defined as follows.
Definition 2 (Optimal Collaborator Set (OCS)).
A client set $\mathcal{S}^* \subseteq \mathcal{A}$ is an optimal collaborator set for $v_i$ if and only if $\mathcal{S}^*$ satisfies

$u(v_i, \mathcal{S}^*) = \hat{u}(v_i, \mathcal{A})$  (2a)

$u(v_i, \mathcal{S}) < u(v_i, \mathcal{S}^*), \quad \forall \mathcal{S} \subsetneq \mathcal{S}^*$  (2b)
Eq.(2a) means that $v_i$ achieves its maximal utility when collaborating with $\mathcal{S}^*$, and Eq.(2b) means that every client in $\mathcal{S}^*$ is necessary. In this way, the relationships between each client and its optimal collaborator set can be represented by a graph, which we call the benefit graph (BG). Specifically, for a given client set $\mathcal{A}$, we use $\mathcal{BG}(\mathcal{A})$ to denote its corresponding BG. In the example in Figure 1 (a), an arrow from $v_i$ to $v_j$ means that $v_j$ belongs to the optimal collaborator set of $v_i$. For a client set $\mathcal{C}$, if every member can achieve its maximum model utility through collaboration with other members within $\mathcal{C}$ (without collaborating with clients outside $\mathcal{C}$), then we call $\mathcal{C}$ a coalition.
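When the network is small and a utility oracle $u(v_i, \mathcal{S})$ is available (e.g., estimated from validation performance), Definitions 1 and 2 and the benefit graph can be computed by brute-force enumeration of subsets. A minimal sketch, assuming such an oracle `u` and integer client ids (both illustrative, not the paper's implementation):

```python
from itertools import combinations

def mau(u, i, clients):
    """Maximum Achievable Utility (Def. 1): best utility client i can reach
    over all subsets of `clients` that contain i."""
    others = [c for c in clients if c != i]
    best = u(i, frozenset([i]))
    for r in range(1, len(others) + 1):
        for combo in combinations(others, r):
            best = max(best, u(i, frozenset((i,) + combo)))
    return best

def ocs(u, i, clients):
    """Optimal Collaborator Set (Def. 2): a smallest set achieving the MAU.
    Searching smallest subsets first guarantees no member is redundant,
    matching the necessity condition of Eq.(2b)."""
    target = mau(u, i, clients)
    others = [c for c in clients if c != i]
    for r in range(0, len(others) + 1):
        for combo in combinations(others, r):
            s = frozenset((i,) + combo)
            if u(i, s) == target:
                return s

def benefit_graph(u, clients):
    """Benefit graph: maps each client to its optimal collaborator set."""
    return {i: ocs(u, i, clients) for i in clients}
```

As a sanity check, with a toy utility in which client 3 is incompatible with the others, the brute force recovers the intuitive benefit graph: clients 1 and 2 point at each other, client 3 only at itself.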
Forming coalitions for maximizing the local model utilities
Figure 2 shows an example BG with 6 clients, three of which form a ring: each of these three clients achieves its optimal model utility by collaborating with the next client in the ring. In this case, the three ring clients form a collaboration coalition, and each of them achieves its optimal utility by collaborating with the other clients in it. If any one of them is taken out, the client that depends on it will leave as well, because it can no longer gain any benefit through collaboration, and the remaining client will then leave for the same reason. With this ring structure, none of the clients in the coalition can achieve its best performance without collaborating with the other clients in it.
3.2 Problem Setup
As each client in $\mathcal{V}$ aims to maximize its local model utility by forming a collaboration coalition with others, all clients in $\mathcal{V}$ can form several non-overlapping collaboration coalitions. In order to derive those coalitions, we propose the concept of collaboration equilibrium (CE) as follows.
Suppose we have a set of coalitions $\Pi = \{\mathcal{C}_1, \dots, \mathcal{C}_K\}$ such that $\bigcup_k \mathcal{C}_k = \mathcal{V}$ and $\mathcal{C}_k \cap \mathcal{C}_l = \emptyset$ for $k \neq l$; then we say $\Pi$ reaches CE if it satisfies the following two axioms.
Axiom 1 (Inner Agreement).
All collaboration coalitions $\mathcal{C}_k \in \Pi$ satisfy inner agreement, i.e.,

$\forall \mathcal{B} \subsetneq \mathcal{C}_k \ (\mathcal{B} \neq \emptyset), \ \exists v_j \in \mathcal{B} \ \text{s.t.} \ u(v_j, \mathcal{B}) < u(v_j, \mathcal{C}_k)$  (3)
From Axiom 1, inner agreement emphasizes that the clients of each coalition agree to form it. It gives the necessary condition for a collaboration coalition to be formed: every subset benefits from collaborating with the rest of the coalition. Eq.(3) tells us that for any subset $\mathcal{B}$ considering leaving $\mathcal{C}_k$, there always exists a client in $\mathcal{B}$ that opposes leaving, because its utility would go down if $\mathcal{B}$ were split off from $\mathcal{C}_k$. In this way, inner agreement guarantees that no coalition will fall apart unless some involved client suffers. For example, the grand coalition of all six clients in Figure 2 does not satisfy inner agreement, because the clients in one of its subsets already achieve their optimal utility within that subset and can leave without any loss.
Axiom 2 (Outer Agreement).
The collaboration strategy $\Pi$ should satisfy outer agreement, i.e.,

$\forall \mathcal{C}' \subseteq \mathcal{V} \ \text{with} \ \mathcal{C}' \notin \Pi, \ \exists v_j \in \mathcal{C}' \ \text{and} \ \mathcal{C}_k \in \Pi \ \text{with} \ v_j \in \mathcal{C}_k \ \text{s.t.} \ u(v_j, \mathcal{C}_k) \ge u(v_j, \mathcal{C}')$  (4)
From Axiom 2, outer agreement guarantees that there is no other coalition which can benefit every client involved more than $\Pi$ does. Eq.(4) tells us that if $\mathcal{C}'$ is a coalition not in $\Pi$, there always exists a client $v_j \in \mathcal{C}'$ whose coalition $\mathcal{C}_k$ in $\Pi$ already benefits it at least as much.
The collaboration strategy in Figure 2 is a CE in which the clients of the two stable coalitions achieve their optimal model utility. Though the remaining client does not achieve its maximum model utility, there is no other coalition that can attract the clients it needs into a new coalition with it. Therefore, all clients have no better choice but to agree upon this collaboration strategy.
Our goal is to obtain a collaboration strategy that achieves a CE satisfying Axiom 1 and Axiom 2, so that each client achieves the best model utility available to it within its collaboration coalition. In the next section, we introduce our algorithm in detail: 1) how to derive a collaboration strategy that achieves CE from the benefit graph, and 2) how to construct the benefit graph.
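Given the same kind of utility oracle as before, whether a candidate partition satisfies Axioms 1 and 2 (and hence is a CE) can be verified by brute force over subsets; this is exponential in the number of clients, which motivates the graph-based method of the next section. A sketch under the same illustrative assumptions (oracle `u`, integer client ids):

```python
from itertools import combinations

def subsets(s, proper=False):
    """All nonempty subsets of s (proper subsets only, if requested)."""
    s = list(s)
    hi = len(s) + (0 if proper else 1)
    return [frozenset(c) for r in range(1, hi) for c in combinations(s, r)]

def inner_agreement(u, coalition):
    """Axiom 1: every proper nonempty subset B of the coalition contains a
    client whose utility would drop if B split off."""
    return all(any(u(j, b) < u(j, coalition) for j in b)
               for b in subsets(coalition, proper=True))

def outer_agreement(u, partition, clients):
    """Axiom 2: no coalition outside the partition strictly benefits all of
    its members relative to what they get under the partition."""
    home = {j: c for c in partition for j in c}
    return not any(
        cand not in partition
        and all(u(j, cand) > u(j, home[j]) for j in cand)
        for cand in subsets(clients))

def is_collaboration_equilibrium(u, partition, clients):
    return (all(inner_agreement(u, c) for c in partition)
            and outer_agreement(u, partition, clients))
```

For instance, with a toy utility that rewards larger coalitions, the grand coalition passes both axioms while the all-singletons partition fails outer agreement.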
4 Collaboration Equilibrium
In this section, we introduce our framework for learning to collaborate. First, we propose an iterative graph-theoretic method to achieve CE given a benefit graph.
4.1 Achieving Collaboration Equilibrium Given the Benefit Graph
In theory, there are $B_N$ collaboration strategies for partitioning $N$ clients into coalitions, where $B_N$ is the Bell number, which counts the ways of partitioning a set with $N$ elements bell1934exponential . Optimizing over set partitions takes exponential time and can be intractable for arbitrarily many clients. In this section, we propose an iterative method for deriving a collaboration strategy which achieves CE with polynomial time complexity. Specifically, at each iteration, we search for a stable coalition, formally defined in Definition 3 below; then we remove the clients in the stable coalition and rebuild the benefit graph for the remaining clients. The iterations continue until all clients have identified their own coalitions.
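The Bell numbers in question grow super-exponentially, which is what makes exhaustive partition search impractical; a short computation via the Bell triangle illustrates this:

```python
def bell_numbers(n):
    """B(0)..B(n) via the Bell triangle; B(k) is the number of ways to
    partition a set of k elements, i.e., the number of candidate coalition
    structures for k clients."""
    row, bells = [1], [1]
    for _ in range(n):
        nxt = [row[-1]]          # new row starts with previous row's last entry
        for x in row:
            nxt.append(nxt[-1] + x)   # each entry adds the one above it
        row = nxt
        bells.append(row[0])
    return bells
```

Already for 10 clients there are 115,975 candidate partitions, and the count exceeds $5 \times 10^{13}$ for 20 clients.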
Definition 3 (Stable Coalition).
Given a client set $\mathcal{A}$, a coalition $\mathcal{C} \subseteq \mathcal{A}$ is stable if it satisfies:

1) Each client in $\mathcal{C}$ achieves its maximal model utility, i.e.,

$u(v_j, \mathcal{C}) = \hat{u}(v_j, \mathcal{A}), \quad \forall v_j \in \mathcal{C}$  (5)

2) No sub-coalition can achieve the maximal utility for all of its clients, i.e.,

$\forall \mathcal{B} \subsetneq \mathcal{C} \ (\mathcal{B} \neq \emptyset), \ \exists v_j \in \mathcal{B} \ \text{s.t.} \ u(v_j, \mathcal{B}) < \hat{u}(v_j, \mathcal{A})$  (6)
From Definition 3, Eq.(5) means that no client in a stable coalition can further improve its model utility. Eq.(6) states that the coalition is minimal: any sub-coalition would lose utility by leaving $\mathcal{C}$ and therefore has no motivation to do so. Eq.(5) also implies that a stable coalition will not welcome any other clients to join, as they would not further benefit the clients in $\mathcal{C}$. In Figure 2, there are two stable coalitions. In order to identify the stable coalitions from the benefit graph, we first introduce the concept of a strongly connected component in a directed graph.
Definition 4 (Strongly Connected Component tarjan1972depth ).
A subgraph $\mathcal{G}'$ is a strongly connected component of a given directed graph $\mathcal{G}$ if it satisfies: 1) it is strongly connected, meaning that there is a path in each direction between each pair of vertices in $\mathcal{G}'$; 2) it is maximal, meaning that no additional vertices of $\mathcal{G}$ can be included in $\mathcal{G}'$ without breaking the property of being strongly connected.
Then we derive a graph-based method to obtain the collaboration coalitions that achieve collaboration equilibrium by identifying all stable coalitions iteratively, according to Theorem 1 below.
Theorem 1.
(Proof in Appendix) Given a client set $\mathcal{A}$ and its benefit graph $\mathcal{BG}(\mathcal{A})$, the stable coalitions are strongly connected components of $\mathcal{BG}(\mathcal{A})$.
With Theorem 1, we need to identify all strongly connected components of $\mathcal{BG}(\mathcal{A})$, which can be achieved using the Tarjan algorithm tarjan1972depth with time complexity $O(|V| + |E|)$, where $|V|$ is the number of nodes and $|E|$ is the number of edges. Then, following Eq.(5), we judge whether a strongly connected component is a stable coalition by checking whether all of its clients have achieved their maximal model utility. A stable coalition has no interest in collaborating with other clients, so it is removed, and the remaining clients continue to seek collaborations until all clients find their coalitions. In this way, we obtain a partitioning strategy, with the details shown in Algorithm 1 in the Appendix.
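The iterative procedure can be sketched as follows. This is an illustrative sketch, not the paper's Algorithm 1: we use Kosaraju's SCC algorithm in place of Tarjan's (both run in $O(|V| + |E|)$), we take SCCs whose members' optimal collaborators all lie inside the component as the stable coalitions found in each round, and the `rebuild` callback stands in for relearning the benefit graph of the remaining clients:

```python
def strongly_connected_components(graph):
    """SCCs of a directed graph given as {node: set_of_successors}; every
    edge target must also appear as a key. Kosaraju's two-pass algorithm."""
    order, seen = [], set()

    def dfs(v):
        seen.add(v)
        for w in graph[v]:
            if w not in seen:
                dfs(w)
        order.append(v)                      # post-order = finish time

    for v in graph:
        if v not in seen:
            dfs(v)
    rev = {v: set() for v in graph}          # reversed edges
    for v, ws in graph.items():
        for w in ws:
            rev[w].add(v)
    comps, assigned = [], set()
    for v in reversed(order):                # decreasing finish time
        if v in assigned:
            continue
        comp, stack = set(), [v]
        while stack:
            x = stack.pop()
            if x not in assigned:
                assigned.add(x)
                comp.add(x)
                stack.extend(rev[x] - assigned)
        comps.append(frozenset(comp))
    return comps

def collaboration_partition(benefit_graph, rebuild):
    """Iteratively peel off the SCCs with no outgoing edges (their members'
    optimal collaborators all lie inside, so Eq.(5) can hold), remove them,
    and rebuild the benefit graph of the remaining clients."""
    bg, partition = dict(benefit_graph), []
    while bg:
        stable = [c for c in strongly_connected_components(bg)
                  if all(set(bg[v]) <= c for v in c)]
        partition.extend(stable)
        removed = set().union(*stable)
        bg = rebuild({v: ws for v, ws in bg.items() if v not in removed})
    return partition
```

Because the condensation of a directed graph always has at least one such "sink" component, every round removes at least one coalition and the loop terminates.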
Theorem 2.
(Proof in Appendix) The collaboration strategy obtained above achieves collaboration equilibrium.
The clients in the stable coalitions found in each iteration cannot improve their model utility further and will not collaborate with others because there is no additional benefit. Therefore, the collaboration strategy can be approved by all clients. The iterative method accounts for the fact that the benefit graph changes in each iteration after the stable coalitions are removed, which can be time-consuming because rebuilding it requires relearning an optimal personalized model for each remaining client.
Assumption 1.
The benefit graph of any subset $\mathcal{A} \subseteq \mathcal{V}$ is the subgraph of $\mathcal{BG}(\mathcal{V})$ induced by $\mathcal{A}$.
Assumption 1 states that the benefit graph of the remaining clients stays unchanged when a stable coalition is split off from $\mathcal{V}$. It implies that, for each pair of clients $v_i$ and $v_j$, whether $v_j$ is one of the optimal collaborators of $v_i$ is not affected by the other clients. In this case, we do not need to rebuild the benefit graph, and we have the following corollary.
Corollary 1.
(proof in Appendix) When Assumption 1 holds, the strongly connected components of $\mathcal{BG}(\mathcal{V})$ lead to a collaboration equilibrium.
4.2 Determining the Benefit Graph by Specific Pareto Optimization
Definition 5 (Pareto Solution and Pareto Front).
We consider $N$ objectives corresponding to the $N$ clients: $\ell_1(h), \dots, \ell_N(h)$. Given a learned hypothesis $h$, suppose the loss vector $\bm{\ell}(h) = (\ell_1(h), \dots, \ell_N(h))$ represents the utility loss on the $N$ clients with hypothesis $h$. We say $h$ is a Pareto solution if there is no hypothesis $h'$ that dominates $h$, i.e., no $h'$ with $\ell_i(h') \le \ell_i(h)$ for all $i$ and $\ell_j(h') < \ell_j(h)$ for at least one $j$. In a collaboration network with client set $\mathcal{V}$, as each client has its own learning task, which can be formulated as a specific objective, we use $P(\mathcal{V})$ to represent the Pareto Front (PF) of the client set, formed by the loss vectors of all Pareto hypotheses.
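For a finite set of candidate models, Definition 5 reduces to filtering out dominated loss vectors; a minimal sketch:

```python
def dominates(a, b):
    """Loss vector a dominates b: no worse on every objective and strictly
    better on at least one (Def. 5)."""
    return (all(x <= y for x, y in zip(a, b))
            and any(x < y for x, y in zip(a, b)))

def pareto_front(loss_vectors):
    """Non-dominated subset of a finite set of loss vectors."""
    return [a for a in loss_vectors
            if not any(dominates(b, a) for b in loss_vectors)]
```

For example, among the two-client loss vectors (1, 2), (2, 1), (2, 2), and (3, 3), only the first two are Pareto-optimal.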
For a client $v_i$ seeking collaboration, determining its benefit graph requires 1) learning an optimal personalized model for $v_i$ that achieves its maximal utility; and 2) identifying which clients are necessary for obtaining that model. For 1), we propose to search for an optimal model on the Pareto front, which we call Specific Pareto Optimization (SPO); for 2), we determine the OCS according to the Pareto front embedding property stated in Proposition 1.
SPO for achieving the maximal utility of the target client given the collaborator set
Given a collaborator set $\mathcal{S}$ and a target client $v_i$, our goal is to learn an optimal model for $v_i$ by collaborating with the other clients in $\mathcal{S}$. While the task on each client in $\mathcal{S}$ can be formulated as an objective, from Definition 5 a model achieves Pareto optimality when no objective can be further optimized without degrading some others. While there are infinitely many models that satisfy Pareto optimality on the training data, a core goal is to select the Pareto model that achieves the maximal utility on the true data distribution of the target client. The model with the minimal empirical risk on the target objective may not be the best local model, because it may rely on task-unrelated information in the training data to achieve Pareto optimality. As shown in Figure 3 (c), the model achieving the minimal loss on the true data distribution of the target client is not the optimal model on the training data shown in Figure 3 (b). As each direction vector corresponds to a specific Pareto model, as shown in Figure 3 (b), we propose to optimize the direction vector to reach a model on the PF that achieves the optimal performance on validation data.
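The idea behind SPO can be illustrated on a toy problem with a one-dimensional hypothesis and two clients, where the Pareto front is traced in closed form by a scalarization weight standing in for the direction vector. The objectives, the grid search, and the closed-form Pareto model are all illustrative assumptions, not the paper's hypernetwork-based implementation:

```python
def pareto_model(alpha):
    """Toy closed form: for training losses f1(t) = t**2 (client 1) and
    f2(t) = (t - 1)**2 (client 2), minimizing the scalarization
    (1 - alpha) * f1 + alpha * f2 gives t = alpha, so sweeping alpha over
    [0, 1] traces the entire Pareto front of the two objectives."""
    return alpha

def spo_direction_search(val_loss, grid_size=101):
    """Pick the direction (here a scalar weight) whose Pareto model attains
    the lowest *validation* loss for the target client: the Pareto model
    with minimal training loss need not generalize best."""
    grid = [k / (grid_size - 1) for k in range(grid_size)]
    return min(grid, key=lambda a: val_loss(pareto_model(a)))
```

With a target client whose true optimum sits at t = 0.3, the search recovers the direction alpha = 0.3 rather than the endpoint alpha = 0 that minimizes client 1's training loss.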
Proposition 1 (Pareto Front Embedding Property).
(proof in Appendix) Suppose $\bm{\ell}_{\mathcal{S}_1}$ and $\bm{\ell}_{\mathcal{S}_2}$ are the loss vectors achieved by the PFs $P(\mathcal{S}_1)$ and $P(\mathcal{S}_2)$, where $\mathcal{S}_1 \subseteq \mathcal{S}_2$; then for every $h_1$ with $\bm{\ell}_{\mathcal{S}_1}(h_1) \in P(\mathcal{S}_1)$, there exists $h_2$ with $\bm{\ell}_{\mathcal{S}_2}(h_2) \in P(\mathcal{S}_2)$ such that

$\bm{\ell}_{\mathcal{S}_1}(h_2) = \bm{\ell}_{\mathcal{S}_1}(h_1)$  (7)
Determining OCS according to the geometric location of the optimal model on the PF
SPO finds an optimal model for a target client given a collaborator set; however, there are exponentially many candidate collaborator sets for each client, and a natural problem is how to determine its optimal collaborator set (OCS). We propose to determine the OCS of each client by the geometric location of the reached optimal model on the PF of the full coalition, which "contains" the PFs of all sub-coalitions according to Proposition 1. This means that if the optimal model reached by SPO, which maximizes the utility of client $v_i$ on $P(\mathcal{V})$, also belongs to the PF of a sub-coalition $\mathcal{S}$, then $\mathcal{S}$ is the OCS of $v_i$. For example, suppose there are 3 clients seeking collaboration and the PF of the 3 corresponding objectives is as shown in Figure 3 (a); if the model achieving the optimal utility for one of the clients reached by SPO lies on the PF of a two-client sub-coalition, then that sub-coalition is the client's OCS. In our implementation, we use a hypernetwork to learn the full PF. To verify the effectiveness of the obtained OCS, we relearn a model using only the clients in the OCS and obtain results similar to those of the model reached on the full PF. Detailed information about our implementation is in the Appendix.
5 Experiments
To intuitively demonstrate the motivation of collaboration equilibrium and the effectiveness of SPO, we conduct experiments on synthetic data, the real-world UCI Adult data set kohavi1996scaling , and the benchmark data set CIFAR10 krizhevsky2009learning . Moreover, we verify the practicality of our framework on a real-world multi-hospital collaboration network using the electronic health record (EHR) data set eICU pollard2018eicu . As SPO aims to achieve optimal model utility by optimizing a personalized model on the PF of all clients, we use SPO to denote the model utility achieved in this way. According to the OCSs determined by SPO, we derive a CE for all clients, and the model utility of each client in the CE can differ from the utility achieved by SPO; we use CE to denote the model utility achieved in the CE where this causes no confusion.
5.1 Synthetic Experiments
Synthetic data
Suppose there are 6 clients in the collaboration network. The synthetic features owned by each client are drawn from a Gaussian distribution; the ground-truth weights of each client are sampled around a shared weight vector with client variance $r$ (as $r$ increases, the data distribution discrepancy among clients increases). Labels are observed with i.i.d. noise. To generate conflicting learning tasks for different clients, we flip the labels of some clients. From Table 3, when there are fewer samples and less distribution discrepancy, the clients with similar label generation processes collaborate with each other to achieve a low MSE. In this case, the OCS of each client consists of the clients with similar learning tasks, and we achieve the CE shown at the top of Figure 4 (a). As the number of samples and the distribution discrepancy increase, collaboration can no longer benefit the clients, and every client learns individually on its own data. Therefore, in that setting the OCS of each client is itself, and the collaboration strategy leads to the CE shown at the bottom of Figure 4 (a).
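The generation process described above can be sketched as follows; the shared weight vector, the noise scale, and the other constants are assumptions, since the paper's exact values are not recoverable from this excerpt:

```python
import random

def make_client_data(n_samples, dim, r, flip, seed=0, noise=0.1):
    """Sketch of the synthetic setup: features x ~ N(0, 1) per coordinate;
    per-client ground-truth weights w = w0 + r * delta with delta ~ N(0, 1),
    where r controls the cross-client distribution discrepancy; labels
    y = <w, x> + eps with i.i.d. noise eps, and the sign of y flipped for
    'conflicting' clients. w0 and the noise scale are assumed values."""
    rng = random.Random(seed)
    w0 = [1.0] * dim                                  # assumed shared weights
    w = [wd + r * rng.gauss(0, 1) for wd in w0]
    data = []
    for _ in range(n_samples):
        x = [rng.gauss(0, 1) for _ in range(dim)]
        y = sum(wd * xd for wd, xd in zip(w, x)) + rng.gauss(0, noise)
        data.append((x, -y if flip else y))
    return data
```

Running the generator twice with the same seed but opposite `flip` flags produces two clients whose labels are exact negations of each other, i.e., maximally conflicting tasks.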
UCI adult data
Adult contains more than 40,000 adult records, and the task is to predict whether an individual earns more than 50K/year given other features (e.g., age, gender, education). Following the setting in li2019fair ; mohri2019agnostic , we split the data set into two clients: a PhD client, in which all individuals are PhDs, and a non-PhD client. In this experiment, we implement SPO on this data set and compare its performance with the existing relevant methods AFL mohri2019agnostic and qFFL li2019fair (the results of the baselines are from li2019fair ).
The two clients have different data distributions: the non-PhD client has more than 30,000 samples while the PhD client has about 500. From Table 3, SPO achieves higher accuracy than the baselines, especially on the PhD client (77.0). The non-PhD client achieves its optimal accuracy (83.5) by local training. Therefore, the PhD client improves its performance by collaborating with the non-PhD client, while the performance of the non-PhD client declines. The benefit graph is shown at the top of Figure 4 (b). The CE is non-collaboration, as shown at the bottom of Figure 4 (b), and the models of both clients in the CE are trained individually.
5.2 Benchmark Experiments
We compare our method with previous personalized federated learning (PFL) methods on CIFAR10 krizhevsky2009learning (the results of the baselines are from shamsian2021personalized ). Following the setting in mcmahan2016federated , we simulate a non-i.i.d. environment by randomly assigning two of the ten total classes to each client. The baselines we evaluate are: (1) Local training on each client; (2) FedAvg mcmahan2016federated ; (3) Per-FedAvg fallah2020personalized , a meta-learning based PFL algorithm; (4) pFedMe t2020personalized , a PFL approach which adds a Moreau-envelope loss term; (5) LG-FedAvg liang2020think , a PFL method with local feature extractors and global output layers; (6) FedPer arivazhagan2019federated , a PFL approach that learns a personal classifier on top of a shared feature extractor; (7) pFedHN shamsian2021personalized , a PFL approach that generates models by training a hypernetwork. In all experiments, our target network shares the same architecture as the baseline models. For each client, we split 87% of the training data for learning a Pareto front by collaborating with the others and the remaining 13% for optimizing the direction vector to reach an optimal model, as shown in Figure 3 (c). More implementation details are in the Appendix. Table 3 reports the results of all methods. FedAvg achieves a lower accuracy (51.42) than local training (86.46), which means that training a single global model can hurt the performance of individual clients. Compared to the other PFL methods in Table 3, SPO reaches an optimal model on the PF of all objectives and achieves the highest accuracy (92.47). As the features learned from the images are transferable even though there is a label shift among clients, the collaboration among all clients leads to a more effective feature extractor for each client. Therefore, the benefit graph of this collaboration network is a fully connected graph, and the collaboration equilibrium is that all clients form one full coalition, as shown in Figure 4 (c). In this experiment, the model accuracy of each client in the CE equals the accuracy achieved by SPO.
client  CE (MSE), setting 1  CE (MSE), setting 2
v1      0.24 ± 0.08          1e-4 ± .0
v2      0.26 ± 0.08          1e-4 ± .0
v3      0.24 ± 0.04          1e-4 ± .0
v4      0.26 ± 0.07          1e-4 ± .0
v5      0.26 ± 0.09          1e-4 ± .0
v6      0.26 ± 0.03          1e-4 ± .0
method      non-PhD accuracy  PhD accuracy
AFL         82.6 ± .5         73.0 ± 2.2
qFFL        82.4 ± .1         74.4 ± .9
Local       83.5 ± .0         66.9 ± 1.0
SPO (ours)  82.8 ± .3         77.0 ± .7
CE          83.5 ± .0         66.9 ± 1.0
method       accuracy
Local        86.46 ± 4.02
FedAvg       51.42 ± 2.41
Per-FedAvg   76.65 ± 4.84
FedPer       87.27 ± 1.39
pFedMe       87.69 ± 1.93
LG-FedAvg    89.11 ± 2.66
pFedHN       90.83 ± 1.56
SPO (ours)   92.47 ± 4.80
CE           92.47 ± 4.80
5.3 Hospital Collaboration
eICU pollard2018eicu is a clinical data set collecting patients' ICU admissions together with hospital information. Each instance is a specific ICU stay. We follow the data preprocessing procedure in sheikhalishahi2019benchmarking and naturally treat different hospitals as local clients. We conduct the task of predicting in-hospital mortality, defined as the patient's outcome at hospital discharge. This is a binary classification task, where each data sample spans a 1-hour window. In this experiment, we select 5 hospitals with more patient samples (about 1,000 each) and 5 hospitals with fewer patient samples. Due to label imbalance (more than 90% of samples have negative labels), we use AUC to measure the utility of each client as in sheikhalishahi2019benchmarking . For all methods, we use an ANN as the network structure as in sheikhalishahi2019benchmarking .
method      AUC per hospital (first five: more samples; last five: fewer samples)
Local       66.89  85.03  61.83  68.83  82.31  59.65  67.78  40.00  61.90  70.00
FedAvg      71.92  89.36  81.00  73.89  80.23  70.18  52.22  40.00  61.90  75.00
SPO (ours)  76.35  91.80  80.28  70.52  86.93  82.46  71.11  40.00  76.19  83.33
CE          77.93  87.28  70.47  70.64  83.48  64.92  68.89  45.00  61.90  70.00
The model AUC of each hospital is reported in Table 4. Because of the lack of patient data at each hospital, Local achieves a relatively lower AUC than FedAvg and SPO. As patient populations vary substantially from hospital to hospital, SPO learns a personalized model for each hospital and outperforms FedAvg, as shown in Table 4.
Collaboration Equilibrium
The optimal collaborator sets of all hospitals determined by SPO are shown in the benefit graph in Figure 5 (a). From Figure 5 (a), one large hospital is a necessary collaborator for all other hospitals, while the smallest one cannot contribute to any other hospital. Since two of the large hospitals are the unique necessary collaborators for each other, they form a stable coalition, as shown in Figure 5 (b). We show all strongly connected components in Figure 5 (b), and the final collaboration equilibrium is in Figure 5 (c). For the stable coalition of the two large hospitals: as major hospitals with more patient data, they can contribute to the vast majority of hospitals, while only major hospitals can benefit them. The tiny clinic that cannot contribute to any hospital finds no hospital willing to collaborate with it, and it has to learn a local model with its own data, forming a singleton coalition. The remaining hospitals, on the one hand, cannot benefit the two major hospitals, so they cannot form coalitions with them; on the other hand, the major hospitals refuse to contribute for free. The remaining hospitals therefore form their own coalition to maximize their AUC. The CE in this hospital collaboration network is thus achieved by this collaboration strategy, and the model AUC of each client in the CE is reported in Table 4. CE guarantees that no client collaborates with harmful clients within its coalition, so a client may achieve higher utility in the CE than by collaborating with everyone; for example, the AUC of the first hospital in the CE (77.93) is higher than under SPO with all clients (76.35).
6 Conclusion
In this paper, we investigate collaborative learning in a meaningful and practical scenario. We propose a learning-to-collaborate framework to achieve a collaboration equilibrium, in which no individual client can further improve its performance by changing coalitions. We develop a Pareto optimization method for identifying which clients are worth collaborating with and propose a graph-based method for reaching the collaboration equilibrium. Comprehensive experiments on benchmark and real-world data sets demonstrate the validity of our proposed framework. In our study, some small clients may be isolated because they cannot benefit others. Our framework can quantify both the benefit to and the contribution from each client in a network. In practice, such information can be utilized either to provide incentives or to impose charges on each client, to facilitate and strengthen the foundation of the network or coalition.
References
 [1] Manoj Ghuhan Arivazhagan, Vinay Aggarwal, Aaditya Kumar Singh, and Sunav Choudhary. Federated learning with personalization layers. arXiv preprint arXiv:1912.00818, 2019.
 [2] Eric Temple Bell. Exponential polynomials. Annals of Mathematics, pages 258–277, 1934.
 [3] Rich Caruana. Multitask learning. Machine learning, 28(1):41–75, 1997.
 [4] Alireza Fallah, Aryan Mokhtari, and Asuman Ozdaglar. Personalized federated learning: A metalearning approach. arXiv preprint arXiv:2002.07948, 2020.
 [5] Rachael L Fleurence, Lesley H Curtis, Robert M Califf, Richard Platt, Joe V Selby, and Jeffrey S Brown. Launching pcornet, a national patientcentered clinical research network. Journal of the American Medical Informatics Association, 21(4):578–582, 2014.

[6]
Ron Kohavi.
Scaling up the accuracy of naivebayes classifiers: A decisiontree hybrid.
In Kdd, volume 96, pages 202–207, 1996. 
 [7] Iasonas Kokkinos. Ubernet: Training a universal convolutional neural network for low-, mid-, and high-level vision using diverse datasets and limited memory. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 6129–6138, 2017.
 [8] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009.
 [9] Viraj Kulkarni, Milind Kulkarni, and Aniruddha Pant. Survey of personalization techniques for federated learning. In 2020 Fourth World Conference on Smart Trends in Systems, Security and Sustainability (WorldS4), pages 794–797. IEEE, 2020.
 [10] Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, 1998.
 [11] Mu Li, David G Andersen, Jun Woo Park, Alexander J Smola, Amr Ahmed, Vanja Josifovski, James Long, Eugene J Shekita, and Bor-Yiing Su. Scaling distributed machine learning with the parameter server. In 11th USENIX Symposium on Operating Systems Design and Implementation (OSDI 14), pages 583–598, 2014.
 [12] Tian Li, Maziar Sanjabi, Ahmad Beirami, and Virginia Smith. Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497, 2019.
 [13] Paul Pu Liang, Terrance Liu, Liu Ziyin, Nicholas B Allen, Randy P Auerbach, David Brent, Ruslan Salakhutdinov, and Louis-Philippe Morency. Think locally, act globally: Federated learning with local and global representations. arXiv preprint arXiv:2001.01523, 2020.
 [14] Yongxi Lu, Abhishek Kumar, Shuangfei Zhai, Yu Cheng, Tara Javidi, and Rogerio Feris. Fully-adaptive feature sharing in multi-task networks with applications in person attribute classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 5334–5343, 2017.
 [15] Yishay Mansour, Mehryar Mohri, Jae Ro, and Ananda Theertha Suresh. Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619, 2020.
 [16] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
 [17] H Brendan McMahan, Eider Moore, Daniel Ramage, and Blaise Agüera y Arcas. Federated learning of deep networks using model averaging. arXiv preprint arXiv:1602.05629, 2016.
 [18] Mehryar Mohri, Gary Sivek, and Ananda Theertha Suresh. Agnostic federated learning. In International Conference on Machine Learning, pages 4615–4625. PMLR, 2019.
 [19] Sinno Jialin Pan and Qiang Yang. A survey on transfer learning. IEEE Transactions on Knowledge and Data Engineering, 22(10):1345–1359, 2009.
 [20] Tom J Pollard, Alistair EW Johnson, Jesse D Raffa, Leo A Celi, Roger G Mark, and Omar Badawi. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Scientific Data, 5(1):1–13, 2018.
 [21] Esteban Real, Alok Aggarwal, Yanping Huang, and Quoc V Le. Regularized evolution for image classifier architecture search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 33, pages 4780–4789, 2019.
 [22] Felix Sattler, Klaus-Robert Müller, and Wojciech Samek. Clustered federated learning: Model-agnostic distributed multi-task optimization under privacy constraints. IEEE Transactions on Neural Networks and Learning Systems, 2020.
 [23] Aviv Shamsian, Aviv Navon, Ethan Fetaya, and Gal Chechik. Personalized federated learning using hypernetworks. arXiv preprint arXiv:2103.04628, 2021.
 [24] Seyedmostafa Sheikhalishahi, Vevake Balaraman, and Venet Osmani. Benchmarking machine learning models on the eICU critical care dataset. arXiv preprint arXiv:1910.00964, 2019.
 [25] Trevor Standley, Amir Zamir, Dawn Chen, Leonidas Guibas, Jitendra Malik, and Silvio Savarese. Which tasks should be learned together in multitask learning? In International Conference on Machine Learning, pages 9120–9132. PMLR, 2020.
 [26] Canh T Dinh, Nguyen Tran, and Tuan Dung Nguyen. Personalized federated learning with moreau envelopes. Advances in Neural Information Processing Systems, 33, 2020.
 [27] Robert Tarjan. Depth-first search and linear graph algorithms. SIAM Journal on Computing, 1(2):146–160, 1972.
 [28] Zirui Wang, Zihang Dai, Barnabás Póczos, and Jaime Carbonell. Characterizing and avoiding negative transfer. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 11293–11302, 2019.
 [29] Amir R Zamir, Alexander Sax, William Shen, Leonidas J Guibas, Jitendra Malik, and Silvio Savarese. Taskonomy: Disentangling task transfer learning. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 3712–3722, 2018.
 [30] Michael Zhang, Karan Sapra, Sanja Fidler, Serena Yeung, and Jose M Alvarez. Personalized federated learning with first order model optimization. arXiv preprint arXiv:2012.08565, 2020.