Learning to Collaborate

In this paper, we focus on effective learning over a collaborative research network involving multiple clients. Each client has its own sample population, which may not be shared with other clients due to privacy concerns. The goal is to learn a model for each client that performs better than the one learned from its own data alone, through secure collaborations with other clients in the network. Due to the discrepancies of the sample distributions across different clients, collaborating with everyone does not necessarily lead to the best local models. We propose a learning to collaborate framework, where each client can choose to collaborate with certain members in the network to achieve a "collaboration equilibrium", where smaller collaboration coalitions are formed within the network so that each client can obtain the model with the best utility. We propose the concept of a benefit graph, which describes how each client can benefit from collaborating with other clients, and develop a Pareto optimization approach to obtain it. Finally, the collaboration coalitions can be derived from it based on graph operations. Our framework provides a new way of setting up collaborations in a research network. Experiments on both synthetic and real-world data sets demonstrate the effectiveness of our method.

1 Introduction

Effective learning of machine learning models over a collaborative network of data clients has drawn considerable interest in recent years. Frequently, due to privacy concerns, we cannot simultaneously access the raw data residing on different clients. Therefore, distributed li2014scaling or federated learning mcmahan2017communication strategies have been proposed, where typically model parameters are updated locally at each client with its own data, and the parameter updates, such as gradients, are transmitted and communicated to other clients. During this process, it is usually assumed that participation in the network comes at no cost, i.e., every client is willing to participate in the collaboration. However, this is not always true in reality.

One example is the clinical research network (CRN) involving multiple hospitals fleurence2014launching . Each hospital has its own patient population. The patient data are sensitive and cannot be shared with other hospitals. If we want to build a risk prediction model with the patient data within this network in a privacy-preserving way, each hospital expects that participating in the CRN will yield a better model than the one built from its own data, which it has collected from its clinical practice with considerable effort. In this scenario, prior studies have shown that model performance can decrease when collaborating with hospitals with very distinct patient populations, due to negative transfer induced by sample distribution discrepancies wang2019characterizing ; pan2009survey .

Figure 1: (a) The benefit graph on all clients; each node denotes a client, and a directed edge from one client to another indicates that the former is one of the necessary collaborators of the latter; (b) find all stable coalitions and remove them; (c) reconstruct the benefit graph on the remaining clients; after a stable coalition is removed, a remaining client re-identifies its necessary collaborators among the remaining clients, shown as the added red arrow in the figure; (d) iterate (b) and (c) until collaboration equilibrium is achieved.

With these considerations, in this paper, we propose a novel learning to collaborate framework. We allow the participating clients in a large collaborative network to form non-overlapping collaboration coalitions. Each coalition includes a subset of clients such that the collaboration among them can benefit their respective model performance. We aim to identify the collaboration coalitions that lead to a collaboration equilibrium, i.e., a setting in which no other coalition arrangement would benefit any individual client more (i.e., yield better model performance).

In order to obtain the coalitions that can lead to a collaboration equilibrium, we propose a Pareto optimization framework to identify the necessary collaborators for each client in the network to achieve its maximum utility. In particular, we optimize a local model associated with a specific client on the Pareto front of the learning objectives of all clients. Through the analysis of the geometric location of such an optimal model on the Pareto front, we can identify the necessary collaborators of each client. The relationships between each client and its necessary collaborators can be encoded in a benefit graph, as exemplified in Figure 1 (a), where we have a collaborative network with 6 clients. We can then derive the coalitions corresponding to the collaboration equilibrium through an iterative process. Specifically, we define a stable coalition as the minimal set such that all of its clients can achieve their maximal utility. From the perspective of graph theory, these stable coalitions are strongly connected components of the benefit graph. For example, the coalition marked in Figure 1 (b) is stable, as all of its clients can achieve their best performance by collaborating with the clients inside it (compared with collaborating with other clients in the network). By removing the stable coalitions and rebuilding the benefit graph of the remaining clients iteratively, as shown in Figure 1 (b) and (c), we can identify all coalitions as in Figure 1 (d) and prove that the obtained coalitions lead to a collaboration equilibrium.

We empirically evaluate our method on synthetic data, UCI Adult kohavi1996scaling , the classical FL benchmark data set CIFAR10 krizhevsky2009learning , and a real-world electronic health record (EHR) data repository, eICU pollard2018eicu , which includes EHR data of ICU patients from multiple hospitals. The results show that our method significantly outperforms existing relevant methods. The experiments on eICU data demonstrate that our algorithm is able to derive a good collaboration strategy for the hospitals.

2 Related Work

2.1 Federated Learning

Federated learning (FL) mcmahan2017communication refers to the paradigm of learning from fragmented data without sacrificing privacy. In a typical FL setting, a global model is learned from the data residing in multiple distinct local clients. However, a single global model may lead to performance degradation on certain clients due to data heterogeneity. Personalized federated learning (PFL) kulkarni2020survey , which aims at learning a customized model for each client in the federation, has been proposed to tackle this challenge. For example, Zhang et al. zhang2020personalized propose to adjust the weights of the objectives corresponding to all clients dynamically; Fallah et al. fallah2020personalized propose a meta-learning based method for achieving an effective shared initialization of all local models, followed by a fine-tuning procedure; Shamsian et al. shamsian2021personalized propose to learn a central hypernetwork which can generate a customized model for each client. FL assumes that all clients are willing to participate in the collaboration, and existing methods have not considered whether the collaboration really benefits each client. Without benefit, a local client could be reluctant to participate in the collaboration, which is the realistic scenario we investigate in this paper. One specific FL setup that is relevant to our work is clustered federated learning sattler2020clustered ; mansour2020three , which groups the clients with similar data distributions and trains a model for each client group. In contrast, the scenario we consider in this paper forms collaboration coalitions based on the performance gain each client can get for its own model, rather than on sample distribution similarities.

2.2 Multi-Task Learning and Negative Transfer

Multi-task learning caruana1997multitask (MTL) aims at learning shared knowledge across multiple inter-related tasks for mutual benefit. Typical examples include hard model parameter sharing kokkinos2017ubernet , soft parameter sharing lu2017fully , and neural architecture search (NAS) for a shared model architecture real2019regularized . However, sharing representations or model structures cannot guarantee a model performance gain due to the existence of negative transfer, whereas we directly consider forming collaboration coalitions according to individual model performance benefits. In addition, MTL usually assumes that the data from all tasks are accessible, while our goal is to learn a personalized model for each client through collaborating with other clients without directly accessing their raw data. It is worth mentioning that there are also clustered MTL approaches standley2020tasks ; zamir2018taskonomy , which assume that the models for the tasks within the same group are similar to each other, while we want the clients within each coalition to benefit each other through collaboration when learning their respective models.

3 Collaboration Learning Problem

The collaboration learning problem to be solved in this paper is formally defined in this section. Specifically, we will first introduce the necessary notations and definitions in Section 3.1, and then define the collaboration equilibrium we aim to achieve in Section 3.2.

3.1 Definitions and Notations

Suppose there are $N$ clients $\boldsymbol{I} = \{I^i\}_{i=1}^{N}$ in a collaborative network, and each client $I^i$ is associated with a specific learning task based on its own data $D^i = \{X^i, Y^i\}$, where the input space $X^i$ and the output space $Y^i$ may or may not be shared across all clients. Each client pursues collaboration with others to learn a personalized model by maximizing its utility (i.e., model performance) without sacrificing data privacy. There is no guarantee that a client can always benefit from collaboration with others, and a client would be reluctant to participate in the collaboration if there is no benefit. In the following we describe this through a concrete example.

No benefit, no collaboration.

Suppose the local data owned by different clients satisfy the following conditions: 1) all local data are from the same distribution; 2) the local data sets differ in size, so that one client holds more data than every other client. Since this largest client holds more data than the others, it cannot benefit from collaboration with any other client, so it will learn a local model using its own data. Once the largest client refuses to collaborate, the client with the next-largest data set will also work on its own, as it can only improve its utility by collaborating with the largest client. The remaining clients will learn individually out of the same concerns. Finally, there is no collaboration among any clients.

Due to the discrepancies of the sample distributions across different clients, the best local model for a specific client is very likely to come from collaborating with a subset of clients rather than all of them. Suppose $U(I^i, C)$ denotes the model utility of client $I^i$ when collaborating with the clients in client set $C$. In the following we define $U_{\max}(I^i, C)$ as the maximum model utility that $I^i$ can achieve when collaborating with different subsets of $C$.

Definition 1 (Maximum Achievable Utility (MAU)).

This is the maximum model utility for a specific client $I^i$ to collaborate with different subsets of client set $C$:

$$U_{\max}(I^i, C) = \max_{C' \subseteq C} U(I^i, C') \qquad (1)$$

From Definition 1, MAU satisfies $U_{\max}(I^i, C') \le U_{\max}(I^i, C)$ if $C'$ is a subset of $C$. Each client $I^i$ aims to identify its "optimal set" of collaborators from $C$ to maximize its local utility, which is defined as follows.

Definition 2 (Optimal Collaborator Set (OCS)).

A client set $C^{opt}_{C}(I^i) \subseteq C$ is an optimal collaborator set for $I^i$ if and only if it satisfies

$$U(I^i, C^{opt}_{C}(I^i)) = U_{\max}(I^i, C) \qquad (2a)$$
$$\forall\, C' \subsetneq C^{opt}_{C}(I^i): \quad U(I^i, C') < U_{\max}(I^i, C) \qquad (2b)$$

Eq.(2a) means that $I^i$ can achieve its maximal utility when collaborating with $C^{opt}_{C}(I^i)$, and Eq.(2b) means that all clients in $C^{opt}_{C}(I^i)$ are necessary. In this way, the relationships between any client and its optimal collaborator set can be represented by a graph, which is called the benefit graph (BG). Specifically, for a given client set $C$, we use $BG(C)$ to denote its corresponding BG. For the example in Figure 1 (a), an arrow from $I^j$ to $I^i$ means $I^j \in C^{opt}_{C}(I^i)$. For a client set $C$, if every member can achieve its maximum model utility through collaboration with other members within $C$ (without collaboration with members outside $C$), then we call $C$ a coalition.
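Definitions 1 and 2 can be illustrated with a brute-force sketch in Python. It assumes a utility function is already available for every client and every candidate coalition, which is not how the paper obtains utilities (the paper learns them through Pareto optimization). The names `toy_utility`, `HELPERS`, and `HARMFUL` are hypothetical, as is the convention that a candidate coalition always contains the client itself.

```python
from itertools import combinations

def subsets(clients):
    """Yield every non-empty subset of `clients` as a frozenset, smallest first."""
    items = sorted(clients)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

def max_achievable_utility(client, candidates, utility):
    """Definition 1: best utility over all subsets of `candidates` that contain `client`."""
    return max(utility(client, s) for s in subsets(candidates) if client in s)

def optimal_collaborator_set(client, candidates, utility):
    """Definition 2: a smallest subset attaining the MAU, so every member is necessary."""
    best = max_achievable_utility(client, candidates, utility)
    for s in subsets(candidates):  # smallest subsets are visited first
        if client in s and utility(client, s) == best:
            return s

# Hypothetical toy utilities: a helper adds 0.25, a harmful client subtracts 0.25.
HELPERS = {"a": {"b"}, "b": {"a"}, "c": {"a"}}
HARMFUL = {"a": {"c"}, "b": {"c"}, "c": set()}

def toy_utility(client, coalition):
    others = coalition - {client}
    return 0.5 + 0.25 * len(others & HELPERS[client]) - 0.25 * len(others & HARMFUL[client])

clients = {"a", "b", "c"}
# Benefit graph stored as: client -> set of its necessary collaborators.
benefit_graph = {
    c: optimal_collaborator_set(c, clients, toy_utility) - {c} for c in clients
}
```

In this toy network, clients a and b need each other, while c needs a but is needed by nobody. The enumeration is exponential in the number of clients, so it only serves to make the definitions concrete.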

Figure 2: Forming coalitions for maximizing the local utility

Forming coalitions for maximizing the local model utilities

Figure 2 shows an example BG with 6 clients. Three of the clients form a ring: the first achieves its optimal model utility by collaborating with the second, the second by collaborating with the third, and the third by collaborating with the first. These three clients form a collaboration coalition in which each client achieves its optimal utility by collaborating with the others. If the first client is taken out of the coalition, the third will leave as well, because it can no longer gain any benefit from collaborating with the others, and then the second will leave for the same reason. With this ring structure, none of the three clients can achieve its best performance without collaborating with the others in the coalition.

3.2 Problem Setup

As each client aims to maximize its local model utility by forming a collaboration coalition with others, the clients in the network can form several non-overlapping collaboration coalitions. In order to derive those coalitions, we propose the concept of collaboration equilibrium (CE) as follows.

Suppose we have a set of coalitions $S = \{C_1, \dots, C_K\}$ such that the coalitions cover all clients, $\bigcup_{k} C_k = \boldsymbol{I}$, and are pairwise disjoint, $C_{k_1} \cap C_{k_2} = \emptyset$ for $k_1 \neq k_2$. Then we say $S$ reaches CE if it satisfies the following two axioms.

Axiom 1 (Inner Agreement).

All collaboration coalitions satisfy inner agreement, i.e.,

(3)

From Axiom 1, inner agreement emphasizes that the clients of each coalition agree to form this coalition. It gives the necessary condition for a collaboration coalition to be formed, namely that every proper subset of the coalition benefits from collaborating with the rest of the coalition. Eq.(3) tells us that in any such subset there always exists a client that opposes the subset leaving, because its utility would go down if the subset were split from the coalition. In this way, inner agreement guarantees that no coalition falls apart, since otherwise the clients involved would suffer. For example, a coalition in Figure 2 that contains a subset whose clients already achieve their optimal utility within that subset does not satisfy inner agreement, because that subset can leave without any loss.
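The inner agreement check can be sketched as follows, based only on the verbal description of Eq.(3): in every proper subset of a coalition, at least one client must lose utility if the subset splits off. The strict inequality, the `ring_utility` function, and the `SUCCESSOR` table are assumptions made for illustration.

```python
from itertools import combinations

def proper_subsets(coalition):
    """Yield every non-empty proper subset of `coalition` as a frozenset."""
    members = sorted(coalition)
    for size in range(1, len(members)):
        for combo in combinations(members, size):
            yield frozenset(combo)

def satisfies_inner_agreement(coalition, utility):
    """True if every proper subset contains a client that is worse off after splitting."""
    coalition = frozenset(coalition)
    return all(
        any(utility(c, sub) < utility(c, coalition) for c in sub)
        for sub in proper_subsets(coalition)
    )

# Hypothetical ring: each client only benefits from its successor.
SUCCESSOR = {"a": "b", "b": "c", "c": "a", "d": "a"}

def ring_utility(client, coalition):
    return 1.0 if SUCCESSOR[client] in coalition else 0.5

ring_ok = satisfies_inner_agreement({"a", "b", "c"}, ring_utility)
with_free_rider = satisfies_inner_agreement({"a", "b", "c", "d"}, ring_utility)
```

The ring of a, b, and c passes the check, because every subset loses a needed collaborator when it leaves. Adding d, which nobody needs, breaks inner agreement: the subset of a, b, and c can leave without any loss.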

Axiom 2 (Outer Agreement).

The collaboration strategy should satisfy outer agreement, i.e.,

(4)

From Axiom 2, outer agreement guarantees that there is no other coalition that can benefit each of its clients more than the current collaboration strategy does. Eq.(4) tells us that if a coalition does not belong to the collaboration strategy, there always exists a client in it and a coalition in the strategy in which that client can benefit more.

The collaboration strategy in Figure 2 is a CE in which the clients in the stable coalitions achieve their optimal model utility. Although a remaining client may not achieve its maximum model utility in its own coalition, there are no other coalitions that can attract the clients it would need to form a new coalition with it. Therefore, no client has a better choice than to agree to this collaboration strategy.

Our goal is to obtain a collaboration strategy to achieve CE which satisfies Axiom 1 and Axiom 2, so that all clients achieve their optimal model utilities in the collaboration coalition. In the next section, we introduce our algorithm in detail on 1) how to derive a collaboration strategy that can achieve CE from the benefit graph and 2) how to construct the benefit graph.

4 Collaboration Equilibrium

In this section, we will introduce our framework on learning to collaborate. Firstly, we propose an iterative graph-theory based method to achieve CE based on a given benefit graph.

4.1 Achieving Collaboration Equilibrium Given the Benefit Graph

In theory, there are $B_N$ collaboration strategies for partitioning $N$ clients into several coalitions, where $B_N$ is the Bell number, which counts the ways of partitioning a set with $N$ elements bell1934exponential . Optimizing over set partitions takes exponential time and can be intractable for a large number of clients. In this section, we propose an iterative method for deriving a collaboration strategy which achieves CE with polynomial time complexity. Specifically, at each iteration, we search for a stable coalition, which is formally defined in Definition 3 below; then we remove the clients in the stable coalition and rebuild the benefit graph for the remaining clients. The iterations continue until all clients have identified their own coalitions.
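To give a sense of how quickly the number of candidate partitions grows, the Bell numbers can be computed with the Bell triangle. The function name `bell_number` is chosen here for illustration.

```python
def bell_number(n):
    """Number of ways to partition a set of n elements, via the Bell triangle."""
    row = [1]  # first row of the Bell triangle
    for _ in range(n):
        nxt = [row[-1]]  # each row starts with the last entry of the previous row
        for value in row:
            nxt.append(nxt[-1] + value)
        row = nxt
    return row[0]

# Partition counts for the network sizes used in the experiments (6 and 10 clients).
counts = {n: bell_number(n) for n in (6, 10)}
```

A network of 6 clients already admits 203 partitions, and a network of 10 clients admits 115975, which is why enumerating all collaboration strategies does not scale.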

Definition 3 (Stable Coalition).

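Given a client set $\boldsymbol{I}$, a coalition $C$ is stable if it satisfies

  1. Each client in $C$ achieves its maximal model utility, i.e.,

    $$\forall\, I^i \in C: \quad U(I^i, C) = U_{\max}(I^i, \boldsymbol{I}) \qquad (5)$$
  2. Any sub coalition $C' \subsetneq C$ cannot achieve the maximal utility for all clients in $C'$, i.e.,

    $$\forall\, C' \subsetneq C,\ \exists\, I^i \in C': \quad U(I^i, C') < U_{\max}(I^i, \boldsymbol{I}) \qquad (6)$$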

From Definition 3, Eq.(5) means that no client in a stable coalition can improve its model utility further. Eq.(6) states that the coalition is stable because every sub-coalition benefits from belonging to it; therefore, no sub-coalition has any motivation to leave. Eq.(5) also implies that a stable coalition will not welcome any other clients, as they would not benefit its members further. Figure 2 contains two stable coalitions. In order to identify the stable coalitions from the benefit graph, we first introduce the concept of a strongly connected component in a directed graph.

Definition 4 (Strongly Connected Component tarjan1972depth ).

A subgraph $G'$ is a strongly connected component of a given directed graph $G$ if it satisfies: 1) it is strongly connected, which means that there is a path in each direction between each pair of vertices in $G'$; 2) it is maximal, which means that no additional vertices from $G$ can be included in $G'$ without breaking the property of being strongly connected.

Then we derive a graph-based method to obtain the collaboration coalitions that achieve collaboration equilibrium by identifying all stable coalitions iteratively, according to Theorem 1 below.

Theorem 1.

(Proof in Appendix) Given a client set and its benefit graph, the stable coalitions are strongly connected components of that benefit graph.

With Theorem 1, we need to identify all strongly connected components of the benefit graph, which can be achieved using Tarjan's algorithm tarjan1972depth with time complexity $O(V + E)$, where $V$ is the number of nodes and $E$ is the number of edges. Then, following Eq.(5), we judge whether a strongly connected component is a stable coalition by checking whether all of its clients have achieved their maximal model utility. A stable coalition has no interest in collaborating with other clients, so it is removed, and the remaining clients continue to seek collaborations until all clients find their coalitions. In this way, we obtain a partitioning strategy, with the details shown in Algorithm 1 in the Appendix.
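The iterative procedure described above can be sketched as follows. This is not Algorithm 1 from the Appendix but a minimal illustration under two assumptions: a strongly connected component is treated as stable when none of its members needs a collaborator outside it (a graph-level stand-in for the check in Eq.(5)), and the rebuild step simply restricts the old graph to the remaining clients rather than re-learning models. The graph is stored as a hypothetical `needs` mapping from each client to its necessary collaborators.

```python
def strongly_connected_components(graph):
    """Tarjan's algorithm; `graph` maps each node to the set of nodes it points to."""
    index, low, on_stack, stack, result = {}, {}, set(), [], []

    def visit(node):
        index[node] = low[node] = len(index)
        stack.append(node)
        on_stack.add(node)
        for nxt in graph[node]:
            if nxt not in index:
                visit(nxt)
                low[node] = min(low[node], low[nxt])
            elif nxt in on_stack:
                low[node] = min(low[node], index[nxt])
        if low[node] == index[node]:  # node is the root of a component
            component = set()
            while True:
                member = stack.pop()
                on_stack.discard(member)
                component.add(member)
                if member == node:
                    break
            result.append(frozenset(component))

    for node in graph:
        if node not in index:
            visit(node)
    return result

def restrict(needs, remaining):
    """Assumed rebuild step: keep only the edges among the remaining clients."""
    return {c: needs[c] & remaining for c in remaining}

def collaboration_equilibrium(needs, rebuild=restrict):
    """Repeatedly peel off stable coalitions until every client is assigned."""
    remaining, coalitions = set(needs), []
    while remaining:
        graph = rebuild(needs, remaining)
        stable = [c for c in strongly_connected_components(graph)
                  if all(graph[m] <= c for m in c)]  # no member needs an outsider
        coalitions.extend(stable)
        remaining -= set().union(*stable)
    return coalitions

# Hypothetical 6-client benefit graph: client -> its necessary collaborators.
needs = {"a": {"b"}, "b": {"a"}, "c": {"a"}, "d": {"c", "e"}, "e": {"d"}, "f": {"a"}}
coalitions = collaboration_equilibrium(needs)
```

On this toy graph the first round removes the pair a and b, the second round leaves c and f on their own, and the last round pairs d with e. A different `rebuild` function could be passed in to re-identify collaborators after each removal.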

Theorem 2.

(Proof in Appendix) The collaboration strategy obtained above achieves collaboration equilibrium.

The clients in all stable coalitions found in each iteration cannot improve their model utility further and will not collaborate with others because there are no additional benefits. Therefore, the collaboration strategy can be approved by all clients. The iterative method achieves CE while accounting for the fact that the benefit graph varies in each iteration after the stable coalitions are removed. This can be time-consuming, because we need to rebuild the benefit graph in each iteration by re-learning an optimal personalized model for each remaining client.

Assumption 1.

The benefit graph of any subset of the client set is the corresponding subgraph of the benefit graph of the full client set.

Assumption 1 claims that the benefit graph of the remaining clients remains unchanged when a subgraph is split from the full benefit graph. It implies that, for each pair of clients, whether one is an optimal collaborator of the other is not affected by the other clients. In this case, we do not need to rebuild the benefit graph and have the following corollary.

Corollary 1.

(Proof in Appendix) When Assumption 1 holds, the strongly connected components of the benefit graph lead to a collaboration equilibrium.

4.2 Determine the Benefit Graph by Specific Pareto Optimization

Definition 5 (Pareto Solution and Pareto Front).

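We consider $N$ objectives corresponding to $N$ clients: $l_i: \mathcal{H} \to \mathbb{R}_{+}$, $i \in \{1, \dots, N\}$. Given a learned hypothesis $h \in \mathcal{H}$, suppose the loss vector $\boldsymbol{l}(h) = [l_1(h), \dots, l_N(h)]$ represents the utility loss on the $N$ clients with hypothesis $h$. We say $h$ is a Pareto Solution if there is no hypothesis $h'$ that dominates $h$, i.e.,

$$\nexists\, h' \in \mathcal{H} \ \ \text{s.t.} \ \ \forall i:\ l_i(h') \le l_i(h) \ \ \text{and} \ \ \exists j:\ l_j(h') < l_j(h).$$

In a collaboration network with $N$ clients, as each client has its own learning task, which can be formulated as a specific objective, we refer to the set formed by all Pareto hypotheses as the Pareto Front (PF) of the client set.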
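The dominance relation in Definition 5 can be made concrete on a finite list of candidate loss vectors. The sketch below filters out dominated vectors; the candidate values are made up for illustration, and a real Pareto front ranges over a continuous hypothesis space rather than a short list.

```python
def dominates(loss_a, loss_b):
    """True if a is no worse than b on every client and strictly better on at least one."""
    return (all(x <= y for x, y in zip(loss_a, loss_b))
            and any(x < y for x, y in zip(loss_a, loss_b)))

def pareto_front(loss_vectors):
    """Keep the loss vectors that no other vector in the list dominates."""
    return [v for v in loss_vectors
            if not any(dominates(other, v) for other in loss_vectors)]

# Hypothetical loss vectors (loss on client 1, loss on client 2) of five models.
candidates = [(0.2, 0.9), (0.5, 0.5), (0.9, 0.2), (0.6, 0.6), (0.2, 1.0)]
front = pareto_front(candidates)
```

Here the model with losses (0.6, 0.6) is dominated by (0.5, 0.5), and (0.2, 1.0) is dominated by (0.2, 0.9), so only the first three candidates remain on the front.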

For the clients seeking collaboration, determining the benefit graph requires 1) learning an optimal personalized model for each client, which achieves its maximal utility; and 2) identifying which clients are necessary for obtaining that model. For 1), we propose to search for an optimal model on the Pareto Front, which we call Specific Pareto Optimization (SPO); for 2), we determine the OCS according to the Pareto Front embedding property stated in Proposition 1.

SPO for achieving the maximal utility of the target client given the collaborator set

Given a collaborator set and a target client, our goal is to learn an optimal model for the target client by collaborating with the other clients in the set. As the task on each client in the set can be formulated as an objective, from Definition 5, a model achieves Pareto optimality when no objective can be further optimized without degrading some others. While there are infinitely many models that satisfy Pareto optimality on the training data, a core goal is to select a Pareto model which achieves the maximal utility on the true data distribution of each client. The model with the minimal empirical risk on the target objective may not be the best local model, because it may rely on some task-unrelated information from the training data to achieve Pareto optimality. In Figure 3 (c), the model that achieves the minimal loss on the true data distribution of a client is not the optimal model on the training data shown in Figure 3 (b). As each direction vector corresponds to a specific Pareto model, as shown in Figure 3 (b), we propose to optimize the direction vector to reach a model on the PF which achieves the optimal performance on validation data.
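The idea of choosing a direction vector by validation performance can be sketched on a small linear regression problem. This is a simplified stand-in, not the paper's method: it uses weighted-sum scalarization with a closed-form ridge solution and a grid search over directions, whereas the paper learns the Pareto Front with a hypernetwork and optimizes the direction vector. All data, names, and parameter values below are hypothetical.

```python
import numpy as np

def scalarized_fit(direction, datasets, ridge=1e-6):
    """Minimise sum_i direction[i] * (mean squared error on client i) in closed form."""
    dim = datasets[0][0].shape[1]
    lhs = ridge * np.eye(dim)  # small ridge term keeps the system invertible
    rhs = np.zeros(dim)
    for weight, (x, y) in zip(direction, datasets):
        lhs += weight * x.T @ x / len(y)
        rhs += weight * x.T @ y / len(y)
    return np.linalg.solve(lhs, rhs)

def mse(w, x, y):
    return float(np.mean((x @ w - y) ** 2))

def select_direction(target, train, valid, grid):
    """Return (validation loss, direction) of the best model for the target client."""
    scored = [(mse(scalarized_fit(d, train), *valid[target]), d) for d in grid]
    return min(scored, key=lambda pair: pair[0])

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

def make_client(n):
    x = rng.normal(size=(n, 3))
    return x, x @ true_w + 0.1 * rng.normal(size=n)

train = [make_client(5), make_client(200)]   # the target client 0 has little data
valid = [make_client(50), make_client(50)]
grid = [(t, 1.0 - t) for t in np.linspace(0.1, 1.0, 10)]  # last entry is local-only

best_loss, best_direction = select_direction(0, train, valid, grid)
local_loss = mse(scalarized_fit((1.0, 0.0), train), *valid[0])
```

Because the local-only direction (1.0, 0.0) is part of the grid, the selected model can never be worse on the validation data than local training. How much weight ends up on the other client depends on how well its data matches the target task.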

Figure 3: (a) The loss plane of the Pareto Front learned from the training data of three clients; (b) the loss curve of the Pareto Front of a sub-coalition, learned from its training data, is embedded in the loss plane of the full coalition; four direction vectors correspond to four Pareto models; (c) the performance of the models on the true (testing) distributions; the model achieving the optimal performance on each client corresponds to one of the direction vectors in (b).

Proposition 1 (Pareto Front Embedding Property).

(proof in Appendix) Suppose and are the loss vectors achieved by the PFs and where , then

(7)

From Proposition 1, the loss vectors achieved by the PF of a sub-coalition are embedded in the loss vectors of the original coalition; for example, the loss curve of a sub-coalition lies in the loss plane of the full coalition, as shown in Figure 3 (a) and (b).

Determining OCS according to the geometric location of the optimal model on the PF

SPO aims to find an optimal model for a target client given a collaborator set. However, there are exponentially many candidate collaborator sets for each client, and a natural problem is how to determine an optimal collaborator set (OCS). We propose to determine the OCS for each client by the geometric location of the reached optimal model on the PF of the full coalition, which "contains" the PFs of all sub-coalitions according to Proposition 1. This means that if the optimal model reached by SPO, which maximizes the utility of the target client on the PF of the full coalition, belongs to the PF of a sub-coalition, then that sub-coalition is the OCS of the target client. For example, suppose there are 3 clients seeking collaboration and the PF of the 3 corresponding objectives is shown in Figure 3 (a). If the model achieving the optimal utility for one client, reached by SPO on the PF, is also a Pareto model for a sub-coalition, then that sub-coalition is the OCS of the client. In our implementation, we use a hypernetwork to learn the full PF. To verify the effectiveness of the OCS we obtain, we re-learn a model using only the clients in the OCS and get results similar to those of the model reached on the full PF. Detailed information about our implementation is in the Appendix.

5 Experiments

To intuitively demonstrate the motivation of collaboration equilibrium and the effectiveness of SPO, we conduct experiments on synthetic data, the real-world UCI data set Adult kohavi1996scaling , and the benchmark data set CIFAR10 krizhevsky2009learning . Moreover, we verify the practicability of our framework on a real-world multi-hospital collaboration network using the electronic health record (EHR) data set eICU pollard2018eicu . As SPO aims to achieve an optimal model utility by optimizing the personalized model on the PF of all clients, we use SPO to denote the model utility achieved by SPO. According to the OCS determined by SPO, we achieve a CE for all clients, and the model utility of each client in the CE can differ from the utility achieved by SPO. Where no confusion arises, we use CE to denote the model utility achieved in the CE.

5.1 Synthetic Experiments

Synthetic data

Suppose there are 6 clients in the collaboration network. The synthetic features owned by each client are generated by ; the ground-truth weights are sampled as where represents the client variance (if increases, the data distribution discrepancy among clients will increase). Labels of the clients are observed with i.i.d noise . To generate conflicting learning tasks assigned to different clients, we flip over the label of some clients: and

From Table 1, when there are fewer samples and less distribution discrepancy, the clients with a similar label generation process collaborate with one another to achieve a low MSE. In this case, the OCS of each client consists of the clients with similar learning tasks, and we achieve the CE shown at the top of Figure 4 (a). As the number of samples and the distribution discrepancy increase, collaboration no longer benefits the clients, and all clients learn individually on their own data. In this setting, the OCS of each client is itself, and the collaboration strategy in which every client learns alone leads to the CE shown at the bottom of Figure 4 (a).

UCI adult data

Adult contains more than 40000 adult records, and the task is to predict whether an individual earns more than 50K/year given other features (e.g., age, gender, education, etc.). Following the setting in li2019fair ; mohri2019agnostic , we split the data set into two clients. One is the PhD client, in which all individuals are PhDs, and the other is the non-PhD client. In this experiment, we implement SPO on this data set and compare its performance with the existing relevant methods AFL mohri2019agnostic and q-FFL li2019fair (the baseline results are taken from li2019fair ).

The two clients have different data distributions: the non-PhD client has more than 30000 samples, while the PhD client has about 500 samples. From Table 2, SPO achieves a higher accuracy than the baselines, especially on the PhD client (77.0). The non-PhD client achieves its optimal accuracy (83.5) by local training. Therefore, the PhD client improves its performance by collaborating with the non-PhD client, while the performance of the non-PhD client declines. The benefit graph is shown at the top of Figure 4 (b). The CE is non-collaboration, as at the bottom of Figure 4 (b), and the models of both clients in the CE are trained individually.

5.2 Benchmark Experiments

We compare our method with previous personalized federated learning (PFL) methods on CIFAR-10 krizhevsky2009learning (the baseline results are taken from shamsian2021personalized ). Following the setting in mcmahan2016federated , we simulate a non-i.i.d. environment by randomly assigning two classes to each client among ten total classes. The baselines we evaluate are as follows: (1) local training on each client; (2) FedAvg mcmahan2016federated ; (3) Per-FedAvg fallah2020personalized , a meta-learning based PFL algorithm; (4) pFedMe t2020personalized , a PFL approach which adds a Moreau-envelope loss term; (5) LG-FedAvg liang2020think , a PFL method with local feature extractors and global output layers; (6) FedPer arivazhagan2019federated , a PFL approach that learns a personal classifier on top of a shared feature extractor; (7) pFedHN shamsian2021personalized , a PFL approach that generates models by training a hypernetwork. In all experiments, our target network shares the same architecture as the baseline models. For each client, we use 87% of the training data for learning a Pareto Front by collaborating with the others and the remaining 13% of the training data for optimizing the direction vector to reach an optimal model, as shown in Figure 3 (c). More implementation details are in the Appendix.
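The data setup described above can be sketched as follows, assuming only what the text states: each client receives two randomly chosen classes out of ten, and each client's training data is split 87% / 13%. The number of clients, the seeds, and the function names are hypothetical, and the sketch does not cover how samples of a class are divided among clients that share it.

```python
import random

def assign_classes(num_clients, num_classes=10, classes_per_client=2, seed=0):
    """Give each client a random pair of distinct class labels (non-i.i.d. label split)."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(num_classes), classes_per_client))
            for _ in range(num_clients)]

def split_indices(indices, front_fraction=0.87, seed=0):
    """Shuffle sample indices and split them into a PF-learning part and a direction-tuning part."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    cut = round(front_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

client_classes = assign_classes(num_clients=5)
front_part, direction_part = split_indices(range(100))
```

With 100 samples on a client, 87 indices go to learning the Pareto Front and 13 are held out for choosing the direction vector.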

Table 3 reports the results of all methods. FedAvg achieves a lower accuracy (51.42) than local training (86.46), which means that training a single global model can hurt the performance of each client. Compared to the other PFL methods in Table 3, SPO reaches an optimal model on the PF of all objectives and achieves a higher accuracy (92.47). As the features learned from the images are transferable even though there is a label shift among the clients, the collaboration among all clients leads to a more efficient feature extractor for each client. Therefore, the benefit graph of this collaboration network is a fully connected graph, and the collaboration equilibrium is that all clients form a full coalition, as shown in Figure 4 (c). In this experiment, the model accuracy of each client in the CE equals the accuracy achieved by SPO.

Table 1: Synthetic
client   OCS   CE (MSE)       OCS   CE (MSE)
               0.24 ± 0.08          1e-4 ± .0
               0.26 ± 0.08          1e-4 ± .0
               0.24 ± 0.04          1e-4 ± .0
               0.26 ± 0.07          1e-4 ± .0
               0.26 ± 0.09          1e-4 ± .0
               0.26 ± 0.03          1e-4 ± .0

Table 2: Adult
methods      Accuracy (non-PhD)   Accuracy (PhD)
AFL          82.6 ± .5            73.0 ± 2.2
q-FFL        82.4 ± .1            74.4 ± .9
local        83.5 ± .0            66.9 ± 1.0
SPO (ours)   82.8 ± .3            77.0 ± .7
CE           83.5 ± .0            66.9 ± 1.0

Table 3: CIFAR10
methods      accuracy
Local        86.46 ± 4.02
FedAvg       51.42 ± 2.41
Per-FedAvg   76.65 ± 4.84
FedPer       87.27 ± 1.39
pFedMe       87.69 ± 1.93
LG-FedAvg    89.11 ± 2.66
pFedHN       90.83 ± 1.56
SPO (ours)   92.47 ± 4.80
CE           92.47 ± 4.80

Figure 4: Collaboration equilibrium on synthetic data, Adult and CIFAR10.

5.3 Hospital Collaboration

eICU pollard2018eicu is a clinical data set that collects information about patients' admissions to ICUs, together with hospital information. Each instance is a specific ICU stay. We follow the data pre-processing procedure in sheikhalishahi2019benchmarking and naturally treat different hospitals as local clients. We conduct the task of predicting in-hospital mortality, which is defined as the patient's outcome at hospital discharge. This is a binary classification task, where each data sample spans a 1-hour window. In this experiment, we select 5 hospitals with more patient samples (about 1000) and 5 hospitals with fewer patient samples. Due to label imbalance (more than 90% of the samples have negative labels), we use AUC to measure the utility for each client, as in sheikhalishahi2019benchmarking . For all methods, we use an ANN as the network structure, as in sheikhalishahi2019benchmarking .

Table 4: eICU
methods      AUC (one column per hospital)
Local        66.89  85.03  61.83  68.83  82.31  59.65  67.78  40.00  61.90  70.00
FedAvg       71.92  89.36  81.00  73.89  80.23  70.18  52.22  40.00  61.90  75.00
SPO (ours)   76.35  91.80  80.28  70.52  86.93  82.46  71.11  40.00  76.19  83.33
CE           77.93  87.28  70.47  70.64  83.48  64.92  68.89  45.00  61.90  70.00

The model AUC of each hospital is reported in Table 4. Because each hospital lacks patient data, Local achieves a relatively lower AUC than FedAvg and SPO on most hospitals. While patient populations vary substantially from hospital to hospital, SPO learns a personalized model for each hospital and outperforms FedAvg on most hospitals in Table 4.

Figure 5: Collaboration equilibrium of 10 real hospitals: (a) benefit graph; (b) strongly connected components; (c) collaboration equilibrium.

Collaboration Equilibrium

The optimal collaborator sets of all hospitals determined by SPO are shown in the benefit graph in Figure 5(a). From Figure 5(a), one hospital is a necessary collaborator for all other hospitals, while another cannot contribute to any other hospital. Since two of the hospitals are each other's unique necessary collaborator, they form a stable coalition, as shown in Figure 5(b). We show all strongly connected components in Figure 5(b), and the final collaboration equilibrium is in Figure 5(c). The two members of this stable coalition are major hospitals with more patient data, so they can contribute to the vast majority of hospitals, and only major hospitals can benefit them. The hospital that cannot contribute to any other is a tiny clinic, so no hospital is willing to collaborate with it, and it has to learn a local model with its own data by forming a simple coalition on its own. The remaining hospitals, on the one hand, cannot benefit the two major hospitals and so cannot form coalitions with them; on the other hand, they refuse to contribute without getting anything in return. They choose to form a coalition among themselves to maximize their AUC. Therefore, the CE in this hospital collaboration network is achieved by the collaboration strategy shown in Figure 5(c), and the model AUC of each client in the CE is in Table 4. CE guarantees that no client in a coalition collaborates with harmful clients, so a client may achieve a higher utility in a CE than when collaborating with everyone; for example, the AUC of the first hospital in Table 4 is higher in CE (77.93) than under SPO (76.35).

6 Conclusion

In this paper, we investigate collaboration learning in a meaningful and practical scenario. We propose a learning to collaborate framework to achieve a collaboration equilibrium in which no individual client can improve its performance further. We develop a Pareto optimization method for identifying which clients are worth collaborating with and propose a graph-based method for reaching collaboration equilibrium. Comprehensive experiments on benchmark and real-world data sets demonstrate the validity of our proposed framework. In our study, some small clients can be isolated because they cannot benefit others. Our framework can quantify both the benefit to and the contribution from each client in a network. In practice, such information can be used either to provide incentives to or to impose charges on each client, in order to facilitate and strengthen the foundation of the network or coalition.

References