Federated Learning for Healthcare Informatics

11/13/2019 ∙ by Jie Xu, et al. ∙ Cornell University

Recent rapid development of medical informatization and the corresponding advances in automated data collection in the clinical sciences generate large volumes of healthcare data. Proper use of these big data is closely tied to the improvement of the whole health system and is of great significance to drug development, health management and public health services. However, in addition to the heterogeneous and high-dimensional data characteristics caused by a spectrum of complex data types ranging from free-text clinical notes to various medical images, the fragmented data sources and privacy concerns of healthcare data are also huge obstacles to multi-institutional healthcare informatics research. Federated learning, a mechanism for training a shared global model with a central server while keeping all sensitive data in the local institutions where the data belong, is a new attempt to connect the scattered healthcare data sources without ignoring data privacy. This survey reviews the current progress on federated learning including, but not limited to, healthcare informatics. We summarize the general solutions to the statistical challenges, system challenges and privacy issues in federated learning research for reference. With this survey, we hope to provide a useful resource for health informatics and computational research on how to apply machine learning techniques to heterogeneous data scattered across a large number of institutions while respecting the privacy concerns of data sharing.




1. Introduction

With the rapid improvement in the informatization level of medical institutions, massive data have been produced in the course of medical services, health care and health management, including electronic medical records, medical insurance information, health logs, genetic inheritance, medical experimental results, scientific research data, etc. (Raghupathi and Raghupathi, 2014; Groves et al., 2016). Analyzing these big data, which are stored in multiple institutions, with machine learning techniques plays a great role in various aspects, e.g., effectively integrating medical information resources, sharing diagnosis and treatment technology, accelerating drug research and development, assisting doctors in accurate judgement, reducing medical costs, and predicting treatment plans and curative effects (Dua et al., 2014; Dash et al., 2019). Watson, one of the most famous applications of artificial intelligence in the medical field, focuses on the diagnosis of various cancers and provides medical advice. A recent document revealed that Watson had mistakenly prescribed a drug that could have killed a patient during a simulation (https://www.statnews.com/2018/07/25/ibm-watson-recommended-unsafe-incorrect-treatments/). The misdiagnosis is largely attributed to insufficient data sources. Sufficient medical data is key to accelerating and promoting medical research through the application of AI technology. However, the special properties of medical data, e.g., heterogeneity, sensitivity and poor accessibility, are huge obstacles not only to healthcare sciences, but also to computational research.

Personal medical data involve individual privacy, while medical experimental data and scientific research data relate not only to the privacy of data subjects but also to industry development and even national security. The development of genomics and changes in the rules of research activities make the disclosure of private information almost inevitable. Therefore, regulatory policies and protection mechanisms for the privacy of data subjects have been established to restrict data access. The Standards for Privacy of Individually Identifiable Health Information, commonly known as the HIPAA (Health Insurance Portability and Accountability Act) Privacy Rule (https://www.hhs.gov/hipaa/for-professionals/privacy/index.html), establishes the first national standards in the United States to protect patients’ personal or protected health information (PHI). On May 25, 2018, the General Data Protection Regulation (GDPR) issued by the European Union became enforceable, setting strict rules on data security and privacy protection and emphasizing that the collection of user data must be open and transparent (Voigt and Von dem Bussche, 2017). After the GDPR, the California Consumer Privacy Act (CCPA) of 2018, the China Internet Security Law and other laws also strengthened their attention to data security. In this environment, where governments have their own patient privacy protection mechanisms, the analysis of medical big data may be subject to different, even conflicting, regulations. When data come from a variety of sources, medical data analysts must abide by the provisions of multiple privacy laws, which increases the difficulty of research. Balancing medical data analysis against patient privacy protection has become a difficult and urgent problem.

Federated learning is a new attempt to solve the data dilemma faced by traditional machine learning methods. It enables training a shared global model with a central server while keeping all the sensitive data in the local institutions where the data belong. Before it became associated with machine learning algorithms, the concept of “federated” had already been applied in learning communities (Hill, 1985; Kellogg, 1999) and in distributed data management and retrieval (Rehak et al., 2005; Mukherjee and Jaffe, 2005; Barcelos et al., 2011). In 1976, Patrick Hill, a philosophy professor, first developed the Federated Learning Community (FLC) to bring people together to learn from each other, and helped students overcome the anonymity and isolation of large research universities (Hill, 1985). After that, to support the discovery of and access to learning content from a diverse collection of content repositories, several efforts aimed at building federations of learning content and content repositories (Rehak et al., 2005; Mukherjee and Jaffe, 2005; Barcelos et al., 2011). In 2005, Rehak et al. (Rehak et al., 2005) developed a reference model that described how to establish an interoperable repository infrastructure by creating federations of repositories, where the metadata are collected from the contributing repositories into a central registry that provides a single point of discovery and access. The ultimate goal of this model is to enable learning content from diverse content repositories to be found, retrieved and reused. In any case, the practice of federated learning communities and federated search services provides, to some degree, references for the development of federated learning algorithms.

Before the term “federated learning” was formally introduced to describe the distributed-style learning technique of existing machine learning algorithms (Konečnỳ et al., 2016a, b; McMahan et al., 2016), several works had studied analogous settings. In 2012, Balcan et al. (Balcan et al., 2012) considered the problem of PAC-learning from distributed data and analyzed the fundamental communication questions, followed by general upper and lower bounds on the amount of communication required to obtain good outcomes. Richtárik et al. (Richtárik and Takáč, 2016) developed a distributed coordinate descent method called Hydra for solving loss minimization problems with big data, where computations are done locally on each node with minimal communication overhead. They also gave bounds on the number of communication rounds sufficient to approximately solve the strongly convex problem with high probability, and showed how it depends on the data and the partitioning. Later on, Fercoq et al. (Fercoq et al., 2014) extended Hydra and proposed Hydra² for minimizing regularized non-strongly convex loss functions. They implemented the method on the largest supercomputer in the UK and showed that it is capable of dealing with a LASSO problem with 50 billion variables. Following the development of big data analysis and large deep learning models, federated learning as an efficient distributed-style learning technique is getting more and more attention.

When Google first proposed the federated learning concept in 2016, the application scenario was Gboard, Google's virtual keyboard for touchscreen mobile devices with support for more than 600 language varieties (Hard et al., 2018; McMahan et al., 2017b; Yang et al., 2018b; Ramaswamy et al., 2019; Chen et al., 2019a). At present, to put federated learning into practice, the WeBank AI team has committed to promoting the standardization of federated learning. In October 2018, they submitted to the IEEE Standards Association a proposal to establish a federated learning standard, the “Guide for Architectural Framework and Application of Federated Machine Learning” (Federated Learning infrastructure and Application standard), which was approved in December 2018. Later, under the guidance of Prof. Qiang Yang, the IEEE P3652.1 (Association, 2018) (federated learning infrastructure and applications) standards working group was established. As this common language between enterprises, supported by the international legal system, is established, the federated learning ecosystem can expand further.

As an innovative mechanism that can train a global model from multiple parties while preserving privacy, federated learning has many other promising applications besides healthcare, e.g., virtual keyboard prediction (Hard et al., 2018; McMahan et al., 2017b; Yang et al., 2018b; Ramaswamy et al., 2019; Chen et al., 2019a), smart retail (Zhao et al., 2019), finance, vehicle-to-vehicle communication (Samarakoon et al., 2018) and so on. Therefore, we summarize the current progress on federated learning including, but not limited to, healthcare informatics. We hope to provide a useful reference for health informatics and computational research.

There have been some related survey works on federated learning (Dai et al., 2018; Yang et al., 2019). Dai et al. (Dai et al., 2018) provide an overview of the architecture and optimization approach for federated data analysis, where the Newton-Raphson method and the alternating direction method of multipliers (ADMM) framework are used for distributed computation. Yang et al. (Yang et al., 2019) provide definitions, architectures and applications for the federated learning framework, and introduce general privacy-preserving techniques that can be applied to federated learning. They also categorize federated learning based on the distribution characteristics of the data. Different from these works, this paper mainly summarizes the current progress on federated learning. We discuss the general solutions to the statistical challenges, system challenges and privacy issues in federated learning research. With this survey, we hope to provide a useful resource for health informatics and computational research on how to apply machine learning techniques to heterogeneous data scattered across a large number of institutions while respecting the privacy concerns of data sharing.

The rest of the survey is organized as follows. In Sec. 2, we give a general overview of federated learning and define some notation in Tab. 1, which will be used later. Then, we summarize the challenges of federated learning and introduce the current progress on these issues in Sections 3, 4 and 5. After that, we briefly summarize the federated optimization algorithms in Sec. 6. In Sec. 7, we introduce some other applications and the popular platforms for federated learning research, hoping to provide a useful resource for beginners. Finally, we conclude the paper and discuss some other questions likely to be encountered when federated learning is applied in the healthcare area in Sec. 9.

Symbol            Description
$K$               Number of activated clients
$n$               Total number of data points participating in collaborative training
$\mathcal{D}$     Target data distribution for the learning model
$n_k$             Number of data points stored on client $k$
$\mathcal{D}_k$   Data distribution associated with client $k$
Table 1. List of important notations

2. Federated Learning Problem Setting

Federated learning is the problem of training a high-quality shared global model with a central server from decentralized data scattered among an extremely large number of different clients.

Formally, assume there are $K$ activated clients (a client could be a mobile phone, a wearable device, a medical institution, etc.). Let $\mathcal{D}_k$ denote the data distribution associated with client $k$ and $n_k$ the number of samples available from that client. $n = \sum_{k=1}^{K} n_k$ is the total sample size. The federated learning problem boils down to solving an empirical risk minimization problem of the form (Konečnỳ et al., 2015, 2016a; McMahan et al., 2017a):

$$\min_{w \in \mathbb{R}^d} F(w) := \sum_{k=1}^{K} \frac{n_k}{n} F_k(w), \quad \text{where} \quad F_k(w) = \frac{1}{n_k} \sum_{i=1}^{n_k} f_i(w), \tag{1}$$

and $f_i(w)$ is the loss of the model $w$ on the $i$-th sample of client $k$. The objective in problem (1) can be rephrased as a linear combination of the local empirical objectives $F_k(w)$. In particular, algorithms for federated learning face the following challenges (Smith et al., 2017; Caldas et al., 2018b):

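  • Statistical: The data distributions of the clients differ greatly, i.e., for all $k \neq \tilde{k}$, we have $\mathbb{E}_{x_i \sim \mathcal{D}_k}[f_i(w; x_i)] \neq \mathbb{E}_{x_i \sim \mathcal{D}_{\tilde{k}}}[f_i(w; x_i)]$. As a result, the data points available locally are far from being a representative sample of the overall distribution, i.e., $\mathbb{E}_{x_i \sim \mathcal{D}_k}[f_i(w; x_i)] \neq F(w)$.

  • Communication: The number of clients is large and can be much bigger than the average number of training samples stored on the activated clients, i.e., $K \gg n/K$.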

  • Privacy and Security: Additional privacy protections are needed for unreliable participating clients. It is impossible to ensure none of the millions of clients are malicious.
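
The statistical challenge can be made concrete with a toy calculation. The sketch below is illustrative only: it assumes a scalar model, a squared loss and a hypothetical two-client split, none of which come from the cited works. It evaluates a global objective formed as the sample-size-weighted sum of local objectives and shows that each local objective alone can be far from it when the local data are skewed.

```python
# Illustrative assumptions: scalar model w, squared loss f_i(w) = (w - x_i)^2.

def local_objective(w, data):
    """Average loss over one client's local samples."""
    return sum((w - x) ** 2 for x in data) / len(data)


def global_objective(w, clients):
    """Sample-size-weighted sum of the local objectives."""
    n = sum(len(d) for d in clients)
    return sum(len(d) / n * local_objective(w, d) for d in clients)


# Hypothetical non-IID split: each client only sees one region of the data.
clients = [[0.0, 1.0], [10.0, 11.0, 12.0, 13.0]]
pooled = [x for d in clients for x in d]

w = 5.0
local_values = [local_objective(w, d) for d in clients]
global_value = global_objective(w, clients)
pooled_value = local_objective(w, pooled)

# The weighted sum matches the loss on the pooled data, while the
# individual local objectives differ from it and from each other.
print(local_values, global_value, pooled_value)
```

With an IID split the local values would cluster around the global one; the gap seen here is what the consensus and pluralistic approaches discussed later try to handle.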

Next, we survey in detail the existing federated learning works that address these challenges.

3. Statistical Challenges of Federated Learning

The naive way to solve the federated learning problem is through Federated Averaging (FedAvg) (McMahan et al., 2017a). It has been demonstrated to work with certain non-IID data by requiring all the clients to share the same model. However, FedAvg does not address the statistical challenge of strongly skewed data distributions. The performance of convolutional neural networks trained with the FedAvg algorithm can degrade significantly due to weight divergence (Zhao et al., 2018). We roughly organize existing research on the statistical challenge of federated learning into two groups, i.e., consensus solutions and pluralistic solutions, which we discuss in detail below.

3.1. Consensus Solution

Most centralized models are trained on the aggregate training sample obtained from the samples drawn from the local clients (Smith et al., 2017; Zhao et al., 2018). Intrinsically, the centralized model is trained to minimize the loss with respect to the uniform distribution $\mathcal{D} = \sum_{k=1}^{K} \frac{n_k}{n} \mathcal{D}_k$ (Mohri et al., 2019):

$$\min_{w} \; \mathbb{E}_{x \sim \mathcal{D}} [f(w; x)],$$

where $\mathcal{D}$ is the target data distribution for the learning model. However, this specific uniform distribution is not an adequate solution in most scenarios.

To address this issue, recently proposed solutions model the target distribution or force the data to adapt to the uniform distribution (Zhao et al., 2018; Mohri et al., 2019). Specifically, Mohri et al. (Mohri et al., 2019) proposed a minimax optimization scheme, i.e., agnostic federated learning (AFL), where the centralized model is optimized for any possible target distribution formed by a mixture of the client distributions. This method has only been applied at small scales. Compared to AFL, Li et al. (Li et al., 2019b) proposed $q$-Fair Federated Learning (q-FFL), which assigns higher weight to devices with poor performance so that the variance of the accuracy distribution across the network is reduced. They empirically demonstrate the improved flexibility and scalability of q-FFL compared to AFL. Duan et al. (Duan, 2019) built a self-balancing federated learning framework called Astraea, which rebalances the training by performing data augmentation on minority classes and rescheduling clients to achieve a partial equilibrium.
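
To make the contrast between the two objectives concrete, the following sketch evaluates both on a fixed vector of per-client losses. It assumes the commonly stated forms of the objectives: the AFL worst case over all mixtures of client distributions, and the q-FFL objective $\sum_k \frac{p_k}{q+1} F_k^{q+1}$, whose gradient weights each client by $p_k F_k^q$. The loss values and function names are hypothetical, and no optimization is performed.

```python
def afl_worst_case(losses):
    """Worst-case mixture loss when the mixture weights range over the whole
    simplex: a linear function on the simplex is maximized at a vertex."""
    return max(losses)


def qffl_objective(losses, p, q):
    """q-FFL-style objective: sum_k p_k / (q + 1) * F_k^(q + 1)."""
    return sum(pk / (q + 1) * fk ** (q + 1) for pk, fk in zip(p, losses))


def qffl_gradient_weights(losses, p, q):
    """Normalized factors p_k * F_k^q that scale each client's gradient."""
    raw = [pk * fk ** q for pk, fk in zip(p, losses)]
    total = sum(raw)
    return [r / total for r in raw]


# Hypothetical per-client losses at the current model.
losses = [0.2, 0.5, 2.0]
p = [1 / 3, 1 / 3, 1 / 3]

print("AFL worst case:", afl_worst_case(losses))
for q in (0, 1, 5):
    weights = qffl_gradient_weights(losses, p, q)
    print("q =", q, [round(x, 3) for x in weights])
```

With q = 0 the weights reduce to the plain averaging weights; as q grows, the worst-performing client dominates, which moves the objective toward the worst-case view taken by AFL.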

Another commonly used method is sharing a small portion of data. Zhao et al. (Zhao et al., 2018) proposed a data-sharing strategy to improve FedAvg with non-IID data by creating a small subset of data that is globally shared between all the clients. The shared subset, distributed from the central server to the clients, is required to have a uniform distribution over classes. Similarly, Yoshida et al. (Yoshida et al., 2019) present a protocol called Hybrid-FL, extending FedCS (Nishio and Yonetani, 2018) to mitigate the non-IID data problem. The main idea of Hybrid-FL is to construct an approximately IID dataset on the server by gathering data from a limited number of clients; the model updated with the approximately IID data is then aggregated with the models updated by the other clients. In addition to handling the non-IID issue, Han et al. (Han and Zhang, 2019) proposed to identify training bugs (i.e., local data corruption) by sharing information about a small portion of trusted instances and noise patterns among clients. The trusted instances guide the local agents to select a compact training subset, while the agents learn to add changes to the selected data samples in order to improve the test performance of the global model.
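
The data-sharing idea can be sketched in a few lines. This is a minimal illustration under assumed conditions (a hypothetical server-side pool, three classes, clients holding a single class each); it is not the exact procedure of any cited work. It shows only the mechanics: draw a class-balanced subset once and append it to every client's local data.

```python
import random
from collections import Counter


def build_shared_subset(pool, per_class, rng):
    """Draw the same number of examples per class from a server-held pool."""
    by_class = {}
    for x, y in pool:
        by_class.setdefault(y, []).append((x, y))
    shared = []
    for y in sorted(by_class):
        shared.extend(rng.sample(by_class[y], per_class))
    return shared


rng = random.Random(0)

# Hypothetical server pool: 3 classes, 20 examples each (features are dummies).
pool = [(i, i % 3) for i in range(60)]
shared = build_shared_subset(pool, per_class=2, rng=rng)

# Extreme label skew: client k only holds examples of class k.
clients = {k: [(100 * k + i, k) for i in range(10)] for k in range(3)}
augmented = {k: data + shared for k, data in clients.items()}

for k, data in augmented.items():
    print(k, sorted(Counter(y for _, y in data).items()))
```

After augmentation every client sees all classes, at the cost of distributing a small amount of data, which is the trade-off this family of methods accepts.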

Besides the above methods, some works address the statistical challenge by incorporating special strategies into the optimization process (Chen et al., 2019b; Ghosh et al., 2019). Chen et al. (Chen et al., 2019b) analyzed signSGD and medianSGD in distributed settings with heterogeneous data by providing a gradient correction mechanism. After incorporating the perturbation mechanism, both algorithms are able to converge at a provable rate. Ghosh et al. (Ghosh et al., 2019) proposed a modular algorithm for robust federated learning in a heterogeneous environment. After each client sends its local update to the server, the server runs an outlier-robust clustering algorithm on these local parameters. After clustering, an outlier-robust distributed algorithm is run on each cluster, where each cluster can be thought of as an instance of a homogeneous distributed learning problem with possibly Byzantine machines. Different from the previous federated optimization problems, the server performs a relatively complicated task in this case.
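
As a rough illustration of the aggregation rules mentioned above, the sketch below compares a sign-based majority vote and a coordinate-wise median with the plain mean on a handful of hypothetical client gradients, one of which is an outlier. It omits the gradient correction (perturbation) mechanism and the clustering step of the cited works entirely; only the basic aggregators are shown.

```python
from statistics import median


def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def majority_vote_sign(grads):
    """Per coordinate, the sign of the sum of the clients' gradient signs."""
    dim = len(grads[0])
    return [sign(sum(sign(g[j]) for g in grads)) for j in range(dim)]


def coordinate_median(grads):
    """Per coordinate, the median of the clients' gradient values."""
    dim = len(grads[0])
    return [median(g[j] for g in grads) for j in range(dim)]


def coordinate_mean(grads):
    """Per coordinate, the plain average of the clients' gradient values."""
    dim = len(grads[0])
    return [sum(g[j] for g in grads) / len(grads) for j in range(dim)]


# Hypothetical gradients from four clients; the last one is an outlier.
grads = [[1.0, -2.0], [1.2, -1.8], [0.9, -2.1], [50.0, 40.0]]

vote = majority_vote_sign(grads)
med = coordinate_median(grads)
avg = coordinate_mean(grads)
print(vote, med, avg)
```

In this toy case the outlier flips the sign of the second coordinate of the mean, while the median and the majority vote still follow the three agreeing clients.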

The skewed distribution of data across different clients leads to very different learning rates for different clients, making tuning difficult without adaptive algorithms. To address this problem, Koskela et al. (Koskela and Honkela, 2019) propose a rigorous adaptive method for finding a good learning rate for SGD and apply it to differential privacy (DP) and federated learning settings. These works provide a good reference for solving the heterogeneous data problem in federated learning.

3.2. Pluralistic Solution

It is difficult to find a consensus solution that is good for all components $\mathcal{D}_k$. Instead of wastefully insisting on a consensus solution, many researchers choose to embrace this heterogeneity.

Multi-task learning (MTL) is a natural way to deal with data drawn from different distributions. Instead of learning a single global model, it directly captures relationships amongst non-IID and unbalanced data by leveraging the relatedness between them. To do this, it is necessary to target a particular way in which tasks are related, e.g., shared sparsity, shared low-rank structure, graph-based relatedness and so forth. Recently, Smith et al. (Smith et al., 2017) empirically demonstrated this point on real-world federated datasets and proposed a novel method, MOCHA, to solve a general convex MTL problem while handling the system challenges at the same time. Later, Corinzia et al. (Corinzia and Buhmann, 2019) introduced VIRTUAL, an algorithm for federated multi-task learning with non-convex models. They consider the federation of central server and clients as a Bayesian network and perform training using approximated variational inference. This work bridges the frameworks of federated and transfer/continuous learning.

The success of multi-task learning rests on whether the chosen relatedness assumptions hold. Compared to this, pluralism can be a critical tool for dealing with heterogeneous data without any additional or even low-order terms that depend on the relatedness as in MTL (Eichner et al., 2019). Eichner et al. (Eichner et al., 2019) considered training in the presence of block-cyclic data, and showed that a remarkably simple pluralistic approach can entirely resolve the source of data heterogeneity. When the component distributions are actually different, pluralism can outperform the “ideal” i.i.d. baseline.

Besides, different special cases of machine learning, e.g., transfer learning, active learning and meta learning, are combined with the federated learning principle to inherit their respective advantages. Transfer learning is naturally introduced to solve the data heterogeneity problem; it expands the scale of the available data and further improves the performance of the global model (Liu et al., 2018). Active learning and meta learning are applied on local clients to deal with insufficient labeled data (Khodak et al., 2019; Qian et al., 2019; Chen et al., 2018a).

4. Communication Efficiency of Federated Learning

In the federated learning setting, training data remain distributed over a large number of clients, each with an unreliable and relatively slow network connection. Generally, for synchronous algorithms in federated learning (Smith et al., 2017; Konečnỳ et al., 2016b), let $w_0$ be the initial value; a typical round $t$ consists of the following steps:

  • A subset of existing clients is selected, each of which downloads the current model $w_t$.

  • Each client $k$ in the subset computes an updated model $w_{t+1}^k$ based on its local data.

  • The model updates are sent from the selected clients to the server.

  • The server aggregates these models (typically by averaging) to construct an improved global model, i.e., $w_{t+1} = \sum_{k=1}^{K} \frac{n_k}{n} w_{t+1}^k$.
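
The round described by the steps above can be sketched as follows. This is a minimal, hedged illustration rather than a reference implementation: it assumes a scalar linear model, a least-squares loss, plain gradient descent as the local solver, and synthetic client data, and the names `local_update` and `fedavg_round` are hypothetical.

```python
import random


def local_update(w, data, lr=0.1, epochs=5):
    """Client step: a few epochs of gradient descent on the local
    least-squares loss, starting from the downloaded global model w."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def fedavg_round(w_global, clients, fraction, rng):
    """One synchronous round: select a subset of clients, train locally,
    then average the returned models weighted by local sample count."""
    m = max(1, int(fraction * len(clients)))
    selected = rng.sample(clients, m)
    updates = [(local_update(w_global, data), len(data)) for data in selected]
    n = sum(n_k for _, n_k in updates)
    return sum(n_k / n * w_k for w_k, n_k in updates)


rng = random.Random(0)

# Hypothetical clients: y = 3x plus noise, with client-specific input ranges.
clients = []
for k in range(10):
    xs = [rng.uniform(k / 10, k / 10 + 1) for _ in range(20)]
    clients.append([(x, 3 * x + rng.gauss(0, 0.1)) for x in xs])

w = 0.0
for t in range(50):
    w = fedavg_round(w, clients, fraction=0.3, rng=rng)
print(w)
```

Only model values and sample counts cross the client boundary in this sketch; the raw `(x, y)` pairs never leave the client lists, which is the property the protocol is built around.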


Naively, for the above protocol, the total number of bits required for uplink (clients → server) and downlink (server → clients) communication by each of the $K$ clients during training is given by

$$\mathcal{B}^{up/down} \in \mathcal{O}\big(U \times |w| \times (H(\Delta w^{up/down}) + \beta)\big),$$

where $U$ is the total number of updates performed by each client, $|w|$ is the size of the model and $H(\Delta w^{up/down})$ is the entropy of the weight updates exchanged during transmission. $\beta$ is the difference between the true update size and the minimal update size (which is given by the entropy) (Sattler et al., 2019). We can therefore consider three ways to reduce the communication cost: a) reduce the number of clients $K$, b) reduce the update size and c) reduce the number of updates $U$. Starting from these three points, we organize existing research on communication-efficient federated learning into four groups, i.e., model compression, client selection, update reduction and peer-to-peer learning, which we discuss in detail below.
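
A back-of-the-envelope calculation shows how the terms of this cost interact. The sketch below assumes the cost takes the product form "number of updates × model size × (entropy per weight + coding overhead)"; the model size, update counts and the sparse ternary update are hypothetical numbers chosen for illustration.

```python
import math
from collections import Counter


def empirical_entropy_bits(symbols):
    """Shannon entropy (bits per symbol) of the empirical distribution."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def communication_bits(num_updates, model_size, entropy_bits, overhead_bits):
    """Bits sent in one direction by one client under the assumed product form."""
    return num_updates * model_size * (entropy_bits + overhead_bits)


# Hypothetical ternary update in which 98% of the entries are zero.
update = [0] * 98 + [1, -1]
H = empirical_entropy_bits(update)

# Uncompressed float32 weights cost 32 bits each; the compressed variant
# pays the entropy plus a small assumed coding overhead per weight.
dense = communication_bits(1000, 10**6, entropy_bits=32, overhead_bits=0)
sparse = communication_bits(1000, 10**6, entropy_bits=H, overhead_bits=0.05)
print(f"entropy = {H:.3f} bits/weight, dense/sparse = {dense / sparse:.0f}x")
```

Halving the number of updates halves the total, whereas lowering the entropy of each update attacks the per-update cost, which is why the approaches below are grouped by the term they target.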

4.1. Client Selection

The most natural and crude way is to restrict the participating clients or to choose a fraction of the parameters to be updated in each round. Shokri et al. (Shokri and Shmatikov, 2015) use a selective stochastic gradient descent protocol, where the selection can be completely random or only the parameters whose current values are farther away from their local optima, i.e., those that have a larger gradient, are selected. Bui et al. (Bui et al., 2018) improved federated learning for Bayesian neural networks using Partitioned Variational Inference (PVI), where the client can decide to upload the parameters back to the central server after multiple passes through its data, after one local epoch, or after just one mini-batch. Wang (Wang, 2019) calculated the Shapley value of each feature to explain the prediction of the model, which helps to further quantify the contribution of the clients without needing to know the detailed values of the data. This leaves room for participants to choose a learning schedule that meets the communication constraints.

Nishio et al. (Nishio and Yonetani, 2018) propose a new protocol referred to as FedCS, where the central server manages the resources of heterogeneous clients and determines which clients should participate in the current training task by analyzing the resource information of each client, such as wireless channel states, computational capacities and the size of the data resources relevant to the current task. The server should decide how much data, energy and CPU resources are used by the mobile devices such that the energy consumption, training latency and bandwidth cost are minimized while meeting the requirements of the training tasks. Anh et al. (Anh et al., 2019) thus propose to use the Deep Q-Learning (DQL) (Van Hasselt et al., 2016) technique, which enables the server to find the optimal data and energy management for the mobile devices participating in Mobile Crowd-Machine Learning (MCML) through federated learning without any prior knowledge of network dynamics.
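
Resource-aware selection can be illustrated with a simple greedy heuristic. The sketch below is a simplified stand-in and not the FedCS algorithm itself: it assumes the server already holds per-client time estimates, that local training runs in parallel, and that uploads happen one after another; the `ClientInfo` fields and the deadline are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientInfo:
    name: str
    compute_s: float  # estimated local training time in seconds
    upload_s: float   # estimated model upload time in seconds


def select_clients(candidates, deadline_s):
    """Greedily add the fastest clients while the estimated round time
    (slowest local training plus the sum of uploads) fits the deadline."""
    selected = []
    slowest_compute = 0.0
    upload_total = 0.0
    for c in sorted(candidates, key=lambda c: c.compute_s + c.upload_s):
        new_compute = max(slowest_compute, c.compute_s)
        new_upload = upload_total + c.upload_s
        if new_compute + new_upload <= deadline_s:
            selected.append(c.name)
            slowest_compute = new_compute
            upload_total = new_upload
    return selected


candidates = [
    ClientInfo("a", 5, 2),
    ClientInfo("b", 30, 10),
    ClientInfo("c", 8, 3),
    ClientInfo("d", 6, 20),
]
chosen = select_clients(candidates, deadline_s=20)
print(chosen)
```

A tighter deadline shortens each round but admits fewer clients, so the deadline acts as the knob that trades round latency against the amount of data seen per round.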

The limited communication bandwidth becomes the main bottleneck for aggregating the locally computed updates. Yang et al. (Yang et al., 2018a) thus propose a novel over-the-air computation based approach for fast global model aggregation by exploiting the superposition property of a wireless multiple-access channel. During the federated model training process, the clients suffer from considerable overhead in communication and computation. Without well-designed incentives, self-interested mobile devices will be reluctant to participate in federated learning tasks, which will hinder the adoption of federated learning (Kang et al., 2019). For this reason, Kim et al. (Kim et al., 2019) introduced a reward mechanism proportional to the training sample size into their proposed blockchained federated learning architecture. This measure promotes the federation of more clients with more training samples. Feng et al. (Feng et al., 2018) adopted a service pricing scheme to encourage clients to participate in federated learning, where the price is also related to the training data size. They presented a Stackelberg game model to analyze the transmission strategy and training data pricing strategy of the self-organized mobile devices and the model owner’s learning service subscription in the cooperative federated learning system. They focused on the interactions among mobile devices and considered the impact of interference costs on the profits of the mobile devices. Kang et al. (Kang et al., 2019) designed an incentive mechanism based on contract theory to motivate data owners with high-accuracy local training data to participate in the learning process, so as to achieve efficient federated learning.

4.2. Model Compression

The goal of reducing uplink communication cost is to compress the client-to-server exchanges. The first way is through structured updates, where the update is directly learned from a restricted space parameterized using a smaller number of variables, e.g., sparse or low-rank (Konečnỳ et al., 2016b). The second way is lossy compression, where a full model update is first learned and then compressed using a combination of quantization, random rotations and subsampling before it is sent to the server (Konečnỳ et al., 2016b; Agarwal et al., 2018). The server then decodes the updates before doing the aggregation. For deep neural networks, Chen et al. (Chen et al., 2019c) categorize the layers into shallow and deep layers and update the parameters of the deep layers less frequently than those of the shallow layers. Also, a temporally weighted aggregation is adopted, where the most recently updated models have higher weights in the aggregation. Sattler et al. (Sattler et al., 2019) propose Sparse Ternary Compression (STC), a new compression framework that is specifically designed to meet the requirements of the federated learning environment. STC extends the existing compression technique of top-k gradient sparsification with a novel mechanism to enable downstream compression as well as ternarization and optimal Golomb encoding of the weight updates.
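
The sparsification and ternarization steps can be sketched on a flat update vector. This is an illustration in the spirit of top-k sparsification followed by ternarization, assuming that kept entries are replaced by plus or minus their mean magnitude; residual accumulation, downstream compression and Golomb encoding are left out, and the function name is hypothetical.

```python
def sparse_ternary(update, k):
    """Keep the k largest-magnitude entries and replace each of them by
    +mu or -mu, where mu is the mean magnitude of the kept entries."""
    order = sorted(range(len(update)), key=lambda i: abs(update[i]), reverse=True)
    top = order[:k]
    mu = sum(abs(update[i]) for i in top) / k
    out = [0.0] * len(update)
    for i in top:
        out[i] = mu if update[i] > 0 else -mu
    return out


# Hypothetical weight update with two dominant coordinates.
update = [0.05, -0.9, 0.01, 0.7, -0.02, 0.1]
compressed = sparse_ternary(update, k=2)
print(compressed)
```

The result takes only three distinct values, so a sender needs to transmit the positions and signs of the nonzero entries plus a single magnitude, which is what makes entropy coding of such updates effective.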

Most traditional distributed learning works focus on reducing the uplink communication cost and neglect that downloading a large model can still be a considerable burden for users. For example, deep models, which require significant computational resources for both training and inference, are not easily downloaded and trained on edge devices. For this reason, many alternatives have been proposed to compress the models before deploying them on-device, e.g., pruning the least useful connections in a network (Han et al., 2015a, b), weight quantization (Hubara et al., 2016; Lin et al., 2017; De Sa et al., 2018) and model distillation (Hinton et al., 2015). However, many of these approaches are not applicable to the federated learning problem, as they are either ingrained in the training procedure or mostly optimized for inference (Caldas et al., 2018a). Moreover, federated learning aims to deal with a large number of clients, so communicating the global model may even become a bottleneck for the server (Caldas et al., 2018a).

In federated dropout, each client, instead of locally training an update to the whole global model, trains an update to a smaller sub-model (Caldas et al., 2018a). These sub-models are subsets of the global model and, as such, the computed local updates have a natural interpretation as updates to the larger global model. Federated dropout not only reduces the downlink communication but also reduces the size of the uplink updates. Moreover, the local computational cost is correspondingly reduced since the local training procedure deals with parameters of smaller dimension. Zhu et al. (Zhu and Jin, 2019) propose a multi-objective federated learning method to simultaneously maximize the learning performance and minimize the communication cost using a multi-objective evolutionary algorithm. To improve the scalability in evolving large neural networks, a modified sparse evolutionary algorithm is used to indirectly encode the connectivity of the neural network, which effectively reduces the number of connections by encoding only two hyperparameters.
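
The bookkeeping behind sub-model training can be sketched on a flat parameter vector. This is a deliberately simplified picture: in practice the sub-model is obtained by dropping units or filters so that the remaining weights form smaller dense layers, whereas here the model is just a list of numbers, the selection is uniform at random, and the local training step is replaced by a placeholder update. All names are hypothetical.

```python
import random


def make_submodel(global_w, keep, rng):
    """Server side: pick `keep` coordinates and send only their values."""
    idx = sorted(rng.sample(range(len(global_w)), keep))
    return idx, [global_w[i] for i in idx]


def apply_sub_update(global_w, idx, sub_update):
    """Server side: map the client's sub-model update back onto the
    coordinates of the global model; other coordinates stay unchanged."""
    new_w = list(global_w)
    for i, delta in zip(idx, sub_update):
        new_w[i] += delta
    return new_w


rng = random.Random(0)
global_w = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

idx, sub_w = make_submodel(global_w, keep=3, rng=rng)
sub_update = [-0.1 * v for v in sub_w]  # placeholder for local training
new_w = apply_sub_update(global_w, idx, sub_update)
print(idx, new_w)
```

Both the download (`sub_w`) and the upload (`sub_update`) have the size of the sub-model, which is how the approach reduces traffic in both directions at once.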

4.3. Updates Reducing

Kamp et al. (Kamp et al., 2018) proposed to average models dynamically depending on the utility of the communication, which reduces communication by an order of magnitude compared to state-of-the-art approaches that communicate periodically. This is well suited for massively distributed systems with limited communication infrastructure. Guha et al. (Guha et al., 2019) focus on techniques for one-shot federated learning, in which a global model is learned from data in the network using only a single round of communication between the devices and the central server. Besides the above works, Ren et al. (Ren et al., 2019) theoretically analyze the detailed expression of the learning efficiency in the CPU scenario and formulate a training acceleration problem under both communication and learning resource budgets. This work provides an important step towards the implementation of AI in wireless communication systems. In addition, reinforcement learning and round-robin learning are used to manage the communication and computation resources (Anh et al., 2019; Wang et al., 2018; Liu et al., 2019a; Zhuo et al., 2019; Ickin et al., 2019).
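
The idea of communicating only when it is useful can be sketched with a simple divergence check. The code below is a simplified illustration and not the cited protocol: it assumes each client compares its model with the last shared reference model and that any violation of a squared-distance threshold triggers a full averaging step; the threshold and the model values are hypothetical.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two parameter vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def dynamic_averaging_step(local_models, reference, threshold):
    """Average only if some local model drifted more than `threshold`
    (in squared distance) from the last shared reference model.
    Returns (local models, reference model, whether communication happened)."""
    if all(sq_dist(w, reference) <= threshold for w in local_models):
        return local_models, reference, False
    dim = len(reference)
    count = len(local_models)
    avg = [sum(w[j] for w in local_models) / count for j in range(dim)]
    return [list(avg) for _ in local_models], avg, True


reference = [0.0, 0.0]

# Small drift: every client stays within the threshold, nothing is sent.
quiet = dynamic_averaging_step([[0.1, 0.0], [0.0, 0.1]], reference, 0.5)

# Large drift on one client: the models are averaged and redistributed.
synced = dynamic_averaging_step([[1.0, 0.0], [0.0, 0.2]], reference, 0.5)
print(quiet[2], synced[2], synced[1])
```

Rounds in which the models barely move cost no communication under this rule, in contrast to periodic averaging, which pays for every round regardless of how much was learned.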

4.4. Peer-to-Peer Learning

In federated learning, a central server is required to coordinate the training process of the global model. However, the communication cost to the central server may not be affordable since a large number of clients are usually involved. Also, many practical peer-to-peer networks are dynamic, and it is not possible to regularly access a fixed central server. Moreover, because of the dependence on the central server, all clients are required to agree on one trusted central body, whose failure would interrupt the training process for all clients. Therefore, some researchers have begun to study fully decentralized frameworks where the central server is not required (Shayan et al., 2018; Roy et al., 2019; Lalitha et al., 2019b, a).

Towards medical applications, Roy et al. (Roy et al., 2019) proposed BrainTorrent, where all clients directly interact with each other without depending on a central body. Lalitha et al. (Lalitha et al., 2019b, a) introduce a posterior distribution over a parameter space for each client to characterize the unknown global space. The local clients are distributed over a graph/network in which they communicate only with their one-hop neighbors. Each client updates its local belief based on its own data, then aggregates information from its one-hop neighbors. Shayan et al. (Shayan et al., 2018) proposed a fully decentralized peer-to-peer approach called Biscotti, which uses cryptographic primitives and blockchain to coordinate a privacy-preserving multi-party ML process between local clients.

Although a decentralized architecture has been shown to be superior to its centralized counterpart when the number of nodes is relatively large and the network condition is poor (Lian et al., 2017), it generally operates only on a network where two nodes (or users) can exchange their local models only if they trust each other. However, node A may trust node B while node B does not trust node A, in which case the two still cannot communicate. To solve this problem, He et al. (He et al., 2019) propose a central-server-free federated learning algorithm, named the Online Push-Sum (OPS) method, to handle a generic scenario where the social network is unidirectional or of single-sided trust.
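To make the idea of averaging over a directed (single-sided trust) graph concrete, the following is a minimal sketch of the classical push-sum protocol, from which Online Push-Sum takes its name. It is not the OPS algorithm of He et al. itself: the function name `push_sum`, the example graph and the equal-share weighting are assumptions made for illustration. Each node keeps a value and a weight, pushes equal shares of both to itself and to the nodes it is allowed to send to, and uses the ratio of the two as its estimate of the network-wide average.

```python
import numpy as np

def push_sum(values, out_neighbors, rounds=100):
    """Estimate the network average on a directed graph (illustrative sketch)."""
    n = len(values)
    x = np.array(values, dtype=float)  # running numerators
    w = np.ones(n)                     # running weights
    for _ in range(rounds):
        new_x = np.zeros(n)
        new_w = np.zeros(n)
        for i in range(n):
            # Node i keeps one share and pushes one share to each out-neighbor.
            targets = [i] + list(out_neighbors[i])
            share = 1.0 / len(targets)
            for j in targets:
                new_x[j] += share * x[i]
                new_w[j] += share * w[i]
        x, w = new_x, new_w
    return x / w  # each node's estimate of the average

# Hypothetical one-way trust graph: an edge i -> j means i may send to j.
graph = {0: [1, 2], 1: [2], 2: [3], 3: [0]}
estimates = push_sum([1.0, 2.0, 3.0, 6.0], graph)
```

Because the graph is strongly connected, every node's ratio approaches the true average of the initial values even though no link is bidirectional.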

5. Privacy and Security

In federated learning, we usually assume that the number of participating clients (e.g., phones, cars, …) is large, potentially reaching thousands or millions. It is impossible to ensure that none of the clients is malicious. The federated learning setting, in which the model is trained locally without revealing the input data or the model’s output to any client, prevents direct leakage while training or using the model. However, clients may still infer some information about another client’s private dataset from the execution of the shared predictive model or from the model itself (Truex et al., 2018). Yang et al. (Yang et al., 2019) introduce a comprehensive secure federated learning framework, which emphasizes general privacy-preserving techniques that can be applied to federated learning. In this section, we focus only on the federated learning scenario. We first survey attack-related works, followed by research dealing with privacy issues.

5.1. Attack (Honest-but-Curious and Adversary Setting)

Neither data poisoning defenses nor anomaly detection can be applied directly in federated learning, since they require access to the participating clients’ training data or to their uploaded model updates, respectively. The aggregation server cannot observe the training data, or the model updates derived from it, without compromising participants’ privacy. These limitations make federated learning vulnerable to backdoors and other model-poisoning attacks (Bagdasaryan et al., 2018).

5.1.1. Data Poisoning

In the naive approach, the attacker simply trains its model on label-flipped or backdoor inputs. The attacker can also maximize overfitting to the backdoor data by changing the local learning rate and the number of local epochs. This naive approach does little damage to federated learning: aggregation offsets most of the contribution of the backdoored model, and the federated model soon forgets the backdoor. The attacker needs to be selected frequently, and even then poisoning is slow (Bagdasaryan et al., 2018).

5.1.2. Model Poisoning

An inference attack aims to learn whether a particular individual participated in training, or the attributes of the records in the training set (Nasr et al., 2018). In the native federated learning setting, where no additional privacy-preserving techniques are included, the parameters are visible to local clients and therefore also to curious adversaries. The adversary can actively exploit SGD, which is widely used in training deep neural networks, to leak more information about the participating clients’ training data. Nasr et al. (Nasr et al., 2018) exploited the privacy vulnerabilities of the SGD algorithm and designed an active white-box attack that performs gradient ascent on a set of target data samples before uploading the parameters. This gradient ascent attack forces the target model to behave very differently on target members and non-member instances, which makes membership inference easier. The accuracy of a central attacker can be further improved by isolating a participant during parameter updates.

Another method is to use model replacement to introduce backdoor functionality into the global model (Bagdasaryan et al., 2018). In this approach, the attacker makes an ambitious attempt to substitute a malicious model for the new global model produced by the aggregation rule in Eq. (3).

Because the data distributions of the clients differ greatly, each local model may be far from the current global model. As the global model converges, these deviations begin to cancel out, i.e., the sum of the benign clients’ deviations from the global model is approximately zero. Accordingly, the attacker can submit a model whose deviation from the current global model is scaled up by the inverse of the weight that aggregation assigns to a single update.

This attack scales up the weights of the backdoored model to ensure that the attacker’s contribution survives averaging and transfers to the global model.
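The scaling argument can be checked numerically. The sketch below uses the notation of Bagdasaryan et al., which is an assumption here and may differ from the symbols of Eq. (3): the server computes $G^{t+1} = G^t + \frac{\eta}{n}\sum_i (L_i^{t+1} - G^t)$, and an attacker who wants the result to be $X$ submits approximately $\frac{n}{\eta}(X - G^t) + G^t$. The function names and toy values are hypothetical.

```python
import numpy as np

def aggregate(global_model, local_models, n, eta):
    """Server update: G + (eta / n) * sum_i (L_i - G)."""
    deltas = sum(local - global_model for local in local_models)
    return global_model + (eta / n) * deltas

def replacement_update(global_model, target_model, n, eta):
    """Attacker's submission: scale the desired shift by gamma = n / eta."""
    gamma = n / eta
    return gamma * (target_model - global_model) + global_model

rng = np.random.default_rng(0)
G = np.zeros(5)      # current global model
X = np.ones(5)       # backdoored model the attacker wants to install
n, eta = 100, 1.0    # total participants and global learning rate (toy values)

# Near convergence, benign local models deviate only slightly from G.
benign = [G + 1e-3 * rng.standard_normal(5) for _ in range(9)]
malicious = replacement_update(G, X, n, eta)
new_G = aggregate(G, benign + [malicious], n, eta)
```

After a single round, `new_G` is close to `X`, which illustrates why one well-timed malicious update can be enough once the benign deviations are small.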

Bagdasaryan et al. (Bagdasaryan et al., 2018) evaluated the above attack on standard federated learning tasks under different assumptions, and showed that model replacement is much more effective than training data poisoning. Moreover, due to the success of machine learning models based on deep neural networks, most federated learning papers also use deep networks. The tendency of deep networks to memorize training data makes them susceptible to various inference attacks (Nasr et al., 2018). Bhagoji et al. (Bhagoji et al., 2018) also explored the threat of model poisoning attacks on federated learning and indicated the vulnerability of the federated learning setting. Besides, due to the differences in the number of samples used in training by different participants, the disparate vulnerability (i.e., certain subgroups can be significantly more vulnerable than others) to privacy attacks on machine learning models should also be considered (Yaghini et al., 2019). Thus there is an urgent need to develop effective defense strategies.

5.2. Defense (Honest-but-Curious Setting)

In this setting, all users follow the protocol honestly, but the server may attempt to learn extra information in different ways (Bonawitz et al., 2017). The most direct way to alleviate this problem is to reduce the parameters or gradients shared by each client. Shokri et al. (Shokri and Shmatikov, 2015) showed that in modern deep learning, sharing even a small fraction of the gradients still results in significantly better accuracy than learning only on local data. Obviously, such an approach does not remove the underlying threats to data privacy. To this end, many efforts have focused on privacy from either an individual or a multiparty point of view, especially in the social media field, which has significantly exacerbated multiparty privacy (MP) conflicts (Thomas et al., 2010; Such and Criado, 2018).

5.2.1. Secure Multi-Party Computation

Secure multi-party computation (SMC) is a natural fit for the federated learning scenario: each party uses a combination of cryptographic techniques and oblivious transfer to jointly compute a function of their private data (Pathak et al., 2010). Bonawitz et al. (Bonawitz et al., 2017) design a secure multi-party computation protocol for secure aggregation of high-dimensional data, where encryption is used to make the updates of a single device undetectable by the server, and the sum is revealed only after a sufficient number of updates has been received. This technique addresses one of the threats discussed above: no participant can infer anything about another participant’s private data during the local training process (Bagdasaryan et al., 2018).
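The core idea behind secure aggregation, that individual updates are hidden while their sum is still recoverable, can be sketched with pairwise masks that cancel in the aggregate. This is a simplified illustration and not the full protocol of Bonawitz et al.: key agreement, secret sharing and dropout handling are omitted, the pairwise masks are derived from a shared seed only for reproducibility, and the names `mask_updates`, `server_sum` and `MOD` are hypothetical.

```python
import numpy as np

MOD = 2 ** 16  # all arithmetic is done modulo this value

def mask_updates(updates, seed=0):
    """Add pairwise masks so that each masked update looks random on its own."""
    n = len(updates)
    masked = [np.asarray(u, dtype=np.int64) % MOD for u in updates]
    for i in range(n):
        for j in range(i + 1, n):
            # In a real protocol, clients i and j would derive this mask
            # from a secret agreed between the two of them.
            pair_rng = np.random.default_rng([seed, i, j])
            mask = pair_rng.integers(0, MOD, size=masked[i].shape)
            masked[i] = (masked[i] + mask) % MOD  # client i adds the mask
            masked[j] = (masked[j] - mask) % MOD  # client j subtracts it
    return masked

def server_sum(masked):
    """The server only sums; the pairwise masks cancel out."""
    return np.sum(masked, axis=0) % MOD

updates = [np.array([1, 2, 3]), np.array([10, 20, 30]), np.array([100, 200, 300])]
masked = mask_updates(updates)
total = server_sum(masked)
```

The server recovers the exact sum of the quantized updates, while each entry of `masked` on its own carries no obvious information about the corresponding client.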

Homomorphic encryption, owing to its success in cloud computing, is another natural candidate, and it has been used in many federated learning studies (Hardy et al., 2017; Liu et al., 2018; Chai et al., 2019). Homomorphic encryption is a public-key system in which any party can encrypt its data with a known public key and perform calculations with data encrypted by others under the same public key (Fontaine and Galand, 2007). Liu et al. (Liu et al., 2018) introduce the Federated Transfer Learning (FTL) framework in a privacy-preserving setting and provide a novel approach for adapting additively homomorphic encryption to multi-party computation (MPC) with neural networks, such that the accuracy is almost lossless and only minimal modifications to the neural networks are required. Chai et al. (Chai et al., 2019) propose a secure matrix factorization framework under the federated learning setting, where the distributed matrix factorization framework is enhanced with homomorphic encryption.
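To illustrate what "additively homomorphic" means, the following is a toy sketch of the Paillier cryptosystem, a well-known additively homomorphic scheme. It is not taken from any of the cited frameworks, and the tiny hard-coded primes make it completely insecure; the function names are hypothetical. Multiplying two ciphertexts yields a ciphertext of the sum of the plaintexts, so a server could add encrypted values without decrypting them.

```python
import math
import random

def paillier_keygen(p=1009, q=1013):
    """Toy key generation with tiny primes (insecure, illustration only)."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p - 1, q - 1)
    mu = pow(lam, -1, n)  # modular inverse, valid because g = n + 1 is used
    return n, (lam, mu)

def paillier_encrypt(n, m, rng=random):
    """Encrypt an integer 0 <= m < n under public key n (with g = n + 1)."""
    n2 = n * n
    r = rng.randrange(1, n)
    while math.gcd(r, n) != 1:
        r = rng.randrange(1, n)
    return ((1 + m * n) * pow(r, n, n2)) % n2

def paillier_add(n, c1, c2):
    """Homomorphic addition: multiply ciphertexts modulo n^2."""
    return (c1 * c2) % (n * n)

def paillier_decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n) * mu % n

n, priv = paillier_keygen()
c_sum = paillier_add(n, paillier_encrypt(n, 20), paillier_encrypt(n, 22))
plain_sum = paillier_decrypt(n, priv, c_sum)
```

Real deployments use keys of thousands of bits and encode floating-point model updates as integers before encryption.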

In addition to an additively homomorphic encryption scheme, Hardy et al. (Hardy et al., 2017) also described a three-party end-to-end solution for privacy-preserving entity resolution. They provide a formal analysis of the impact of entity resolution mistakes on learning, which offers clear and strong support for federated learning. Specifically, they proved that, under reasonable assumptions on the number and magnitude of entity resolution mistakes, federated learning is of great value in the setting where each peer’s data significantly improves the other’s.

Although SMC guarantees that none of the parties shares anything with each other or with any third party, it cannot prevent an adversary from learning some individual information, e.g., about an individual whose absence might change the decision boundary of a classifier. Moreover, SMC protocols are usually computationally expensive even for the simplest problems, requiring iterated encryption/decryption and repeated communication between participants about some of the encrypted results (Pathak et al., 2010).

5.2.2. Differential Privacy

Differential privacy (DP) is an alternative theoretical model for protecting the privacy of individual data. It has been widely applied in many areas, covering not only traditional algorithms, e.g., boosting (Dwork et al., 2010), principal component analysis (Chaudhuri et al., 2013), and support vector machines (Rubinstein et al., 2009), but also deep learning research (Abadi et al., 2016; McMahan et al., 2017b). Abadi et al. (Abadi et al., 2016) first demonstrated the training of deep neural networks with differential privacy, incurring a modest total privacy loss computed over entire models with many parameters. Formally, it says:

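Definition 5.1 ($(\varepsilon, \delta)$-Differential Privacy (Dwork et al., 2006)).

A randomized algorithm $\mathcal{M}$ satisfies $(\varepsilon, \delta)$-differential privacy if for any two adjacent datasets $D$ and $D'$ that differ in at most one entry, and for any subset of outputs $S$,

$$\Pr[\mathcal{M}(D) \in S] \le e^{\varepsilon} \Pr[\mathcal{M}(D') \in S] + \delta.$$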


The parameter $\varepsilon$ balances the accuracy of the differentially private mechanism $\mathcal{M}$ against how much it leaks (Shokri and Shmatikov, 2015). A non-zero $\delta$ allows us to relax the strict relative shift in unlikely events (Dwork et al., 2006). In DP, a stochastic component (typically additive noise) is usually added to the locally trained model. For instance, the Gaussian mechanism is defined by:

$$\mathcal{M}(D) = f(D) + \mathcal{N}(0, S_f^2 \sigma^2),$$

where $\mathcal{N}(0, S_f^2 \sigma^2)$ is the Gaussian distribution with mean $0$ and standard deviation $S_f \sigma$, and $S_f$ is the sensitivity of $f$.
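As a minimal sketch of the Gaussian mechanism applied to a vector-valued quantity such as a model update, the snippet below adds zero-mean Gaussian noise whose standard deviation is the product of a sensitivity bound and a noise multiplier. The function name `gaussian_mechanism` and the example values are assumptions for illustration, and choosing the noise multiplier for a target privacy level is outside the scope of the sketch.

```python
import numpy as np

def gaussian_mechanism(value, sensitivity, sigma, rng):
    """Return value plus N(0, (sensitivity * sigma)^2) noise, elementwise."""
    value = np.asarray(value, dtype=float)
    noise = rng.normal(0.0, sensitivity * sigma, size=value.shape)
    return value + noise

rng = np.random.default_rng(0)
update = np.array([0.5, -1.0, 2.0])  # hypothetical locally computed update
noisy_update = gaussian_mechanism(update, sensitivity=1.0, sigma=0.5, rng=rng)
```

Larger values of `sigma` give stronger privacy at the cost of a noisier, less accurate update.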


Differential privacy ensures that the addition or removal of a single record does not substantially affect the outcome of any analysis, and is therefore widely studied in federated learning research to prevent indirect leakage. Besides reducing the shared parameters by selecting a small subset of gradients using the sparse vector technique, Shokri et al. (Shokri and Shmatikov, 2015) choose to share perturbed values of the selected gradients under a consistent differentially private framework. They use the Laplacian mechanism to add noise that depends on the privacy budget as well as on the sensitivity of the gradient for each parameter, where the (global) sensitivity of a function $f$ is defined as:

$$S_f = \max_{D, D'} \|f(D) - f(D')\|,$$

with the maximum taken over all pairs of adjacent datasets $D$ and $D'$.

The global sensitivity estimates are expected to be significantly reduced, resulting in higher accuracy, by ensuring that the norm of all gradients is bounded for each update, either globally or locally (Shokri and Shmatikov, 2015).
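The interplay between norm bounding and sensitivity can be sketched as follows. This is a generic Laplace mechanism with $\ell_1$ clipping, not the exact per-parameter procedure of Shokri and Shmatikov; the names `clip_l1` and `laplace_mechanism` and the numeric values are assumptions. Clipping each gradient to an $\ell_1$ norm of at most `bound` means that swapping one gradient for another changes the output by at most `2 * bound`, which is the sensitivity used to scale the noise.

```python
import numpy as np

def clip_l1(grad, bound):
    """Rescale grad so that its L1 norm is at most bound."""
    norm = np.abs(grad).sum()
    return grad if norm <= bound else grad * (bound / norm)

def laplace_mechanism(value, sensitivity, epsilon, rng):
    """Add Laplace noise with scale sensitivity / epsilon to each coordinate."""
    scale = sensitivity / epsilon
    return value + rng.laplace(0.0, scale, size=value.shape)

rng = np.random.default_rng(0)
grad = np.array([3.0, -4.0, 1.0])
bound = 2.0
clipped = clip_l1(grad, bound)  # L1 norm reduced from 8 to 2
private_grad = laplace_mechanism(clipped, sensitivity=2 * bound, epsilon=1.0, rng=rng)
```

A tighter bound lowers the sensitivity and therefore the noise scale, at the price of distorting large gradients.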

Afterwards, McMahan et al. (McMahan et al., 2017b) added client-level privacy protection to the federated averaging algorithm (McMahan et al., 2017a), relying heavily on privacy accounting for stochastic gradient descent (Abadi et al., 2016). In contrast to the work of Abadi et al. (Abadi et al., 2016), which aims to protect a single data point’s contribution, client-level privacy means that the learnt model does not reveal whether a client participated in training. Almost at the same time, Geyer et al. (Geyer et al., 2017) proposed a similar procedure for client-level DP, dynamically adapting the DP-preserving mechanism during decentralized training. Chen et al. (Chen et al., 2018b) propose a differentially private autoencoder-based generative model (DP-AuGM) and a differentially private variational autoencoder-based generative model (DP-VaeGM). They conjectured that differential privacy is targeted to protect membership privacy, while the key to defending against model inversion and GAN-based attacks is the perturbation of training data. Training a global model with user-level DP usually adopts FedSGD and FedAvg with noised updates, and computes a DP guarantee using the Moments Accountant. All these processes rely on selecting a norm bound for each user’s update to the model, which requires careful parameter tuning. Thakkar et al. (Thakkar et al., 2019) removed the need for extensive parameter tuning by adaptively setting the clipping norm applied to each user’s update.

Instead of using the Gaussian mechanism, Agarwal et al. (Agarwal et al., 2018) improve the previous analysis of the Binomial mechanism, showing that it achieves nearly the same utility as the Gaussian mechanism while requiring fewer representation bits. The traditionally used local differential privacy may prove too strict in practical applications. Consequently, Bhowmick et al. (Bhowmick et al., 2018) revisit the types of disclosures and adversaries against which local differential privacy provides protection, and design new (minimax) optimal locally differentially private mechanisms for statistical learning problems for all privacy levels, where large privacy parameters in local differential privacy are allowed.

However, DP protects users from data leakage only to a certain extent, and may reduce prediction accuracy because it is a lossy method (Cheng et al., 2019; Bagdasaryan et al., 2018). Thus, Cheng et al. (Cheng et al., 2019) propose a lossless privacy-preserving tree-boosting framework known as SecureBoost in a federated learning setting. This framework allows learning processes to be performed jointly over multiple parties with partially common user samples but different feature sets, corresponding to a vertically partitioned virtual data set. In addition, Truex et al. (Truex et al., 2018) combine DP with SMC to reduce the growth of noise injection as the number of parties increases, without sacrificing privacy and while preserving provable privacy guarantees, protecting against extraction attacks and collusion threats. Besides, Ghazi et al. (Ghazi et al., 2019) provide further evidence that the shuffled model of differential privacy is a fertile “middle ground” between local differential privacy and general multi-party computation, combining the scalability of the former with the high utility of the latter.

Problems | Paper Indices
Statistical | (Smith et al., 2017) (Zhao et al., 2018) (Mohri et al., 2019) (Li et al., 2019b) (Yoshida et al., 2019) (Eichner et al., 2019) (Corinzia and Buhmann, 2019) (Duan, 2019) (Han and Zhang, 2019) (Chen et al., 2019b) (Ghosh et al., 2019)
Communication | (Kamp et al., 2018) (Konečnỳ et al., 2016b) (Nishio and Yonetani, 2018) (Shokri and Shmatikov, 2015) (McMahan et al., 2016) (Konečnỳ et al., 2016b) (Caldas et al., 2018a) (Chen et al., 2019c) (Agarwal et al., 2018) (Ren et al., 2019) (Bui et al., 2018) (Zhu and Jin, 2019) (Guha et al., 2019) (Anh et al., 2019) (Sattler et al., 2019) (Van Hasselt et al., 2016) (Yang et al., 2018a)
Privacy & Security | (Truex et al., 2018) (Agarwal et al., 2018) (Yang et al., 2019) (Chen et al., 2018b) (Truex et al., 2018) (Fung et al., 2018b) (Fung et al., 2018a) (Bonawitz et al., 2017) (Cheng et al., 2019) (Shokri and Shmatikov, 2015) (McMahan et al., 2017b) (Geyer et al., 2017) (Bhowmick et al., 2018) (Ghazi et al., 2019) (Liu et al., 2018) (Chai et al., 2019) (Pathak et al., 2010) (Hardy et al., 2017) (Bagdasaryan et al., 2018) (Nasr et al., 2018) (Bhagoji et al., 2018)
Optimization | (Konečnỳ et al., 2015) (Konečnỳ et al., 2016a) (Woodworth et al., 2018) (McMahan et al., 2017a) (Li et al., 2019c) (McMahan et al., 2017b) (Li et al., 2019b) (Li et al., 2019a) (Xie et al., 2019)
Others | (Shayan et al., 2018) (Roy et al., 2019) (Lalitha et al., 2019b) (Lalitha et al., 2019a) (Thakkar et al., 2019) (Koskela and Honkela, 2019) (Kang et al., 2019) (Kim et al., 2019) (Feng et al., 2018) (Wang, 2019) (Feng et al., 2018) (Bonawitz et al., 2019)
Table 2. Summary of Papers based on Relatively More Emphasized Problem

5.2.3. Others

The current utility protocols for secure aggregation work in an honest-but-curious environment. That is, if the server is honest and follows the protocol, then a curious adversary cannot learn any private information while observing all communication with the server. Unlike these protocols, a more robust and scalable primitive for privacy-preserving protocols is to shuffle user data to hide the origin of each data item (Cheu et al., 2019). Building on it, Ghazi et al. (Ghazi et al., 2019) put forward a simple and more efficient protocol for aggregation in the shuffled model, where communication as well as error increase only polylogarithmically in the number of users.

Fung et al. (Fung et al., 2018b) observed that honest clients can be separated from sybils by the diversity of their gradient updates. They therefore proposed FoolsGold to defend against sybil-based poisoning attacks: the learning rate of clients that provide unique gradient updates is maintained, while the learning rate of clients that repeatedly contribute similar-looking gradient updates is reduced. Besides, Fung et al. (Fung et al., 2018a) claimed to propose a novel setting called brokered learning, in which a short-lived, honest-but-curious broker is introduced to break the direct link between the global center and local clients. This is essentially the same as previous federated learning works in the honest-but-curious setting.

6. Federated Optimization

Many popular machine learning models have been studied in the federated learning scenario, e.g., tensor factorization (Chai et al., 2019; Kim et al., 2017), Bayesian models (Corinzia and Buhmann, 2019; Lalitha et al., 2019b, a; Yurochkin et al., 2019), and Generative Adversarial Networks (GAN) (Wang et al., 2019; A. and V., 2019; Hardy et al., 2018). Recall Eq. (1) and suppose we have a set of data samples $\{x_i, y_i\}_{i=1}^{n}$; then simple examples of local machine learning models include:

  • Linear regression: $f_i(w) = \frac{1}{2}(x_i^\top w - y_i)^2$, $y_i \in \mathbb{R}$

  • Logistic regression: $f_i(w) = \log(1 + \exp(-y_i x_i^\top w))$, $y_i \in \{-1, 1\}$

  • Support vector machines: $f_i(w) = \max\{0, 1 - y_i x_i^\top w\}$, $y_i \in \{-1, 1\}$

More complex non-convex problems arise in the context of neural networks, which predict through a non-convex function of the feature vector $x_i$ instead of the mapping $x_i^\top w$. However, the resulting loss can still be written as $f_i(w)$, and the gradients can be computed efficiently using back-propagation (Konečnỳ et al., 2016a). This section briefly summarizes the federated optimization algorithms and lists the baseline algorithms with and without privacy concerns.
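For reference, the three convex per-sample losses named above can be written in a few lines. The sketch uses their standard textbook forms, namely squared error, logistic loss and hinge loss with labels in {-1, 1} for the two classifiers; the function names and the example point are assumptions.

```python
import numpy as np

def linear_regression_loss(w, x, y):
    """Squared error: 0.5 * (x^T w - y)^2."""
    return 0.5 * (x @ w - y) ** 2

def logistic_loss(w, x, y):
    """Logistic loss: log(1 + exp(-y * x^T w)), with y in {-1, 1}."""
    return np.log1p(np.exp(-y * (x @ w)))

def hinge_loss(w, x, y):
    """SVM hinge loss: max(0, 1 - y * x^T w), with y in {-1, 1}."""
    return max(0.0, 1.0 - y * (x @ w))

w = np.array([1.0, -1.0])
x = np.array([2.0, 1.0])  # here x^T w = 1
losses = (
    linear_regression_loss(w, x, 3.0),
    logistic_loss(w, x, 1.0),
    hinge_loss(w, x, -1.0),
)
```

Each function depends on the data only through the inner product of `x` and `w`, which is what makes these models simple instances of the per-sample loss.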

6.1. Baseline Algorithms

Instead of learning separate parameters to the data for each client as multi-task learning did (Smith et al., 2017), we mainly focus on summarizing the progress on training a single global model which corresponds to the consensus solution summarized in the previous Section 3.1.

6.1.1. Federated Averaging (FedAvg)

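[Server Executes]:
for each round $t = 0, 1, \dots$ do
     $S_t \leftarrow$ random set of clients (each device $k$ is chosen with probability $p_k$);
     Server sends $w_t$ to all chosen devices;
     for each client $k \in S_t$ in parallel do
          $w_{t+1}^k \leftarrow$ [Client Update $(k, w_t)$]
     $w_{t+1} \leftarrow \frac{1}{|S_t|} \sum_{k \in S_t} w_{t+1}^k$
[Client Update $(k, w)$]:
$\mathcal{B} \leftarrow$ (split $\mathcal{P}_k$ into batches of size $B$)
for each local epoch $i$ from $1$ to $E$ do
     for batch $b \in \mathcal{B}$ do
          $w \leftarrow w - \eta \nabla \ell(w; b)$
return $w$ to server
Algorithm 1 Federated Averaging. The activated clients are indexed by $k$, $B$ is the local minibatch size, and $\eta$ is the learning rate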

The naive way to solve the federated learning problem without privacy is through the Federated Averaging (FedAvg) algorithm (McMahan et al., 2017a), as shown in Alg. 1. FedAvg was intended as a baseline, but it turned out to work well in practice: it trains high-quality models using relatively few rounds of communication. Later, Li et al. (Li et al., 2019c) established a convergence analysis of FedAvg for strongly convex and smooth problems without assuming that the data are i.i.d. or that all the devices are active.

In particular, if the full local dataset is treated as a single mini-batch, i.e., $B = \infty$, and only one epoch is performed in the [Client Update] step, the FedAvg algorithm degenerates into FedSGD (McMahan et al., 2017a). Alternatively, if SVRG is used as the local solver, we can further derive Federated SVRG (FSVRG) (Konečnỳ et al., 2015, 2016a).
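The overall loop of Algorithm 1 can be sketched in NumPy for a least-squares model. This is an illustrative simulation under stated assumptions rather than a reference implementation: clients are simulated in a single process, a fixed fraction `frac` of clients is sampled uniformly in each round, the server weights local models by local sample count as in McMahan et al., and all function and parameter names are hypothetical.

```python
import numpy as np

def client_update(w, X, y, epochs, batch_size, lr, rng):
    """Run local mini-batch SGD on the squared loss, starting from w."""
    w = w.copy()
    n = len(y)
    for _ in range(epochs):
        order = rng.permutation(n)
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            grad = X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
            w -= lr * grad
    return w

def fed_avg(clients, dim, rounds, frac, epochs, batch_size, lr, seed=0):
    """clients is a list of (X, y) pairs that never leave their owner."""
    rng = np.random.default_rng(seed)
    w = np.zeros(dim)
    m = max(1, int(frac * len(clients)))
    for _ in range(rounds):
        chosen = rng.choice(len(clients), size=m, replace=False)
        local = [client_update(w, *clients[k], epochs, batch_size, lr, rng)
                 for k in chosen]
        sizes = np.array([len(clients[k][1]) for k in chosen], dtype=float)
        w = np.average(local, axis=0, weights=sizes)  # weighted model average
    return w

# Synthetic, noise-free data shared by five simulated clients.
data_rng = np.random.default_rng(1)
w_true = np.array([2.0, -1.0])
clients = []
for _ in range(5):
    X = data_rng.standard_normal((40, 2))
    clients.append((X, X @ w_true))

w_hat = fed_avg(clients, dim=2, rounds=50, frac=0.6, epochs=2, batch_size=10, lr=0.1)
```

Setting `epochs=1` and `batch_size` to the local dataset size turns each client update into a single full-batch gradient step, which corresponds to the FedSGD special case described above.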

6.1.2. Differentially Private Version (DP-FedAvg)

Some researchers (McMahan et al., 2017b; Geyer et al., 2017) further derived privacy-preserving versions of federated averaging (McMahan et al., 2017a). We list the main procedures in Algorithm 2.

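[Server Execution]:
initialize $w_0$, Accountant$(\varepsilon, K)$
for each round $t = 0, 1, \dots$ do
     if $\delta > Q$ then return $w_t$
     $S_t \leftarrow$ random set of clients (each device $k$ is chosen with probability $p_k$);
     Server sends $w_t$ to all chosen devices;
     for each client $k \in S_t$ in parallel do
          $\Delta w_{t+1}^k, \zeta^k \leftarrow$ [Client Update $(k, w_t)$]
     $S \leftarrow \mathrm{median}\{\zeta^k\}_{k \in S_t}$
     $w_{t+1} \leftarrow w_t + \frac{1}{|S_t|}\left(\sum_{k \in S_t} \frac{\Delta w_{t+1}^k}{\max(1, \zeta^k / S)} + \mathcal{N}(0, S^2 \sigma_t^2)\right)$
[Client Update $(k, w_t)$]:
$w \leftarrow w_t$; $\mathcal{B} \leftarrow$ (split $\mathcal{P}_k$ into batches of size $B$)
for each local epoch $i$ from $1$ to $E$ do
     for batch $b \in \mathcal{B}$ do
          $w \leftarrow w - \eta \nabla \ell(w; b)$
return $\Delta w_{t+1}^k = w - w_t$ and $\zeta^k = \|\Delta w_{t+1}^k\|_2$ to server
Algorithm 2 (Client-side) Differentially Private Federated Averaging. The activated clients are indexed by $k$, $B$ is the local minibatch size, and $\eta$ is the learning rate. $\{\sigma_t\}_{t=0}^{T}$ is the set of variances for the Gaussian mechanism (GM). $\varepsilon$ defines the DP we aim for. $Q$ is the threshold for $\delta$, the probability that $\varepsilon$-DP is broken.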
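The server-side aggregation step of a client-level DP variant can be sketched as below. The sketch follows the clipping-and-noise scheme described by Geyer et al. (scale each client's update down to the median update norm, sum, add Gaussian noise proportional to that norm, then average) and should be read as an assumption about the details rather than the exact procedure of Algorithm 2; the privacy accountant is omitted and the name `dp_aggregate` is hypothetical.

```python
import numpy as np

def dp_aggregate(w, deltas, sigma, rng):
    """One noisy aggregation step over the client updates `deltas`."""
    norms = np.array([np.linalg.norm(d) for d in deltas])
    S = np.median(norms)  # clipping bound, taken as the median update norm
    if S == 0.0:
        return w.copy()   # all updates are zero, nothing to aggregate
    # Scale every update so that its L2 norm is at most S.
    clipped = [d / max(1.0, z / S) for d, z in zip(deltas, norms)]
    # Gaussian noise calibrated to the bound S (the sensitivity of the sum).
    noise = rng.normal(0.0, sigma * S, size=w.shape)
    return w + (np.sum(clipped, axis=0) + noise) / len(deltas)

rng = np.random.default_rng(0)
w = np.zeros(2)
deltas = [np.array([3.0, 4.0]), np.array([0.6, 0.8]), np.array([0.0, 2.0])]
w_no_noise = dp_aggregate(w, deltas, sigma=0.0, rng=rng)  # clipping only
w_private = dp_aggregate(w, deltas, sigma=1.0, rng=rng)   # clipping plus noise
```

In the example the median norm is 2, so only the first update (norm 5) is scaled down; with `sigma=0.0` the step reduces to an average of clipped updates.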

6.1.3. Variants

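In addition to these intuitive inferences, Li et al. (Li et al., 2019b) developed a scalable method, $q$-FedAvg, inspired by fair resource allocation strategies in wireless networks, which encourages fairer accuracy distributions in federated learning. In the [Client Update] step of $q$-FedAvg, besides the local epochs of SGD that produce the local model $w_{t+1}^k$, each selected client $k$ should also compute:

$$\Delta w_t^k = L\,(w_t - w_{t+1}^k), \quad \Delta_t^k = F_k^q(w_t)\, \Delta w_t^k, \quad h_t^k = q F_k^{q-1}(w_t) \|\Delta w_t^k\|^2 + L F_k^q(w_t),$$

where $F_k$ is the local objective of client $k$, $L$ is an estimate of the Lipschitz constant of its gradient, and $q$ can be tuned based on the desired amount of fairness (with larger $q$ inducing more fairness). Then, the server aggregation correspondingly changes to:

$$w_{t+1} = w_t - \frac{\sum_{k \in S_t} \Delta_t^k}{\sum_{k \in S_t} h_t^k}.$$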
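A minimal sketch of the $q$-FedAvg bookkeeping is given below, assuming the update rule of Li et al. (2019b) as recalled here: each client turns its local model into a scaled difference weighted by its loss raised to the power $q$, and the server divides the summed differences by the summed normalizers. The function names, the constant `L` and the toy numbers are assumptions.

```python
import numpy as np

def q_fedavg_client(w, w_local, loss, q, L):
    """Return (delta, h) for one client; `loss` is its local loss at w."""
    dw = L * (w - w_local)
    delta = loss ** q * dw
    h = q * loss ** (q - 1) * np.dot(dw, dw) + L * loss ** q
    return delta, h

def q_fedavg_server(w, deltas, hs):
    """Aggregate: w - (sum of deltas) / (sum of h)."""
    return w - np.sum(deltas, axis=0) / np.sum(hs)

w = np.zeros(2)
local_models = [np.array([1.0, 1.0]), np.array([3.0, 5.0])]
local_losses = [0.5, 2.0]

def run(q, L=10.0):
    pairs = [q_fedavg_client(w, wl, f, q, L)
             for wl, f in zip(local_models, local_losses)]
    deltas, hs = zip(*pairs)
    return q_fedavg_server(w, list(deltas), list(hs))

w_q0 = run(q=0)  # reduces to plain averaging of the local models
w_q1 = run(q=1)  # shifts weight toward the client with the larger loss
```

With `q=0` the rule reduces to a plain average of the local models, which is a convenient sanity check; larger `q` gives more relative weight to clients with higher loss.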


FedProx is proposed to tackle statistical heterogeneity (Li et al., 2019a). It is similar to FedAvg and encompasses FedAvg as a special case. In the [Client Update] step of FedProx, instead of just minimizing the local function $F_k(\cdot)$ as in FedAvg, the $k$-th client uses its local solver of choice to approximately minimize the following surrogate objective $h_k$:

$$h_k(w; w_t) = F_k(w) + \frac{\mu}{2}\|w - w_t\|^2. \quad (12)$$


The proximal term in Eq. (12) effectively limits the impact of local updates (by restricting them to be close to the initial model) without manually adjusting the number of local epochs as in FedAvg. To further improve flexibility and scalability, Xie et al. (Xie et al., 2019) proposed an asynchronous federated optimization algorithm called FedAsync using a similar surrogate objective. Huang et al. (Huang et al., 2018) devised a variant of FedAvg named LoAdaBoost FedAvg, which is based on the median cross-entropy loss and adaptively boosts the training process of clients that appear to be weak learners.
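The effect of the proximal term can be seen in a small local solver. The sketch below runs plain gradient descent on a least-squares local objective plus the penalty $\frac{\mu}{2}\|w - w_t\|^2$; the choice of solver, the function name `fedprox_local` and the toy data are assumptions, since FedProx leaves the local solver open.

```python
import numpy as np

def fedprox_local(w_global, X, y, mu, lr, steps):
    """Approximately minimize F_k(w) + (mu / 2) * ||w - w_global||^2."""
    w = w_global.copy()
    for _ in range(steps):
        grad_loss = X.T @ (X @ w - y) / len(y)  # gradient of the local loss
        grad_prox = mu * (w - w_global)         # gradient of the proximal term
        w -= lr * (grad_loss + grad_prox)
    return w

w_global = np.zeros(2)
X = np.eye(2)
y = np.ones(2)

w_plain = fedprox_local(w_global, X, y, mu=0.0, lr=0.1, steps=500)
w_prox = fedprox_local(w_global, X, y, mu=1.0, lr=0.1, steps=500)
```

With `mu=0.0` the client moves all the way to its local optimum, whereas with `mu=1.0` it stops at one third of that distance from the global model, which is the kind of restraint on local drift that the surrogate objective is meant to provide.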

6.2. Theoretical Progress

In this section, we roughly survey the current theoretical progress on the federated learning problem. Generally, the quality of federated learning predictions can be measured using the notion of regret (Dekel et al., 2012), defined as

$$R = \sum_{t=1}^{T} \left( f(w_t, z_t) - f(w^\star, z_t) \right),$$

where $w_t$ is the model used for the $t$-th prediction on sample $z_t$, $w^\star = \arg\min_{w} \mathbb{E}_{z \sim \mathcal{D}}[f(w, z)]$, and $\mathcal{D}$ denotes the overall data distribution. $R$ measures the difference between the cumulative loss of the predictions in the federated environment and the cumulative loss of the fixed predictor $w^\star$, which is optimal with respect to the overall distribution $\mathcal{D}$.

Most theoretical papers on federated optimization focus on bounding the expected regret $\mathbb{E}[R]$. The current theoretical progress on the federated learning problem is summarised in Table 3.

Method | Convexity | Smoothness | Assumptions | Convergence
FedAvg (McMahan et al., 2017a) | | | |
FedAvg (Li et al., 2019c) | Strongly Convex | Lipschitz Smooth | 1, 2 |
$q$-FedAvg (Li et al., 2019b) | | Lipschitz Smooth | |
Pluralistic Averaging (Eichner et al., 2019) | Convex | Lipschitz Smooth | Semi-cyclic Samples |
Pluralistic Hedging (Eichner et al., 2019) | Convex | Lipschitz Smooth | Semi-cyclic Samples |
MOCHA (Smith et al., 2017) | Convex | Lipschitz Smooth/Continuous | Ref (Smith et al., 2017) |
FedProx (Li et al., 2019a) | Nonconvex | Lipschitz Smooth | 1, 2 and Ref (Li et al., 2019a) |
FedAsync (Xie et al., 2019) | Weakly Convex | Lipschitz Smooth | 1, 2 |
Modular (Ghosh et al., 2019) | Strongly Convex | Lipschitz Smooth | 1, 2 and Ref (Ghosh et al., 2019) |
Table 3. Summary of Federated Optimization Algorithms

7. Applications

As a collaborative modeling mechanism that can carry out efficient machine learning while ensuring data privacy and legal compliance between multiple parties or multiple computing nodes, federated learning has attracted broad attention. Besides healthcare, federated learning has many other promising applications in various areas, e.g., virtual keyboard prediction (Hard et al., 2018; McMahan et al., 2017b; Yang et al., 2018b; Ramaswamy et al., 2019; Chen et al., 2019a), smart retail (Zhao et al., 2019), finance, vehicle-to-vehicle communication (Samarakoon et al., 2018) and so on. In the following, we first summarize some federated learning works in healthcare, then briefly introduce federated learning works in other applications for reference.

7.1. Healthcare

Federated learning is a good way to connect medical institutions and let them share their experience with a privacy guarantee. In this case, the performance of machine learning models can be significantly improved by the resulting large medical data set. Several tasks have been studied in the federated learning setting in healthcare, e.g., patient similarity learning (Lee et al., 2018), patient representation learning, phenotyping (Kim et al., 2017; Liu et al., 2019b), predicting future hospitalizations (Brisimi et al., 2018), predicting mortality and ICU stay time (Huang and Liu, 2019), etc.

Lee et al. (Lee et al., 2018) presented a privacy-preserving platform in a federated setting for patient similarity learning across institutions. Their model can find similar patients from one hospital to another without sharing patient-level information. Kim et al. (Kim et al., 2017) used tensor factorization models to convert massive electronic health records into meaningful phenotypes for data analysis in the federated learning setting. Vepakomma et al. (Vepakomma et al., 2018) built several configurations upon a distributed deep learning method called SplitNN (Gupta and Raskar, 2018) to help health entities collaboratively train deep learning models without sharing sensitive raw data or model details. Silva et al. (Silva et al., 2018) illustrated their federated learning framework by investigating brain structural relationships across diseases and clinical cohorts. Huang et al. (Huang and Liu, 2019) sought to tackle the challenge of non-IID ICU patient data, which complicates decentralized learning, by clustering patients into clinically meaningful communities and optimizing the performance of predicting mortality and ICU stay time. Brisimi et al. (Brisimi et al., 2018) aimed at predicting future hospitalizations for patients with heart-related diseases using EHR data spread among various data sources/agents, by solving the $\ell_1$-regularized sparse Support Vector Machine classifier in the federated learning environment. Liu et al. (Liu et al., 2019b) conducted both patient representation learning and obesity comorbidity phenotyping in a federated manner and obtained good results.

7.2. Others

An important application of federated learning is natural language processing. When Google first proposed the federated learning concept in 2016, the application scenario was Gboard, Google's virtual keyboard for touchscreen mobile devices, with support for more than 600 language varieties (Hard et al., 2018; McMahan et al., 2017b; Yang et al., 2018b; Ramaswamy et al., 2019; Chen et al., 2019a). Indeed, as users increasingly turn to mobile devices, fast mobile input methods with auto-correction, word completion, and next-word prediction features are becoming more and more important. For these natural language processing tasks, especially next-word prediction, data typed in mobile apps is usually more useful for aiding typing on a mobile keyboard than data from scanned books or transcribed utterances. However, language data often contain sensitive information, e.g., the text typed on a mobile phone might include passwords, search queries or text messages. Typically, language data may identify the speaker by name or by some rare phrases, and then link the speaker to confidential or sensitive information (McMahan et al., 2017b). Therefore, as an innovative mechanism that can train a global model from multiple parties while preserving privacy, federated learning has promising applications in natural language tasks like virtual keyboard prediction (Hard et al., 2018; McMahan et al., 2017b; Yang et al., 2018b; Ramaswamy et al., 2019; Chen et al., 2019a). Hard et al. (Hard et al., 2018) trained a recurrent neural network language model in a federated learning environment for next-word prediction in a virtual keyboard for smartphones, which demonstrates the feasibility and benefits of training production-quality models for natural language understanding tasks while keeping users’ data on their devices.

Besides next-word prediction on Gboard, other use cases include search query suggestions (Yang et al., 2018b), emoji prediction in a mobile keyboard (Ramaswamy et al., 2019), and learning out-of-vocabulary (OOV) words for the purpose of expanding the vocabulary of a virtual keyboard for smartphones (Chen et al., 2019a). Beyond text data, Leroy et al. (Leroy et al., 2019) investigated the use of federated learning on crowd-sourced speech data to solve out-of-domain issues such as wake word detection. Additionally, they open-sourced the Hey Snips wake word dataset to further foster transparent research in the application of federated learning to speech data. Bonawitz et al. (Bonawitz et al., 2019) built a scalable production system based on TensorFlow for federated learning in the domain of mobile devices. They address numerous practical issues and describe the resulting high-level design.

Other applications include smart retail (Zhao et al., 2019), finance, vehicle-to-vehicle communication (Samarakoon et al., 2018) and so on. Smart retail aims to use machine learning technology to provide personalized services to customers, including product recommendation and sales services, based on data such as user purchasing power and product characteristics. Zhao et al. (Zhao et al., 2019) designed a smart system to help Internet-of-Things (IoT) device manufacturers leverage customers’ data, and built a machine learning model to predict customers’ requirements and possible consumption behaviours in a federated learning (FL) environment. They also add differential privacy to protect the privacy of customers’ data. For financial applications, one example is that WeBank uses the federated learning principle to detect multiparty borrowing, which is a pain point for financial institutions. Under the federated learning mechanism, there is no need to set up a central database, which not only protects the privacy and data integrity of existing users in various financial institutions, but also makes the inquiry of multiparty borrowing possible.

8. Platforms

As federated learning has grown and developed, many companies and research teams have carried out federated learning work oriented toward both scientific research and product development. Besides Google’s TensorFlow, another of the most popular deep learning frameworks, PyTorch from Facebook, has also started to adopt federated learning to achieve privacy protection. At the same time, Facebook’s AI research team launched a free two-month Udacity course (https://www.udacity.com/course/secure-and-private-ai--ud185) that specifically covers how to use federated learning in PyTorch. Popular platforms and tools for federated learning research include:

  • PySyft. PySyft is an open source project of OpenMined, designed mainly for privacy-preserving deep learning (Ryffel et al., 2018). It decouples private data from model training using federated learning, DP and MPC within PyTorch. TensorFlow bindings for PySyft are also available (OpenMined, 2019).

  • TFF. TensorFlow Federated (TFF) is an open source framework for machine learning and other computations on distributed data (Google, 2019). Its design is based on Google’s experience in developing federated learning technologies, which support machine learning models for mobile keyboard prediction and on-device search. TFF gives users a flexible and open framework with which they can simulate distributed computing locally.

  • FATE. Federated AI Technology Enabler (FATE) is an open source project initiated by WeBank’s AI division (AI, 2019). It aims to provide a secure computing framework to support the federated AI ecosystem, with a secure computing protocol based on homomorphic encryption and MPC. FATE supports federated learning architectures and secure computation for various machine learning algorithms, including logistic regression, tree-based algorithms, transfer learning and deep learning. WeBank has since upgraded FATE and launched FATEBoard, the first visual federated learning tool, as well as FATEFlow, a tool for scheduling and life cycle management of federated learning modeling pipelines. The new version of FATE also includes partial multi-party support, which WeBank’s AI team plans to enhance further in future versions.

  • Tensor/IO. Tensor/IO is a lightweight cross-platform library for on-device machine learning, bringing the power of TensorFlow and TensorFlow Lite to iOS, Android, and React Native applications (doc.ai, 2019). Tensor/IO does not itself implement any machine learning algorithms; it works with underlying libraries such as TensorFlow to simplify deploying and using models on mobile phones. It runs on iOS and Android phones, with bridging for React Native. The library interacts with the selected backend in the language of the developer’s choice (Objective-C, Swift, Java, Kotlin, or JavaScript).

  • Functional Federated Learning in Erlang (ffl-erl). ffl-erl is the first open-source implementation of a framework for federated learning in Erlang (Ulm et al., 2018). Erlang is a structured, dynamically typed programming language with built-in support for concurrency, which makes it well suited for building distributed, soft real-time systems. The ffl-erl project has influenced ongoing work on a real-world system for distributed data analysis in the automotive industry (Ulm et al., 2019).
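Several of the platforms above, TFF in particular, allow federated training to be simulated on a single machine. As a library-independent illustration, the following minimal sketch simulates federated averaging for a one-parameter linear model in plain Python. It does not use the API of any platform listed above; the function names (`local_train`, `federated_round`, `simulate`) and the toy data are hypothetical.

```python
def local_train(w, data, lr=0.1, epochs=5):
    """Run a few epochs of gradient descent on y ~ w * x using one client's data."""
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w


def federated_round(global_w, clients):
    """One round: every client trains locally, the server averages by data size."""
    total = sum(len(data) for data in clients)
    local_weights = [local_train(global_w, data) for data in clients]
    return sum(w * len(data) for w, data in zip(local_weights, clients)) / total


def simulate(clients, rounds=20, init_w=0.0):
    """Repeat federated rounds; only model weights are exchanged, never raw data."""
    w = init_w
    for _ in range(rounds):
        w = federated_round(w, clients)
    return w


if __name__ == "__main__":
    # Three hypothetical institutions, each holding private samples of y = 2x.
    clients = [
        [(1.0, 2.0), (2.0, 4.0)],
        [(0.5, 1.0)],
        [(1.5, 3.0), (3.0, 6.0), (0.2, 0.4)],
    ]
    print(simulate(clients))
```

Weighting each client's contribution by its number of samples keeps institutions with little data from dominating the global model. Real platforms add the communication, security and scheduling layers that this sketch leaves out.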

9. Conclusions and Open Questions

In this survey, we have reviewed the current progress on federated learning including, but not limited to, healthcare informatics. We summarized the general solutions to the various challenges in federated learning, briefly summarized the federated optimization algorithms, and listed the baseline algorithms with and without privacy concerns. We also introduced existing federated learning platforms, and we hope this provides a useful reference for researchers. Beyond these general issues in the federated learning setting, we list below some directions and open questions likely to be encountered when federated learning is applied in healthcare.

  • Data Quality. Federated learning has the potential to connect isolated medical institutions, hospitals and devices so that they can share their experiences with privacy guarantees. However, most health systems suffer from data clutter and efficiency problems. The quality of data collected from multiple sources is uneven, and there is no uniform data standard. Analysis results are of little value when dirty data are accidentally used as samples, so the ability to leverage medical data strategically is critical. How to clean, correct and complete data, and thereby ensure data quality, is therefore key to improving the machine learning model, whether or not we are dealing with a federated learning scenario.

  • Incorporating Expert Knowledge. In 2016, IBM introduced Watson for Oncology, a tool that uses natural language processing to summarize patients’ electronic health records and searches a large underlying database to advise doctors on treatments. Unfortunately, some oncologists say they trust their own judgment more than what Watson tells them to do (http://news.moore.ren/industry/158978.htm). Ideally, therefore, doctors should be involved in the training process. Because not every collected data set will be of high quality, it would be very helpful to introduce standards for evidence-based machine learning, so that doctors can also see the diagnostic criteria used by the artificial intelligence. When those criteria are wrong, doctors can give further guidance to improve the accuracy of the machine learning model during training.

  • Incentive Mechanisms. With the Internet of Things and a variety of third-party portals, a growing number of smartphone healthcare apps are compatible with wearable devices. In addition to data accumulated in hospitals or medical centers, data from wearable devices is of great value, not only to researchers but, more importantly, to the owners themselves. However, during federated model training, clients incur considerable communication and computation overhead. Without well-designed incentives, self-interested mobile or wearable devices will be reluctant to participate in federated learning tasks, which will hinder the adoption of federated learning (Kang et al., 2019). How to design an efficient incentive mechanism that attracts devices with high-quality data to join federated learning is another important problem.

  • Personalization. Wearable devices focus more on public health, that is, on helping people who are already healthy to improve their health, for example by helping them exercise, practice meditation and improve their sleep quality. It is very important to help patients carry out scientifically designed, personalized health management, correct functional pathological states by monitoring indicators, and interrupt the process of pathological change. Reasonable chronic disease management can avoid emergency visits and hospitalization and reduce the number of visits, saving both cost and labor. Although there is some general work on federated learning personalization (Sim et al., 2019; Jiang et al., 2019), how to combine medical domain knowledge with the global model and personalize it for each medical institution or wearable device remains an open question for healthcare informatics.

  • Model Precision. Federated learning tries to let isolated institutions or devices share their experiences, and the large medical dataset formed in this way is expected to improve the performance of machine learning models significantly. However, the prediction tasks are currently restricted and relatively simple. Medical treatment is a highly specialized field that demands accuracy, and medical devices in hospitals have clear advantages over wearable devices. For example, the models of doc.ai can predict a person’s phenome, i.e., biometric data such as height, weight, age, sex and BMI, from a selfie (https://doc.ai/blog/do-you-know-how-valuable-your-medical-da/). How to improve such prediction models so that they predict future health conditions is well worth exploring.


  • [1] R. A. and N. V. (2019) Federated AI lets a team imagine together: federated learning of gans. CoRR abs/1906.03595. Cited by: §6.
  • [2] M. Abadi, A. Chu, I. Goodfellow, H. B. McMahan, I. Mironov, K. Talwar, and L. Zhang (2016) Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security, pp. 308–318. Cited by: §5.2.2, §5.2.2.
  • [3] N. Agarwal, A. T. Suresh, F. X. X. Yu, S. Kumar, and B. McMahan (2018) CpSGD: communication-efficient and differentially-private distributed sgd. In Advances in Neural Information Processing Systems, pp. 7564–7575. Cited by: §4.2, §5.2.2, Table 2.
  • [4] W. AI (2019) Federated ai technology enabler. Note: https://www.fedai.org/cn/ Cited by: 3rd item.
  • [5] T. T. Anh, N. C. Luong, D. Niyato, D. I. Kim, and L. Wang (2019) Efficient training management for mobile crowd-machine learning: a deep reinforcement learning approach. IEEE Wireless Communications Letters. Cited by: §4.1, §4.3, Table 2.
  • [6] I. S. Association (2018) P3652.1 - guide for architectural framework and application of federated machine learning. Note: https://standards.ieee.org/project/3652_1.html Cited by: §1.
  • [7] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov (2018) How to backdoor federated learning. arXiv preprint arXiv:1807.00459. Cited by: §5.1.1, §5.1.2, §5.1.2, §5.1, §5.2.1, §5.2.2, Table 2.
  • [8] M. F. Balcan, A. Blum, S. Fine, and Y. Mansour (2012) Distributed learning, communication complexity and privacy. In Conference on Learning Theory, pp. 26–1. Cited by: §1.
  • [9] C. Barcelos, J. Gluz, and R. Vicari (2011) An agent-based federated learning object search service. Interdisciplinary journal of e-learning and learning objects 7 (1), pp. 37–54. Cited by: §1.
  • [10] A. N. Bhagoji, S. Chakraborty, P. Mittal, and S. Calo (2018) Analyzing federated learning through an adversarial lens. arXiv preprint arXiv:1811.12470. Cited by: §5.1.2, Table 2.
  • [11] A. Bhowmick, J. Duchi, J. Freudiger, G. Kapoor, and R. Rogers (2018) Protection against reconstruction and its applications in private federated learning. arXiv preprint arXiv:1812.00984. Cited by: §5.2.2, Table 2.
  • [12] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon, J. Konečný, S. Mazzocchi, H. B. McMahan, et al. (2019) Towards federated learning at scale: system design. arXiv preprint arXiv:1902.01046. Cited by: Table 2, §7.2.
  • [13] K. Bonawitz, V. Ivanov, B. Kreuter, A. Marcedone, H. B. McMahan, S. Patel, D. Ramage, A. Segal, and K. Seth (2017) Practical secure aggregation for privacy-preserving machine learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, pp. 1175–1191. Cited by: §5.2.1, §5.2, Table 2.
  • [14] T. S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I. C. Paschalidis, and W. Shi (2018) Federated learning of predictive models from federated electronic health records. International journal of medical informatics 112, pp. 59–67. Cited by: §7.1, §7.1.
  • [15] T. D. Bui, C. V. Nguyen, S. Swaroop, and R. E. Turner (2018) Partitioned variational inference: a unified framework encompassing federated and continual learning. arXiv preprint arXiv:1811.11206. Cited by: §4.1, Table 2.
  • [16] S. Caldas, J. Konečný, H. B. McMahan, and A. Talwalkar (2018) Expanding the reach of federated learning by reducing client resource requirements. arXiv preprint arXiv:1812.07210. Cited by: §4.2, §4.2, Table 2.
  • [17] S. Caldas, P. Wu, T. Li, J. Konečný, H. B. McMahan, V. Smith, and A. Talwalkar (2018) Leaf: a benchmark for federated settings. arXiv preprint arXiv:1812.01097. Cited by: §2.
  • [18] D. Chai, L. Wang, K. Chen, and Q. Yang (2019) Secure federated matrix factorization. Cited by: §5.2.1, Table 2, §6.
  • [19] K. Chaudhuri, A. D. Sarwate, and K. Sinha (2013) A near-optimal algorithm for differentially-private principal components. The Journal of Machine Learning Research 14 (1), pp. 2905–2943. Cited by: §5.2.2.
  • [20] F. Chen, Z. Dong, Z. Li, and X. He (2018) Federated meta-learning for recommendation. arXiv preprint arXiv:1802.07876. Cited by: §3.2.
  • [21] M. Chen, R. Mathews, T. Ouyang, and F. Beaufays (2019) Federated learning of out-of-vocabulary words. arXiv preprint arXiv:1903.10635. Cited by: §1, §1, §7.2, §7.2, §7.
  • [22] Q. Chen, C. Xiang, M. Xue, B. Li, N. Borisov, D. Kaarfar, and H. Zhu (2018) Differentially private data generative models. arXiv preprint arXiv:1812.02274. Cited by: §5.2.2, Table 2.
  • [23] X. Chen, T. Chen, H. Sun, Z. S. Wu, and M. Hong (2019) Distributed training with heterogeneous data: bridging median and mean based algorithms. arXiv preprint arXiv:1906.01736. Cited by: §3.1, Table 2.
  • [24] Y. Chen, X. Sun, and Y. Jin (2019) Communication-efficient federated deep learning with asynchronous model update and temporally weighted aggregation. arXiv preprint arXiv:1903.07424. Cited by: §4.2, Table 2.
  • [25] K. Cheng, T. Fan, Y. Jin, Y. Liu, T. Chen, and Q. Yang (2019) SecureBoost: a lossless federated learning framework. arXiv preprint arXiv:1901.08755. Cited by: §5.2.2, Table 2.
  • [26] A. Cheu, A. Smith, J. Ullman, D. Zeber, and M. Zhilyaev (2019) Distributed differential privacy via shuffling. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 375–403. Cited by: §5.2.3.
  • [27] L. Corinzia and J. M. Buhmann (2019) Variational federated multi-task learning. arXiv preprint arXiv:1906.06268. Cited by: §3.2, Table 2, §6.
  • [28] W. Dai, S. Wang, H. Xiong, and X. Jiang (2018) Privacy preserving federated big data analysis. In Guide to Big Data Applications, pp. 49–82. Cited by: §1.
  • [29] S. Dash, S. K. Shakyawar, M. Sharma, and S. Kaushik (2019) Big data in healthcare: management, analysis and future prospects. Journal of Big Data 6 (1), pp. 54. Cited by: §1.
  • [30] C. De Sa, M. Leszczynski, J. Zhang, A. Marzoev, C. R. Aberger, K. Olukotun, and C. Ré (2018) High-accuracy low-precision training. arXiv preprint arXiv:1803.03383. Cited by: §4.2.
  • [31] O. Dekel, R. Gilad-Bachrach, O. Shamir, and L. Xiao (2012) Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research 13 (Jan), pp. 165–202. Cited by: §6.2.
  • [32] doc.ai (2019) Declarative, on-device machine learning for ios, android, and react native. Note: https://github.com/doc-ai/tensorio Cited by: 4th item.
  • [33] S. Dua, U. R. Acharya, and P. Dua (2014) Machine learning in healthcare informatics. Vol. 56, Springer. Cited by: §1.
  • [34] M. Duan (2019) Astraea: self-balancing federated learning for improving classification accuracy of mobile deep learning applications. arXiv preprint arXiv:1907.01132. Cited by: §3.1, Table 2.
  • [35] C. Dwork, K. Kenthapadi, F. McSherry, I. Mironov, and M. Naor (2006) Our data, ourselves: privacy via distributed noise generation. In Annual International Conference on the Theory and Applications of Cryptographic Techniques, pp. 486–503. Cited by: §5.2.2, Definition 5.1.
  • [36] C. Dwork, G. N. Rothblum, and S. Vadhan (2010) Boosting and differential privacy. In 2010 IEEE 51st Annual Symposium on Foundations of Computer Science, pp. 51–60. Cited by: §5.2.2.
  • [37] H. Eichner, T. Koren, H. B. McMahan, N. Srebro, and K. Talwar (2019) Semi-cyclic stochastic gradient descent. arXiv preprint arXiv:1904.10120. Cited by: §3.2, Table 2, Table 3.
  • [38] S. Feng, D. Niyato, P. Wang, D. I. Kim, and Y. Liang (2018) Joint service pricing and cooperative relay communication for federated learning. arXiv preprint arXiv:1811.12082. Cited by: §4.1, Table 2.
  • [39] O. Fercoq, Z. Qu, P. Richtárik, and M. Takáč (2014) Fast distributed coordinate descent for non-strongly convex losses. In 2014 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. Cited by: §1.
  • [40] C. Fontaine and F. Galand (2007) A survey of homomorphic encryption for nonspecialists. EURASIP Journal on Information Security 2007, pp. 15. Cited by: §5.2.1.
  • [41] C. Fung, J. Koerner, S. Grant, and I. Beschastnikh (2018) Dancing in the dark: private multi-party machine learning in an untrusted setting. arXiv preprint arXiv:1811.09712. Cited by: §5.2.3, Table 2.
  • [42] C. Fung, C. J. Yoon, and I. Beschastnikh (2018) Mitigating sybils in federated learning poisoning. arXiv preprint arXiv:1808.04866. Cited by: §5.2.3, Table 2.
  • [43] R. C. Geyer, T. Klein, and M. Nabi (2017) Differentially private federated learning: a client level perspective. arXiv preprint arXiv:1712.07557. Cited by: §5.2.2, Table 2, §6.1.2.
  • [44] B. Ghazi, R. Pagh, and A. Velingker (2019) Scalable and differentially private distributed aggregation in the shuffled model. arXiv preprint arXiv:1906.08320. Cited by: §5.2.2, §5.2.3, Table 2.
  • [45] A. Ghosh, J. Hong, D. Yin, and K. Ramchandran (2019) Robust federated learning in a heterogeneous environment. arXiv preprint arXiv:1906.06629. Cited by: §3.1, Table 2, Table 3, Assumption 4.
  • [46] Google (2019) TensorFlow federated. Note: https://www.tensorflow.org/federated Cited by: 2nd item.
  • [47] P. Groves, B. Kayyali, D. Knott, and S. V. Kuiken (2016) The 'big data' revolution in healthcare: accelerating value and innovation. Cited by: §1.
  • [48] N. Guha, A. Talwalkar, and V. Smith (2019) One-shot federated learning. arXiv preprint arXiv:1902.11175. Cited by: §4.3, Table 2.
  • [49] O. Gupta and R. Raskar (2018) Distributed learning of deep neural network over multiple agents. Journal of Network and Computer Applications 116, pp. 1–8. Cited by: §7.1.
  • [50] S. Han, H. Mao, and W. J. Dally (2015) Deep compression: compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv preprint arXiv:1510.00149. Cited by: §4.2.
  • [51] S. Han, J. Pool, J. Tran, and W. Dally (2015) Learning both weights and connections for efficient neural network. In Advances in neural information processing systems, pp. 1135–1143. Cited by: §4.2.
  • [52] Y. Han and X. Zhang (2019) Robust federated training via collaborative machine teaching using trusted instances. arXiv preprint arXiv:1905.02941. Cited by: §3.1, Table 2.
  • [53] A. Hard, K. Rao, R. Mathews, F. Beaufays, S. Augenstein, H. Eichner, C. Kiddon, and D. Ramage (2018) Federated learning for mobile keyboard prediction. arXiv preprint arXiv:1811.03604. Cited by: §1, §1, §7.2, §7.
  • [54] C. Hardy, E. L. Merrer, and B. Sericola (2018) Md-gan: multi-discriminator generative adversarial networks for distributed datasets. arXiv preprint arXiv:1811.03850. Cited by: §6.
  • [55] S. Hardy, W. Henecka, H. Ivey-Law, R. Nock, G. Patrini, G. Smith, and B. Thorne (2017) Private federated learning on vertically partitioned data via entity resolution and additively homomorphic encryption. arXiv preprint arXiv:1711.10677. Cited by: §5.2.1, §5.2.1, Table 2.
  • [56] C. He, C. Tan, H. Tang, S. Qiu, and J. Liu (2019) Central server free federated learning over single-sided trust social networks. arXiv preprint arXiv:1910.04956. Cited by: §4.4.
  • [57] P. Hill (1985) The rationale for learning communities and learning community models.. Cited by: §1.
  • [58] G. Hinton, O. Vinyals, and J. Dean (2015) Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531. Cited by: §4.2.
  • [59] L. Huang and D. Liu (2019) Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. arXiv preprint arXiv:1903.09296. Cited by: §7.1, §7.1.
  • [60] L. Huang, Y. Yin, Z. Fu, S. Zhang, H. Deng, and D. Liu (2018) LoAdaBoost: loss-based adaboost federated machine learning on medical data. arXiv preprint arXiv:1811.12629. Cited by: §6.1.3.
  • [61] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, and Y. Bengio (2016) Binarized neural networks. In Advances in neural information processing systems, pp. 4107–4115. Cited by: §4.2.
  • [62] S. Ickin, K. Vandikas, and M. Fiedler (2019) Privacy preserving qoe modeling using collaborative learning. arXiv preprint arXiv:1906.09248. Cited by: §4.3.
  • [63] Y. Jiang, J. Konečný, K. Rush, and S. Kannan (2019) Improving federated learning personalization via model agnostic meta learning. arXiv preprint arXiv:1909.12488v1. Cited by: 4th item.
  • [64] M. Kamp, L. Adilova, J. Sicking, F. Hüger, P. Schlicht, T. Wirtz, and S. Wrobel (2018) Efficient decentralized deep learning by dynamic model averaging. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 393–409. Cited by: §4.3, Table 2.
  • [65] J. Kang, Z. Xiong, D. Niyato, H. Yu, Y. Liang, and D. I. Kim (2019) Incentive design for efficient federated learning in mobile networks: a contract theory approach. arXiv preprint arXiv:1905.07479. Cited by: §4.1, Table 2, 3rd item.
  • [66] K. Kellogg (1999) Learning communities. eric digest.. Cited by: §1.
  • [67] M. Khodak, M.-F. Balcan, and A. Talwalkar (2019) Adaptive gradient-based meta-learning methods. arXiv preprint arXiv:1906.02717. Cited by: §3.2.
  • [68] H. Kim, J. Park, M. Bennis, and S. Kim (2019) Blockchained on-device federated learning. IEEE Communications Letters. Cited by: §4.1, Table 2.
  • [69] Y. Kim, J. Sun, H. Yu, and X. Jiang (2017) Federated tensor factorization for computational phenotyping. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 887–895. Cited by: §6, §7.1, §7.1.
  • [70] J. Konečný, B. McMahan, and D. Ramage (2015) Federated optimization: distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575. Cited by: §2, Table 2, §6.1.1.
  • [71] J. Konečný, H. B. McMahan, D. Ramage, and P. Richtárik (2016) Federated optimization: distributed machine learning for on-device intelligence. arXiv preprint arXiv:1610.02527. Cited by: §1, §2, Table 2, §6.1.1, §6.
  • [72] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon (2016) Federated learning: strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492. Cited by: §1, §4.2, §4, Table 2.
  • [73] A. Koskela and A. Honkela (2019) Learning rate adaptation for federated and differentially private learning. arXiv preprint arXiv:1809.03832v3. Cited by: §3.1, Table 2.
  • [74] A. Lalitha, O. C. Kilinc, T. Javidi, and F. Koushanfar (2019) Peer-to-peer federated learning on graphs. arXiv preprint arXiv:1901.11173. Cited by: §4.4, §4.4, Table 2, §6.
  • [75] A. Lalitha, X. Wang, O. Kilinc, Y. Lu, T. Javidi, and F. Koushanfar (2019) Decentralized bayesian learning over graphs. arXiv preprint arXiv:1905.10466. Cited by: §4.4, §4.4, Table 2, §6.
  • [76] J. Lee, J. Sun, F. Wang, S. Wang, C. Jun, and X. Jiang (2018) Privacy-preserving patient similarity learning in a federated environment: development and analysis. JMIR medical informatics 6 (2), pp. e20. Cited by: §7.1, §7.1.
  • [77] D. Leroy, A. Coucke, T. Lavril, T. Gisselbrecht, and J. Dureau (2019) Federated learning for keyword spotting. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 6341–6345. Cited by: §7.2.
  • [78] T. Li, A. K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, and V. Smith (2019) Federated optimization for heterogeneous networks. arXiv preprint arXiv:1812.06127. Cited by: Table 2, §6.1.3, Table 3, Assumption 3.
  • [79] T. Li, M. Sanjabi, and V. Smith (2019) Fair resource allocation in federated learning. arXiv preprint arXiv:1905.10497. Cited by: §3.1, Table 2, §6.1.3, Table 3.
  • [80] X. Li, K. Huang, W. Yang, S. Wang, and Z. Zhang (2019) On the convergence of fedavg on non-iid data. arXiv preprint arXiv:1907.02189. Cited by: Table 2, §6.1.1, Table 3.
  • [81] X. Lian, C. Zhang, H. Zhang, C. Hsieh, W. Zhang, and J. Liu (2017) Can decentralized algorithms outperform centralized algorithms? a case study for decentralized parallel stochastic gradient descent. In Advances in Neural Information Processing Systems, pp. 5330–5340. Cited by: §4.4.
  • [82] X. Lin, C. Zhao, and W. Pan (2017) Towards accurate binary convolutional neural network. In Advances in Neural Information Processing Systems, pp. 345–353. Cited by: §4.2.
  • [83] B. Liu, L. Wang, M. Liu, and C. Xu (2019) Lifelong federated reinforcement learning: a learning architecture for navigation in cloud robotic systems. arXiv preprint arXiv:1901.06455. Cited by: §4.3.
  • [84] D. Liu, D. Dligach, and T. Miller (2019) Two-stage federated phenotyping and patient representation learning. arXiv preprint arXiv:1908.05596. Cited by: §7.1, §7.1.
  • [85] Cited by: §3.2, §5.2.1, Table 2.
  • [86] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas (2017) Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pp. 1273–1282. Cited by: §2, §3, §5.2.2, Table 2, §6.1.1, §6.1.1, §6.1.2, Table 3.
  • [87] H. B. McMahan, E. Moore, D. Ramage, S. Hampson, et al. (2016) Communication-efficient learning of deep networks from decentralized data. arXiv preprint arXiv:1602.05629. Cited by: §1, Table 2.
  • [88] H. B. McMahan, D. Ramage, K. Talwar, and L. Zhang (2017) Learning differentially private recurrent language models. arXiv preprint arXiv:1710.06963. Cited by: §1, §1, §5.2.2, §5.2.2, Table 2, §6.1.2, §7.2, §7.
  • [89] M. Mohri, G. Sivek, and A. T. Suresh (2019) Agnostic federated learning. In Proceedings of the 36th International Conference on Machine Learning, K. Chaudhuri and R. Salakhutdinov (Eds.), Proceedings of Machine Learning Research, Vol. 97, Long Beach, California, USA, pp. 4615–4625. Cited by: §3.1, §3.1, Table 2.
  • [90] R. Mukherjee and H. Jaffe (2005) System and method for dynamic context-sensitive federated search of multiple information repositories. Google Patents. Note: US Patent App. 10/743,196. Cited by: §1.
  • [91] M. Nasr, R. Shokri, and A. Houmansadr (2018) Comprehensive privacy analysis of deep learning: stand-alone and federated learning under passive and active white-box inference attacks. arXiv preprint arXiv:1812.00910. Cited by: §5.1.2, §5.1.2, Table 2.
  • [92] T. Nishio and R. Yonetani (2018) Client selection for federated learning with heterogeneous resources in mobile edge. arXiv preprint arXiv:1804.08333. Cited by: §3.1, §4.1, Table 2.
  • [93] OpenMined (2019) PySyft-tensorflow. Note: https://github.com/OpenMined/PySyft-TensorFlow Cited by: 1st item.
  • [94] M. Pathak, S. Rane, and B. Raj (2010) Multiparty differential privacy via aggregation of locally trained classifiers. In Advances in Neural Information Processing Systems, pp. 1876–1884. Cited by: §5.2.1, §5.2.1, Table 2.
  • [95] J. Qian, S. Sengupta, and L. K. Hansen (2019) Active learning solution on distributed edge computing. arXiv preprint arXiv:1906.10718. Cited by: §3.2.
  • [96] W. Raghupathi and V. Raghupathi (2014) Big data analytics in healthcare: promise and potential. Health information science and systems 2 (1), pp. 3. Cited by: §1.
  • [97] S. Ramaswamy, R. Mathews, K. Rao, and F. Beaufays (2019) Federated learning for emoji prediction in a mobile keyboard. arXiv preprint arXiv:1906.04329. Cited by: §1, §1, §7.2, §7.2, §7.
  • [98] D. Rehak, P. Dodds, and L. Lannom (2005) A model and infrastructure for federated learning content repositories. In Interoperability of Web-Based Educational Systems Workshop, Vol. 143. Cited by: §1.
  • [99] J. Ren, G. Yu, and G. Ding (2019) Accelerating dnn training in wireless federated edge learning system. arXiv preprint arXiv:1905.09712. Cited by: §4.3, Table 2.
  • [100] P. Richtárik and M. Takáč (2016) Distributed coordinate descent method for learning with big data. The Journal of Machine Learning Research 17 (1), pp. 2657–2681. Cited by: §1.
  • [101] A. G. Roy, S. Siddiqui, S. Pölsterl, N. Navab, and C. Wachinger (2019) BrainTorrent: a peer-to-peer environment for decentralized federated learning. arXiv preprint arXiv:1905.06731. Cited by: §4.4, §4.4, Table 2.
  • [102] B. I. Rubinstein, P. L. Bartlett, L. Huang, and N. Taft (2009) Learning in a large function space: privacy-preserving mechanisms for svm learning. arXiv preprint arXiv:0911.5708. Cited by: §5.2.2.
  • [103] T. Ryffel, A. Trask, M. Dahl, B. Wagner, J. Mancuso, D. Rueckert, and J. Passerat-Palmbach (2018) A generic framework for privacy preserving deep learning. arXiv preprint arXiv:1811.04017. Cited by: 1st item.
  • [104] S. Samarakoon, M. Bennis, W. Saad, and M. Debbah (2018) Federated learning for ultra-reliable low-latency v2v communications. In 2018 IEEE Global Communications Conference (GLOBECOM), pp. 1–7. Cited by: §1, §7.2, §7.
  • [105] F. Sattler, S. Wiedemann, K. Müller, and W. Samek (2019) Robust and communication-efficient federated learning from non-iid data. arXiv preprint arXiv:1903.02891. Cited by: §4.2, §4, Table 2.
  • [106] M. Shayan, C. Fung, C. J. Yoon, and I. Beschastnikh (2018) Biscotti: a ledger for private and secure peer-to-peer machine learning. arXiv preprint arXiv:1811.09904. Cited by: §4.4, §4.4, Table 2.
  • [107] R. Shokri and V. Shmatikov (2015) Privacy-preserving deep learning. In Proceedings of the 22nd ACM SIGSAC conference on computer and communications security, pp. 1310–1321. Cited by: §4.1, §5.2.2, §5.2.2, §5.2, Table 2.
  • [108] S. Silva, B. Gutman, E. Romero, P. M. Thompson, A. Altmann, and M. Lorenzi (2018) Federated learning in distributed medical databases: meta-analysis of large-scale subcortical brain data. arXiv preprint arXiv:1810.08553. Cited by: §7.1.
  • [109] K. C. Sim, P. Zadrazil, and F. Beaufays (2019) An investigation into on-device personalization of end-to-end automatic speech recognition models. arXiv preprint arXiv:1909.06678. Cited by: 4th item.
  • [110] V. Smith, C. Chiang, M. Sanjabi, and A. S. Talwalkar (2017) Federated multi-task learning. In Advances in Neural Information Processing Systems, pp. 4424–4434. Cited by: §2, §3.1, §3.2, §4, Table 2, §6.1, Table 3.
  • [111] J. M. Such and N. Criado (2018) Multiparty privacy in social media.. Commun. ACM 61 (8), pp. 74–81. Cited by: §5.2.
  • [112] O. Thakkar, G. Andrew, and H. B. McMahan (2019) Differentially private learning with adaptive clipping. arXiv preprint arXiv:1905.03871. Cited by: §5.2.2, Table 2.
  • [113] K. Thomas, C. Grier, and D. M. Nicol (2010) Unfriendly: multi-party privacy risks in social networks. In International Symposium on Privacy Enhancing Technologies Symposium, pp. 236–252. Cited by: §5.2.
  • [114] S. Truex, N. Baracaldo, A. Anwar, T. Steinke, H. Ludwig, and R. Zhang (2018) A hybrid approach to privacy-preserving federated learning. arXiv preprint arXiv:1812.03224. Cited by: §5.2.2, Table 2, §5.
  • [115] G. Ulm, E. Gustavsson, and M. Jirstrand (2018) Functional federated learning in erlang (ffl-erl). In International Workshop on Functional and Constraint Logic Programming, pp. 162–178. Cited by: 5th item.
  • [116] G. Ulm, E. Gustavsson, and M. Jirstrand (2019) OODIDA: on-board/off-board distributed data analytics for connected vehicles. arXiv preprint arXiv:1902.00319. Cited by: 5th item.
  • [117] H. Van Hasselt, A. Guez, and D. Silver (2016) Deep reinforcement learning with double q-learning. In Thirtieth AAAI conference on artificial intelligence, Cited by: §4.1, Table 2.
  • [118] P. Vepakomma, O. Gupta, T. Swedish, and R. Raskar (2018) Split learning for health: distributed deep learning without sharing raw patient data. arXiv preprint arXiv:1812.00564. Cited by: §7.1.
  • [119] P. Voigt and A. Von dem Bussche (2017) The eu general data protection regulation (gdpr). A Practical Guide, 1st Ed., Cham: Springer International Publishing. Cited by: §1.
  • [120] G. Wang (2019) Interpret federated learning with shapley values. arXiv preprint arXiv:1905.04519. Cited by: §4.1, Table 2.
  • [121] X. Wang, Y. Han, C. Wang, Q. Zhao, X. Chen, and M. Chen (2018) In-edge ai: intelligentizing mobile edge computing, caching and communication by federated learning. arXiv preprint arXiv:1809.07857. Cited by: §4.3.
  • [122] Z. Wang, M. Song, Z. Zhang, Y. Song, Q. Wang, and H. Qi (2019) Beyond inferring class representatives: user-level privacy leakage from federated learning. In IEEE INFOCOM 2019-IEEE Conference on Computer Communications, pp. 2512–2520. Cited by: §6.
  • [123] B. E. Woodworth, J. Wang, A. Smith, B. McMahan, and N. Srebro (2018) Graph oracle models, lower bounds, and gaps for parallel stochastic optimization. In Advances in Neural Information Processing Systems, pp. 8496–8506. Cited by: Table 2.
  • [124] C. Xie, S. Koyejo, and I. Gupta (2019) Asynchronous federated optimization. arXiv preprint arXiv:1903.03934. Cited by: Table 2, §6.1.3, Table 3.
  • [125] M. Yaghini, B. Kulynych, and C. Troncoso (2019) Disparate vulnerability: on the unfairness of privacy attacks against machine learning. arXiv preprint arXiv:1906.00389. Cited by: §5.1.2.
  • [126] K. Yang, T. Jiang, Y. Shi, and Z. Ding (2018) Federated learning via over-the-air computation. arXiv preprint arXiv:1812.11750. Cited by: §4.1, Table 2.
  • [127] Q. Yang, Y. Liu, T. Chen, and Y. Tong (2019) Federated machine learning: concept and applications. ACM Trans. Intell. Syst. Technol. 10 (2), pp. 12:1–12:19. ISSN 2157-6904. Cited by: §1, Table 2, §5.
  • [128] T. Yang, G. Andrew, H. Eichner, H. Sun, W. Li, N. Kong, D. Ramage, and F. Beaufays (2018) Applied federated learning: improving google keyboard query suggestions. arXiv preprint arXiv:1812.02903. Cited by: §1, §1, §7.2, §7.2, §7.
  • [129] N. Yoshida, T. Nishio, M. Morikura, K. Yamamoto, and R. Yonetani (2019) Hybrid-fl: cooperative learning mechanism using non-iid data in wireless networks. arXiv preprint arXiv:1905.07210. Cited by: §3.1, Table 2.
  • [130] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, T. N. Hoang, and Y. Khazaeni (2019) Bayesian nonparametric federated learning of neural networks. arXiv preprint arXiv:1905.12022. Cited by: §6.
  • [131] Y. Zhao, J. Zhao, L. Jiang, R. Tan, and D. Niyato (2019) Mobile edge computing, blockchain and reputation-based crowdsourcing iot federated learning: a secure, decentralized and privacy-preserving system. arXiv preprint arXiv:1906.10893. Cited by: §1, §7.2, §7.
  • [132] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, and V. Chandra (2018) Federated learning with non-iid data. arXiv preprint arXiv:1806.00582. Cited by: §3.1, §3.1, §3.1, §3, Table 2.
  • [133] H. Zhu and Y. Jin (2019) Multi-objective evolutionary federated learning. IEEE transactions on neural networks and learning systems. Cited by: §4.2, Table 2.
  • [134] H. H. Zhuo, W. Feng, Q. Xu, Q. Yang, and Y. Lin (2019) Federated reinforcement learning. arXiv preprint arXiv:1901.08277. Cited by: §4.3.

Appendix A Preliminaries

We recall some standard definitions and assumptions for stochastic optimization, together with assumptions specific to individual federated learning studies.

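Definition 1 ($L$-Lipschitz).

A function $f$ is $L$-Lipschitz if for any $x, y$ in its domain,

$$\|f(x) - f(y)\| \le L\,\|x - y\|.$$

Remark 1.

If a function $f$ is $L$-Lipschitz then its dual (convex conjugate) $f^*$ will be $L$-bounded, i.e., for any $\alpha$ such that $\|\alpha\| > L$, we have $f^*(\alpha) = +\infty$.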

Definition 2 ($\beta$-smooth).

A differentiable function $f$ is $\beta$-smooth if for all $x, y$,

$$\|\nabla f(x) - \nabla f(y)\| \le \beta\,\|x - y\|.$$

Definition 3 (Convex).

A differentiable function $f$ is convex if for all $x, y$,

$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle.$$

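Definition 4 ($\lambda$-strongly convex).

A differentiable function $f$ is $\lambda$-strongly convex with positive coefficient $\lambda$ if for all $x, y$,

$$f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle + \frac{\lambda}{2}\,\|y - x\|^2.$$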

Definition 5 ($\mu$-weakly convex).

A differentiable function $f$ is $\mu$-weakly convex if the function $g$ with $g(x) = f(x) + \frac{\mu}{2}\|x\|^2$ is convex, where $\mu \ge 0$.

Remark 2.

Note that when $f$ is $\mu$-weakly convex, $f$ is convex if $\mu = 0$, and potentially non-convex if $\mu > 0$.
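The definitions above can be spot-checked numerically. The following is a minimal sketch rather than anything from the surveyed papers: it assumes one-dimensional functions with known derivatives, writes the smoothness, strong-convexity and weak-convexity constants as `beta`, `lam` and `mu` (names chosen here for illustration), and tests the defining inequalities only on a finite grid, so a passing check is evidence and not a proof.

```python
import itertools
import math

TOL = 1e-9  # slack for floating-point round-off


def pairs(points):
    """All ordered pairs (x, y) drawn from the grid."""
    return itertools.product(points, repeat=2)


def is_convex(f, grad, points):
    """First-order test: f(y) >= f(x) + f'(x) * (y - x)."""
    return all(f(y) >= f(x) + grad(x) * (y - x) - TOL for x, y in pairs(points))


def is_strongly_convex(f, grad, lam, points):
    """First-order test with the extra quadratic term (lam / 2) * (y - x)^2."""
    return all(
        f(y) >= f(x) + grad(x) * (y - x) + 0.5 * lam * (y - x) ** 2 - TOL
        for x, y in pairs(points)
    )


def is_smooth(grad, beta, points):
    """Gradient-Lipschitz test: |f'(x) - f'(y)| <= beta * |x - y|."""
    return all(abs(grad(x) - grad(y)) <= beta * abs(x - y) + TOL for x, y in pairs(points))


def is_weakly_convex(f, grad, mu, points):
    """f is mu-weakly convex if g(x) = f(x) + (mu / 2) * x^2 is convex."""
    return is_convex(
        lambda x: f(x) + 0.5 * mu * x * x,
        lambda x: grad(x) + mu * x,
        points,
    )


points = [i / 10 for i in range(-30, 31)]  # grid on [-3, 3]


def quad(x):
    return 1.5 * x * x  # second derivative is 3 everywhere


def quad_grad(x):
    return 3.0 * x


def cos_grad(x):
    return -math.sin(x)  # derivative of cos


quad_is_3_strongly_convex = is_strongly_convex(quad, quad_grad, 3.0, points)  # True
quad_is_3_smooth = is_smooth(quad_grad, 3.0, points)                          # True
cos_is_convex = is_convex(math.cos, cos_grad, points)                         # False
cos_is_1_weakly_convex = is_weakly_convex(math.cos, cos_grad, 1.0, points)    # True
```

In this toy setting the quadratic is both 3-smooth and 3-strongly convex, while the cosine is non-convex but becomes convex once the term $x^2/2$ is added, which is the situation the weak-convexity remark describes.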

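Assumption 1 (Bounded Second Moment).

For every device $k$ and every model $w$, the stochastic gradient $\nabla F_k(w;\xi)$ of the local objective $F_k$ satisfies

$$\mathbb{E}_{\xi}\,\|\nabla F_k(w;\xi)\|^2 \le G^2.$$

Assumption 2 (Bounded Gradient Variance).

For every device $k$ and every model $w$, the stochastic gradient $\nabla F_k(w;\xi)$ of the local objective $F_k$ satisfies

$$\mathbb{E}_{\xi}\,\|\nabla F_k(w;\xi) - \nabla F_k(w)\|^2 \le \sigma^2.$$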
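To make the two boundedness assumptions concrete, the sketch below evaluates both quantities exactly for a toy problem. Everything in it is an assumption made for illustration: a single device, a scalar least-squares loss $\tfrac{1}{2}(wx - y)^2$, a hypothetical three-sample dataset, uniform sampling of one example per step, and the names `G2` and `sigma2` for the two bounds.

```python
def stochastic_grads(w, data):
    """Per-sample gradients of 0.5 * (w * x - y) ** 2 with respect to w."""
    return [(w * x - y) * x for x, y in data]


def full_gradient(grads):
    """Expected stochastic gradient under uniform sampling."""
    return sum(grads) / len(grads)


def second_moment(grads):
    """E[g^2] under uniform sampling (quantity bounded in Assumption 1)."""
    return sum(g * g for g in grads) / len(grads)


def variance(grads):
    """E[(g - E[g])^2] under uniform sampling (quantity bounded in Assumption 2)."""
    mean = full_gradient(grads)
    return sum((g - mean) ** 2 for g in grads) / len(grads)


data = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]  # hypothetical (x, y) pairs
grid = [i / 4 for i in range(-8, 9)]         # models w in [-2, 2]

# Smallest constants that satisfy the two assumptions on this grid.
G2 = max(second_moment(stochastic_grads(w, data)) for w in grid)
sigma2 = max(variance(stochastic_grads(w, data)) for w in grid)
```

Because the second moment equals the squared full gradient plus the variance, `sigma2` can never exceed `G2`. For this quadratic loss both quantities grow without bound as $|w|$ grows, so the bounds here only hold because the grid restricts $w$ to a bounded set.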

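Assumption 3 (Bounded Dissimilarity [78]).

Let $F_k$ denote the local objective of device $k$ and $f$ the global objective. For some $\epsilon > 0$, there exists a $B_\epsilon$ such that for all the points $w \in \mathcal{S}_\epsilon^c = \{w \mid \|\nabla f(w)\|^2 > \epsilon\}$, we have

$$B(w) \le B_\epsilon,$$

where $B(w) = \sqrt{\mathbb{E}_k\big[\|\nabla F_k(w)\|^2\big] \,/\, \|\nabla f(w)\|^2}$ for $\|\nabla f(w)\| \neq 0$.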
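The dissimilarity measure can be illustrated with a small computation. This is a sketch under stated assumptions, not the setup of [78]: each device $k$ holds a scalar quadratic objective $\tfrac{1}{2}a_k(w - c_k)^2$, devices are sampled with probabilities $p_k$, the global objective is the $p_k$-weighted average of the local ones, and $B(w)$ is taken to be the square root of the expected squared local gradient norm divided by the squared global gradient norm. The function and variable names are hypothetical.

```python
import math


def dissimilarity(devices, w):
    """B(w) for devices given as (p_k, a_k, c_k) with F_k(w) = 0.5 * a_k * (w - c_k)^2."""
    # Gradient of the global objective f(w) = sum_k p_k * F_k(w).
    global_grad = sum(p * a * (w - c) for p, a, c in devices)
    if global_grad == 0.0:
        raise ValueError("B(w) is undefined where the global gradient vanishes")
    # E_k[ ||grad F_k(w)||^2 ] under the device sampling distribution.
    mean_sq_local = sum(p * (a * (w - c)) ** 2 for p, a, c in devices)
    return math.sqrt(mean_sq_local) / abs(global_grad)


# Two devices with identical objectives (an IID-like setting).
iid = [(0.5, 1.0, 0.0), (0.5, 1.0, 0.0)]
# Two devices whose local minimizers differ (a non-IID setting).
non_iid = [(0.5, 1.0, -1.0), (0.5, 1.0, 1.0)]

b_iid = dissimilarity(iid, 2.0)           # local gradients agree, so B(w) = 1
b_far = dissimilarity(non_iid, 2.0)       # sqrt(5) / 2, mildly above 1
b_near = dissimilarity(non_iid, 0.1)      # large: w is close to the global minimizer
```

In this toy example $B(w)$ equals 1 when all local objectives coincide and grows sharply as $w$ approaches the stationary point of the global objective, which is one way to see why the assumption only asks for a bound on points where the squared global gradient norm exceeds $\epsilon$.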

Assumption 4 ().

In [45], central server cluster to obtain . are separated:


where and represent that is Lipschitz and -strongly convex.