Federated learning has been a hot research area in enabling the collaborative training of machine learning models among different organizations under privacy restrictions. As researchers try to support more machine learning models with different privacy-preserving approaches, there is a need for systems and infrastructures that ease the development of various federated learning algorithms. Just as deep learning systems such as Caffe, PyTorch, and TensorFlow have boosted the development of deep learning algorithms, federated learning systems are equally important, and they face challenges such as impractical system assumptions, scalability, and efficiency. Inspired by federated systems in other fields such as databases and cloud computing, we investigate the characteristics of existing federated learning systems. We find that two important features of federated systems in other fields, i.e., heterogeneity and autonomy, are rarely considered in existing federated learning systems. Moreover, we provide a thorough categorization of federated learning systems according to six aspects: data partition, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. The categorization can help the design of federated learning systems, as shown in our case studies. Lastly, we systematically compare the existing federated learning systems and present future research opportunities and directions.
Many machine learning algorithms are data hungry, and in reality, data are dispersed over different organizations under the protection of privacy restrictions. Due to these factors, federated learning (FL) has become a hot research topic in machine learning and data mining. For example, the data of different hospitals are isolated and become “data islands”. Since the data in each data island are limited in size or in the characteristics they cover, a single hospital may not be able to train a high-quality model with good predictive accuracy for a specific task. Ideally, hospitals would benefit more if they could collaboratively train a machine learning model on the union of their data. However, the data cannot simply be shared among the hospitals due to various policies and regulations. Such “data islands” are common in many areas such as finance, government, and supply chains. Policies such as the General Data Protection Regulation (GDPR) stipulate rules on data sharing among different organizations. Thus, it is challenging to develop a federated learning system that achieves good predictive accuracy while protecting data privacy.
Many efforts have been devoted to implementing federated learning algorithms that support effective machine learning models in the federated setting. Specifically, researchers try to support more machine learning models with different privacy-preserving approaches, including deep neural networks [96, 172, 16, 123, 99], gradient boosting decision trees (GBDTs) [175, 35], logistic regression [110, 33], and support vector machines (SVMs). For instance, Nikolaenko et al. and Chen et al. proposed approaches to conduct FL based on linear regression. Hardy et al. implemented an FL framework to train a logistic regression model. Since GBDTs have become very successful in recent years [31, 159], corresponding federated learning systems (FLSs) have also been proposed by Zhao et al. and Cheng et al. Moreover, there are also many neural network based FLSs. Google has proposed a scalable production system which enables tens of millions of devices to train a deep neural network. Yurochkin et al. developed a probabilistic FL framework for neural networks by applying Bayesian nonparametric machinery. Several methods combine FL with machine learning techniques such as multi-task learning and transfer learning. Smith et al. combined FL with multi-task learning to allow multiple parties to complete separate tasks. To address the scenario where the label information only exists in one party, Yang et al. adopted transfer learning to collaboratively learn a model.
Among the studies on customizing machine learning algorithms for the federated context, we have identified a few commonly used methods to provide privacy guarantees. One common method is to use cryptographic techniques such as secure multi-party computation and homomorphic encryption. The other popular method is differential privacy, which adds noise to the model parameters to protect individual records. For example, Google’s federated learning system adopts both secure aggregation and differential privacy to enhance the privacy guarantee.
As there are common methods and building blocks for federated learning algorithms, it is possible to develop systems and infrastructures that ease the development of various federated learning algorithms. Systems and infrastructures allow algorithm developers to reuse the common building blocks and avoid building every algorithm from scratch. Just as deep learning systems such as Caffe, PyTorch, and TensorFlow have boosted the development of deep learning algorithms, federated learning systems (FLSs) are equally important for the success of federated learning. However, building federated learning systems faces challenges such as impractical system assumptions, scalability, and efficiency.
In this paper, we survey the existing FLSs with a focus on drawing analogies with, and differences from, traditional federated systems in other fields such as databases and cloud computing.
First, we consider heterogeneity and autonomy as two important characteristics of FLSs, which are often ignored in existing federated learning designs. Second, we categorize FLSs based on six aspects: data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. These aspects can direct the design of a federated learning system. Furthermore, based on these aspects, we compare the existing FLSs and identify their key limitations. Last, to make FL more practical and powerful, we present future research directions. We believe that systems and infrastructures are essential for the success of federated machine learning. More work has to be carried out to address the system research issues in security and privacy, efficiency, and scalability.
There have been several surveys and white papers on federated learning. A seminal survey written by Yang et al. introduced the basics and concepts of federated learning, and further proposed a comprehensive secure federated learning framework. Later, WeBank published a white paper introducing the background and related work in federated learning and, most importantly, presenting a development roadmap that includes establishing local and global standards, building use cases, and forming an industrial data alliance. The survey and the white paper mainly target a relatively small number of parties, which are typically enterprise data owners. Lim et al. conducted a survey of federated learning specific to mobile edge computing. Li et al. summarized challenges and future directions of federated learning in massive networks of mobile and edge devices.
In comparison with the previous surveys, the main contributions of this paper are as follows.
(1) By analogy with previous federated systems, we analyze two dimensions of federated learning systems: heterogeneity and autonomy. These dimensions can play an important role in the design of FLSs.
(2) We provide a comprehensive taxonomy of federated learning systems along six aspects: data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. The taxonomy can be used to direct the design of FLSs.
(3) We summarize the characteristics of existing FLSs and present their limitations, together with a vision for future generations of FLSs.
In this section, we review key conventional federated systems and present the federated learning systems.
The concept of federation can be found with its counterparts in the real world such as business and sports. The main characteristic of federation is cooperation. Federation not only commonly appears in society, but also plays an important role in computing. In computer science, federated computing systems have been an attractive area of research. Around 1990, there were many studies on federated database systems (FDBSs) . An FDBS is a collection of autonomous databases cooperating for mutual benefit. As pointed out in a previous study , three important components of an FDBS are autonomy, heterogeneity, and distribution. First, a database system (DBS) that participates in an FDBS is autonomous, which means it is under separate and independent control. The parties can still manage the data without the FDBS. Second, differences in hardware, system software, and communication systems are allowed in an FDBS. A powerful FDBS can run in heterogeneous hardware or software environments. Last, due to the existence of multiple DBSs before an FDBS is built, the data distribution may differ in different DBSs. An FDBS can benefit from the data distribution if designed properly. Generally, FDBSs focus on the management of distributed data. More recently, with the development of cloud computing, many studies have been done for federated cloud computing . A federated cloud (FC) is the deployment and management of multiple external and internal cloud computing services. The concept of cloud federation enables further reduction of costs due to partial outsourcing to more cost-efficient regions. Resource migration and resource redundancy are two basic features of federated clouds . First, resources may be transferred from one cloud provider to another. Migration enables the relocation of resources. Second, redundancy allows concurrent usage of similar service features in different domains. 
For example, the data can be broken down and processed at different providers following the same computation logic. Overall, the scheduling of different resources is a key factor in the design of a federated cloud system.
While machine learning, especially deep learning, has attracted renewed attention recently, the combination of federation and machine learning is emerging as a new and hot research topic. When it comes to federated learning, the goal is to conduct collaborative machine learning among different organizations under restrictions on user privacy. Here we give a formal definition of federated learning systems. We assume that there are $N$ different parties, and each party is denoted by $T_i$, where $i \in [1, N]$. We use $D_i$ to denote the data of $T_i$. For the non-federated setting, each party $T_i$ uses only its local data $D_i$ to train a machine learning model $M_i$. The performance of $M_i$ is denoted as $P_i$. For the federated setting, all the parties jointly train a model $M_f$ while each party $T_i$ protects its data $D_i$ according to its specific privacy restrictions. The performance of $M_f$ is denoted as $P_f$. Then, for a valid federated learning system, there exists $i \in [1, N]$ such that $P_f > P_i$. Note that the above definition only requires that at least one party achieves a higher model utility from the FLS. Even though some parties may not get a better model from an FLS, they may make an agreement with the other parties to obtain other kinds of benefits (e.g., money).
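The validity condition in this definition can be written compactly. The notation below is assumed for illustration: $N$ parties, $P_i$ for the performance of the model that party $i$ trains on its local data alone, and $P_f$ for the performance of the jointly trained federated model.

```latex
\text{An FLS is valid if}\quad \exists\, i \in \{1, \dots, N\}\ \text{such that}\ P_f > P_i .
```

A stricter requirement, $P_f > P_i$ for every $i$, would guarantee that all parties benefit, but the definition above deliberately asks only for the weaker existential form.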
Figure 1 shows the frameworks of federated database systems, federated clouds, and federated learning systems. There are some similarities and differences between federated learning systems and conventional federated systems. On the one hand, the concept of federation still applies. The common and basic idea is about the cooperation of multiple independent parties. Therefore, the perspective of considering heterogeneity and autonomy among the parties can still be applied in FLSs. Furthermore, some factors in the design of distributed systems are still important for FLSs. For example, how the data are shared between the parties can influence the efficiency of the systems. On the other hand, these federated systems have differences. While FDBSs focus on the management of distributed data and FCs focus on the scheduling of the resources, FLSs care more about the secure computation among multiple parties. FLSs induce new challenges such as the algorithm designs of the distributed training and the data protection under the privacy restrictions. With these findings, we analyze the existing FLSs and figure out the potential future directions of FLSs in the following sections.
While existing FLSs pay much attention to user privacy and machine learning models, two important characteristics of previous federated systems (i.e., heterogeneity and autonomy) are rarely addressed.
We consider heterogeneities between different parties in three aspects: data, privacy requirements and tasks.
Parties usually have different data distributions. For example, due to the ozone hole, countries in the Southern Hemisphere may have more skin cancer patients than countries in the Northern Hemisphere. Thus, hospitals in different countries tend to have very different distributions of patient records. The difference in data distributions may be a very important factor in the design of FLSs. The parties can potentially gain a lot from FL if they hold diverse, partially representative distributions for a specific task. Furthermore, if party Alice has fully representative data for task A and party Bob has fully representative data for task B, Alice and Bob can agree to conduct FL for both tasks, so that Alice improves her performance on task B and Bob improves his performance on task A. Besides data distributions, the size of the data may also differ among parties. FL should enable collaboration among parties of different scales. Furthermore, for fairness, the parties that provide more data should benefit more from FL.
Different parties usually face different policies and regulations that restrict data sharing. For example, companies in the EU have to comply with the General Data Protection Regulation (GDPR), while China recently issued a new regulation, the Personal Information Security Specification (PISS). Furthermore, even in the same region, different parties may still have different detailed privacy rules. The privacy restrictions play an important role in the design of FLSs. Generally, the parties are able to gain more from FL if the privacy restrictions are looser. Many studies assume that the parties have the same privacy level [33, 35]. The scenario where the parties have different privacy restrictions is more complicated and more meaningful. It is very challenging to design an FLS that maximizes the utilization of each party's data without violating their respective privacy restrictions.
The tasks of different parties may also vary. A bank may want to know whether a person can repay the loan but an insurance company may want to know whether the person will buy their products. The bank and the insurance company can also adopt FL although they want to perform different tasks. Multiple machine learning models may be learned during the FL process. Techniques like multi-task learning can also be adopted in FL .
The parties are often autonomous and under independent control. These parties are willing to share the information with the others only if they retain control. It is important to address the autonomy property when designing an FLS.
A party can decide whether to associate or disassociate itself from FL and can participate in one or more FLSs. Ideally, an FLS should be robust enough to tolerate the entry or departure of any party. Thus, the FLS should not fully depend on any single party. Since this goal is hard to achieve, in practice the parties can make an agreement that regulates entry and departure to ensure that the FLS works properly.
A party should have the ability to decide how much information to communicate to others. The party can also choose how much of its data participates in FL. An FLS should be able to handle different communication scales during the learning process. As we have mentioned in Section 3.1.1, the benefit of a party should be commensurate with its contribution. A party may gain more if it is willing to share more information, while the risk of exposing user privacy may also be higher.
Through the analysis of many application scenarios in building FLSs, we classify FLSs by six aspects: data distribution, machine learning model, privacy mechanism, communication architecture, scale of federation, and motivation of federation. These aspects include factors common to previous federated systems (e.g., data distribution and communication architecture) and considerations unique to FLSs (e.g., machine learning model and privacy mechanism). Furthermore, these aspects can be used to direct the design of FLSs. Figure 2 summarizes the taxonomy of FLSs.
Let us explain the six aspects with an intuitive example. Hospitals in different regions want to conduct federated learning to improve the performance of a lung cancer prediction task. The six aspects then have to be considered to design such a federated learning system. First, we should consider how the patient records are distributed among the hospitals. While the hospitals may have different patients, they may also hold different knowledge about a common patient. Thus, we have to utilize both the non-overlapping instances and the non-overlapping features in federated learning. Second, we should figure out which machine learning model to adopt for the task. For example, we may adopt gradient boosting decision trees, which perform well on many classification problems. Third, we have to decide which techniques to use for privacy protection. Since the patient records cannot be exposed to the public, differential privacy is an option to achieve the privacy guarantee. Fourth, the communication architecture also matters. We may need a centralized server to control the updates of the model. Fifth, the number of hospitals and the computation power of each hospital should also be studied. Unlike federated learning on mobile devices, the federation in this scenario is relatively small and stable. Last, we should consider the incentive for each party. A clear and straightforward motivation for the hospitals is to increase the accuracy of lung cancer prediction. It is therefore important that federated learning yields a highly accurate machine learning model. Next, we discuss these aspects in detail.
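To make the six aspects concrete, the hospital example can be recorded as a small data structure. This is only an illustrative sketch: the class and field names are hypothetical, and the category values follow the options that this section discusses for each aspect.

```python
from dataclasses import dataclass
from enum import Enum


class DataPartition(Enum):
    HORIZONTAL = "horizontal"
    VERTICAL = "vertical"
    HYBRID = "hybrid"


class Model(Enum):
    LINEAR_MODEL = "LM"
    DECISION_TREE = "DT"
    NEURAL_NETWORK = "NN"


class PrivacyMechanism(Enum):
    MODEL_AGGREGATION = "MA"
    CRYPTOGRAPHIC_METHODS = "CM"
    DIFFERENTIAL_PRIVACY = "DP"


class Communication(Enum):
    CENTRALIZED = "centralized"
    DISTRIBUTED = "distributed"


class Scale(Enum):
    PRIVATE = "private"  # few parties, each with much data and compute
    PUBLIC = "public"    # many parties, each with little data and compute


class Motivation(Enum):
    REGULATION = "regulation"
    INCENTIVE = "incentive"


@dataclass(frozen=True)
class FLSDesign:
    """One point in the six-aspect design space (hypothetical container)."""
    data_partition: DataPartition
    model: Model
    privacy: frozenset  # a system may combine several privacy mechanisms
    communication: Communication
    scale: Scale
    motivation: Motivation


# The lung cancer scenario: hospitals differ in both patients and features,
# train GBDTs, protect records with differential privacy, use a central
# server, form a small and stable federation, and join for better accuracy.
hospital_fls = FLSDesign(
    data_partition=DataPartition.HYBRID,
    model=Model.DECISION_TREE,
    privacy=frozenset({PrivacyMechanism.DIFFERENTIAL_PRIVACY}),
    communication=Communication.CENTRALIZED,
    scale=Scale.PRIVATE,
    motivation=Motivation.INCENTIVE,
)
```

Writing a design down in this form makes it easy to see which aspects two systems share and where they differ.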
Based on how data are distributed over the sample and feature spaces, FLSs can typically be categorized into horizontal, vertical, and hybrid FLSs.
In horizontal FL, the datasets of different organizations have the same feature space but little intersection on the sample space. For example, Google’s federated learning system uses a server to aggregate the information from the devices and adopts differential privacy and secure aggregation to enhance the privacy guarantee. Wake-word recognition, such as ‘Hey Siri’ and ‘OK Google’, is a typical application of horizontal partitioning because each user speaks the same sentence with a different voice.
In vertical FL, the datasets of different organizations have the same sample space but differ in the feature space. Vaidya et al. have proposed multiple secure models on vertically partitioned data, including association rule mining, k-means [147], and decision trees. A vertical FLS usually adopts entity alignment techniques to collect the overlapping samples of the organizations. The overlapping data are then used to train the machine learning model using encryption methods. Cheng et al. proposed a lossless vertical FLS that enables parties to collaboratively train gradient boosting decision trees. They use privacy-preserving entity alignment to find the common users of two parties, whose gradients are used to jointly train the decision trees. Cooperation among government agencies can be treated as a case of vertical partitioning. Suppose the department of taxation requires the housing data of residents, which is stored in the department of housing, to formulate tax policies. Meanwhile, the department of housing also needs the tax information of residents, which is kept by the department of taxation, to adapt its housing policies. These two departments share the same sample space (i.e., all the residents) but each of them holds only part of the features (e.g., housing data or tax data).
While existing FLSs mostly focus on one kind of partition, in many other applications the partition of data among the parties may be a hybrid of horizontal and vertical partitioning. Let us take a cancer diagnosis system as an example. A group of hospitals wants to build a federated system for cancer diagnosis, but each hospital has different patients as well as different kinds of medical examination results. Transfer learning is a possible solution for such scenarios. Liu et al. proposed a secure federated transfer learning system which can learn a representation among the features of the parties using common instances.
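The difference between horizontal and vertical partitioning, and the role of entity alignment, can be illustrated on a toy table. The sketch below is an assumption-laden illustration: the resident IDs, feature names, and helper functions are hypothetical, and the alignment step intersects IDs in the clear, whereas a real vertical FLS would use a privacy-preserving alignment protocol.

```python
# Toy table: one row per resident, one column per feature (made-up values).
records = {
    "r1": {"income": 50, "tax": 5, "house_size": 80},
    "r2": {"income": 70, "tax": 9, "house_size": 120},
    "r3": {"income": 40, "tax": 3, "house_size": 60},
    "r4": {"income": 90, "tax": 14, "house_size": 150},
}


def horizontal_split(table, ids_a):
    """Same feature space, disjoint samples: party A gets the rows in ids_a."""
    a = {k: v for k, v in table.items() if k in ids_a}
    b = {k: v for k, v in table.items() if k not in ids_a}
    return a, b


def vertical_split(table, features_a):
    """Same sample space, disjoint features: party A gets the columns in features_a."""
    a = {k: {f: x for f, x in v.items() if f in features_a} for k, v in table.items()}
    b = {k: {f: x for f, x in v.items() if f not in features_a} for k, v in table.items()}
    return a, b


def align_entities(party_a, party_b):
    """Naive entity alignment: plain ID intersection (NOT privacy-preserving)."""
    return sorted(party_a.keys() & party_b.keys())


# Horizontal: two parties hold different residents with identical features.
region_a, region_b = horizontal_split(records, {"r1", "r2"})

# Vertical: the tax and housing departments hold different features.
tax_dept, housing_dept = vertical_split(records, {"income", "tax"})
del housing_dept["r4"]  # suppose the housing department lacks one resident

# Only the residents known to both departments can be used for joint training.
common_ids = align_entities(tax_dept, housing_dept)
```

In the vertical case, training proceeds only on `common_ids`, which is why the quality of entity alignment directly bounds how much data a vertical FLS can use.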
While there are many kinds of machine learning models, here we consider the three kinds that current FLSs mainly support: linear models, decision trees, and neural networks. Linear models include basic statistical methods such as linear regression and logistic regression. There are many well-developed systems for linear regression and logistic regression [110, 63]. These linear models are easy to learn compared with more complex models (e.g., neural networks). A tree based FLS is designed to train a single decision tree or multiple decision trees (e.g., gradient boosting decision trees and random forests). GBDTs have become especially popular recently and perform very well in many classification and regression tasks. Zhao et al. and Cheng et al. proposed FLSs for GBDTs on horizontally and vertically partitioned data, respectively. A neural network based system aims to train neural networks, which are an extremely hot topic in machine learning. There are many studies on federated stochastic gradient descent [99, 16], which can be used to learn the parameters of neural networks. Generally, the designs of FLSs differ for different machine learning models. It is still challenging to propose a practical tree based or neural network based FLS. Moreover, due to the fast development of machine learning, FLSs lag behind in supporting state-of-the-art models.
Privacy has been shown to be an important issue in machine learning, and there have been many attacks against machine learning models [152, 49, 24, 134, 107, 102]. There are also many privacy mechanisms, such as differential privacy and $k$-anonymity, which provide different privacy guarantees. The characteristics of existing privacy mechanisms are summarized in a previous survey. Here we introduce three popular approaches that are adopted in current FLSs for data protection: model aggregation, cryptographic methods, and differential privacy.
Model aggregation (or model averaging) [99, 101] is a widely used framework that avoids the communication of raw data in federated learning. Specifically, a global model is trained by aggregating the model parameters from local parties. A typical algorithm is federated averaging based on stochastic gradient descent (SGD), which aggregates the locally computed models and then updates the global model in each round. PATE combines multiple black-box local models to learn a global model, which predicts an output chosen by noisy voting among all of the local models. Yurochkin et al. developed a probabilistic FL framework by applying Bayesian nonparametric machinery. They use a matching procedure informed by a Beta-Bernoulli process to generate a global model by matching the neurons in the local models. Smith et al. combined FL with multi-task learning to allow multiple parties to locally learn models for different tasks. A challenge for model aggregation methods is to ensure that the global model has better utility than the local models.
Cryptographic methods such as homomorphic encryption [8, 63, 19, 27, 60, 121, 122, 171, 173, 94] and secure multi-party computation [130, 29, 17, 43, 18, 85, 11, 52, 77, 153, 32, 54] are widely used in privacy-preserving machine learning algorithms. Basically, the parties encrypt their messages before sending them, operate on the encrypted messages, and decrypt the encrypted output to get the result. With these methods, the user privacy of federated learning systems can usually be well protected [75, 168, 76, 113, 169]. For example, secure multi-party computation guarantees that no party can learn anything except the output. However, such systems are usually inefficient and incur a large computation and communication overhead.
Differential privacy [45, 46] guarantees that a single record does not have much influence on the output of a function. Many systems adopt differential privacy [28, 12, 3, 160, 175, 67, 87, 141] for data privacy protection, so that the parties cannot tell whether an individual record participated in the learning. By adding random noise to the data or the model parameters [3, 87, 136], differential privacy provides statistical privacy for individual records and protection against inference attacks on the model. Due to the noise in the learning process, such systems tend to produce less accurate models.
Note that the above methods are independent of each other, and a federated learning system can adopt multiple methods to enhance the privacy guarantees.
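The model aggregation idea behind federated averaging can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a description of any particular system: the model is a single weight for $y = w x$, each party runs one local SGD pass per round, the server weights local models by local dataset size, and all function names and data are hypothetical.

```python
def local_update(weights, data, lr=0.1):
    """Toy local training: one SGD pass for the 1-D model y = w * x."""
    w = weights[0]
    for x, y in data:
        w -= lr * 2 * (w * x - y) * x  # gradient of the squared error (w*x - y)^2
    return [w]


def federated_average(local_weights, num_samples):
    """Server step: average the local models, weighted by local dataset size."""
    total = sum(num_samples)
    dim = len(local_weights[0])
    return [
        sum(w[j] * n for w, n in zip(local_weights, num_samples)) / total
        for j in range(dim)
    ]


# Two hypothetical parties whose private data follow y = 2x.
party_data = [
    [(1.0, 2.0), (2.0, 4.0)],
    [(1.5, 3.0)],
]

global_w = [0.0]
for _ in range(30):
    # Each party trains locally, starting from the current global model.
    local_ws = [local_update(global_w, d) for d in party_data]
    # Only model parameters, never raw records, are sent to the server.
    global_w = federated_average(local_ws, [len(d) for d in party_data])
```

In this toy run `global_w[0]` approaches the true slope 2. Note that exchanging parameters instead of raw data does not by itself give a formal privacy guarantee, which is why model aggregation is often combined with the cryptographic or differential privacy methods described above.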
While most of the existing FLSs adopt cryptographic techniques or differential privacy to achieve strong privacy guarantees, the limitations of these approaches seem hard to overcome at present. Besides trying to minimize the side effects of these methods, it may also be a good choice to look for novel approaches that protect data privacy and support flexible privacy requirements. For example, Liu et al. adopt a weaker security model, which can make the system more practical.
Related to the privacy level, the threat models also vary among FLSs. A common assumption is that all parties are honest-but-curious [163, 33, 41], meaning that they follow the protocol but try to find out as much as possible about the data of the other parties. In such a scenario, inference attacks may be conducted to extract user information. For example, the membership inference attack [134, 102, 107] is an interesting kind of attack, in which the attacker, given access to the machine learning model, can infer whether a given record was part of the training dataset. There may also be malicious parties [64] in federated learning. One threat model is that the parties may conduct poisoning and adversarial attacks [10, 13, 51] to backdoor federated learning. Another threat model is that the parties may suffer Byzantine faults [26, 15, 34, 139], where the parties behave arbitrarily badly against the system.
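As an illustration of how differential privacy adds noise, the sketch below applies the standard Laplace mechanism to a counting query. It is a minimal example under stated assumptions: the records and the helper names are hypothetical, the query has sensitivity 1 (adding or removing one record changes the count by at most 1), and Laplace noise is sampled as the difference of two exponential variates.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential variates."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng):
    """Counting query released with Laplace noise of scale 1 / epsilon."""
    true_count = sum(1 for r in records if predicate(r))
    sensitivity = 1.0  # one record changes a count by at most 1
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)
ages = [34, 51, 29, 62, 45, 70]  # made-up patient ages held by one party

# A smaller epsilon means more noise: stronger privacy, lower accuracy.
noisy_strict = private_count(ages, lambda a: a >= 50, epsilon=0.1, rng=rng)
noisy_loose = private_count(ages, lambda a: a >= 50, epsilon=10.0, rng=rng)
```

The same trade-off appears when noise is added to model parameters instead of query answers: tighter privacy budgets lead to noisier, less accurate models, which matches the accuracy loss noted in this section.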
There are two major communication architectures in FLSs: the centralized design and the distributed design. In the centralized design, the data flow is often asymmetric, which means that one server or a specific party is required to aggregate the information (e.g., gradients) from the other parties and send back the training result. The parameter updates on the global model are always done on this server. The communication between the server and the local parties can be synchronous or asynchronous [161, 137]. In a distributed design, the communications are performed among the parties and every party is able to update the global parameters directly. Google Keyboard is a case of the centralized architecture. Google's server aggregates information from users’ Android phones, collaboratively trains models with it, and returns the prediction results to users. How to reduce the communication cost is a vital problem in this kind of architecture. Some algorithms, such as deep gradient compression, have been proposed to solve this problem. While the centralized design is widely used in existing FLSs, the distributed design is preferred in some aspects, since concentrating information on one server may bring potential risks or unfairness. Recently, blockchain has become a popular distributed platform to consider. It is still challenging to design a distributed system for FL in which each party is treated nearly equally in terms of communication during the learning process and no trusted server is needed. A distributed cancer diagnosis system among hospitals is an example of the distributed architecture. Each hospital shares the model trained with data from its patients and obtains the global model for diagnosis. In the distributed design, the major challenge is that it is hard to design a protocol that treats every member fairly.
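The two communication patterns can be contrasted with a toy aggregation of one scalar parameter per party. This is a sketch under assumptions that are not taken from any surveyed system: the centralized round is a plain server-side average, and the distributed round uses randomized pairwise (gossip) averaging as one possible serverless protocol. All names are hypothetical.

```python
import random


def centralized_round(values):
    """A server collects every party's value, averages, and broadcasts the result."""
    avg = sum(values) / len(values)
    return [avg] * len(values)


def gossip_round(values, rng):
    """Distributed step: one random pair of parties averages its two values."""
    vals = list(values)
    i, j = rng.sample(range(len(vals)), 2)
    vals[i] = vals[j] = (vals[i] + vals[j]) / 2
    return vals


local_params = [1.0, 2.0, 3.0, 10.0]  # one made-up parameter per party

# Centralized: a single round yields the global average, but every party
# must trust and reach the server.
central = centralized_round(local_params)

# Distributed: no server is needed, but many peer-to-peer exchanges are
# required before all parties agree on (approximately) the same value.
rng = random.Random(0)
peers = list(local_params)
for _ in range(200):
    peers = gossip_round(peers, rng)
```

Pairwise averaging keeps the sum of the values unchanged, so the peers drift toward the same mean that the server computes directly. The price is a larger number of messages, which illustrates why communication cost and fairness are central concerns in the distributed design.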
FLSs can be categorized into two typical types by the scale of federation: private FLSs and public FLSs. The differences between them lie in the number of parties and the amount of data stored at each party. In a private FLS, there is usually a relatively small number of parties, and each of them has a relatively large amount of data as well as computational power. For example, suppose Amazon wants to recommend items to users by training on the shopping data collected from hundreds of data centers around the world. Each data center possesses a huge amount of data as well as sufficient computational resources. One challenge that a private FLS faces is how to efficiently distribute computation to the data centers under the constraints of the privacy models. In a public FLS, on the contrary, there is a relatively large number of parties, and each party has a relatively small amount of data as well as computational power. Google Keyboard is a good example of a public FLS. Google tries to improve the query suggestions of Google Keyboard with the help of federated learning. There are millions of Android devices, and each device only has its own users’ data. Meanwhile, due to energy consumption concerns, the devices cannot be asked to conduct complex training tasks. In this situation, the system should be powerful enough to manage a huge number of parties and to deal with possible issues such as unstable connections between the devices and the server.
In real-world applications of federated learning, individual parties need motivation to get involved in the federated system. The motivation can be regulations or incentives. Federated learning inside a company or an organization is usually motivated by regulations. But in some kinds of cooperation, parties cannot be forced by regulations to provide their data. Taking Google Keyboard as an example, Google cannot prevent users who do not provide data from using the app. But those who agree to upload input data may enjoy a higher accuracy of word prediction. This kind of incentive can encourage every user to provide their data to improve the performance of the overall model. However, how to design such a reasonable protocol remains a challenge.
Incentive mechanism design can be very important for the success of a federated learning system. There have been some successful cases of incentive designs in blockchain [181, 48]. The parties inside the system can be collaborators and, at the same time, competitors. Other incentive designs such as [74, 73] have been proposed to attract participants with high-quality data to federated learning. We expect that different game theory models [128, 72, 106] and their equilibrium designs should be revisited for federated learning systems. Even in the case of Google Keyboard, the users need to be motivated to participate in this collaborative learning process.
In this section, we compare the existing studies on FLSs according to the aspects considered in Section 4.
To discover the existing studies, we search for the keyword “federated learning” in Google Scholar (https://scholar.google.com/) and arXiv (https://arxiv.org/). Here we only consider the published studies in the computer science community. Since the scale of federation and the motivation of federation are problem dependent, we do not compare the studies by these two aspects. For ease of presentation, we use “NN”, “DT” and “LM” to denote neural networks, decision trees and linear models, respectively. Also, we use “CM”, “DP” and “MA” to denote cryptographic methods, differential privacy and model aggregation, respectively. Note that the algorithms (e.g., federated stochastic gradient descent) in some studies can be used to learn many machine learning models (e.g., logistic regression, neural networks). Thus, in the “model implementation” column, we present the models that are already implemented in the corresponding papers. Moreover, in the “main focus” column, we indicate the major area that each paper studies.
Table I shows a summary of the comparison among existing published studies on federated learning, which mainly focus on individual algorithms. From Table I, we have the following findings. First, most of the existing studies consider horizontal data partitioning. We suspect that part of the reason is that experimental studies and benchmarking are more mature for horizontal data partitioning than for vertical data partitioning. Also, in vertical data partitioning, how to align data sets with different features is problem dependent and thus can be very challenging. The scenarios with vertical data partitioning need to be further investigated. Second, most approaches of the existing studies can only be applied to one kind of machine learning model. While an algorithm designed for one specific model may achieve higher model utility, a general federated learning framework may be more practical and easier to use. Third, due to the properties of stochastic gradient descent (SGD), the model aggregation method can be easily applied to SGD and is currently the most popular approach to implement federated learning without directly exposing the user data. We look forward to more novel FL frameworks. Lastly, the centralized design is the mainstream of current implementations, and these studies assume a trusted server. It is more challenging to propose a framework that does not require a centralized server.
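To make the model aggregation idea concrete, the following minimal sketch averages local parameter vectors on a server, weighting each party by its number of training samples. It is only an illustration under stated assumptions: the function name `federated_average`, the flat list representation of the parameters, and the sample-count weighting are choices made here, not details taken from any specific study in Table I.

```python
from typing import List, Sequence


def federated_average(local_weights: Sequence[Sequence[float]],
                      num_samples: Sequence[int]) -> List[float]:
    """Return the sample-weighted average of the parties' parameter vectors.

    Hypothetical helper: each party sends only its locally trained
    parameters and its sample count, never its raw data.
    """
    if not local_weights or len(local_weights) != len(num_samples):
        raise ValueError("need one sample count per local model")
    total = sum(num_samples)
    if total <= 0:
        raise ValueError("total sample count must be positive")

    dim = len(local_weights[0])
    averaged = [0.0] * dim
    for weights, n in zip(local_weights, num_samples):
        if len(weights) != dim:
            raise ValueError("all local models must have the same dimension")
        for j, w in enumerate(weights):
            # Parties with more data contribute proportionally more.
            averaged[j] += w * n / total
    return averaged


if __name__ == "__main__":
    # Two parties holding 1 and 3 samples respectively.
    print(federated_average([[1.0, 2.0], [3.0, 4.0]], [1, 3]))
```

In each round of such a scheme, the server would broadcast the averaged vector back to the parties, which then continue local training from it.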
|Linear Regression FL ||
|Logistic Regression FL ||LM|
|Federated Transfer Learning ||NN|
|Tree-based FL ||DP||distributed|
|Federated GBDTs ||hashing|
|Ridge Regression FL ||LM||CM||centralized|
|Federated Collaborative Filtering |
|Federated SVRG |
|Byzantine Gradient Descent |
|Federated MTL |
|Variational Federated MTL ||NN|
|Federated Averaging |
|Federated Meta-Learning |
|Bayesian Nonparametric FL |
|Federated Generative Privacy ||GAN|
|Secure Aggregation FL ||CM, MA|
|DSSGD ||DP, MA|
|Client-Level DP FL |
|Protection Against Reconstruction ||LM, NN|
|Agnostic FL ||MA|
|Fair FL ||MA|
|PATE ||LM, DT, NN||DP, MA|
|Hybrid FL ||LM, DT, NN||CM, DP, MA|
|Communication Efficient FL ||
|Multi-Objective Evolutionary FL |
|On-Device ML |
|Sparse Ternary Compression |
|FL for Keyboard Prediction |
|FL for Vehicular Communication ||GPD|
|Resource-Constrained MEC ||LM, NN|
|FLs Performance Evaluation ||
In the following, we review those studies with the categories on the main focus: algorithm design, efficiency improvement, application, and benchmark.
Sanil et al. presented a secure regression model on vertically partitioned data. They focus on the linear regression model and apply secret sharing to ensure privacy. Hardy et al. presented a solution for two-party vertical federated logistic regression. They used entity resolution and additively homomorphic encryption, and also studied the impact of entity resolution errors on learning. Liu et al. introduced an FL framework combined with transfer learning for neural networks. They addressed a specific scenario where two parties share a subset of common samples and all the label information resides in one party. They used additively homomorphic encryption to encrypt the model parameters and protect data privacy. Cheng et al. implemented a vertical tree-based FLS called SecureBoost. In their assumptions, only one party has the label information. They used an entity alignment technique to find the common data and then built the decision trees. Additively homomorphic encryption is used to protect the gradients. Liu et al. proposed a federated extreme boosting learning framework for mobile crowdsensing. They adopted secret sharing to achieve privacy-preserving learning of GBDTs. Zhao et al. proposed a horizontal tree-based FLS. Each decision tree is trained locally without communication between parties. The trees trained in one party are sent to the next party, which continues by training a number of additional trees. Differential privacy is used to protect the decision trees. Li et al. exploited similarity information in the building of federated GBDTs by using locality-sensitive hashing. They utilize the data distribution of local parties by aggregating gradients of similar instances. Under a privacy model weaker than secure multi-party computation, their approach is effective and efficient.
Nikolaenko et al. proposed a system for privacy-preserving ridge regression. Their approach combines homomorphic encryption and Yao’s garbled circuits to meet the privacy requirements. An extra evaluator is needed to run the algorithm. Chen et al. also proposed a system for privacy-preserving ridge regression. Their approach combines secure summation and homomorphic encryption to meet the privacy requirements. They provided a complete comparison of communication and computation overhead between their approach and the previous state-of-the-art approaches. Kim et al. combined a blockchain architecture with federated learning. On the basis of federated averaging, they used a blockchain network to exchange the devices’ local model updates. Ammad et al. formulated the first federated collaborative filtering method. Based on a stochastic gradient approach, the item-factor matrix is trained in a global server by aggregating the local updates. Konečný et al. proposed the federated SVRG algorithm, which is based on stochastic variance reduced gradient. They compared their algorithm with other baselines such as CoCoA+ and simple distributed gradient descent. Their method can achieve better accuracy with the same number of communication rounds for the logistic regression model. Smith et al. combined federated learning with multi-task learning (MTL) [25, 174]. Their method considers the issues of high communication cost, stragglers, and fault tolerance for MTL in the federated environment. Corinzia et al. proposed a federated MTL method with non-convex models. They treated the central server and the local parties as a Bayesian network, and the inference is performed using variational methods.
Chen et al. and Blanchard et al. studied the scenario where the parties may be Byzantine and try to compromise the FLS. The former proposed an aggregation rule based on the geometric median of means of the gradients. The latter proposed Krum, which selects the gradient vector closest to the barycenter among the proposed parameter vectors. McMahan et al. implemented federated averaging on TensorFlow, focusing on improving communication efficiency. The methods they use on deep networks are effective in reducing communication costs compared to synchronized stochastic gradient descent. Chen et al. designed a federated meta-learning framework. Specifically, the users’ information is shared at the algorithm level, where a server updates the algorithm with feedback from the users. Liu et al. proposed a lifelong federated reinforcement learning framework. Adopting transfer learning techniques, a global model is trained to effectively remember what the robots have learned. Yurochkin et al. developed a probabilistic FL framework by applying Bayesian nonparametric machinery. They used a matching procedure informed by a Beta-Bernoulli process to combine the local models into a federated global model.
Triastcyn et al. used generative adversarial networks (GANs) to generate artificial data, which are then directly used to train machine learning models. Their privacy guarantee is weaker than differential privacy. Bonawitz et al. applied secure aggregation to protect the local parameters on the basis of federated averaging. Shokri et al. proposed a distributed selective SGD algorithm, where a fraction of the local parameters is used to update the global parameters in each round. Differential privacy can be applied to protect the uploaded parameters. Geyer et al. applied differential privacy in federated averaging from a client-level perspective. They used the Gaussian mechanism to distort the sum of gradient updates to protect a whole client’s dataset instead of a single data point. McMahan et al. deployed federated averaging in the training of Long Short-Term Memory (LSTM) recurrent neural networks (RNNs). In addition, they used user-level differential privacy to protect the parameters. Bhowmick et al. applied local differential privacy to protect the parameters in federated learning. To increase the model utility, they considered a practical threat model in which the adversary wishes to decode individuals’ data but has little prior information on them. Under this assumption, they could use a much larger privacy budget.
Mohri et al. proposed a new framework named agnostic federated learning. Instead of minimizing the loss with respect to the uniform distribution, which is an average distribution among the data distributions of the local clients, they tried to train a centralized model optimized for any possible target distribution formed by a mixture of the client distributions. Li et al. proposed a new objective that takes fairness into consideration. Specifically, the smaller the variance of the model’s performance across the devices, the fairer the model. Based on their objective, they proposed an extension of federated SGD, which uses a dynamic step size instead of a fixed step size. Papernot et al. designed a general framework for federated learning, which can be applied to any model. In a black-box setting, they treated the locally trained models as teacher models. Then, they learned a student model by noisy voting among all of the teachers. Truex et al. combined secure multiparty computation and differential privacy for privacy-preserving federated learning. They used differential privacy to inject noise into the local updates. The noisy updates are then encrypted using the Paillier cryptosystem before being sent to the central server.
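As an illustration of the Byzantine-robust aggregation rules reviewed above, the sketch below follows the scoring rule usually given for Krum: each submitted vector is scored by the sum of squared distances to its n − f − 2 nearest neighbours, and the vector with the lowest score is selected. This is a simplified sketch under assumptions, not the authors' implementation; the function names, the plain-list vectors, and the requirement n ≥ 2f + 3 as an input check are choices made here.

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equally long vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def krum(updates: Sequence[Sequence[float]], num_byzantine: int) -> List[float]:
    """Select one update using a Krum-style score (illustrative sketch).

    `num_byzantine` (f) is the assumed upper bound on malicious parties.
    """
    n = len(updates)
    f = num_byzantine
    if n < 2 * f + 3:
        raise ValueError("Krum-style selection needs n >= 2f + 3 updates")

    neighbours = n - f - 2
    best_index, best_score = -1, float("inf")
    for i, candidate in enumerate(updates):
        # Distances from this candidate to every other submitted update.
        distances = sorted(
            squared_distance(candidate, other)
            for j, other in enumerate(updates)
            if j != i
        )
        # Only the closest neighbours count, so far-away outliers
        # cannot pull the score of an honest update up.
        score = sum(distances[:neighbours])
        if score < best_score:
            best_index, best_score = i, score
    return list(updates[best_index])


if __name__ == "__main__":
    honest = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1], [1.0, 1.05]]
    malicious = [[100.0, -100.0]]
    print(krum(honest + malicious, num_byzantine=1))
```

Unlike plain averaging, a single extreme update cannot move the result arbitrarily, because the output is always one of the submitted vectors with many close neighbours.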
Konečný et al. proposed two ways, structured updates and sketched updates, to reduce the communication costs in federated averaging. Their methods can reduce the communication cost by two orders of magnitude with a slight degradation in convergence speed. Zhu and Jin designed a multi-objective evolutionary algorithm to minimize the communication costs and the global model test errors simultaneously. Considering the minimization of the communication cost and the maximization of the global learning accuracy as two objectives, they formulated federated learning as a bi-objective optimization problem and solved it with the multi-objective evolutionary algorithm. Jeong et al. proposed a federated learning framework for devices with non-i.i.d. local data. They designed federated distillation, whose communication size depends on the output dimension but not on the model size. Also, they proposed a data augmentation scheme using a generative adversarial network (GAN) to make the training dataset i.i.d. Many other studies also design specialized approaches for non-i.i.d. data [176, 90, 95, 167]. Sattler et al. proposed a new compression framework named sparse ternary compression (STC). Specifically, STC compresses the communication using sparsification, ternarization, error accumulation and optimal Golomb encoding. Their method is robust to non-i.i.d. data and large numbers of parties. There are also other studies working on reducing the communication cost of federated learning. Yao et al. adopted a two-stream model with an MMD (Maximum Mean Discrepancy) constraint, instead of the single model trained on devices in standard federated learning settings, to reduce the communication cost. Zhu et al. proposed a multi-access Broadband Analog Aggregation (BAA) design for communication-latency reduction in federated learning based on the concept of over-the-air computation. Later, Amiri et al. added error accumulation and gradient sparsification to over-the-air computation to get a faster convergence speed. Caldas et al. proposed lossy compression and federated dropout to reduce server-to-participant communication costs.
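The compression steps named for STC can be sketched as follows. The example keeps only the largest-magnitude entries of an update (sparsification), replaces them by a shared magnitude with their original sign (ternarization), and carries the discarded part over to the next round (error accumulation). It is a simplified illustration: the Golomb encoding step is omitted, and the function name, the `keep_fraction` parameter, and the use of the mean magnitude of the kept entries are assumptions made here rather than details confirmed by the text.

```python
from typing import List, Sequence, Tuple


def sparse_ternary_compress(update: Sequence[float],
                            residual: Sequence[float],
                            keep_fraction: float) -> Tuple[List[float], List[float]]:
    """Return (compressed update, new residual) for one communication round."""
    if not 0.0 < keep_fraction <= 1.0:
        raise ValueError("keep_fraction must be in (0, 1]")
    if len(update) != len(residual):
        raise ValueError("update and residual must have the same length")

    # Error accumulation: add back what earlier rounds failed to transmit.
    accumulated = [u + r for u, r in zip(update, residual)]

    # Sparsification: indices of the k entries with the largest magnitude.
    k = max(1, int(len(accumulated) * keep_fraction))
    top = sorted(range(len(accumulated)),
                 key=lambda i: abs(accumulated[i]),
                 reverse=True)[:k]

    # Ternarization: every kept entry becomes -mu, 0 or +mu.
    mu = sum(abs(accumulated[i]) for i in top) / k
    compressed = [0.0] * len(accumulated)
    for i in top:
        if accumulated[i] > 0:
            compressed[i] = mu
        elif accumulated[i] < 0:
            compressed[i] = -mu

    # Whatever was not transmitted is remembered for the next round.
    new_residual = [a - c for a, c in zip(accumulated, compressed)]
    return compressed, new_residual


if __name__ == "__main__":
    sent, left = sparse_ternary_compress([0.5, -2.0, 0.1, 4.0], [0.0] * 4, 0.5)
    print(sent, left)
```

Because the compressed vector contains only one distinct magnitude and mostly zeros, it can be transmitted as a list of positions, signs, and a single scalar.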
Nishio et al. implemented federated averaging in practical mobile edge computing (MEC) frameworks. They used an operator of MEC frameworks to manage the resources of heterogeneous clients. Federated learning is promising in edge computing [109, 170, 119, 44]. Wang et al. adopted federated averaging to implement distributed deep reinforcement learning (DRL) in a mobile edge computing system. The combination of DRL and FL can effectively optimize mobile edge computing, caching, and communication. Hard et al. applied federated learning to mobile keyboard prediction. They adopted the federated averaging method to learn a variant of LSTM. Ulm et al. implemented federated learning in Erlang (FFL-ERL), which is a functional programming language. Based on federated averaging, they created a functional implementation of an artificial neural network in Erlang. Samarakoon et al. first adopted federated learning in the context of ultra-reliable low-latency communication. To model reliability in terms of probabilistic queue lengths, they used model averaging to learn a generalized Pareto distribution (GPD). Wang et al. performed federated learning on resource-constrained MEC systems. They address the problem of how to efficiently utilize the limited computation and communication resources at the edge. Using federated averaging, they implemented many machine learning algorithms including linear regression, SVM, and CNN. While edge computing is an appropriate scenario to apply federated learning, federated learning has also been applied in many other areas. Brisimi et al. developed models to predict hospitalizations for cardiac events using a federated distributed algorithm. They developed a general decentralized optimization framework enabling multiple data holders to collaborate and converge to a common predictive model, without explicitly exchanging raw data. Other applications about health AI are shown in . Verma et al. help to foster collaboration across multiple government agencies with federated learning algorithms.
Nilsson et al. conducted a performance comparison among three different federated learning algorithms: federated averaging, federated stochastic variance reduced gradient, and CO-OP. They executed experiments using a multi-layer perceptron on the MNIST dataset with both i.i.d. and non-i.i.d. partitions of the data. Their experiments showed that federated averaging can achieve better performance on MNIST than the other two algorithms. Caldas et al. proposed LEAF, a modular benchmark for federated learning. LEAF includes public federated datasets, an array of statistical and systems metrics, and a set of reference implementations.
|Google TensorFlow Federated (TFF) [16, 140]||horizontal||LM, NN||CM, DP, MA||centralized|
|Federated AI Technology Enabler (FATE) (https://github.com/FederatedAI/FATE)||hybrid||LM, DT, NN||CM||distributed|
Table II shows a summary of the comparison among some representative FLSs. Here we only consider open-source systems that support data protection. From Table II, we have the following findings. First, most of the current open-source systems only implement one kind of partition method and one or two kinds of machine learning models. We think that many systems are still at an early stage, and we expect the community to put in more development effort. A general and complete FLS which can support multiple kinds of data partitioning or machine learning models is still on the way. Second, despite the costly processing of cryptographic methods, they seem to be the most popular technique for providing privacy guarantees. However, there is no final conclusion as to which approach is better in terms of system performance and model utility. It should depend on the privacy restrictions. Last, many systems still adopt a centralized communication design since they need a server to aggregate model parameters. From the system perspective, this introduces a single point of failure and control, and we believe that a more advanced mechanism to avoid this centralized design is needed.
Now we give more details on these FLSs. Google proposed a scalable FLS which enables tens of millions of Android devices to learn a deep neural network based on TensorFlow. In their design, a server uses federated averaging to aggregate the model updates, which are computed by the devices locally in synchronous rounds. Differential privacy and secure aggregation are used to enhance privacy guarantees. The OpenMined community introduced an FL system named PySyft built on PyTorch, which applies both differential privacy and multi-party computation (MPC) to provide privacy guarantees. It adopts SPDZ and the moment accountant method for MPC and DP, respectively, in a federated learning context. Corbacho implemented PhotoLabeller, which gives a practical use case of an FLS. It uses Android phones to train models locally, and uses federated averaging on the server to aggregate the model. Finally, the trained model is shared across every client for photo labeling. WeBank implemented the FL platform Federated AI Technology Enabler (FATE), which supports multiple kinds of data partitioning and algorithms. The secure computation protocols are based on homomorphic encryption and multi-party computation. It supports many machine learning algorithms, including logistic regression and gradient boosting decision trees.
There are many interesting applications for federated learning systems. We review two case studies to examine the system aspects surveyed above.
Here we analyze the keyboard word suggestion application and identify which type of FLS is suitable for such applications. Keyboard word suggestion aims to predict which word a user will input next according to their previous input words. Keyboard word suggestion models can be trained in the federated learning manner, such that more user input data can be exploited to improve the model quality, while obeying user data regulations such as GDPR [150, 38] in Europe, PDPA in Singapore and CCPA in the US. For training such word suggestion models, the training data (i.e., users’ input records) is “partitioned” horizontally, where each user has her input data on her own device (e.g., mobile phone). Furthermore, many of the word suggestion models are based on neural networks, so the desired FLS should support neural networks. Privacy protection mechanisms need to be enforced in the word suggestion model training as well, because the user data or the trained model is synchronized with those of other users. In keyboard word suggestion applications, the keyboard service providers cannot prevent users from using their services because they refuse to share data. So, incentive mechanisms need to be adopted in the FLS, such that more users are willing to participate and improve the word suggestion model. Finally, the training of the word suggestion model is performed in a public FLS, since the user data is distributed globally.
Next, we take a concrete example of word suggestion applications, Google Keyboard (GBoard), to identify which FLS is well suited to it. According to the analysis above and the existing federated learning systems shown in Table II, Google TensorFlow Federated (TFF) may be the underlying FLS for GBoard. The communication architecture of the federated learning for GBoard is centralized, since Google cloud collects data from Android phones all around the world and then provides the prediction service to users. TFF can handle horizontally partitioned data and supports neural networks for training models. Meanwhile, TFF provides differential privacy as a privacy guarantee. TFF is centralized and can sustain millions of users in the public FL setting. Finally, Google can encourage users to share their data by granting them higher predictive accuracy as an incentive. In summary, based on our analysis, TFF is likely to be the underlying FLS for GBoard.
In this case study, we discuss the application of FL in building health care systems. The vision of a nationwide learning health system was introduced in 2010. This system aims to exploit data from research institutes, hospitals, federal agencies and many other parties to improve the health care of the nation. In such a scenario, the health care data is partitioned both horizontally and vertically: each party contains health care data of residents for a specific purpose (e.g., patient treatment), but the features used in each party are different. Moreover, the health care data is strictly protected by laws. In health care systems, besides neural networks, a wide range of machine learning algorithms are commonly used. The learning process should be distributed, because each party shares data, trains the model on the data cooperatively, and gets the final learned model. Due to the strict privacy protection and the nature of health care systems, it is a private federated learning system where a few parties possess a huge amount of data, while the others may hold only a small amount. Finally, once the system is established, the data sharing can be guaranteed by policies of the national government (i.e., nationwide learning health systems can be motivated by governmental policy). Based on the analysis above, Federated AI Technology Enabler (FATE) is a potential choice for this application. It can support both horizontal and vertical data partitioning. FATE provides multiple machine learning algorithms such as decision trees and linear models. Meanwhile, FATE has cryptographic techniques to guarantee the protection of user data, and supports a distributed communication architecture as well. These properties of FATE match the requirements of this application well.
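The matching procedure used in both case studies can be sketched as a simple filter over the system properties. The example below encodes only the two systems whose Table II rows are given in the text (TFF and FATE) and returns the systems that satisfy a set of requirements. The class and function names are hypothetical, and reading FATE's "hybrid" partitioning as "horizontal and vertical" is an interpretation based on the description above.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class FLSProfile:
    """Properties of one federated learning system (illustrative encoding)."""
    name: str
    partitions: FrozenSet[str]
    models: FrozenSet[str]
    privacy: FrozenSet[str]
    architecture: str


# Abbreviations follow the text: LM/DT/NN for models, CM/DP/MA for privacy.
SYSTEMS = [
    FLSProfile("TFF", frozenset({"horizontal"}), frozenset({"LM", "NN"}),
               frozenset({"CM", "DP", "MA"}), "centralized"),
    FLSProfile("FATE", frozenset({"horizontal", "vertical"}),
               frozenset({"LM", "DT", "NN"}), frozenset({"CM"}), "distributed"),
]


def matching_systems(partitions: Set[str], models: Set[str],
                     privacy: Set[str], architecture: str) -> List[str]:
    """Return names of systems that cover every stated requirement."""
    return [
        s.name
        for s in SYSTEMS
        if partitions <= s.partitions      # all required partitionings supported
        and models <= s.models             # all required model families supported
        and privacy <= s.privacy           # all required privacy mechanisms offered
        and s.architecture == architecture
    ]


if __name__ == "__main__":
    # Keyboard word suggestion: horizontal data, neural networks, DP, centralized.
    print(matching_systems({"horizontal"}, {"NN"}, {"DP"}, "centralized"))
    # Health care: hybrid data, several model families, cryptography, distributed.
    print(matching_systems({"horizontal", "vertical"}, {"LM", "DT"},
                           {"CM"}, "distributed"))
```

The scale and motivation of federation are left out of the filter because, as noted earlier, they are problem dependent rather than properties of a system.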
In this section, we show interesting directions to work on in the future, and conclude this paper.
When designing an FLS, many existing studies have tried to support more machine learning models and come up with new efficient approaches to protect privacy while not sacrificing the accuracy of the learned model much.
As we discussed in Section 3.2, the number of parties may not be fixed during the learning process. However, the number of parties is fixed in many existing systems, which do not consider situations where new parties enter or current parties leave. The system should support dynamic scheduling and be able to adjust its strategy when the number of parties changes. There are some studies addressing this issue. For example, Google TensorFlow Federated can tolerate device drop-outs. Also, the emergence of blockchain can provide an ideal and transparent platform for multi-party learning. More effort is needed in this direction.
Little work has considered the privacy heterogeneity of FLSs, as shown in Section 3.1.2. The existing systems adopt techniques to protect the model parameters or gradients for all the parties on the same level. However, the privacy restrictions of the parties usually differ in reality. It would be interesting to design an FLS which treats the parties differently according to their privacy restrictions. The learned model should have a better performance if we can maximize the utilization of data of each party while not violating their privacy restrictions. The heterogeneous differential privacy  may be useful in such settings.
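One simple way to picture privacy heterogeneity is to let each party perturb its contribution with noise calibrated to its own privacy budget. The sketch below applies the standard Laplace mechanism separately per party: a value is clipped to a bounded range and receives Laplace noise whose scale grows as the party's budget ε shrinks. This is an illustration only and not the heterogeneous differential privacy method cited above; the function names, the scalar updates, and the per-party `epsilons` mapping are assumptions made for this example.

```python
import random
from typing import Dict


def laplace_scale(clip_bound: float, epsilon: float) -> float:
    """Noise scale for a value clipped to [-clip_bound, clip_bound]."""
    if clip_bound <= 0 or epsilon <= 0:
        raise ValueError("clip_bound and epsilon must be positive")
    # Replacing one clipped value changes it by at most 2 * clip_bound.
    return 2.0 * clip_bound / epsilon


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponentials."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def privatize(values: Dict[str, float], epsilons: Dict[str, float],
              clip_bound: float, rng: random.Random) -> Dict[str, float]:
    """Perturb each party's value according to that party's own budget."""
    noisy = {}
    for party, value in values.items():
        clipped = max(-clip_bound, min(clip_bound, value))
        scale = laplace_scale(clip_bound, epsilons[party])
        # A stricter party (smaller epsilon) receives more noise.
        noisy[party] = clipped + laplace_noise(scale, rng)
    return noisy


if __name__ == "__main__":
    rng = random.Random(0)
    updates = {"strict_party": 0.3, "relaxed_party": 0.8}
    budgets = {"strict_party": 0.5, "relaxed_party": 2.0}
    print(privatize(updates, budgets, clip_bound=1.0, rng=rng))
```

A server aggregating such contributions could additionally weight parties by their noise level, so that less noisy updates influence the model more; that weighting is left out here.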
Intuitively, one party can gain more from the FLS if it contributes more information. A simple solution is to make agreements among the parties such that some parties pay for the other parties which contribute more information. Representative incentive mechanisms need to be developed.
As more FLSs are being developed, a benchmark with representative data sets and workloads is quite important to evaluate the existing systems and direct future development. Caldas et al.  proposed LEAF, which is a benchmark including federated datasets, an evaluation framework, and reference implementations. Hao et al.  presented a computing testbed named Edge AIBench with federated learning support, and discussed four typical scenarios and six components for measurement included in the benchmark suite. Still, more applications and scenarios are the key to the success of FLSs.
Like the parameter server in deep learning, which controls the parameter synchronization, some common system architectures need to be investigated for FL. Although Yang et al. presented three architectures for different data partition methods, we need more complete architectures in terms of learning models or privacy levels. Communication costs can be a significant issue in the performance of training a federated learning model [154, 65].
Learning is simply one aspect of a federated system. A data life cycle consists of multiple stages including data creation, storage, use, sharing, archiving and destruction. For the data security and privacy of the entire application, we need to invent new data life cycles under the federated learning context. Although data sharing is clearly one of the stages in focus, the design of a federated learning system also affects other stages. For example, data creation may help to prepare the data and features that are suitable for federated learning.
Most existing studies have focused on labelled data sets. However, in practice, training data sets may not have labels, or may have poisoned or mistaken labels, which can lead to runtime mispredictions. Poisoned and mistaken labels can come from unreliable data collection processes, such as those in mobile and edge environments, and from malicious parties. There are still many challenges in addressing data poisoning and backdoor attacks. Along this line, CalTrain presents a multi-party collaborative learning system that achieves model accountability in Trusted Execution Environments (TEEs). Ghosh et al. consider model robustness against Byzantine parties (or abnormal and adversarial parties). Another potential approach is blockchain [118, 78]. Preuveneers et al. proposed a permissioned blockchain-based federated learning method to monitor the incremental updates to an anomaly detection machine learning model.
Internet-of-Things: Security and privacy issues have been a hot research area in fog computing and edge computing, due to the increasing deployment of Internet-of-Things applications. For more details, readers can refer to some recent surveys [138, 166, 105]. Federated learning can be one potential approach to addressing the data privacy issues while still offering reasonably good machine learning models [91, 108]. The additional key challenges come from the computation and energy constraints. The mechanisms for privacy and security introduce runtime overhead. For example, Jiang et al. apply independent Gaussian random projection to improve data privacy, after which the training of a deep network can be too costly. The authors need to develop a new resource scheduling algorithm that moves the workload to the nodes with more computation power. Similar issues arise in other environments such as vehicle-to-vehicle networks [124, 127].
Many efforts have been devoted to developing federated learning systems (FLSs). A complete overview and summary of existing FLSs is important and meaningful. Inspired by previous federated systems, we have shown that heterogeneity and autonomy are two important factors in the design of practical FLSs. Moreover, we provide a comprehensive categorization of FLSs according to six different aspects. Based on these aspects, we also present a comparison of the features and designs of existing FLSs. More importantly, we have pointed out a number of opportunities, ranging from more benchmarks to the integration of emerging platforms such as blockchain. Federated learning systems will be an exciting research journey, which calls for effort from the machine learning, systems and data privacy communities.
This work is supported by a MoE AcRF Tier 1 grant (T1 251RES1824), a SenseTime Young Scholars Research Fund, and a MOE Tier 2 grant (MOE2017-T2-1-122) in Singapore.
International Workshop on Functional and Constraint Logic Programming, pp. 162–178.