Challenges and approaches for mitigating byzantine attacks in federated learning

by   Shengshan Hu, et al.
Deakin University

Recently emerged federated learning (FL) is an attractive distributed learning framework in which numerous wireless end-user devices can train a global model while the data remains local. Compared with the traditional machine learning framework that collects user data for centralized storage, which brings a huge communication burden and concerns about data privacy, this approach can not only save network bandwidth but also protect data privacy. Despite the promising prospect, the byzantine attack, an intractable threat in conventional distributed networks, has been found to be rather efficacious against FL as well. In this paper, we conduct a comprehensive investigation of the state-of-the-art strategies for defending against byzantine attacks in FL. We first provide a taxonomy for the existing defense solutions according to the techniques they use, followed by an across-the-board comparison and discussion. Then we propose a new byzantine attack method called the weight attack to defeat those defense schemes, and conduct experiments to demonstrate its threat. The results show that existing defense solutions, although abundant, are still far from fully protecting FL. Finally, we indicate possible countermeasures for the weight attack, and highlight several challenges and future research directions for mitigating byzantine attacks in FL.



I Introduction

Ubiquitous intelligent devices equipped with advanced sensors (e.g., smartwatches, environmental monitoring devices) have brought us into the Internet of Things (IoT) era, which connects the dispersive world into an interconnected system of intelligent networks. To make use of the data generated by these distributed devices, machine learning as a service (MLaaS) is becoming popular to assist users in refining their businesses. However, MLaaS usually needs to collect data from those devices and perform data analysis jobs in a centralized manner, which inevitably incurs two severe problems: high communication cost and privacy leakage. In the IoT, data is generated explosively every day; uploading all the raw data to a central server places a high burden on the bandwidth, especially in wireless communication networks. Besides, end-user devices usually contain a large amount of private information, such as location, identity, personal profiles, etc. Directly uploading local data to the server raises great concerns about user privacy.

To address these issues, the recently emerged federated learning (FL) paradigm allows users to collaboratively compute a global machine learning model without revealing their local data. By distributing the model learning process to the end users (e.g., intelligent devices), FL constructs a global model from user-specific local models, ensuring that the users’ private data never leaves the devices. In this way, the bandwidth cost is significantly reduced and user privacy is well protected.

Despite the promising prospect, recent studies show that FL is highly susceptible to byzantine attacks, where malicious users can falsify real models or gradients to damage the learning process, or directly poison the training data to make the global model learn wrong information. Blanchard et al. [1] have shown that just one baleful user can compromise the convergence of the training and damage the performance of the ultimate global model. To address this issue, a growing number of defense strategies against byzantine attacks have been proposed to further safeguard FL [11, 6, 14, 5, 9]. Although these research efforts have demonstrated preliminary success in defeating byzantine attacks, we emphasize that it is still far from practical to provide full protection for FL. Protecting FL from byzantine attacks while simultaneously considering issues including efficiency, privacy, and data distribution is an extremely challenging problem, especially when unknown attack surfaces still exist in its standard process. Recently, Mothukuri et al. [8] presented a comprehensive survey on the security and privacy of federated learning. However, it neither conducted experiments to evaluate existing schemes side by side, which is important for a fair comparison, nor proposed any novel ideas to support its findings on future work.

In order to clearly demonstrate the vulnerability of existing byzantine-robust FL schemes, this article first conducts a concise overview in which an in-depth taxonomy is provided for the state-of-the-art defense strategies. We divide the existing defense solutions into four categories according to the principles they rely on for anomaly detection, i.e., the distance based solutions, the performance based solutions, the statistics based solutions, and the target optimization based solutions. Then a comprehensive comparison is provided in terms of their advantages and disadvantages. After reviewing the literature, we propose a new kind of byzantine attack called the weight attack to defeat those defense schemes. By exploiting a flaw in the existing weight assignment strategy, our attack is easy to put into practice while enjoying a high attack success rate. We further conduct experiments to validate the threat of our weight attack. Finally, we discuss possible countermeasures, and highlight several stubborn challenges and future research directions for hardening the security of FL.

In summary, we make the following contributions:

  • We provide a systematic review and comparison for state-of-the-art byzantine-resilient federated learning schemes.

  • We propose a new kind of byzantine attack to show the feasibility of disabling existing defense methods, followed by experimental validations.

  • We give an in-depth discussion for future work on enhancing the security of federated learning when facing byzantine attacks.

Fig. 1: Architecture of federated learning

II Preliminaries

II-A Federated Learning

In the conventional collaborative deep learning training framework, a powerful central server is usually required to gather users’ training data. After receiving the data from users, the central server iteratively trains a deep neural network (DNN) model until it converges. In the end, users can download the DNN model and enjoy intelligent services. However, such a training framework can easily lead to the leakage of user privacy, as users’ private data is handed over to a third party.

To address this issue, the recently proposed federated learning (FL) is a distributed and privacy-preserving architecture in which users collaboratively train and maintain a shared model under the coordination of a central server. Fig. 1 depicts an overview of the standard FL architecture. In each iteration, the server first broadcasts a global model to a set of randomly chosen distributed users, each of which then re-trains a local model using its own data. After completing the local training, the users send their updates (i.e., the local models) to the server, which aggregates them to generate a new global model. The iteration repeats until the global model converges.
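The iterative procedure above can be sketched in a few lines. The snippet below is an illustrative toy, not any FL library's API: a single gradient step on a linear least-squares model stands in for the full local training loop, and the function name is ours.

```python
import numpy as np

def fedavg_round(global_model, clients, lr=0.1):
    """One round of federated averaging (illustrative sketch).

    Each client is a (features, labels, declared_size) tuple; local
    "training" here is a single gradient step on a linear model.
    """
    updates, sizes = [], []
    for X, y, n in clients:
        w = global_model.copy()
        grad = X.T @ (X @ w - y) / len(y)   # least-squares gradient
        updates.append(w - lr * grad)       # local model after one step
        sizes.append(n)
    # Server aggregates: weighted mean by declared data set size.
    weights = np.array(sizes, dtype=float) / sum(sizes)
    return sum(wt * u for wt, u in zip(weights, updates))
```

In a real deployment each client would run many local epochs on a deep model, but the broadcast/train/aggregate loop has exactly this shape.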

However, FL still faces many technical challenges, such as vulnerability to byzantine attacks, high communication overhead, and dependence on the assumption of IID (i.e., independently and identically distributed) data. In this paper, we mainly focus on defense schemes against byzantine attacks.

II-B Byzantine Attack

Recent works show that standard federated learning is vulnerable to byzantine attacks carried out by faulty or malicious clients. Even if there is only one attacker, the model accuracy can drop from 100% to 0%. For example, in an extreme case where an attacker knows the local updates of all benign clients, it only needs to set its update to the opposite of the linear combination of the other normal updates to offset the effect of the benign clients; the accuracy of the aggregated global model can then be reduced to 0% with high probability. We classify malicious attacks into two types based on which step in FL the malicious clients aim to breach:

  1. Training data based attack: This kind of attack is also known as the data poisoning attack, which aims to mislead the global model by manipulating the local training data. In general, there are three main approaches for this attack:

    • Label flipping: The attacker “flips” the labels of its training data to arbitrary ones (e.g., via a permutation function).

    • Adding noise: An attacker contaminates the dataset by adding noises to degrade the quality of models.

    • Backdoor trigger: An attacker injects a trigger into a small area of the original dataset to cause the classifier to misclassify triggered inputs into the target category.

  2. Parameter based attack: This attack method involves altering local parameters (i.e., gradient or model) so that the central server aggregates a corrupted global model. There are two ways to modify the parameters:

    • Modifying the direction and size of the parameter learned from the local dataset, e.g., flipping the signs of local iterates and gradients, or enlarging the magnitudes.

    • Modifying the parameter directly, e.g., randomly sampling a number from a Gaussian distribution and treating it as one of the parameters of the local model.
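The manipulations listed above are easy to express concretely. The helpers below are illustrative sketches of three classic forms (sign flipping and Gaussian replacement for the parameter based attack, label flipping for the data based attack); the function names and defaults are ours.

```python
import numpy as np

def sign_flip(update, scale=1.0):
    # Parameter based attack: reverse the update's direction
    # (and optionally enlarge its magnitude via `scale`).
    return -scale * update

def gaussian_update(shape, sigma=1.0, rng=None):
    # Parameter based attack: replace the update with random noise.
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, sigma, size=shape)

def flip_labels(labels, num_classes):
    # Data poisoning: permute each label (here, a fixed cyclic shift
    # stands in for an arbitrary permutation function).
    return (labels + 1) % num_classes
```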

III Existing defense schemes

Depending on the principles that the server relies on for detecting or evading anomalous updates, the existing defense schemes can be divided into four categories: the distance based defense schemes, the performance based defense schemes, the statistics based defense schemes, and the target optimization based defense schemes.

III-A The Distance based Defense Schemes

This kind of defense scheme aims to detect and discard bad or malicious updates by comparing the distances between updates. An update that is apparently far away from the others is regarded as malicious. These schemes are usually easy to implement.

Blanchard et al. [1] proposed Krum and its variant, called Multi-Krum. In Krum, the central server chooses only the one update that is closest to its neighbors to update the global model, and discards all the other updates, whereas Multi-Krum chooses multiple updates and computes their mean to update the global model. Similar to Multi-Krum, FABA [11] aims to remove the outliers in the uploaded gradients by discarding the gradients that are far away from the mean gradient. However, both Multi-Krum and FABA need to know the number of malicious clients in advance, which makes them difficult to apply in practical applications. To get rid of this limitation, FoolsGold [4] uses cosine similarity to identify malicious updates and assigns them a low weight to reduce their impact on the global model. In their viewpoint, the updates from attackers have nearly the same direction, thus the cosine similarities between abnormal updates should be extraordinarily large. Based on this observation, the central server can identify abnormal updates and assign them low weights. Cao et al. [2] proposed Sniper, which utilizes the Euclidean distances between local models to construct a graph, based on which a set of updates is selected for aggregation.
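To make the distance based idea concrete, here is a minimal sketch of the Krum selection rule of Blanchard et al. [1]: each update is scored by the summed squared distances to its n − f − 2 nearest neighbours (n clients, f of them assumed byzantine), and the lowest-scoring update is kept. This is a schematic, not the authors' reference implementation.

```python
import numpy as np

def krum(updates, f):
    """Krum selection (sketch): return the update whose n - f - 2
    nearest neighbours are closest to it. Assumes n > f + 2."""
    n = len(updates)
    U = np.stack(updates)
    # Pairwise squared Euclidean distances between all updates.
    d = ((U[:, None, :] - U[None, :, :]) ** 2).sum(-1)
    scores = []
    for i in range(n):
        # Sum of distances to the n - f - 2 closest other updates.
        nearest = np.sort(np.delete(d[i], i))[: n - f - 2]
        scores.append(nearest.sum())
    return updates[int(np.argmin(scores))]
```

With four clustered updates and one far-away outlier, the outlier's neighbour distances are huge, so a clustered update is selected.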

III-B The Performance based Defense Schemes

In this category, each update is evaluated over a clean dataset provided by the server, such that any update that performs poorly is assigned a low weight or removed directly.

Li et al. [6] leveraged a pre-trained autoencoder to evaluate the performance. For a benign model update, the autoencoder will output a vector that is similar to the input, whereas an abnormal update will generate a large gap. However, training an autoencoder is time-consuming, and it is difficult to obtain a training set that includes sufficiently many benign model updates. In contrast, Zeno [14] only requires a small validation set on the server side. Specifically, Zeno computes a score for each candidate gradient with the validation set. The score is composed of two parts: the estimated descent of the loss function, and the magnitude of the update. A higher score implies better performance, indicating a higher probability that the update is reliable. Nevertheless, Zeno requires knowledge of the number of attackers. To address this problem, Cao et al. [3] proposed a byzantine-robust distributed gradient algorithm, which can filter out information received from compromised clients by computing a noisy gradient with a small clean dataset. An update whose distance to the noisy gradient satisfies a pre-defined condition is accepted.
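As an illustration of performance based scoring, the sketch below mimics Zeno's two-part score on a toy linear model: the estimated loss descent on a server-held validation set minus a penalty on the update magnitude. The exact formulation in [14] differs in details; this is only a schematic with parameter names of our choosing.

```python
import numpy as np

def loss(w, X, y):
    # Validation loss of a linear model (mean squared error).
    return float(((X @ w - y) ** 2).mean())

def zeno_score(global_w, update, X_val, y_val, rho=1e-4):
    """Zeno-style score (sketch): estimated loss descent minus a
    magnitude penalty; a higher score suggests a more reliable update."""
    descent = loss(global_w, X_val, y_val) - loss(update, X_val, y_val)
    return descent - rho * float(((update - global_w) ** 2).sum())
```

An update that moves toward the true model lowers the validation loss and scores high; one that moves away scores low, so the server can keep only the top-ranked updates.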

III-C The Statistics based Defense Schemes

Schemes in this category exploit the statistical characteristics of the uploaded updates, such as the median or mean, to circumvent abnormal updates and obtain a robust one.

Yin et al. [15] proposed two robust distributed gradient descent algorithms, computing the coordinate-wise median and the coordinate-wise trimmed mean of all local updates in each dimension, respectively. Meanwhile, Xie et al. [13] proposed three aggregation rules: geometric median, marginal median, and “mean around median”. The geometric median finds a new update that minimizes the sum of the distances between itself and each local update. The marginal median is similar to the coordinate-wise median proposed in [15]. The “mean around median” takes, for each dimension of the local updates, the average of the values nearest to the median to obtain a new global update. However, the scheme in [13] needs to call a secure average oracle many times, which incurs expensive computational overhead. In light of this, Pillutla et al. [10] proposed RFA (Robust Aggregation for Federated Learning), which computes the geometric median with an alternating minimization approach that calls the secure average oracle only three times.

Bulyan [7], which builds on [15], executes a robust selection algorithm, such as Multi-Krum, before aggregating with the trimmed mean. The experimental results show that Bulyan performs better than using Krum alone. Muñoz-González et al. [9] proposed AFA (Adaptive Federated Averaging), which computes the cosine similarity between each local model and the global model, and discards bad models based on the statistical distribution of the median and the average of these cosine similarities. Xie et al. [12] also used the trimmed mean as the aggregation rule, and further proposed a moving-average method that considers the global models in two successive rounds.

III-D The Target Optimization based Defense Schemes

The target optimization based defense schemes optimize a modified objective function to improve the robustness of the global model.

Li et al. [5] proposed RSA (Byzantine-Robust Stochastic Aggregation), which adds a regularization term to the objective loss function, such that each regular local model is forced to be close to the global model. As far as we know, this is the only work in this category so far.
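A minimal sketch of the idea behind RSA: the local objective gains a penalty on the distance between the local and global models, so each local step follows the task gradient plus the subgradient of an l1 penalty pulling the local model toward the global one. The concrete step below is illustrative (the coefficient names are ours); see [5] for the exact algorithm.

```python
import numpy as np

def rsa_local_step(w_local, w_global, grad, lr=0.1, lam=0.01):
    """RSA-style local update (sketch): gradient step on the task loss
    plus the subgradient of lam * ||w_local - w_global||_1, which keeps
    each regular local model close to the global model."""
    return w_local - lr * (grad + lam * np.sign(w_local - w_global))
```

Even with a zero task gradient, each coordinate of the local model drifts toward the corresponding global coordinate at rate `lr * lam`.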

| Solution | Category | Target attack | Number of attackers | Model accuracy | Data distribution |
|---|---|---|---|---|---|
| Multi-Krum [1] | Distance based | Data/parameter based | Less than 50% | Medium | IID |
| FABA [11] | Distance based | Data/parameter based | Less than 50% | Medium | IID |
| Sniper [2] | Distance based | Data based | Less than 30% | Medium | IID |
| FoolsGold [4] | Distance based | Data/parameter based | No limitation | Medium (FoolsGold), High (FoolsGold+Multi-Krum) | IID/Non-IID |
| Li et al. [6] | Performance based | Data/parameter based | No greater than 50% | High | IID/Non-IID |
| Zeno [14] | Performance based | Data/parameter based | At least one honest user | High | IID/Non-IID |
| Cao et al. [3] | Performance based | Data/parameter based | No limitation | High | IID |
| AFA [9] | Statistics based | Data/parameter based | Less than 50% | High | IID |
| GeoMed [13], MarMed [15, 13], Trimmed mean [15, 13] | Statistics based | Data/parameter based | Less than 50% | High | IID/Non-IID |
| Bulyan [7] | Statistics based | Data/parameter based | Less than 50% | Medium | IID |
| SLSGD [12] | Statistics based | Data/parameter based | Less than 50% | Medium | IID/Non-IID |
| RFA [10] | Statistics based | Data/parameter based | Less than 50% | Medium | IID |
| RSA [5] | Target optimization based | Data based | No limitation | High | IID/Non-IID |

TABLE I: A comprehensive comparison of byzantine-robust FL methods. The model accuracy column gives the prediction accuracy of the defense scheme under attack, where “Medium” and “High” indicate that the accuracy is below, respectively close to, that of the attack-free case. “IID” means that the users’ local data sets are independently and identically distributed, while “Non-IID” indicates otherwise.

IV A comprehensive comparison

Based on the above discussion, we know that the distance based defense schemes usually rely on the assumption that the parameter distribution of a malicious attacker is scattered and deviates from the benign ones, so they are only suitable for resisting attacks that generate evidently abnormal parameters. For example, the label flipping attack can easily cause significant changes in the parameters. However, such defense schemes perform poorly when the attack causes only faint changes.

The performance based defense schemes detect anomalous updates by directly verifying their performance, which is much more reliable than the other solutions. For example, Zeno [14] is superior to Krum under both the data based attack (e.g., label flipping) and the parameter based attack (e.g., random gradient). However, such schemes rely on a clean auxiliary data set for examination, which hampers their practicability. Besides, the scheme of Li et al. [6] has a high time complexity, since time-consuming pre-training is required for the autoencoder.

The statistics based defense schemes rely on computing the median or mean to evade abnormal parameters, making them suitable only for situations where the number of malicious users is less than half of the total; otherwise, legitimate updates will be filtered out when malicious updates dominate. In comparison, the target optimization based defense scheme enjoys high efficiency. For instance, according to the experimental results in [5] conducted on the MNIST dataset, the time cost of RSA is around 45s, while Median, Krum, and GeoMed cost about 50s, 62s, and 127s, respectively.

TABLE I presents a comprehensive comparison among the existing solutions. We can see that various defense strategies have already been proposed, each with its own merits and demerits, and many tough issues have been discussed in detail, such as the non-IID condition, more than 50% attackers, etc. Nevertheless, we emphasize that it is still far from practical to deploy a secure framework for FL in the presence of byzantine attacks. Fully protecting FL while integrally considering issues including efficiency, data privacy, and data distribution is an extremely challenging problem, especially when many attack surfaces still exist in its standard process. In the next section, we present a newly found attack approach to support our observation.

V Weight attack

In this section, we propose a new attack approach called the weight attack to circumvent those defense schemes. The key idea lies in exploiting drawbacks of the weight assignment strategy that have not yet received enough attention. Our attack is simple and easy to carry out in practice, and performs well even when those defense schemes are deployed.

V-A System Overview

In the standard federated learning setting, when aggregating updates in each iteration, the weight assigned to each update depends entirely on the size of the local training data set [1, 11, 14, 12, 2, 3]. The central server has no authority or effective means to check the size and quality of the clients’ training data, due to privacy reasons. Therefore, the local data set size is declared by the clients themselves without any verification.

Based on this observation, any malicious client can arbitrarily lie about its data set size to gain a high weight. According to the way the attackers declare their training data set sizes, we consider the following two simple misreport cases:

  1. The attackers’ training data set sizes are much smaller than those of the regular clients, but they declare sizes similar to those of the regular clients.

  2. The attackers and the regular clients have similar training data set sizes, but the attackers declare that their training set sizes are much larger than that of the regular clients.

Obviously, it is very easy for attackers to launch such an attack, since the server cannot examine the clients’ declarations. Next, we briefly introduce the specific process of the weight attack.

V-B Algorithm Design

Step 1: The central server broadcasts the global model to each selected client.

Step 2: Each client, including the attackers, re-trains the global model based on its local training data set.

Step 3: The clients send their updates to the server; the attackers misreport their data set sizes, while the regular clients faithfully report theirs.

Step 4: The central server aggregates the received updated models to obtain a new global model and repeats steps 1 to 4 until the global model converges.
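The four steps, and the effect of a misreported size, can be sketched as follows. The sizes mirror the experiment in Section VI (2500 images per regular client, 100 per attacker), but the update vectors are made-up toy values for illustration only.

```python
import numpy as np

def aggregate(updates, declared_sizes):
    # Server-side weighted averaging: weights follow the *declared*
    # (unverifiable) local data set sizes, as in standard FL.
    w = np.asarray(declared_sizes, dtype=float)
    w /= w.sum()
    return sum(wi * u for wi, u in zip(w, updates))

# Hypothetical round: three honest clients and one attacker who trained
# on only 100 samples but declares 2500 (misreport case 1).
benign = [np.array([1.0, 1.0]) for _ in range(3)]   # well-trained updates
attacker = [np.array([5.0, -3.0])]                  # low-quality update
true_sizes = [2500, 2500, 2500, 100]
declared_sizes = [2500, 2500, 2500, 2500]

biased = aggregate(benign + attacker, declared_sizes)
honest = aggregate(benign + attacker, true_sizes)
# With honest sizes the poor update is nearly ignored; with the
# misreported sizes it pulls the global model far off.
```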

Fig. 2: The accuracy of existing defense schemes under the weight attack with: (a) 20% attackers, (b) 30% attackers, (c) 40% attackers, (d) 50% attackers

VI Experiments and evaluation

In this section, we conduct experiments to show the effectiveness of the weight attack. The experiments are implemented with TensorFlow on the CIFAR-10 image classification dataset, which is composed of 50K training images and 10K test images. We use a CNN with 2 convolutional layers followed by 2 fully connected layers. Since our purpose is to examine the effectiveness of the weight attack, we omit the client selection process and assume that there are 20 clients in total, all of which are selected in each round.

We only consider the first misreport case defined in Section V-A since it is more practical than the second. If a bad node claims a dataset size much larger than that of the regular clients, the central server can more easily spot the abnormality and require the node to re-upload the local update or to prove that the update is indeed derived from the claimed dataset size. In the first misreport case, however, all the declared dataset sizes are similar, so the central server cannot tell which nodes might be malicious from the dataset sizes alone.

We test four typical defense schemes (i.e., Multi-Krum, FABA, Zeno, and Median), and evaluate the weight attack with 4, 6, 8, and 10 attackers among the 20 clients, i.e., 20%, 30%, 40%, and 50% attackers, respectively. We set the training data set size (i.e., the number of images) to 2500 for regular clients and 100 for attackers, yet both gain equal weight for each update on the server side. It should be noted that the clients' data is allocated in an IID way: we randomly shuffle all the data and allocate to each client an amount matching its dataset size. As a comparison, we also consider the case without attackers.

In Fig. 2, we observe that with 20% attackers, Multi-Krum, FABA, and Median have similar performance, and their accuracy is lower than in the case without attackers, while Zeno performs slightly worse than the other three schemes. In the case of 30% attackers, the accuracy of Multi-Krum and FABA is significantly affected. In comparison, Median and Zeno perform better than Multi-Krum and FABA, and their accuracy is about 52%. When there are 40% attackers, Zeno performs similarly to the case of 30% attackers, while the accuracy of the other three schemes is further reduced. In the case of 50% attackers, Multi-Krum and FABA cannot converge, and both of them have low accuracy (i.e., 20%). Although Zeno and Median perform better, their accuracy is still 10% lower than in the case without attackers.

We also compare our weight attack with two typical byzantine attacks: the label flipping attack and the sign flipping attack. In the sign flipping attack, after obtaining the local model, the attacker multiplies it by a negative number; we use the same negative number adopted in existing works. The experiments are conducted on the CIFAR-10 data set, and 40% of the 20 clients are malicious. Note that Multi-Krum and FABA are used as the defenses. As shown in Fig. 3, the label flipping attack and the sign flipping attack have little effect on the accuracy when Multi-Krum or FABA is deployed; both attacks reduce the accuracy by 2% to 5%. On the contrary, the weight attack achieves a high success rate and can reduce the accuracy by 20%.

From the above experiments, we can conclude that the weight attack can indeed defeat the existing defense schemes, notably by reducing the prediction accuracy of the global model or even preventing it from converging.

VII Possible solutions to weight attack

Existing defense solutions are not able to mitigate the weight attack. The main difficulty lies in the fact that the server cannot directly examine the quality of the clients’ local data sets. Next, we discuss some possible countermeasures.

Although the distance based schemes such as Multi-Krum and FABA fail to resist the weight attack, we think this direction is still promising. The reason both Multi-Krum and FABA fail is that they are inclined to exclude only updates that are far from the overall distribution. We hold the viewpoint that, by analyzing the distribution of local updates, a new distance based strategy could be designed to directly evade the “bad” updates.

Besides, as shown in our experiments, the performance based defense schemes, such as Zeno, perform much better than the other schemes. We think this kind of defense can do even better in the future, because the most straightforward way to determine whether an update is benign or malicious is to examine its performance. The “bad” updates generated by the weight attack are bound to behave differently on a well-designed clean test data set.

As for the statistics based and the target optimization based defense schemes, we believe that by fully exploiting the statistical characteristics of local updates or by carefully designing the objective function to optimize, they can mitigate the weight attack effectively as well.

Fig. 3: The comparison between weight attack and existing byzantine attacks under (a) Multi-Krum, (b) FABA

VIII Challenges and research directions

Although the byzantine attacks in FL have attracted much research interest and great efforts have been devoted to designing a secure FL scheme, there are many open problems that need to be further investigated. In this section, we outline some challenges and research directions which we believe are of great significance for defending against byzantine attacks.

VIII-A Fair Reward Distribution in FL

The success of the weight attack relies on the assumption that the server assigns weights to updates according to the local training data set size of each client, which enables attackers to arbitrarily overstate their workload in the local training by falsely reporting their data set sizes. Moreover, even if the server deploys effective detection methods, if any, to discard bad updates generated by the weight attack, clients may still tend to misbehave (e.g., be lazy), since they will ultimately share the same global model no matter how much data is used for local training or how much computing power is devoted.

In light of this, we emphasize that it is of great importance to design a fair incentive mechanism before putting FL into practice. Apart from the training data set size, more metrics (e.g., model quality, computing resources, past behavior) should be included to evaluate the contribution of each client so that clients are fairly rewarded. As a result, each client is incentivized to perform correct computations. This can not only discourage the weight attack, but also motivate clients to contribute more resources to speed up the local training.

VIII-B Defending Attacks with Privacy Protection

The primary goal of federated learning is to protect users’ privacy by requiring them to upload local updates instead of their training data. However, recent research has shown that deep models can also reveal private information about the training data. Existing defense schemes mainly focus on mitigating byzantine attacks, but ignore the possible privacy leakage through the updates.

Hence, it is necessary to consider privacy protection while simultaneously defending against byzantine attackers. The most straightforward solution is to let each client encrypt its local updates before uploading them to the central server, which can then carry out a defense scheme to detect anomalies over the encrypted data. The central server then broadcasts the encrypted global model to the clients for decryption and proceeds to the next iteration. The main challenge for this method lies in designing a secure computation protocol that can effectively detect anomalies while protecting the privacy of the updates, without degrading the performance of the final global model. Cryptographic tools such as homomorphic encryption or garbled circuits can provide accurate computations over encrypted data, but they bring a high computation overhead. Other privacy-enhancing techniques, such as differential privacy or hardware-based trusted execution environments, enjoy high efficiency, but they may cause a loss in model accuracy or cannot fully protect the private information. A trade-off among efficiency, security, and privacy needs to be carefully considered for specific application scenarios.

IX Conclusion

The advance of federated learning has given researchers a new direction for addressing the security and privacy issues of distributed training. Mitigating byzantine attacks is important for securing federated learning. In this article, we review existing solutions for defending against byzantine attacks. After a comprehensive comparison and discussion, we propose a new attack method that poses a threat to existing defense schemes, supported by our experimental results. Finally, we indicate several challenges and future research directions for FL.


  • [1] P. Blanchard, E. Mahdi, R. Guerraoui, and J. Stainer (2017) Machine learning with adversaries: byzantine tolerant gradient descent. In Proc. of NIPS, pp. 119–129. Cited by: §I, §III-A, TABLE I, §V-A.
  • [2] D. Cao, S. Chang, Z. Lin, G. Liu, and D. Sun (2019) Understanding distributed poisoning attack in federated learning. In Proc. of IEEE ICPADS, pp. 233–239. Cited by: §III-A, TABLE I, §V-A.
  • [3] X. Cao and L. Lai (2019) Distributed gradient descent algorithm robust to an arbitrary number of byzantine attackers. IEEE Trans. Signal Process. 67 (22), pp. 5850–5864. Cited by: §III-B, TABLE I, §V-A.
  • [4] C. Fung, C. J. M. Yoon, and I. Beschastnikh (14 August, 2018) Mitigating sybils in federated learning poisoning. External Links: Link Cited by: §III-A, TABLE I.
  • [5] L. Li, W. Xu, T. Chen, G. B. Giannakis, and Q. Ling (2019) RSA: byzantine-robust stochastic aggregation methods for distributed learning from heterogeneous datasets. In Proc. of AAAI, pp. 1544–1551. Cited by: §I, §III-D, TABLE I, §IV.
  • [6] S. Li, Y. Cheng, Y. Liu, W. Wang, and T. Chen (22 October, 2019) Abnormal client behavior detection in federated learning. CoRR abs/1910.09933. External Links: Link Cited by: §I, §III-B, TABLE I.
  • [7] E. Mhamdi, R. Guerraoui, and S. Rouault (2018) The hidden vulnerability of distributed learning in byzantium. In Proc. of ICML, pp. 3521–3530. Cited by: §III-C, TABLE I.
  • [8] V. Mothukuri, R. M. Parizi, S. Pouriyeh, Y. Huang, A. Dehghantanha, and G. Srivastava (2020) A survey on security and privacy of federated learning. Future Generation Computer Systems 115, pp. 619–640. Cited by: §I.
  • [9] L. Muñoz-González, K. T. Co, and E. C. Lupu (11 September, 2019) Byzantine-robust federated machine learning through adaptive model averaging. External Links: Link Cited by: §I, §III-C, TABLE I.
  • [10] V. K. Pillutla, S. M. Kakade, and Z. Harchaoui (31 December, 2019) Robust aggregation for federated learning. External Links: Link Cited by: §III-C, TABLE I.
  • [11] Q. Xia, Z. Tao, Z. Hao, and Q. Li (2019) FABA: an algorithm for fast aggregation against byzantine attacks in distributed neural networks. In Proc. of IJCAI, pp. 4824–4830. Cited by: §I, §III-A, TABLE I, §V-A.
  • [12] C. Xie, O. Koyejo, and I. Gupta (2019) SLSGD: secure and efficient distributed on-device machine learning. In Proc. of ECML PKDD, pp. 213–228. Cited by: §III-C, TABLE I, §V-A.
  • [13] C. Xie, O. Koyejo, and I. Gupta (27 February, 2018) Generalized byzantine-tolerant SGD. External Links: Link Cited by: §III-C, TABLE I.
  • [14] C. Xie, S. Koyejo, and I. Gupta (2019) Zeno: distributed stochastic gradient descent with suspicion-based fault-tolerance. In Proc. of ICML, pp. 6893–6901. Cited by: §I, §III-B, TABLE I, §IV, §V-A.
  • [15] D. Yin, Y. Chen, K. Ramchandran, and P. L. Bartlett (2018) Byzantine-robust distributed learning: towards optimal statistical rates. In Proc. of ICML, pp. 5636–5645. Cited by: §III-C, §III-C, TABLE I.