Local Model Poisoning Attacks to Byzantine-Robust Federated Learning

11/26/2019 · Minghong Fang et al. · Iowa State University of Science and Technology, Duke University

In federated learning, multiple client devices jointly learn a machine learning model: each client device maintains a local model for its local training dataset, while a master device maintains a global model via aggregating the local models from the client devices. The machine learning community recently proposed several federated learning methods that were claimed to be robust against Byzantine failures (e.g., system failures, adversarial manipulations) of certain client devices. In this work, we perform the first systematic study on local model poisoning attacks to federated learning. We assume an attacker has compromised some client devices, and the attacker manipulates the local model parameters on the compromised client devices during the learning process such that the global model has a large testing error rate. We formulate our attacks as optimization problems and apply our attacks to four recent Byzantine-robust federated learning methods. Our empirical results on four real-world datasets show that our attacks can substantially increase the error rates of the models learnt by the federated learning methods that were claimed to be robust against Byzantine failures of some client devices. We generalize two defenses for data poisoning attacks to defend against our local model poisoning attacks. Our evaluation results show that one defense can effectively defend against our attacks in some cases, but the defenses are not effective enough in other cases, highlighting the need for new defenses against our local model poisoning attacks to federated learning.

1 Introduction

Byzantine-robust federated learning:  In federated learning (also known as collaborative learning) [Konen16, McMahan17], the training dataset is decentralized among multiple client devices (e.g., desktops, mobile phones, IoT devices), which could belong to different users or organizations. These users/organizations do not want to share their local training datasets, but still desire to jointly learn a model. For instance, multiple hospitals may desire to learn a healthcare model without sharing their sensitive data with each other. Each client device (called a worker device) maintains a local model for its local training dataset. Moreover, the service provider has a master device (e.g., a cloud server), which maintains a global model. Roughly speaking, federated learning repeatedly performs three steps: the master device sends the current global model to the worker devices; the worker devices update their local models using their local training datasets and the global model, and send the local models to the master device; and the master device computes a new global model via aggregating the local models according to a certain aggregation rule.

For instance, the mean aggregation rule, which takes the average of the local model parameters as the global model, is widely used under non-adversarial settings. However, under the mean aggregation rule, the global model can be arbitrarily manipulated even if just one worker device is compromised [Blanchard17, Yin18]. Therefore, the machine learning community recently proposed multiple aggregation rules (e.g., Krum [Blanchard17], Bulyan [Mhamdi18], trimmed mean [Yin18], and median [Yin18]) that aim to be robust against Byzantine failures of certain worker devices.

Figure 1: Data vs. local model poisoning attacks.

Existing data poisoning attacks are insufficient:  We consider attacks that aim to manipulate the training phase of machine learning such that the learnt model (we consider the model to be a classifier) has a high testing error rate indiscriminately for testing examples, which makes the model unusable and eventually leads to denial-of-service attacks. Figure 1 shows the training phase, which includes two components, i.e., training dataset collection and the learning process. The training dataset collection component collects a training dataset, while the learning process component produces a model from a given training dataset. Existing attacks mainly inject malicious data into the training dataset before the learning process starts, while the learning process is assumed to maintain integrity. Therefore, these attacks are often called data poisoning attacks [rubinstein2009antidote, biggio2012poisoning, xiao2015feature, poisoningattackRecSys16, Jagielski18, Suciu18]. In federated learning, an attacker could only inject the malicious data into the worker devices that are under the attacker's control. As a result, these data poisoning attacks have limited success in attacking Byzantine-robust federated learning (see our experimental results in Section 4.4).

Our work:  We perform the first study on local model poisoning attacks to Byzantine-robust federated learning. Existing studies [Blanchard17, Yin18] only showed local model poisoning attacks to federated learning with the non-robust mean aggregation rule.

Threat model. Unlike existing data poisoning attacks that compromise the integrity of training dataset collection, we aim to compromise the integrity of the learning process in the training phase (see Figure 1). We assume the attacker has control of some worker devices and manipulates the local model parameters sent from these devices to the master device during the learning process. The attacker may or may not know the aggregation rule used by the master device. To contrast with data poisoning attacks, we call our attacks local model poisoning attacks as they directly manipulate the local model parameters.

Local model poisoning attacks. A key challenge of local model poisoning attacks is how to craft the local models sent from the compromised worker devices to the master device. To address this challenge, we formulate crafting local models as solving an optimization problem in each iteration of federated learning. Specifically, the master device could compute a global model in an iteration if there are no attacks, which we call before-attack global model. Our goal is to craft the local models on the compromised worker devices such that the global model deviates the most towards the inverse of the direction along which the before-attack global model would change. Our intuition is that the deviations accumulated over multiple iterations would make the learnt global model differ from the before-attack one significantly. We apply our attacks to four recent Byzantine-robust federated learning methods including Krum, Bulyan, trimmed mean, and median.

Our evaluation results on the MNIST, Fashion-MNIST, CH-MNIST, and Breast Cancer Wisconsin (Diagnostic) datasets show that our attacks can substantially increase the error rates of the global models under various settings of federated learning. For instance, when learning a deep neural network classifier for MNIST using Krum, our attack can increase the error rate from 0.11 to 0.75. Moreover, we compare with data poisoning attacks, including label flipping attacks and back-gradient optimization based attacks [munoz2017towards] (state-of-the-art untargeted data poisoning attacks for multi-class classifiers), which poison the local training datasets on the compromised worker devices. We find that these data poisoning attacks have limited success in attacking the Byzantine-robust federated learning methods.

Defenses. Existing defenses against data poisoning attacks essentially aim to sanitize the training dataset. One category of defenses [Cretu08, barreno2010security, Suciu18, Tran18] detects malicious data based on their negative impact on the error rate of the learnt model. For instance, Reject on Negative Impact (RONI) [barreno2010security] measures the impact of each training example on the error rate of the learnt model and removes the training examples that have large negative impact. Another category of defenses [Feng14, Liu17AiSec, Jagielski18] leverages new loss functions whose optimization simultaneously detects malicious data and learns a model. For instance, Jagielski et al. [Jagielski18] proposed TRIM, which aims to jointly find a subset of the training dataset with a given size and model parameters that minimize the loss function. The training examples that are not in the selected subset are treated as malicious data. However, these defenses are not directly applicable to our local model poisoning attacks because our attacks do not inject malicious data into the training dataset.

To address the challenge, we generalize RONI and TRIM to defend against our local model poisoning attacks. Both defenses remove the local models that are potentially malicious before computing the global model using a Byzantine-robust aggregation rule in each iteration. One defense removes the local models that have large negative impact on the error rate of the global model (inspired by RONI that removes training examples that have large negative impact on the error rate of the model), while the other defense removes the local models that result in large loss (inspired by TRIM that removes the training examples that have large negative impact on the loss), where the error rate and loss are evaluated on a validation dataset. We call the two defenses Error Rate based Rejection (ERR) and Loss Function based Rejection (LFR), respectively. Moreover, we combine ERR and LFR, i.e., we remove the local models that are removed by either ERR or LFR. Our empirical evaluation results show that LFR outperforms ERR; and the combined defense is comparable to LFR in most cases. Moreover, LFR can defend against our attacks in certain cases, but LFR is not effective enough in other cases. For instance, LFR can effectively defend against our attacks that craft local models based on the trimmed mean aggregation rule, but LFR is not effective against our attacks that are based on the Krum aggregation rule. Our results show that we need new defense mechanisms to defend against our local model poisoning attacks.

Our key contributions can be summarized as follows:

  • We perform the first systematic study on attacking Byzantine-robust federated learning.

  • We propose local model poisoning attacks to Byzantine-robust federated learning. Our attacks manipulate the local model parameters on compromised worker devices during the learning process.

  • We generalize two defenses for data poisoning attacks to defend against local model poisoning attacks. Our results show that, although one of them is effective in some cases, they have limited success in other cases.

2 Background and Problem Formulation

2.1 Federated Learning

Suppose we have m worker devices and the i-th worker device has a local training dataset D_i. The worker devices aim to collaboratively learn a classifier. Specifically, the model parameters w of the classifier are often obtained via solving the optimization problem min_w Σ_{i=1}^{m} F(w, D_i), where F(w, D_i) is the objective function for the local training dataset on the i-th device and characterizes how well the parameters w model the local training dataset on the i-th device. Different classifiers (e.g., logistic regression, deep neural networks) use different objective functions. In federated learning, each worker device maintains a local model for its local training dataset. Moreover, we have a master device to maintain a global model via aggregating the local models from the m worker devices. Specifically, federated learning performs the following three steps in each iteration:

Step I. The master device sends the current global model parameters to all worker devices.

Step II. The worker devices update their local model parameters using the current global model parameters and their local training datasets in parallel. In particular, the i-th worker device essentially aims to solve the optimization problem min_{w_i} F(w_i, D_i) with the global model parameters w as an initialization of the local model parameters w_i. A worker device could use any method to solve the optimization problem, though stochastic gradient descent is the most popular one. Specifically, the i-th worker device updates its local model parameters as w_i = w − α·∇F(w, B_i), where α is the learning rate and B_i is a batch randomly sampled from the local training dataset D_i. Note that a worker device could apply stochastic gradient descent for multiple rounds to update its local model. After updating the local models, the worker devices send them to the master device.

Step III. The master device aggregates the local models from the m worker devices to obtain a new global model according to a certain aggregation rule A. Formally, we have w = A(w_1, w_2, ..., w_m).
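To make the three steps concrete, here is a minimal sketch of the training loop. It is illustrative only, not the implementation used in our experiments: the worker update runs one round of stochastic gradient descent on a toy least-squares objective, `aggregate` is a placeholder for any aggregation rule A discussed in Section 2.2, and all function and variable names are ours.

```python
import numpy as np

def local_update(w_global, X, y, lr=0.01, batch_size=32):
    """One round of SGD on a toy least-squares objective (illustrative only)."""
    idx = np.random.choice(len(X), size=min(batch_size, len(X)), replace=False)
    Xb, yb = X[idx], y[idx]
    grad = Xb.T @ (Xb @ w_global - yb) / len(Xb)   # gradient of (1/2)||Xw - y||^2 on the batch
    return w_global - lr * grad

def federated_learning(datasets, aggregate, iterations=100, dim=10):
    """datasets: list of (X_i, y_i) per worker; aggregate: rule mapping (m, d) local models -> global model."""
    w = np.zeros(dim)
    for _ in range(iterations):
        # Step I: master sends w to workers; Step II: workers update local models in parallel.
        local_models = [local_update(w, X, y) for (X, y) in datasets]
        # Step III: master aggregates the local models into a new global model.
        w = aggregate(np.stack(local_models))
        # e.g., mean aggregation: aggregate = lambda W: W.mean(axis=0)
    return w
```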

The master device could also randomly pick a subset of worker devices and send the global model to them; the picked worker devices update their local models and send them to the master device; and the master device aggregates the local models to obtain the new global model [McMahan17]. We note that, for the aggregation rules we study in this paper, sending local models to the master device is equivalent to sending gradients to the master device, who aggregates the gradients and uses them to update the global model.

2.2 Byzantine-robust Aggregation Rules

A naive aggregation rule is to average the local model parameters as the global model parameters. This mean aggregation rule is widely used under non-adversarial settings [Dean12, Konen16, McMahan17]. However, mean is not robust under adversarial settings. In particular, an attacker can manipulate the global model parameters arbitrarily for this mean aggregation rule when compromising only one worker device [Blanchard17, Yin18]. Therefore, the machine learning community has recently developed multiple aggregation rules that aim to be robust even if certain worker devices exhibit Byzantine failures. Next, we review several such aggregation rules.

Krum [Blanchard17] and Bulyan [Mhamdi18]:  Krum selects one of the m local models that is similar to other models as the global model. The intuition is that even if the selected local model is from a compromised worker device, its impact may be constrained since it is similar to other local models possibly from benign worker devices. Suppose at most c worker devices are compromised. For each local model w_i, the master device computes the m − c − 2 local models that are the closest to w_i with respect to Euclidean distance. Moreover, the master device computes the sum of the distances between w_i and its m − c − 2 closest local models. Krum selects the local model with the smallest sum of distances as the global model. When 2c + 2 < m, Krum has theoretical convergence guarantees for certain objective functions.
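The following is a compact sketch of Krum as described above (m local models, at most c compromised); it is our own illustrative reading of the rule, not a reference implementation, and the names are ours.

```python
import numpy as np

def krum(local_models, c):
    """local_models: (m, d) array; c: assumed upper bound on compromised workers.
    Returns the local model with the smallest sum of Euclidean distances to its
    m - c - 2 closest other local models, following the description above."""
    m = local_models.shape[0]
    # pairwise Euclidean distances between local models
    dists = np.linalg.norm(local_models[:, None, :] - local_models[None, :, :], axis=2)
    scores = []
    for i in range(m):
        d = np.sort(np.delete(dists[i], i))   # distances to the other m - 1 models
        scores.append(d[: m - c - 2].sum())   # sum over the m - c - 2 closest
    return local_models[int(np.argmin(scores))]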

The Euclidean distance between two local models could be substantially influenced by a single model parameter. Therefore, Krum could be influenced by some abnormal model parameters [Mhamdi18]. To address this issue, Mhamdi et al. [Mhamdi18] proposed Bulyan, which essentially combines Krum and a variant of trimmed mean (trimmed mean will be discussed next). Specifically, Bulyan first iteratively applies Krum to select θ (θ ≤ m − 2c) local models. Then, Bulyan uses a variant of trimmed mean to aggregate the θ selected local models. In particular, for each j-th model parameter, Bulyan sorts the j-th parameters of the θ local models, finds the θ − 2c parameters that are the closest to the median, and computes their mean as the j-th parameter of the global model. When m ≥ 4c + 3, Bulyan has theoretical convergence guarantees under certain assumptions on the objective function.

Since Bulyan is based on Krum, our attacks for Krum can transfer to Bulyan (see Appendix A). Moreover, Bulyan is not scalable because it executes Krum many times in each iteration and Krum computes pairwise distances between local models. Therefore, we will focus on Krum in the paper.

Trimmed mean [Yin18]:  This aggregation rule aggregates each model parameter independently. Specifically, for each j-th model parameter, the master device sorts the j-th parameters of the m local models, i.e., w_{1j}, w_{2j}, ..., w_{mj}, where w_{ij} is the j-th parameter of the i-th local model, removes the largest and smallest k of them, and computes the mean of the remaining m − 2k parameters as the j-th parameter of the global model. Suppose at most c worker devices are compromised. This trimmed mean aggregation rule achieves an order-optimal error rate when c ≤ k < m/2 and the objective function to be minimized is strongly convex. Specifically, the order-optimal error rate is Õ(k/(m√n) + 1/√(nm)), where Õ is a variant of the O notation that ignores logarithmic terms and n is the number of training data points on a worker device (worker devices are assumed to have the same number of training data points).

Median [Yin18]:  In this median aggregation rule, for each j-th model parameter, the master device sorts the j-th parameters of the m local models and takes the median as the j-th parameter of the global model. Note that when m is an even number, the median is the mean of the middle two parameters. Like the trimmed mean aggregation rule, the median aggregation rule also achieves an order-optimal error rate when the objective function is strongly convex.
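Coordinate-wise trimmed mean and median are straightforward to sketch. Again, this is an illustrative reading of the rules described above (k denotes the number of parameters trimmed on each side), with names of our choosing.

```python
import numpy as np

def trimmed_mean(local_models, k):
    """Coordinate-wise trimmed mean: for each parameter, drop the k largest and
    k smallest values across the m local models and average the remaining m - 2k."""
    sorted_models = np.sort(local_models, axis=0)   # sort each coordinate independently
    return sorted_models[k: local_models.shape[0] - k].mean(axis=0)

def coordinate_median(local_models):
    """Coordinate-wise median: for each parameter, take the median across the local models."""
    return np.median(local_models, axis=0)
```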

2.3 Problem Definition and Threat Model

Attacker's goal:  Like many studies on poisoning attacks [rubinstein2009antidote, biggio2012poisoning, biggio2013poisoning, xiao2015feature, Jagielski18, poisoningattackRecSys16, YangRecSys17], we consider an attacker whose goal is to manipulate the learnt global model such that it has a high error rate indiscriminately for testing examples. Such attacks are known as untargeted poisoning attacks, which make the learnt model unusable and eventually lead to denial-of-service attacks. For instance, an attacker may perform such attacks against its competitor's federated learning system. Some studies also considered other types of poisoning attacks (e.g., targeted poisoning attacks [Suciu18]), which we will review in Section 6.

We note that the Byzantine-robust aggregation rules discussed above can asymptotically bound the error rates of the learnt global model under certain assumptions on the objective functions, and some of them (i.e., trimmed mean and median) even achieve order-optimal error rates. These theoretical guarantees seem to imply the difficulty of manipulating the error rates. However, the asymptotic guarantees do not precisely characterize the practical performance of the learnt models. Specifically, the asymptotic error rates are quantified using the O notation, which hides constants; for example, O(1/√(nm)) and O(100/√(nm)) are asymptotically the same, yet such constants significantly influence a model's error rate in practice. As we will show, although these asymptotic error rates still hold for our local model poisoning attacks since they hold for Byzantine failures, our attacks can still significantly increase the testing error rates of the learnt models in practice.

Attacker's capability:  We assume the attacker has control of c worker devices. Specifically, like Sybil attacks [sybil] to distributed systems, the attacker could inject fake worker devices into the federated learning system or compromise benign worker devices. However, we assume the number of worker devices under the attacker's control is less than 50% (otherwise, it would be easy to manipulate the global models). We assume the attacker can arbitrarily manipulate the local models sent from these worker devices to the master device. For simplicity, we call these worker devices compromised worker devices no matter whether they are fake devices or compromised benign ones.

Attacker’s background knowledge:  The attacker knows the code, local training datasets, and local models on the compromised worker devices. We characterize the attacker’s background knowledge along the following two dimensions:

Aggregation rule. We consider two scenarios depending on whether the attacker knows the aggregation rule or not. In particular, the attacker could know the aggregation rule in various scenarios. For instance, the service provider may make the aggregation rule public in order to increase transparency and trust of the federated learning system [McMahan17]. When the attacker does not know the aggregation rule, we will craft local model parameters for the compromised worker devices based on a certain aggregation rule. Our empirical results show that such crafted local models could also attack other aggregation rules. In particular, we observe different levels of transferability of our local model poisoning attacks between different aggregation rules.

Training data. We consider two cases (full knowledge and partial knowledge) depending on whether the attacker knows the local training datasets and local models on the benign worker devices. In the full knowledge scenario, the attacker knows the local training dataset and local model on every worker device. We note that the full knowledge scenario has limited applicability in practice for federated learning as the training dataset is decentralized on many worker devices, and we use it to estimate the upper bound of our attacks' threats for a given setting of federated learning. In the partial knowledge scenario, the attacker only knows the local training datasets and local models on the compromised worker devices.

Our threat model is inspired by multiple existing studies [Papernot16, Papernot16Distillation, Jagielski18, Suciu18] on adversarial machine learning. For instance, Suciu et al. [Suciu18] recently proposed to characterize an attacker's background knowledge and capability for data poisoning attacks with respect to multiple dimensions such as Feature, Algorithm, and Instance. Our aggregation rule and training data dimensions are essentially the Algorithm and Instance dimensions, respectively. We do not consider the Feature dimension because the attacker controls some worker devices and already knows the features in our setting.

Some Byzantine-robust aggregation rules (e.g., Krum [Blanchard17] and trimmed mean [Yin18]) need to know the upper bound of the number of compromised worker devices in order to set their parameters appropriately. For instance, trimmed mean removes the largest and smallest k local model parameters, where k is at least the number of compromised worker devices (otherwise trimmed mean can be easily manipulated). To calculate a lower bound for our attack's threat, we consider a hypothetical, strong service provider who knows the number of compromised worker devices and sets the parameters in the aggregation rule accordingly.

3 Our Local Model Poisoning Attacks

We focus on the case where the aggregation rule is known. When the aggregation rule is unknown, we craft local models based on an assumed one. Our empirical results in Section 4.3 show that our attacks have different levels of transferability between aggregation rules.

3.1 Optimization Problem

Our idea is to manipulate the global model via carefully crafting the local models sent from the compromised worker devices to the master device in each iteration of federated learning. We denote by s_j the changing direction of the j-th global model parameter in the current iteration when there are no attacks, where s_j = 1 or s_j = −1. s_j = 1 (or s_j = −1) means that the j-th global model parameter increases (or decreases) upon the previous iteration. We consider the attacker's goal (we call it the directed deviation goal) to be deviating a global model parameter the most towards the inverse of the direction along which the global model parameter would change without attacks. Suppose in an iteration, w_i is the local model that the i-th worker device intends to send to the master device when there are no attacks. Without loss of generality, we assume the first c worker devices are compromised. Our directed deviation goal is to craft local models w'_1, w'_2, ..., w'_c for the compromised worker devices via solving the following optimization problem in each iteration:

max_{w'_1, ..., w'_c}  s^T (w − w'),
subject to  w = A(w_1, w_2, ..., w_m),
            w' = A(w'_1, ..., w'_c, w_{c+1}, ..., w_m),
(1)

where s is a column vector of the changing directions of all global model parameters, w is the before-attack global model, and w' is the after-attack global model. Note that s, w, and w' all depend on the iteration number. Since our attacks manipulate the local models in each iteration, we omit the explicit dependency on the iteration number for simplicity.
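To make the directed deviation goal concrete, the sketch below computes the changing-direction vector s from two consecutive before-attack global models and evaluates the objective s^T(w − w') for a candidate after-attack global model. The function names are ours and are used only for illustration.

```python
import numpy as np

def changing_directions(w_prev, w_before_attack):
    """s_j = +1 if the j-th global parameter would increase without attacks, else -1."""
    return np.where(w_before_attack > w_prev, 1.0, -1.0)

def directed_deviation(s, w_before_attack, w_after_attack):
    """Objective of Equation (1): how far the attacked global model moves against s."""
    return float(s @ (w_before_attack - w_after_attack))
```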

In our preliminary exploration of formulating poisoning attacks, we also considered a deviation goal, which does not consider the global model parameters’ changing directions. We empirically find that our attacks based on both the directed deviation goal and the deviation goal achieve high testing error rates for Krum. However, the directed deviation goal substantially outperforms the deviation goal for trimmed mean and median aggregation rules. Appendix B shows our deviation goal and the empirical comparisons between deviation goal and directed deviation goal.

3.2 Attacking Krum

Recall that Krum selects one local model as the global model in each iteration. Suppose w is the local model selected by Krum in the current iteration when there are no attacks; for Krum, w is also the before-attack global model. Our goal is to craft the compromised local models such that the local model selected by Krum has the largest directed deviation from w. Our idea is to make Krum select a certain crafted local model (e.g., w'_1 without loss of generality) via crafting the c compromised local models. Therefore, we aim to solve the optimization problem in Equation (1) with w' = w'_1, where the aggregation rule A is Krum.

Full knowledge:  The key challenge of solving the optimization problem is that the constraint of the optimization problem is highly nonlinear and the search space of the local models is large. To address the challenge, we make two approximations. Our approximations represent suboptimal solutions to the optimization problem, which means that the attacks based on the approximations may have suboptimal performance. However, as we will demonstrate in our experiments, our attacks already substantially increase the error rate of the learnt model.

First, we restrict w'_1 as follows: w'_1 = w_Re − λ·s, where w_Re is the global model received from the master device in the current iteration (i.e., the global model obtained in the previous iteration) and λ > 0. This approximation explicitly models the directed deviation between the crafted local model and the received global model. We also explored the approximation w'_1 = w − λ·s, which explicitly models the directed deviation between the crafted local model and the local model selected by Krum before attack. However, we found that our attacks are less effective using that approximation.

Second, to make w'_1 more likely to be selected by Krum, we craft the other c − 1 compromised local models to be close to w'_1. In particular, when the other c − 1 compromised local models are close to w'_1, w'_1 only needs to have a small distance to m − 2c − 1 benign local models in order to be selected by Krum. In other words, the other compromised local models "support" the crafted local model w'_1. In implementing our attack, we first assume the other compromised local models are the same as w'_1, then we solve for λ, and finally we randomly sample c − 1 vectors, whose distance to w'_1 is at most ε, as the other c − 1 compromised local models. With our two approximations, we transform the optimization problem as follows:

max_λ  λ,
subject to  w'_1 = Krum(w'_1, ..., w'_c, w_{c+1}, ..., w_m),
            w'_1 = w_Re − λ·s,
            w'_i = w'_1, for i = 2, 3, ..., c.
(2)

More precisely, the objective function in the above optimization problem should be s^T(w − w'_1). However, s^T·w is a constant and s^T·w'_1 = s^T·w_Re − λ·d, where d is the number of parameters in the global model. Therefore, we simplify the objective function to be just λ. After solving for λ in the optimization problem, we can obtain the crafted local model w'_1. Then, we randomly sample c − 1 vectors whose distance to w'_1 is at most ε as the other c − 1 compromised local models. We will explore the impact of ε on the effectiveness of our attacks in experiments.

Solving λ. Solving for λ in the optimization problem in Equation (2) is key to our attacks. First, we derive an upper bound on the solution λ. Formally, we have the following theorem.

Theorem 1.

Suppose λ is a solution to the optimization problem in Equation (2). Then λ is upper bounded as follows:

λ ≤ 1/((m − 2c − 1)·√d) · min_{c+1 ≤ i ≤ m} Σ_{l ∈ Γ_i} D(w_l, w_i) + 1/√d · max_{c+1 ≤ i ≤ m} D(w_i, w_Re),
(3)

where d is the number of parameters in the global model, D(w_l, w_i) is the Euclidean distance between w_l and w_i, and Γ_i is the set of m − c − 2 benign local models that have the smallest Euclidean distance to w_i.

Proof.

See Appendix C. ∎

Given the upper bound, we use a binary search to solve for λ. Specifically, we initialize λ to the upper bound and check whether Krum selects w'_1 as the global model; if not, we halve λ; we repeat this process until Krum selects w'_1 or λ is smaller than a certain threshold (which indicates that the optimization problem may not have a solution). In our experiments, we use 1×10⁻⁵ as the threshold.
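A sketch of this binary search follows, under the simplification that all c crafted local models equal w'_1 = w_Re − λ·s (the ε-perturbation of the other c − 1 models is applied afterwards and is omitted here). `krum` is the aggregation sketch from Section 2.2; the other names are ours.

```python
import numpy as np

def solve_lambda(benign_models, w_re, s, c, krum, lam_init, threshold=1e-5):
    """Halve lam until Krum selects the crafted model w1' = w_re - lam * s,
    or lam falls below the threshold (no solution found)."""
    lam = lam_init                                       # start from the upper bound of Theorem 1
    while lam >= threshold:
        w1_crafted = w_re - lam * s
        crafted = np.tile(w1_crafted, (c, 1))            # the other c - 1 crafted models equal w1'
        all_models = np.vstack([crafted, benign_models]) # c crafted + (m - c) benign local models
        selected = krum(all_models, c)
        if np.allclose(selected, w1_crafted):
            return lam, w1_crafted                       # Krum picks the crafted model
        lam /= 2.0
    return None, None                                    # the optimization problem may have no solution
```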

Partial knowledge:  In the partial knowledge scenario, the attacker does not know the local models on the benign worker devices, i.e., w_{c+1}, ..., w_m. As a result, the attacker does not know the changing directions s and cannot solve the optimization problem in Equation (2). However, the attacker has access to the before-attack local models w_1, ..., w_c on the compromised worker devices. Therefore, we propose to craft compromised local models based on these before-attack local models. First, we compute the mean of the c before-attack local models, i.e., (1/c)·Σ_{i=1}^{c} w_i. Second, we estimate the changing directions using this mean local model. Specifically, if the mean of the j-th parameter is larger than the j-th global model parameter received from the master device in the current iteration, then we estimate the changing direction for the j-th parameter to be 1; otherwise we estimate it to be −1. For simplicity, we denote by s̃ the vector of estimated changing directions.
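A sketch of this direction estimation: the attacker averages the before-attack local models on the c compromised devices and compares the mean with the global model received in the current iteration. The names are ours.

```python
import numpy as np

def estimate_directions(compromised_models, w_received):
    """compromised_models: (c, d) before-attack local models on the compromised devices.
    Returns the estimated changing-direction vector (entries +1 or -1)."""
    mean_model = compromised_models.mean(axis=0)
    return np.where(mean_model > w_received, 1.0, -1.0)
```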

Third, we treat the before-attack local models on the compromised worker devices as if they were local models on benign worker devices, and we aim to craft a local model w'_1 such that, among the crafted local model and the c before-attack local models, Krum selects the crafted local model. Formally, we have the following optimization problem:

max_λ  λ,
subject to  w'_1 = Krum(w'_1, w_1, ..., w_c),
            w'_1 = w_Re − λ·s̃.
(4)

Similar to Theorem 1, we can also derive an upper bound of λ for the optimization problem in Equation (4). Moreover, similar to the full knowledge scenario, we use a binary search to solve for λ. However, unlike the full knowledge scenario, if we cannot find a solution before λ becomes smaller than the threshold (i.e., 1×10⁻⁵), then we add one more crafted local model w'_2 such that, among the crafted local models w'_1 and w'_2 and the before-attack local models, Krum selects the crafted local model w'_1. Specifically, we solve the optimization problem in Equation (4) with w'_2 added into the Krum aggregation rule. Like the full knowledge scenario, we assume w'_2 = w'_1. If we still cannot find a solution before λ becomes smaller than the threshold, we add another crafted local model. We repeat this process until finding a solution λ. We find that such an iterative searching process makes our attack more effective for Krum in the partial knowledge scenario. After solving for λ, we obtain the crafted local model w'_1. Then, like the full knowledge scenario, we randomly sample c − 1 vectors whose distance to w'_1 is at most ε as the other c − 1 compromised local models.

3.3 Attacking Trimmed Mean

Suppose w_{ij} is the j-th before-attack local model parameter on the i-th worker device and w_j is the j-th before-attack global model parameter in the current iteration. We discuss how we craft each j-th local model parameter w'_{1j}, ..., w'_{cj} on the compromised worker devices. We denote by w_{max,j} and w_{min,j} the maximum and minimum of the j-th local model parameters on the benign worker devices, i.e., w_{max,j} = max{w_{(c+1)j}, ..., w_{mj}} and w_{min,j} = min{w_{(c+1)j}, ..., w_{mj}}.

Full knowledge:  Theoretically, we can show that the following attack maximizes the directed deviations of the global model (i.e., it is an optimal solution to the optimization problem in Equation (1)): if s_j = −1, then we use any c numbers that are larger than w_{max,j} as the j-th local model parameters on the compromised worker devices; otherwise, we use any c numbers that are smaller than w_{min,j} as the j-th local model parameters on the compromised worker devices.

Intuitively, our attack crafts the compromised local models based on the maximum or minimum benign local model parameters, depending on which one deviates the global model towards the inverse of the direction along which the global model would change without attacks. The sampled numbers should be close to w_{max,j} or w_{min,j} to avoid being outliers that are easily detected. Therefore, when implementing the attack, if s_j = −1, we randomly sample the c numbers in the interval [w_{max,j}, b·w_{max,j}] (when w_{max,j} > 0) or [w_{max,j}, w_{max,j}/b] (when w_{max,j} ≤ 0); otherwise, we randomly sample the c numbers in the interval [w_{min,j}/b, w_{min,j}] (when w_{min,j} > 0) or [b·w_{min,j}, w_{min,j}] (when w_{min,j} ≤ 0). Our attack does not depend on b once b > 1. In our experiments, we set b = 2.
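A sketch of the full knowledge attack on trimmed mean as described above; the interval endpoints follow the case analysis on the signs of w_max,j and w_min,j, and b > 1 is the interval parameter (b = 2 in our experiments). The function and variable names are ours.

```python
import numpy as np

def craft_trimmed_mean_full(benign_models, s, c, b=2.0):
    """benign_models: (m - c, d) benign local models; s: changing directions (+1/-1).
    Returns (c, d) crafted local models for the compromised workers."""
    w_max = benign_models.max(axis=0)
    w_min = benign_models.min(axis=0)
    d = benign_models.shape[1]
    crafted = np.empty((c, d))
    for j in range(d):
        if s[j] < 0:   # parameter would decrease without attacks: push it up, sample above w_max
            lo, hi = (w_max[j], b * w_max[j]) if w_max[j] > 0 else (w_max[j], w_max[j] / b)
        else:          # parameter would increase without attacks: push it down, sample below w_min
            lo, hi = (w_min[j] / b, w_min[j]) if w_min[j] > 0 else (b * w_min[j], w_min[j])
        crafted[:, j] = np.random.uniform(lo, hi, size=c)
    return crafted
```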

Partial knowledge:  An attacker faces two challenges in the partial knowledge scenario. First, the attacker does not know the changing direction variable s_j because the attacker does not know the local models on the benign worker devices. Second, for the same reason, the attacker does not know the maximum w_{max,j} and minimum w_{min,j} of the benign local model parameters. As in our attacks to Krum, to address the first challenge, we estimate the changing direction variables using the local models on the compromised worker devices.

One naive strategy to address the second challenge is to use a very large number as w_{max,j} or a very small number as w_{min,j}. However, if we craft the compromised local models based on estimates of w_{max,j} or w_{min,j} that are far away from their true values, the crafted local models may be outliers and the master device may detect the compromised local models easily. Therefore, we propose to estimate w_{max,j} and w_{min,j} using the before-attack local model parameters on the compromised worker devices. In particular, the attacker can compute the mean μ_j and standard deviation σ_j of each j-th parameter on the compromised worker devices.

Based on the assumption that the j-th parameters of the benign worker devices are samples from a Gaussian distribution with mean μ_j and standard deviation σ_j, we can estimate that w_{max,j} is smaller than μ_j + 3σ_j or μ_j + 4σ_j with large probability, and that w_{min,j} is larger than μ_j − 3σ_j or μ_j − 4σ_j with large probability. Therefore, when s_j is estimated to be −1, we sample c numbers from the interval [μ_j + 3σ_j, μ_j + 4σ_j] as the j-th parameters of the compromised local models, which means that the crafted compromised local model parameters are larger than the maximum of the benign local model parameters with a high probability (e.g., 0.898 – 0.998 when m = 100 and c = 20 under the Gaussian distribution assumption). When s_j is estimated to be 1, we sample c numbers from the interval [μ_j − 4σ_j, μ_j − 3σ_j] as the j-th parameters of the compromised local models, which means that the crafted compromised local model parameters are smaller than the minimum of the benign local model parameters with a high probability. The j-th model parameters on the benign worker devices may not accurately follow a Gaussian distribution. However, our attacks are still effective empirically.
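A sketch of the partial knowledge attack on trimmed mean: μ_j and σ_j are estimated from the compromised devices' before-attack local models, and the crafted values are drawn from [μ_j + 3σ_j, μ_j + 4σ_j] or [μ_j − 4σ_j, μ_j − 3σ_j] depending on the estimated direction. The names are ours.

```python
import numpy as np

def craft_trimmed_mean_partial(compromised_models, w_received, c):
    """compromised_models: (c, d) before-attack local models on the compromised devices.
    Returns (c, d) crafted local models."""
    mu = compromised_models.mean(axis=0)
    sigma = compromised_models.std(axis=0)
    s_hat = np.where(mu > w_received, 1.0, -1.0)        # estimated changing directions
    lo = np.where(s_hat < 0, mu + 3 * sigma, mu - 4 * sigma)
    hi = np.where(s_hat < 0, mu + 4 * sigma, mu - 3 * sigma)
    # one independent draw per compromised worker and per parameter
    return np.random.uniform(lo, hi, size=(c,) + mu.shape)
```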

3.4 Attacking Median

We use the same attacks we developed for trimmed mean to attack the median aggregation rule. For instance, in the full knowledge scenario, we randomly sample the c numbers in the interval [w_{max,j}, b·w_{max,j}] or [w_{max,j}, w_{max,j}/b] if s_j = −1; otherwise, we randomly sample the c numbers in the interval [w_{min,j}/b, w_{min,j}] or [b·w_{min,j}, w_{min,j}].

4 Evaluation

We evaluate the effectiveness of our attacks using multiple datasets in different scenarios, e.g., the impact of different parameters and known vs. unknown aggregation rules. Moreover, we compare our attacks with existing attacks.

4.1 Experimental Setup

Datasets:  We consider four datasets: MNIST, Fashion-MNIST, CH-MNIST [kather2016multi] (we use a pre-processed version from https://www.kaggle.com/kmader/colorectal-histology-mnist#hmnist_64_64_L.csv), and Breast Cancer Wisconsin (Diagnostic) [Dua:2019]. MNIST and Fashion-MNIST each include 60,000 training examples and 10,000 testing examples, where each example is a 28×28 grayscale image. Both datasets are 10-class classification problems. The CH-MNIST dataset consists of 5,000 images of histology tiles from patients with colorectal cancer. The dataset is an 8-class classification problem. Each image has 64×64 grayscale pixels. We randomly select 4,000 images as the training examples and use the remaining 1,000 as the testing examples. The Breast Cancer Wisconsin (Diagnostic) dataset is a binary classification problem to diagnose whether a person has breast cancer. The dataset contains 569 examples, each of which has 30 features describing the characteristics of a person's cell nuclei. We randomly select 455 (80%) examples as the training examples and use the remaining 114 examples as the testing examples.

Machine learning classifiers:  We consider the following classifiers.

Multi-class logistic regression (LR). The considered aggregation rules have theoretical guarantees for the error rate of the LR classifier.

Deep neural networks (DNN). For MNIST, Fashion-MNIST, and Breast Cancer Wisconsin (Diagnostic), we use a DNN with the architecture described in the Appendix. We use ResNet20 [he2016deep] for CH-MNIST. Our DNN architectures do not necessarily achieve the smallest error rates for the considered datasets, as our goal is not to search for the best DNN architecture. Our goal is to show that our attacks can increase the testing error rates of the learnt DNN classifiers.

Compared attacks:  We compare the following attacks.

Gaussian attack. This attack randomly crafts the local models on the compromised worker devices. Specifically, for each j-th model parameter, we estimate a Gaussian distribution using the before-attack local models on all worker devices. Then, for each compromised worker device, we sample a number from the Gaussian distribution and treat it as the j-th parameter of the local model on the compromised worker device. We use this Gaussian attack to show that crafting compromised local models randomly cannot effectively attack the Byzantine-robust aggregation rules.

Label flipping attack. This is a data poisoning attack that does not require knowledge of the training data distribution. On each compromised worker device, this attack flips the label of each training instance. Specifically, we flip a label l as L − l − 1, where L is the number of classes in the classification problem and l = 0, 1, ..., L − 1.

Back-gradient optimization based attack [munoz2017towards]. This is the state-of-the-art untargeted data poisoning attack for multi-class classifiers. We note that this attack is not scalable and thus we compare our attacks with this attack on a subset of MNIST separately. The results are shown in Section 4.4.

Full knowledge attack or partial knowledge attack. Our attack when the attacker knows the local models on all worker devices (full knowledge) or only those on the compromised worker devices (partial knowledge).

Parameter  Description                              Value
m          Number of worker devices.                100
c          Number of compromised worker devices.    20
p          Degree of non-IID.                       0.5
ε          Distance parameter for Krum attacks.     0.01
k          Parameter of trimmed mean.               k = c
Table 1: Default setting for key parameters.

Parameter setting:  We describe the parameter settings for the federated learning algorithms and our attacks. Table 1 summarizes the default setting for key parameters. We use MXNet [chen2015mxnet] to implement federated learning and the attacks. We repeat each experiment for 50 trials and report the average results. We observed that the variances are very small, so we omit them for simplicity.

Federated learning algorithms. By default, we assume m = 100 worker devices; each worker device applies one round of stochastic gradient descent to update its local model; and the master device aggregates the local models from all worker devices. One unique characteristic of federated learning is that the local training datasets on different devices may not be independently and identically distributed (i.e., non-IID) [McMahan17]. We simulate federated learning with different non-IID training data distributions. Suppose we have L classes in the classification problem, e.g., L = 10 for the MNIST and Fashion-MNIST datasets and L = 8 for the CH-MNIST dataset. We evenly split the worker devices into L groups. We model non-IID federated learning by assigning a training instance with label l to the l-th group with probability p and to each other group with probability (1 − p)/(L − 1), where p ≥ 1/L. A higher p indicates a higher degree of non-IID. For convenience, we call the probability p the degree of non-IID. Unless otherwise mentioned, we set p = 0.5.
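A sketch of this non-IID split, under our reading that an example with label l goes to the l-th group with probability p and to each other group with probability (1 − p)/(L − 1), and then to a uniformly chosen device within the group. Names are ours.

```python
import numpy as np

def split_non_iid(labels, num_workers, num_classes, p=0.5, seed=0):
    """Returns a list of example-index lists, one per worker device.
    Assumes num_workers >= num_classes so that every group is non-empty."""
    rng = np.random.default_rng(seed)
    groups = np.array_split(np.arange(num_workers), num_classes)  # evenly split workers into L groups
    worker_indices = [[] for _ in range(num_workers)]
    for idx, label in enumerate(labels):
        probs = np.full(num_classes, (1 - p) / (num_classes - 1))
        probs[label] = p                      # label-l examples favor group l
        g = rng.choice(num_classes, p=probs)
        worker = rng.choice(groups[g])        # uniform device within the chosen group
        worker_indices[worker].append(idx)
    return worker_indices
```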

We set 500 iterations for the LR classifier on MNIST; we set 2,000 iterations for the DNN classifiers on all four datasets; and we set the batch size to 32 in stochastic gradient descent, except that we set the batch size to 64 for Fashion-MNIST as that setting leads to a more accurate model. The trimmed mean aggregation rule prunes the largest and smallest k parameters, where k ≥ c. Pruning more parameters leads to larger testing error rates without attacks. By default, we consider k = c, as the authors of trimmed mean did [Yin18].

Our attacks. Unless otherwise mentioned, we consider that 20 worker devices are compromised. Our attacks to Krum have a parameter ε, which is related to the distance between the crafted compromised local models. We set ε = 0.01 (we will study the impact of ε on our attack). We do not set ε = 0 because ε = 0 makes the compromised local models exactly the same, which would make them easy for the master device to detect. Our attacks to trimmed mean and median have a parameter b in the full knowledge scenario, where b > 1. Our attacks do not depend on b once b > 1. We set b = 2. Unless otherwise mentioned, we assume the attacker manipulates the local models on the compromised worker devices in each iteration.

NoAttack Gaussian LabelFlip Partial Full
Krum 0.14 0.13 0.13 0.72 0.80
Trimmed mean 0.12 0.11 0.13 0.23 0.52
Median 0.13 0.13 0.15 0.19 0.29
(a) LR classifier, MNIST
NoAttack Gaussian LabelFlip Partial Full
Krum 0.11 0.10 0.10 0.75 0.77
Trimmed mean 0.06 0.07 0.07 0.14 0.23
Median 0.06 0.06 0.16 0.28 0.32
(b) DNN classifier, MNIST
NoAttack Gaussian LabelFlip Partial Full
Krum 0.16 0.16 0.16 0.90 0.91
Trimmed mean 0.10 0.10 0.12 0.26 0.28
Median 0.09 0.12 0.12 0.21 0.29
(c) DNN classifier, Fashion-MNIST
NoAttack Gaussian LabelFlip Partial Full
Krum 0.29 0.30 0.43 0.73 0.81
Trimmed mean 0.17 0.25 0.37 0.69 0.69
Median 0.17 0.20 0.17 0.57 0.63
(d) DNN classifier, CH-MNIST
NoAttack Gaussian LabelFlip Partial Full
Krum 0.03 0.04 0.14 0.17 0.17
Trimmed mean 0.02 0.03 0.05 0.14 0.15
Median 0.03 0.03 0.04 0.17 0.18
(e) DNN classifier, Breast Cancer Wisconsin (Diagnostic)
Table 2: Testing error rates of various attacks.

4.2 Results for Known Aggregation Rule

Figure 2: Testing error rates for different attacks as we have more compromised worker devices on MNIST. Panels (a)-(c): LR classifier with Krum, trimmed mean, and median; panels (d)-(f): DNN classifier with Krum, trimmed mean, and median.

Our attacks are effective: Table 2 shows the testing error rates of the compared attacks on the four datasets. First, these results show that our attacks are effective and substantially outperform existing attacks, i.e., our attacks result in higher error rates. For instance, when dataset is MNIST, classifier is LR, and aggregation rule is Krum, our partial knowledge attack increases the error rate from 0.14 to 0.72 (around 400% relative increase). Gaussian attacks only increase the error rates in several cases, e.g., median aggregation rule for Fashion-MNIST, and trimmed mean and median for CH-MNIST. Label flipping attacks can increase the error rates for DNN classifiers in some cases but have limited success for LR classifiers.

Second, Krum is less robust to our attacks than trimmed mean and median, except on Breast Cancer Wisconsin (Diagnostic) where Krum is comparable to median. A possible reason why trimmed mean and median outperform Krum is that Krum picks one local model as the global model, while trimmed mean and median aggregate multiple local models to update the global model (the median selects one local model parameter for each model parameter, but the selected parameters may be from different local models). Trimmed mean is more robust to our attacks in some cases while median is more robust in other cases. Third, we observe that the error rates may depend on the data dimension. For instance, MNIST and Fashion-MNIST have 784 dimensions, CH-MNIST has 4096 dimensions, and Breast Cancer Wisconsin (Diagnostic) has 30 dimensions. For the DNN classifiers, the error rates are higher on CH-MNIST than on other datasets in most cases, while the error rates are lower on Breast Cancer Wisconsin (Diagnostic) than on other datasets in most cases.

We note that federated learning may have higher error rate than centralized learning, even if robustness feature is not considered (i.e., mean aggregation rule is used). For instance, the DNN classifiers respectively achieve testing error rates 0.01, 0.08, 0.07, and 0.01 in centralized learning on the four datasets, while they respectively achieve testing error rates 0.04, 0.09, 0.09, and 0.01 in federated learning with the mean aggregation rule on the four datasets. However, in the scenarios where users’ training data can only be stored on their edge/mobile devices, e.g., for privacy purposes, centralized learning is not applicable and federated learning may be the only option even though its error rate is higher. Compared to the mean aggregation rule, Byzantine-robust aggregation rule increases the error rate without attacks. However, if Byzantine-robust aggregation rule is not used, a single malicious device can make the learnt global model totally useless [Blanchard17, Yin18]. To summarize, in the scenarios where users’ training data can only be stored on their edge/mobile devices and there may exist attacks, Byzantine-robust federated learning may be the best option, even if its error rate is higher.

Figure 3: Testing error rates for different attacks as we increase the degree of non-IID on MNIST. Panels (a)-(c): LR classifier with Krum, trimmed mean, and median; panels (d)-(f): DNN classifier with Krum, trimmed mean, and median.

Impact of the percentage of compromised worker devices:  Figure 2 shows the error rates of different attacks as the percentage of compromised worker devices increases on MNIST. Our attacks increase the error rates significantly as we compromise more worker devices; label flipping only slightly increases the error rates; and Gaussian attacks have no notable impact on the error rates. Two exceptions are that Krum's error rates decrease when the percentage of compromised worker devices increases from 5% to 10% in Figure 2(a) and from 10% to 15% in Figure 2(d). We suspect the reason is that Krum selects one local model as the global model in each iteration. We have similar observations on the other datasets. Therefore, we omit the corresponding results for simplicity.

Impact of the degree of non-IID in federated learning:  Figure 3 shows the error rates for the compared attacks for different degrees of non-IID on MNIST. Error rates of all attacks including no attacks increase as we increase the degree of non-IID, except that the error rates of our attacks to Krum fluctuate as the degree of non-IID increases. A possible reason is that as the local training datasets on different worker devices are more non-IID, the local models are more diverse, leaving more room for attacks. For instance, an extreme example is that if the local models on the benign worker devices are the same, it would be harder to attack the aggregation rules, because their aggregated model would be more likely to depend on the benign local models.

Figure 4: (a) Impact of the number of rounds of stochastic gradient descent worker devices use to update their local models in each iteration on our attacks. (b) Impact of the number of worker devices on our attacks. (c) Impact of the number of worker devices selected in each iteration on our attacks. MNIST, LR classifier, and median are used.
Figure 5: (a) Testing error rates of the trimmed mean aggregation rule when using different k. (b) Testing error rates of the Krum aggregation rule when our attack uses different ε. (c) Testing error rates of the median aggregation rule when our attacks poison a certain fraction of randomly selected iterations of federated learning. MNIST and LR classifier are used.

Impact of different parameter settings of federated learning algorithms:  We study the impact of various parameters in federated learning, including the number of rounds of stochastic gradient descent each worker device performs, the number of worker devices, the number of worker devices selected to update the global model in each iteration, and the parameter k in trimmed mean. In these experiments, we use MNIST and the LR classifier for simplicity. Unless otherwise mentioned, we consider median, as median is more robust than Krum and does not require configuring extra parameters (trimmed mean requires configuring k). Moreover, for simplicity, we consider partial knowledge attacks as they are more practical.

Worker devices can perform multiple rounds of stochastic gradient descent to update their local models. Figure 4(a) shows the impact of the number of rounds on the testing error rates of our attack. The testing error rates decrease as we use more rounds of stochastic gradient descent for both no attack and our partial knowledge attack. This is because more rounds of stochastic gradient descent lead to more accurate local models, and the local models on different worker devices are less diverse, leaving a smaller attack space. However, our attack still increases the error rates substantially even if we use more rounds. For instance, our attack still increases the error rate by more than 30% when using 10 rounds of stochastic gradient descent. We note that a large number of rounds results in large computational cost for worker devices, which may be unacceptable for resource-constrained devices such as mobile phones and IoT devices.

Figure 4(b) shows the testing error rates of our attack as the number of worker devices increases, where 20% of worker devices are compromised. Our attack is more effective (i.e., the testing error rate is larger) as the federated learning system involves more worker devices. We found a possible reason is that our partial knowledge attacks can more accurately estimate the changing directions with more worker devices. For instance, for trimmed mean of the DNN classifier on MNIST, our partial knowledge attacks can correctly estimate the changing directions of 72% of the global model parameters on average when there are 50 worker devices, and this fraction increases to 76% when there are 100 worker devices.

In federated learning [McMahan17], the master device could randomly sample some worker devices and send the global model to them; the sampled worker devices update their local models and send the updated local models to the master device; and the master device updates the global model using the local models from the sampled worker devices. Figure 4(c) shows the impact of the number of worker devices selected in each iteration on the testing error rates of our attack, where the total number of worker devices is 100. Since the master device randomly selects a subset of worker devices in each iteration, a smaller number of compromised worker devices are selected in some iterations, while a larger number of compromised worker devices are selected in other iterations. On average, among the selected worker devices, a fraction c/m of them are compromised, where c is the total number of compromised worker devices and m is the total number of worker devices. Figure 2 shows that our attacks become effective when this fraction is larger than 10%-15%. Note that an attacker can inject a large number of fake devices into a federated learning system, so c/m can be large.

The trimmed mean aggregation rule has a parameter k, which should be at least the number of compromised worker devices. Figure 5(a) shows the testing error rates of no attack and our partial knowledge attack as k increases. Roughly speaking, our attack is less effective (i.e., testing error rates are smaller) as more local model parameters are trimmed. This is because our crafted local model parameters on the compromised worker devices are more likely to be trimmed when the master device trims more local model parameters. However, the testing error of no attack also slightly increases as k increases. The reason is that more benign local model parameters are trimmed and the mean of the remaining local model parameters becomes less accurate. The master device may thus be motivated to use a smaller k to guarantee performance when there are no attacks.

Impact of the parameter ε in our attacks to Krum:  Figure 5(b) shows the error rates of the Krum aggregation rule when our attacks use different ε, where the MNIST dataset and LR classifier are considered. We observe that our attacks can effectively increase the error rates using a wide range of ε. Moreover, our attacks achieve larger error rates when ε is smaller. This is because when ε is smaller, the distances between the compromised local models are smaller, which makes it more likely for Krum to select the local model crafted by our attack as the global model.

Impact of the number of poisoned iterations:  Figure 5(c) shows the error rates of the median aggregation rule when our attacks poison the local models on the compromised worker devices in a certain fraction of randomly selected iterations of federated learning. Unsurprisingly, the error rate increases when poisoning more iterations.

NoAttack Gaussian LabelFlip Partial Full
Krum 0.10 0.10 0.09 0.69 0.70
Trimmed mean 0.06 0.06 0.07 0.12 0.18
Median 0.06 0.06 0.06 0.11 0.32
Table 3: Testing error rates of attacks on the DNN classifier for MNIST when the master device chooses the global model with the lowest testing error rate.

Alternative training strategy:  Each iteration results in a global model. Instead of selecting the last global model as the final model, an alternative training strategy is to select the global model that has the lowest testing error rate (we give advantages to this alternative strategy since we use the testing error rate itself to select the global model). Table 3 shows the testing error rates of various attacks on the DNN classifier for MNIST when this alternative training strategy is adopted. In these experiments, our attacks attack each iteration of federated learning, and the column "NoAttack" corresponds to the scenarios where no iterations are attacked. Compared to Table 2(b), this alternative training strategy is slightly more secure against our attacks. However, our attacks are still effective. For instance, for the Krum, trimmed mean, and median aggregation rules, our partial knowledge attacks still increase the testing error rates by 590%, 100%, and 83%, respectively. Another training strategy is to roll back to a few iterations ago if the master device detects an unusual increase of the training error rate. However, such a training strategy is not applicable because the training error rates of the global models still decrease until convergence when we perform our attacks in each iteration. In other words, there are no unusual increases of training error rates.

4.3 Results for Unknown Aggregation Rule

We craft local models based on one aggregation rule and show the attack's effectiveness for other aggregation rules. Table 4 shows the transferability between aggregation rules, where MNIST and the LR classifier are considered. We observe different levels of transferability between aggregation rules. Specifically, the Krum based attack transfers well to trimmed mean and median, e.g., it increases the error rate from 0.12 to 0.15 (25% relative increase) for trimmed mean, and from 0.13 to 0.18 (38% relative increase) for median. The trimmed mean based attack does not transfer to Krum but transfers well to median. For instance, the trimmed mean based attack increases the error rate from 0.13 to 0.20 (54% relative increase) for median.

Krum Trimmed mean Median
No attack 0.14 0.12 0.13
Krum attack 0.70 0.15 0.18
Trimmed mean attack 0.14 0.25 0.20
Table 4: Transferability between aggregation rules. “Krum attack” and “Trimmed mean attack” mean that we craft the compromised local models based on the Krum and trimmed mean aggregation rules, respectively. Partial knowledge attacks are considered. The numbers are testing error rates.

4.4 Comparing with Back-gradient Optimization based Attack

Back-gradient optimization based attack (BGA) [munoz2017towards] is the state-of-the-art untargeted data poisoning attack for multi-class classifiers such as multi-class LR and DNN. BGA formulates a bilevel optimization problem, where the inner optimization minimizes the training loss on the poisoned training data and the outer optimization finds poisoning examples that maximize the minimal training loss in the inner optimization. BGA iteratively finds the poisoned examples by alternately solving the inner minimization and outer maximization problems. We implemented BGA and verified that our implementation can reproduce the results reported by the authors. However, BGA is not scalable to the entire MNIST dataset. Therefore, we uniformly sample 6,000 training examples in MNIST, and we learn a 10-class LR classifier. Moreover, we assume 100 worker devices, randomly distribute the 6,000 examples to them, and assume 20 worker devices are compromised.

Generating poisoned data:  We assume the attacker has full knowledge of the training datasets on all worker devices. Therefore, the attacker can use BGA to generate poisoned data based on the 6,000 examples. In particular, we run the attack for 10 days on a GTX 1080Ti GPU, which generates 240 poisoned examples (i.e., 4% of the 6,000 training examples). We verified that these poisoned data effectively increase the testing error rate if the LR classifier is learnt in a centralized setting: the poisoned data increase the testing error rate of the LR classifier from 0.10 to 0.16 (a 60% relative increase). However, in federated learning, the attacker can only inject the poisoned data into the compromised worker devices. We consider two scenarios for how the attacker distributes the poisoned data to the compromised worker devices (a small sketch follows the two scenarios):

Single worker. In this scenario, the attacker distributes all the poisoned data to a single compromised worker device.

Uniform distribution. In this scenario, the attacker distributes the poisoned data to the compromised worker devices uniformly at random.
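The sketch below illustrates the two distribution strategies, assuming the poisoned examples have already been generated and that `compromised` is the list of compromised worker indices (both names are illustrative, not from our implementation).

```python
import numpy as np

def distribute_poisoned_data(poisoned, compromised, strategy="uniform", seed=0):
    """Assign each poisoned example to a compromised worker device.

    strategy="single":  all poisoned examples go to one compromised device.
    strategy="uniform": each poisoned example goes to a compromised device
                        chosen uniformly at random.
    Returns a dict mapping worker index -> list of poisoned examples."""
    rng = np.random.default_rng(seed)
    assignment = {w: [] for w in compromised}
    for example in poisoned:
        if strategy == "single":
            target = compromised[0]                 # concentrate everything on one device
        else:
            target = int(rng.choice(compromised))   # spread evenly in expectation
        assignment[target].append(example)
    return assignment
```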

We consider these two scenarios because they represent two extremes of distributing the poisoned data (fully concentrated or evenly spread), and we expect one of the extremes to maximize attack effectiveness. Table 5 compares BGA with our attacks. We observe that BGA has limited success at attacking Byzantine-robust aggregation rules, while our attacks substantially increase the testing error rates. We note that BGA is still successful if federated learning uses the mean aggregation rule. For instance, with the mean aggregation rule, BGA increases the testing error rate by 50% when distributing the poisoned data to the compromised worker devices uniformly at random. However, when applying our attacks for trimmed mean to attack the mean aggregation rule, we increase the testing error rate substantially more (see the last two cells in the second row of Table 5).

                NoAttack   SingleWorker   Uniform   Partial   Full
Mean            0.10       0.11           0.15      0.54      0.69
Krum            0.23       0.24           0.25      0.85      0.89
Trimmed mean    0.12       0.12           0.13      0.27      0.32
Median          0.13       0.13           0.14      0.19      0.21
Table 5: Testing error rates of back-gradient optimization based attacks (SingleWorker and Uniform) and our attacks (Partial and Full).

5 Defenses

We generalize RONI [barreno2010security] and TRIM [Jagielski18], which were designed to defend against data poisoning attacks, to defend against our local model poisoning attacks. Both generalized defenses remove potentially malicious local models before computing the global model in each iteration of federated learning. One generalized defense removes the local models that have a large negative impact on the error rate of the global model (inspired by RONI, which removes training examples that have a large negative impact on the error rate of the model), while the other removes the local models that result in a large loss (inspired by TRIM, which removes training examples that have a large negative impact on the loss). In both defenses, we assume the master device has a small validation dataset. Like existing aggregation rules such as Krum and trimmed mean, we assume the master device knows an upper bound on the number of compromised worker devices. We note that our defenses may make the global model slower to learn and adapt to new data, since local models trained on such data may be flagged as potentially malicious and removed.

Error Rate based Rejection (ERR):  In this defense, we compute the impact of each local model on the error rate over the validation dataset and remove the local models that have a large negative impact. Specifically, suppose we have an aggregation rule. For each local model, we use the aggregation rule to compute a global model A when the local model is included and a global model B when the local model is excluded. We compute the error rates of A and B on the validation dataset, denoted E_A and E_B, respectively, and define E_A − E_B as the error rate impact of the local model. A larger error rate impact indicates that including the local model when updating the global model increases the error rate more significantly. We remove the local models with the largest error rate impacts (up to the assumed upper bound on the number of compromised worker devices), and we aggregate the remaining local models to obtain the updated global model.
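A minimal sketch of ERR is given below; `aggregate` and `error_rate` are hypothetical helpers standing in for the aggregation rule and the validation-set error rate, and `num_remove` stands for the assumed upper bound on the number of compromised worker devices.

```python
import numpy as np

def err_defense(local_models, aggregate, error_rate, val_data, num_remove):
    """Error Rate based Rejection (ERR): remove the local models whose inclusion
    increases the validation error rate the most, then aggregate the rest."""
    # Global model A (every local model included) is the same for each leave-one-out test.
    err_included = error_rate(aggregate(local_models), val_data)
    impacts = []
    for i in range(len(local_models)):
        # Global model B: the i-th local model is excluded before aggregation.
        excluded = aggregate(local_models[:i] + local_models[i + 1:])
        impacts.append(err_included - error_rate(excluded, val_data))  # error rate impact E_A - E_B
    # Drop the num_remove local models with the largest impacts, keep the rest.
    keep = np.argsort(impacts)[: len(local_models) - num_remove]
    return aggregate([local_models[i] for i in keep])
```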

Loss Function based Rejection (LFR):  In this defense, we remove local models based on their impact on the loss over the validation dataset instead of the error rate. Specifically, as in ERR, for each local model we compute the global models A and B with and without the local model. We compute the cross-entropy losses of A and B on the validation dataset, denoted L_A and L_B, respectively, and define L_A − L_B as the loss impact of the local model. As in ERR, we remove the local models with the largest loss impacts, and we aggregate the remaining local models to update the global model.

Union (i.e., ERR+LFR):  In this defense, we combine ERR and LFR. Specifically, we remove the local models that are removed by either ERR or LFR.
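LFR and Union follow the same leave-one-out pattern as ERR, with the cross-entropy loss in place of (or in addition to) the error rate. The sketch below uses the same assumptions as the ERR sketch above (hypothetical `aggregate`, `error_rate`, and `cross_entropy_loss` helpers).

```python
import numpy as np

def loo_impacts(local_models, aggregate, metric, val_data):
    """Leave-one-out impact of each local model on a validation metric
    (error rate for ERR, cross-entropy loss for LFR)."""
    base = metric(aggregate(local_models), val_data)  # metric with every local model included
    impacts = []
    for i in range(len(local_models)):
        without_i = aggregate(local_models[:i] + local_models[i + 1:])
        impacts.append(base - metric(without_i, val_data))
    return np.array(impacts)

def union_defense(local_models, aggregate, error_rate, cross_entropy_loss, val_data, num_remove):
    """Union (ERR + LFR): remove a local model if either ERR or LFR would remove it."""
    err_imp = loo_impacts(local_models, aggregate, error_rate, val_data)
    lfr_imp = loo_impacts(local_models, aggregate, cross_entropy_loss, val_data)
    removed = set(np.argsort(err_imp)[-num_remove:]) | set(np.argsort(lfr_imp)[-num_remove:])
    return aggregate([m for i, m in enumerate(local_models) if i not in removed])
```

Using only `err_imp` (or only `lfr_imp`) to form `removed` recovers ERR (or LFR) on its own.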

                        No attack   Krum   Trimmed mean
Krum                    0.14        0.72   0.13
Krum + ERR              0.14        0.62   0.13
Krum + LFR              0.14        0.58   0.14
Krum + Union            0.14        0.48   0.14
Trimmed mean            0.12        0.15   0.23
Trimmed mean + ERR      0.12        0.17   0.21
Trimmed mean + LFR      0.12        0.18   0.12
Trimmed mean + Union    0.12        0.18   0.12
Median                  0.13        0.17   0.19
Median + ERR            0.13        0.21   0.25
Median + LFR            0.13        0.20   0.13
Median + Union          0.13        0.19   0.14
Table 6: Defense results. The numbers are testing error rates. The columns “Krum” and “Trimmed mean” indicate the attacker’s assumed aggregation rule when performing attacks, while the rows indicate the actual aggregation rules and defenses. Partial knowledge attacks are considered.

Defense results:  Table 6 shows the defense results of ERR, LFR, and Union, where partial knowledge attacks are considered. We use the default parameter setting described in our experimental setup, e.g., 100 worker devices, 20% of worker devices compromised, the MNIST dataset, and the LR classifier. Moreover, we sample 100 testing examples uniformly at random as the validation dataset. Each row of the table corresponds to a defense; e.g., Krum + ERR means that the master device uses ERR to remove potentially malicious local models and uses Krum as the aggregation rule. Each column indicates the attacker’s assumed aggregation rule when performing the attacks; e.g., the column “Krum” corresponds to attacks based on Krum. We have several observations.

First, LFR is comparable to, or much more effective than, ERR; i.e., LFR achieves similar or much smaller testing error rates than ERR. For instance, Trimmed mean + ERR and Trimmed mean + LFR achieve similar testing error rates (0.17 vs. 0.18) when the attacker crafts the compromised local models based on Krum. However, Trimmed mean + LFR achieves a much smaller testing error rate than Trimmed mean + ERR (0.12 vs. 0.21) when the attacker crafts the compromised local models based on trimmed mean. Second, Union is comparable to LFR in most cases, except when the attacks based on Krum target the Krum aggregation rule, where Union is more effective than LFR (0.48 vs. 0.58).

Third, LFR and Union can effectively defend against our attacks in some cases. For instance, Trimmed mean + LFR (or Trimmed mean + Union) achieves the same testing error rate under no attack and under the attack based on trimmed mean. However, our attacks remain effective in other cases even when LFR or Union is adopted. For instance, the attack that crafts compromised local models based on Krum still increases the error rate of Krum + LFR from 0.14 (no attack) to 0.58 (a 314% relative increase). Fourth, the testing error rate grows in some cases when a defense is deployed. This is because the defenses may remove benign local models, which increases the testing error rate of the global model.

6 Related Work

Compared to centralized machine learning, the security and privacy of federated/collaborative learning are much less explored. Recent studies [Hitaj17, Melis19, Nasr19] explored privacy risks in federated learning, which are orthogonal to our study.

Poisoning attacks:  Poisoning attacks aim to compromise the integrity of the training phase of a machine learning system [barreno2006can]. The training phase consists of two components, i.e., training dataset collection and the learning process. Most existing poisoning attacks compromise the training dataset collection component, e.g., by injecting malicious data into the training dataset. These attacks are also known as data poisoning attacks [rubinstein2009antidote, biggio2012poisoning, xiao2015feature, Jagielski18, poisoningattackRecSys16, YangRecSys17, Nelson08poisoningattackSpamfilter, munoz2017towards, Suciu18, Gu17, Chen17, Bagdasaryan18, shafahi2018poison, Fang18, Wang19]. Different from data poisoning attacks, our local model poisoning attacks compromise the learning process.

Depending on the goal of a poisoning attack, we can classify poisoning attacks into two categories, i.e., untargeted poisoning attacks [rubinstein2009antidote, biggio2012poisoning, xiao2015feature, Jagielski18, poisoningattackRecSys16, YangRecSys17] and targeted poisoning attacks [Nelson08poisoningattackSpamfilter, Suciu18, Liu18, Gu17, Chen17, Bagdasaryan18, shafahi2018poison, Bhagoji19]. Untargeted poisoning attacks aim to make the learnt model have a high testing error indiscriminately across testing examples, which eventually results in a denial-of-service attack. In targeted poisoning attacks, the learnt model produces attacker-desired predictions for particular testing examples, e.g., predicting spam as non-spam or predicting attacker-desired labels for testing examples containing a particular trojan trigger (these attacks are also known as backdoor/trojan attacks [Gu17]), while the testing error for other testing examples is unaffected. Our local model poisoning attacks are untargeted poisoning attacks. Different from existing untargeted poisoning attacks that focus on centralized machine learning, our attacks are optimized for Byzantine-robust federated learning. We note that Xie et al. [Xie19] proposed inner product manipulation based untargeted poisoning attacks to Byzantine-robust federated learning, including Krum and median, which is concurrent with our work.

Defenses:  Existing defenses were mainly designed for data poisoning attacks to centralized machine learning. They essentially aim to detect the injected malicious data in the training dataset. One category of defenses [Cretu08, barreno2010security, Suciu18, Tran18] detects malicious data based on their (negative) impact on the performance of the learnt model. For instance, Barreno et al. [barreno2010security] proposed Reject on Negative Impact (RONI), which measures the impact of each training example on the performance of the learnt model and removes the training examples that have a large negative impact. Suciu et al. [Suciu18] proposed a variant of RONI (called tRONI) for targeted poisoning attacks. In particular, tRONI measures the impact of a training example only on the target classification and excludes training examples that have a large impact.

Another category of defenses [Feng14, Liu17AiSec, Jagielski18, Steinhardt17] proposes new loss functions whose optimization simultaneously yields the model parameters and detects the injected malicious data. For instance, Jagielski et al. [Jagielski18] proposed TRIM, which aims to jointly find a subset of the training dataset with a given size and model parameters that minimize the loss function. The training examples that are not in the selected subset are treated as malicious data. These defenses are not directly applicable to our local model poisoning attacks because our attacks do not inject malicious data into the training dataset.

For federated learning, the machine learning community recently proposed several aggregation rules (e.g., Krum [Blanchard17], Bulyan [Mhamdi18], trimmed mean [Yin18], median [Yin18], and others [ChenPOMACS17]) that were claimed to be robust against Byzantine failures of certain worker devices. Our work shows that these defenses are not effective in practice against our optimized local model poisoning attacks, which carefully craft the local models on the compromised worker devices. Fung et al. [Fung18] proposed to compute a weight for each worker device according to its historical local models and to take the weighted average of the local models to update the global model. However, their method can only defend against label flipping attacks, which existing Byzantine-robust aggregation rules can already defend against. We propose ERR and LFR, which are generalized from RONI and TRIM, respectively, to defend against our local model poisoning attacks. We find that these defenses are not effective enough in some scenarios, highlighting the need for new defenses against our attacks.

Other security and privacy threats to machine learning:  Adversarial examples [barreno2006can, szegedy2013intriguing] aim to make a machine learning system predict labels as an attacker desires by adding carefully crafted noise to normal testing examples in the testing phase. Various methods (e.g., [szegedy2013intriguing, goodfellow2014explaining, Papernot16, CarliniSP17, laskov2014practical, sharif2016accessorize, liu2016delving, PracticalBlackBox17, Athalye18]) were proposed to generate adversarial examples, and many defenses (e.g., [goodfellow2014explaining, madry2017towards, Papernot16Distillation, detection2, detection1, xu2017feature, region]) were explored to mitigate them. Different from poisoning attacks, adversarial examples compromise the testing phase of machine learning. Both poisoning attacks and adversarial examples compromise the integrity of machine learning. An attacker could also compromise the confidentiality of machine learning. Specifically, an attacker could compromise the confidentiality of users’ private training or testing data via various attacks such as model inversion attacks [fredrikson2014privacy, fredrikson2015model], membership inference attacks [membershipInfer, membershipLocation, Melis19], and property inference attacks [Ateniese15, Ganju18]. Moreover, an attacker could compromise the confidentiality/intellectual property of a model provider by stealing its model parameters and hyperparameters [liang2016cracking, tramer2016stealing, WangHyper18].

7 Conclusion, Limitations, and Future Work

We demonstrate that the federated learning methods, which the machine learning community claimed to be robust against Byzantine failures of some worker devices, are vulnerable to our local model poisoning attacks that manipulate the local models sent from the compromised worker devices to the master device during the learning process. In particular, to increase the error rates of the learnt global models, an attacker can craft the local models on the compromised worker devices such that the aggregated global model deviates the most towards the inverse of the direction along which the global model would change when there are no attacks. Moreover, finding such crafted local models can be formulated as optimization problems. We can generalize existing defenses for data poisoning attacks to defend against our local model poisoning attacks. Such generalized defenses are effective in some cases but are not effective enough in other cases. Our results highlight that we need new defenses to defend against our local model poisoning attacks.

Our work is limited to untargeted poisoning attacks. It would be interesting to study targeted poisoning attacks to federated learning. Moreover, it is valuable future work to design new defenses against our local model poisoning attacks, e.g., new methods to detect compromised local models and new adversarially robust aggregation rules.

8 Acknowledgements

We thank the anonymous reviewers and our shepherd Nikita Borisov for constructive reviews and comments. This work was supported by NSF grant No. 1937786.


Appendix A Attacking Bulyan

Bulyan is based on Krum, so we apply our attacks for Krum to attack Bulyan. The dataset is MNIST and the classifier is logistic regression; Bulyan selects local models using Krum and then takes the mean of the selected parameters. Our results show that our attacks to Krum transfer to Bulyan. Specifically, our partial knowledge attack increases the error rate by around 150%, while our full knowledge attack increases the error rate by 165%.
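For reference, the sketch below shows a Bulyan-style aggregation as we understand it from [Mhamdi18]: local models are selected iteratively with Krum, and each coordinate of the selected models is then averaged over the values closest to the coordinate-wise median. The selection size `num_select` and the averaging size `beta` are left as parameters; this is our simplified reading, not the exact implementation used in our experiments.

```python
import numpy as np

def krum_select(models, num_neighbors):
    """Index of the model with the smallest Krum score, i.e., the smallest sum of
    squared distances to its num_neighbors closest other models."""
    dists = np.linalg.norm(models[:, None, :] - models[None, :, :], axis=2) ** 2
    scores = [np.sort(dists[i])[1:num_neighbors + 1].sum() for i in range(len(models))]
    return int(np.argmin(scores))

def bulyan(models, num_compromised, num_select, beta):
    """Bulyan-style aggregation: iteratively pick models with Krum, then, per
    coordinate, average the beta selected values closest to the coordinate-wise median."""
    models = np.asarray(models, dtype=float)
    remaining, selected = list(range(len(models))), []
    for _ in range(num_select):
        sub = models[remaining]
        neighbors = max(1, len(remaining) - num_compromised - 2)
        chosen = krum_select(sub, neighbors)
        selected.append(remaining.pop(chosen))
    sel = models[selected]                                      # shape (num_select, d)
    median = np.median(sel, axis=0)
    closest = np.argsort(np.abs(sel - median), axis=0)[:beta]   # beta values nearest the median, per coordinate
    return np.take_along_axis(sel, closest, axis=0).mean(axis=0)
```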

Appendix B Deviation Goal

The deviation goal is to craft local models for the compromised worker devices by solving the following optimization problem in each iteration:

    max_{w_1', ..., w_c'}  || w - w' ||
    subject to  w  = A(w_1, ..., w_m),
                w' = A(w_1', ..., w_c', w_{c+1}, ..., w_m),        (5)

where || · || denotes the L2 norm, A is the aggregation rule, w_1, ..., w_m are the local models in the current iteration, w_1', ..., w_c' are the crafted local models on the c compromised worker devices, w is the before-attack global model, and w' is the after-attack global model. We can adapt our attacks based on the directed deviation goal to the deviation goal. For simplicity, we focus on the full knowledge scenario.

Krum:  As for the directed deviation goal, we make two approximations: we restrict the crafted local model on the first compromised worker device to deviate from the received global model by a scaling factor λ, and we assume the compromised local models are all the same. We then formulate an optimization problem similar to Equation 3.2, except that the directed deviation is replaced by the deviation. As in Theorem 1, we can derive an upper bound of λ, given which we use binary search to solve for λ. After solving for λ, we obtain the crafted local model on the first compromised worker device, denoted w_1'. Then, we randomly sample vectors whose Euclidean distances to w_1' are smaller than a small threshold as the other compromised local models.
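The search for the scaling factor can be sketched as follows. We assume the crafted model has the form w_Re − λ·u for a fixed direction u (whatever direction the deviation goal prescribes), and `selected_by_krum` is a hypothetical helper that checks whether the candidate would be chosen by Krum; the halving loop is one simple way to realize the binary search mentioned above.

```python
import numpy as np

def solve_scaling_factor(w_re, direction, benign_models, num_compromised,
                         lambda_upper, selected_by_krum, threshold=1e-5):
    """Search for the scaling factor lambda: start from the derived upper bound
    and halve it until Krum selects the crafted compromised local model.

    `selected_by_krum(candidate, benign_models, num_compromised)` is a hypothetical
    helper that runs Krum over the (identical) crafted compromised local models
    plus the benign local models and reports whether `candidate` is chosen."""
    lam = lambda_upper
    while lam > threshold:
        candidate = w_re - lam * direction    # crafted compromised local model
        if selected_by_krum(candidate, benign_models, num_compromised):
            return lam, candidate
        lam /= 2.0                            # halve lambda and try again
    return None, None                         # no feasible lambda found above the threshold
```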

Trimmed mean:  Theoretically, we can show that the following attack maximizes the deviation of the global model: for the j-th model parameter, we use any numbers that are larger than w_max,j (the maximum of the j-th parameters of the benign local models) or smaller than w_min,j (the minimum of the j-th parameters of the benign local models), depending on which choice makes the deviation larger, as the j-th local model parameters on the compromised worker devices. As for the directed deviation goal, when implementing the attack we randomly sample the numbers in the interval [w_max,j, b·w_max,j] (when w_max,j > 0) or [w_max,j, w_max,j / b] (when w_max,j ≤ 0), or in the interval [w_min,j / b, w_min,j] (when w_min,j > 0) or [b·w_min,j, w_min,j] (when w_min,j ≤ 0), depending on which choice makes the deviation larger, where b > 1 is a scale parameter.
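A minimal sketch of this sampling step is shown below. It returns both candidate sets of values (above the benign maxima and below the benign minima); a full implementation would pick, per coordinate, whichever side makes the deviation of the global model larger. `b` is the scale factor mentioned above.

```python
import numpy as np

def craft_trimmed_mean_params(benign_params, num_compromised, b=2.0, seed=0):
    """Sample candidate compromised parameters for the trimmed-mean attack under
    the deviation goal: values just above the benign per-coordinate maximum and
    values just below the benign per-coordinate minimum.

    benign_params has shape (num_benign, d); each returned array has shape
    (num_compromised, d).  b > 1 controls how far beyond the benign extremes
    the sampled values may lie."""
    rng = np.random.default_rng(seed)
    w_max = benign_params.max(axis=0)
    w_min = benign_params.min(axis=0)
    # Intervals guaranteed to lie above w_max (resp. below w_min), handling signs.
    above_low, above_high = w_max, np.where(w_max > 0, b * w_max, w_max / b)
    below_low, below_high = np.where(w_min > 0, w_min / b, b * w_min), w_min
    size = (num_compromised, benign_params.shape[1])
    above = rng.uniform(above_low, above_high, size=size)
    below = rng.uniform(below_low, below_high, size=size)
    return above, below
```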

Median: We apply the attack for trimmed mean to median.

Experimental results: Table 8 empirically compares the deviation goal and the directed deviation goal on MNIST with the LR classifier. For Krum, both goals achieve high testing error rates. However, for trimmed mean and median, the directed deviation goal achieves significantly higher testing error rates than the deviation goal.

Appendix C Proof of Theorem 1

We denote by Γ the set of local models, among the crafted compromised local models and the benign local models, that are the closest to w_1' with respect to Euclidean distance. Moreover, we denote by Γ̃ the set of benign local models that are the closest to w_1' with respect to Euclidean distance. Since w_1' is chosen by Krum, we have the following:

(6)

where D(·, ·) represents Euclidean distance. The distance between w_1' and the other compromised local models is 0, since we assume they are the same in the optimization problem in Equation 3.2 when finding w_1'. Therefore, we have:

(7)

Applying the triangle inequality to the above, we obtain:

(8)

The bound only depends on the before-attack local models.