Recently, parallel and distributed machine learning (ML) algorithms have been developed to scale up computations in large datasets and networked systems [bekkerman2011scaling], such as distributed spam filtering [metzger2003multiagent] and distributed traffic control [camponogara2003distributed]. However, they are inherently vulnerable to adversaries who can exploit them. For instance, the nodes of a distributed spam filter system can fail to detect spam email messages after an attacker modifies the training data [nelson2009misleading] or disrupts the network services using denial-of-service attacks [wood2002denial].
ML algorithms are often open-source tools, and security is usually not the primary concern of their designers. It is easy for an adversary to acquire complete information about an algorithm and exploit its vulnerabilities. Also, the growing reliance of ML algorithms on off-the-shelf information and communication technologies (ICTs) such as cloud computing and wireless networks [drevin2007value] has made it even easier for adversaries to exploit the known vulnerabilities of ICTs to achieve their goals. Security becomes a more critical issue in the paradigm of distributed ML, since the learning system consists of a large number of nodes that communicate using ICTs, and its attack surface grows tremendously compared to its centralized counterpart.
Hence, it is imperative to design secure distributed ML algorithms against cyber threats. Current research endeavors have focused on two distinct directions. One is to develop robust algorithms despite uncertainties in the dataset [globerson2006nightmare, dekel2010learning, maaten2013learning]. The second one is to improve detection and prevention techniques to defend against cyber threats, e.g., [serpanos2001defense, levine2006survey, kohno2009secure]. These approaches have mainly focused on centralized ML and computer networks, separately. The investigation of security issues in the distributed ML over networks is lacking.
The challenges of developing secure distributed machine learning arise from the complexity of the adversarial behaviors and the network effects of the distributed system. The attacker’s strategies can be multi-stage. For instance, he can reach his target to modify training data by launching multiple successful network attacks. Furthermore, the impact of an attack can propagate over the network. Uncompromised nodes can be affected by the misinformation from compromised nodes, leading to a cascading effect.
Traditional detection and defense strategies for centralized ML and networked systems are not sufficient to protect the distributed ML systems from attacks. To bridge this gap, this work aims to develop a game-theoretic framework to model the interactions of an attacker and a defender to assess and design security strategies in distributed systems. Game theory has been used to address the security issues in centralized ML, e.g., [liu2009game, kantarciouglu2011classifier], and those in computer networks [michiardi2002game, lye2005game, huang2019differential] and cyber-physical networks [chen2019dynamic, chen2019interdependent, chen2019optimal, nugraha2019subgame]. The proposed game-theoretic model captures the network structures of the distributed system and leads to fully distributed and iterative algorithms that can be implemented at each node as defense strategies.
In particular, we use game models to study the impact on consensus-based distributed support vector machines (DSVMs) of an attacker who is capable of modifying training data and labels. In [zhang2018tnnls, zhang2015fusion, zhang2016student], we have built a game-theoretic framework to investigate the impacts of data poisoning attacks on DSVMs. We have further proposed four defense methods and verified their effectiveness with numerical experiments in [zhang2019jaif]. In [zhang2017ciss], we have proposed a game-theoretic framework to model the interactions between a DSVM learner and an attacker who can modify the training labels.
In this paper, we extend our previous works by studying a broad range of attack models and classifying them into two types. One is ML-attacks, which exploit the vulnerabilities of ML algorithms, and the other is Net-attacks, which arise from the vulnerabilities of the communication network. We then build a game-theoretic minimax problem to capture the competition between a DSVM learner and an attacker who can modify training data and labels. With the alternating direction method of multipliers (ADMM) [boyd2011distributed, zhang2018ciss, zhang2019tsipn], we develop fully distributed iterative algorithms in which each compromised node operates its own sub-max-problem for the attacker and sub-min-problem for the learner. The game between a DSVM learner and an attacker can be viewed as a collection of small sub-games associated with compromised nodes. This unique structure enables the analysis of per-node behaviors and the transmission of misleading information under the game framework.
Numerical results on the Spambase data set [Spambase] are provided to illustrate the impact of different attack models. We find that the network plays a significant role in the security of DSVM: a balanced network with fewer nodes and higher degrees is less prone to attackers who can control the whole system. We also find that nodes with higher degrees are more vulnerable. The distributed ML systems are found to be prone to network attacks even when the attacker only adds small noise to the information transmitted between neighboring nodes. A summary of notations in this paper is provided in the following table.
|Summary of Notations|
|, ,||Set of Nodes, Node , Set of Neighboring Nodes of Node|
|, ,||Decision Variables at Node|
|,||-th Data and Label at Node|
|,||Data Matrix and Label Matrix at Node|
|Consensus Variable between Node and Node|
|Indicator Vector of Flipped Labels at Node|
|Vector of Data Poisoning on the -th Data at Node|
Consider a distributed linear support vector machine learner in the network with representing the set of nodes. Node only communicates with its neighboring nodes . We assume, without loss of generality, that any two nodes in this network are connected by a path, i.e., there is no isolated node in this network. At every node , a labelled training set of size is available. The goal of the learner is to find a maximum-margin linear discriminant function at every node based on its local training set . Consensus constraints are used to force all local decision variables to agree across neighboring nodes. This approach enables each node to classify any new input into one of the two classes without communicating with other nodes . The discriminant function can be obtained by solving the following optimization problem:
In the above problem, the term is the hinge loss function, and is a tunable positive scalar for the learner. To solve Problem (1), we first define , the augmented matrix , and the diagonal label matrix . With these definitions, it follows readily that , is a matrix with its first columns being an identity matrix, and its column being a zero vector. Thus, Problem (1) can be rewritten as
where the consensus variable is used to decompose the decision variable to its neighbors . Note that is an identity matrix with its -st entry being . The term , which returns a vector of size . The algorithm for solving Problem (1) is stated in the following lemma, which follows from Proposition 1 in [forero2010consensus].
With arbitrary initialization and , the iterations per node are:
Note that has been solved directly and plugged into each equation. and are Lagrange multipliers. The ADMM-DSVM algorithm is illustrated in Figure 1. Note that at any given iteration of the algorithm, each node computes its own local discriminant function for any vector as . Since we only need the decision variables for the discriminant function, we use as a short-hand notation to represent iterations (3)-(5) at node :
The iterations stated in Lemma 1 are referred to as ADMM-DSVM. It is a fully decentralized network operation. Each node only shares decision variables with its neighboring nodes . Other DSVM approaches include distributed chunking SVMs [navia2006distributed] and distributed parallel SVMs [lu2008distributed], where support vectors (SVs) are exchanged between nodes, and distributed semiparametric SVMs, where the centroids of local SVMs are exchanged among neighboring nodes [navia2006distributed]. In comparison, ADMM-DSVM has no restrictions on the network topology or the type of data, and thus, we use it as an example to illustrate the impact of the attacker on distributed learning algorithms.
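As a rough illustration of the quantities each node works with, the sketch below evaluates a local linear discriminant, the hinge loss, and a regularized local objective, and measures how far neighboring nodes are from consensus. The function names, the plain-list data layout, and the exact objective (half the squared norm plus a weighted hinge-loss sum, the usual soft-margin SVM form) are assumptions made for illustration. The sketch does not implement the ADMM-DSVM iterations of Lemma 1.

```python
def discriminant(w, b, x):
    """Local linear discriminant g(x) = w^T x + b at one node."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def hinge_loss(w, b, x, y):
    """Hinge loss max(0, 1 - y * g(x)) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * discriminant(w, b, x))


def local_objective(w, b, data, labels, c):
    """Assumed soft-margin form: 0.5 * ||w||^2 + c * sum of hinge losses."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    losses = sum(hinge_loss(w, b, x, y) for x, y in zip(data, labels))
    return regularizer + c * losses


def consensus_gap(params, neighbors):
    """Largest coordinate-wise disagreement in (w, b) between neighbors.

    params maps a node to its (w, b); neighbors maps a node to a list of
    neighboring nodes. A gap of zero means the consensus constraints hold.
    """
    gap = 0.0
    for v, nbrs in neighbors.items():
        w_v, b_v = params[v]
        for u in nbrs:
            w_u, b_u = params[u]
            diffs = [abs(p - q) for p, q in zip(w_v, w_u)] + [abs(b_v - b_u)]
            gap = max(gap, max(diffs))
    return gap
```

A node that has converged with its neighbors would report a consensus gap close to zero, while a compromised node that injects misleading decision variables would show up as a persistent gap.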
III Attack Models and Related Works
In this section, we summarize and analyze possible attack models. We start by identifying the attacker’s goal and his knowledge. Based on the three critical characteristics of information, the attacker’s goal can be captured as damaging the confidentiality, integrity, and availability of the systems [mccumber1991information]. Damaging confidentiality indicates that the attacker intends to acquire private information. Damaging integrity means that the data or the models used by the learner are modified or replaced by the attacker, so that they no longer represent real information. Damaging availability indicates that the attacker, who is an unauthorized user, uses the information or services provided by the learner. In this paper, we assume that the attacker intends to damage the integrity of the distributed ML systems by modifying either the data or the models of the learner.
The impacts of the attacker are affected by the attacker’s knowledge of the learner’s systems. For example, an attacker may only know the data used by the learner, but not the algorithm the learner uses; or he may only know some nodes of the network. To fully capture the damages caused by the attacker, in this paper, we assume that the attacker has complete knowledge of the learner, i.e., the attacker knows the learner’s data and algorithm and the network topology.
The attack models on a distributed ML learner can be summarized into two main categories. One is machine learning type attacks (ML-attacks) [barreno2010security], and the other is network type attacks (Net-attacks) [chi2001network]. In ML-attacks, the attacker exploits machine learning systems so that they produce classification or prediction errors. In Net-attacks, an adversary attacks a networked system to compromise its security, which leads to the leak of private information or the failure of operations in the network. In this paper, we further divide ML-attacks into two sub-categories, training attacks and testing attacks. Note that the attack models described here are generally applicable to different machine learning algorithms. The focus of this work is to investigate the impact of these attack models on DSVM, which provides fundamental insights on the inherent vulnerability of distributed machine learning.
III-A Training Attacks
In the training attacks, an adversary attacks the learner at the time when the learner solves Problem (1). In these attacks, communications in the network may lead to unanticipated results as misleading information from compromised nodes can be spread to and then used by uncompromised nodes. One challenge of analyzing training attacks is that the consequences of attacker’s actions may not be directly visible. For example, assuming that the attacker modifies some training data in node , the learner may not be able to find out which data has been modified, and furthermore, in distributed settings, the learner may not even be able to detect which nodes are under attack. We further divide training attacks into three categories based on the scope of modifications made by the attacker.
III-A1 Training Labels Attacks
In this category, the attacker can modify the training labels , where . After training on data with flipped labels, the discriminant functions will be prone to give wrong labels to the testing data. In early works [biggio2011support, xiao2012adversarial], centralized SVMs under training label attacks have been studied, and robust SVMs have been proposed to reduce the effects of such attacks. In this work, we further extend such attack models to distributed SVM algorithms, and we use game theory to model the interactions between the DSVM learner and the attacker. We verify the effects of the attacker with numerical experiments.
III-A2 Training Data Attacks
In this category, the attacker modifies the training data on compromised nodes . Since the training and testing data are assumed to be generated from the same distribution, the discriminant function found with training data on distribution can be used to classify testing data from the same distribution . However, after modifying training data into , which belongs to a different distribution , the discriminant function trained on such crafted data is suited to classifying data from distribution . Thus, the testing data are prone to be misclassified with this discriminant function.
The attacker can delete or craft several features [dekel2010learning, maaten2013learning], change the training data of one class [dalvi2004adversarial], add noise to training data [biggio2011bagging], change the distributions of training data [liu2009game], inject crafted data [biggio2012poisoning], and so on. However, these works target centralized machine learning algorithms. In distributed algorithms, the information transmitted between neighboring nodes can cause uncompromised nodes to misclassify testing data after training with information from compromised nodes. In [zhang2015fusion], an attacker aims at reducing a DSVM learner’s classification accuracy by changing training data in node into . This work shows that the performance of DSVM under adversarial environments is highly dependent on network topologies.
III-A3 Training Models Attacks
The DSVM learner aims to find the discriminant function with the lowest risks by solving Problem (1) with local training sets . However, the attacker may change Problem (1) into a different problem or he can modify parameters in Problem (1). For example, when the attacker changes in Problem (1) into , the learner can only find , which does not depend on the distribution of training data, and thus, the learner will misclassify input testing data in the same distribution.
With training attacks, the DSVM learner will find wrong decision variables and in compromised node . However, the consensus constraints force all the decision variables to agree with each other. Hence, uncompromised nodes with correct decision variables will be affected by misleading decision variables from compromised nodes. As a result, the training process in the network can be damaged even if the attacker only attacks a small number of nodes.
III-B Testing Attacks
In testing attacks, the attacker attacks at the time when the DSVM learner labels new input into or with and from the solved Problem (1). The attacker can conduct three different operations in testing attacks.
Firstly, the attacker can directly change the given label of the testing data into , and thus a wrong label is generated with this operation. Secondly, the attacker can replace the testing data with crafted , or he can modify that into [biggio2013evasion]. In such cases, the learner gives the label of rather than the label of , which leads to misclassification. Thirdly, the attacker can modify or replace and into and , and thus the compromised discriminant function will yield wrong results. A simple example is that the attacker can set and , thus , which leads to contrary predictions.
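The third operation can be illustrated with a small sketch. The concrete values in the example above are missing from this text, so the code assumes the simplest reading: the attacker negates both the weight vector and the bias, which reverses the sign of the discriminant and therefore every prediction with a nonzero score. The function names are hypothetical.

```python
def predict(w, b, x):
    """Label a test point with the sign of the linear discriminant w^T x + b."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if score >= 0 else -1


def negate_classifier(w, b):
    """Assumed testing attack: replace (w, b) with (-w, -b)."""
    return [-wi for wi in w], -b


# Every point with a nonzero score receives the opposite label after the attack.
w, b = [1.0, -2.0], 0.5
w_bad, b_bad = negate_classifier(w, b)
for x in ([1.0, 0.0], [0.0, 1.0]):
    print(x, predict(w, b, x), predict(w_bad, b_bad, x))
```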
Testing attacks can cause disastrous consequences in centralized machine learning algorithms as there is only one discriminant function. However, testing attacks are weak in distributed machine learning algorithms, as uncompromised nodes can still give correct predictions, and compromised nodes can be easily detected as they have higher classification risks.
III-C Network Attacks
Network attacks can pose a significant threat with potentially severe consequences on networked systems [chi2001network]. Since distributed machine learning algorithms have been developed to solve large-scale problems using networked systems [forero2010consensus], network attacks can also cause damages on distributed learning systems. Network attacks include node capture attacks, Sybil attacks, and Man-In-The-Middle attacks, which are illustrated in Figure 2.
In node capture attacks (Figure 2(b)), an attacker gains access to the network and controls a set of nodes. He can then alter both software programming and hardware configuration, and influence the outcome of network protocols [tague2008modeling]. When the attacker conducts node capture attacks on a DSVM learner, he can modify either the data in the compromised nodes or the algorithms of the learner. Both modifications can lead to misclassifications in compromised nodes. Moreover, the attacker can also send misleading information through network connections between neighboring nodes, so that even uncompromised nodes can be affected by the attacker.
In Sybil attacks (Figure 2(c)), the attacker gains access to a node and then creates a malicious node that presents itself as a legitimate neighbor and exchanges information with the compromised node [newsome2004sybil]. If such attacks happen on the network of a DSVM learner, the compromised nodes will receive misleading information, and thus, the classifications in these nodes will be damaged.
In Man-in-the-Middle (MITM) attacks (Figure 2(d)), the attacker can create an adversarial node in the middle of a connection, talk to both nodes and pretend to be the other node [desmedt2011man]. If such attacks happen on the network of a DSVM learner, nodes in compromised connections will receive misleading information, and thus classifications in these nodes will be damaged.
There are many other network attack models, such as botnet attacks [zhang2011survey] and denial of service [wood2002denial], which makes it challenging to analyze and defend against the attacker’s behaviors. Although distributed systems improve the efficiency of the learning algorithms, they also become more vulnerable to network attacks.
With various types of attack models, a distributed machine learning learner can be vulnerable in a networked system. Thus, it is important to design secure distributed algorithms against potential adversaries. Since the learner aims to increase the classification accuracy, while the attacker seeks to reduce that accuracy, game theory becomes a useful tool to capture the conflicting interests between the learner and the attacker. The equilibrium of this game allows us to predict the outcomes of a learner under adversarial environments. In the next section, we build a game-theoretic framework to capture the conflicts between a DSVM learner and an attacker who can modify training labels and data.
IV A Game-Theoretic Modeling of Attacks
In this section, we use training attacks as an example to illustrate the game-theoretic modeling of the interactions between a DSVM learner and an attacker. We mainly focus on modeling training labels attacks and training data attacks.
IV-A Training Labels Attacks
In training labels attacks, the attacker controls a set of nodes , and aims at breaking the training process by flipping training labels to . To model the flipped labels at each node, we first define the matrix of expanded training data and the diagonal matrix of expanded labels . Note that the first data and the second data are the same in the matrix of expanded training data. We further introduce corresponding indicator vector , where , and , for [xiao2012adversarial]. indicates whether the label has been flipped, for example, if for , i.e., , then the label of data has been flipped. Note that indicates that there is no flipped label.
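The expanded-data construction can be sketched as follows. The symbols did not survive in this text, so the code assumes the convention of [xiao2012adversarial]: the data are stacked twice, the second copy carries the negated labels, and a binary indicator of twice the sample size selects exactly one copy of each sample. All names are hypothetical.

```python
def expand(data, labels):
    """Stack the data twice; the second copy carries the flipped labels."""
    return data + data, labels + [-y for y in labels]


def indicator(flipped, n):
    """Build a 0/1 vector q of length 2n (assumed convention).

    q[i] = 1 keeps the original label of sample i, q[i + n] = 1 uses the
    flipped label instead, so q[i] + q[i + n] = 1 for every sample.
    """
    q = [0] * (2 * n)
    for i in range(n):
        if i in flipped:
            q[i + n] = 1
        else:
            q[i] = 1
    return q


def effective_labels(labels, q):
    """Labels the learner actually trains on under the indicator q."""
    n = len(labels)
    return [-labels[i] if q[i + n] == 1 else labels[i] for i in range(n)]
```

With no flipped samples the indicator selects only the first copy, and the learner trains on the original labels.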
Since the learner aims to minimize the classification errors by minimizing the objective function in Problem (2), the attacker’s intention to maximize classification errors can be captured as maximizing that objective function. As a result, the interactions of the DSVM learner and the attacker can be captured as a nonzero-sum game. Furthermore, since they have the same objective function with opposite intentions, the nonzero-sum game can be reformulated as a zero-sum game, which takes the minimax or max-min form [zhang2017ciss]:
In (7), denotes a vector of size , and denotes the set of uncompromised nodes. Note that the first two terms in the objective function and Constraints (7a) are related to the min-problem for the learner. When , i.e., there are no flipped labels, the min-problem is equivalent to Problem (2). The last two terms in the objective function and Constraints (7c)-(7e) are related to the max-problem for the attacker. Note that the last term of the objective function represents the number of flipped labels in compromised nodes. By minimizing this term, the attacker aims to create the largest impact by flipping the fewest labels. In Constraints (7c), , where indicates the cost for flipping the labels of in node . This constraint indicates that the attacker can only flip labels up to a bound at a compromised node . Constraints (7d) show that the labels in compromised nodes are either flipped or not flipped.
The Minimax-Problem (7) captures the learner’s intention to minimize the classification errors against the attacker’s intention to maximize those errors by flipping labels. Problem (7) can also be written in a max-min form, which captures the attacker’s intention to maximize the classification errors while the learner tries to minimize them. By the minimax theorem, the minimax form in (7) is equivalent to its max-min form. Thus, solving Problem (7) can be interpreted as finding the saddle-point equilibrium of the zero-sum game between the learner and the attacker.
Let and be the action sets for the DSVM learner and the attacker respectively. Then, the strategy pair is a saddle-point equilibrium solution of the zero-sum game defined by the triple , if , where is the objective function in Problem (7).
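Because the symbols of this definition are missing from the text, the standard saddle-point condition for a zero-sum game is restated here under assumed notation: $\mathcal{U}$ and $\mathcal{A}$ for the action sets of the learner (minimizer) and the attacker (maximizer), $K$ for the objective function of Problem (7), and $(u^{*}, a^{*})$ for the equilibrium pair. These names are placeholders and may differ from the authors' notation.

```latex
K(u^{*}, a) \;\le\; K(u^{*}, a^{*}) \;\le\; K(u, a^{*}),
\qquad \forall\, u \in \mathcal{U},\ \forall\, a \in \mathcal{A}.
```

In words, neither player can improve its outcome by deviating unilaterally: the attacker cannot raise the objective above its equilibrium value, and the learner cannot push it below.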
To solve Problem (7), we construct the best response dynamics for the max-problem and min-problem separately. The max-problem and the min-problem can be obtained by fixing and , respectively. By solving both problems in a distributed way, we obtain the fully distributed iterations for solving Problem (7) as
Problem (8) is a linear programming problem. Note that the integer constraint (7e) has been further relaxed into (8c). It captures the attacker’s actions in compromised nodes . Note that each node can compute its own without transmitting information to other nodes. Problem (9) comes from the min-part of Problem (7), which can be solved using a method similar to that in [forero2010consensus] with ADMM [boyd2011distributed]. Note that differs from only in the feasible set of the Lagrange multipliers . In , , where indicates whether the label has been flipped, and it comes from the attacker’s Problem (8).
IV-B Training Data Attacks
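To give a feel for the attacker's step in the best response dynamics, the sketch below solves a simplified stand-in for the relaxed linear program at one compromised node. It assumes unit flipping costs, an integer budget on the number of flips, and an objective that is linear in the relaxed indicators, with each flip contributing its gain in hinge loss minus a per-flip cost. Under those assumptions the linear program is a unit-weight fractional knapsack, so a greedy choice is optimal and the solution is integral. The exact form of Problem (8) is not reproduced in this text, and the names `flip_gains`, `attacker_best_response`, `budget`, and `cost` are hypothetical.

```python
def hinge(w, b, x, y):
    """Hinge loss of the current classifier (w, b) on one sample."""
    score = sum(wi * xi for wi, xi in zip(w, x)) + b
    return max(0.0, 1.0 - y * score)


def flip_gains(w, b, data, labels):
    """Increase in hinge loss obtained by flipping each label."""
    return [hinge(w, b, x, -y) - hinge(w, b, x, y) for x, y in zip(data, labels)]


def attacker_best_response(gains, budget, cost):
    """Maximize sum_i q_i * (gains[i] - cost) with sum_i q_i <= budget, 0 <= q_i <= 1.

    Greedy solution of the assumed relaxed problem: flip the labels with the
    largest gains, at most `budget` of them, and only when the gain exceeds
    the per-flip cost.
    """
    order = sorted(range(len(gains)), key=lambda i: gains[i], reverse=True)
    q = [0] * len(gains)
    for i in order[:budget]:
        if gains[i] - cost > 0:
            q[i] = 1
    return q
```

Each compromised node can run this step using only its local data and its current decision variables, which matches the observation that no information needs to be transmitted for the attacker's sub-problem. A large per-flip cost leaves every label untouched.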
In training data attacks, the attacker has the ability to modify the training data into in compromised node . Following a similar method in training label attacks, we can capture the interactions of the DSVM learner and the attacker as a zero-sum game which is shown as follows:
Note that in the last term, the ℓ0 norm denotes the number of elements that are changed by the attacker, and subtracting it captures the attacker’s intention to maximize the classification errors while changing the fewest elements. Constraint (10b) indicates that the sum of modifications in node is bounded by . Following a method similar to that for training labels attacks, we can construct the iterations for solving Problem (10) as follows [zhang2015fusion, zhang2016student, zhang2018tnnls]:
Note that here has been summed into , which captures the modifications in node . Constraints (11ab) and the last term of the objective function are the relaxation of the ℓ0 norm. Note that compared to , has , where comes from Problem (11).
In this section, we have modeled the conflicting interests of a DSVM learner and an attacker using a game-theoretic framework. Their interactions can be captured as a zero-sum game in which a minimax problem is formulated. The minimization part captures the learner’s intention to minimize classification errors, while the maximization part captures the attacker’s intention to maximize those errors while making as few modifications as possible, i.e., flipping labels and modifying data. By constructing the min-problem for the learner and the max-problem for the attacker separately, the minimax problem can be solved with the best response dynamics. Furthermore, the min-problem and max-problem can be solved in a distributed way with Sub-Max-Problems (8) and (11), and Sub-Min-Problems (9) and (12). Combining the iterations for solving these problems, we obtain the fully distributed iterations for solving the Minimax-Problems (7) and (10). The nature of these iterative operations provides real-time mechanisms for each node to react to its neighbors and the attacker. Since each node operates its own sub-max-problem and sub-min-problem, the game between a DSVM learner and an attacker can now be represented by sub-games in compromised nodes. This structure provides us with tools to analyze per-node behaviors of distributed algorithms under adversarial environments. The transmission of misleading information can also be analyzed via the connections between neighboring games.
V Impact of Attacks
In this section, we present numerical experiments on DSVM under adversarial environments. We will verify the effects of both the training attacks and the network attacks. The performance of DSVM is measured by both the local and global classification risks. The local risk at node at step is defined as follows:
where is the true label, is the predicted label, and represents the number of testing samples in node . The global risk is defined as follows:
A higher global risk shows that there are more testing samples being misclassified, i.e., a worse performance of DSVM.
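The two risk measures can be computed as in the sketch below. The displayed formulas are missing from this text, so the code assumes that the local risk is the fraction of misclassified test samples at a node, written as the mean of half the absolute difference between true and predicted labels in {-1, +1}, and that the global risk pools the test samples of all nodes. Whether the authors pool samples or average the local risks is not recoverable here. The function names are hypothetical.

```python
def local_risk(true_labels, predicted_labels):
    """Fraction of misclassified samples at one node (labels in {-1, +1})."""
    errors = sum(0.5 * abs(y - p) for y, p in zip(true_labels, predicted_labels))
    return errors / len(true_labels)


def global_risk(per_node_true, per_node_pred):
    """Misclassification fraction over the pooled test samples of all nodes."""
    total = sum(len(labels) for labels in per_node_true)
    errors = sum(
        0.5 * abs(y - p)
        for labels, preds in zip(per_node_true, per_node_pred)
        for y, p in zip(labels, preds)
    )
    return errors / total
```

A risk of 0.5 on balanced two-class data corresponds to a classifier that does no better than random labeling.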
We define the degree of a node as the actual number of neighboring nodes divided by the most achievable number of neighbors . The normalized degree of a node is always larger than 0 and less than or equal to 1. A higher degree indicates that the node has more neighbors. We further define the degree of the network as the average degree of all the nodes.
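The degree definitions translate directly into code. The sketch assumes that the most achievable number of neighbors of a node is the number of nodes minus one, that the network has at least two nodes, and that the topology is given as a hypothetical adjacency mapping from each node to a list of its neighbors.

```python
def normalized_degree(neighbors, v):
    """Number of neighbors of v divided by the largest possible number."""
    return len(neighbors[v]) / (len(neighbors) - 1)


def network_degree(neighbors):
    """Average normalized degree over all nodes of the network."""
    return sum(normalized_degree(neighbors, v) for v in neighbors) / len(neighbors)


# A three-node line network: the middle node is connected to both end nodes.
line = {0: [1], 1: [0, 2], 2: [1]}
print(normalized_degree(line, 1), network_degree(line))
```

A fully connected network has degree 1 under this definition, and removing links lowers the value toward 0.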
V-A DSVM Under Training Attacks
Recall the attacker’s constraints in training labels attacks, where indicates the cost for flipping labels in node . This constraint indicates that the attacker’s modifications in node are bounded by . Without loss of generality, we assume that , and thus, the constraint now indicates that the number of flipped labels in node is bounded by . We also assume that the attacker has the same and in every compromised node . Note that we assume that the learner has and in every experiment.
In the experiments shown in Figures 3 to 6, the data are generated from two-dimensional Gaussian distributions with mean vectors and , and the same covariance matrix . indicates the number of compromised nodes. indicates the largest number of training samples that can be flipped in each node. indicates the cost parameter. Blue filled circles and red filled circles are the original samples with class and , respectively. Blue hollow circles and red hollow circles are the flipped samples with class and , i.e., originally and respectively. We can see from the figures that when the attacker attacks more nodes, the equilibrium global risk is higher, and the uncompromised nodes have higher risks (see Figures 3 and 4). When is large, the attacker can flip more labels, and thus the risks are higher (see Figures 3 and 5). When is large, the cost for the attacker to take an action is too high, and thus there is no impact on the learner (see Figures 3 and 6). Note that, when the attacker has large capabilities, i.e., larger , smaller or larger , even uncompromised nodes, e.g., Node , have higher risks.
Tables I and II show the results when the learner trains on different networks. and indicate training labels attacks and training data attacks, respectively. Networks , and have nodes, where each node contains training samples. Networks and are balanced networks with degree and , respectively. The normalized degrees of each node in Network are . Network has nodes where each node contains training samples and has neighbors. In (resp. ) and (resp. ), the attacker attacks nodes with (resp. ). In (resp. ) and (resp. ), the attacker attacks (resp. ) nodes with higher degrees and lower degrees with (resp. ). In (resp. ), the attacker attacks nodes with (resp. ). Comparing (resp. ) with (resp. ), we can see that a network with a higher degree has a lower risk when there is an attacker. From (resp. ) and (resp. ), we can tell that the risks are higher when nodes with higher degrees are under attack. From (resp. ) and (resp. ), we can see that a network with more nodes has a higher risk when it is under attack. Thus, from Tables I and II, we can see that network topology plays an important role in the security of the ML learner.
V-B DSVM Under Network Attacks
In this subsection, we use numerical experiments to illustrate the impact of network attacks. In node capture attacks, we assume that the attacker controls a set of nodes , and he has the ability to add noise to the decision variables . In Sybil attacks, the attacker can obtain access to a compromised node , and he then generates another malicious node to exchange information with the compromised node. Instead of sending random information, which can be easily detected, we assume that he sends a perturbed to make compromised node believe that this is valid information, where comes from the compromised node . In MITM attacks, we assume that the attacker creates adversarial nodes in the middle of a connection, and he receives from both sides, but he sends a perturbed to the other side.
In the experiment in Figure 7, we assume that the elements of the noise vector are generated by a uniform distribution in , where indicates the size of the noise. Networks 1 and 2 are fully connected networks with nodes and nodes, respectively. In node capture attacks and Sybil attacks, the x-axis indicates the number of compromised nodes. The attacker has for node capture attacks and Sybil attacks, respectively. In MITM attacks, the x-axis represents the number of compromised connections; in particular, indicates that only connection has been broken, and for network and indicates that all the connections, i.e., and connections, have been attacked. Note that the attacker has for networks 1 and 2 in MITM attacks, respectively. From the figure, we can see that when the attacker attacks more nodes or connections, the risk becomes larger. Note that, when more than half of the nodes or connections are compromised, the DSVM will completely fail, i.e., the classifier is the same as one that randomly labels testing data.
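The noise model used in these network attacks can be sketched in a few lines. The interval of the uniform distribution is missing from this text, so the code assumes that it is symmetric around zero with half-width equal to the noise size. The function names and the representation of decision variables as plain lists are hypothetical.

```python
import random


def add_uniform_noise(vector, size, rng):
    """Perturb each element with noise drawn uniformly from [-size, size]."""
    return [value + rng.uniform(-size, size) for value in vector]


def mitm_relay(w_v, w_u, size, rng):
    """Assumed MITM step on one connection between nodes v and u.

    The adversarial node receives the decision variables of both sides and
    forwards a perturbed copy to the other side. Returns the pair
    (what u receives, what v receives).
    """
    return add_uniform_noise(w_v, size, rng), add_uniform_noise(w_u, size, rng)


# A seeded generator keeps the illustration reproducible.
rng = random.Random(0)
print(mitm_relay([0.5, -1.0], [0.4, -0.9], 0.1, rng))
```

Because the perturbed values stay close to the true decision variables, a receiving node has little reason to reject them, which is why even small noise can degrade the consensus process.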
VI Discussions and Future Work
Distributed machine learning (ML) algorithms are ubiquitous but inherently vulnerable to adversaries. This paper has investigated the security issue of distributed machine learning in adversarial environments. Possible attack models have been analyzed and summarized into machine learning type attacks and network type attacks. Their effects on distributed ML have been studied using numerical experiments.
One major contribution of this work is the investigation of the security threats in distributed ML. We have shown that the consequences of both ML-type attacks and network-type attacks can be exacerbated in distributed ML. We have established a game-theoretic framework to capture the strategic interactions between a DSVM learner and an attacker who can modify training data and training labels. By using ADMM, we have developed fully distributed iterative algorithms with which each node can respond to its neighbors and the adversary in real time. The developed framework is applicable to a broad range of distributed ML algorithms and attack models.
Our work concludes that network topology plays a significant role in the security of distributed ML. We have recognized that a balanced network with fewer nodes and a higher degree is less vulnerable to an attacker who controls the whole system. We have shown that nodes with higher degrees can cause more severe damage to the system when under attack. Also, network-type attacks can increase the risks of the whole networked system even when the attacker only adds small noise to a small subset of nodes.
Our work is a starting point to explore the security of distributed ML. In the future, we aim to extend the game-theoretic framework to capture various attack models and different distributed ML algorithms. We have proposed four defense methods in [zhang2019jaif] to protect DSVM against training data attacks, and we intend to explore other defense strategies for distributed ML by leveraging techniques developed in robust centralized algorithms [globerson2006nightmare, biggio2011support] and detection methods [levine2006survey, chen2008securing]. Furthermore, we could use cyber insurance to mitigate the cyber data risks from attackers [zhang2019flipin, zhang2017jsac].