The problem of collaborative optimization in multi-agent systems has gained significant attention in recent years [6, 18, 12, 21, 22]. In this problem, each agent knows its own local objective (or cost) function. In the fault-free setting, all the agents are non-faulty (or honest), and the goal is to design a distributed (or collaborative) algorithm to compute a minimum of the aggregate of their local cost functions. We refer to this problem as collaborative optimization. Specifically, we consider a system of $n$ agents where each agent $i$ has a local real-valued cost function $f_i$ that maps a point in the $d$-dimensional real-valued vector space (i.e., $\mathbb{R}^d$) to a real value. Unless otherwise stated, the cost functions are assumed to be convex. (As noted later in Section 5, some of our results are valid even when the cost functions are non-convex.) The goal of collaborative optimization is to determine a global minimum $x^*$ such that
$$x^* \in \arg\min_{x \in \mathbb{R}^d} \sum_{i=1}^n f_i(x).$$
Throughout the report, we use the shorthand $\arg\min_x$ for $\arg\min_{x \in \mathbb{R}^d}$, unless otherwise mentioned.
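To make the setup concrete, the following sketch (with hypothetical names and data, not taken from the report) minimizes an aggregate of simple quadratic costs, for which the minimizer of the sum is available in closed form:

```python
import numpy as np

# Illustrative fault-free instance (hypothetical data): agent i's cost is
# f_i(x) = ||x - a_i||^2, so the aggregate sum_i f_i(x) is minimized at
# the mean of the points a_i.
targets = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 3.0]])  # one point a_i per agent

def aggregate_cost(x):
    return sum(float(np.sum((x - a) ** 2)) for a in targets)

x_star = targets.mean(axis=0)  # closed-form minimizer for these quadratic costs
```

For general convex costs no such closed form exists, and the minimum must be computed collaboratively, e.g., by iterative gradient methods.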
As a simple example, $f_i(x)$ may denote the cost for agent $i$ (which may be a robot or a person) to travel to location $x$ from its current location. In this case, $x^*$ is a location that minimizes the total cost for all the agents. Such multi-agent collaborative optimization is of interest in many practical applications, including collaborative machine learning [5, 6, 14], swarm robotics, and collaborative sensing. Most of the prior work assumes all the agents to be non-faulty. Non-faulty agents follow a specified algorithm correctly. In our work, we consider a scenario wherein some of the agents may be faulty and may behave incorrectly.
Su and Vaidya introduced the problem of collaborative optimization in the presence of Byzantine faulty agents. A Byzantine faulty agent may behave arbitrarily. In particular, the faulty agents may send incorrect and inconsistent information in order to bias the output of a collaborative optimization algorithm, and the faulty agents may also collaborate with each other. For example, consider an application of multi-agent collaborative optimization to collaborative sensing, where the agents (or sensors) observe a common object in order to collectively identify it. The faulty agents may send arbitrary observations concocted to prevent the non-faulty agents from making the correct identification [9, 11, 20]. Similarly, in the case of collaborative learning, which is another application of multi-agent collaborative optimization, the faulty agents may send incorrect information based on mislabelled or arbitrarily concocted data points to prevent the non-faulty agents from learning a good classifier [1, 2, 4, 8, 10, 30].
1.1 System architecture
The contributions of this paper apply to two different system architectures, illustrated in Figure 1. In the server-based architecture, the server is assumed to be trustworthy, but up to $t$ agents may be Byzantine faulty. The trusted server helps solve the distributed optimization problem in coordination with the agents. In the peer-to-peer architecture, the agents are connected to each other by a complete network, and up to $t$ of these agents may be Byzantine faulty.
Provided that $n > 3t$, any algorithm for the server-based architecture can be simulated in the peer-to-peer system using the well-known Byzantine broadcast primitive.
For the simplicity of presentation, the rest of this report assumes the server-based architecture.
1.2 Resilience in collaborative optimization
As stated above, we will assume the server-based architecture in the rest of our discussion.
We assume that up to $t$ of the $n$ agents may be Byzantine faulty, where $t < n/2$.
We assume that each agent has a “true” cost function. Unless otherwise noted, each such cost function is assumed to be convex.
If an agent $i$ is non-faulty, then its behavior is consistent with its true cost function, say $f_i$. For instance, if agent $i$ is required to send to the server the value of its cost function at some point $x$, then a non-faulty agent $i$ will indeed send $f_i(x)$.
If an agent $i$ is faulty, then its behavior can be arbitrary, and need not be consistent with its true cost function, say $f_i$. For instance, if agent $i$ is required to send to the server the value of its cost function at some point $x$, then a faulty agent $i$ may send an arbitrary value instead of $f_i(x)$.
Clearly, when an agent is faulty, it may not share with the server correct information about its true cost function. However, it is convenient to define its true cost function as above, which is the cost function it would use in the absence of its failure.
Throughout this report, we assume the existence of a finite minimum for the aggregate of the true cost functions of the agents; otherwise, the objective of collaborative optimization is vacuous. Specifically, we make the following technical assumption.

Assumption 1. Suppose that the true cost function of each agent $i$ is $f_i$. Then, for every non-empty set of agents $S$, we assume that there exists a finite point $x_S \in \mathbb{R}^d$ such that $x_S \in \arg\min_x \sum_{i \in S} f_i(x)$.
Suppose that the true cost function of agent $i$ is $f_i$. Then, ideally, the goal of collaborative optimization is to compute a minimum of the aggregate of the true cost functions of all the agents, $\sum_{i=1}^n f_i$, even if some of the agents are Byzantine faulty. In general, this may not be feasible, since the Byzantine faulty agents can behave arbitrarily. To understand the feasibility of achieving some degree of resilience to Byzantine faults, we consider two cases.
Independent functions: A set of cost functions is independent if information about some of the functions in the set does not help learn any information about the remaining functions in the set. In other words, the cost functions do not contain any redundancy.

Redundant functions: Intuitively speaking, a set of cost functions includes redundancy when knowing some of the cost functions helps to learn some information about the remaining cost functions. As a trivial example, consider the special case when it is known that there exists some function $f$ such that $f$ is the true cost function of every agent. In this case, knowing the true cost function of any one agent suffices to learn the true cost functions of all the agents. Also, any value that minimizes an individual agent's true cost function also minimizes the total true cost over all the agents.
Su and Vaidya defined the goal of fault-tolerant collaborative optimization as minimizing the aggregate of the cost functions of just the non-faulty agents. Specifically, if $f_i$ is the true cost function of agent $i$, and $\mathcal{N}$ denotes the set of non-faulty agents in a given execution, then they defined the goal of fault-tolerant optimization to be to output a point in
$$\arg\min_x \sum_{i \in \mathcal{N}} f_i(x).$$
We refer to the above goal as $t$-resilience, formally defined below.

Definition 1 ($t$-resilience). A collaborative optimization algorithm is said to be $t$-resilient if it outputs a minimum of the aggregate of the true cost functions of the non-faulty agents despite up to $t$ agents being Byzantine faulty.
In general, Su and Vaidya showed that, because the identity of the faulty agents is a priori unknown, a $t$-resilient algorithm need not exist. In this report, we provide an exact characterization of the condition under which $t$-resilience is achievable. In particular, we show that $t$-resilience is achievable if and only if the agents satisfy a property named $2t$-redundancy, defined next. (The notion of $2t$-redundancy can be extended to $k$-redundancy by replacing $2t$ in Definitions 2 and 3 by $k$. The definitions below are vacuous if $n \le 2t$.) Henceforth, we assume that the faulty agents are in the minority, i.e., $t < n/2$.
Definition 2 ($2t$-redundancy).
Let $f_i$ denote the true cost function of agent $i$. The agents are said to satisfy $2t$-redundancy if the following holds for every two subsets $S_1$ and $S_2$ of $\{1, \ldots, n\}$, each containing $n - 2t$ agents:
$$\arg\min_x \sum_{i \in S_1} f_i(x) = \arg\min_x \sum_{i \in S_2} f_i(x).$$
The above definition of $2t$-redundancy is equivalent to the definition below, as shown in Appendix B.
Definition 3 ($2t$-redundancy).
Let $f_i$ denote the true cost function of agent $i$. The agents are said to satisfy $2t$-redundancy if the following holds for any sets of agents $S_1$ and $S_2$ such that $S_1 \subseteq S_2 \subseteq \{1, \ldots, n\}$, $|S_1| \ge n - 2t$, and $|S_2| = n - t$:
$$\arg\min_x \sum_{i \in S_2} f_i(x) = \arg\min_x \sum_{i \in S_1} f_i(x).$$
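As a toy illustration of one version of the $2t$-redundancy condition, the following sketch (hypothetical names; one-dimensional quadratic costs only) checks whether every subset of $n - 2t$ agents yields the same minimizer of its aggregate cost. For quadratics $f_i(x) = (x - a_i)^2$, the minimizer of a subset's sum is simply the subset mean:

```python
from itertools import combinations
import numpy as np

def satisfies_redundancy(a, t):
    """Check a Definition-2-style 2t-redundancy condition for toy quadratic
    costs f_i(x) = (x - a[i])**2: every subset of n - 2t agents must yield
    the same minimizer of its aggregate (the subset mean, for quadratics)."""
    n = len(a)
    size = n - 2 * t
    if size <= 0:
        return False  # the definition is vacuous when n <= 2t
    minimizers = {round(float(np.mean(s)), 9) for s in combinations(a, size)}
    return len(minimizers) == 1

# All agents share the same minimum -> redundancy holds.
# Distinct, unrelated minima (independent costs) -> redundancy fails.
```

This matches the intuition from Section 1.2: a family of identical cost functions is maximally redundant, whereas independent cost functions with distinct minima carry no redundancy.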
Note that the $t$-resilience property pertains to the point in $\mathbb{R}^d$ that is the output of a collaborative optimization algorithm; it does not explicitly impose any constraints on the function value. The notion of $(t, \epsilon)$-weak resilience stated below relates to function values.
Definition 4 ($(t, \epsilon)$-weak resilience). Let $f_i$ denote the true cost function of agent $i$. Let $\mathcal{N}$ denote the set of all non-faulty agents. Then, a collaborative optimization algorithm is said to be $(t, \epsilon)$-weak resilient if it outputs a point $\hat{x}$ for which there exists a subset $S$ of $\mathcal{N}$ such that $|S| \ge n - 2t$, and
$$\sum_{i \in S} f_i(\hat{x}) \le \epsilon.$$
It can be shown that, under suitable conditions, $(t, 0)$-weak resilience implies $t$-resilience; the proof is deferred to Section 3. In many applications of multi-agent collaborative optimization, such as distributed machine learning, distributed sensing or hypothesis testing, and swarm robotics, the cost functions are non-negative [5, 6, 14, 21, 22]. We show that if the cost functions of the non-faulty agents are non-negative and independent, then $(t, \epsilon)$-weak resilience is impossible when $\epsilon$ is too small; moreover, under these conditions, we present an algorithm that guarantees $(t, \epsilon)$-weak resilience for a suitable $\epsilon$.
1.3 Prior Work
The prior work on resilience in collaborative multi-agent optimization by Su and Vaidya, 2016, and Sundaram and Gharesifard, 2018, only considers the special class of univariate cost functions, i.e., the dimension $d$ equals one. In contrast, we consider the general class of multivariate cost functions, i.e., $d$ can be greater than one. Specifically, they propose algorithms that output a minimum of a non-uniformly weighted aggregate of the non-faulty agents' cost functions when $d = 1$. However, their proposed algorithms do not extend easily to the case when $d > 1$. The algorithms and the fault-tolerance results presented in this report are valid regardless of the value of the dimension $d$, as long as it is finite.
Su and Vaidya have also considered a special case where the true cost functions of the agents are convex combinations of a finite number of basis convex functions. They have shown that if the basis functions have a common minimum, then a minimum point (as in (2)) can be computed accurately. This property of redundancy in the minimum of the basis functions, we note, is a special case of the $2t$-redundancy property that we prove necessary and sufficient for $t$-resilience in this report. Other prior work related to the $2t$-redundancy property is discussed in Section 2.2.
Yang and Bajwa, 2017 consider a very special case of the collaborative optimization problem. They assume that the multivariate cost functions can be split into independent univariate strictly convex functions. For this special case, they have extended the fault-tolerance algorithm of Su and Vaidya, 2016 to obtain approximate resilience. In general, however, the agents' cost functions do not satisfy such specific properties. In this report, we do not make such assumptions about the agents' cost functions. We only assume that the cost functions are convex and differentiable, and that the minimum of their sum is finite (i.e., Assumption 1). Note that these assumptions are fairly standard in the optimization literature, and are also made in all of the aforementioned prior work.
Outline of the report: The rest of the report is organized as follows. In Section 2, we present the case when the cost functions have redundancy. In Section 3, we present the case when the cost functions are independent. In Section 4, we summarize a gradient-based algorithm for $t$-resilience, which was proposed in our prior work. In Section 5, we discuss direct extensions of our results to the case when the cost functions are non-differentiable and non-convex. In the same section, we also present a summary of our results.
2 The Case of Redundant Cost Functions
This section presents the key result of this report for the case when the cost functions are redundant. Unless otherwise mentioned, in the rest of the report, the cost functions are assumed to be differentiable, i.e., their gradients exist at all points in $\mathbb{R}^d$. Indeed, the cost functions are differentiable in most of the aforementioned applications of collaborative optimization [5, 6, 21, 22]. Nevertheless, as elaborated in Section 5, some of our results are also applicable to non-differentiable cost functions.
Before presenting Theorem 1, which states the key result of this section, we present in Lemma 2 an alternate, and perhaps more natural, equivalent condition for the $2t$-redundancy property in the specific case when the agents' cost functions are differentiable. The proof of Lemma 2 uses Lemma 1, stated below.
Suppose that Assumption 1 holds true, and . For a non-empty set , consider a set of functions , , such that
Appendix A presents the proof of the above lemma.
Suppose that Assumption 1 holds true, and $t < n/2$. When the true cost functions of the agents are convex and differentiable, the $2t$-redundancy property stated in Definition 2 or Definition 3 is equivalent to the following condition:

A point is a minimum of the sum of the true cost functions of the non-faulty agents if and only if that point is a minimum of the sum of the true cost functions of any $n - 2t$ non-faulty agents.
Let the true cost function of each agent $i$ be denoted by $f_i$. Recall that there can be at most $t$ Byzantine faulty agents. Let $\mathcal{N}$ with $|\mathcal{N}| \ge n - t$ be the set of the non-faulty agents.
The condition stated in the lemma is equivalent to saying that for every subset of of size ,
Consider two arbitrary agents in , and then consider two size subsets and of such that , , and
The above equality and (8) imply that
This equality can be proven for any . As the true cost functions are assumed convex, from above we obtain,
Therefore, for every subset of of size ,
The above implies that for every subset of of size ,
Part II: We now show that the condition in Definition 3 implies the condition stated in the lemma. Now, (i.e., the right side of (4)) is a non-empty set due to Assumption 1. This and (4) imply that for every subset of size ,
Therefore, by Lemma 1,
The following theorem presents the main result of this section.
Suppose that Assumption 1 holds true, and $t < n/2$. When the true cost functions of the agents are convex and differentiable, $t$-resilience can be achieved if and only if the agents satisfy the $2t$-redundancy property.
The case of $t = 0$ is trivial, since there are no faulty agents. In the rest of the proof, we assume that $t \ge 1$.
Sufficiency of $2t$-redundancy: Sufficiency of $2t$-redundancy is proved constructively using the algorithm presented in Section 2.1. In particular, the algorithm is proved to achieve $t$-resilience if $2t$-redundancy holds.

Necessity of $2t$-redundancy:
We consider the worst-case scenario where $t$ agents are faulty. Suppose that $t$-resilience can be achieved by some algorithm. Consider an execution of this algorithm in which a set of $n - t$ agents is the actual set of non-faulty agents, and all the remaining agents are the actual faulty agents. Suppose that the true cost function of each agent $i$ in this execution is $f_i$. We assume that the functions are differentiable and convex.

In any $t$-resilient algorithm for collaborative optimization, the server can communicate with the agents and learn some information about their local cost functions. The most information the server can learn about the cost function of an agent is the complete description of its local cost function. To prove the necessity of $2t$-redundancy, we assume that the server knows the cost function reported by each non-faulty agent.
Now consider the following executions.
In execution , all the agents are non-faulty. Let denote the set of all agents, which happen to be non-faulty in execution . Thus, . The true cost function of agent is , identical to its true cost function in execution .
In execution , where , agent is Byzantine faulty, and all the remaining agents are non-faulty. Let denote the set of agents that happen to be non-faulty in execution . In execution , the true cost function of each non-faulty agent is , which is identical to its true cost function in execution . Let the true cost function of faulty agent in execution be a differentiable and convex function . Assume that the functions , , and are independent. In execution , suppose that the behavior of faulty agent from the viewpoint of the server is consistent with the cost function (which equals the true cost function of agent in execution ).
Fix a particular , . From the viewpoint of the server, execution and execution are indistinguishable. Thus, the $t$-resilient algorithm will produce an identical output in these executions; suppose that this output is . As the algorithm is assumed to be $t$-resilient, we have, by Definition 1 and Assumption 1,
For a differentiable cost function $f$, we denote its gradient at a point $x$ by $\nabla f(x)$. Let $\mathbf{0}_d$ denote the zero vector of dimension $d$. If then
Recall that . Therefore,
As the cost functions are assumed to be convex, the above implies that,
By repeating the above argument for each , we have
Similarly, for every non-empty set of agents
Thus, . Then, Lemma 1 implies that
Now we consider the execution (defined earlier) in which the nodes in the set of non-faulty agents are non-faulty. Using the results derived in the proof so far, we will show the following equality. (Footnote 2 noted that the notion of $2t$-redundancy can be extended to $k$-redundancy. The proof so far has relied only on 1-redundancy, which is weaker than $2t$-redundancy; the latter part of this proof makes use of $2t$-redundancy.)
The proof concludes once we have shown the above equality.
Consider an arbitrary subset subject to . It is trivially true that
So it remains to show that is not a strict subset of . The proof below is by contradiction.
This implies that there exists a point
Therefore, there exists an such that
Let and . Then . Now we define executions and .
Execution : In this execution the agents in set are faulty, and the agents in set are non-faulty. The behavior of each agent is consistent with its true cost function being , which is identical to its true cost function in execution . However, each faulty agent behaves consistently with a differentiable and convex true cost function that has a unique minimum at .

Execution : In this execution the agents in set are faulty, and the remaining agents in are non-faulty. The behavior of each agent (including the faulty agents in ) is consistent with the cost function . Each non-faulty agent behaves consistently with its true cost function being , which is defined in execution . Recall that each has a unique minimum at .
Observe that the server cannot distinguish between executions and .
Now, (21) implies that does not minimize at any point in . That is, for every agent ,
As is -resilient, in execution , algorithm must produce an output in
(Recall that the agents in are non-faulty in execution .)
The above two equations imply that , and cannot output in execution (otherwise it cannot be $t$-resilient).
This is a contradiction.
Therefore, we have proved that is not a strict subset of .
The above result together with (18) implies that
Recall that is an arbitrary subset of with . Therefore, the above implies that for every subset of with ,
This together with (17) implies that
Thus, if an algorithm is $t$-resilient, then the true cost functions of the agents satisfy the $2t$-redundancy property as stated in Definition 3. This proves the necessity of the $2t$-redundancy property for $t$-resilience. ∎
The following collaborative optimization algorithm proves the sufficiency of $2t$-redundancy for $t$-resilience.

2.1 A $t$-resilient algorithm

We present an algorithm and prove that it is $t$-resilient if the agents satisfy the $2t$-redundancy property stated in Definition 2 or 3. We suppose that Assumption 1 holds true and $t < n/2$. We only consider the case when $t \ge 1$, since the case of $t = 0$ is trivial.
The server collects a full description of the cost function of each agent. Suppose that the server obtains cost function $\hat{f}_i$ from each agent $i$. For each non-faulty agent $i$, $\hat{f}_i = f_i$, the agent's true cost function.

The proposed algorithm outputs a point $\hat{x}$ such that there exists a set $S$ of $n - t$ agents for which, for any $\hat{S} \subseteq S$ with $|\hat{S}| \ge n - 2t$,
$$\hat{x} \in \arg\min_x \sum_{i \in \hat{S}} \hat{f}_i(x).$$
If there are multiple candidate points that satisfy the condition above, then any one such point is chosen as the output.
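The search described above can be sketched by brute force. The following is an illustration-only simplification (hypothetical names; a finite grid of candidate points stands in for $\mathbb{R}^d$), not an efficient implementation:

```python
from itertools import combinations

def resilient_output(reported, n, t, grid):
    """Exhaustive-search sketch: find a point x_hat and a set S of n - t
    agents such that x_hat minimizes the reported aggregate cost over every
    subset of S containing at least n - 2t agents. `reported` is a list of
    cost functions (one per agent); `grid` is a finite search space that
    stands in for R^d in this illustration."""
    for S in combinations(range(n), n - t):
        for x_hat in grid:
            consistent = all(
                sum(reported[i](x_hat) for i in sub) <= sum(reported[i](y) for i in sub)
                for size in range(n - 2 * t, n - t + 1)
                for sub in combinations(S, size)
                for y in grid
            )
            if consistent:
                return x_hat
    return None  # no consistent point found

# Three non-faulty agents agree on a minimum at x = 2; one faulty agent
# reports a cost aimed at x = 10.
honest = lambda x: (x - 2) ** 2
adversarial = lambda x: (x - 10) ** 2
```

Because no set containing the faulty agent can satisfy the consistency check at the adversary's preferred point, the output is pinned to the non-faulty agents' common minimum.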
Now we prove the correctness of the above algorithm when $2t$-redundancy holds.

Assume that the $2t$-redundancy property holds. First, we observe that the algorithm will always be able to output a point if $2t$-redundancy is satisfied. Let $\mathcal{N}$ denote the set of all non-faulty agents. Recall that $|\mathcal{N}| \ge n - t$. In particular, consider a set $S$ that consists of any $n - t$ non-faulty agents, that is, $S \subseteq \mathcal{N}$ with $|S| = n - t$. For any $\hat{S} \subseteq S$ with $|\hat{S}| \ge n - 2t$, due to $2t$-redundancy (Definition 3) and Assumption 1, we have
This implies that every point in
is a candidate for the output of the algorithm. Additionally,
due to Assumption 1,
is guaranteed to be non-empty.
Thus, the algorithm will always produce an output.
Next, we show that the algorithm achieves $t$-resilience. Consider any set for which the condition in the algorithm is true. The algorithm outputs . From the algorithm, we know that for any with ,
Now, since at most $t$ agents are faulty, there exists at least one set containing non-faulty agents such that (and also ). Thus,
Also, since , due to $2t$-redundancy (Definition 3), we have
Since is non-empty, the last equality implies that is non-empty. This, in turn, by Lemma 1 implies that
Thus, the above algorithm achieves $t$-resilience. ∎
It should be noted that the correctness of the $t$-resilient algorithm presented above does not require differentiability or convexity of the agents' true cost functions. Therefore, $2t$-redundancy is a sufficient condition for $t$-resilience even when the agents' cost functions are non-differentiable and non-convex.
Alternate $t$-resilient algorithms: There exist other, more practical, algorithms that achieve $t$-resilience when $2t$-redundancy holds. However, there is a trade-off between algorithm complexity and the additional properties assumed for the cost functions. We present an alternate, computationally simpler, $t$-resilient algorithm in Section 3.1 for the case when the minimum value of each true cost function is zero.
2.2 Prior work on redundancy
To the best of our knowledge, there is no prior work on the tightness of the $2t$-redundancy property for $t$-resilience in collaborative optimization. Nevertheless, it is worthwhile to note that conditions with some similarity to $2t$-redundancy are known to be necessary and sufficient for fault-tolerance in other systems, such as information coding and collaborative multi-sensing (or sensor fusion), discussed below. We note that collaborative multi-sensing can be viewed as a special case of the collaborative optimization problem presented in this report.
Redundancy for error-correction coding: Digital machines store or communicate information using a finite-length sequence of symbols. However, these symbols may become erroneous due to faults in the system or during communication. A way to recover the information despite such errors is to use an error-correction code. An error-correction code transforms (or encodes) the original sequence of symbols into another sequence of symbols called a codeword. It is well known that a code that generates codewords of length $n$ can correct (or tolerate) up to $t$ symbol errors if and only if the Hamming distance between any two codewords of the code is at least $2t + 1$ [16, 29]. There exist codes (e.g., Reed-Solomon codes) such that the sequence of symbols encoded in a codeword can be uniquely determined using any $n - 2t$ correct symbols of the codeword.
Redundancy for fault-tolerant state estimation: The problem of collaborative optimization finds direct application in distributed sensing. In this problem, the system comprises multiple sensors, and each sensor makes partial observations about the state of the system. The goal of the sensors is to collectively compute the complete state of the system. However, if a sensor is faulty, then it may share incorrect observations. The problem of fault-tolerance in collaborative sensing for the special case wherein the sensors' observations are linear in the system state has gained significant attention in recent years [3, 11, 19, 20, 23, 24, 25]. Chong et al., 2015 and Pajic et al., 2015 showed that the system state can be accurately computed when up to $t$ (out of $n$) sensors are faulty if and only if the system is $2t$-sparse observable, i.e., the state can be computed uniquely using the observations of only $n - 2t$ non-faulty sensors. We note that the property of $2t$-sparse observability is a special instance of the more general $2t$-redundancy property presented in this report. Moreover, the necessity and sufficiency of the $2t$-redundancy property proved in this report implies the necessity and sufficiency of $2t$-sparse observability for fault-tolerant state estimation in a more general setting wherein the sensor observations may be non-linear; however, the converse is not true.
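In the linear-observation setting, sparse observability reduces to a rank condition and can be checked by enumeration. A minimal sketch, with hypothetical names:

```python
from itertools import combinations
import numpy as np

def sparse_observable(C, t):
    """Check 2t-sparse observability for linear observations y_i = C[i] @ x:
    the d-dimensional state x must be uniquely determined by the rows of
    any n - 2t sensors, i.e., every such row subset must have full column
    rank. (Illustrative helper; enumeration is exponential in n.)"""
    n, d = C.shape
    return all(np.linalg.matrix_rank(C[list(rows), :]) == d
               for rows in combinations(range(n), n - 2 * t))

# Five scalar sensors observing a 2-dimensional state: pairwise independent
# observation rows give 2-sparse observability for t = 1, whereas identical
# rows (rank-1 observations) do not.
C_good = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0], [2.0, 1.0]])
C_bad = np.ones((5, 2))
```

This mirrors the unique-determination requirement in the definition above: every admissible subset of sensors must pin down the state on its own.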
Next, we consider the case when the cost functions are independent, and may not satisfy the $2t$-redundancy property.
3 The Case of Independent Cost Functions
In this section, we present the case when the true cost functions of the agents are independent. Throughout this section we assume that $t \ge 1$; otherwise the problem of resilience is trivial.

We show below, by construction, that when the true cost functions are non-negative, $(t, \epsilon)$-weak resilience is achievable for a suitable $\epsilon$ even when the true cost functions are independent. Note that, by Definition 4, when the true cost functions of the agents are non-negative, $(t, \epsilon)$-weak resilience trivially implies $(t, \epsilon')$-weak resilience for every $\epsilon' \ge \epsilon$. Therefore, achievability of $(t, \epsilon)$-weak resilience implies the achievability of $(t, \epsilon')$-weak resilience for all $\epsilon' \ge \epsilon$.

In the subsequent subsection we present a collaborative optimization algorithm that guarantees $(t, \epsilon)$-weak resilience when the true cost functions are non-negative and $t < n/2$. In Section 3.2, we show that the algorithm below also achieves $t$-resilience under certain conditions.
3.1 Algorithm for $(t, \epsilon)$-Weak Resilience
In the proposed algorithm, the server obtains a full
description of the agents’ cost functions.
We denote the function obtained by the server from agent $i$ as $\hat{f}_i$. Let the true cost function of each agent $i$ be denoted $f_i$. Then, for each non-faulty agent $i$, $\hat{f}_i = f_i$. On the other hand, for each faulty agent $j$, $\hat{f}_j$ may not necessarily equal $f_j$.
The algorithm comprises three steps:
Pre-processing Step: For any agent $j$, if $\hat{f}_j(x)$ is negative for some $x$, or its minimum is not finite (or does not exist), then $j$ must be faulty. Remove $j$ from the system, and decrement $n$ and $t$ each by 1 for each agent thus removed. In other words, the cost functions of the remaining agents are non-negative. Also, it is easy to see that $t < n/2$ continues to hold after pre-processing for the updated values of $n$ and $t$. (A worst-case adversary may ensure that $\hat{f}_j$ for each faulty agent $j$ is non-negative, so that no faulty agents are eliminated in the pre-processing step.)
Step 1: For each set $S$ of agents such that $|S| = n - t$, compute
$$m_S = \min_x \sum_{i \in S} \hat{f}_i(x).$$

Step 2: Determine a subset $S^*$ of size $n - t$ such that
$$m_{S^*} \le m_S \text{ for every set } S \text{ of size } n - t.$$
Output a point $\hat{x} \in \arg\min_x \sum_{i \in S^*} \hat{f}_i(x)$.
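The steps above can be sketched as follows (hypothetical names; a finite grid stands in for $\mathbb{R}^d$, and the pre-processing step is assumed already done):

```python
from itertools import combinations

def weak_resilient_output(reported, n, t, grid):
    """Sketch of Steps 1-2: among all agent sets S of size n - t, pick one
    whose reported aggregate attains the smallest minimum value over `grid`
    (a finite stand-in for R^d), and output a minimizer of that aggregate."""
    best = None
    for S in combinations(range(n), n - t):
        # Step 1: smallest achievable value of the reported aggregate over S.
        m_S, x_S = min((sum(reported[i](x) for i in S), x) for x in grid)
        # Step 2: retain the set with the smallest minimum value.
        if best is None or m_S < best[0]:
            best = (m_S, x_S)
    return best[1]

honest = lambda x: (x - 1) ** 2          # non-negative, minimum value zero
adversarial = lambda x: (x - 5) ** 2     # reported by a faulty agent
```

Intuitively, a faulty agent that inflates its reported cost only makes the sets containing it less attractive in Step 2, so an all-non-faulty set tends to win the comparison.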
Now we prove that the algorithm is $(t, \epsilon)$-weak resilient. It should be noted that the $(t, \epsilon)$-weak resilience property of the algorithm holds true even when the true cost functions are non-convex and non-differentiable.

Suppose that Assumption 1 holds, and $t < n/2$. If the true cost functions are non-negative, then the above algorithm is $(t, \epsilon)$-weak resilient.
In the proof, we consider the set of agents, and the values of $n$ and $t$, after the pre-processing step of the algorithm. In the worst case for the algorithm, all faulty agents send non-negative functions; thus, no faulty agents are removed in the pre-processing step.
For an execution of the proposed algorithm, let denote the set of up to faulty agents, and let denote the set of non-faulty agents. Thus,
Recall the definition of in the algorithm above. Let
Since and ,
we have that and .
First, note that owing to the pre-processing step and Assumption 1, for every set of agents $S$, $\min_x \sum_{i \in S} \hat{f}_i(x)$ exists and is finite.
Now, note that
From (28), for all sets of size . Therefore, there exists a subset with such that
From above we obtain,
Recall that for all . As and are subsets of , the above implies that,
Each is a non-negative function (due to the pre-processing step). Therefore, for all . Substituting this in (31) implies,
As , non-negativity of cost functions implies that,
Substituting the above in (32) implies,
Recall that . The above implies that the proposed algorithm is $(t, \epsilon)$-weak resilient. ∎
3.2 $t$-Resilience Property

In this section, we show that if the minimum value of each true cost function is zero, and the $2t$-redundancy property holds, then a $(t, 0)$-weak resilient collaborative optimization algorithm, such as the algorithm presented above, is $t$-resilient.

Suppose that Assumption 1 holds true, and $t < n/2$. If the true cost functions of the agents satisfy the $2t$-redundancy property, and each true cost function has minimum value equal to zero, then a $(t, 0)$-weak resilient algorithm is also $t$-resilient.
Let be a -weak resilient collaborative optimization algorithm. Consider an execution of , named , where denotes the set of faulty agents with . The remaining agents in are non-faulty. Suppose that the true cost function of each agent in execution is .
As is an arbitrary execution, to prove the lemma it suffices to show that the output of in execution is a minimum of the sum of the true cost functions of all the non-faulty agents .
We have assumed that the minimum values of the functions are zero, i.e.,
By applying the condition in Definition 3 of the $2t$-redundancy property for all possible (where ), we can conclude that the set is contained in the set for each . This, and the fact that each individual cost function has minimum value 0, implies that
Substituting from (34) above implies that
Let denote the output of . As is -weak resilient, there exists a subset of of size such that
Substituting from (35) above implies that
From (34), . The above implies that
As , the $2t$-redundancy property implies that
Substituting the above in (36), we obtain
Thus, the algorithm achieves $t$-resilience. ∎
If the true cost functions of the agents satisfy the $2t$-redundancy property, and have minimum value equal to zero, then the proposed algorithm in Section 3.1 is $t$-resilient.
Note that the algorithm presented in this section is computationally much simpler than the $t$-resilient algorithm presented in Section 2.1. However, the algorithm in this section relies on an additional assumption, namely that the minimum value of the true cost function of each non-faulty agent is zero. In general, there is a trade-off between the complexity of the algorithm and the assumptions made regarding the true cost functions, as the discussion below also illustrates.
4 Gradient-Descent Based Algorithm
In certain applications of collaborative optimization, the algorithms only use information about the gradients of the agents' cost functions. Collaborative learning is one such application. Due to its practical importance, fault-tolerance in collaborative learning has gained significant attention in recent years [1, 2, 4, 10, 30].

In this section, we briefly summarize a gradient-descent based distributed collaborative optimization algorithm wherein the agents only send gradients of their cost functions to the server, instead of sending their entire cost functions. The algorithm was proposed in our prior work, where we proved $t$-resilience of the algorithm when the true cost functions satisfy $2t$-redundancy and certain additional properties.
The proposed algorithm is iterative. For an execution of the algorithm, let $\mathcal{N}$ denote the set of non-faulty agents, and suppose that the true cost functions of the agents are $f_1, \ldots, f_n$. The server maintains an estimate of the minimum point, which is updated in each iteration of the algorithm. The initial estimate, named $x^0$, is chosen arbitrarily by the server from $\mathbb{R}^d$. In iteration $k \ge 0$, the server computes the estimate $x^{k+1}$ in steps S1 and S2, as described below.
In Step S1, the server obtains from the agents the gradients of their local cost functions at $x^k$. A faulty agent may send an arbitrary $d$-dimensional vector as its gradient. Each non-faulty agent $i$ sends the gradient of its true cost function at $x^k$, i.e., $\nabla f_i(x^k)$. In Step S2, to mitigate the detrimental impact of such incorrect gradients, the algorithm uses a filter to "robustify" the gradient aggregation step. In particular, the $t$ gradients with the largest norms are "clipped" so that their norm equals the norm of the $(t+1)$-th largest gradient (or, equivalently, the $(n-t)$-th smallest gradient). The remaining gradients are unchanged.
The resulting gradients are then accumulated to obtain the update direction, which is used to compute the next estimate. We refer to the clipping method used in Step S2 as “Comparative Gradient Clipping” (CGC), since the largest gradients are clipped to a norm that is “comparable” to the next largest gradient norm.
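To make the two steps concrete, the following is a minimal sketch of the CGC aggregation rule and the server’s update loop in Python/NumPy. It is an illustration under stated assumptions, not the implementation from our prior work: the function names (`cgc_aggregate`, `cgc_gradient_descent`), the fixed step size, and the fixed iteration count are all illustrative choices.

```python
import numpy as np

def cgc_aggregate(gradients, f):
    """Comparative Gradient Clipping (CGC): clip the f gradients with the
    largest norms so that each has norm equal to the (f+1)-th largest
    gradient norm; leave the remaining gradients unchanged; then
    accumulate all gradients to obtain the update direction."""
    norms = np.array([np.linalg.norm(g) for g in gradients])
    order = np.argsort(-norms)          # indices sorted by norm, descending
    threshold = norms[order[f]]         # norm of the (f+1)-th largest gradient
    clipped = []
    for g, n in zip(gradients, norms):
        if n > threshold and n > 0:
            clipped.append(g * (threshold / n))  # rescale to the threshold norm
        else:
            clipped.append(g)
    return np.sum(clipped, axis=0)

def cgc_gradient_descent(grad_oracles, f, x0, step=0.1, iters=100):
    """Server-side loop: collect gradients at the current estimate (Step S1),
    then apply CGC and take a descent step (Step S2)."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        grads = [oracle(x) for oracle in grad_oracles]
        x = x - step * cgc_aggregate(grads, f)
    return x
```

As a usage example, with three non-faulty agents whose gradients all point toward the same minimum and one faulty agent reporting a gradient of enormous norm, the faulty gradient is clipped down to the norm of the largest non-faulty gradient, so the iterates still converge toward the non-faulty agents’ minimum.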
A detailed description of the algorithm and its resilience guarantee can be found in our prior work. The algorithm performs correctly despite using a simple filter on the gradients, one that takes into account only the gradient norms and not the directions of the gradient vectors. This simplification is possible due to the assumptions made on the cost functions; weaker assumptions will often necessitate more complex algorithms.
5 Summary of the Results
We have made the following key contributions in this report.
In the case of redundant cost functions: We proved that 2f-redundancy is a necessary and sufficient condition for f-resilience in collaborative optimization. We presented f-resilient collaborative optimization algorithms to demonstrate the trade-off between the complexity of an f-resilient algorithm and the properties of the agents’ cost functions.
In the case of independent cost functions: We introduced the metric of (f, ε)-weak resilience to quantify the notion of resilience when the agents’ cost functions are independent. We presented an algorithm that obtains (f, ε)-weak resilience for all ε > 0 when the cost functions are non-negative, under a suitable bound on the number of faulty agents.
Research reported in this paper was sponsored in part by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196, and by National Science Foundation award 1610543. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, the National Science Foundation, or the U.S. Government.
-  Dan Alistarh, Zeyuan Allen-Zhu, and Jerry Li. Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 4618–4628, 2018.
-  Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, and Anima Anandkumar. signSGD with majority vote is communication efficient and Byzantine fault tolerant. arXiv preprint arXiv:1810.05291, 2018.
-  Kush Bhatia, Prateek Jain, and Purushottam Kar. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems, pages 721–729, 2015.
-  Peva Blanchard, Rachid Guerraoui, Julien Stainer, et al. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems, pages 119–129, 2017.
-  Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. Siam Review, 60(2):223–311, 2018.
-  Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine learning, 3(1):1–122, 2011.
-  Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.
-  Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60, 2017.
-  Yuan Chen, Soummya Kar, and Jose MF Moura. Resilient distributed estimation through adversary detection. IEEE Transactions on Signal Processing, 66(9):2455–2469, 2018.
-  Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2):44, 2017.
-  Michelle S Chong, Masashi Wakaiki, and Joao P Hespanha. Observability of linear systems under adversarial attacks. In American Control Conference, pages 2439–2444. IEEE, 2015.
-  John C Duchi, Alekh Agarwal, and Martin J Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic control, 57(3):592–606, 2011.
-  Nirupam Gupta and Nitin H Vaidya. Byzantine fault tolerant distributed linear regression. arXiv preprint arXiv:1903.08752, 2019.
-  Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
-  Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3):382–401, 1982.
-  Yehuda Lindell. Introduction to coding theory lecture notes. Department of Computer Science Bar-Ilan University, Israel January, 25, 2010.
-  Nancy A Lynch. Distributed algorithms. Elsevier, 1996.
-  Angelia Nedic and Asuman Ozdaglar. Distributed subgradient methods for multi-agent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
-  Miroslav Pajic, Insup Lee, and George J Pappas. Attack-resilient state estimation for noisy dynamical systems. IEEE Transactions on Control of Network Systems, 4(1):82–92, 2017.
-  Miroslav Pajic, James Weimer, Nicola Bezzo, Paulo Tabuada, Oleg Sokolsky, Insup Lee, and George J Pappas. Robustness of attack-resilient state estimators. In ICCPS’14: ACM/IEEE 5th International Conference on Cyber-Physical Systems (with CPS Week 2014), pages 163–174. IEEE Computer Society, 2014.
-  Michael Rabbat and Robert Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd international symposium on Information processing in sensor networks, pages 20–27, 2004.
-  Robin L Raffard, Claire J Tomlin, and Stephen P Boyd. Distributed optimization for cooperative agents: Application to formation flight. In 2004 43rd IEEE Conference on Decision and Control (CDC)(IEEE Cat. No. 04CH37601), volume 3, pages 2453–2459. IEEE, 2004.
-  Yasser Shoukry, Pierluigi Nuzzo, Alberto Puggelli, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, Mani Srivastava, and Paulo Tabuada. Imhotep-smt: A satisfiability modulo theory solver for secure state estimation. In Proc. Int. Workshop on Satisfiability Modulo Theories, 2015.
-  Yasser Shoukry, Pierluigi Nuzzo, Alberto Puggelli, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, and Paulo Tabuada. Secure state estimation for cyber-physical systems under sensor attacks: A satisfiability modulo theory approach. IEEE Transactions on Automatic Control, 62(10):4917–4932, 2017.
-  Lili Su and Shahin Shahrampour. Finite-time guarantees for Byzantine-resilient distributed state estimation with noisy measurements. arXiv preprint arXiv:1810.10086, 2018.
-  Lili Su and Nitin H Vaidya. Fault-tolerant multi-agent optimization: optimal iterative distributed algorithms. In Proceedings of the 2016 ACM symposium on principles of distributed computing, pages 425–434. ACM, 2016.
-  Lili Su and Nitin H Vaidya. Robust multi-agent optimization: coping with Byzantine agents with input redundancy. In International Symposium on Stabilization, Safety, and Security of Distributed Systems, pages 368–382. Springer, 2016.
-  Shreyas Sundaram and Bahman Gharesifard. Distributed optimization under adversarial nodes. IEEE Transactions on Automatic Control, 2018.
-  Jacobus Hendricus Van Lint. Coding theory, volume 201. Springer, 1971.
-  Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Generalized Byzantine-tolerant sgd. arXiv preprint arXiv:1802.10116, 2018.
-  Zhixiong Yang and Waheed U. Bajwa. Byrdie: Byzantine-resilient distributed coordinate descent for decentralized learning, 2017.
Appendix A Proof of Lemma 1
Lemma 1. For a non-empty set , consider a set of functions , , such that
Consider any non-empty set , and functions , , such that
Part I: Consider any . Since each cost function , , is minimized at , it follows that is also minimized at . In other words, it is trivially true that
Part II: Let be a point such that