1 Introduction
The problem of collaborative optimization in multiagent systems has gained significant attention in recent years [6, 18, 12, 21, 22]. In this problem, each agent knows its own local objective (or cost) function. In the fault-free setting, all the agents are nonfaulty (or honest), and the goal is to design a distributed (or collaborative) algorithm to compute a minimum of the aggregate of their local cost functions. We refer to this problem as collaborative optimization. Specifically, we consider a system of $n$ agents where each agent $i$ has a local real-valued cost function $Q_i$ that maps a point in the $d$-dimensional real-valued vector space (i.e., $\mathbb{R}^d$) to a real value. Unless otherwise stated, the cost functions are assumed to be convex [7].^1 (^1: As noted later in Section 5, some of our results are valid even when the cost functions are nonconvex.) The goal of collaborative optimization is to determine a global minimum $x^*$, such that
$$x^* \in \arg\min_{x \in \mathbb{R}^d} \sum_{i=1}^{n} Q_i(x). \qquad (1)$$
Throughout the report, we use the shorthand '$\arg\min_x$' for '$\arg\min_{x \in \mathbb{R}^d}$', unless otherwise mentioned.
As a simple example, $Q_i(x)$ may denote the cost for an agent $i$ (which may be a robot or a person) to travel to location $x$ from its current location. In this case, $x^*$ is a location that minimizes the total cost of travel for all the agents. Such multiagent collaborative optimization is of interest in many practical applications, including collaborative machine learning
[5, 6, 14], swarm robotics [22], and collaborative sensing [21]. Most of the prior work assumes all the agents to be nonfaulty. Nonfaulty agents follow a specified algorithm correctly. In our work, we consider a scenario wherein some of the agents may be faulty and may behave incorrectly. Su and Vaidya [26] introduced the problem of collaborative optimization in the presence of Byzantine faulty agents. A Byzantine faulty agent may behave arbitrarily [15]. In particular, the faulty agents may send incorrect and inconsistent information in order to bias the output of a collaborative optimization algorithm, and the faulty agents may also collaborate with each other. For example, consider an application of multiagent collaborative optimization to collaborative sensing, where the agents (or sensors) observe a common object in order to collectively identify it. The faulty agents may send arbitrary observations concocted to prevent the nonfaulty agents from making the correct identification [9, 11, 20]. Similarly, in the case of collaborative learning, which is another application of multiagent collaborative optimization, the faulty agents may send incorrect information based on mislabelled or arbitrarily concocted data points to prevent the nonfaulty agents from learning a good classifier [1, 2, 4, 8, 10, 30].
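To make the threat concrete, the following sketch (my own illustration, not taken from the report) uses one-dimensional quadratic costs, for which the aggregate minimum in (1) has a closed form: a single Byzantine agent that reports a concocted cost function can displace the naive aggregate minimizer arbitrarily.

```python
# Illustrative sketch (not from the report): with quadratic costs
# Q_i(x) = (x - a_i)^2, the aggregate sum_i Q_i(x) is minimized at the
# mean of the a_i. A single Byzantine agent that reports a wildly
# wrong function can drag the naive minimizer arbitrarily far.

def aggregate_minimizer(targets):
    """Minimizer of sum_i (x - a_i)^2 is the mean of the a_i."""
    return sum(targets) / len(targets)

honest = [1.0, 2.0, 3.0, 4.0]           # parameters of the true cost functions
x_true = aggregate_minimizer(honest)     # 2.5

# A Byzantine agent reports a_j = 1000 instead of an honest value.
reported = honest + [1000.0]
x_naive = aggregate_minimizer(reported)  # pulled far away from 2.5

print(x_true, x_naive)
```

Any algorithm that blindly aggregates all reported functions is therefore vulnerable; the rest of the report studies when and how this can be prevented.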
1.1 System architecture
The contributions of this paper apply to two different system architectures illustrated in Figure 1. In the server-based architecture, the server is assumed to be trustworthy, but up to $f$ agents may be Byzantine faulty. The trusted server helps solve the distributed optimization problem in coordination with the agents. In the peer-to-peer architecture, the agents are connected to each other by a complete network, and up to $f$ of these agents may be Byzantine faulty.
Provided that $n > 3f$, any algorithm for the server-based architecture can be simulated in the peer-to-peer system using
the well-known Byzantine broadcast primitive [17].
For simplicity of presentation, the rest of this report assumes the server-based architecture.
1.2 Resilience in collaborative optimization
As stated above, we will assume the server-based architecture in the rest of our discussion.
We assume that up to $f$ of the $n$ agents may be Byzantine faulty, where $f < n$.
We assume that each agent has a “true” cost function. Unless otherwise noted, each such cost function is assumed to be convex.

If an agent $i$ is nonfaulty, then its behavior is consistent with its true cost function, say $Q_i$. For instance, if agent $i$ is required to send to the server the value of its cost function at some point $x$, then a nonfaulty agent $i$ will indeed send $Q_i(x)$.

If an agent $i$ is faulty, then its behavior can be arbitrary, and not necessarily consistent with its true cost function, say $Q_i$. For instance, if agent $i$ is required to send to the server the value of its cost function at some point $x$, then a faulty agent $i$ may send an arbitrary value instead of $Q_i(x)$.
Clearly, when an agent is faulty, it may not share with the server correct information about its true cost function. However, it is convenient to define its true cost function as above, which is the cost function it would use in the absence of its failure.
Throughout this report, we assume the existence of a finite minimum for the aggregate of the true cost functions of the agents. Otherwise, the objective of collaborative optimization is vacuous. Specifically, we make the following technical assumption.
Assumption 1.
Suppose that the true cost function of each agent $i$ is $Q_i$. Then, for every nonempty set of agents $S$, we assume that there exists a finite point $x_S \in \mathbb{R}^d$ such that $x_S \in \arg\min_x \sum_{i \in S} Q_i(x)$.
Suppose that the true cost function of agent $i$ is $Q_i$. Then, ideally, the goal of collaborative optimization is to compute a minimum of the aggregate of the true cost functions of all the agents, $\sum_{i=1}^{n} Q_i(x)$, even if some of the agents are Byzantine faulty. In general, this may not be feasible since the Byzantine faulty agents can behave arbitrarily. To understand the feasibility of achieving some degree of resilience to Byzantine faults, we consider two cases.

Independent functions: A set of cost functions is independent if information about some of the functions in the set does not help learn any information about the remaining functions in the set. In other words, the cost functions do not contain any redundancy.

Redundant functions: Intuitively speaking, a set of cost functions includes redundancy when knowing some of the cost functions helps to learn some information about the remaining cost functions. As a trivial example, consider the special case when it is known that there exists some function $Q$ such that $Q$ is the true cost function of every agent. In this case, knowing the true cost function of any one agent suffices to learn the true cost functions of all the agents. Also, any point that minimizes an individual agent's true cost function also minimizes the total true cost over all the agents.
Su and Vaidya [26] defined the goal of fault-tolerant collaborative optimization as minimizing the aggregate of the cost functions of just the nonfaulty agents. Specifically, if $Q_i$ is the true cost function of agent $i$, and $\mathcal{H}$ denotes the set of nonfaulty agents in a given execution, then they defined the goal of fault-tolerant optimization to be to output a point in
$$\arg\min_x \sum_{i \in \mathcal{H}} Q_i(x). \qquad (2)$$
We refer to the above goal as $f$-resilience, formally defined below.
Definition 1 ($f$-resilience).
A collaborative optimization algorithm is said to be $f$-resilient if it outputs a minimum of the aggregate of the true cost functions of the nonfaulty agents despite up to $f$ agents being Byzantine faulty.
Su and Vaidya [26] showed that, in general, because the identity of the faulty agents is a priori unknown, an $f$-resilient algorithm may not exist. In this report, we provide an exact characterization of the condition under which $f$-resilience is achievable. In particular, we show that $f$-resilience is achievable if and only if the agents satisfy a property named $2f$-redundancy, defined next.^2 (^2: The notion of $2f$-redundancy can be extended to $k$-redundancy by replacing $2f$ in Definitions 2 and 3 by $k$.) The definitions below are vacuous if $n \leq 2f$. Henceforth, we assume that the faulty agents are in the minority, i.e., $n > 2f$.
Definition 2 ($2f$-redundancy).
Let $Q_i$ denote the true cost function of agent $i$. The agents are said to satisfy the $2f$-redundancy property if the following holds for every two subsets $S_1$ and $S_2$, each containing $n - 2f$ agents:
$$\arg\min_x \sum_{i \in S_1} Q_i(x) = \arg\min_x \sum_{i \in S_2} Q_i(x). \qquad (3)$$
The above definition of $2f$-redundancy is equivalent to the definition below, as shown in Appendix B.
Definition 3 ($2f$-redundancy).
Let $Q_i$ denote the true cost function of agent $i$. The agents are said to satisfy the $2f$-redundancy property if the following holds for any sets of agents $S$ and $\hat{S}$ such that $\hat{S} \subseteq S \subseteq \{1, \dots, n\}$, $|S| = n - f$, and $|\hat{S}| \geq n - 2f$:
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \arg\min_x \sum_{i \in S} Q_i(x). \qquad (4)$$
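For intuition, the following sketch (a hypothetical illustration of mine, not from the report) checks a Definition 2-style redundancy condition numerically for scalar quadratic costs $Q_i(x) = (x - a_i)^2$, where the minimizer of any subset sum is simply the mean of that subset's parameters.

```python
# A hypothetical numerical check (my own illustration, not from the report):
# with scalar quadratics Q_i(x) = (x - a_i)^2, the sum over a subset S is
# minimized at the mean of {a_i : i in S}. Definition 2-style 2f-redundancy
# then asks that every subset of size n - 2f yields the same minimizer.
from itertools import combinations

def satisfies_2f_redundancy(targets, f, tol=1e-9):
    n = len(targets)
    k = n - 2 * f                      # subset size used in the definition
    if k <= 0:
        return False                   # the definition is vacuous if n <= 2f
    mins = [sum(s) / k for s in combinations(targets, k)]
    return max(mins) - min(mins) <= tol

# All agents agree on the minimum: redundancy holds for f = 1.
print(satisfies_2f_redundancy([2.0, 2.0, 2.0, 2.0], f=1))  # True
# Independent minima: redundancy fails.
print(satisfies_2f_redundancy([1.0, 2.0, 3.0, 4.0], f=1))  # False
```

This also matches the trivial example above: identical cost functions are maximally redundant, while distinct quadratics with distinct minima are not.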
Note that the $f$-resilience property pertains to the point in $\mathbb{R}^d$ that is the output of a collaborative optimization
algorithm. The $f$-resilience property does not explicitly
impose any constraints on the function value attained at that point.
The notion of weak resilience stated below relates to function values.
Definition 4 ($(f, \epsilon)$-weak resilience).
Let $Q_i$ denote the true cost function of agent $i$. Let $\mathcal{H}$ denote the set of all nonfaulty agents. Then, a collaborative optimization algorithm is said to be $(f, \epsilon)$-weak resilient if it outputs a point $\hat{x}$ for which there exists a subset $\hat{\mathcal{H}}$ of $\mathcal{H}$ such that $|\hat{\mathcal{H}}| \geq n - 2f$, and
$$\sum_{i \in \hat{\mathcal{H}}} Q_i(\hat{x}) \leq \epsilon + \min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
It can be shown easily that, under certain additional conditions, $(f, 0)$-weak resilience implies $f$-resilience; the proof is deferred to Section 3. In many applications of multiagent collaborative optimization, such as distributed machine learning, distributed sensing or hypothesis testing, and swarm robotics, the cost functions are nonnegative [5, 6, 14, 21, 22]. We show that if the cost functions of the nonfaulty agents are nonnegative and independent then $(f, 0)$-weak resilience is impossible if $n \leq 2f$; moreover, under these conditions, we present an algorithm that guarantees $(f, 0)$-weak resilience if $n > 2f$.
1.3 Prior Work
The prior work on resilience in collaborative multiagent optimization by Su and Vaidya, 2016 [26], and Sundaram and Gharesifard, 2018 [28], only considers the special class of univariate cost functions, i.e., the dimension $d$ equals one. On the other hand, we consider the general class of multivariate cost functions, i.e., $d$ can be greater than one. Specifically, they have proposed algorithms that output a minimum of a nonuniformly weighted aggregate of the nonfaulty agents' cost functions when $d = 1$. However, their proposed algorithms do not extend easily to the case when $d > 1$. On the other hand, the algorithms and the fault-tolerance results presented in this report are valid regardless of the value of the dimension $d$, as long as it is finite.
Su and Vaidya have also considered a special case where the true cost functions of the agents are convex combinations of a finite number of basis convex functions [27]. They have shown that if the basis functions have a common minimum then a minimum point (as in (2)) can be computed accurately. This property of redundancy in the minimum of the basis functions, we note, is a special case of the $2f$-redundancy property that we prove necessary and sufficient for $f$-resilience in this report. Other prior work related to the $2f$-redundancy property is discussed in Section 2.2.
Yang and Bajwa, 2017 [31] consider a very special case of the collaborative optimization problem. They assume that the multivariate cost functions can be split into independent univariate strictly convex functions. For this special case, they have extended the fault-tolerance algorithm of Su and Vaidya, 2016 [26] for approximate resilience. In general, however, the agents' cost functions do not satisfy such specific properties. In this report, we do not make such assumptions about the agents' cost functions. We only assume the cost functions to be convex and differentiable, and that the minimum of their sum is finite (i.e., Assumption 1). Note that these assumptions are fairly standard in the optimization literature, and are also made in all of the aforementioned prior work.
Outline of the report: The rest of the report is organized as follows. In Section 2, we present the case when the cost functions satisfy the $2f$-redundancy property. In Section 3, we present the case when the cost functions are independent. In Section 4, we summarize a gradient-based algorithm for $f$-resilience, which was proposed in our prior work [13]. In Section 5, we discuss direct extensions of our results to the case when the cost functions are nondifferentiable and nonconvex. In the same section, we also present a summary of our results.
2 The Case of Redundant Cost Functions
This section presents the key result of this report for the case when the cost functions satisfy $2f$-redundancy. Unless otherwise mentioned, in the rest of the report, the cost functions are assumed to be differentiable, i.e., their gradients exist at all points in $\mathbb{R}^d$. Indeed, the cost functions are differentiable for most aforementioned applications of collaborative optimization [5, 6, 21, 22]. Nevertheless, as elaborated in Section 5, some of our results are also applicable to nondifferentiable cost functions.
Before we present Theorem 1 below, which states the key result of this section, in Lemma 2 we present an alternate, and perhaps more natural, equivalent form of the $2f$-redundancy property for the specific case when the agents' cost functions are differentiable. The proof of Lemma 2 uses Lemma 1, stated below.
Lemma 1.
Suppose that Assumption 1 holds true, and $n > 2f$. For a nonempty set of agents $S$, consider a set of functions $h_i$, $i \in S$, such that
$$\bigcap_{i \in S} \arg\min_x h_i(x) \neq \emptyset.$$
Then
$$\arg\min_x \sum_{i \in S} h_i(x) = \bigcap_{i \in S} \arg\min_x h_i(x).$$
Appendix A presents the proof of the above lemma.
Lemma 2.
Suppose that Assumption 1 holds true, and $n > 2f$. When the true cost functions of the agents are convex and differentiable, the $2f$-redundancy property stated in Definition 2 or Definition 3 is equivalent to the following condition:

A point is a minimum of the sum of the true cost functions of the nonfaulty agents if and only if that point is a minimum of the sum of the true cost functions of any $n - 2f$ nonfaulty agents.
Proof.
Let the true cost function of each agent $i$ be denoted by $Q_i$. Recall that there can be at most $f$ Byzantine faulty agents. Let $\mathcal{H}$ with $|\mathcal{H}| = n - f$ be the set of the nonfaulty agents.
Part I: We first show that the condition stated in the lemma implies the condition in Definition 2. Recall that the conditions in Definitions 2 and 3 are equivalent.
The condition stated in the lemma is equivalent to saying that for every subset $\hat{S}$ of $\mathcal{H}$ of size $n - 2f$,
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x). \qquad (6)$$
We show below that (6) together with Assumption 1 imply that for every subset $\hat{S}$ of $\mathcal{H}$ of size $n - 2f$,
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x). \qquad (7)$$
Consider two arbitrary agents $i, j \in \mathcal{H}$, and then consider two size-$(n - 2f)$ subsets $\hat{S}_i$ and $\hat{S}_j$ of $\mathcal{H}$ such that $i \in \hat{S}_i$, $j \in \hat{S}_j$, and
$$\hat{S}_i \setminus \{i\} = \hat{S}_j \setminus \{j\}. \qquad (8)$$
By Assumption 1, there exists a point $x^* \in \arg\min_x \sum_{k \in \mathcal{H}} Q_k(x)$. Now, (6) implies that
$$\sum_{k \in \hat{S}_i} \nabla Q_k(x^*) = \sum_{k \in \hat{S}_j} \nabla Q_k(x^*) = 0_d.$$
The above equality and (8) imply that
$$\nabla Q_i(x^*) = \nabla Q_j(x^*).$$
This equality can be proven for any pair $i, j \in \mathcal{H}$; as the $n - 2f$ gradients $\nabla Q_k(x^*)$, $k \in \hat{S}_i$, are then all equal and sum to $0_d$, each of them equals $0_d$. As the true cost functions are assumed convex, from the above we obtain,
$$x^* \in \arg\min_x Q_i(x), \quad \forall i \in \mathcal{H}.$$
Therefore, for every subset $\hat{S}$ of $\mathcal{H}$ of size $n - 2f$,
$$x^* \in \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) \neq \emptyset.$$
The above, together with Lemma 1, implies (7): for every subset $\hat{S}$ of $\mathcal{H}$ of size $n - 2f$,
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x).$$
The above together with (6) implies the condition in Definition 2, i.e., for every two subsets $S_1$ and $S_2$ of $\mathcal{H}$, each of size $n - 2f$,
$$\arg\min_x \sum_{i \in S_1} Q_i(x) = \arg\min_x \sum_{i \in S_2} Q_i(x).$$
Part II: We now show that the condition in Definition 3 implies the condition stated in the lemma. Now, $\arg\min_x \sum_{i \in S} Q_i(x)$ (i.e., the right side of (4)) is a nonempty set due to Assumption 1. This and (4) imply that for every subset $\hat{S}$ of $\mathcal{H}$ with $|\hat{S}| \geq n - 2f$, the set $\arg\min_x \sum_{i \in \hat{S}} Q_i(x)$ is nonempty.
Therefore, by Lemma 1,
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x).$$
Substituting the above in (4) implies (6), which is equivalent to the condition stated in the lemma. ∎
The following theorem presents the main result of this section.
Theorem 1.
Suppose that Assumption 1 holds true, and $n > 2f$. When the true cost functions of the agents are convex and differentiable, $f$-resilience can be achieved if and only if the agents satisfy the $2f$-redundancy property.
Proof.
The case of $f = 0$ is trivial, since there are no faulty agents.
In the rest of the proof, we assume that $f \geq 1$.
Sufficiency of $2f$-redundancy: Sufficiency of $2f$-redundancy is proved constructively using the
algorithm presented in Section 2.1. In particular, the algorithm
is proved to achieve $f$-resilience if $2f$-redundancy holds.
Necessity of $2f$-redundancy:
We consider the worst-case scenario where $f$ agents are faulty. Suppose that $f$-resilience can be achieved using an algorithm named $\Pi$. Consider an execution $E$ of $\Pi$ in which a set $\mathcal{H}$ with $|\mathcal{H}| = n - f$ is the actual set of nonfaulty agents. All the remaining agents, in the set $\mathcal{B} = \{1, \dots, n\} \setminus \mathcal{H}$, are the actual faulty agents. Suppose that the true cost function of each agent $i$ in execution $E$ is $Q_i$. We assume that the functions $Q_i$ are differentiable and convex.
In any $f$-resilient algorithm for collaborative optimization, the server can communicate with the agents and learn some information about their local cost functions. The most information the server can learn about the cost function of an agent is the complete description of its local cost function. To prove the necessity of $2f$-redundancy, we assume that the server knows the cost function $Q_i$ reported by each nonfaulty agent $i$.
Now consider the following executions.

In execution $E_0$, all the agents are nonfaulty. Let $\mathcal{H}_0$ denote the set of all agents, which happen to be nonfaulty in execution $E_0$. Thus, $\mathcal{H}_0 = \{1, \dots, n\}$. The true cost function of each agent $i$ is $Q_i$, identical to its true cost function in execution $E$.

In execution $E_j$, where $1 \leq j \leq n$, agent $j$ is Byzantine faulty, and all the remaining agents are nonfaulty. Let $\mathcal{H}_j = \{1, \dots, n\} \setminus \{j\}$ denote the set of agents that happen to be nonfaulty in execution $E_j$. In execution $E_j$, the true cost function of each nonfaulty agent $i \in \mathcal{H}_j$ is $Q_i$, which is identical to its true cost function in execution $E_0$. Let the true cost function of faulty agent $j$ in execution $E_j$ be a differentiable and convex function $\tilde{Q}_j$. Assume that the functions $Q_1, \dots, Q_n$ and $\tilde{Q}_j$ are independent. In execution $E_j$, suppose that the behavior of faulty agent $j$ from the viewpoint of the server is consistent with the cost function $Q_j$ (which equals the true cost function of agent $j$ in execution $E_0$).
Fix a particular $j$, $1 \leq j \leq n$. From the viewpoint of the server, execution $E_0$ and execution $E_j$ are indistinguishable. Thus, the $f$-resilient algorithm $\Pi$ will produce an identical output in these two executions; suppose that this output is $\hat{x}$. As $\Pi$ is assumed to be $f$-resilient, we have by Definition 1 and Assumption 1,
$$\hat{x} \in \arg\min_x \sum_{i=1}^{n} Q_i(x) \quad \text{and} \quad \hat{x} \in \arg\min_x \sum_{i \in \mathcal{H}_j} Q_i(x). \qquad (9)$$
For a differentiable cost function $Q$, we denote its gradient at a point $x$ by $\nabla Q(x)$. Let $0_d$ denote the zero vector of dimension $d$. As the cost functions are differentiable and convex, (9) implies that
$$\sum_{i=1}^{n} \nabla Q_i(\hat{x}) = 0_d, \qquad (10)$$
$$\sum_{i \in \mathcal{H}_j} \nabla Q_i(\hat{x}) = 0_d. \qquad (11)$$
Recall that $\mathcal{H}_j = \{1, \dots, n\} \setminus \{j\}$. Therefore,
$$\nabla Q_j(\hat{x}) = \sum_{i=1}^{n} \nabla Q_i(\hat{x}) - \sum_{i \in \mathcal{H}_j} \nabla Q_i(\hat{x}) = 0_d. \qquad (12)$$
As the cost functions are assumed to be convex, the above implies that,
$$\hat{x} \in \arg\min_x Q_j(x). \qquad (13)$$
By repeating the above argument for each $j \in \{1, \dots, n\}$, we have
$$\hat{x} \in \arg\min_x Q_j(x), \quad \forall j \in \{1, \dots, n\}. \qquad (14)$$
Therefore,
$$\bigcap_{i=1}^{n} \arg\min_x Q_i(x) \neq \emptyset. \qquad (15)$$
Similarly, for every nonempty set of agents $S \subseteq \{1, \dots, n\}$,
$$\hat{x} \in \bigcap_{i \in S} \arg\min_x Q_i(x). \qquad (16)$$
Thus, $\bigcap_{i \in S} \arg\min_x Q_i(x) \neq \emptyset$. Then, Lemma 1 implies that
$$\arg\min_x \sum_{i \in S} Q_i(x) = \bigcap_{i \in S} \arg\min_x Q_i(x). \qquad (17)$$
Now we consider execution $E$ (defined earlier) in which the nodes in set $\mathcal{H}$ are nonfaulty. Using the results derived in the proof so far,^3 we will show that, for any $\hat{S} \subseteq \mathcal{H}$ subject to $|\hat{S}| \geq n - 2f$,
$$\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) = \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x).$$
(^3: Footnote 2 noted that the notion of $2f$-redundancy can be extended to $k$-redundancy. The proof so far has relied only on $1$-redundancy, which is weaker than $2f$-redundancy. The latter part of this proof makes use of $2f$-redundancy.)
The proof concludes once we have shown the above equality.
Consider an arbitrary subset $\hat{S} \subseteq \mathcal{H}$ subject to $|\hat{S}| \geq n - 2f$. It is trivially true that
$$\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) \supseteq \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x). \qquad (18)$$
So it remains to show that $\bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x)$ is not a strict subset of $\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x)$. The proof below is by contradiction.
Suppose that
$$\bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x) \subsetneq \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x). \qquad (19)$$
This implies that there exists a point
$$x' \in \bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) \qquad (20)$$
such that
$$x' \notin \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x). \qquad (21)$$
Therefore, there exists an agent $h \in \mathcal{H} \setminus \hat{S}$ such that
$$x' \notin \arg\min_x Q_h(x). \qquad (22)$$
Let $\mathcal{B}' = \mathcal{H} \setminus \hat{S}$ and $\mathcal{H}' = \hat{S} \cup \mathcal{B}$. Then $|\mathcal{B}'| \leq f$. Now we define executions $E'$ and $E''$.

Execution $E'$: In execution $E'$ the agents in set $\mathcal{B}$ are faulty, and the agents in set $\mathcal{H}$ are nonfaulty. In execution $E'$, the behavior of each nonfaulty agent $i \in \mathcal{H}$ is consistent with its true cost function being $Q_i$, which is identical to its true cost function in execution $E$. However, each faulty agent $j \in \mathcal{B}$ behaves consistently with a differentiable and convex cost function $\tilde{Q}_j$ that has a unique minimum at $x'$.

Execution $E''$: In execution $E''$ the agents in set $\mathcal{B}'$ are faulty, and the remaining agents in $\mathcal{H}' = \hat{S} \cup \mathcal{B}$ are nonfaulty. In execution $E''$, the behavior of each agent $i \in \mathcal{H}$ (including the faulty agents in $\mathcal{B}'$) is consistent with the cost function $Q_i$. Each nonfaulty agent $j \in \mathcal{B}$ behaves consistently with its true cost function being $\tilde{Q}_j$, which is defined in execution $E'$. Recall that each $\tilde{Q}_j$ has a unique minimum at $x'$.
Observe that the server cannot distinguish between executions and .
Now, recall that, by construction, each $\tilde{Q}_j$ is minimized only at $x'$. That is, for every agent $j \in \mathcal{B}$,
$$\arg\min_x \tilde{Q}_j(x) = \{x'\}. \qquad (23)$$
As $\Pi$ is $f$-resilient, in execution $E''$, algorithm $\Pi$ must produce an output in
$$\arg\min_x \left( \sum_{i \in \hat{S}} Q_i(x) + \sum_{j \in \mathcal{B}} \tilde{Q}_j(x) \right). \qquad (24)$$
(Recall that the agents in $\mathcal{H}' = \hat{S} \cup \mathcal{B}$ are nonfaulty in execution $E''$.)
(20) and (23) together imply that
$$\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) \cap \bigcap_{j \in \mathcal{B}} \arg\min_x \tilde{Q}_j(x) = \{x'\}.$$
That is, the above set contains only $x'$.
This, in turn, by Lemma 1 implies that the set in (24) only
contains the point $x'$, and thus, algorithm $\Pi$
must output $x'$ in execution $E''$.
Now, since algorithm $\Pi$ cannot distinguish between executions $E'$ and $E''$, it must output $x'$ in execution $E'$ as well. However, from (17) and (21), respectively, we know that
$$\arg\min_x \sum_{i \in \mathcal{H}} Q_i(x) = \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x)$$
and
$$x' \notin \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x).$$
The above two equations imply that $x' \notin \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x)$, and so $\Pi$ cannot output $x'$ in execution $E'$, in which the agents in $\mathcal{H}$ are nonfaulty (otherwise $\Pi$ cannot be $f$-resilient).
This is a contradiction.
Therefore, we have proved that $\bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x)$ is not a strict subset of $\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x)$.
The above result together with (18) implies that
$$\bigcap_{i \in \hat{S}} \arg\min_x Q_i(x) = \bigcap_{i \in \mathcal{H}} \arg\min_x Q_i(x).$$
Recall that $\hat{S}$ is an arbitrary subset of $\mathcal{H}$ with $|\hat{S}| \geq n - 2f$. Therefore, the above holds for every subset $\hat{S}$ of $\mathcal{H}$ with $|\hat{S}| \geq n - 2f$.
This together with (17) implies that
$$\arg\min_x \sum_{i \in \hat{S}} Q_i(x) = \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
Thus, if $\Pi$ is $f$-resilient then the true cost functions of the agents satisfy the $2f$-redundancy property as stated in Definition 3. This proves the necessity of the $2f$-redundancy property for $f$-resilience. ∎
The following collaborative optimization algorithm proves the sufficiency of $2f$-redundancy for $f$-resilience.
2.1 An $f$-resilient algorithm
We present an algorithm and prove that it is $f$-resilient if the agents satisfy the $2f$-redundancy property stated in Definition 2 or 3. We will suppose that Assumption 1 holds true and $n > 2f$. We only consider the case when $f \geq 1$, since the case of $f = 0$ is trivial.
$f$-Resilient Algorithm:
The server collects a full description of the cost function of each agent. Suppose
that the server obtains cost function $\hat{Q}_i$ from each agent $i$.
For each nonfaulty agent $i$, $\hat{Q}_i = Q_i$, the agent's true cost function.
The proposed algorithm outputs a point $\hat{x}$ for which there exists a set $\hat{S}$ of $n - f$ agents such that for any $S \subseteq \hat{S}$ with $|S| \geq n - 2f$,
$$\hat{x} \in \arg\min_x \sum_{i \in S} \hat{Q}_i(x).$$
If there are multiple candidate points that satisfy the condition above, then any one such point is chosen as the output.
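The algorithm above can be sketched as a brute-force search over candidate sets. The following is my own simplified illustration, assuming scalar quadratic costs $Q_i(x) = (x - a_i)^2$ so that subset minimizers are subset means; the algorithm in the report itself works with arbitrary cost functions.

```python
# A brute-force sketch of the algorithm above, under simplifying assumptions
# of my own: scalar quadratic costs Q_i(x) = (x - a_i)^2, so that
# argmin_x sum_{i in S} Q_i(x) is exactly the mean of {a_i : i in S}.
from itertools import combinations

def resilient_output(reported, f, tol=1e-9):
    """Return x_hat minimizing the sum over some (n-f)-set S_hat such that
    x_hat also minimizes the sum over every subset of S_hat of size
    >= n - 2f; returns None if no candidate exists."""
    n = len(reported)
    for s_hat in combinations(range(n), n - f):
        x_hat = sum(reported[i] for i in s_hat) / (n - f)
        ok = all(
            abs(sum(reported[i] for i in sub) / len(sub) - x_hat) <= tol
            for k in range(n - 2 * f, n - f + 1)
            for sub in combinations(s_hat, k)
        )
        if ok:
            return x_hat
    return None

# n = 5, f = 1: four honest agents share the minimum at 2.0 (so
# 2f-redundancy holds among them); one faulty agent reports 50.0.
print(resilient_output([2.0, 2.0, 2.0, 2.0, 50.0], f=1))  # 2.0
```

Any candidate set containing the faulty report fails the inner consistency check, so only sets of mutually consistent (here, honest) agents can be selected.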
Now we prove the correctness of the above algorithm when $2f$-redundancy holds.
Proof.
Assume that the $2f$-redundancy property holds. First we observe that the algorithm will always be able to output a point if $2f$-redundancy is satisfied. Let $\mathcal{H}$ denote the set of all nonfaulty agents. Recall that $|\mathcal{H}| \geq n - f$. In particular, consider a set $\hat{S}$ that consists of any $n - f$ nonfaulty agents, that is, $\hat{S} \subseteq \mathcal{H}$. For any $S \subseteq \hat{S}$ where $|S| \geq n - 2f$, due to $2f$-redundancy (Definition 3) and Assumption 1, we have
$$\arg\min_x \sum_{i \in S} Q_i(x) = \arg\min_x \sum_{i \in \hat{S}} Q_i(x). \qquad (25)$$
This implies that every point in
$\arg\min_x \sum_{i \in \hat{S}} Q_i(x)$ is a candidate for the output of the algorithm. Additionally,
due to Assumption 1,
$\arg\min_x \sum_{i \in \hat{S}} Q_i(x)$
is guaranteed to be nonempty.
Thus, the algorithm will always produce an output.
Next we show that the algorithm achieves $f$-resilience. Consider any set $\hat{S}$ for which the condition in the algorithm is true, and let $\hat{x}$ denote the output of the algorithm. From the algorithm, we know that for any $S \subseteq \hat{S}$ with $|S| \geq n - 2f$,
$$\hat{x} \in \arg\min_x \sum_{i \in S} \hat{Q}_i(x).$$
Now, since at most $f$ agents are faulty, there exists at least one set $\mathcal{H}_1 \subseteq \hat{S}$ containing only nonfaulty agents such that $\mathcal{H}_1 \subseteq \mathcal{H}$ (and also $|\mathcal{H}_1| \geq n - 2f$). Thus, as $\hat{Q}_i = Q_i$ for each $i \in \mathcal{H}_1$,
$$\hat{x} \in \arg\min_x \sum_{i \in \mathcal{H}_1} Q_i(x). \qquad (26)$$
Also, since $|\mathcal{H}_1| \geq n - 2f$, due to $2f$-redundancy (Definition 3), we have
$$\arg\min_x \sum_{i \in \mathcal{H}_1} Q_i(x) = \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x). \qquad (27)$$
Since $\arg\min_x \sum_{i \in \mathcal{H}} Q_i(x)$ is nonempty (by Assumption 1), the last equality implies that $\arg\min_x \sum_{i \in \mathcal{H}_1} Q_i(x)$ is nonempty. The equalities (26) and (27) together imply that
$$\hat{x} \in \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
Thus, the above algorithm achieves $f$-resilience. ∎
It should be noted that the correctness of the $f$-resilient algorithm presented above does not require differentiability or convexity of the agents' true cost functions. Therefore, $2f$-redundancy is a sufficient condition for $f$-resilience even when the agents' cost functions are nondifferentiable and nonconvex.
Alternate $f$-resilient algorithms: There exist other, more practical, algorithms to achieve $f$-resilience when $2f$-redundancy holds. However, there is a tradeoff between algorithm complexity and the additional properties assumed for the cost functions.

We present an alternate, computationally simpler, $f$-resilient algorithm in Section 3.1 for the case when the minimum value of each true cost function is zero.

In our prior work [13], we proposed a gradient-descent based distributed algorithm that is $f$-resilient if the cost functions have certain additional properties presented in Section 4. The algorithm uses a computationally simple "comparative gradient clipping" mechanism to tolerate Byzantine faults.
2.2 Prior work on redundancy
To the best of our knowledge, there is no prior work on the tightness of the $2f$-redundancy property for $f$-resilience in collaborative optimization. Nevertheless, it is worthwhile to note that conditions with some similarity to $2f$-redundancy are known to be necessary and sufficient for fault-tolerance in other systems, such as information coding and collaborative multisensing (or sensor fusion), discussed below. We note that collaborative multisensing can be viewed as a special case of the collaborative optimization problem presented in this report.
Redundancy for error-correction coding: Digital machines store or communicate information using a finite-length sequence of symbols. However, these symbols may become erroneous due to faults in the system or during communication. A way to recover the information despite such errors is to use an error-correction code. An error-correction code transforms (or encodes) the original sequence of symbols into another sequence of symbols called a codeword. It is well-known that a code that generates codewords of length $m$ can correct (or tolerate) up to $t$ symbol errors if and only if the Hamming distance between any two codewords of the code is at least $2t + 1$ [16, 29]. There exist codes (e.g., Reed–Solomon codes) such that the sequence of symbols encoded in a codeword can be uniquely determined using any $m - 2t$ correct symbols of the codeword.
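As a minimal illustration of this analogy (mine, not taken from [16, 29]), a length-$m$ repetition code has Hamming distance $m$ between its distinct codewords, so majority decoding corrects up to $t$ symbol errors whenever $m \geq 2t + 1$:

```python
# A minimal illustration of the coding analogy (my own, not from the report):
# a length-m repetition code has Hamming distance m between distinct
# codewords, so it corrects t errors whenever m >= 2t + 1, by majority vote.
from collections import Counter

def encode(symbol, m):
    return [symbol] * m

def decode(received):
    # Majority vote recovers the symbol if at most (m - 1) // 2
    # positions are corrupted.
    return Counter(received).most_common(1)[0][0]

word = encode('a', 5)        # tolerates t = 2 errors since 5 >= 2*2 + 1
word[0], word[3] = 'x', 'y'  # corrupt two positions
print(decode(word))          # 'a'
```

The parallel with collaborative optimization is that $t$ corrupted symbols play the role of $f$ faulty agents, and the surplus of correct symbols plays the role of the $2f$-redundancy of the cost functions.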
Redundancy for fault-tolerant state estimation: The problem of collaborative optimization finds direct application in distributed sensing [21]. In this problem, the system comprises multiple sensors, and each sensor makes partial observations about the state of the system. The goal of the sensors is to collectively compute the complete state of the system. However, if a sensor is faulty then it may share incorrect observations. The problem of fault-tolerance in collaborative sensing for the special case wherein the sensors' observations are linear in the system state has gained significant attention in recent years [3, 11, 19, 20, 23, 24, 25]. Chong et al., 2015 [11] and Pajic et al., 2015 [20] showed that the system state can be accurately computed when up to $t$ (out of $m$) sensors are faulty if and only if the system is $2t$-sparse observable, i.e., the state can be computed uniquely using the observations of only $m - 2t$ nonfaulty sensors. We note that the property of $2t$-sparse observability is a special instance of the more general $2f$-redundancy property presented in this report. Moreover, the necessity and sufficiency of the $2f$-redundancy property proved in this report implies the necessity and sufficiency of $2t$-sparse observability
for fault-tolerant state estimation in a more general setting wherein the sensor observations may be nonlinear; however, the converse is not true.
Next, we consider the case when the cost functions are independent, and may not satisfy the $2f$-redundancy property.
3 The case of Independent Cost Functions
In this section, we present the case when the true cost functions of the agents are independent. Throughout this section we assume that $f \geq 1$, since otherwise the problem of resilience is trivial.
We show below, by construction, that when the true cost functions are nonnegative, $(f, 0)$-weak resilience is achievable if $n > 2f$ even when the true cost functions are independent. Note that, by Definition 4, when the true cost functions of the agents are nonnegative, $(f, 0)$-weak resilience trivially implies $(f, \epsilon)$-weak resilience for every $\epsilon \geq 0$. Therefore, achievability of $(f, 0)$-weak resilience implies the achievability of $(f, \epsilon)$-weak resilience for all $\epsilon \geq 0$.
In the subsequent subsection we present a collaborative optimization algorithm that guarantees $(f, 0)$-weak resilience when the true cost functions are nonnegative and $n > 2f$. In Section 3.2, we show that the same algorithm also achieves $f$-resilience under certain conditions.
3.1 Algorithm for Weak Resilience
In the proposed algorithm, the server obtains a full
description of the agents' cost functions.
We denote the function obtained by the server from agent $i$ as $\hat{Q}_i$.
Let the true cost function of each agent $i$ be denoted $Q_i$. Then, for each nonfaulty agent $i$,
$\hat{Q}_i = Q_i$. On the other hand, for each faulty agent $i$, $\hat{Q}_i$ may not necessarily equal $Q_i$.
The algorithm comprises three steps:

Preprocessing Step: For any agent $i$, if the received function $\hat{Q}_i$ is not nonnegative at some point, or its minimum value is not finite (or does not exist), then agent $i$ must be faulty. Remove $i$ from the system, and decrement $n$ and $f$ each by 1 for each agent thus removed. In other words, the cost functions of the remaining agents are nonnegative. Also, it is easy to see that $n > 2f$ continues to hold after preprocessing for the updated values of $n$ and $f$.^4 (^4: A worst-case adversary may ensure that $\hat{Q}_i$ for each faulty agent $i$ is nonnegative with a finite minimum, so that no faulty agents are eliminated in the preprocessing step.)

Step 1: For each set $S$ of agents with $|S| = n - f$, compute
$$m_S = \min_x \sum_{i \in S} \hat{Q}_i(x).$$
Step 2: Determine a subset $\hat{S}$ of size $n - f$ such that
$$m_{\hat{S}} \leq m_S \ \text{ for every set } S \text{ with } |S| = n - f. \qquad (28)$$
Output a point $\hat{x} \in \arg\min_x \sum_{i \in \hat{S}} \hat{Q}_i(x)$.
Now we prove that the algorithm is $(f, 0)$-weak resilient. It should be noted that the weak resilience property of the algorithm holds true even when the true cost functions are nonconvex and nondifferentiable.
Theorem 2.
Suppose that Assumption 1 holds, and $n > 2f$. If the true cost functions are nonnegative, then the above algorithm is $(f, 0)$-weak resilient.
Proof.
In the proof, we consider the set of agents, and the values of $n$ and $f$, after the preprocessing step of the algorithm.
In the worst case for the algorithm, all faulty agents send nonnegative cost functions with finite minima; thus, no faulty agents are removed in the preprocessing step.
For an execution of the proposed algorithm, let $\mathcal{B}$ denote the set of up to $f$ faulty agents, and let $\mathcal{H}$ denote the set of nonfaulty agents. Thus,
$|\mathcal{H}| \geq n - f$.
Recall the definition of $\hat{S}$ in the algorithm above. Let
$$\mathcal{H}_1 = \hat{S} \cap \mathcal{H}, \qquad (29)$$
$$\mathcal{B}_1 = \hat{S} \cap \mathcal{B}. \qquad (30)$$
Thus, $\hat{S} = \mathcal{H}_1 \cup \mathcal{B}_1$.
Since $|\hat{S}| = n - f$ and $|\mathcal{B}_1| \leq f$,
we have that $|\mathcal{H}_1| \geq n - 2f$ and $\mathcal{H}_1 \subseteq \mathcal{H}$.
First, note that owing to the preprocessing step and Assumption 1, for every set of agents $S$, $m_S = \min_x \sum_{i \in S} \hat{Q}_i(x)$ exists and is finite.
Now, note that
$$\sum_{i \in \hat{S}} \hat{Q}_i(\hat{x}) = \min_x \sum_{i \in \hat{S}} \hat{Q}_i(x) = m_{\hat{S}}.$$
From (28), $m_{\hat{S}} \leq m_S$ for all sets $S$ of size $n - f$. As $|\mathcal{H}| \geq n - f$, there exists a subset $S^* \subseteq \mathcal{H}$ with $|S^*| = n - f$ such that
$$m_{\hat{S}} \leq m_{S^*}.$$
From the above we obtain,
$$\sum_{i \in \hat{S}} \hat{Q}_i(\hat{x}) \leq \min_x \sum_{i \in S^*} \hat{Q}_i(x).$$
Recall that $\hat{Q}_i = Q_i$ for all $i \in \mathcal{H}$. As $\mathcal{H}_1$ and $S^*$ are subsets of $\mathcal{H}$, the above implies that,
$$\sum_{i \in \mathcal{H}_1} Q_i(\hat{x}) + \sum_{i \in \mathcal{B}_1} \hat{Q}_i(\hat{x}) \leq \min_x \sum_{i \in S^*} Q_i(x). \qquad (31)$$
Each $\hat{Q}_i$ is a nonnegative function (due to the preprocessing step). Therefore, $\sum_{i \in \mathcal{B}_1} \hat{Q}_i(\hat{x}) \geq 0$. Substituting this in (31) implies,
$$\sum_{i \in \mathcal{H}_1} Q_i(\hat{x}) \leq \min_x \sum_{i \in S^*} Q_i(x). \qquad (32)$$
As $S^* \subseteq \mathcal{H}$, nonnegativity of the cost functions implies that $\sum_{i \in S^*} Q_i(x) \leq \sum_{i \in \mathcal{H}} Q_i(x)$ for all $x$.
Substituting the above in (32) implies,
$$\sum_{i \in \mathcal{H}_1} Q_i(\hat{x}) \leq \min_x \sum_{i \in \mathcal{H}} Q_i(x). \qquad (33)$$
Recall that $|\mathcal{H}_1| \geq n - 2f$ and $\mathcal{H}_1 \subseteq \mathcal{H}$. The above implies that the proposed algorithm is $(f, 0)$-weak resilient. ∎
3.2 Resilience Property
In this section, we show that if the minimum value of each true cost function is zero, and the $2f$-redundancy property holds, then an $(f, 0)$-weak resilient collaborative optimization algorithm, such as the algorithm presented above, is $f$-resilient.
Lemma 3.
Suppose that Assumption 1 holds true, and $n > 2f$. If the true cost functions of the agents satisfy the $2f$-redundancy property, and each true cost function has minimum value equal to zero, then an $(f, 0)$-weak resilient algorithm is also $f$-resilient.
Proof.
Let $\Pi$ be an $(f, 0)$-weak resilient collaborative optimization algorithm. Consider an execution of $\Pi$, named $E$, where $\mathcal{B}$ denotes the set of faulty agents with $|\mathcal{B}| \leq f$. The remaining agents, in the set $\mathcal{H} = \{1, \dots, n\} \setminus \mathcal{B}$, are nonfaulty. Suppose that the true cost function of each agent $i$ in execution $E$ is $Q_i$.
As $E$ is an arbitrary execution, to prove the lemma it suffices to show that the output of $\Pi$ in execution $E$ is a minimum of the sum of the true cost functions of the nonfaulty agents, $\sum_{i \in \mathcal{H}} Q_i(x)$.
We have assumed that the minimum values of the functions are zero, i.e.,
$$\min_x Q_i(x) = 0, \quad \forall i \in \{1, \dots, n\}. \qquad (34)$$
By applying the condition in Definition 3 of the $2f$-redundancy property for all admissible subsets of $\mathcal{H}$, we can conclude that the set $\arg\min_x \sum_{i \in \mathcal{H}} Q_i(x)$ is contained in the set $\arg\min_x Q_i(x)$ for each agent $i \in \mathcal{H}$. This, and the fact that each individual cost function has minimum value 0, implies that
$$\min_x \sum_{i \in \mathcal{H}} Q_i(x) = \sum_{i \in \mathcal{H}} \min_x Q_i(x).$$
Substituting from (34) above implies that
$$\min_x \sum_{i \in \mathcal{H}} Q_i(x) = 0. \qquad (35)$$
Let $\hat{x}$ denote the output of $\Pi$. As $\Pi$ is $(f, 0)$-weak resilient, there exists a subset $\hat{\mathcal{H}}$ of $\mathcal{H}$ of size at least $n - 2f$ such that
$$\sum_{i \in \hat{\mathcal{H}}} Q_i(\hat{x}) \leq \min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
Substituting from (35) above implies that
$$\sum_{i \in \hat{\mathcal{H}}} Q_i(\hat{x}) \leq 0.$$
From (34), $Q_i(\hat{x}) \geq 0$ for each agent $i$. The above implies that $Q_i(\hat{x}) = 0$ for every $i \in \hat{\mathcal{H}}$.
Alternately,
$$\hat{x} \in \arg\min_x \sum_{i \in \hat{\mathcal{H}}} Q_i(x). \qquad (36)$$
As $|\hat{\mathcal{H}}| \geq n - 2f$, the $2f$-redundancy property implies that
$$\arg\min_x \sum_{i \in \hat{\mathcal{H}}} Q_i(x) = \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
Substituting the above in (36), we obtain
$$\hat{x} \in \arg\min_x \sum_{i \in \mathcal{H}} Q_i(x).$$
Thus, algorithm $\Pi$ achieves $f$-resilience. ∎
Corollary 1.
If the true cost functions of the agents satisfy the $2f$-redundancy property, and each has minimum value equal to zero, then the algorithm proposed in Section 3.1 is $f$-resilient.
Note that the algorithm presented in this section is computationally much simpler than the $f$-resilient algorithm previously presented in Section 2.1. However, the algorithm in this section relies on the additional assumption that the minimum value of the true cost function of each nonfaulty agent is zero. In general, there is a tradeoff between the complexity of the algorithm and the assumptions made regarding the true cost functions, as the discussion below also illustrates.
4 GradientDescent Based Algorithm
In certain applications of collaborative optimization, algorithms use only information about the gradients of the agents' cost functions. Collaborative learning is one such application [5]. Due to its practical importance, fault-tolerance in collaborative learning has gained significant attention in recent years [1, 2, 4, 10, 30].
In this section, we briefly summarize a gradient-descent based distributed collaborative optimization algorithm wherein the agents send only the gradients of their cost functions to the server, instead of sending their entire cost functions. The algorithm was proposed in our prior work [13], where we proved resilience of the algorithm when the true cost functions satisfy the redundancy property and certain additional properties.
The proposed algorithm is iterative. For an execution of the algorithm, let $\mathcal{H}$ denote the set of nonfaulty agents, and suppose that the true cost functions of the agents are $Q_i$, $i \in \mathcal{H}$. The server maintains an estimate of the minimum point, which is updated in each iteration of the algorithm. The initial estimate, named $x^0$, is chosen arbitrarily by the server from $\mathbb{R}^d$. In iteration $t \geq 0$, the server computes estimate $x^{t+1}$ in steps S1 and S2 as described below.
In Step S1, the server obtains from the agents the gradients of their local cost functions at $x^t$. A faulty agent may send an arbitrary $d$-dimensional vector for its gradient. Each nonfaulty agent $i$ sends the gradient of its true cost function at $x^t$, i.e., $\nabla Q_i(x^t)$. In Step S2, to mitigate the detrimental impact of such incorrect gradients, the algorithm uses a filter to "robustify" the gradient aggregation step. In particular, the $f$ gradients with the largest norms are "clipped" so that their norm equals the norm of the $(f+1)$-th largest gradient (or, equivalently, the $(n-f)$-th smallest gradient). The remaining $n - f$ gradients remain unchanged. The resulting gradients are then accumulated to obtain the update direction, which is then used to compute $x^{t+1}$. We refer to the method used in Step S2 for clipping the $f$ largest gradients as "Comparative Gradient Clipping" (CGC), since the largest gradients are clipped to a norm that is "comparable" to the next largest gradient.
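As an illustration, the CGC filter and one iteration of the server's update can be sketched as follows. This is a minimal reimplementation for exposition, not the code of [13]; the step size `eta`, the counts `n` and `f`, and the random gradients are all hypothetical, and the sketch assumes $1 \leq f < n$.

```python
import numpy as np

def cgc_aggregate(grads, f):
    """Comparative Gradient Clipping: clip the f largest-norm gradients
    to the norm of the (f+1)-th largest, then sum all gradients.
    Assumes 1 <= f < len(grads)."""
    norms = np.linalg.norm(grads, axis=1)
    order = np.argsort(norms)            # ascending: the last f are largest
    clip_to = norms[order[-(f + 1)]]     # (f+1)-th largest norm
    clipped = grads.copy()
    for i in order[-f:]:                 # rescale the f largest gradients
        if norms[i] > 0:
            clipped[i] *= clip_to / norms[i]
    return clipped.sum(axis=0)           # update direction (Step S2)

# One iteration of the server's update with hypothetical parameters:
rng = np.random.default_rng(0)
n, f, d, eta = 5, 1, 3, 0.1
x_t = np.zeros(d)
grads = rng.normal(size=(n, d))          # gradients received in Step S1
grads[0] *= 100.0                        # a faulty agent sends a huge vector
x_next = x_t - eta * cgc_aggregate(grads, f)
```

Because every clipped gradient has norm at most the $(f+1)$-th largest, a faulty agent cannot inflate the update direction arbitrarily; it can at most contribute a vector comparable in magnitude to the largest honest gradient.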
A detailed description of the algorithm and its resilience guarantee can be found in our prior work [13]. The algorithm performs correctly despite using a simple filter on the gradients, one that takes into account only the gradient norms, not the directions of the gradient vectors. This simplification is possible due to the assumptions made on the cost functions [13]; weaker assumptions will often necessitate more complex algorithms.
5 Summary of the Results
We have made the following key contributions in this report.

In the case of redundant cost functions: We proved that redundancy is a necessary and sufficient condition for resilience in collaborative optimization. We presented resilient collaborative optimization algorithms that demonstrate the tradeoff between the complexity of a resilient algorithm and the properties of the agents' cost functions.

In the case of independent cost functions: We introduced the metric of weak resilience to quantify the notion of resilience when the agents' cost functions are independent. We presented an algorithm that obtains weak resilience for all $f < n/2$ when the cost functions are nonnegative.
Acknowledgements
Research reported in this paper was sponsored in part by the Army Research Laboratory under Cooperative Agreement W911NF-17-2-0196, and by National Science Foundation award 1610543. The views and conclusions contained in this document are those of the authors and should not be interpreted as representing the official policies, either expressed or implied, of the Army Research Laboratory, the National Science Foundation, or the U.S. Government.
References

[1] Dan Alistarh, Zeyuan Allen-Zhu, and Jerry Li. Byzantine stochastic gradient descent. In Advances in Neural Information Processing Systems, pages 4618–4628, 2018.
 [2] Jeremy Bernstein, Jiawei Zhao, Kamyar Azizzadenesheli, and Anima Anandkumar. signSGD with majority vote is communication efficient and Byzantine fault tolerant. arXiv preprint arXiv:1810.05291, 2018.
 [3] Kush Bhatia, Prateek Jain, and Purushottam Kar. Robust regression via hard thresholding. In Advances in Neural Information Processing Systems, pages 721–729, 2015.
 [4] Peva Blanchard, Rachid Guerraoui, Julien Stainer, et al. Machine learning with adversaries: Byzantine tolerant gradient descent. In Advances in Neural Information Processing Systems, pages 119–129, 2017.
 [5] Léon Bottou, Frank E Curtis, and Jorge Nocedal. Optimization methods for large-scale machine learning. SIAM Review, 60(2):223–311, 2018.
 [6] Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato, Jonathan Eckstein, et al. Distributed optimization and statistical learning via the alternating direction method of multipliers. Foundations and Trends® in Machine Learning, 3(1):1–122, 2011.
 [7] Stephen Boyd and Lieven Vandenberghe. Convex optimization. Cambridge university press, 2004.

[8] Moses Charikar, Jacob Steinhardt, and Gregory Valiant. Learning from untrusted data. In Proceedings of the 49th Annual ACM SIGACT Symposium on Theory of Computing, pages 47–60, 2017.
 [9] Yuan Chen, Soummya Kar, and Jose MF Moura. Resilient distributed estimation through adversary detection. IEEE Transactions on Signal Processing, 66(9):2455–2469, 2018.
 [10] Yudong Chen, Lili Su, and Jiaming Xu. Distributed statistical machine learning in adversarial settings: Byzantine gradient descent. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 1(2):44, 2017.
 [11] Michelle S Chong, Masashi Wakaiki, and Joao P Hespanha. Observability of linear systems under adversarial attacks. In American Control Conference, pages 2439–2444. IEEE, 2015.
 [12] John C Duchi, Alekh Agarwal, and Martin J Wainwright. Dual averaging for distributed optimization: Convergence analysis and network scaling. IEEE Transactions on Automatic Control, 57(3):592–606, 2011.
 [13] Nirupam Gupta and Nitin H Vaidya. Byzantine fault tolerant distributed linear regression. arXiv preprint arXiv:1903.08752, 2019.
 [14] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Keith Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. arXiv preprint arXiv:1912.04977, 2019.
 [15] Leslie Lamport, Robert Shostak, and Marshall Pease. The Byzantine generals problem. ACM Transactions on Programming Languages and Systems (TOPLAS), 4(3):382–401, 1982.
 [16] Yehuda Lindell. Introduction to coding theory lecture notes. Department of Computer Science, Bar-Ilan University, Israel, January 25, 2010.
 [17] Nancy A Lynch. Distributed algorithms. Elsevier, 1996.
 [18] Angelia Nedic and Asuman Ozdaglar. Distributed subgradient methods for multiagent optimization. IEEE Transactions on Automatic Control, 54(1):48–61, 2009.
 [19] Miroslav Pajic, Insup Lee, and George J Pappas. Attack-resilient state estimation for noisy dynamical systems. IEEE Transactions on Control of Network Systems, 4(1):82–92, 2017.
 [20] Miroslav Pajic, James Weimer, Nicola Bezzo, Paulo Tabuada, Oleg Sokolsky, Insup Lee, and George J Pappas. Robustness of attack-resilient state estimators. In ICCPS'14: ACM/IEEE 5th International Conference on Cyber-Physical Systems (with CPS Week 2014), pages 163–174. IEEE Computer Society, 2014.
 [21] Michael Rabbat and Robert Nowak. Distributed optimization in sensor networks. In Proceedings of the 3rd International Symposium on Information Processing in Sensor Networks, pages 20–27, 2004.
 [22] Robin L Raffard, Claire J Tomlin, and Stephen P Boyd. Distributed optimization for cooperative agents: Application to formation flight. In 43rd IEEE Conference on Decision and Control (CDC), volume 3, pages 2453–2459. IEEE, 2004.
 [23] Yasser Shoukry, Pierluigi Nuzzo, Alberto Puggelli, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, Mani Srivastava, and Paulo Tabuada. Imhotep-SMT: A satisfiability modulo theory solver for secure state estimation. In Proc. Int. Workshop on Satisfiability Modulo Theories, 2015.
 [24] Yasser Shoukry, Pierluigi Nuzzo, Alberto Puggelli, Alberto L Sangiovanni-Vincentelli, Sanjit A Seshia, and Paulo Tabuada. Secure state estimation for cyber-physical systems under sensor attacks: A satisfiability modulo theory approach. IEEE Transactions on Automatic Control, 62(10):4917–4932, 2017.
 [25] Lili Su and Shahin Shahrampour. Finite-time guarantees for Byzantine-resilient distributed state estimation with noisy measurements. arXiv preprint arXiv:1810.10086, 2018.
 [26] Lili Su and Nitin H Vaidya. Fault-tolerant multiagent optimization: optimal iterative distributed algorithms. In Proceedings of the 2016 ACM Symposium on Principles of Distributed Computing, pages 425–434. ACM, 2016.
 [27] Lili Su and Nitin H Vaidya. Robust multiagent optimization: coping with Byzantine agents with input redundancy. In International Symposium on Stabilization, Safety, and Security of Distributed Systems, pages 368–382. Springer, 2016.
 [28] Shreyas Sundaram and Bahman Gharesifard. Distributed optimization under adversarial nodes. IEEE Transactions on Automatic Control, 2018.
 [29] Jacobus Hendricus Van Lint. Coding theory, volume 201. Springer, 1971.
 [30] Cong Xie, Oluwasanmi Koyejo, and Indranil Gupta. Generalized Byzantine-tolerant SGD. arXiv preprint arXiv:1802.10116, 2018.
 [31] Zhixiong Yang and Waheed U. Bajwa. ByRDiE: Byzantine-resilient distributed coordinate descent for decentralized learning, 2017.
Appendix A Proof of Lemma 1
Proof.
Consider any nonempty set $S \subseteq \{1, \ldots, n\}$, and functions $g_i$, $i \in S$, such that $\bigcap_{i \in S} \arg\min_x g_i(x) \neq \emptyset$.
Part I: Consider any $x^* \in \bigcap_{i \in S} \arg\min_x g_i(x)$. Since each cost function $g_i$, $i \in S$, is minimized at $x^*$, it follows that $\sum_{i \in S} g_i$ is also minimized at $x^*$. In other words, it is trivially true that
(37)  $\bigcap_{i \in S} \arg\min_x g_i(x) \subseteq \arg\min_x \sum_{i \in S} g_i(x).$
Part II: Let $\hat{x}$ be a point such that
$\hat{x} \in \arg\min_x \sum_{i \in S} g_i(x).$
Then