# Distributed Learning with Adversarial Agents Under Relaxed Network Condition

This work studies the problem of non-Bayesian learning over a multi-agent network when some of the agents are adversarial (faulty). At each time step, each non-faulty agent collects partial information about an unknown state of the world and tries to estimate the true state by iteratively sharing information with its neighbors. Existing algorithms in this setting require that all non-faulty agents in the network be able to achieve consensus via local information exchange. In this work, we present an analysis of a distributed algorithm which does not require the network to achieve consensus. We show that if every non-faulty agent can receive enough information (via iterative communication with its neighbors) to differentiate the true state of the world from the other possible states, then it can indeed learn the true state.


## I Introduction

Distributed algorithms in multi-agent networks have been studied for a long time in various network settings [13, 3]. In this work, we consider a set of agents connected by directed links, thus forming a directed network. Each agent is attached to a sensor which senses partial information about the state of the world (environment) in which the network operates. There is only one true state of the world, and the aim of each agent is to estimate the true state by iteratively sharing information with its neighbors. Distributed learning has been studied in different settings, both in the presence of a fusion center [16, 12] and without one [2, 1, 5].

Non-Bayesian learning using an iterative distributed consensus algorithm was first proposed by Jadbabaie et al. [4]. The approach proposed in [4] requires the network formed by the agents to achieve consensus in order to learn the true state. Since then, non-Bayesian learning has been applied in various network settings; see [7] for a survey of results in this area.

Our aim is to study a network of agents in which an unknown subset of agents is adversarial. We assume that an adversarial agent suffers Byzantine faults, i.e., it may send arbitrary information to its neighbors and may not follow the specified algorithm. Learning the true state of the world in a network with adversarial agents was first studied in [9, 10]. The algorithm in [9] uses a geometric averaging update similar to that used in other works [6, 8].

The analysis of the algorithm in [9, 10] requires the network topology to be such that the non-faulty agents can achieve consensus by iteratively sharing their information with their neighbors. In this work we circumvent this limitation: we analyze the algorithm proposed in [9, 10] and show that achieving distributed consensus is not required for the non-faulty agents to estimate the true state of the world. Intuitively, we show that if the set of agents that can reach an agent can collectively estimate the true state, then that agent can also estimate the true state almost surely.

### I-A Preview

We introduce the system model in Section II and present the algorithm for estimating the true state in the presence of adversarial agents (first introduced in [9]) in Section III. In Section III-A we state our assumption on the network along with our main contribution (Lemma 1). We use this lemma to analyze Algorithm 1 in Section III-B. We conclude in Section IV.

## II Problem Formulation

We consider a system model similar to that in [15, 14]: a set of $n$ agents connected via directed links, thus forming a directed network $\mathcal{G}(\mathcal{V},\mathcal{E})$ with $|\mathcal{V}| = n$. We consider a synchronous system setting. At most $\phi$ agents can suffer Byzantine faults in each execution of the algorithm. Any agent with a Byzantine fault may send arbitrary, and possibly different, information to different neighbors. Adversarial agents can collaborate with each other and have full knowledge of the system. Let $\mathcal{F}$ be the set of faulty agents and $\mathcal{N}$ be the set of non-faulty (good) agents in an execution. Each good agent knows the upper bound $\phi$ on the number of faulty agents in every execution, but does not know the set $\mathcal{F}$. Let $|\mathcal{F}| = \phi$ and $|\mathcal{N}| = n - \phi$.

Every agent collects some partial information about the world. The aim of every fault-free agent is to estimate the true state of the world by iteratively sharing information with its neighbors. For this we use the same model as presented in [4, 10]. Let there be $m$ possible states of the world, represented by the set $\Theta = \{\theta_1, \ldots, \theta_m\}$. Out of the $m$ possible states, there is one true state $\theta^*$. Initially, at $t = 0$, the true state is unknown to every agent in the network. At every time iteration $t$, every agent $i$ independently observes some information (signal) about the state $\theta^*$. The observed signal space for agent $i$ is represented by $S_i$, and we assume that $|S_i| < \infty$. Let $\ell_i(\cdot \mid \theta)$ be the marginal distribution of the signal observed by agent $i$ when the true state is $\theta$. Each agent knows the structure of its observed signal, which is represented by the set of marginal distributions $\{\ell_i(\cdot \mid \theta_1), \ldots, \ell_i(\cdot \mid \theta_m)\}$. We also assume that $\ell_i(\omega \mid \theta) > 0$ for every $\omega \in S_i$ and every $\theta \in \Theta$; in other words, the support of the distribution is the whole signal space. Let $\{s_i^1, \ldots, s_i^t\}$ be the signal history observed by agent $i$ up to time $t$.
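As a concrete illustration of this observation model, the following sketch (a hypothetical two-agent, two-state instance with binary signals; names such as `marginals` and `observe` are ours, not the paper's) draws each agent's private signal history from agent-specific marginal distributions with full support:

```python
import random

# Hypothetical instance: two states, two agents, binary signals {0, 1}.
STATES = ["theta1", "theta2"]
TRUE_STATE = "theta1"

# marginals[i][theta][s] = probability that agent i observes signal s when
# the state of the world is theta; every entry is positive, so the support
# of each marginal is the whole signal space, as assumed in the model.
marginals = {
    0: {"theta1": [0.7, 0.3], "theta2": [0.7, 0.3]},  # agent 0 cannot tell the states apart
    1: {"theta1": [0.9, 0.1], "theta2": [0.2, 0.8]},  # agent 1 can
}

def observe(agent, state, rng):
    """Draw one private signal for `agent` given the true `state`."""
    p = marginals[agent][state]
    return 0 if rng.random() < p[0] else 1

rng = random.Random(0)
# Each agent's signal history up to time t = 5.
history = {i: [observe(i, TRUE_STATE, rng) for _ in range(5)] for i in marginals}
print(history)
```

Note that agent 0's marginals coincide under both states, so it cannot identify the true state on its own; this is exactly the sense in which each agent only observes partial information about the world.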

Throughout this work, the log of any vector $v = [v_1, \ldots, v_m]$ is defined as the vector $\log v = [\log v_1, \ldots, \log v_m]$, i.e., the log operation on a vector is element-wise.

## III Non-Bayesian learning with faulty agents

In this section we present the algorithm for non-Bayesian learning when some agents in the network are faulty. Note that Algorithm 1 was first presented in [9]; in this work we present an improved analysis which circumvents the need to achieve consensus in order for the non-faulty agents to learn the true state. Algorithm 1 and some related concepts are presented here for the sake of completeness. For more details refer to [9, 11, 15].

For convenience of presentation, we assume that the non-faulty agents are numbered $1, \ldots, n-\phi$ (where $\phi$ is the number of faulty agents at every time iteration). At each time iteration $t$, every non-faulty agent $i$ maintains a belief vector $\mu_i^t$ over the possible states of the world. $\mu_i^t$ is a stochastic vector over all states, with $\mu_i^t(\theta) \geq 0$ for all $\theta \in \Theta$ and $\sum_{\theta \in \Theta} \mu_i^t(\theta) = 1$. We assume that initially, at $t = 0$, $\mu_i^0$ is uniform over $\Theta$.

The Tverberg point is guaranteed to be in the convex hull of the values received from non-faulty agents; see [15] for the definition of a Tverberg point. As shown in [9], the dynamics of the beliefs of a fault-free agent $i$ ($1 \leq i \leq n-\phi$) under Algorithm 1 can be written as:

$$\eta_i^t(\theta) = \log \prod_{j=1}^{n-\phi} \mu_j^{t-1}(\theta)^{A_{ij}[t]}, \quad \forall \theta \in \Theta, \qquad (1)$$

where $A[t]$ is a row stochastic matrix corresponding to the execution of Algorithm 1 at time $t$. As shown in [15], $A[t]$ is affected by the behavior of the faulty agents. For any $\theta_1, \theta_2 \in \Theta$ and for any agent $i$, let $\psi_i^t(\theta_1,\theta_2)$ and $L_i^t(\theta_1,\theta_2)$ be as follows:

$$\psi_i^t(\theta_1,\theta_2) \triangleq \log \frac{\mu_i^t(\theta_1)}{\mu_i^t(\theta_2)}, \qquad L_i^t(\theta_1,\theta_2) \triangleq \log \frac{\ell_i(s_i^t \mid \theta_1)}{\ell_i(s_i^t \mid \theta_2)}. \qquad (2)$$

Following the analysis in [9], the evolution of $\psi_i^t(\theta,\theta^*)$ can be written as:

$$\psi_i^t(\theta,\theta^*) = \sum_{r=1}^{t} \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \sum_{k=1}^{r} L_j^k(\theta,\theta^*), \qquad (3)$$

where $\Phi_{ij}(t, r+1)$ is the $(i,j)$-th element of $\Phi(t, r+1) \triangleq A[t] A[t-1] \cdots A[r+1]$ for $r \leq t-1$. By convention, $\Phi(t, t+1) \triangleq I$, the identity matrix.
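The weighted geometric averaging underlying the update (1) can be sketched as follows. This is a minimal illustration with hypothetical weights and likelihoods of our choosing; the actual row-stochastic weights $A_{ij}[t]$ produced by the Tverberg-point filtering, and the exact form of the Bayesian re-weighting with the new signal used in [9], are not reproduced here.

```python
import math

def geometric_average(beliefs, weights):
    """Weighted geometric mean of belief vectors: exp(sum_j w_j * log mu_j),
    renormalized to a stochastic vector.  `beliefs` is a list of stochastic
    vectors; `weights` is a matching row of a row-stochastic matrix."""
    m = len(beliefs[0])
    log_mix = [sum(w * math.log(b[k]) for w, b in zip(weights, beliefs))
               for k in range(m)]
    mix = [math.exp(v) for v in log_mix]
    z = sum(mix)
    return [v / z for v in mix]

def bayes_step(prior, likelihood_row):
    """Local Bayesian update with the likelihoods of the newly observed signal."""
    post = [p * l for p, l in zip(prior, likelihood_row)]
    z = sum(post)
    return [v / z for v in post]

# Two states: an agent combines two neighbors' beliefs with weights (0.5, 0.5),
# then updates with a signal whose likelihoods are (0.9, 0.2) under the states.
mu = geometric_average([[0.6, 0.4], [0.8, 0.2]], [0.5, 0.5])
mu = bayes_step(mu, [0.9, 0.2])
print(mu)  # the belief concentrates toward the first state
```

The geometric (rather than arithmetic) averaging is what makes the log-beliefs evolve linearly, which is why the evolution in (3) is a weighted sum of the log-likelihood ratios $L_j^k$.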

### III-A Properties of $\Phi(t,r)$

Many concepts in this section were presented in [14, 9]; we repeat them here for the sake of completeness. Recall that $A[t]$ is a row stochastic matrix which defines the run of Algorithm 1 at time $t$. Note that Algorithm 1 uses Tverberg points to generate $A[t]$, which is obtained by rejecting extreme values received from neighbors. It is shown in [15] that this can be seen as removing some incoming links at each round of the algorithm, and the effective network can be characterized by a reduced graph of $\mathcal{G}$.

###### Definition 1.

[15] A reduced graph of network $\mathcal{G}$ is obtained by:

1. removing all faulty agents and all the links incident on these agents, and

2. additionally removing up to $\phi$ incoming links at each non-faulty agent.

Let the set of all such reduced graphs of $\mathcal{G}$ be $\mathcal{R}$. By the definition of a reduced graph and the finiteness of $\mathcal{G}$, the number of possible reduced graphs is finite, i.e., $|\mathcal{R}| = \rho < \infty$. A source component in a reduced graph is a strongly connected set of agents which does not have any incoming links from outside that set. We make the following assumption in our analysis.

###### Assumption 1.

Every reduced graph contains one or more source components and each agent in the reduced graph is either a part of a source component or has a directed path from one or more source components.

Remark: Note that the analysis in [9] assumes that every reduced graph contains only one source component. That assumption is shown in [9, 15] to be sufficient to achieve approximate Byzantine vector consensus. We do not assume that there is a unique source component in each reduced graph. Thus, under our assumption, consensus on arbitrary inputs is not necessarily guaranteed. However, under Assumption 2 stated below regarding the sensor observations, the learning problem is solvable.

Assumption 1 is different from the one made in [9] and thus is not sufficient to achieve consensus among fault-free agents. The key contribution of this work is to show the correctness of Algorithm 1 under Assumption 1.
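Assumption 1 can be checked mechanically for any given reduced graph. The sketch below (a hypothetical five-agent reduced graph; the function names are ours) computes the source components as the strongly connected components that receive no links from outside:

```python
from collections import defaultdict

def sccs(nodes, edges):
    """Kosaraju's algorithm: return the strongly connected components
    (as a list of sets) and a node -> component-label map."""
    adj, radj = defaultdict(list), defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        radj[v].append(u)
    order, seen = [], set()
    def dfs1(u):                      # first pass: record finish order
        seen.add(u)
        for v in adj[u]:
            if v not in seen:
                dfs1(v)
        order.append(u)
    for u in nodes:
        if u not in seen:
            dfs1(u)
    comp = {}
    def dfs2(u, label):               # second pass on the transpose graph
        comp[u] = label
        for v in radj[u]:
            if v not in comp:
                dfs2(v, label)
    for u in reversed(order):
        if u not in comp:
            dfs2(u, u)
    groups = defaultdict(set)
    for u, label in comp.items():
        groups[label].add(u)
    return list(groups.values()), comp

def source_components(nodes, edges):
    """A source component is an SCC with no incoming edge from outside it."""
    comps, comp_of = sccs(nodes, edges)
    has_in = {frozenset(c): False for c in comps}
    for u, v in edges:
        if comp_of[u] != comp_of[v]:
            has_in[frozenset(next(c for c in comps if v in c))] = True
    return [set(c) for c in comps if not has_in[frozenset(c)]]

# Hypothetical reduced graph: {1,2} and {4,5} are strongly connected source
# components; agent 3 only receives links, so it lies on directed paths
# from both source components, satisfying Assumption 1.
nodes = [1, 2, 3, 4, 5]
edges = [(1, 2), (2, 1), (1, 3), (4, 5), (5, 4), (5, 3)]
print(source_components(nodes, edges))
```

In this example there are two source components, so consensus between them on arbitrary inputs is impossible, yet every agent is reached by at least one source component, which is all that Assumption 1 requires.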

It was shown in [14] that for any $t$ there exists a reduced graph of $\mathcal{G}$, say $\mathcal{H}[t]$, whose adjacency matrix $H[t]$ is such that $A[t] \geq \beta H[t]$, where $\beta > 0$ is a constant. For more details on this relationship and the definition of $\beta$, refer to [14]. Now we present a new result which will be used in the analysis.

###### Lemma 1.

For $t, r$ with $t - r \geq \nu$, where $\nu \triangleq \rho(n-\phi)$, there exists a reduced graph $\mathcal{H} \in \mathcal{R}$ such that the following holds: for each row $i$ of $\Phi(t, r+1)$, there exists a source component of $\mathcal{H}$ such that $\Phi_{ij}(t, r+1) \geq \frac{\beta^{\nu}}{n}$ for each agent $j$ in that source component of $\mathcal{H}$.

###### Proof.

We prove the result in two cases. First, for $t - r = \nu$: recall the product matrix $\Phi(t, r+1) = A[t] \cdots A[r+1]$, and that for any $x$, $A[x] \geq \beta H[x]$, where $H[x]$ is the adjacency matrix of the reduced graph corresponding to the $x$-th round of Algorithm 1. Thus,

$$\Phi(t, r+1) \geq \beta^{\nu} \prod_{x=r+1}^{t} H[x].$$

The product $\prod_{x=r+1}^{t} H[x]$ contains $\nu$ reduced graphs of $\mathcal{G}$. As there are $\rho$ distinct reduced graphs, there is one reduced graph, say $\mathcal{H}$, which occurs at least $\nu / \rho = n - \phi$ times in the product. By Assumption 1, every agent $i$ has a directed path from at least one source component in $\mathcal{H}$; let $P_i$ be any one source component which has a directed path to $i$ in $\mathcal{H}$. Since every agent uses its own value in each round, each $H[x]$ has a non-zero diagonal; as the maximum length of any path in $\mathcal{H}$ is $n - \phi$, it follows that $\left( \prod_{x=r+1}^{t} H[x] \right)_{ij} \geq 1$ for each agent $j \in P_i$. Thus for each agent $j \in P_i$, $\Phi_{ij}(t, r+1) \geq \beta^{\nu} \geq \frac{\beta^{\nu}}{n}$. Hence the result is proved when $t - r = \nu$.

Now, for any $t, r$ such that $t - r = \nu + k$, where $k \geq 1$ is an integer, we get

$$\Phi(t, r+1) = A[t] \cdots A[t-k+1] \; A[t-k] \cdots A[r+1] = \Phi(t, t-k+1) \, \Phi(t-k, r+1).$$

Let the $i$-th row of $\Phi(t, r+1)$ be $L_i$ and the $j$-th row of $\Phi(t-k, r+1)$ be $K_j$. Then $L_i$ can be written in terms of the $K_j$ as:

$$L_i = \sum_{j=1}^{n-\phi} \Phi_{ij}(t, t-k+1) \, K_j.$$

Recall that $\Phi(t, t-k+1)$ is a row stochastic matrix; thus for every $i$ there exists some $j^*$ such that $\Phi_{ij^*}(t, t-k+1) \geq \frac{1}{n-\phi}$. Since $(t-k) - r = \nu$, by the first part of the proof there exists a reduced graph $\mathcal{H}$ such that for each row $K_{j^*}$ of $\Phi(t-k, r+1)$ there exists a source component of $\mathcal{H}$ such that $K_{j^* m} \geq \beta^{\nu}$ for every agent $m$ in that source component. Thus, for each row $L_i$ of $\Phi(t, r+1)$, there exists a reduced graph $\mathcal{H}$ and a source component of $\mathcal{H}$ such that $L_{im} \geq \frac{\beta^{\nu}}{n-\phi} \geq \frac{\beta^{\nu}}{n}$ for every agent $m$ in that source component. ∎

### III-B Analysis of Algorithm 1

In this section we present the analysis of Algorithm 1 under Assumption 1, which does not require the network topology to support distributed consensus. We make the following assumption on the agents' capacity to identify the true state of the world, based on the Kullback-Leibler divergence between the true state's marginal $\ell_j(\cdot \mid \theta^*)$ and the marginal of any other state $\theta$. The Kullback-Leibler divergence is defined as:

$$D(\ell_j(\cdot \mid \theta^*) \,\|\, \ell_j(\cdot \mid \theta)) = \sum_{\omega_i \in S_j} \ell_j(\omega_i \mid \theta^*) \log \frac{\ell_j(\omega_i \mid \theta^*)}{\ell_j(\omega_i \mid \theta)}.$$
###### Assumption 2.

Let $\mathcal{P}$ be the set of all source components over all reduced graphs of $\mathcal{G}$. Then, for any $\theta \neq \theta^*$ and for every source component $P \in \mathcal{P}$, the following holds:

$$\sum_{j \in P} D(\ell_j(\cdot \mid \theta^*) \,\|\, \ell_j(\cdot \mid \theta)) \neq 0.$$

Intuitively, Assumption 2 states that in any reduced graph, the agents in any source component can collaboratively detect the true state. Before presenting our main result we define a few notations from [9] which will be used in its proof. For each agent $i$ and each $\theta \neq \theta^*$, define $H_i(\theta, \theta^*)$ as:

$$H_i(\theta,\theta^*) \triangleq \sum_{\omega_i \in S_i} \ell_i(\omega_i \mid \theta^*) \log \frac{\ell_i(\omega_i \mid \theta)}{\ell_i(\omega_i \mid \theta^*)} = -D(\ell_i(\cdot \mid \theta^*) \,\|\, \ell_i(\cdot \mid \theta)) \leq 0. \qquad (4)$$

Recall that $\mathcal{P}$ is the set of all possible source components over all reduced graphs of $\mathcal{G}$. Then we define $C_0$ and $C_1$ as:

$$-C_0 \triangleq \min_{i \in \mathcal{V}} \; \min_{\theta_1,\theta_2 \in \Theta;\, \theta_1 \neq \theta_2} \; \min_{\omega_i \in S_i} \left( \log \frac{\ell_i(\omega_i \mid \theta_1)}{\ell_i(\omega_i \mid \theta_2)} \right), \qquad (5)$$

$$C_1 \triangleq \min_{P \in \mathcal{P}} \; \min_{\theta,\theta^* \in \Theta;\, \theta \neq \theta^*} \; \sum_{i \in P} D(\ell_i(\cdot \mid \theta^*) \,\|\, \ell_i(\cdot \mid \theta)). \qquad (6)$$

Due to the finiteness of $\Theta$ and of $S_i$ for each agent $i$, we know that $C_0 < \infty$ and $C_1 < \infty$. Also, under Assumption 2 we get $C_1 > 0$. Since the support of $\ell_i(\cdot \mid \theta)$ is the whole signal space for each $i$, it is easy to observe that

$$0 \geq H_j(\theta,\theta^*) \geq \min_{\omega_j \in S_j} \left( \log \frac{\ell_j(\omega_j \mid \theta)}{\ell_j(\omega_j \mid \theta^*)} \right) \geq -C_0 > -\infty. \qquad (7)$$
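For a concrete instance, the quantities in (4)–(7) are easy to compute. The sketch below (hypothetical marginals for two agents over a binary signal space; the variable names are ours) evaluates $H_i(\theta,\theta^*)$, $C_0$, and the source-component sum appearing in $C_1$, and checks the sign conditions:

```python
import math

def kl(p, q):
    """Kullback-Leibler divergence D(p || q) for finite distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical marginals over a binary signal space, two states (theta* first).
ell = {
    0: {"theta*": [0.7, 0.3], "theta": [0.7, 0.3]},   # uninformative agent
    1: {"theta*": [0.9, 0.1], "theta": [0.2, 0.8]},   # informative agent
}

# H_i(theta, theta*) = -D(ell_i(.|theta*) || ell_i(.|theta)), as in eq. (4).
H = {i: -kl(d["theta*"], d["theta"]) for i, d in ell.items()}

# C0 from eq. (5): -C0 is the smallest log-likelihood ratio over all
# agents, state pairs, and signals.
C0 = -min(math.log(d[a][k] / d[b][k])
          for d in ell.values() for a in d for b in d if a != b
          for k in range(2))

# Inner sum of eq. (6) for the source component P = {0, 1}.
C1_P = sum(kl(d["theta*"], d["theta"]) for d in ell.values())

assert all(h <= 0 for h in H.values())      # H_i <= 0, as in (4)
assert all(h >= -C0 for h in H.values())    # lower bound in (7)
assert C1_P > 0                             # Assumption 2 holds for this component
print(H, C0, C1_P)
```

Here the uninformative agent contributes $H_0 = 0$, but the component sum is still strictly positive because of the informative agent, which is exactly the collaborative identifiability that Assumption 2 demands.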

The following lemma is used to prove our main result.

###### Lemma 2.

Under Assumption 2, for Algorithm 1 the following statement is true for any $\theta \in \Theta$ and any non-faulty agent $i$:

$$\frac{1}{t^2} \sum_{r=1}^{t} \left( \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \sum_{k=1}^{r} L_j^k(\theta,\theta^*) - r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*) \right) \xrightarrow{a.s.} 0.$$
###### Proof.

The lemma statement is similar (but not identical) to Lemma 3 of [9]. The proof of Lemma 3 in [9] requires each row of $\Phi(t, r+1)$ to converge to an identical stochastic vector. We do not have this requirement; moreover, under Assumption 1 a row of $\Phi(t, r+1)$ may not converge as $t$ goes to infinity. The proof is presented in Appendix A. ∎

Now we present our main result for non-Bayesian learning when some agents in the network are faulty.

###### Theorem 1.

Under Assumption 2, under Algorithm 1 every non-faulty agent $i$ concentrates its belief vector on the true state almost surely, i.e., $\mu_i^t(\theta^*) \xrightarrow{a.s.} 1$ as $t \to \infty$.

###### Proof.

For any $\theta \neq \theta^*$, to show $\mu_i^t(\theta) \xrightarrow{a.s.} 0$ for every non-faulty agent $i$, it is enough to show that $\psi_i^t(\theta,\theta^*) \to -\infty$ almost surely. By (7) we know that $H_j(\theta,\theta^*)$ is finite for each agent $j$. Note that $\Phi(t, r+1)$ is a row stochastic matrix. Due to the finiteness of $H_j(\theta,\theta^*)$, by adding and subtracting $\sum_{r=1}^{t} r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*)$ in (3), we get

$$\psi_i^t(\theta,\theta^*) = \sum_{r=1}^{t} \left( \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \sum_{k=1}^{r} L_j^k(\theta,\theta^*) - r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*) \right) + \sum_{r=1}^{t} r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*). \qquad (8)$$

We first derive a bound for the second term:

$$\sum_{r=1}^{t} r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*) \leq \sum_{r:\, t-r \geq \nu} r \sum_{j \in P_i^r} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*), \qquad (9)$$

where, for agent $i$ and round $r$, $P_i^r$ is a source component for which the lower bound of Lemma 1 holds for $\Phi(t, r+1)$. The above inequality holds because, by (7), $H_j(\theta,\theta^*) \leq 0$ for every $j$. Therefore,

$$\sum_{r=1}^{t} r \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) H_j(\theta,\theta^*) \leq \sum_{r:\, t-r \geq \nu} r \left( \sum_{j \in P_i^r} \frac{\beta^{\nu}}{n} H_j(\theta,\theta^*) \right) \quad \text{by Lemma 1}$$
$$\leq -\sum_{r:\, t-r \geq \nu} r \left( \frac{\beta^{\nu}}{n} C_1 \right) \quad \text{by (4) and (6)}$$
$$\leq -\frac{(t-\nu)^2}{2} \cdot \frac{\beta^{\nu}}{n} C_1. \qquad (10)$$

Therefore, by (8), (10), and Lemma 2, we get

$$\lim_{t \to \infty} \frac{1}{t^2} \psi_i^t(\theta,\theta^*) \leq -\frac{\beta^{\nu}}{2n} C_1.$$

Thus, $\psi_i^t(\theta,\theta^*) \to -\infty$ and $\mu_i^t(\theta) \to 0$ almost surely for all non-faulty agents $i$ and all $\theta \neq \theta^*$; hence $\mu_i^t(\theta^*) \to 1$ almost surely. ∎

## IV Conclusion

In this work, we presented an analysis of a distributed algorithm for non-Bayesian learning over a multi-agent network with adversaries, based on a weaker assumption on the underlying network than the one present in the literature [10, 11]. Our analysis does not need the network to achieve consensus among all the fault-free agents. It shows that if all the agents whose information can reach an agent can collaboratively estimate the true state of the world, then that agent itself can estimate the true state. The analysis presented here proves that a network topological condition together with global identifiability is sufficient for all fault-free agents to correctly estimate the true state. It would be interesting to prove that this condition is also necessary for estimating the true state in a network with adversarial agents.

The analysis also extends to a network with no adversaries, i.e., $\phi = 0$, and leads to a much weaker assumption on the network than the one present in the literature. Previous analyses in [9, 6] for fault-free networks assume that the network is strongly connected and thus capable of achieving distributed consensus. The analysis of Section III extends to fault-free networks that have more than one connected component, where each connected component need not be strongly connected.

In this work we assume a synchronous system, i.e., in each round of the algorithm every agent sends its information at the same time to all its neighbors. In the future, we would like to extend this work to the asynchronous setting. We also assume that the network is static, i.e., the neighborhood of an agent does not change over the course of the execution of the algorithm. We believe that our results can be generalized to dynamic networks where the topology changes with time.

## References

• [1] F. S. Cattivelli and A. H. Sayed. Distributed detection over adaptive networks using diffusion adaptation. IEEE Transactions on Signal Processing, 59(5):1917–1932, May 2011.
• [2] D. Gale and S. Kariv. Bayesian learning in social networks. Games and Economic Behavior, 45:329–346, 2003.
• [3] R. G. Gallager. Finding parity in simple broadcast networks. IEEE Trans. on Info. Theory, 34:176–180, 1988.
• [4] A. Jadbabaie, P. Molavi, A. Sandroni, and A. Tahbaz-Salehi. Non-Bayesian social learning. Games and Economic Behavior, 76:210–225, 2012.
• [5] D. Jakovetić, J. M. Moura, and J. Xavier. Distributed detection over noisy networks: Large deviations analysis. IEEE Transactions on Signal Processing, 60(8):4306–4320, 2012.
• [6] A. Nedić, A. Olshevsky, and C. A. Uribe. Nonasymptotic convergence rates for cooperative learning over time-varying directed graphs. In American Control Conference (ACC), pages 5884–5889, 2015.
• [7] A. Nedić, A. Olshevsky, and C. A. Uribe. A tutorial on distributed (non-bayesian) learning: Problem, algorithms and results. In 55th IEEE Conf. on Decision and Control, pages 6795–6801, 2016.
• [8] S. Shahrampour and A. Jadbabaie. Exponentially fast parameter estimation in networks using distributed dual averaging. In IEEE Conference on Decision and Control (CDC), pages 6196–6201, 2013.
• [9] L. Su and N. H. Vaidya. Defending non-bayesian learning against adversarial attacks. https://arxiv.org/abs/1606.08883, 2016.
• [10] L. Su and N. H. Vaidya. Non-bayesian learning in the presence of byzantine agents. In International Symposium on Distributed Computing, pages 414–427. Springer, 2016.
• [11] L. Su and N. H. Vaidya. Defending non-bayesian learning against adversarial attacks. Distributed Computing https://doi.org/10.1007/s00446-018-0336-4, 2018.
• [12] J. N. Tsitsiklis. Decentralized detection. In Advances in Statistical Signal Processing, pages 297–344. JAI Press, 1993.
• [13] J. N. Tsitsiklis and M. Athans. Convergence and asymptotic agreement in distributed decision problems. IEEE Transactions on Automatic Control, 29(1):42–50, 1984.
• [14] N. H. Vaidya. Matrix representation of iterative approximate byzantine consensus in directed graphs. available at https://arxiv.org/abs/1203.1888, 2012.
• [15] N. H. Vaidya. Iterative byzantine vector consensus in incomplete graphs. Distributed Computing and Networking, pages 14–28, 2014.
• [16] P. K. Varshney. Distributed Detection and Data Fusion. Springer Science & Business Media, 2012.

## Appendix A Proof of Lemma 2

To prove Lemma 2, we will show that, almost surely, for any $\epsilon > 0$ there exists a sufficiently large $t_0$ such that for all $t \geq t_0$,

$$\frac{1}{t^2} \left| \sum_{r=1}^{t} \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \left( \sum_{k=1}^{r} L_j^k(\theta,\theta^*) - r H_j(\theta,\theta^*) \right) \right| \leq \epsilon. \qquad (11)$$

For ease of notation, we will represent the left-hand side of (11) by $\frac{1}{t^2} Q(1,t)$. We prove the claim by splitting the sum over $r$ into the two ranges $1 \leq r \leq \sqrt{t}$ and $\sqrt{t} < r \leq t$. For $1 \leq r \leq \sqrt{t}$ we have

$$\frac{1}{t^2} Q(1,\sqrt{t}) \leq \frac{1}{t^2} \sum_{r=1}^{\sqrt{t}} \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \, (2 r C_0) = \frac{1}{t^2} \, (2 C_0) \sum_{r=1}^{\sqrt{t}} r \leq C_0 \left( \frac{1}{t} + \frac{1}{t^{3/2}} \right).$$

Here the first inequality is due to (7), the finiteness of $C_0$, and the row stochasticity of $\Phi(t, r+1)$. Thus, there exists $t_1$ such that for all $t \geq t_1$, $\frac{1}{t^2} Q(1,\sqrt{t}) \leq \frac{\epsilon}{2}$.

As the $L_j^k(\theta,\theta^*)$'s are i.i.d. across $k$, by the Strong Law of Large Numbers we get $\frac{1}{r} \sum_{k=1}^{r} L_j^k(\theta,\theta^*) \xrightarrow{a.s.} H_j(\theta,\theta^*)$ for every agent $j$. Thus, for each convergent sample path there exists $r_0$ such that, for $t \geq r_0^2$, every $r > \sqrt{t} \geq r_0$ is large enough that

$$\left| \frac{1}{r} \sum_{k=1}^{r} L_j^k(\theta,\theta^*) - H_j(\theta,\theta^*) \right| \leq \frac{\epsilon}{2}.$$

Hence, for all $t \geq r_0^2$,

$$\frac{1}{t^2} Q(\sqrt{t}, t) \leq \frac{1}{t^2} \sum_{r=\sqrt{t}+1}^{t} \sum_{j=1}^{n-\phi} \Phi_{ij}(t, r+1) \, r \, \frac{\epsilon}{2} = \frac{\epsilon}{2} \cdot \frac{1}{t^2} \sum_{r=\sqrt{t}+1}^{t} r = \frac{\epsilon}{4} \cdot \frac{1}{t^2} \left( t^2 - \sqrt{t} \right) \leq \frac{\epsilon}{2}.$$

Therefore, for every convergent sample path and for any $\epsilon > 0$, there exists $t_0 = \max(t_1, r_0^2)$ such that for any $t \geq t_0$, $\frac{1}{t^2} Q(1,t) \leq \epsilon$. Thus (11) holds almost surely and Lemma 2 is proved. ∎
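The Strong Law of Large Numbers step above can also be observed numerically: under i.i.d. signals drawn from $\ell_j(\cdot \mid \theta^*)$, the running average of the log-likelihood ratios $L_j^k(\theta,\theta^*)$ settles near $H_j(\theta,\theta^*) < 0$. A small sketch (hypothetical binary marginals of our choosing, not code from the paper):

```python
import math
import random

# Hypothetical binary-signal marginals for one agent under theta* and theta.
p_star = [0.9, 0.1]   # ell_j(. | theta*): signals are drawn from this
p_alt  = [0.2, 0.8]   # ell_j(. | theta)

# H_j(theta, theta*) = E[log(ell_j(s|theta)/ell_j(s|theta*))] under theta*.
H = sum(ps * math.log(pa / ps) for ps, pa in zip(p_star, p_alt))

rng = random.Random(1)
total, n = 0.0, 20000
for _ in range(n):
    s = 0 if rng.random() < p_star[0] else 1      # signal under the true state
    total += math.log(p_alt[s] / p_star[s])       # L_j^k(theta, theta*)
avg = total / n

print(avg, H)  # the running average settles near H < 0
```

Because $H_j(\theta,\theta^*) = -D(\ell_j(\cdot \mid \theta^*) \,\|\, \ell_j(\cdot \mid \theta))$ is strictly negative for an informative agent, the accumulated log-likelihood ratio drifts to $-\infty$, which is what drives $\mu_i^t(\theta) \to 0$ in Theorem 1.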