1 Introduction
We consider a scenario involving a network of agents, where each agent receives a stream of private signals sequentially over time. The observations of every agent are generated by a common underlying distribution, parameterized by an unknown static quantity which we call the true state of the world. The task of the agents is to collectively identify this unknown quantity from a finite family of hypotheses, while relying solely on local interactions. The problem described above arises in a variety of contexts, ranging from detection and object recognition using autonomous robots, to statistical inference and learning over multiple processors, to sequential decision-making in social networks. As such, the distributed inference/hypothesis testing problem enjoys a rich history [5, 6, 14, 12, 7, 16, 10, 9], where a variety of techniques have been proposed over the years, with more recent efforts directed towards improving the convergence rate. These techniques can be broadly classified in terms of the mechanism used to aggregate data: while consensus-based linear [5, 6] and log-linear [14, 12, 7, 16] rules have been extensively studied, [10] and [9] propose a min-protocol that leads to the best known (asymptotic) learning rate for this problem.

A much less explored aspect of distributed inference is that of communication-efficiency, a theme that is becoming increasingly important as we envision distributed autonomy with low-power sensor devices and limited-bandwidth wireless communication channels. Motivated by this gap in the literature, we seek to answer the following questions in this paper. (i) When should an agent exchange information with a neighbor? (ii) What piece of information should the agent exchange? To address these questions, we draw on ideas from the theory of event-triggered control. The initial results [15, 4] on this topic were centered around stabilizing dynamical systems by injecting control inputs only when needed, as opposed to the traditional approach of periodic control inputs. Since then, the ideas emanating from this line of work have found their way into the design of event-driven control and communication techniques for multi-agent systems; the recent survey [13] provides an excellent overview of such techniques, focusing primarily on variations of the basic consensus problem. Notably, the common recipe for designing such techniques centers around a Lyapunov argument for deterministic systems. However, it is not at all apparent how such design ideas can be exploited for the stochastic inference problem we consider in this paper.^{1}

^{1} The stochastic nature of our problem arises from the fact that the signals seen by each agent are random variables.
In this context, our main contributions are as follows.

Contributions: The main contribution of this paper is the development of a novel event-triggered distributed learning rule, along with a detailed theoretical characterization of its performance. Our approach to learning is based on the principle of diffusing low beliefs on each false hypothesis across the network. Building on this principle, we design a trigger condition that carefully takes into account the specific structure of the problem. It enables an agent to decide, using purely local information, whether or not to broadcast its belief^{2} on a given hypothesis to a given neighbor. Specifically, under our event-triggered strategy, an agent broadcasts only those components of its belief vector that have adequate “innovation”, and only to those neighbors that need the corresponding pieces of information. In this way, our approach reduces not only the number of communication rounds, but also the amount of information transmitted in each round.

^{2} By an agent’s belief vector, we mean a distribution over the set of hypotheses; this vector gets recursively updated over time as the agent acquires more information.
We establish that our proposed event-triggered learning rule enables each agent to learn the true state exponentially fast under standard assumptions on the observation model and the network structure. We characterize the learning rate of our algorithm. We also identify conditions under which one can achieve the best known learning rate of [9], even when the inter-communication intervals between the agents grow unbounded over time. In other words, we identify sparse communication regimes where communication-efficiency comes essentially for “free”. We further demonstrate, both in theory and in simulations, that our event-triggered scheme has the potential to reduce information flow from uninformative agents to informative agents. Finally, we argue that if asymptotic learning of the true state is the only consideration, then one can allow for communication schemes with arbitrarily long intervals between successive communications.
2 Model and Problem Formulation
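Network Model: We consider a group of agents $\mathcal{V} = \{1, \dots, n\}$, and model interactions among them via an undirected graph $\mathcal{G} = (\mathcal{V}, \mathcal{E})$.^{3} An edge $(i, j) \in \mathcal{E}$ indicates that agent $i$ can directly transmit information to agent $j$, and vice versa. The set of all neighbors of agent $i$ is defined as $\mathcal{N}_i = \{j \in \mathcal{V} : (j, i) \in \mathcal{E}\}$. We say that $\mathcal{G}$ is rooted at $\mathcal{C} \subseteq \mathcal{V}$ if, for each agent $j \in \mathcal{V} \setminus \mathcal{C}$, there exists a path to it from some agent $i \in \mathcal{C}$. For a connected graph $\mathcal{G}$, we will use $d(i, j)$ to denote the length of the shortest path between $i$ and $j$.

^{3} The results in this paper can be easily extended to directed graphs.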
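The hop distance $d(i, j)$ defined above is a plain breadth-first-search quantity. The following is a minimal sketch, assuming agents are labeled 0, ..., n-1 rather than 1, ..., n, and that the undirected graph is given as a hypothetical adjacency dictionary `adj` mapping each agent to its neighbors.

```python
from collections import deque
from typing import Dict, List


def hop_distances(adj: Dict[int, List[int]], source: int) -> Dict[int, int]:
    """Return d(source, j) for every agent j reachable from `source`.

    `adj` maps each agent to the list of its neighbors (undirected graph),
    so breadth-first search visits agents in order of hop count.
    """
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # the first visit is along a shortest path
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


# Path graph 0 - 1 - 2: agent 2 is two hops away from agent 0.
line = {0: [1], 1: [0, 2], 2: [1]}
print(hop_distances(line, 0))  # {0: 0, 1: 1, 2: 2}
```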
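Observation Model: Let $\Theta = \{\theta_1, \dots, \theta_m\}$ denote $m$ possible states of the world, with each state representing a hypothesis. A specific state $\theta^\star \in \Theta$, referred to as the true state of the world, gets realized. Conditional on its realization, at each timestep $t \in \mathbb{N}_+$, every agent $i \in \mathcal{V}$ privately observes a signal $s_{i,t} \in \mathcal{S}_i$, where $\mathcal{S}_i$ denotes the signal space of agent $i$.^{4} The joint observation profile so generated across the network is denoted $s_t = (s_{1,t}, \dots, s_{n,t})$, where $s_t \in \mathcal{S}$ and $\mathcal{S} = \mathcal{S}_1 \times \cdots \times \mathcal{S}_n$. Specifically, the signal $s_t$ is generated based on a conditional likelihood function $l(\cdot \mid \theta^\star)$. The $i$-th marginal of this function is denoted $l_i(\cdot \mid \theta^\star)$ and is available to agent $i$. The signal structure of each agent $i \in \mathcal{V}$ is thus characterized by a family of parameterized marginals $\{l_i(\cdot \mid \theta) : \theta \in \Theta\}$. We make certain standard assumptions [5, 6, 14, 12, 7]:

(i) The signal space of each agent $i$, namely $\mathcal{S}_i$, is finite.
(ii) Each agent $i$ has knowledge of its local likelihood functions $\{l_i(\cdot \mid \theta_p)\}_{p=1}^{m}$, and it holds that $l_i(s_i \mid \theta) > 0$ for all $s_i \in \mathcal{S}_i$ and all $\theta \in \Theta$.
(iii) The observation sequence of each agent is described by an i.i.d. random process over time; however, at any given timestep, the observations of different agents may be correlated.
(iv) There exists a fixed true state of the world $\theta^\star \in \Theta$ (unknown to the agents) that generates the observations of all the agents.

The probability space for our model is denoted $(\Omega, \mathcal{F}, \mathbb{P}^{\theta^\star})$, where $\Omega = \{\omega : \omega = (s_1, s_2, \dots),\ s_t \in \mathcal{S}\ \forall t \in \mathbb{N}_+\}$. Here, $\mathcal{F}$ is the $\sigma$-algebra generated by the observation profiles, and $\mathbb{P}^{\theta^\star}$ is the probability measure induced by sample paths in $\Omega$. Specifically, $\mathbb{P}^{\theta^\star} = \prod_{t=1}^{\infty} l(\cdot \mid \theta^\star)$. We will use the abbreviation a.s. to indicate almost sure occurrence of an event w.r.t. $\mathbb{P}^{\theta^\star}$.

^{4} We use $\mathbb{N}$ and $\mathbb{N}_+$ to represent the sets of nonnegative integers and positive integers, respectively.

The goal of each agent in the network is to eventually learn the true state $\theta^\star$. However, the key challenge in achieving this objective arises from an identifiability problem that each agent might face. To make this precise, define $\Theta_i^{\theta^\star} = \{\theta \in \Theta : l_i(s_i \mid \theta) = l_i(s_i \mid \theta^\star)\ \forall s_i \in \mathcal{S}_i\}$. In words, $\Theta_i^{\theta^\star}$ represents the set of hypotheses that are observationally equivalent to $\theta^\star$ from the perspective of agent $i$. Thus, if $|\Theta_i^{\theta^\star}| > 1$, it is impossible for agent $i$ to uniquely learn the true state without interacting with its neighbors.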
In the next section, we will develop a distributed learning algorithm that not only resolves the identifiability problem described above, but does so in a communication-efficient manner. Before describing this algorithm, we first recall the following definition from [10], which will appear in our subsequent developments.
Definition 1.
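(Source agents) An agent $i$ is said to be a source agent for a pair of distinct hypotheses $\theta_p, \theta_q \in \Theta$ if it can distinguish between them, i.e., if $K_i(\theta_p, \theta_q) > 0$. Here, $K_i(\theta_p, \theta_q) = D(l_i(\cdot \mid \theta_p) \,\|\, l_i(\cdot \mid \theta_q))$ represents the KL-divergence [1] between the distributions $l_i(\cdot \mid \theta_p)$ and $l_i(\cdot \mid \theta_q)$. The set of source agents for the pair $(\theta_p, \theta_q)$ is denoted $\mathcal{S}(\theta_p, \theta_q)$.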
Throughout the rest of the paper, we will use as a shorthand for .
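Both the source set of Definition 1 and the observationally equivalent set of the observation model can be computed directly from the local likelihood tables. The following is a minimal sketch, assuming each marginal is stored as a probability vector over a finite signal space in a hypothetical nested list `lik[i][p][s]`, and that agents and hypotheses are 0-indexed. The example mirrors the single-informative-agent situation discussed later in Section 3.

```python
import math
from typing import List, Set

# lik[i][p][s] = l_i(s | theta_p): agent i's marginal under hypothesis theta_p,
# stored as a probability vector over agent i's finite signal space.
Likelihoods = List[List[List[float]]]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """D(p || q) for pmfs on the same finite set (q > 0 everywhere)."""
    return sum(pk * math.log(pk / qk) for pk, qk in zip(p, q) if pk > 0)


def source_agents(lik: Likelihoods, p: int, q: int) -> Set[int]:
    """Agents i with K_i(theta_p, theta_q) > 0, i.e., the source set."""
    return {i for i, marginals in enumerate(lik)
            if kl_divergence(marginals[p], marginals[q]) > 0}


def observationally_equivalent(lik: Likelihoods, i: int, true_idx: int) -> Set[int]:
    """Hypotheses whose marginal for agent i coincides with that of theta*."""
    target = lik[i][true_idx]
    return {p for p, pmf in enumerate(lik[i])
            if all(math.isclose(a, b) for a, b in zip(pmf, target))}


lik = [
    [[0.7, 0.3], [0.4, 0.6]],  # agent 0 separates theta_0 from theta_1
    [[0.5, 0.5], [0.5, 0.5]],  # agents 1 and 2 are uninformative
    [[0.5, 0.5], [0.5, 0.5]],
]
print(source_agents(lik, 0, 1))               # {0}
print(observationally_equivalent(lik, 1, 0))  # {0, 1}
```

The positivity assumption (ii) on the likelihoods is what keeps `kl_divergence` finite here.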
3 An Event-Triggered Distributed Learning Rule
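Belief-Update Strategy: In this section, we develop an event-triggered distributed learning rule that enables each agent to eventually learn the truth, despite infrequent information exchanges with its neighbors. Our approach requires each agent $i$ to maintain a local belief vector $\pi_{i,t}$ and an actual belief vector $\mu_{i,t}$, each of which is a probability distribution over the hypothesis set $\Theta$. Agent $i$ updates $\pi_{i,t}$ in a Bayesian manner using only its private signals (see eq. (2)). To formally describe how it updates $\mu_{i,t}$, we first need to introduce some notation. Accordingly, let $\mathbb{1}_{ij,t}(\theta)$ be an indicator variable which takes on a value of 1 if and only if agent $i$ broadcasts $\mu_{i,t}(\theta)$ to agent $j$ at time $t$. Next, we define $\mathcal{N}_i^t(\theta) = \{j \in \mathcal{N}_i : \mathbb{1}_{ji,t}(\theta) = 1\}$ as the subset of agent $i$’s neighbors who broadcast their belief on $\theta$ to $i$ at time $t$. As part of our learning algorithm, each agent $i$ keeps track of the lowest belief on each hypothesis $\theta$ that it has heard up to any given instant $t$, denoted by $\bar{\mu}_{i,t}(\theta)$. More precisely, $\bar{\mu}_{i,0}(\theta) = \mu_{i,0}(\theta)$, and for all $t \in \mathbb{N}_+$,

$$\bar{\mu}_{i,t}(\theta) = \min\Big\{\bar{\mu}_{i,t-1}(\theta),\ \mu_{i,t}(\theta),\ \min_{j \in \mathcal{N}_i^t(\theta)} \mu_{j,t}(\theta)\Big\}, \tag{1}$$

where the innermost minimum is omitted when $\mathcal{N}_i^t(\theta) = \emptyset$.

We are now in a position to describe the belief-update rule at each agent. The vectors $\pi_{i,0}$ and $\mu_{i,0}$ are initialized with $\pi_{i,0}(\theta) > 0$ and $\mu_{i,0}(\theta) > 0$ for all $\theta \in \Theta$ (but otherwise arbitrarily), and are subsequently updated as follows: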
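$$\pi_{i,t+1}(\theta) = \frac{l_i(s_{i,t+1} \mid \theta)\, \pi_{i,t}(\theta)}{\sum_{p=1}^{m} l_i(s_{i,t+1} \mid \theta_p)\, \pi_{i,t}(\theta_p)}, \tag{2}$$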
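$$\mu_{i,t+1}(\theta) = \frac{\min\{\bar{\mu}_{i,t}(\theta),\ \pi_{i,t+1}(\theta)\}}{\sum_{p=1}^{m} \min\{\bar{\mu}_{i,t}(\theta_p),\ \pi_{i,t+1}(\theta_p)\}}. \tag{3}$$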
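The three update rules translate almost line by line into code. The following is a minimal sketch of one agent’s computations, assuming beliefs are stored as plain Python lists indexed by hypothesis. The function names are hypothetical. `lowest_heard_update` handles a single hypothesis, and passing an empty `received` list corresponds to hearing from no neighbor at that step.

```python
from typing import Iterable, List


def local_bayes_update(pi: List[float], lik_of_signal: List[float]) -> List[float]:
    """Eq. (2): Bayesian update of the local belief.

    lik_of_signal[p] is l_i(s_{i,t+1} | theta_p) for the signal just observed.
    """
    unnorm = [l * b for l, b in zip(lik_of_signal, pi)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def actual_belief_update(mu_bar: List[float], pi_next: List[float]) -> List[float]:
    """Eq. (3): normalize the component-wise minimum of the lowest heard
    beliefs (mu_bar at time t) and the fresh local beliefs (pi at t+1)."""
    mins = [min(a, b) for a, b in zip(mu_bar, pi_next)]
    z = sum(mins)
    return [x / z for x in mins]


def lowest_heard_update(mu_bar_prev: float, own: float, received: Iterable[float]) -> float:
    """Eq. (1) for one hypothesis: lowest value among the previous record,
    the agent's own current belief, and any beliefs received at this step."""
    return min([mu_bar_prev, own, *received])


pi = local_bayes_update([0.5, 0.5], [0.7, 0.4])  # approximately [0.636, 0.364]
mu = actual_belief_update([0.5, 0.2], pi)        # minima [0.5, 0.2], renormalized
print(mu)  # approximately [0.714, 0.286]
```

Because the denominator in the min-rule is at most 1, the normalized actual belief on a hypothesis is never smaller than the corresponding minimum.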
Communication Strategy: We now specify when an agent broadcasts its belief on a given hypothesis to a neighbor. To this end, we first define a sequence $\mathbb{I} = \{\tau_1, \tau_2, \dots\}$ of event-monitoring timesteps, where $\tau_1 = 1$ and $\tau_{k+1} = \tau_k + g(k)$ for all $k \in \mathbb{N}_+$. Here, $g(\cdot)$ is a continuous, nondecreasing function that takes on integer values at integers. We will henceforth refer to $g(\cdot)$ as the event-interval function. At any given time $t$, let $\hat{\mu}_{ij,t}(\theta)$ represent agent $i$’s belief on $\theta$ the last time (excluding time $t$) it transmitted its belief on $\theta$ to agent $j$. Our communication strategy can now be described as follows. At $t = 0$, each agent broadcasts its entire belief vector to every neighbor. Subsequently, at each $\tau_k \in \mathbb{I}$, agent $i$ transmits $\mu_{i,\tau_k}(\theta)$ to $j \in \mathcal{N}_i$ if and only if the following event occurs:

$$\mu_{i,\tau_k}(\theta) < h(\tau_k)\, \min\{\hat{\mu}_{ij,\tau_k}(\theta),\ \hat{\mu}_{ji,\tau_k}(\theta)\}, \tag{4}$$

where $h(\cdot)$ is a nonincreasing function, which we will henceforth call the threshold function. If $t \notin \mathbb{I}$, then an agent does not communicate with its neighbors at time $t$, i.e., all inter-agent interactions are restricted to timesteps in $\mathbb{I}$, subject to the trigger condition (4). Notice that we have not yet specified the functional forms of $g(\cdot)$ and $h(\cdot)$; we will comment on this topic in Section 4.
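The two ingredients of the communication strategy, namely the schedule of event-monitoring timesteps and the per-neighbor trigger, can be sketched as follows. The schedule assumes $\tau_1 = 1$ and $\tau_{k+1} = \tau_k + g(k)$, which is our reading of how the event-interval function generates the sequence. The trigger transcribes condition (4): the current belief must fall below $h(\tau_k)$ times both agent $i$’s last broadcast to $j$ and $j$’s last broadcast to $i$. Function names are hypothetical.

```python
from typing import Callable, List


def event_monitoring_times(g: Callable[[int], int], horizon: int) -> List[int]:
    """Event-monitoring timesteps up to `horizon`, assuming tau_1 = 1 and
    tau_{k+1} = tau_k + g(k), with g returning a positive integer."""
    taus: List[int] = []
    tau, k = 1, 1
    while tau <= horizon:
        taus.append(tau)
        tau += g(k)
        k += 1
    return taus


def should_broadcast(mu_now: float, mu_hat_ij: float, mu_hat_ji: float,
                     h_tau: float) -> bool:
    """Trigger (4): agent i sends its belief on theta to neighbor j only if
    the current belief is below h(tau_k) times BOTH i's last broadcast to j
    and j's last broadcast to i."""
    return mu_now < h_tau * min(mu_hat_ij, mu_hat_ji)


print(event_monitoring_times(lambda k: 2 ** k, 20))  # [1, 3, 7, 15]
print(should_broadcast(0.01, 0.1, 0.05, 0.5))        # True: 0.01 < 0.025
print(should_broadcast(0.03, 0.1, 0.05, 0.5))        # False: 0.03 >= 0.025
```

Taking the minimum of the two reference values is what makes the trigger neighbor-specific. A neighbor that has itself reported a lower belief will not be sent a higher one.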
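Summary: At each timestep $t \in \mathbb{N}_+$, and for each hypothesis $\theta \in \Theta$, an agent $i$ executes the following sequence of operations:

(i) Agent $i$ updates its local and actual beliefs on $\theta$ via (2) and (3), respectively.
(ii) For each neighbor $j \in \mathcal{N}_i$, it decides whether or not to transmit $\mu_{i,t}(\theta)$ to $j$, and collects $\{\mu_{j,t}(\theta)\}_{j \in \mathcal{N}_i^t(\theta)}$.^{5}
(iii) It updates $\bar{\mu}_{i,t}(\theta)$ via (1), using the (potentially) new information it acquires from its neighbors at time $t$.

^{5} If $t \notin \mathbb{I}$, this step is bypassed, and $\mathcal{N}_i^t(\theta) = \emptyset$.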
Intuition: Our belief-update strategy is based on diffusing low beliefs on each false hypothesis. For a given false hypothesis $\theta$, the local Bayesian update (2) will generate a decaying sequence $\pi_{i,t}(\theta)$ for each $i \in \mathcal{S}(\theta^\star, \theta)$. Update rules (1) and (3) then help propagate agent $i$’s low belief on $\theta$ to the rest of the network. We point out a contrast with our earlier work [10, 9]. There, agent $i$ updated $\mu_{i,t+1}(\theta)$ using the lowest neighboring belief on $\theta$ at the previous timestep $t$. Our approach here instead requires an agent to use the lowest belief on $\theta$ that it has heard up to time $t$, namely $\bar{\mu}_{i,t}(\theta)$. This modification will be crucial in our convergence analysis.
To build intuition regarding our communication strategy, consider the network in Fig. 1. Suppose $\Theta = \{\theta_1, \theta_2\}$, $\theta^\star = \theta_1$, and $\mathcal{S}(\theta_1, \theta_2) = \{1\}$, i.e., agent 1 is the only informative agent. Since our principle of learning is based on eliminating each false hypothesis, it makes sense to broadcast beliefs only if they are low enough. Based on this observation, one naive approach to enforce sparse communication would be to set a fixed low threshold, say $\epsilon$, and wait until beliefs fall below this threshold before broadcasting. This might lead to sparse communication initially. However, in order to learn the truth, there must come a time beyond which the beliefs of all agents on the false hypothesis always stay below $\epsilon$, which eventually leads to dense communication. The obvious fix is to introduce an event condition that is state-dependent. Consider the following candidate strategy: an agent broadcasts its belief on a state $\theta$ only if this belief is sufficiently lower than it was when the agent last broadcast about $\theta$. While an improvement over the “fixed-threshold” strategy, this new scheme has the following demerit: broadcasts are not agent-specific. In other words, going back to our example, agent 2 (resp., agent 3) might transmit unsolicited information to agent 1 (resp., agent 2), information that agent 1 (resp., agent 2) can do without. To remedy this, one can consider a request/poll-based scheme as in [2], where an agent receives information from a neighbor only by polling that neighbor. However, each time agent 2 then needs information from agent 1, it must place a request, and the request itself incurs extra communication.
Given the above issues, we ask: Is it possible to devise an event-triggered scheme that eventually stops unnecessary broadcasts from agent 3 to 2, and from 2 to 1, while preserving essential information flow from agent 1 to 2, and from 2 to 3? More generally, we seek a triggering rule that can reduce transmissions from uninformative agents to informative agents. This leads us to the event condition in Eq. (4). For each $\theta \in \Theta$, an agent $i$ broadcasts $\mu_{i,\tau_k}(\theta)$ to a neighbor $j$ only if $\mu_{i,\tau_k}(\theta)$ has adequate “innovation” w.r.t. both $i$’s last broadcast about $\theta$ to $j$ and $j$’s last broadcast about $\theta$ to $i$. A decreasing threshold function $h(\cdot)$ makes it progressively harder to satisfy the event condition in Eq. (4), demanding more innovation to merit a broadcast as time progresses.^{6} The rationale behind checking the event condition only at timesteps in $\mathbb{I}$ is twofold.^{7} First, it saves computation, since the event condition need not be checked at every timestep. Second, and more importantly, it provides an additional instrument to control communication sparsity on top of event-triggering. Indeed, a monotonically increasing event-interval function $g(\cdot)$ implies fewer agent interactions over time, since all potential broadcasts are restricted to $\mathbb{I}$. In particular, without the event condition in Eq. (4), our communication strategy would boil down to a simple time-triggered rule, akin to the one studied in our recent work [8].

^{6} We will see later on (Prop. 2) that, for the network in Fig. 1, this scheme provably stops communications from agent 3 to 2, and from 2 to 1, eventually.

^{7} While this might appear similar to the Periodic Event-Triggering (PETM) framework [3], where events are checked periodically, the sequence $\mathbb{I}$ can be significantly more general than a simple periodic sequence.
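The Fig. 1 discussion can be reproduced in a small simulation. The following is a minimal sketch under several assumptions not taken from the text. The network is read as a path 1 - 2 - 3, written 0 - 1 - 2 with 0-indexed agents. Signals are binary and drawn independently across agents, although the model allows correlation. The event-interval function is $g \equiv 1$, the threshold is a constant $h = 0.9$, and the likelihood values are hypothetical. The run applies rules (1) to (4) and counts broadcasts per (sender, receiver, hypothesis).

```python
import random
from typing import Dict, List, Tuple

# Illustrative setup (assumed, 0-indexed): path graph 0 - 1 - 2, two hypotheses,
# theta* = theta_0, and only agent 0 can tell theta_0 and theta_1 apart.
NEIGHBORS = {0: [1], 1: [0, 2], 2: [1]}
LIK = [  # LIK[i][p][s] = l_i(s | theta_p), binary signals
    [[0.7, 0.3], [0.4, 0.6]],
    [[0.5, 0.5], [0.5, 0.5]],
    [[0.5, 0.5], [0.5, 0.5]],
]


def normalize(v: List[float]) -> List[float]:
    z = sum(v)
    return [x / z for x in v]


def simulate(T: int, h: float = 0.9, seed: int = 0):
    """Run rules (1)-(4) with g(k) = 1 and a constant threshold h.

    Returns the final actual beliefs and a count of broadcasts keyed by
    (sender, receiver, hypothesis index).
    """
    rng = random.Random(seed)
    n, m = len(LIK), 2
    pi = [[1.0 / m] * m for _ in range(n)]  # local beliefs, eq. (2)
    mu = [[1.0 / m] * m for _ in range(n)]  # actual beliefs, eq. (3)
    mu_bar = [row[:] for row in mu]         # lowest beliefs heard, eq. (1)
    # At t = 0 every agent sends its whole belief vector to every neighbor.
    last_sent = {(i, j): mu[i][:] for i in NEIGHBORS for j in NEIGHBORS[i]}
    count: Dict[Tuple[int, int, int], int] = {}
    for t in range(1, T + 1):
        for i in range(n):
            # Signals drawn from l_i(. | theta_0); independent across agents here.
            s = 0 if rng.random() < LIK[i][0][0] else 1
            pi[i] = normalize([LIK[i][p][s] * pi[i][p] for p in range(m)])
            mu[i] = normalize([min(mu_bar[i][p], pi[i][p]) for p in range(m)])
        # Every trigger check at time t uses broadcasts made before time t.
        sends = []
        for i in range(n):
            for j in NEIGHBORS[i]:
                for p in range(m):
                    ref = min(last_sent[(i, j)][p], last_sent[(j, i)][p])
                    if mu[i][p] < h * ref:
                        sends.append((i, j, p, mu[i][p]))
        for i, j, p, val in sends:
            last_sent[(i, j)][p] = val
            count[(i, j, p)] = count.get((i, j, p), 0) + 1
        for i in range(n):
            for p in range(m):
                heard = [v for (_, r, q, v) in sends if r == i and q == p]
                mu_bar[i][p] = min([mu_bar[i][p], mu[i][p], *heard])
    return mu, count


mu, count = simulate(600)
print([b[1] for b in mu])                                  # tiny beliefs on theta_1
print(count.get((1, 0, 1), 0), count.get((2, 1, 1), 0))    # upstream sends: 0 0
```

In this setup, an uninformative agent’s actual belief on the false hypothesis never drops below the last value it received from upstream. As a result, the trigger never fires toward the informative side, in line with the behavior the text attributes to Prop. 2.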
We close this section by highlighting that our event condition (i) is state-specific, since an agent may not be equally informative about all states; (ii) is neighbor-specific, since not all neighbors might require information; (iii) can be checked using local information only; and (iv) leverages the structure of the specific problem under consideration.
4 Main Results
In this section, we state the main results of this paper and discuss their implications. Proofs of all results are deferred to Section 5. To state the first result, which concerns the convergence of our learning rule, let $G(t)$ denote the integral of $g(t)$, and let $G^{-1}(t)$ denote the inverse of $G(t)$. Since $g(t)$ is strictly positive by definition, $G(t)$ is strictly increasing, and hence, $G^{-1}(t)$ is well-defined.
Theorem 1.
Suppose the functions $g(\cdot)$ and $h(\cdot)$ satisfy:
(5) 
Furthermore, suppose the following conditions hold. (i) For every pair of hypotheses $\theta_p \neq \theta_q \in \Theta$, the source set $\mathcal{S}(\theta_p, \theta_q)$ is nonempty. (ii) The communication graph $\mathcal{G}$ is connected. Then, the event-triggered distributed learning rule governed by (1), (2), (3), and (4) guarantees the following.

(Consistency): For each agent $i \in \mathcal{V}$, $\mu_{i,t}(\theta^\star) \to 1$ a.s.

(Exponentially Fast Rejection of False Hypotheses): For each agent $i \in \mathcal{V}$, and for each false hypothesis $\theta \in \Theta \setminus \{\theta^\star\}$, the following holds:
(6)
At this point, it is natural to ask: For what classes of functions does the result of Theorem 1 hold? The following result provides an answer.
Corollary 1.
Suppose the conditions in Theorem 1 hold.

Suppose , where is any positive integer. Then, for each , and :
(7) 
Suppose , where is any positive integer. Then, for each , and :
(8)
Proof.
The proof follows by directly computing the limit in Eq. (5). For case (i), , and for case (ii), . ∎
Clearly, the communication pattern between the agents is at least as sparse as the sequence $\mathbb{I}$. The event-triggering strategy that we employ introduces further sparsity, as we establish in the next result.
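To get a feel for how sparse $\mathbb{I}$ alone can be, one can count the event-monitoring timesteps up to a horizon for a few event-interval functions. The following is a minimal sketch, assuming again $\tau_1 = 1$ and $\tau_{k+1} = \tau_k + g(k)$. The choices $g(k) = k^2$ (polynomial) and $g(k) = 2^k$ (exponential) are illustrative only.

```python
from typing import Callable


def count_monitoring_steps(g: Callable[[int], int], horizon: int) -> int:
    """Number of event-monitoring timesteps tau_k <= horizon, assuming
    tau_1 = 1 and tau_{k+1} = tau_k + g(k)."""
    count, tau, k = 0, 1, 1
    while tau <= horizon:
        count += 1
        tau += g(k)
        k += 1
    return count


T = 10_000
for name, g in [("g(k) = 1", lambda k: 1),
                ("g(k) = k**2", lambda k: k ** 2),
                ("g(k) = 2**k", lambda k: 2 ** k)]:
    print(f"{name:12s} -> {count_monitoring_steps(g, T)} monitoring steps up to t = {T}")
# prints 10000, 31 and 13 monitoring steps, respectively
```

These counts are only upper bounds on the number of communication rounds. The trigger (4) can suppress further broadcasts at each monitoring step.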
Proposition 1.
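Suppose the conditions in Theorem 1 are met. Then, there exists $\bar{\Omega} \subseteq \Omega$ such that $\mathbb{P}^{\theta^\star}(\bar{\Omega}) = 1$, and for each $\omega \in \bar{\Omega}$, there exists $\bar{t}(\omega) < \infty$ such that the following hold.

At each $\tau_k \in \mathbb{I}$ such that $\tau_k \geq \bar{t}(\omega)$, $\mathbb{1}_{ij,\tau_k}(\theta^\star) = 0$ for all $i \in \mathcal{V}$ and all $j \in \mathcal{N}_i$.

Consider any $\theta \in \Theta \setminus \{\theta^\star\}$, and any $i \in \mathcal{V} \setminus \mathcal{S}(\theta^\star, \theta)$. Then, at each $\tau_k \in \mathbb{I}$ with $\tau_k \geq \bar{t}(\omega)$, there exists $j \in \mathcal{N}_i$ such that $\mathbb{1}_{ij,\tau_k}(\theta) = 0$.^{8}

^{8} In this claim, $j$ might depend on $\tau_k$.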
The following result is an immediate application of the above proposition.
Proposition 2.
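Suppose the conditions in Theorem 1 are met. Additionally, suppose $\mathcal{G}$ is a tree graph, and for each pair $\theta_p \neq \theta_q$, $|\mathcal{S}(\theta_p, \theta_q)| = 1$. Consider any $\theta \neq \theta^\star$, and let $\mathcal{S}(\theta^\star, \theta) = \{v\}$. Then, each agent $i \in \mathcal{V} \setminus \{v\}$ eventually stops broadcasting its belief on $\theta$ to its parent in the tree rooted at $v$, almost surely.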
A few comments are now in order.
On the nature of $g(\cdot)$ and $h(\cdot)$: Intuitively, if the event-interval function $g(\cdot)$ does not grow too fast, and the threshold function $h(\cdot)$ does not decay too fast, one should expect things to fall in place. Theorem 1 makes this intuition precise by identifying conditions on $g(\cdot)$ and $h(\cdot)$ that lead to exponentially fast learning of the truth. In particular, our framework allows for a considerable degree of freedom in the choice of $g(\cdot)$ and $h(\cdot)$. Indeed, from (5), we note that any $h(\cdot)$ that decays subexponentially works for our purpose. Moreover, Corollary 1 reveals that, up to integer constraints, $g(\cdot)$ can be any polynomial or exponential function.

Design tradeoffs: What is the price paid for sparse communication? To answer this question, we take as a benchmark the scenario studied in our previous work [9], where we did not account for communication efficiency. There, we showed that each false hypothesis $\theta$ gets rejected exponentially fast by every agent at the network-independent rate $\max_{v \in \mathcal{S}(\theta^\star, \theta)} K_v(\theta^\star, \theta)$, the best known rate in the existing literature on this problem. We note from (6) that it is only the event-interval function $g(\cdot)$ that potentially impacts the learning rate, since the threshold function $h(\cdot)$ does not enter the bound in (6). Moreover, claim (i) in Corollary 1 shows that polynomially growing inter-communication intervals between the agents, coupled with our proposed event-triggering strategy, lead to no loss in the long-term learning rate relative to the benchmark case in [9]. In other words, communication-efficiency comes essentially for “free” under this regime. With exponentially growing event-interval functions, one still achieves exponentially fast learning, albeit at a reduced learning rate that depends on the network structure (see Eq. (8)). The above discussion highlights the practical utility of our results in understanding the tradeoffs between sparse communication and the rate of learning.
Sparse communication introduced by event-triggering: Observe that being able to eliminate each false hypothesis is enough for learning the true state. In other words, agents need not exchange their beliefs on the true state $\theta^\star$ (of course, no agent knows a priori what the true state is). Our event-triggering scheme achieves precisely this, as evidenced by claim (i) of Proposition 1: every agent eventually stops broadcasting its belief on $\theta^\star$, almost surely. In addition, an important property of our event-triggering strategy is that it reduces information flow from uninformative agents to informative agents. To see this, consider any false hypothesis $\theta$, and an agent $i \notin \mathcal{S}(\theta^\star, \theta)$. Since $\theta \in \Theta_i^{\theta^\star}$, agent $i$’s local belief $\pi_{i,t}(\theta)$ will eventually stop decaying. This makes it impossible for agent $i$ to lower its actual belief $\mu_{i,t}(\theta)$ without the influence of its neighbors. Consequently, when left alone between consecutive event-monitoring timesteps, agent $i$ cannot leverage its own private signals to generate enough “innovation” in $\mu_{i,t}(\theta)$ to broadcast to the neighbor who most recently contributed to lowering $\mu_{i,t}(\theta)$. The intuition here is simple: an uninformative agent cannot outdo the source of its information. This idea is made precise in claim (ii) of Proposition 1. To further demonstrate this facet of our rule, Proposition 2 states that when the baseline graph $\mathcal{G}$ is a tree, all upstream broadcasts to informative agents stop after a finite period of time.
4.1 Asymptotic Learning of the Truth
If asymptotic learning of the true state is all one cares about, i.e., if exponential convergence is no longer a consideration, then one can allow for arbitrarily sparse communication patterns, as we shall now demonstrate. Accordingly, we now allow the baseline graph to change over time. To allow for this generality, we set , i.e., the event condition (4) is now monitored at each timestep. Furthermore, we set At each timestep , and for each , an agent decides whether or not to broadcast to an instantaneous neighbor by checking the event condition (4). While checking this condition, if agent has not yet transmitted to (resp., heard from) agent about prior to time , then it sets (resp., ) to . Update rules (1), (2), (3) remain the same, with now interpreted as . Finally, by a union graph over an interval , we mean the graph with vertex set , and edge set . With these modifications in place, we have the following result.
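The rootedness requirement on union graphs is easy to check mechanically. The following is a minimal sketch under stated assumptions. The union graph over an interval is taken to keep the agent set and collect every edge active at some timestep in the interval. Edges are stored as ordered pairs $(i, j)$ meaning $i$ can transmit to $j$, so an undirected link should be listed in both orientations. The per-timestep edge sets below are hypothetical.

```python
from collections import deque
from typing import Iterable, List, Set, Tuple

Edge = Tuple[int, int]  # (i, j): agent i can transmit to agent j at that time


def union_graph(edge_sets: List[Set[Edge]], t1: int, t2: int) -> Set[Edge]:
    """Edge set of the union graph over [t1, t2]: every edge active at some t in it."""
    edges: Set[Edge] = set()
    for t in range(t1, t2 + 1):
        edges |= edge_sets[t]
    return edges


def is_rooted_at(n: int, edges: Iterable[Edge], roots: Set[int]) -> bool:
    """True if every agent outside `roots` can be reached along edges from some root."""
    adj: List[List[int]] = [[] for _ in range(n)]
    for i, j in edges:
        adj[i].append(j)
    seen = set(roots)
    queue = deque(roots)
    while queue:  # multi-source breadth-first search from the root set
        u = queue.popleft()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return len(seen) == n


# Hypothetical time-varying links among agents 0, 1, 2 (one edge set per timestep).
E = [{(0, 1)}, set(), {(1, 2)}]
print(is_rooted_at(3, union_graph(E, 0, 2), {0}))  # True
print(is_rooted_at(3, union_graph(E, 0, 1), {0}))  # False: agent 2 is never reached
```

Neither link needs to be present at any single timestep for the union graph over a longer window to be rooted. This is the kind of intermittent connectivity the following theorem is meant to tolerate.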
Theorem 2.
Suppose for every pair of hypotheses , is nonempty. Furthermore, suppose for each , the union graph over is rooted at . Then, the eventtriggered distributed learning rule described above guarantees a.s.
While a result of the above flavor is well known for the basic consensus setting [11], we are unaware of its analogue for the distributed inference problem. When , we observe from Theorem 2 that, as long as each agent transmits its belief vector to every neighbor infinitely often, all agents will asymptotically learn the truth. In particular, other than the above requirement, our result places no constraints on the frequency of agent interactions.
5 Proofs
In this section, we provide proofs of all our technical results. We begin by compiling various useful properties of our update rule which will come in handy later on.
Lemma 1.
Suppose the conditions in Theorem 1 hold. Then, there exists a set with the following properties. (i) . (ii) For each , there exist constants and such that
(9) 
(iii) Consider a false hypothesis , and an agent . Then on each sample path , we have:
(10) 
Although we consider a modified update rule as compared to that in [9], the proofs of claims (ii) and (iii) in the above lemma essentially follow the same arguments as those of [9, Lemma 2] and [9, Lemma 3], respectively; we thus omit them here. The following result will be the key ingredient in proving Theorem 1.
Lemma 2.
Consider a false hypothesis and an agent . Suppose the conditions stated in Theorem 1 hold. Then, the following is true for each agent :
(11) 
Proof.
Let be the set of sample paths for which assertions (i)-(iii) of Lemma 1 hold. Fix a sample path , an agent , and an agent . When , the assertion of Eq. (11) follows directly from Eq. (10) in Lemma 1. In particular, this implies that for a fixed , , such that:
(12) 
Moreover, since , Lemma 1 guarantees the existence of a timestep , and a constant , such that on , . Let . Let be the first event-monitoring timestep in to the right of .^{9} Now consider any such that . In what follows, we will analyze the implications of agent deciding whether or not to broadcast its belief on to a one-hop neighbor at . To this end, we consider the following two cases.

^{9} We will henceforth suppress the dependence of various quantities on , and for brevity.
Case 1: , i.e., broadcasts to at . Thus, since , we have from (1). Let us now observe that :
(13)  
In the above inequalities, (a) follows directly from (3), (b) follows by noting that the sequence is nonincreasing based on (1), and (c) follows from (12) and the fact that all beliefs on are bounded below by for .
Case 2: , i.e., does not broadcast to at . From the event condition in (4), it must then be that at least one of the following is true: (a) , and (b) . Suppose . From (12), we then have:
(14) 
In words, the above inequality places an upper bound on the belief of agent on when it last transmitted its belief on to agent , prior to timestep ; at least one such transmission is guaranteed to take place since all agents broadcast their entire belief vectors to their neighbors at . Noting that , , using (3), (14), and arguments similar to those for arriving at (13), we obtain:
(15) 
where the last inequality follows from the fact that is a nonincreasing function of its argument. Now consider the case when . Following the same reasoning as before, we can arrive at an identical upper bound on as in (14). Using the definition of , and the fact that agent incorporates its own belief on in the update rule (1), we have that . Using similar arguments as before, observe that the bound in (15) holds for this case too.
Combining the analyses of cases 1 and 2, referring to (13) and (15), and noting that , we conclude that the bound in (15) holds for each such that . Now since , for any we have:
(16) 
Next, noting that is nondecreasing, observe that:
(17) 
The above yields: . Fix any timestep , let be the largest index such that , and be the largest index such that . Observe:
(18) 
Using the above inequality, the fact that , and referring to (15), we obtain:
(19) 
From the definition of , we have . This yields:
(20)  
From (19) and (20), we obtain the following :
(21) 
where , and . Now taking the limit inferior on both sides of (21) and using (5) yields:
(22) 
Finally, since the above inequality holds for any sample path , and an arbitrarily small , it follows that the assertion in (11) is true for every one-hop neighbor of agent .
Now consider any agent such that . Clearly, there must exist some such that . Following an identical line of reasoning as before, it is easy to see that with measure 1, decays exponentially at a rate that is at least times the rate at which decays to zero. From (22), the latter rate is at least , and hence, the former is at least . This establishes the claim of the lemma for all agents that are two hops away from agent . Since is connected, given any , there exists a path in from to . One can keep repeating the above argument along the path and proceed via induction to complete the proof. ∎
We are now in position to prove Theorem 1.
Proof.
Proof.
(Proposition 1) Let the set have the same meaning as in Lemma 2. Fix any , and note that since the conditions of Theorem 1 are met, on . We prove the first claim of the proposition via contradiction. Accordingly, suppose the claim does not hold. Since there are only finitely many agents, this implies the existence of some and some , such that broadcasts its belief on to infinitely often, i.e., there exists a subsequence of at which the event condition (4) gets satisfied for . From (4), , where . This implies , contradicting the fact that on , .
For establishing the second claim, fix , , and . Since , there exists and , such that This follows from the fact that since is observationally equivalent to for agent , the claim regarding in Eq. (9) applies identically to . Note also that since the conditions of Theorem 1 are met, on . From (1), as well. Thus, there must exist some such that . Let . Consider any . We claim:
(23) 
(24) 
To see why the above inequalities hold, consider the update of based on (3). Since , we have . Noting that the denominator of the fraction on the R.H.S. of (3) is at most , we obtain: If , then the claim follows. Else, if , then since no communication occurs at , we have from (1) that We can keep repeating the above argument for each to establish the claim. In words, inequalities (23) and (24) reveal that agent cannot lower its belief on the false hypothesis between two consecutive event-monitoring timesteps when it does not hear from any neighbor. We will make use of this fact repeatedly during the remainder of the proof. Let be the first timestep in to the right of . Now consider the following sequence, where :
(25) 
The above sequence represents those event-monitoring timesteps at which decreases. We first argue that is well-defined, i.e., each term in the sequence is finite. If not, then based on (24), this would mean that remains bounded away from , contradicting the fact that on . Next, for each , let We claim that . To see why this is true, suppose, if possible, . Then, based on the definition of , we would have . However, as , we have from (3) that , leading to the desired contradiction. In the final step of the proof, we claim that does not broadcast its belief on to over .
To establish this claim, we start by noting that based on the definitions of and , . Let us first consider the case when there are no intermediate event-monitoring timesteps in , i.e., and are consecutive terms in . Then, at