The theory of adaptive decision systems lies at the intersection of decision theory [1, 2] and adaptive learning and control [3, 4]. In many instances, the qualification “adaptive” in adaptive decision systems refers to the ability of the system to select the best action based on the observed data, as in cognitive radar, active hypothesis testing, or controlled sensing [8, 9]. The qualification “adaptive” can also refer to the ability of the decision system to track changes in the underlying state of nature, monitor its drifts over time, and deliver reliable decisions in real time. The theme of this article falls within this second class of decision systems.
The classical implementation of adaptive decision systems has often relied on centralized (fusion) processing units. In more recent years, there has been a shift from centralized architectures to sensor network architectures [10, 11, 12, 13, 14, 15, 16], where data are monitored/collected in a distributed fashion by a collection of individually simple devices, but the processing remains centralized. Energy efficiency, robustness, and security become challenging over such implementations. Moreover, the presence of a single central unit makes the system vulnerable to failures and external attacks [11, 12, 17, 18]. One approach to remedy these difficulties is the SENMA paradigm, in which several mobile central units travel across the network to query the remote nodes from close proximity [19, 20, 21, 22].
A more prominent and flexible solution is to avoid central units altogether. A fully-flat or fully-decentralized sensor network is one in which all information processing takes place at the nodes in a fully distributed fashion, with no data storage or processing at centralized devices. The evident mitigation of security and failure issues, and the added robustness, come at the expense of requiring local information processing capabilities at the nodes, which must now interact with each other, perform processing tasks with groups of nearby agents, and arrive at local decisions. Statistical signal processing over fully-decentralized networks or graphs has become an active area of research (e.g., see the overviews in [23, 24] and the many references therein). The theme of this work is to design and analyze an adaptive decision system over such networks.
I-A Related Work
Inference problems over fully-flat sensor networks have received considerable attention in recent years, first in connection with estimation problems and, more recently, in connection with detection/decision problems, by employing either consensus [25, 26, 27, 28, 29, 30, 31] or diffusion [32, 33, 34, 35, 36, 37, 24, 38, 39, 40, 41] strategies.
Consensus solutions employ diminishing step-sizes to enhance the memory properties of the decision system, which leads to asymptotically optimal performance [25, 26, 27, 28, 29, 30]. Unfortunately, decaying step-size parameters limit the adaptation ability of the resulting network because learning comes to a halt as the step-size parameter approaches zero. Switching to constant step-size adaptation poses a challenge for consensus-based solutions because of an inherent asymmetry in the update equations of consensus implementations. This asymmetry has been studied in some detail and shown in earlier works [23, 34] to be a source of instability when constant step-sizes are used for adaptation purposes. In other words, consensus strategies with constant step-sizes can be problematic for applications that necessitate continuous learning, due to potential instability. This fact motivates us to focus on diffusion implementations, since these strategies do not suffer from the aforementioned asymmetry and have been shown to deliver superior performance under both constant and decaying step-size learning scenarios [23, 42]. A series of works develops the theory of diffusion strategies with constant step-sizes for inference purposes and explores their capabilities for learning and adaptation in dynamic environments [32, 33, 34, 35, 36, 37, 24, 38, 39, 40, 41]. For example, references [32, 33, 34] deal with estimation problems, and the latter also addresses a comparison between consensus and diffusion protocols. The adaptive diffusion scheme for detection is studied in , and [36, 37, 24] present extensive overviews of these detection algorithms, as well as many access points to the related literature. The learning behavior of the network is investigated in [38, 39], while a large deviation analysis and the so-called exact asymptotic framework are the focus of [40, 41].
I-B Contribution & Preview of the Main Results
The diffusion scheme considered in this paper employs a modified form of the adapt-then-combine (ATC) diffusion rule, which has some advantages with respect to alternative schemes. According to the ATC rule, each node updates (adapts) its status by incorporating the fresh information coming from new measurements, and then makes its current status available to its neighbors for the combination stage. In the combination stage, each node weights its status with those of its neighbors. In all the articles mentioned so far in this introduction, it is assumed that the communication among nearby nodes is essentially unconstrained: the nodes can share their states with their neighbors with full precision.
In many practical scenarios, an unconstrained inter-node communication capability cannot be guaranteed, and the system designer is faced with the problem of revisiting the signal/information processing of the network in order to take into account this limitation. Thus, the basic consideration that motivates our work is that the nodes of the network cannot exchange their state as they are, because the communication links do not support messages with arbitrary precision. Taking this viewpoint to one extreme, we assume severe communication constraints which impose that only one bit can be reliably delivered, per link usage, over the inter-node links. Accordingly, in the combination stage of the ATC rule, the neighbors of node cannot be informed about the value of the status of node , but they can only be informed about a one-bit quantized quantity. In the binary detection problem addressed here, this quantized quantity can be regarded as a local decision made by node at the current time.
Of course, decentralized inference using quantized messages in networks equipped with a central unit (or having other classical structures, such as the tandem architecture or some variation thereof) has a long history; see, e.g., [43, 44, 45, 46, 47] and the references therein. Consensus implementations with quantized messages have also been widely investigated [48, 49, 50, 51, 52, 53, 54, 55, 56]. However, there appear to be no similar studies on distributed strategies that ensure continuous learning and adaptation under drifting or non-stationary conditions.
The one-bit diffusion messaging scheme addressed in this paper poses new challenges. The combination stage of the ATC scheme must now fuse discrete and continuous variables, and the analysis of the steady-state distribution becomes more complex than that developed, e.g., in . In contrast to those results, our analysis shows that (a version of) the central limit theorem (CLT) applies only in the special circumstance that the step size is small and the weight assigned to the local decisions gathered from neighboring nodes is vanishing. In general, for arbitrary step sizes and combination weights, deriving the steady-state statistical distributions requires separate analyses of the continuous and discrete components. Neither of these can, in general, be approximated by a Gaussian distribution via some version of the CLT, and different analysis tools are required.
The main results of this work can be summarized as follows. By exploiting a key distributional structural property [see (17)], the steady-state distribution of the continuous component [see (15)] is obtained in an integral form that involves the log-characteristic function. A series approximation for the continuous component distribution, particularly suitable for numerical analysis, is also provided. These results are collected in Theorem 1.
As to the discrete component, we show that it reduces to a combination of geometric series with random signs, the so-called asymmetric Bernoulli convolution. The Bernoulli convolution has been widely studied in the literature for its measure-theoretic implications, and it is known that, aside from some special cases, its distribution does not reduce to simple forms. This notwithstanding, exploiting the fact that the node status is the sum of the discrete component and the continuous one, we derive simple approximations in the regime of highly reliable local decisions [see (7)]. In principle, approximations of any order can be developed, but the second-order approximation detailed in Sec. IV-B2 gives accurate results even for moderately large values of the relevant parameters. The combination of several Bernoulli convolutions requires numerical convolutions, and we develop careful approximations for the individual contributions so that these convolutions can be easily computed over discrete, low-cardinality sets.
Numerical results show that the shape of these distributions is by no means obvious. It is rewarding that the developed theory is able to closely follow those shapes for a wide range of values of the relevant system parameters. The main performance figures are the system-level detection and false alarm probabilities, which are straightforwardly related to the distributions at the network nodes. Expressing the former as a function of the latter, the receiver operating characteristic (ROC) curve is obtained (Figs. 7-8).
The analysis developed in this paper allows us to easily derive the decision performance of the system for a wide range of the parameters under the control of the system designer, namely the step size and the combination weights. A critical scenario arises when the self-combination weights are very large and the step size is very small, because the developed numerical procedures can then be time-consuming. For this scenario we develop a tailored version of the CLT for triangular arrays and continuous parameters, which is the subject of Theorem 2.
The remainder of this paper is organized as follows. Section II introduces the classical adaptive diffusion scheme for detection. The one-bit-message version of these detection systems is designed in Sec. III, and the steady-state analysis is conducted in Sec. IV. The results of computer experiments are presented in Sec. V, and Sec. VI concludes the paper with final remarks. Some technical material is postponed to Appendices A-D.
II Adaptive Diffusion for Detection
We consider a multi-agent network consisting of nodes running an adaptive diffusion scheme to solve a binary hypothesis testing problem in which the state of nature is represented by or . Using the same notation as in , the update rule for the diffusion strategy is given by
where is the step-size parameter, usually much smaller than one. Moreover, the symbol denotes the data received by agent at time , while represents a local state variable that is updated regularly by the same agent through (1b). This latter expression combines the intermediate values from the neighbors of agent using the nonnegative convex combination weights . The weights are required to satisfy
The recursions (1a)-(1b) can be grouped together across all agents in vector form, say, as:
In this work we make the assumption that the incoming data is a statistic computed from some observed variable, say, , namely, is a prescribed function of . Under both hypotheses and , the observations are i.i.d. (independent and identically distributed) across all sensors; they are also i.i.d. over time, meaning that, for each , the sequence is white. It follows that the are i.i.d. across all sensors and over time. We further assume that each is a continuous random variable [57], under both hypotheses. This assumption is made mainly because the case of continuous random variables is the most interesting in the presence of data quantization.
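A minimal sketch of one ATC iteration (1a)-(1b) may help fix ideas. The adaptation step is written here in the common adaptive-detection form, state plus step size times innovation; the exact symbols of (1a)-(1b) are not reproduced above, so all names below are illustrative:

```python
import numpy as np

def atc_diffusion_step(w, x, A, mu):
    """One adapt-then-combine (ATC) iteration, a sketch of (1a)-(1b).
    w  : current states of the K nodes, shape (K,)
    x  : fresh marginal statistics at the K nodes, shape (K,)
    A  : combination matrix; A[l, k] is the nonnegative weight that
         node k assigns to neighbor l, with each column summing to one
    mu : step-size parameter, usually much smaller than one
    """
    psi = w + mu * (x - w)   # adaptation step, cf. (1a)
    return A.T @ psi         # combination step, cf. (1b)
```

With a doubly stochastic combination matrix and constant inputs, the states of all nodes converge to the common input value, which is the expected averaging behavior of the combination stage.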
We refer to as the local observation, and to as the marginal decision statistic, where the adjective “marginal” is meant to indicate that is based on the single sample . The variable is referred to as the state of the node. The detection problem consists of comparing the state against a threshold level, say , and deciding on the state of nature or , namely,
While our formulation is general enough to address different types of marginal statistics, special attention will be given to the case in which is selected as the log-likelihood of :
where is the probability density function of under , .
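As a concrete instance of the marginal statistic (6), consider a hypothetical Gaussian shift-in-mean model (this model is an illustrative assumption, not one made by the paper): under each hypothesis the observation is Gaussian with a different mean and common variance, and the log-likelihood in (6) becomes a simple quadratic difference:

```python
def gaussian_llr(x, m0, m1, sigma):
    """Log-likelihood log[f1(x)/f0(x)] for an illustrative Gaussian
    shift-in-mean model: x ~ N(m_h, sigma^2) under hypothesis H_h.
    This is one concrete instance of the marginal statistic (6)."""
    return ((x - m0) ** 2 - (x - m1) ** 2) / (2.0 * sigma ** 2)
```

For symmetric means the statistic is linear in the observation and vanishes at the midpoint, consistent with the zero-threshold ML decision discussed later in Sec. III.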
II-A Some Technical Conditions
We introduce the following technical conditions. First, we let denote the expectation under hypothesis , , and assume that . The assumption is automatically verified when the marginal statistic is the log-likelihood (6) because, in that case, the quantities and are two Kullback-Leibler divergences and, therefore, are strictly positive for distinct hypotheses. We also assume that the variance exists and is finite for . Note that, for simplicity, we use the short-hand notation instead of .
Second, we let denote the probability operator under hypothesis , , and assume, for all agents , that
where is a local threshold level, and where and represent the marginal detection and false alarm probabilities, namely, the probabilities that would be obtained by making decisions based on the marginal statistic and the local threshold: decide locally in favor of , otherwise decide for . Note that is the same for all sensors, which is justified by the assumption of i.i.d. observations under both hypotheses. As a consequence, all sensors have the same marginal performance and . The assumption in (7) rules out trivialities.
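The marginal detection and false alarm probabilities in (7) can be estimated numerically for any assumed observation model. The sketch below uses the hypothetical Gaussian shift-in-mean model (means, variance, and sample size are illustrative assumptions, not values from the paper):

```python
import random

def marginal_rates(threshold, m0=-1.0, m1=1.0, sigma=1.0,
                   n=100_000, seed=1):
    """Monte Carlo estimate of the marginal false-alarm and detection
    probabilities in (7) for a hypothetical Gaussian statistic:
    x ~ N(m_h, sigma^2) under H_h; decide H_1 when x exceeds the
    local threshold."""
    rng = random.Random(seed)
    false_alarm = sum(rng.gauss(m0, sigma) > threshold
                      for _ in range(n)) / n
    detection = sum(rng.gauss(m1, sigma) > threshold
                    for _ in range(n)) / n
    return false_alarm, detection
```

With symmetric means and a zero threshold, the two error probabilities are symmetric, and both marginal probabilities are bounded away from 0 and 1, as required by the non-triviality assumption in (7).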
III One-bit Diffusion Messaging
The classical diffusion rule is described by equations (1a)-(1b), and its detection properties are studied in detail in [40, 41]. In the system described by (1a)-(1b), the data exchanged among the nodes are uncompressed and non-quantized. However, in most communication scenarios, a more realistic assumption is that the messages exchanged among the nodes are quantized. Accordingly, we consider an update rule in which the information sent at time by node to its neighbors is the marginal statistic quantized to one bit, as follows:
where is the local threshold level from (7). Thus, if the state of nature is , node sends to its neighbors the message with probability , and the message with probability . Similarly, under , is sent with probability , and with probability . Needless to say, given that the nodes are aware of the detection problem they are faced with, there is no need to deliver the actual values , but simply a binary flag. We can interpret as representative of the marginal decision about the hypothesis, made by node by exploiting only its current observation .
When the nodes compute the log-likelihood of the observations using (6), we can set in (7) and (8). Comparing the log-likelihood of to zero corresponds to the optimal ML (Maximum Likelihood) marginal decision about the underlying hypothesis .
Note that in the scheme of (9a)-(9b), the sensor sends to its neighbors the quantized version of the marginal decision statistic. An alternative would be a system in which the sensor sends to its neighbors the quantized version of its status. In this case, however, it would not be possible to derive a simple analytical relationship between and that yields an explicit expression for the state [as in (11) below], and the analytical tractability would be compromised. More importantly, the scheme (9a)-(9b) ensures improved adaptation properties. Indeed, by using (9a)-(9b), the quantity available at node will be used [through its quantized version ] by the set of neighbors of node (even though it is never made available to non-neighboring nodes). Changes in the state of nature are immediately reflected in the value of , while, with , the status of the node incorporates these changes only slowly, as shown in (1a) and (9a). In dynamic environments, where the state of nature changes with time, this means that the system described by (9a)-(9b) will be able to react more rapidly to these changes, especially when the self-combination coefficient is small.
Figure 1 illustrates the improved adaptive properties of the scheme (9a)-(9b) by showing the expected value for node of the network shown later in Fig. 2, when the ’s are Gaussian random variables distributed as detailed in the example of Sec. V-A. The state of nature is initially , then switches to at , and finally switches down to at . The solid curve in black represents for the system defined by recursion (9a)-(9b), where nodes exchange the quantized version of their marginal statistic, while the curve in blue refers to a one-bit message scheme in which the nodes exchange the quantized version of their status. It is evident that this latter system does not react promptly to changes in the underlying state of nature and therefore is less suitable to operate in dynamic environments, with respect to the system defined by (9a)-(9b).
For comparison purposes, Fig. 1 also shows the expected value for a diffusion scheme in which no restrictions are imposed on the messages, so that the nodes are allowed to exchange the unquantized status . This is the diffusion system (1a)-(1b) studied in [40, 41]. The inset of Fig. 1 makes it evident that the scheme of (9a)-(9b) exhibits a faster reaction even in comparison to the message-unconstrained diffusion scheme (1a)-(1b). Therefore, investigating the steady-state detection performance of the system defined by (9a)-(9b) is of great importance, and is the main theme of this work.
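The tracking behavior just described can be reproduced with a small simulation of the one-bit rule (9a)-(9b). The sketch below makes several assumptions not fixed by the text: Gaussian observations with means minus one and plus one under the two hypotheses, one-bit messages encoded as plus or minus one, a fully connected network with uniform weights, and illustrative parameter values (these are not the settings of Fig. 1):

```python
import random

def one_bit_diffusion(T=1000, switch=500, K=5, mu=0.05,
                      a_self=0.5, seed=3):
    """Sketch of the one-bit diffusion rule (9a)-(9b), under assumed
    settings: x ~ N(-1, 1) before the change point (H0) and
    x ~ N(+1, 1) after it (H1). Neighbors receive only the sign of
    the marginal statistic, as a +/-1 message. Returns the
    network-averaged state over time."""
    rng = random.Random(seed)
    a_nbr = (1.0 - a_self) / (K - 1)   # uniform neighbor weights
    w = [0.0] * K
    trace = []
    for n in range(T):
        mean = -1.0 if n < switch else 1.0
        x = [rng.gauss(mean, 1.0) for _ in range(K)]
        # adaptation step, cf. (9a)
        psi = [(1 - mu) * w[k] + mu * x[k] for k in range(K)]
        # one-bit messages: marginal decisions, cf. (8)
        b = [1.0 if v > 0 else -1.0 for v in x]
        # combination: own (unquantized) psi plus neighbors' bits, cf. (9b)
        w = [a_self * psi[k]
             + a_nbr * sum(b[l] for l in range(K) if l != k)
             for k in range(K)]
        trace.append(sum(w) / K)
    return trace
```

In this simulation the network-averaged state settles at a negative level before the change point and at a positive level after it, reacting within a few tens of iterations, in qualitative agreement with the behavior described for Fig. 1.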
III-A Explicit Form of the State
Let us introduce the coefficients
For notational convenience, we also introduce the scalar:
which combines the step size with the self-combination coefficient . It is useful to bear in mind that is defined as a combination of and , although our notation does not emphasize that.
Consider the quantity in (11). After a change of variable we have:
For , the transient part converges exponentially to zero with probability one so that, by Slutsky's theorem [1, Th. 11.2.11], converges in distribution to . In the following, we investigate the steady-state properties of the system described by Eqs. (9a)-(9b) and, accordingly, we can set, without loss of generality,
In the next section we analyze separately the two components (referred to as the continuous component) and (the discrete component), in the asymptotic regime of large number of iterations. This allows us to provide a suitable approximation for the statistical distribution of , which is our goal.
IV Steady-State Analysis
IV-A Continuous Component
Consider the quantity defined in (13). Since the random variables are i.i.d., the following equality in distribution holds as :
Let us introduce a new random variable , which is an independent copy of the random variable . From definition (15) the following structural property of immediately follows:
Let be the imaginary unit and, for , let
be the log-characteristic functions of and , respectively, under hypothesis , . Also, denote by , , and
the cumulative distribution function (CDF) and the log-characteristic function of the steady-state continuous component, respectively, under .
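Before stating the theorem, it may help to see the continuous component (15) and its structural property [cf. (17)] numerically. The sketch below assumes a contraction coefficient and a scale factor as stand-in names (the original symbols are not shown here) and i.i.d. standard Gaussian inputs, which is an illustrative choice:

```python
import random

def continuous_component_sample(beta, scale, rng, n_terms=400):
    """One sample of S = scale * sum_{i>=0} beta**i * x_i with x_i
    i.i.d. N(0,1): a sketch of the steady-state continuous component
    (15). The structural property [cf. (17)] is the distributional
    identity S = beta * S' + scale * x, where S' is an independent
    copy of S, which follows by peeling off the first term of the
    series."""
    s, b = 0.0, 1.0
    for _ in range(n_terms):
        s += b * rng.gauss(0.0, 1.0)
        b *= beta
    return scale * s
```

One consequence of the structural identity is that the variance solves a fixed-point equation and equals scale squared divided by one minus beta squared, which a Monte Carlo run confirms.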
Theorem 1 (Distribution of ): The continuous component converges in distribution for , and the CDF of its limit can be found as follows. Suppose that admits a density. Suppose also that can be expanded in a power series with radius of convergence , namely:
Then we have the following results.
i) The log-characteristic function can be uniquely expanded in a power series with radius of convergence :
ii) If , we have the representation:
and the series appearing in (22) is (absolutely and uniformly) convergent for .
iii) If , the following approximation holds:
where denotes the imaginary part, and where
iv) Let , and denote by the approximation error in (23). Neglecting the effect of the series truncation due to the finiteness of and , if
Proof: See Appendix A.
The approximation in (23) can be controlled by choosing and sufficiently large, and sufficiently small. Note that the choice of is related to that of . For a given , the value of is chosen so as to satisfy condition (25), and then we set to verify the second inequality in (24). A possible choice is , where denotes the largest integer not exceeding its argument. If the series truncation error obtained with this value of is not small enough, a smaller value of is selected and a new value of is computed. This procedure is repeated until a negligible truncation error is obtained.
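The selection loop just described involves paper-specific quantities that are not reproduced here; a generic version of the underlying idea, picking the smallest truncation order that drives the tail of a geometric series below a target error, can be sketched as follows (the names r and eps are illustrative stand-ins):

```python
import math

def truncation_order(r, eps):
    """Smallest N with r**N / (1 - r) < eps, i.e. the truncation
    order needed to make the tail of a geometric series with ratio
    0 < r < 1 smaller than eps (a generic stand-in for the
    N-selection loop described above)."""
    N = max(0, math.ceil(math.log(eps * (1.0 - r)) / math.log(r)))
    while r ** N / (1.0 - r) >= eps:   # guard against rounding
        N += 1
    return N
```

For ratios close to one the required order grows quickly, which is consistent with the remark that the numerical procedures become time-consuming in that regime.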
revealing that is Gaussian with mean and variance . This appears to be an obvious result because, for all ,
is a linear transformation of the variables, and linear transformations preserve Gaussianity; see, e.g., . Note, however, that the Gaussianity of the asymptotic variable cannot be obtained from the usual central limit theorem, whose standard assumption is that the sum of the variances of the individual components diverges [60, Eq. (8-123)]. Instead, from (15) and (16) we have , and the sum of the variances of the individual terms on the right-hand side converges to a finite value.
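The convergence of the sum of variances alluded to above can be sketched explicitly. Writing the continuous component as a geometrically weighted sum with contraction coefficient $\beta$, step size $\mu$, and per-sample variance $\sigma^2$ (these symbols are illustrative stand-ins for the paper's notation), one obtains

```latex
\operatorname{var}\big(\widetilde{w}\big)
  \;=\; \mu^{2}\sigma^{2}\sum_{i=0}^{\infty}\beta^{2i}
  \;=\; \frac{\mu^{2}\sigma^{2}}{1-\beta^{2}}
  \;<\;\infty,
  \qquad 0<\beta<1,
```

so the individual variances form a convergent geometric series rather than a divergent one, which is why the classical CLT hypothesis fails here.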
IV-B Discrete Component
We now derive an approximate distribution for the steady-state component , where is defined in (13). The approximation is valid for large and [see (7) for the definitions of these quantities], namely in the regime where the marginal decisions of the nodes are reliable enough.
Let us start from an obvious equality in distribution:
By introducing the normalized binary variables
whose alphabet is , the quantity in (28) can be rewritten, after straightforward algebra, as
From (8), note that under we have , while, under , .
Let us summarize some known properties of the series (31). Suppose that is in force. If we had , then expression (31) would represent a geometric series with equally likely random signs. This is known as the Bernoulli convolution, and it has attracted considerable interest since the pioneering works by Erdős [61, 62]. It is known that the Bernoulli convolution is absolutely continuous with a finite-energy density for almost every , and is purely singular for , being in that case supported on a Cantor set of zero Lebesgue measure. It is also easily verified that for the random variable is uniform in . We refer to and the references therein for details.
Likewise, the asymmetric Bernoulli convolution with , which is of interest to us, has been extensively studied. For , it is known that is absolutely continuous for almost all , and singular for , where is the binary entropy function . Obviously, the same considerations hold under , with replaced by .
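The symmetric special case just recalled is easy to explore numerically. The sketch below samples the truncated series with equally likely random signs (the truncation depth is an illustrative choice; the truncation error is at most twice the geometric tail):

```python
import random

def bernoulli_convolution_sample(lam, rng, n_terms=40):
    """One sample of the symmetric Bernoulli convolution
    B = sum_{i>=0} eps_i * lam**i, with eps_i = +/-1 equally likely,
    truncated at n_terms (error at most 2*lam**n_terms/(1-lam))."""
    return sum(rng.choice((-1.0, 1.0)) * lam ** i
               for i in range(n_terms))
```

For a ratio of one half the limit distribution is uniform on the interval from minus two to two, which the samples reproduce: they stay inside that interval, about half of them fall in the inner half, and the empirical variance matches the uniform value of four thirds.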
Returning to (31), let us assume first that is in force and consider the following approximation. For any positive integer :
where the approximation consists of assuming that all the binary digits are equal to the most likely value , yielding . At the other extreme, when they are all equal to the most unlikely value , we have , which shows that the error involved in the approximation (32) is bounded (with probability one) by
with the upper bound achieved when . Note, from (32), that we have approximated the random variable by a random variable taking on finitely many values.
To control the error in the approximation, we enforce the condition
where denotes the smallest integer not less than its argument. Note that the right-hand side of (36) is a decreasing function of and an increasing function of , when . Note also, from (36), that also depends on the hypothesis , . This dependence is not made explicit, for simplicity of notation.
IV-B1 First-Order Approximation of
We now exploit the assumption to get an approximation for . The value taken by the random variable in (32) depends on the values of the binary variables . For large values of , such a string of binary variables typically consists of many “” and a few “”. This suggests neglecting the occurrences of two or more “”, leading to a random variable that takes on only values, instead of . According to this approximation, in Table I we report, arranged in ascending order, the values taken by , followed by the string of binary digits that generates each value, which is referred to as the pattern (“” means “”, and “” means “”). In Table I the subindex to and is omitted.
The probabilities assigned to the values are shown in the last column of Table I, and are computed as follows. The probability assigned to the first pattern “” is the sum of the probabilities of all patterns of the form “”, where “” can be either “” or “”. The probability of the second pattern “” is the sum of the probabilities of all patterns of the form “”, and so forth. The general rule is to replace by stars all the symbols in the pattern following the symbol “”, if any. (Footnote 1: In the case this corresponds to quantizing the random variable , with quantization regions that are intervals. For one can still regard the approximation as a quantization, but the quantization regions are not straightforward.)
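The pattern-and-star construction just described can be enumerated programmatically. The sketch below uses illustrative names (q for the probability of the unlikely digit, lam for the geometric ratio, N for the number of retained digits) and encodes the digits as plus one (likely) and minus one (unlikely), following the normalization introduced earlier:

```python
def first_order_support(q, lam, N):
    """Support values and probabilities of the first-order
    approximation (cf. Table I) of the truncated series
    sum_{i<N} d_i * lam**i, digits d_i in {+1, -1}: the likely digit
    +1 has probability p = 1 - q, and at most one unlikely digit is
    allowed. All names are illustrative stand-ins."""
    p = 1.0 - q
    base = sum(lam ** i for i in range(N))    # all digits likely
    support = [(base, p ** N)]                # all-likely pattern
    for j in range(N):
        # unlikely digit at position j, likely digits before it, and
        # "don't care" (starred) positions after it: the starred
        # positions are summed out, leaving probability p**j * q
        support.append((base - 2.0 * lam ** j, p ** j * q))
    return support
```

A quick sanity check is that the star rule partitions the full set of patterns, so the listed probabilities sum to one, and the support contains the expected number of values (one all-likely value plus one value per position of the single unlikely digit).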
IV-B2 Second-Order Approximation of
A similar approximation can be derived by assuming that the sequence in (32) contains at most two occurrences of the unlikely digit “”, in which case the random variable is approximated by a random variable that takes on the values shown in Table II, with associated patterns and probabilities. In Table II the subindex to and is omitted.
The probabilities in the last column of Table II are computed as follows. Consider a generic pattern with two occurrences of “”, and let us replace with the symbol “” all the “” appearing to the right of the rightmost “”. Then, the probability assigned to that pattern is the sum of the probabilities corresponding to all the distinct patterns obtained by assigning to the stars either the symbol “” or the symbol “”. For the patterns with only one “”, the probabilities are exactly those of the pattern, without modification. To understand why we use this convention, consider for instance the pattern “”, corresponding to the value . All probabilities of patterns of the form “” are included in the probability of some pattern with two “”, except the single pattern “”, whose probability is just .
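The second-order bookkeeping described above can likewise be enumerated. The sketch below follows the stated rules (stars only to the right of the rightmost unlikely digit for two-occurrence patterns; exact probabilities for zero- and one-occurrence patterns), again with illustrative names:

```python
def second_order_support(q, lam, N):
    """Support values and probabilities when at most two unlikely
    digits (-1) are allowed among N digits (cf. Table II and the
    rules described above); q is the probability of the unlikely
    digit, p = 1 - q of the likely one. Names are illustrative."""
    p = 1.0 - q
    base = sum(lam ** i for i in range(N))
    out = [(base, p ** N)]                      # no unlikely digit
    for j in range(N):
        # exactly one unlikely digit at position j: exact pattern
        # probability, no stars
        out.append((base - 2.0 * lam ** j, p ** (N - 1) * q))
    for k in range(1, N):
        for j in range(k):
            # unlikely digits at positions j < k; positions to the
            # right of k are starred and summed out
            out.append((base - 2.0 * lam ** j - 2.0 * lam ** k,
                        p ** (k - 1) * q * q))
    return out
```

As in the first-order case, the conventions are exactly those needed for the probabilities to partition the full pattern space, so they sum to one; this is a useful check when implementing the numerical convolutions mentioned in Sec. I-B.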
Straightforward algebra shows that the values in the first column of Table II are arranged in ascending order only if (Footnote 2: Note that is the golden ratio