Modeling and Mitigating Errors in Belief Propagation for Distributed Detection

04/10/2020 ∙ by Younes Abdi, et al. ∙ Jyväskylän yliopisto

We study the behavior of the belief-propagation (BP) algorithm affected by erroneous data exchange in a wireless sensor network (WSN). The WSN conducts a distributed binary hypothesis test where the joint statistical behavior of the sensor observations is modeled by a Markov random field whose parameters are used to build the BP messages exchanged between the sensing nodes. Through linearization of the BP message-update rule, we analyze the behavior of the resulting erroneous decision variables and derive closed-form relationships that describe the impact of stochastic errors on the performance of the BP algorithm. We then develop a decentralized distributed optimization framework to enhance the system performance by mitigating the impact of errors via a distributed linear data-fusion scheme. Finally, we compare the results of the proposed analysis with the existing works and visualize, via computer simulations, the performance gain obtained by the proposed optimization.

I Introduction

Dealing with a large collection of random variables and their interactions is a common practice when designing statistical inference systems. Graphical models, a.k.a. factor graphs, which are commonly used to capture the interdependencies between correlated random variables, are known to provide a powerful framework for developing effective low-complexity inference algorithms in various fields such as wireless communications, image processing, combinatorial optimization, and machine learning; see, e.g., [8, 12, 10]. Belief propagation (BP) [21] is a well-known statistical inference algorithm that works based on parallel message-passing between the nodes in a factor graph. BP is sometimes referred to as the sum-product algorithm.

When working with the BP algorithm, we should bear in mind that digital computation and digital communication are both error-prone processes in general. The messages exchanged between the nodes in a wireless network can always be adversely affected by errors caused by unreliable hardware components, quantization processes, approximate representations, wireless channel impairments, etc. Even though the BP algorithm has been extensively studied in the literature, we have rather limited knowledge about how stochastic errors in messages affect the beliefs obtained and how these erroneous beliefs influence the result of statistical inference schemes implemented by the BP algorithm. This territory is difficult to explore mainly due to the nonlinearities in the BP message-passing iteration.

In [3], we have developed a systematic framework for analyzing the behavior of BP and optimizing its performance in a distributed detection scenario. In particular, we have shown that the decision variables built by the BP algorithm are, approximately, linear combinations of the local likelihoods in the network. Consequently, we have derived in [3] closed-form relationships for the system performance metrics and formulated a distributed optimization scheme to achieve a near-optimal detection performance. Moreover, we have discussed the relationship between the BP and the max-product algorithms in [2] where we extend the proposed framework in [3] to optimize the performance of the max-product algorithm in a distributed detection scenario. In this paper, we further extend that framework to gain insight into the impact of computation and communication errors, in a BP iteration, on the resulting decision variables and to effectively mitigate that impact. Examples of BP being used in distributed detection can be found in [14, 23, 9, 24].

Accumulation of message errors and their adverse effect on the performance of BP is analyzed in [6] where the message errors are modeled as uncorrelated random variables to find probabilistic guarantees on the magnitude of errors affecting the beliefs. Assuming uncorrelated behavior for errors is inspired in [6] by observing the behavior and stability of digital filters, in the presence of quantization effects, which can be analyzed reliably by assuming uncorrelated behavior in the corresponding random errors [22]. Such a modeling approach is in line with the von Neumann model of noisy circuits [20], which considers transient faults in logic gates and wires as message and node computation noise that is both spatially and temporally independent [19].

The behavior of BP implemented on noisy hardware is investigated in [5] where it is observed that under the so-called contracting mapping condition [11], the distance between successive messages in a noise-free BP decreases with the number of iterations. Consequently, in the presence of hardware (or computation) noise, the faulty messages which violate this trend can be detected and discarded (censored) from the BP iterations. Such an approach is termed censoring BP in [5] and is shown to perform well when the hardware noise distribution has a large mass at zero and has non-negligible masses at some points sufficiently away from zero. As an alternative approach, the so-called averaging BP (ABP) is also proposed in [5]. In this method, as the name implies, an average of the messages up to the last iteration is saved and then used, instead of the actual messages, to build the beliefs. This method is proposed and its convergence is established for general zero-mean computation noise distributions. Again, the von Neumann model is used in [5] to analyze the behavior of message errors.

In this paper, we use the fact that the BP algorithm and the linear data-fusion scheme are closely related to each other in the context of distributed detection. There already exists a rich collection of works in the literature that investigate low-complexity detector structures based on linear fusion in various design scenarios [16, 15, 17, 18, 1]. In many of these works, the data-exchange process within the sensor network is assumed to be adversely affected by non-idealities in the underlying communication links. Hence, dealing with erroneous data is a familiar challenge when designing wireless sensor networks (WSN). We use this knowledge to cope with the impact of message errors on distributed detection systems realized by BP.

In particular, we approximate the message structure in BP by a linear expression to study the impact of erroneous data exchange on the BP algorithm and to clarify how it affects the performance of the distributed detection concerned. We derive approximate expressions to measure the strength of the cumulative errors that affect the BP-based decision variables. These expressions are in the form of mean-squared error (MSE) levels. We compare the MSE levels obtained with the one in [6] to gain insight into the behavior of BP and to see how computation and communication errors propagate throughout the underlying factor graph. Moreover, based on the proposed linear approximation, we show that ABP is effective in alleviating message errors but falls short of mitigating the impact of erroneous local likelihood ratios (LLRs) on the resulting decision variables.

We also show, under practical assumptions, that the decision variables built by an erroneous BP are disturbed by a sum of independent error components whose collective impact can be modeled, approximately, by Gaussian random variables. Consequently, we establish the probability distribution of the resulting erroneous decision variables, derive the performance metrics of the BP-based distributed detection in closed form, and propose a two-stage optimal linear fusion scheme to cope with the impact of errors on the system performance. We then develop a blind adaptation algorithm to realize the proposed two-stage optimization when the statistics describing the radio environment are not available a priori.

Here is an overview of the paper organization: In Sec. II, we briefly explain the use of linear fusion and BP in distributed detection and provide the related formulations. In Sec. III, we discuss errors in BP and model their impact on the decision variables obtained. In Sec. IV, we view BP as a distributed linear fusion and formulate the proposed optimization framework. In Sec. V, we conduct computer simulations to verify our analysis and to illustrate how effectively the proposed method mitigates the impact of errors in a WSN with faulty devices. Finally, we provide our concluding remarks in Sec. VI.

II Linear Fusion and Belief Propagation for Distributed Detection

We consider binary random variables, represented by , whose statuses are estimated based on observations denoted made by a network of sensing nodes. Each node, say node , which intends to estimate the status of , collects observation samples, denoted by , and exchanges information with other nodes in the network to realize together a binary hypothesis test as . This test can be conducted with low implementation complexity in two alternative ways that are explained in the following.

II-A Linear Data-Fusion

Linear fusion has been extensively used in the context of spectrum sensing where the aim is to detect the presence or absence of a target signal by evaluating noisy observations made throughout a WSN. For brevity, we explain the uni-variate case here. In this detection scenario, each node, say node , collects the signal samples , where the random variable determines the presence or absence of the target signal in the radio environment. In this model, denotes a vector of zero-mean Gaussian white noise samples while denotes an unknown deterministic vector of target signal samples, which, in general, represent a superposition of multiple signals received at node from different transmitters.

The optimal approach to such a detection is known to be the so-called likelihood-ratio test (LRT) [7], which is conducted by evaluating the LLR, i.e., by where

(1)

where

(2)

where is referred to as the local LLR at node . By we represent the indicator function that returns one if its argument is positive and returns zero otherwise. It is clear that the LRT is, in fact, a matched-filtering process, which requires the target signal to be known a priori at the sensing nodes. In practice, the local sensing process is realized by energy detection due to its ease of implementation and due to the fact that its structure does not depend on the behavior of the target signal. Energy detection is realized by and the sensor outcomes are combined linearly to build a global test statistic [16, 15, 17, 18, 1], i.e.,

(3)

where and . Then, is compared against a predefined threshold to conduct the hypothesis test, i.e., . Assuming that the number of signal samples is large enough [16, 15, 17, 18, 1] such that ’s in (3) behave as Gaussian random variables, we can model the test summary , given the status of , as a Gaussian random variable as well. Specifically, for we have where for , and . Consequently, the false-alarm and detection probabilities, denoted and respectively, of this detector are derived in closed form by

(4)

where denotes the -function. Note that while . By setting , we have and then the detection performance can be optimized by maximizing the resulting over . This is the well-known Neyman-Pearson approach [7]. Through some algebraic manipulations, this optimization is formally stated as

(5)

where . This problem is solved by quadratic programming in [16], by semidefinite programming in [17], and by invoking the Karush-Kuhn-Tucker conditions in [18]. From these works, we know that the performance of linear fusion is close to the LRT performance. Alternatively, we can maximize the so-called deflection coefficient of the detector. This approach, which has a low computational complexity and leads to a good performance level, is realized by

(6)

where

(7)

Consequently, by using the Rayleigh-Ritz inequality [16], is obtained in closed form as . When is used in (7), the objective function is referred to as modified deflection coefficient.
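As a rough illustration of the linear-fusion design described above, the following sketch computes deflection-maximizing weights and then sets the detection threshold in the Neyman-Pearson manner for a Gaussian test statistic. It is a minimal sketch under stated assumptions, not the formulation of (3)-(7): the names `mu0`, `mu1`, `var0`, `var1`, and `alpha` are illustrative, the sensor outcomes are assumed uncorrelated (diagonal covariance), and all numeric values are hypothetical.

```python
import math
from statistics import NormalDist


def q_func(x):
    """Gaussian tail probability Q(x) = P(Z > x) for Z ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def q_inv(p):
    """Inverse Q-function: returns x such that Q(x) = p."""
    return NormalDist().inv_cdf(1.0 - p)


def deflection(w, mu0, mu1, var0):
    """Deflection coefficient for weights w, assuming uncorrelated sensor outcomes."""
    shift = sum(wi * (m1 - m0) for wi, m0, m1 in zip(w, mu0, mu1))
    spread = sum(wi * wi * v for wi, v in zip(w, var0))
    return shift * shift / spread


def deflection_weights(mu0, mu1, var0):
    """Unit-norm weights maximizing the deflection coefficient (diagonal covariance)."""
    raw = [(m1 - m0) / v for m0, m1, v in zip(mu0, mu1, var0)]
    norm = math.sqrt(sum(r * r for r in raw))
    return [r / norm for r in raw]


def neyman_pearson(w, mu0, mu1, var0, var1, alpha):
    """Threshold that fixes the false-alarm rate at alpha, plus the resulting rates."""
    m0 = sum(wi * m for wi, m in zip(w, mu0))
    m1 = sum(wi * m for wi, m in zip(w, mu1))
    s0 = math.sqrt(sum(wi * wi * v for wi, v in zip(w, var0)))
    s1 = math.sqrt(sum(wi * wi * v for wi, v in zip(w, var1)))
    tau = m0 + s0 * q_inv(alpha)
    return tau, q_func((tau - m0) / s0), q_func((tau - m1) / s1)


# Hypothetical per-sensor statistics under the two hypotheses.
mu0, mu1 = [1.0, 1.0, 1.0], [1.5, 1.2, 2.0]
var0, var1 = [0.2, 0.1, 0.5], [0.3, 0.15, 0.6]
alpha = 0.1

w_opt = deflection_weights(mu0, mu1, var0)
w_equal = [1.0 / math.sqrt(3.0)] * 3
tau, pf, pd = neyman_pearson(w_opt, mu0, mu1, var0, var1, alpha)
print("weights:", w_opt)
print("threshold:", tau, "P_f:", pf, "P_d:", pd)
```

With a diagonal covariance the maximizer reduces to weighting each sensor by its mean shift divided by its noise variance, which is why sensors operating under high SNR receive larger weights.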

Note that both optimizations in (5) and (6) can be realized while taking into account the impact of erroneous ’s. The so-called reporting errors in [16, 15, 17, 18, 1] model the impact of erroneous communication links through which the sensing nodes share their observations. The optimal fusion weights obtained by (5) and (6) emphasize the impact of local sensing outcomes generated in high SNR conditions while suppressing the impact of errors caused by the data communication process between the sensing nodes.

Extension of the linear detection structure in (3) to variables is discussed in [15], in the context of multiband spectrum sensing, where the detection performance is optimized by the so-called sequential optimization method that is based on maximizing the deflection coefficient of the system. In the following, we discuss the BP algorithm and show that it can be interpreted as a multivariate linear data-fusion.

II-B Belief Propagation

We model the sensor network structure concerned by a Markov random field (MRF) defined on an undirected graph . In this model, the set of vertices corresponds to the set of network nodes while each edge represents a possible connection between nodes and . Each node, say node , is associated with a random variable and the edge models a possible correlation between and . This model fits well into the commonly-used ad-hoc network configurations in which major network functionalities are conducted through pairwise, i.e., one-hop, links between the nodes located close to each other. This design method is based on the common assumption that nodes located close enough to each other for one-hop communication experience some level of correlation between their sensor outcomes.

By using the MRF, we write as a product of univariate and bivariate functions, i.e.,

(8)

Note that in (8) refers to a normalization that ensures . When including the bivariate terms in the product, each edge in the factor graph is included in the product only once. This is realized by doing the multiplication on while . We use to denote the set of neighbors of node in the graph, i.e., . By using (8), we formulate the message received at node from node as

(9)

where by we denote all nodes connected to node except for node . We denote by the belief, about the status of , formed at node , which is obtained via multiplying the potential at node by the messages received from all its neighbors, i.e.,

(10)
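The message and belief constructions described around (9) and (10) can be sketched as a generic sum-product iteration on a pairwise MRF with binary variables. This is an illustrative sketch only: the names `phi`, `psi`, and `run_bp` and all potential values are hypothetical, and the example graph is a three-node chain, chosen because BP beliefs are exact on trees and can therefore be checked against brute-force marginals.

```python
import itertools
import math


def normalize(v):
    total = sum(v)
    return [x / total for x in v]


def run_bp(phi, psi, edges, iterations):
    """Parallel sum-product on a pairwise MRF with binary variables.

    phi[i][x] is the univariate potential of node i; psi[(i, j)][xi][xj] is the
    bivariate potential of edge (i, j), stored once per edge.
    """
    num_nodes = len(phi)
    nbrs = {i: [] for i in range(num_nodes)}
    for i, j in edges:
        nbrs[i].append(j)
        nbrs[j].append(i)

    def pair(k, j, xk, xj):
        return psi[(k, j)][xk][xj] if (k, j) in psi else psi[(j, k)][xj][xk]

    # msg[(k, j)][xj]: message sent from node k to node j, initialized uniformly.
    msg = {(k, j): [0.5, 0.5] for k in nbrs for j in nbrs[k]}
    for _ in range(iterations):
        new_msg = {}
        for (k, j) in msg:
            out = []
            for xj in (0, 1):
                total = 0.0
                for xk in (0, 1):
                    term = phi[k][xk] * pair(k, j, xk, xj)
                    for n in nbrs[k]:
                        if n != j:  # all neighbors of k except the recipient j
                            term *= msg[(n, k)][xk]
                    total += term
                out.append(total)
            new_msg[(k, j)] = normalize(out)
        msg = new_msg

    beliefs = []
    for j in range(num_nodes):
        b = list(phi[j])
        for k in nbrs[j]:
            b = [b[x] * msg[(k, j)][x] for x in (0, 1)]
        beliefs.append(normalize(b))
    return beliefs


def brute_force_marginals(phi, psi):
    """Exact marginals by enumerating all joint configurations."""
    num_nodes = len(phi)
    marg = [[0.0, 0.0] for _ in range(num_nodes)]
    for x in itertools.product((0, 1), repeat=num_nodes):
        p = 1.0
        for i in range(num_nodes):
            p *= phi[i][x[i]]
        for (i, j), table in psi.items():
            p *= table[x[i]][x[j]]
        for i in range(num_nodes):
            marg[i][x[i]] += p
    return [normalize(m) for m in marg]


# Hypothetical potentials on the chain 0 - 1 - 2.
phi = [[1.0, 2.0], [1.0, 0.5], [1.0, 1.5]]
psi = {(0, 1): [[1.0, 1.0], [1.0, math.exp(0.8)]],
       (1, 2): [[1.0, 1.0], [1.0, math.exp(0.4)]]}
beliefs = run_bp(phi, psi, list(psi), iterations=5)
exact = brute_force_marginals(phi, psi)
print(beliefs)
print(exact)
```

On graphs with loops the same iteration generally yields only approximate marginals, which is the setting the rest of the analysis addresses.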

The beliefs are used as estimates of the desired marginal distributions, i.e., . By adopting the commonly-used exponential model [21] to represent the a priori probability measure defined on , we have

(11)

For a given , we assume the local observations to be mutually independent. Consequently, we have [3]

(12)

Hence, by using (12), the BP messages are built as

(13)

and the beliefs at iteration are expressed as

(14)

In the log domain, (13) and (14) convert, respectively, to

(15)
(16)

where

(17)
(18)

denote, respectively, the estimated likelihood ratio at node and the message sent to node from node while and . In this model, denotes the signal received at node . Hence, indicates that the target signal is absent, leaving the spectrum free where node operates. If , then the corresponding spectrum band is occupied. ’s are calculated as in Eq. (16) in [3] by processing a window of sensing outcomes. Note that in (15) is merged into without having any impact on the rest of the analysis.

After iterations, is compared, as a decision variable, against a detection threshold at node to decide the status of , i.e., . By a linear approximation of (15), we have

(19)

where . Consequently, we see that where

(20)

Therefore, we see that, given enough time, all the local likelihood ratios observed in the network are linearly combined at node to calculate its decision variable . We have shown in [3] that the convergence of this linear message-passing algorithm is guaranteed when . The linear combination in (II-B) can be expressed as , which is compactly stated in matrix form as

(21)

where and while . Through some algebra, we can find the relationship between and ’s in (II-B). Specifically, we have

(22)

where and denotes a diagonal matrix whose main diagonal is equal to that of . The proof is provided in Appendix I.

It is now clear that to have convergence in the message-passing iteration (19), the spectral radius of has to be less than one. This criterion may be used to impose bounds on ’s to guarantee the convergence of the algorithm. Alternatively, the convergence can be guaranteed, without dealing with the complexities of finding the spectral radius, by using the contracting mapping condition as we have discussed in [3]. We use (22) in the following section to derive an estimation of the error strength affecting the decision variables built by an erroneous BP.
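To make the linear message-passing view concrete, the sketch below iterates a linearized update and checks that the resulting decision variables are linear in the local LLRs. The update form is an assumption made for illustration, since it is not reproduced verbatim from (19): each message is taken as a fixed coefficient `c[(k, j)]` times the sum of the sender's local LLR and the messages it received from its other neighbors, and each decision variable is the local LLR plus all incoming messages. The ring topology and coefficient values are hypothetical.

```python
def linear_bp(gamma, c, neighbors, iterations):
    """Iterate the assumed linearized update and return the decision variables."""
    msg = {(k, j): 0.0 for k in neighbors for j in neighbors[k]}
    for _ in range(iterations):
        # The right-hand side reads the previous-iteration messages (parallel update).
        msg = {
            (k, j): c[(k, j)] * (gamma[k] + sum(msg[(n, k)] for n in neighbors[k] if n != j))
            for (k, j) in msg
        }
    return [gamma[j] + sum(msg[(k, j)] for k in neighbors[j]) for j in range(len(gamma))]


# Hypothetical 4-node ring with identical coefficients on every directed edge.
NEIGHBORS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
C = {(k, j): 0.3 for k in NEIGHBORS for j in NEIGHBORS[k]}

g1 = [1.0, -0.5, 2.0, 0.3]
g2 = [0.2, 0.7, -1.0, 0.4]
lam1 = linear_bp(g1, C, NEIGHBORS, 60)
lam2 = linear_bp(g2, C, NEIGHBORS, 60)
lam_sum = linear_bp([a + b for a, b in zip(g1, g2)], C, NEIGHBORS, 60)
lam_ones = linear_bp([1.0] * 4, C, NEIGHBORS, 60)
print(lam1)
print(lam_ones)  # each entry approaches 1 + 2 * 0.3 / 0.7 on this ring
```

In this example every message depends on exactly one other message scaled by 0.3, so the iteration matrix acting on the messages has rows whose absolute sums equal 0.3; its spectral radius is therefore at most 0.3 and the iteration converges geometrically.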

III Errors in Belief Propagation

Eq. (15) shows that at each BP iteration each node creates its messages in terms of its local LLR value as well as the messages received from the neighboring nodes at the previous iteration. In our system model, we assume that the local LLRs and the BP messages are erroneous. As in [5, 6], we use the von Neumann approach to modeling the joint statistical behavior of errors.

III-A Error Model and Analysis

Since the messages are multiplied together to build the beliefs, we formulate them as multiplicative perturbations affecting true (i.e., error-free) message values, i.e.,

(23)

where denotes the erroneous message sent to node from node at iteration while denotes the corresponding error, which is considered in this paper as a stochastic process.

Remark 1: Eq. (23) differs from the model used in [6] in the sense that the error model in that work measures the difference between the messages at iteration and their counterparts at the fixed point of the message-passing iteration. In other words, the error model in [6] measures the deviation of the messages at each iteration from their final value reached by BP after convergence. The stochastic error we discuss here is briefly studied in [6] under the notion of additional error.

By expressing the messages in the log domain we have

(24)

where

(25)

Based on the von Neumann model, we assume that if , then for all . Consequently, we have . To measure the collective impact of errors on the belief of node , we use

(26)

where denotes the belief at node resulting from a BP iteration with erroneous messages as in (23) while denotes the belief of node at a fixed point reached by an error-free BP iteration. We use instead of to indicate the messages and beliefs at a fixed point of the error-free BP.

By assuming uncorrelated stochastic behavior for the message errors, an upper bound on cumulative errors affecting the beliefs can be obtained. Specifically, assuming for all , an upper bound on the resulting cumulative strength of errors at node is derived in [6] as,

(27)

where and

(28)

while

(29)

where

(30)
(31)

We use the upper bound in (27) in the log domain based on the fact that (see (17) and (26))

(32)

which leads to

(33)

Hence, in the detection structure discussed, (27) gives an upper bound on the MSE level observed in the decision variable at node .

III-B Linear Approximations

In our analysis, we distinguish between the message errors and the errors in the computation of local LLRs to gain further insight into the behavior of the BP algorithm. In particular, we model the erroneous local LLRs as and refer to ’s as likelihood errors (LE) while assuming that LEs are uncorrelated as well, i.e., for . We refer to ’s as message errors (ME) and assume that LEs and MEs are mutually independent. Moreover, we assume that all MEs and LEs are independent of the messages and of the local LLRs. Note that the bound in (III-A) does not take LEs into account.

Taking both types of error into account, we express the messages as

(34)
(35)

which show that the errors pass through the same nonlinear transformation (i.e., ) as the messages do. By using (34), we can analyze the behavior of errors. The proposed linear BP iteration in the presence of message errors is expressed as

(36)

Consequently, similar to the way (II-B) is derived, the resulting erroneous decision variable is formed as

(37)

which can be reorganized as

(38)

where

(39)

Hence, we have the following remark, which we will use in Sec. IV where we develop an optimization framework for the system.

Remark 2: Eq. (39) shows that the error affecting the decision variable at node has two distinct components. The first component is built as a linear combination of LEs while the second one is the sum of the MEs received at node from its one-hop neighbors. The first component is fixed whereas the second one exhibits a new realization at every iteration.

According to (39), deviation from the error-free decision variables, caused by errors in the BP iterations, can approximately be measured by

(40)

where and while and denotes an -by-1 vector that contains ’s for where while .

Eq. (37) shows that when BP is used to realize a distributed detection, the erroneous local likelihoods in the network are combined linearly to build the decision variables. We can evaluate the impact of the errors on the system performance by analyzing the stochastic behavior of the erroneous decision variables . Given , the decision variable at node is obtained as a linear combination of independent random variables. Consequently, its conditional pdf is derived as

(41)

where

(42)

while and denote the convolution operator. Consequently, we have

(43)

where , and while . Solving gives a threshold value that fixes the false-alarm rate at . Similarly, fixes the detection rate at . Recall that ’s are found by using ’s, see (22).

As a common practical case, when the local LLRs and the errors follow Gaussian distributions [16, 15, 17, 18, 1], the decision variable follows a Gaussian distribution as well and it is fully characterized by its first- and second-order statistics. Specifically, we have

(44)

where

(45)
(46)

In (45) we have assumed, without loss of generality, to have zero-mean errors. Note that, without the proposed approximation these performance measures are not available analytically due to the nonlinearity of (15). In the rest of the paper, we assume that the local likelihoods, LEs, and MEs are Gaussian random variables. Eq. (41) shows that, according to the central limit theorem [13], even if the local LLRs and errors are not Gaussian random variables, the stochastic behavior of the decision variables can still be approximately described by Gaussian distributions.
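The first- and second-order statistics referred to above can be worked out in a few lines once a notation is fixed. The following is a sketch under assumed notation, not a reproduction of (44)-(46): $a_{ji}$ denotes the weight of the local LLR $\gamma_i$ in the decision variable at node $j$, $\delta_i$ denotes the LE at node $i$ with variance $\sigma^2_{\delta_i}$, $\varepsilon_{k\to j}$ denotes the ME on the link from node $k$ to node $j$ with variance $\sigma^2_{\varepsilon_{k\to j}}$, and $\mathcal{N}_j$ denotes the neighbors of node $j$. The errors are taken as zero-mean, mutually independent, and independent of the LLRs, and the LLRs are taken as conditionally independent, as assumed in the text.

```latex
% Assumed approximate structure of the erroneous decision variable (cf. Remark 2):
\tilde{\lambda}_j \approx \sum_{i} a_{ji}\,(\gamma_i + \delta_i)
                  + \sum_{k \in \mathcal{N}_j} \varepsilon_{k \to j}

% Conditional mean under hypothesis h (zero-mean errors drop out):
\mathrm{E}\big[\tilde{\lambda}_j \mid h\big]
  = \sum_{i} a_{ji}\,\mathrm{E}\big[\gamma_i \mid h\big]

% Conditional variance (independent terms, so the variances add):
\mathrm{Var}\big[\tilde{\lambda}_j \mid h\big]
  = \sum_{i} a_{ji}^{2}\,\big(\mathrm{Var}[\gamma_i \mid h] + \sigma^{2}_{\delta_i}\big)
  + \sum_{k \in \mathcal{N}_j} \sigma^{2}_{\varepsilon_{k \to j}}
```

Under this structure the LE variances are shaped by the fusion weights, whereas the ME variances enter with unit weight, which is consistent with the two error components distinguished in Remark 2.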

III-C Impact of Averaging

In ABP, the message-passing iteration is the same as in BP. However, instead of the actual message values, an average of the messages is used to build the decision variables. To be more specific, in the log domain and for , let

(47)

The decision variable at node is calculated by

(48)

Similar to our discussion regarding (II-B), we can show that when the message-passing iteration is error-free, . Hence, we have the following remark.

Remark 3: The averaging process does not alter the fixed points achieved by the error-free linear BP. From an approximation-based point of view, this observation is in line with the convergence analysis provided in [5].

The impact of averaging on LEs and MEs can be clarified by noting that

(49)

where, assuming to be large enough, we have

(50)

since . We can state (50) in the form of MSE as

(51)

Remark 4: Assuming to be large enough and the MEs to have zero mean, (50) shows that the resulting decision variable built by ABP in (48) is almost cleared of MEs. However, the averaging process has almost no impact on LEs.
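Remark 4 can be checked numerically with a small Monte Carlo sketch. The setup is an assumption made for illustration rather than the simulation configuration of the paper: the linearized message update uses a single hypothetical coefficient `c` on a 4-node ring, the LE is drawn once per run and added to each local LLR, a fresh zero-mean Gaussian ME is added to every message at every iteration, and ABP keeps only a running sum of the messages. All numeric values are hypothetical.

```python
import random

NEIGHBORS = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}  # hypothetical 4-node ring
GAMMA = [1.0, -0.5, 2.0, 0.3]                              # hypothetical local LLRs


def run(gamma, c, iterations, le_std, me_std, rng):
    """Return (BP, ABP) decision variables from one run of the noisy linear iteration."""
    noisy = [g + rng.gauss(0.0, le_std) for g in gamma]  # LEs are fixed over iterations
    msg = {(k, j): 0.0 for k in NEIGHBORS for j in NEIGHBORS[k]}
    msg_sum = dict(msg)  # running sum: the only extra memory ABP needs
    for _ in range(iterations):
        msg = {
            (k, j): c * (noisy[k] + sum(msg[(n, k)] for n in NEIGHBORS[k] if n != j))
            + rng.gauss(0.0, me_std)  # a new ME realization at every iteration
            for (k, j) in msg
        }
        for edge, value in msg.items():
            msg_sum[edge] += value
    bp = [noisy[j] + sum(msg[(k, j)] for k in NEIGHBORS[j]) for j in NEIGHBORS]
    abp = [noisy[j] + sum(msg_sum[(k, j)] for k in NEIGHBORS[j]) / iterations
           for j in NEIGHBORS]
    return bp, abp


def mse_pair(gamma, c, iterations, le_std, me_std, trials, seed):
    """Empirical MSE of BP and ABP decision variables relative to the error-free run."""
    rng = random.Random(seed)
    clean, _ = run(gamma, c, iterations, 0.0, 0.0, rng)
    se_bp = se_abp = 0.0
    for _ in range(trials):
        bp, abp = run(gamma, c, iterations, le_std, me_std, rng)
        se_bp += sum((a - b) ** 2 for a, b in zip(bp, clean))
        se_abp += sum((a - b) ** 2 for a, b in zip(abp, clean))
    count = trials * len(gamma)
    return se_bp / count, se_abp / count


# Message errors only: averaging should remove most of the error.
mse_me_bp, mse_me_abp = mse_pair(GAMMA, 0.2, 500, 0.0, 0.5, 20, seed=1)
# Likelihood errors only: averaging should leave the error essentially unchanged.
mse_le_bp, mse_le_abp = mse_pair(GAMMA, 0.2, 500, 0.5, 0.0, 20, seed=2)
print("ME only  BP:", mse_me_bp, " ABP:", mse_me_abp)
print("LE only  BP:", mse_le_bp, " ABP:", mse_le_abp)
```

Because the LE enters through the local LLR that every iteration reuses, it behaves like a constant offset and survives the averaging, whereas the independent ME realizations average out as the number of iterations grows.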

Note that in ABP the message-passing iteration is the same as in BP and the averaging is only performed when computing the decision variables. Moreover, in ABP, instead of storing the messages in past iterations separately, we only need to store the sum of the messages up to the current iteration. As a consequence, the number of additional memory cells required can be kept constant [5]. We will use ABP in Sec. IV-B to build an offline learning-optimization structure for the linear BP in the presence of errors.

IV Mitigating Errors by Linear Fusion

In this section, we first propose a two-stage linear fusion scheme to obtain a near-optimal detection performance by suppressing the impact of the errors. Then, we realize the proposed optimization in a blind decentralized setting where the required statistics are not available a priori.

IV-A Linear Fusion

First, since , we further approximate the decision variable in (II-B) as

(52)

Due to the symmetry of the data-fusion process in (II-B), the approximation in (52) is an effective approach to building a distributed computing framework for the system performance optimization. In this framework, each node interacts only with its immediate neighbors. We have clarified this symmetry in [3, Sec. III-B]. By taking into account the errors while analyzing the linear BP, (II-B) and (37) lead to

(53)

We see that the disturbance on the decision variable caused by LEs is built, approximately, as a linear combination of ’s with ’s acting as weights in this combination. Therefore, we use ’s as design parameters to mitigate the impact of ’s. Moreover, MEs are combined in (53) linearly and in this combination, all weights are one. We propose to extend this combination by using a modified version of (35) as

(54)

This modification in the structure of the decision variable does not affect the convergence of the proposed linear BP since it does not alter the message-passing iteration. Now, based on an approximation similar to the one in (53), we have

(55)

Since is a Gaussian random variable, we only need its mean and variance to characterize its statistical behavior. Specifically, for , we have

(56)

where

(57)
(58)

where in which denotes the Hadamard product while and . Moreover, , , , and are -by-1 vectors containing ’s, ’s, ’s, and ’s for , respectively. Eq. (56) gives the system false-alarm probability for and the detection probability for . The false-alarm probability can be set to by

(59)

and then by using (56) – (58), and can jointly be optimized in a Neyman-Pearson setting.

In order to avoid the challenges associated with this optimization, we maximize the deflection coefficient of the detector. We already know that the resulting detector performs well when the decision variables follow the Gaussian distribution. In this manner, we mitigate the joint impact of LEs and MEs with low computational complexity.

The proposed optimization is conducted in two consecutive stages based on the fact that we can decompose the construction of into two consecutive fusion processes. That is, we first optimize ’s by considering the impact of ’s on ’s. Then, we consider the resulting scaled LLRs, i.e., ’s, as new statistics to be linearly combined, while being weighted by ’s and distorted by ’s, to make the decision variable at node .

More specifically, first, we optimize in a hypothetical linear detector with its decision variable defined as

(60)

The coefficients resulting from this optimization scale up the more reliable local LLRs, with respect to the ones built under low SNR regimes, to suppress the effect of LEs. We denote the resulting fusion weights by . Then, we use within the structure of the actual detector to optimize to mitigate the impact of MEs. That is, we consider the following linear detector at node

(61)

where contains ’s for while . The vector contains ’s with . In this structure, the elements of ’s are seen as the actual local LLRs that are combined to build the decision variable at node while the combination takes into account the joint degrading effect of MEs and LEs.

Based on the material provided in Sec. II-A, the first stage of the proposed optimization is formally stated as

(62)

where

(63)

where . The resulting is then used to realize the second stage of the proposed optimization by solving

(64)

where

(65)

where and . Having and , the detection threshold