I Introduction
Standard optimization methods are computationally demanding when dealing with a large collection of correlated random variables. This is a well-known challenge in designing distributed statistical inference techniques used in a wide range of signal-processing applications such as channel decoding, image processing, spread-spectrum communications, and distributed detection. Alternatively, message-passing algorithms over factor graphs provide a powerful low-complexity approach to characterizing and optimizing the collective impact of those variables on the desired system performance, see e.g.,
[22, 9, 5]. Consequently, a better understanding of the statistical behavior of message-passing algorithms leads to statistical inference systems with better performance. Two widely-used message-passing algorithms are the so-called sum-product and max-product algorithms. We have analyzed the behavior of the sum-product algorithm, a.k.a. the belief propagation algorithm, in [2]. Our main focus in this paper is on the max-product algorithm, which is an iterative method for approximately solving the problem of maximum a posteriori (MAP) estimation [23]. We analyze the behavior of this algorithm in a distributed detection scenario where every network node estimates a binary-valued random variable based on noisy observations collected throughout the entire network. The correlations between the random variables are modeled by a pairwise Markov random field (MRF) [22] whose structure fits well into pairwise interactions between the nodes in an ad-hoc network configuration. An MRF is an undirected graph where vertices correspond to the random variables of interest and edges represent the correlations between them. By using the max-product algorithm, the estimation problem concerned is decomposed into a number of small optimizations performed locally at each node based on information provided by other nodes in the network via one-hop communications per iteration.
We show that the max-product algorithm works as a linear data-fusion process. Linear fusion schemes are commonly used in distributed detection systems to achieve near-optimal performance with low implementation complexity, see e.g., [17, 18, 1]. Therefore, the knowledge already developed in distributed detection through inspecting linear fusion schemes can be used to better understand the behavior of the max-product algorithm. The proposed analysis is supported by a strong connection between the sum-product and max-product operations. In particular, we show that, in the distributed detection scenario concerned, the behavior of the max-product algorithm is very similar to that of the sum-product algorithm and that the decision variables built by the max-product operation are linear combinations of the local likelihoods in the network, a behavior we have already observed in the sum-product algorithm [2]. By using this linearity, we formulate the detection performance in closed form and propose a distributed optimization framework for the system.
Analyzing the statistical behavior of message-passing algorithms is challenging in general. Numerous works in the literature use some sort of approximation to offer a deeper insight into the behavior of those algorithms or to propose better inference methods. In [4, 6, 7] the use of an approximate message-passing (AMP) algorithm is discussed assuming that a linear mixing problem is to be solved in the context of sparse signal processing. The AMP is extended in [19, 20] to deal with linear mixing structures observed through nonlinear channels. The analyses provided in those works assume independent identically-distributed (i.i.d.) behaviors in the random variables of interest and in the parameters describing the mixing structure concerned. In the detection scenario considered in this paper, we assume Markovian dependencies between the random variables of interest and assume that the parameters describing the underlying mixing structure are correlated with possibly different probability distributions.
Our work is inspired by the findings in [13] and [24] where the sum-product algorithm is configured to solve the problem of distributed MAP estimation in a cognitive radio network. Due to the nonlinearity of the sum-product algorithm, which makes it difficult to formulate the detection performance obtained, [13] and [24] do not consider the system performance optimization problem. In [2], we argue that fitting a proper factor graph to the statistical behavior of a sensor network and running the sum-product algorithm based on that graph is equivalent to optimizing a linear data-fusion scheme in that network. Accordingly, we have proposed a low-complexity optimization framework in [2], which can be conducted effectively in a distributed setting. In this paper, we extend those arguments by showing that the same optimization framework achieves the optimal performance in a max-product-based distributed detection as well. In particular, we make the following contributions in this paper:

We show that the message-update rule in the max-product algorithm is almost the same as its counterpart in the sum-product algorithm.

We show that when performing a distributed MAP estimation via the max-product algorithm over a network modeled by a pairwise MRF, under certain practical conditions, the decision variables obtained are linear combinations of the local log-likelihood ratios (LLRs) in the network.

We find the probability distribution function of the decision variables in a practical detection scenario and formulate the detection performance in closed form.

We show how to set the detection threshold to achieve a predefined detection performance.

We show that the optimal linear message-passing algorithm in [2] attains the optimal detection performance of the max-product algorithm in the distributed detection scenario concerned.
As in [13, 24], and [2], we clarify our findings by considering a spectrum sensing scheme in a cognitive radio (CR) network. In these networks, the wireless nodes perform spectrum sensing, in bands allocated to the so-called primary users (PUs), in order to discover vacant parts of the radio spectrum and establish communication over those temporarily or spatially-available spectral opportunities [3]. In this context, CRs are considered secondary users (SUs) in the sense that they have to vacate the spectrum, to avoid causing any harmful interference, once the PUs are active.
The rest of the paper is organized as follows. In Section II, we formulate the MAP estimation problem and discuss how to solve it in a network of distributed agents via the sum-product and max-product algorithms. In addition, we illustrate in Section II the connection between the sum-product and max-product operations. Then, we analyze the behavior of the max-product algorithm in Section III to show that it works as a linear fusion scheme. In Section IV, we briefly discuss the use of linear data-fusion in distributed detection along with the proposed optimization framework. Finally, in Section V, we verify our analysis by computer simulations.
II MAP Estimation in a Distributed Setting
We consider the problem of MAP estimation based on a set of noisy observations in a wireless network. In this section, we briefly discuss this process and how it is implemented in a distributed setting by using two well-known parallel message-passing mechanisms, i.e., the max-product and the sum-product algorithms.
II-A Problem Formulation
Let $x = [x_1, \dots, x_N]^T$ denote a vector of $N$ random variables to be estimated given the observations $y = [y_1, \dots, y_N]$, where $y_i$ denotes the samples collected at node $i$ for $i = 1, \dots, N$. The MAP estimation of $x$ is formally stated as
$$\hat{x} = \arg\max_{x} p(x \mid y) \quad (1)$$
We can reconfigure this problem, to make it well-suited for a network of distributed agents, by using the concept of max-marginal distributions, or simply max-marginals. In this manner, the complex global optimization in (1) is broken down into a set of local scalar optimizations in the network, which can be solved in a distributed fashion. The max-marginal distribution of $x_i$ at node $i$ is defined as
$$m_i(x_i) = \kappa \max_{\{x' : x'_i = x_i\}} p(x' \mid y) \quad (2)$$
where $\kappa$ denotes an arbitrary positive normalization constant. Assuming that all the max-marginals are somehow available, if, for each node $i$, the maximum of $m_i(x_i)$ is attained at a unique value, then the MAP configuration is unique and can be obtained by maximizing the corresponding max-marginal at each node [23], i.e.,
$$\hat{x}_i = \arg\max_{x_i} m_i(x_i) \quad (3)$$
In case there is a node at which the maximum of the max-marginal is not attained at a unique value, this approach provides a suboptimal solution to the problem. We will discuss this case in Section IV-C.
According to (3), the MAP estimation problem can be viewed as the problem of finding the max-marginals. Clearly, this is still a challenging task in general, since it requires solving $N$ optimization problems, each over an $N$-dimensional space. The challenge is even greater when the desired estimation is to be realized in a network of distributed devices with limited computational capacity.
The max-product algorithm provides a low-complexity method for calculating the desired max-marginals in a distributed setting. This algorithm is built as a parallel iterative message-passing mechanism which returns the max-marginals for a collection of random variables whose joint a posteriori distribution is described as a Markov random field over a network with a tree-structured graph representation. If the graph contains loops, however, then the outcomes of the max-product algorithm approximate those max-marginals. Such an approximation is shown to perform well in numerous applications, see, e.g., [21].
Alternatively, a good suboptimal solution for (1) is obtained by using the marginal distributions of the random variables of interest instead of their max-marginals. Specifically, $x_i$ can be estimated as
$$\hat{x}_i = \arg\max_{x_i} p(x_i \mid y) \quad (4)$$
where
$$p(x_i \mid y) = \sum_{\{x' : x'_i = x_i\}} p(x' \mid y) \quad (5)$$
denotes the marginal distribution of $x_i$. Since the computational complexity of the summation in (5) grows rapidly with $N$, these marginals are still challenging to obtain if calculated directly. This is where the sum-product algorithm plays an important role in the desired estimation by providing the required marginals via a low-complexity message-passing iteration.
The use of the sum-product algorithm in distributed detection is discussed in [13, 24, 2]. Specifically, we know how to define sum-product messages and how to use them to build proper decision variables for a distributed binary hypothesis test. Moreover, based on the link between linear data-fusion and the sum-product algorithm, we know how to optimize the detection performance of the sum-product algorithm. In this paper, we show how to conduct the MAP estimation discussed by using the max-product algorithm and why it can be viewed as a distributed linear data-fusion method as well. Consequently, we show that when the local observations are correlated Gaussian random variables and their correlations are described by an MRF over a factor graph, the optimal performance of the max-product algorithm is obtained by a distributed linear data-fusion scheme. Moreover, this optimal performance is nearly attained by a linear sum-product algorithm optimized based on the first- and second-order statistics of the local observations in the network.
II-B Parallel Message-Passing
We consider a pairwise MRF defined on an undirected graph $G = (V, E)$ composed of a set of vertices or nodes $V$ and a set of edges $E$. Each node $i \in V$ corresponds to a random variable $x_i$ and each edge $(i, j) \in E$, which connects nodes $i$ and $j$, represents a possible correlation between the random variables $x_i$ and $x_j$. This graph can be used to represent the interdependency of local inference results in a network of distributed agents such as a wireless sensor network. In this network, spatially-distributed nodes exchange information with each other in order to solve a statistical inference problem.
The MRF is used to factorize the a posteriori distribution function into single-variable and pairwise terms, i.e.,
$$p(x \mid y) \propto \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j) \quad (6)$$
where $\propto$ denotes proportionality up to a multiplicative constant. Each single-variable term $\phi_i(x_i)$ captures the impact of the corresponding random variable $x_i$ on the joint distribution whereas each pairwise term $\psi_{ij}(x_i, x_j)$ represents the interdependency of the corresponding pair of random variables $x_i$ and $x_j$, which are connected by an edge in the graph. In our detection scenario, the main goal of each node, say node $i$, is to find its max-marginal a posteriori distribution $m_i(x_i)$. This goal is achieved by the max-product algorithm where the messages sent from node $i$ to node $j$ in the network are built by first multiplying these three factors together: the local inference result at node $i$, which corresponds to $\phi_i(x_i)$; the correlation between $x_i$ and $x_j$, i.e., $\psi_{ij}(x_i, x_j)$; and the product of all messages received from the neighbors of node $i$ except for node $j$. The result is then maximized over all values of $x_i$ to form the message sent to node $j$. More specifically, at the $t$'th iteration, the message from node $i$ to node $j$ is formed as
$$m_{ij}^{(t)}(x_j) = \max_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus j} m_{ki}^{(t-1)}(x_i) \quad (7)$$
where $\mathcal{N}(i) \setminus j$ denotes the set of neighbors of node $i$ except for node $j$. The belief at node $i$ at the $t$'th iteration, denoted $b_i^{(t)}(x_i)$, is formed by multiplying its local inference result by all the messages received from its neighbors, i.e.,
$$b_i^{(t)}(x_i) = \phi_i(x_i) \prod_{k \in \mathcal{N}(i)} m_{ki}^{(t)}(x_i) \quad (8)$$
which is then used to estimate the desired max-marginal distribution, i.e., $m_i(x_i) \approx b_i^{(t)}(x_i)$. It is more convenient to express the belief in logarithmic form as
$$\beta_i^{(t)}(x_i) = \log \phi_i(x_i) + \sum_{k \in \mathcal{N}(i)} \mu_{ki}^{(t)}(x_i) \quad (9)$$
where
$$\mu_{ki}^{(t)}(x_i) = \log m_{ki}^{(t)}(x_i) \quad (10)$$
We have replaced the proportionality by equality in our formulations since the proportionality constant turns into an offset value when the expressions are formulated in logarithmic form and this offset, as will be seen later, can be merged into the detection threshold with no impact on the proposed analysis.
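To make the message and belief computations above concrete, here is a minimal sketch of the max-product iteration on a small binary pairwise MRF; the three-node chain topology and all potential values are hypothetical illustrations rather than quantities from the paper.

```python
import math

# Minimal max-product sketch on a 3-node binary chain MRF. All potentials
# below are hypothetical illustrative values, not quantities from the paper.
nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]
phi = {0: [0.9, 0.1], 1: [0.4, 0.6], 2: [0.3, 0.7]}   # phi[i][x_i]
psi = {e: [[1.5, 0.5], [0.5, 1.5]] for e in edges}     # psi[(i,j)][x_i][x_j]

def neighbors(i):
    return [j for (a, b) in edges
            for j in ((b,) if a == i else (a,) if b == i else ())]

def run_max_product(T=10):
    # Directed messages m[(i, j)][x_j], initialized to all-ones.
    m = {(i, j): [1.0, 1.0] for (a, b) in edges for (i, j) in ((a, b), (b, a))}
    for _ in range(T):
        new = {}
        for (i, j) in m:
            # psi is stored per undirected edge; transpose it for the reverse direction.
            pij = psi[(i, j)] if (i, j) in psi else [list(r) for r in zip(*psi[(j, i)])]
            # Eq. (7): maximize over x_i the product of the local potential, the
            # edge potential, and the incoming messages from neighbors except j.
            new[(i, j)] = [max(phi[i][xi] * pij[xi][xj] *
                               math.prod(m[(k, i)][xi] for k in neighbors(i) if k != j)
                               for xi in (0, 1))
                           for xj in (0, 1)]
        m = new
    # Eq. (8): belief = local potential times all incoming messages.
    beliefs = {i: [phi[i][x] * math.prod(m[(k, i)][x] for k in neighbors(i))
                   for x in (0, 1)] for i in nodes}
    return {i: max((0, 1), key=lambda x, i=i: beliefs[i][x]) for i in nodes}

print(run_max_product())
```

On this tree-structured graph the messages converge after a couple of iterations and each node's belief maximizer coincides with its entry in the global MAP configuration.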
We adopt the commonly-used exponential model to represent the probability measure defined on the MRF, i.e.,
$$\psi_{ij}(x_i, x_j) = e^{\lambda_{ij} x_i x_j} \quad (11)$$
where, when $\lambda_{ij}$ is a constant for all $(i, j) \in E$, this model is the classic Ising model. The use of an exponential form in (6) to model the behavior of correlated random variables is a popular choice since, first, it is supported by the principle of maximum entropy [22] and, second, when the graphical model is built in terms of products of functions, these products turn into additive decompositions in the log domain. The a posteriori probability distribution of $x$ can be stated as
$$p(x \mid y) \propto p(y \mid x)\, p(x) \quad (12)$$
where $p(y \mid x)$ is factorized assuming independence in the local observations conditioned on $x$, i.e.,
$$p(y \mid x) = \prod_{i \in V} p(y_i \mid x) \quad (13)$$
This model is used in [24, 13, 15, 14] based on the fact that $p(y_i \mid x)$ is evaluated at node $i$, when calculating the log-likelihood ratio (LLR), solely based on the status of $x_i$. Consequently, $x_i$ in this detection structure works as a sufficient statistic, i.e., $p(y_i \mid x) = p(y_i \mid x_i)$. From (11), (12), and (13) we have
$$p(x \mid y) \propto \prod_{i \in V} p(y_i \mid x_i) \prod_{(i,j) \in E} e^{\lambda_{ij} x_i x_j} \quad (14)$$
The proportionality sign in (14) covers the normalization constant and, since it does not affect the proposed analysis, we have left it out for all nodes. By comparing (14) to (6), we obtain
$$\phi_i(x_i) = p(y_i \mid x_i) \quad (15)$$
$$\psi_{ij}(x_i, x_j) = e^{\lambda_{ij} x_i x_j} \quad (16)$$
Assuming Gaussian observations at the nodes, we have
$$p(y_i \mid x_i) = \frac{1}{(2\pi\sigma_i^2)^{K/2}} \exp\!\left(-\frac{\|y_i - u_i s_i\|^2}{2\sigma_i^2}\right) \quad (17)$$
where $u_i \in \{0, 1\}$ indicates the state of $x_i$. In this model, the variable of interest is disturbed by zero-mean additive Gaussian noise. Specifically, for $k = 1, \dots, K$, we have $y_i(k) = u_i s_i(k) + w_i(k)$ where $w_i(k) \sim \mathcal{N}(0, \sigma_i^2)$. In vector format we have
$$y_i = u_i s_i + w_i \quad (18)$$
where $s_i = [s_i(1), \dots, s_i(K)]^T$ denotes a deterministic but unknown sequence of PU signal samples received at node $i$ and $w_i$ denotes the noise samples at node $i$. Hence, $y_i \mid x_i \sim \mathcal{N}(u_i s_i, \sigma_i^2 I)$.
The state of $x_i$ determines whether the spectrum band sensed by node $i$ is free to be used for data communication by the secondary users. Specifically, if $u_i = 0$ then there is no PU signal received at node $i$ and the corresponding frequency band is free. Otherwise, the band is occupied and cannot be used by the SUs. Note that $s_i$ may contain a superposition of signals received at node $i$ from multiple PUs operating on the same frequency band. Our spectrum sensing scenario is modeled by a binary hypothesis test where $x_i \in \{-1, +1\}$ for all $i$ while $u_i = (1 + x_i)/2$ maps the state of $x_i$ to the occupancy state of the radio spectrum sensed by node $i$.
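As a concrete illustration of this observation model, the sketch below draws noisy samples under both hypotheses; the signal sequence `s`, the noise level `sigma`, and the sample count `K` are hypothetical placeholders.

```python
import random

# Hypothetical parameters for the per-node observation model
# y_i(k) = u_i * s_i(k) + w_i(k), with u_i in {0, 1}.
K = 1000             # number of samples per sensing window (illustrative)
sigma = 1.0          # noise standard deviation at node i (illustrative)
s = [0.5] * K        # deterministic, but in practice unknown, PU signal samples

def observe(u, rng):
    """Generate K noisy samples; u = 0 means band free, u = 1 means PU active."""
    return [u * s[k] + rng.gauss(0.0, sigma) for k in range(K)]

rng = random.Random(0)
y0 = observe(0, rng)   # samples under the noise-only hypothesis
y1 = observe(1, rng)   # samples under the signal-plus-noise hypothesis
e0 = sum(v * v for v in y0) / K
e1 = sum(v * v for v in y1) / K
# The average energy per sample is larger when the PU is active.
print(round(e0, 3), round(e1, 3))
```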
The max-marginals obtained by the max-product algorithm are used to conduct a distributed MAP estimation as in (3). This test is conducted at node $i$ by evaluating what we refer to as the max-log-likelihood ratio (max-LLR) of $x_i$, defined as
$$\Lambda_i = \log \frac{m_i(x_i = 1)}{m_i(x_i = -1)} \quad (19)$$
That is, $\hat{x}_i = 1$ if $\Lambda_i \geq 0$ and $\hat{x}_i = -1$ otherwise.
To realize this test based on the max-product algorithm, after $T$ iterations, the approximate max-LLR is built and compared, as a decision variable, to a predefined threshold $\tau_i$, i.e.,
$$\lambda_i^{(T)} = \beta_i^{(T)}(1) - \beta_i^{(T)}(-1) \;\gtrless\; \tau_i \quad (20)$$
which means that $\hat{x}_i = 1$ if $\lambda_i^{(T)} \geq \tau_i$ and $\hat{x}_i = -1$ otherwise. Note that $\lambda_i^{(T)} \approx \Lambda_i$. To see the impact of the messages on the decision variable, we express $\lambda_i^{(T)}$ as
$$\lambda_i^{(T)} = \gamma_i + \sum_{k \in \mathcal{N}(i)} \nu_{ki}^{(T)} \quad (21)$$
where $\gamma_i$ denotes the local LLR obtained at node $i$, i.e.,
$$\gamma_i = \log \frac{p(y_i \mid x_i = 1)}{p(y_i \mid x_i = -1)} \quad (22)$$
while $\nu_{ki}^{(T)}$ denotes the LLR of the messages at iteration $T$, i.e.,
$$\nu_{ki}^{(T)} = \mu_{ki}^{(T)}(1) - \mu_{ki}^{(T)}(-1) \quad (23)$$
By using the signal model in (17), we obtain
$$\gamma_i = \frac{1}{\sigma_i^2} s_i^T y_i - \frac{\|s_i\|^2}{2\sigma_i^2} \quad (24)$$
Consequently, it is clear that, given $x_i$, the local LLR behaves as a Gaussian random variable. For simplicity, we assume the noise variance $\sigma_i^2$ is known. Eq. (24) indicates a matched-filtering process, a.k.a. coherent detection [11], performed locally at each sensing node. In practice, due to the lack of knowledge about the PU signal ($s_i$ is unknown) and for ease of implementation, energy detection is used as the local sensing scheme. By using energy detection, the local sensing outcome is formed as
(25)
where the offset in (25) is set in terms of the noise statistics, i.e.,
(26)
This approach is practically appealing in the sense that the offset can be calculated simply in terms of the noise level, without the need for the channel gain or the transmitted power level. Assuming the number of signal samples is large enough [17, 18, 1], the central limit theorem states that, given $x_i$, the sensor outcome in (25) follows a Gaussian distribution.
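The following sketch illustrates an energy detector of this kind, assuming a known noise level; the signal amplitude, the sample count, and the zero-mean offset convention are illustrative assumptions rather than the paper's exact formulation.

```python
import random

K = 2000        # number of samples (illustrative)
sigma = 1.0     # known noise standard deviation (assumed available at the node)

def energy_statistic(y):
    """Average sample energy, offset by the expected noise energy so that the
    statistic is approximately zero-mean when only noise is present.
    This offset convention is an illustrative assumption."""
    return sum(v * v for v in y) / len(y) - sigma ** 2

rng = random.Random(1)
noise_only = [rng.gauss(0.0, sigma) for _ in range(K)]
with_signal = [0.4 + rng.gauss(0.0, sigma) for _ in range(K)]  # hypothetical PU signal
t0 = energy_statistic(noise_only)
t1 = energy_statistic(with_signal)
# By the CLT, both statistics are approximately Gaussian for large K.
print(round(t0, 3), round(t1, 3))
```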
The sum-product algorithm has a similar structure except that the $\max$ operator in (7) is replaced by a summation. This message-update rule is given by
$$m_{ij}^{(t)}(x_j) = \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}(i) \setminus j} m_{ki}^{(t-1)}(x_i) \quad (27)$$
The beliefs made by the sum-product algorithm are calculated by (8) with the max-product messages replaced by the sum-product messages in (27). The sum-product algorithm approximates the marginal distributions of the random variables of interest. The detection process is conducted by comparing the resulting decision variable to a predefined threshold, i.e.,
(28)
where $\tau_i$ denotes the detection threshold. Similar to (21), we have
(29)
where the first term is the local LLR while the second term collects the LLRs of the received messages. Through some algebraic manipulations we obtain
(30)
where the nonlinear function denotes the transformation applied to the received messages at node $i$ to build the message sent to node $j$.
In [13, 24], a simple learning process is used to adapt the MRF parameters $\lambda_{ij}$ based on the detection outcomes. In this method, the $\lambda_{ij}$'s are set by an empirical estimation of the correlations between the neighboring nodes in a window of time slots, i.e.,
(31)
where the constant $\alpha$, referred to as the learning factor in this paper, scales the estimate, the indicator function returns 1 if its argument is true and 0 otherwise, and $M$ denotes the number of samples used in this training process.
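A possible form of this empirical learning step is sketched below; the exact update rule, the learning factor `alpha`, and the window length `M` are hypothetical stand-ins for the quantities in (31).

```python
import random

# Hypothetical empirical learning of the MRF coupling parameters: lambda_ij is
# set from the fraction of recent time slots in which neighboring nodes i and j
# made the same decision. The learning factor alpha, the window length M, and
# the exact form of the update are illustrative assumptions.
alpha = 1.0   # learning factor (hypothetical)
M = 200       # training window length (hypothetical)

def learn_lambda(decisions_i, decisions_j):
    """Estimate the coupling from two aligned decision histories in {-1, +1}."""
    agree = sum(1 for a, b in zip(decisions_i, decisions_j) if a == b)
    return alpha * (2.0 * agree / len(decisions_i) - 1.0)  # in [-alpha, alpha]

rng = random.Random(2)
# Two strongly correlated decision streams for illustration.
d_i = [rng.choice([-1, 1]) for _ in range(M)]
d_j = [x if rng.random() < 0.9 else -x for x in d_i]  # agrees roughly 90% of the time
print(learn_lambda(d_i, d_j))
```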
In [2], we argue that the MRF parameters can be used as optimization parameters to enhance the performance of the sum-product algorithm in distributed detection. We use the same argument in this paper considering the max-product algorithm. We show that the same optimization framework can be applied to optimize the system performance when the underlying message-passing mechanism is realized as a max-product algorithm.
In order to properly determine the detection threshold and to characterize the system performance in terms of commonly-used performance metrics, we need to formulate the statistical behavior of the decision variables. This appears to be a challenging task due to the apparent nonlinearity in both of the message-passing operations discussed. We discuss how to tackle this challenge in the following.
II-C Linear Fusion via Message Passing
It is now clear that the nonlinearity in the sum-product messages stems from the transformation in (30). We replace this function with a linear transformation and then optimize the resulting linear message-passing algorithm. Specifically, we use the first-order Taylor series expansion of that transformation to linearize the message-update rule as [2]
(32)
where which is obtained by using , where .
Then, by applying this linear sum-product iteration to (29), we obtain an approximate expression for the decision variable at node $i$, i.e.,
(33)
which shows that the sum-product algorithm builds the decision variable at node $i$ (approximately) as a linear combination of all the local LLRs obtained throughout the entire network. Consequently, since the local LLRs are normal random variables given the states of the $x_i$'s, the decision variable behaves as a normal random variable as well. We can express this linear fusion in compact form, where a fusion weight determines the impact of each local LLR on each decision variable.
Viewing the sum-product algorithm as a linear fusion scheme allows us to formulate the system performance in terms of the false-alarm and detection probabilities. By using these metrics we formulate the system performance optimization effectively and obtain a better detection scheme through optimizing the fusion coefficients in (33) along with the detection thresholds. In Section III, we show that the max-product algorithm is a linear fusion scheme as well. This observation leads to two important results. First, assuming the local LLRs to be Gaussian random variables, the decision variables generated by the max-product algorithm are all Gaussian random variables. Second, the optimal linear sum-product algorithm proposed in [2] serves as the optimal max-product algorithm for distributed detection as well.
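The compact linear-fusion view can be sketched as a simple weighted sum; the weight matrix and LLR values below are arbitrary illustrative numbers, not quantities derived in the text.

```python
# Hypothetical linear data-fusion view of the message-passing decision variable:
# each node's decision statistic is a weighted sum of all local LLRs.
llrs = [1.2, -0.4, 0.8, 2.1]        # local LLRs (illustrative values)
weights = [                          # weights[i][k]: impact of node k's LLR on node i
    [1.0, 0.5, 0.2, 0.1],
    [0.5, 1.0, 0.5, 0.2],
    [0.2, 0.5, 1.0, 0.5],
    [0.1, 0.2, 0.5, 1.0],
]
decision = [sum(w * g for w, g in zip(row, llrs)) for row in weights]
print([round(d, 2) for d in decision])
```

Each decision statistic is then compared to its node's threshold, exactly as in the threshold tests described in the text.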
To see the connection between the sum-product and max-product operations, let us take a closer look at the messages in the sum-product algorithm, which are given by
(34)
which shows that the messages received at node $i$, combined with the likelihoods, pass through the following transformation to form the message sent to node $j$
(35)
As shown in Fig. 1, due to the highly selective nature of the exponential function, this transformation behaves like a $\max$ operator. Specifically, we see that $\max(a, b)$ provides a piecewise-linear approximation for $\log(e^a + e^b)$. That is,
$$\log\left(e^a + e^b\right) \approx \max(a, b) \quad (36)$$
Consequently, we can approximate the message-update rule in the sum-product algorithm as
(37)
which clearly shows that the message-update rule in the sum-product algorithm is almost the same as its counterpart in the max-product algorithm.
Therefore, we expect the max-product algorithm to work as a linear fusion as well. More specifically, we expect its decision variable to be a weighted combination of the local LLRs, with weights determining the impact of each LLR on each decision variable. In the next section, we formally establish that the max-product algorithm is a distributed linear fusion scheme.
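The closeness of the two update rules boils down to the fact that a log-sum-exp tracks a max; the short numerical check below illustrates the gap, which is bounded by log 2 for two arguments and vanishes as the arguments separate.

```python
import math

# The sum-product update aggregates terms through a log-sum-exp, while the
# max-product update keeps only the largest term. Because the exponential is
# highly selective, log(e^a + e^b) tracks max(a, b) closely whenever the two
# arguments are not nearly equal.
def log_sum_exp(a, b):
    m = max(a, b)
    return m + math.log(math.exp(a - m) + math.exp(b - m))  # numerically stable

for a, b in [(5.0, 1.0), (3.0, 2.9), (-4.0, 2.0)]:
    gap = log_sum_exp(a, b) - max(a, b)
    print(a, b, round(gap, 4))
# The gap equals log(1 + exp(-|a - b|)) <= log(2), vanishing as |a - b| grows.
```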
III Analysis of the Max-Product Operation
Performance of a binary hypothesis test is commonly measured by two metrics: the probability of detection and the probability of false alarm. These performance metrics are calculated based on the statistical behavior of the decision variable. Specifically, at node $i$ we have
$$P_{fa,i} = \Pr\left\{\lambda_i^{(T)} \geq \tau_i \mid x_i = -1\right\} \quad (38)$$
$$P_{d,i} = \Pr\left\{\lambda_i^{(T)} \geq \tau_i \mid x_i = 1\right\} \quad (39)$$
where $P_{fa,i}$ denotes the false-alarm probability at node $i$ and $P_{d,i}$ denotes the corresponding detection probability.
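Since the decision variable turns out to be conditionally Gaussian, both metrics reduce to Gaussian tail probabilities; the sketch below evaluates them for hypothetical conditional means, variance, and threshold.

```python
import math

def Q(z):
    """Gaussian tail probability Q(z) = Pr{N(0,1) > z}."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))

# Hypothetical conditional statistics of the decision variable at node i:
# mean mu0 under x_i = -1, mean mu1 under x_i = +1, common std sigma.
mu0, mu1, sigma = 0.0, 2.0, 1.0
tau = 1.0  # detection threshold (hypothetical)

p_fa = Q((tau - mu0) / sigma)  # eq. (38): false-alarm probability
p_d = Q((tau - mu1) / sigma)   # eq. (39): detection probability
print(round(p_fa, 4), round(p_d, 4))
```

Inverting the first relation for `tau` is how a predefined false-alarm rate can be mapped to a detection threshold.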
Hence, we need to find the probability distribution of the decision variable to measure the system performance analytically. To realize this goal, we calculate the outcome of each iteration and show that, even though the iteration process involves nonlinear transformations, its outcome is a linear combination of the local LLRs. The analysis of the max-product process provided in this section does not limit the node variables to be binary-valued. We only use binary-valued variables when evaluating the result of the proposed analysis.
Recall that the local observation at node $i$ is represented by $\phi_i(x_i)$ while the correlation between the observations at nodes $i$ and $j$ is captured by $\psi_{ij}(x_i, x_j)$. Since in the beginning there are no received messages, each node builds its message only based on its own local observation and the correlation of its random variable with those of the neighboring nodes. That is, at $t = 0$ the messages are created based on
(40) 
where is found by solving which leads to
(41) 
where, by using , we have
(42)  
(43) 
Consequently, at the beginning of the iteration, the message sent from node $i$ to node $j$ is a linear function of two factors: i) the local LLR at node $i$, and ii) the realization of the random variable concerned at node $j$. We view this as the outcome of a maximum-likelihood estimation (MLE) process at node $i$. This estimation provides a point at which the likelihood functions at node $i$ are evaluated to build the message sent to node $j$. Note the distinction between the MLE performed locally at each node and the MAP estimation discussed earlier.
As we show in the following, the linear behavior observed in (41) propagates throughout the entire iteration. Specifically, at the $t$'th iteration, the MLE results at node $i$, which build the messages sent to node $j$, are in the form of linear combinations of $x_j$ and the local LLRs obtained at nodes located within less than $t$ hops from node $i$. Consequently, given the states of the node variables, the decision variable at node $i$ is built by a linear fusion of the local LLRs obtained at node $i$ and at all the nodes located within less than $t$ hops from node $i$. In other words, the hypothesis test result obtained by the max-product algorithm is equivalent to the one obtained by a distributed linear data-fusion scheme whose scope is increased by every iteration.
We now clarify this observation by solving the iterative optimizations in (7). For $t = 1$, we have
(44) 
which leads to
(45) 
where is found by solving
(46) 
which leads to
(47) 
where
(48)  
(49) 
Consequently, the message is built as a linear function of $x_j$ plus a linear combination of the local LLRs obtained at node $i$ and at its one-hop neighbors. More specifically, from (47), (48), and (49) we see how the MLE result is formed as such a linear combination. In addition, note that one of its components is a constant whereas the other is a random variable which captures the statistical behavior of the local observations.
Through iterative calculations for , we see that the MLE result has similar components, i.e., a linear function of plus a linear combination of the local LLRs obtained within less than hops from node , i.e.,
(50) 
where is a constant and can be expressed as a linear combination of the local likelihood values, i.e.,
(51) 
where each weight denotes the contribution of the corresponding local LLR in this linear combination. Moreover, the weight is zero if the corresponding node is located more than $t$ hops away from node $i$. Therefore, by increasing $t$, we expand the maximum radius around node $i$ within which the local likelihoods are combined. Consequently, to include all the local LLRs in the fusion process, the maximum number of iterations does not need to be greater than the length of the longest path in the network graph. This justifies the observation in [13] where the desired detection performance is achieved by only a few iterations.
It is worth noting that one does not need to perform many iterative calculations to see the linearity of the final result. Starting from the expression in (40), we see that the term inside the max operator is concave and quadratic. Hence, its partial derivative leads to a linear equation which, in turn, leads to a linear expression for the maximizer in terms of the local LLR and the neighboring variable. Moreover, in order to build the message at one iteration from that of the previous iteration, one needs to add, inside the max operator in (7), some terms with similar concave quadratic attributes. The only difference is that these new terms have as their arguments some linear expressions with positive coefficients. Note that the incoming message in (7) is calculated by (50). Since these linear transformations preserve the concave quadratic nature of the whole expression inside the max operator, the maximum in (7) is found by solving a linear equation, which leads to a linear expression.
Based on this observation, we propose a set of formulas to recursively calculate . This calculation is realized by using a quadratic form to represent the messages, which can be expressed as
(52) 
The partial derivative of the messages is then a linear expression as
(53) 
Now, by solving the following equation
(54) 
we link and to and as
(55)  
(56) 
Hence, by using and we calculate and which are then used to obtain and . This recursive calculation starts from and in (42) and (43).
The fusion weights in (51) are determined in terms of ’s. As we saw in (43) and (49), ’s are determined in terms of ’s which capture the interdependencies of the random variables in the MRF. We will further discuss this point and its implications on the system design later.
Eqs. (55) and (56) indicate that the higher the received SNR level at node $i$, the lower the impact of other nodes on the data sent from node $i$ to node $j$. Hence, node $i$ relies more on its own local observation when it is operating under good SNR conditions. Otherwise, it relies more on the data received from its neighbors. In addition, according to (48) and (49), each LLR received from a neighbor is scaled by the SNR level perceived at that neighbor. Consequently, the message-update rule in (7) works like a maximal-ratio combining (MRC) scheme.
Since the outcomes of the MLEs are derived in closed form, we can now see their impact on the binary hypothesis test. To this end, we show that is a linear combination of the local LLRs. First note that for all we have
(57)  
(58) 
which are linear expressions in , and . Then, recall that . Consequently, in (23) contains expressions, in the form of (57) and (58), which are linear functions of ’s. To clarify this observation, we focus on here. A similar argument can be made for . We see that,