The Max-Product Algorithm Viewed as Linear Data-Fusion: A Distributed Detection Scenario

09/20/2019 · Younes Abdi, et al. · University of Jyväskylä

In this paper, we disclose the statistical behavior of the max-product algorithm configured to solve a maximum a posteriori (MAP) estimation problem in a network of distributed agents. Specifically, we first build a distributed hypothesis test conducted by a max-product iteration over a binary-valued pairwise Markov random field and show that the decision variables obtained are linear combinations of the local log-likelihood ratios observed in the network. Then, we use these linear combinations to formulate the system performance in terms of the false-alarm and detection probabilities. Our findings indicate that, in the hypothesis test concerned, the optimal performance of the max-product algorithm is obtained by an optimal linear data-fusion scheme and the behavior of the max-product algorithm is very similar to the behavior of the sum-product algorithm. Consequently, we demonstrate that the optimal performance of the max-product iteration is closely achieved via a linear version of the sum-product algorithm which is optimized based on statistics received at each node from its one-hop neighbors. Finally, we verify our observations via computer simulations.


I Introduction

Standard optimization methods are computationally demanding when dealing with a large collection of correlated random variables. This is a well-known challenge in designing distributed statistical inference techniques used in a wide range of signal-processing applications such as channel decoding, image processing, spread-spectrum communications, and distributed detection. Alternatively, message-passing algorithms over factor graphs provide a powerful low-complexity approach to characterizing and optimizing the collective impact of those variables on the desired system performance, see, e.g., [22, 9, 5]. Consequently, a better understanding of the statistical behavior of message-passing algorithms leads to statistical inference systems with better performance. Two widely-used message-passing algorithms are the so-called sum-product and max-product algorithms. We have analyzed the behavior of the sum-product algorithm, a.k.a. the belief propagation algorithm, in [2].

Our main focus in this paper is on the max-product algorithm which is an iterative method for approximately solving the problem of maximum a posteriori (MAP) estimation [23]. We analyze the behavior of this algorithm in a distributed detection scenario where every network node estimates a binary-valued random variable based on noisy observations collected throughout the entire network. The correlations between the random variables are modeled by a pairwise Markov random field (MRF) [22] whose structure fits well into pairwise interactions between the nodes in an ad-hoc network configuration. An MRF is an undirected graph where vertices correspond to the random variables of interest and edges represent the correlations between them. By using the max-product algorithm, the estimation problem concerned is decomposed into a number of small optimizations performed locally at each node based on information provided by other nodes in the network via one-hop communications per iteration.

We show that the max-product algorithm works as a linear data-fusion process. Linear fusion schemes are commonly used in distributed detection systems to achieve near-optimal performance with low implementation complexity, see, e.g., [17, 18, 1]. Therefore, we show that the knowledge already developed in distributed detection through inspecting linear fusion schemes can be used to better understand the behavior of the max-product algorithm. The proposed analysis is supported by a strong connection between the sum-product and max-product operations. In particular, we show that, in the distributed detection scenario concerned, the behavior of the max-product algorithm is very similar to the behavior of the sum-product algorithm and that the decision variables built by the max-product operation are linear combinations of the local likelihoods in the network, a behavior we have already observed in the sum-product algorithm [2]. By using this linearity, we formulate the detection performance in closed form and propose a distributed optimization framework for the system.

Analyzing the statistical behavior of message-passing algorithms is challenging in general. Numerous works in the literature use some sort of approximation to offer a deeper insight into the behavior of those algorithms or to propose better inference methods. In [4, 6, 7], the use of an approximate message-passing (AMP) algorithm is discussed assuming that a linear mixing problem is to be solved in the context of sparse signal processing. The AMP is extended in [19, 20] to deal with linear mixing structures observed through nonlinear channels. The analyses provided in those works assume independent identically-distributed (i.i.d.) behaviors in the random variables of interest and in the parameters describing the mixing structure concerned. In the detection scenario considered in this paper, we assume Markovian dependencies between the random variables of interest and assume that the parameters describing the underlying mixing structure are correlated with possibly different probability distributions.

Our work is inspired by the findings in [13] and [24] where the sum-product algorithm is configured to solve the problem of distributed MAP estimation in a cognitive radio network. Due to the nonlinearity of the sum-product algorithm, which makes it difficult to formulate the detection performance obtained, [13] and [24] do not consider the system performance optimization problem. In [2], we argue that fitting a proper factor graph to the statistical behavior of a sensor network and running the sum-product algorithm based on that graph is equivalent to optimizing a linear data-fusion scheme in that network. Accordingly, we have proposed a low-complexity optimization framework in [2], which can be conducted effectively in a distributed setting. In this paper, we extend those arguments by showing that the same optimization framework achieves the optimal performance in a max-product-based distributed detection as well. In particular, we make the following contributions in this paper:

  • We show that the message-update rule in the max-product algorithm is almost the same as its counterpart in the sum-product algorithm.

  • We show that when performing a distributed MAP estimation via the max-product algorithm over a network modeled by a pairwise MRF, under certain practical conditions, the decision variables obtained are linear combinations of the local log-likelihood ratios (LLRs) in the network.

  • We find the probability distribution function of the decision variables in a practical detection scenario and formulate the detection performance in closed form.

  • We show how to set the detection threshold to achieve a predefined detection performance.

  • We show that the optimal linear message-passing algorithm in [2] attains the optimal detection performance of the max-product algorithm in the distributed detection scenario concerned.

As in [13, 24], and [2], we clarify our findings by considering a spectrum sensing scheme in a cognitive radio (CR) network. In these networks, the wireless nodes perform spectrum sensing, in bands allocated to the so-called primary users (PU), in order to discover vacant parts of the radio spectrum and establish communication on those temporarily- or spatially-available spectral opportunities [3]. In this context, CRs are considered secondary users (SU) in the sense that they have to vacate the spectrum, to avoid making any harmful interference, once the PUs are active.

The rest of the paper is organized as follows. In Section II, we formulate the MAP estimation problem and discuss how to solve it in a network of distributed agents via the sum-product and max-product algorithms. In addition, we illustrate in Section II the connection between the sum-product and max-product operations. Then, we analyze the behavior of the max-product algorithm in Section III to show that it works as a linear fusion scheme. In Section IV, we briefly discuss the use of linear data-fusion in distributed detection along with the proposed optimization framework. Finally, in Section V, we verify our analysis by computer simulations.

II MAP Estimation in a Distributed Setting

We consider the problem of MAP estimation based on a set of noisy observations in a wireless network. In this section, we briefly discuss this process and how it is implemented in a distributed setting by using two well-known parallel message-passing mechanisms, i.e., the max-product and the sum-product algorithms.

II-A Problem Formulation

Let $\mathbf{x} \triangleq [x_1, x_2, \dots, x_N]^T$ denote a vector of $N$ random variables to be estimated given the observations $\mathbf{y} \triangleq \{\mathbf{y}_1, \dots, \mathbf{y}_N\}$, where $\mathbf{y}_i$ denotes the samples collected at node $i$ for $i = 1, \dots, N$. The MAP estimation of $\mathbf{x}$ is formally stated as

$\hat{\mathbf{x}}_{\text{MAP}} = \arg\max_{\mathbf{x}}\, p(\mathbf{x} \mid \mathbf{y}).$  (1)

We can reconfigure this problem, to be well-suited for a network of distributed agents, by using the concept of max-marginal distributions or simply max-marginals. In this manner, the complex global optimization in (1) is broken down into a set of local scalar optimizations in the network, which can be solved in a distributed fashion. The max-marginal distribution of $x_i$ at node $i$ is defined as

$q_i(x_i) \triangleq \kappa \max_{\{\mathbf{x}' :\, x'_i = x_i\}} p(\mathbf{x}' \mid \mathbf{y}),$  (2)

where $\kappa$ denotes an arbitrary positive normalization constant. Assuming that all the max-marginals are somehow available, if, for each node $i$, the maximum of $q_i(x_i)$ is attained at a unique value, then the MAP configuration is unique and can be obtained by maximizing the corresponding max-marginal at each node [23], i.e.,

$\hat{x}_i = \arg\max_{x_i} q_i(x_i).$  (3)

In case there is a node at which the maximum of $q_i(x_i)$ is not attained at a unique value, this approach provides a sub-optimal solution to the problem. We will discuss this case in Section IV-C.

According to (3), the MAP estimation problem can be considered as a problem of finding the max-marginals. Clearly, this is still a challenging task, in general, since it requires dealing with $N$ optimization problems, each defined over the $N$-dimensional configuration space. The challenge is even greater when the desired estimation is to be realized in a network of distributed devices with limited computational capacity.

The max-product algorithm provides a low-complexity method for calculating the desired max-marginals in a distributed setting. This algorithm is built as a parallel iterative message-passing mechanism which returns the max-marginals for a collection of random variables with their joint a posteriori distribution described as a Markov random field over a network with a tree-structured graph representation. If the graph contains loops, however, then the outcomes of the max-product algorithm approximate those max-marginals. Such an approximation is shown to perform well in numerous applications, see, e.g., [21].
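As a quick sanity check of this property, the following sketch (our own illustration, not code from the paper) compares brute-force max-marginals against max-product beliefs on a hypothetical three-node chain; the graph size, potentials, and coupling value are arbitrary assumptions:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
N, J = 3, 0.8
phi = rng.uniform(0.5, 2.0, size=(N, 2))   # node potentials, columns: x_i = -1, +1
idx = lambda x: (x + 1) // 2               # map {-1, +1} -> column index {0, 1}
psi = lambda xi, xj: np.exp(J * xi * xj)   # Ising-type edge potential

def joint(xs):  # unnormalized joint distribution of one configuration
    p = np.prod([phi[i, idx(x)] for i, x in enumerate(xs)])
    return p * np.prod([psi(xs[i], xs[i + 1]) for i in range(N - 1)])

# Brute force: q_i(x_i) = max over all configurations that fix x_i.
q = np.zeros((N, 2))
for xs in product([-1, 1], repeat=N):
    for i, x in enumerate(xs):
        q[i, idx(x)] = max(q[i, idx(x)], joint(xs))

# Max-product iteration on the chain's directed edges.
msgs = {(i, j): np.ones(2) for i in range(N) for j in (i - 1, i + 1) if 0 <= j < N}
def prod_msgs(i, exclude=None):  # product of messages arriving at node i
    m = np.ones(2)
    for k in (i - 1, i + 1):
        if 0 <= k < N and k != exclude:
            m = m * msgs[(k, i)]
    return m

for _ in range(N):  # on a tree, a few iterations reach the fixed point
    msgs = {(i, j): np.array([max(phi[i, idx(xi)] * psi(xi, xj) * prod_msgs(i, j)[idx(xi)]
                                  for xi in (-1, 1)) for xj in (-1, 1)])
            for (i, j) in msgs}

beliefs = np.array([phi[i] * prod_msgs(i) for i in range(N)])
print(np.allclose(beliefs, q))  # True: beliefs equal the max-marginals on a tree
```

On a loopy graph, the same routine would only return an approximation of the max-marginals.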

Alternatively, a good suboptimal solution for (1) is obtained by using the marginal distributions of the random variables of interest instead of their max-marginals. Specifically, $x_i$ can be estimated as

$\hat{x}_i = \arg\max_{x_i}\, p(x_i \mid \mathbf{y}),$  (4)

where

$p(x_i \mid \mathbf{y}) = \sum_{\mathbf{x} \setminus x_i} p(\mathbf{x} \mid \mathbf{y})$  (5)

denotes the marginal distribution of $x_i$. Since the computational complexity of the summation in (5) grows exponentially with $N$, these marginals are still challenging to obtain if calculated directly. This is where the sum-product algorithm plays an important role in the desired estimation by providing the required marginals via a low-complexity message-passing iteration.

The use of the sum-product algorithm in distributed detection is discussed in [13, 24, 2]. Specifically, we know how to define sum-product messages and how to use them to build proper decision variables to be used in a distributed binary hypothesis test. Moreover, based on the link between the linear data-fusion and the sum-product algorithm, we know how to optimize the detection performance of the sum-product algorithm. In this paper, we show how to conduct the MAP estimation discussed by using the max-product algorithm and why it can be viewed as a distributed linear data-fusion method as well. Consequently, we show that when the local observations are correlated Gaussian random variables and their correlations are described by an MRF over a factor graph, the optimal performance of the max-product algorithm is obtained by a distributed linear data-fusion scheme. Moreover, this optimal performance is nearly attained by a linear sum-product algorithm optimized based on the first- and second-order statistics of the local observations in the network.

II-B Parallel Message-Passing

We consider a pairwise MRF defined on an undirected graph $G = (V, E)$ composed of a set of vertices or nodes $V = \{1, \dots, N\}$ and a set of edges $E$. Each node $i \in V$ corresponds to a random variable $x_i$ and each edge $(i, j) \in E$, which connects nodes $i$ and $j$, represents a possible correlation between random variables $x_i$ and $x_j$. This graph can be used to represent the interdependency of local inference results in a network of distributed agents such as a wireless sensor network. In this network, spatially-distributed nodes exchange information with each other in order to solve a statistical inference problem.

The MRF is used to factorize the a posteriori distribution function into single-variable and pairwise terms, i.e.,

$p(\mathbf{x} \mid \mathbf{y}) \propto \prod_{i \in V} \phi_i(x_i) \prod_{(i,j) \in E} \psi_{ij}(x_i, x_j),$  (6)

where $\propto$ denotes proportionality up to a multiplicative constant. Each single-variable term $\phi_i(x_i)$ captures the impact of the corresponding random variable $x_i$ on the joint distribution, whereas each pairwise term $\psi_{ij}(x_i, x_j)$ represents the interdependency of the corresponding pair of random variables $x_i$ and $x_j$, which are connected by an edge in the graph.

In our detection scenario, the main goal of each node, say node $i$, is to find its max-marginal a posteriori distribution $q_i(x_i)$. This goal is achieved by the max-product algorithm, where the messages sent from node $i$ to node $j$ in the network are built first by multiplying these three factors together: the local inference result at node $i$, which corresponds to $\phi_i(x_i)$; the correlation between $x_i$ and $x_j$, i.e., $\psi_{ij}(x_i, x_j)$; and the product of all messages received from the neighbors of node $i$ except for node $j$. The result is then maximized over all values of $x_i$ to form the message sent to node $j$. More specifically, at the $t$'th iteration, the message from node $i$ to node $j$ is formed as

$m_{ij}^{(t)}(x_j) = \max_{x_i}\; \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}_i \setminus j} m_{ki}^{(t-1)}(x_i),$  (7)

where $\mathcal{N}_i \setminus j$ denotes the set of neighbors of node $i$ except for node $j$. The belief at node $i$ at the $t$'th iteration, denoted $b_i^{(t)}(x_i)$, is formed by multiplying its local inference result by all the messages received from its neighbors, i.e.,

$b_i^{(t)}(x_i) = \phi_i(x_i) \prod_{j \in \mathcal{N}_i} m_{ji}^{(t)}(x_i),$  (8)

which is then used to estimate the desired max-marginal distribution, i.e., $q_i(x_i) \approx b_i^{(t)}(x_i)$. It is more convenient to express the beliefs in logarithm form as

$\log b_i^{(t)}(x_i) = \log \phi_i(x_i) + \sum_{j \in \mathcal{N}_i} \mu_{ji}^{(t)}(x_i),$  (9)

where

$\mu_{ij}^{(t)}(x_j) \triangleq \log m_{ij}^{(t)}(x_j) = \max_{x_i} \Big[ \log \phi_i(x_i) + \log \psi_{ij}(x_i, x_j) + \sum_{k \in \mathcal{N}_i \setminus j} \mu_{ki}^{(t-1)}(x_i) \Big].$  (10)

We have replaced the proportionality in (6) by equality in our formulations since the proportionality constant turns into an offset value when the expressions are formulated in logarithm format, and this offset, as will be seen later, can be merged into the detection threshold with no impact on the proposed analysis.
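For binary-valued variables, the log-domain update (10) reduces to a maximum over the two candidate values of $x_i$. A minimal sketch of one such update (our own illustration, with the local evidence and coupling as assumed placeholder values):

```python
def max_product_log_update(log_phi_i, J_ij, incoming_mu, x_vals=(-1, 1)):
    """One log-domain message update, eq. (10), for a binary variable.

    log_phi_i:   dict x -> log phi_i(x), the local log-likelihood at node i
    J_ij:        MRF coupling between nodes i and j, so log psi = J_ij * xi * xj
    incoming_mu: list of dicts x -> mu_ki(x) from neighbors k of i, k != j
    """
    return {xj: max(log_phi_i[xi] + J_ij * xi * xj + sum(m[xi] for m in incoming_mu)
                    for xi in x_vals)
            for xj in x_vals}

# Toy usage: the first message (t = 1), before any messages have arrived.
log_phi = {-1: -0.2, +1: -1.3}   # assumed local evidence at node i
print(max_product_log_update(log_phi, J_ij=0.5, incoming_mu=[]))
```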

We adopt the commonly-used exponential model to represent the probability measure defined on $\mathbf{x}$, i.e.,

$p(\mathbf{x}) \propto \exp\Big( \sum_{i \in V} \theta_i x_i + \sum_{(i,j) \in E} J_{ij} x_i x_j \Big),$  (11)

where $x_i \in \{-1, +1\}$; when $\theta_i = 0$ for all $i \in V$ and $J_{ij} = J$ is a constant for all $(i,j) \in E$, this model is the classic Ising model. The use of an exponential form in (6) to model the behavior of correlated random variables is a popular choice since, first, it is supported by the principle of maximum entropy [22] and, second, when the graphical model is built in terms of products of functions, these products turn into additive decompositions in the log domain. The a posteriori probability distribution of $\mathbf{x}$ can be stated as

$p(\mathbf{x} \mid \mathbf{y}) \propto p(\mathbf{y} \mid \mathbf{x})\, p(\mathbf{x}),$  (12)

where $p(\mathbf{y} \mid \mathbf{x})$ is factorized assuming independence in the local observations conditioned on $\mathbf{x}$, i.e.,

$p(\mathbf{y} \mid \mathbf{x}) = \prod_{i \in V} p(\mathbf{y}_i \mid x_i).$  (13)

This model is used in [24, 13, 15, 14] based on the fact that $p(\mathbf{y}_i \mid x_i)$ is evaluated at node $i$, when calculating the log-likelihood ratio (LLR), solely based on the status of $x_i$. Consequently, $\mathbf{y}_i$ in this detection structure works as a sufficient statistic for $x_i$, i.e., $p(\mathbf{y}_i \mid \mathbf{x}) = p(\mathbf{y}_i \mid x_i)$. From (11), (12), and (13) we have

$p(\mathbf{x} \mid \mathbf{y}) \propto \prod_{i \in V} p(\mathbf{y}_i \mid x_i)\, e^{\theta_i x_i} \prod_{(i,j) \in E} e^{J_{ij} x_i x_j}.$  (14)

The proportionality sign in (14) covers $1/p(\mathbf{y})$, and since $\theta_i$ does not affect the proposed analysis, we have set $\theta_i = 0$ for all $i \in V$. By comparing (14) to (6), we obtain

$\phi_i(x_i) = p(\mathbf{y}_i \mid x_i),$  (15)
$\psi_{ij}(x_i, x_j) = e^{J_{ij} x_i x_j}.$  (16)

Assuming Gaussian observations at the nodes, we have

$p(\mathbf{y}_i \mid x_i) = \prod_{k=1}^{K} \frac{1}{\sqrt{2\pi\sigma_i^2}} \exp\Big( -\frac{(y_i[k] - u_i h_i[k])^2}{2\sigma_i^2} \Big),$  (17)

where $u_i \triangleq (1 + x_i)/2 \in \{0, 1\}$. In this model, the variable of interest is disturbed by zero-mean additive Gaussian noise. Specifically, for $k = 1, \dots, K$, we have $y_i[k] = u_i h_i[k] + v_i[k]$, where $v_i[k] \sim \mathcal{N}(0, \sigma_i^2)$ and $K$ denotes the number of samples collected at node $i$. In vector format, we have

$\mathbf{y}_i = u_i \mathbf{h}_i + \mathbf{v}_i,$  (18)

where $\mathbf{h}_i \triangleq [h_i[1], \dots, h_i[K]]^T$ denotes a deterministic but unknown sequence of PU signal samples received at node $i$ and $\mathbf{v}_i$ denotes the noise samples at node $i$. Hence, $\mathbf{y}_i \mid x_i \sim \mathcal{N}(u_i \mathbf{h}_i, \sigma_i^2 \mathbf{I})$.

The state of $x_i$ determines whether the spectrum band sensed by node $i$ is free to be used for data communication by the secondary users. Specifically, if $x_i = -1$ (i.e., $u_i = 0$), then there is no PU signal received at node $i$ and the corresponding frequency band is free. Otherwise, the band is occupied and cannot be used by the SUs. Note that $\mathbf{h}_i$ may contain a superposition of signals received at node $i$ from multiple PUs operating on the same frequency band. Our spectrum sensing scenario is modeled by a binary hypothesis test where $x_i \in \{-1, +1\}$ for all $i$ while $u_i$ maps the state of $x_i$ to the occupancy state of the radio spectrum sensed by node $i$.
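The following sketch illustrates the observation model in (18); the signal sequence, noise level, and sample count are illustrative assumptions rather than the settings used in the paper's simulations:

```python
import numpy as np

rng = np.random.default_rng(1)
K = 64                                 # samples per sensing window (assumed)
h = 0.3 * rng.standard_normal(K)       # deterministic (but unknown) PU samples at node i
sigma = 1.0                            # noise standard deviation at node i

def observe(x_i):
    """Generate y_i = u_i * h_i + v_i with u_i = (1 + x_i) / 2, per (18)."""
    u = (1 + x_i) // 2
    return u * h + sigma * rng.standard_normal(K)

y_busy, y_free = observe(+1), observe(-1)
print(y_busy.var(), y_free.var())      # the occupied band carries extra energy
```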

The max-marginals obtained by the max-product algorithm are used to conduct a distributed MAP estimation as in (3). This test is conducted at node $i$ by evaluating what we refer to as the max-log-likelihood ratio (max-LLR) of $x_i$, defined as

$\Lambda_i \triangleq \log \frac{q_i(x_i = 1)}{q_i(x_i = -1)}.$  (19)

That is, $\hat{x}_i = 1$ if $\Lambda_i > 0$ and $\hat{x}_i = -1$ otherwise.

To realize this test based on the max-product algorithm, after $T$ iterations, the approximate max-LLR is built and compared, as a decision variable, to a predefined threshold, i.e.,

$\Lambda_i^{(T)} \triangleq \log \frac{b_i^{(T)}(x_i = 1)}{b_i^{(T)}(x_i = -1)} \;\gtrless\; \tau_i,$  (20)

which means that $\hat{x}_i = 1$ if $\Lambda_i^{(T)} > \tau_i$ and $\hat{x}_i = -1$ otherwise. Note that $\Lambda_i^{(T)} \approx \Lambda_i$. To see the impact of the messages on the decision variable, we express $\Lambda_i^{(T)}$ as

$\Lambda_i^{(T)} = \lambda_i + \sum_{j \in \mathcal{N}_i} \nu_{ji}^{(T)},$  (21)

where $\lambda_i$ denotes the local LLR obtained at node $i$, i.e.,

$\lambda_i \triangleq \log \frac{p(\mathbf{y}_i \mid x_i = 1)}{p(\mathbf{y}_i \mid x_i = -1)},$  (22)

while $\nu_{ji}^{(t)}$ denotes the LLR of the messages at iteration $t$, i.e.,

$\nu_{ji}^{(t)} \triangleq \mu_{ji}^{(t)}(x_i = 1) - \mu_{ji}^{(t)}(x_i = -1).$  (23)

By using the signal model in (17), we obtain

$\lambda_i = \frac{\mathbf{h}_i^T \mathbf{y}_i}{\sigma_i^2} - \frac{\gamma_i}{2},$  (24)

where $\gamma_i \triangleq \|\mathbf{h}_i\|^2 / \sigma_i^2$ denotes the received SNR at node $i$. Consequently, it is clear that, given $x_i$, the local LLR behaves as a Gaussian random variable. For simplicity, we assume that the noise level $\sigma_i^2$ is known at each node. Eq. (24) indicates a matched filtering process, a.k.a. coherent detection [11], performed locally at each sensing node. In practice, due to the lack of knowledge about the PU signal ($\mathbf{h}_i$ is unknown) and for ease of implementation, energy detection is used as the local sensing scheme. By using energy detection, the local sensing outcome is formed as

$\lambda_i = \sum_{k=1}^{K} y_i^2[k] - \zeta_i,$  (25)

where $\zeta_i$ is set such that $\mathbb{E}[\lambda_i \mid x_i = -1] = 0$, i.e.,

$\zeta_i = K \sigma_i^2.$  (26)

This approach is practically appealing in the sense that $\zeta_i$ can be simply calculated in terms of the noise level, without the need for the channel gain and the transmitted power level. Assuming the number of signal samples $K$ is large enough [17, 18, 1], the central limit theorem states that, given $x_i$, the sensor outcome in (25) follows a Gaussian distribution.
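A short sketch of the two local statistics, reusing the observation sketch above (h, sigma, K, observe), so the two blocks should be run together; per (24) the matched filter needs $\mathbf{h}_i$, while per (25)-(26) the energy detector needs only the noise level, and the empirical moments of the energy statistic agree with its CLT-based Gaussian approximation:

```python
gamma = (h @ h) / sigma**2        # received SNR, gamma_i = ||h_i||^2 / sigma_i^2

def llr_matched(y):               # eq. (24): coherent / matched-filter LLR
    return (h @ y) / sigma**2 - gamma / 2

def llr_energy(y):                # eqs. (25)-(26): energy detection with
    return y @ y - K * sigma**2   # zeta_i = K * sigma_i^2 (noise level only)

# Given x_i = -1, the energy statistic has mean 0 and variance ~ 2 K sigma^4.
samples = np.array([llr_energy(observe(-1)) for _ in range(20000)])
print(samples.mean(), samples.var(), 2 * K * sigma**4)
```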

The sum-product algorithm has a similar structure except that the $\max$ operator in (7) is replaced by a summation. This message-update rule is given by

$\tilde{m}_{ij}^{(t)}(x_j) = \sum_{x_i} \phi_i(x_i)\, \psi_{ij}(x_i, x_j) \prod_{k \in \mathcal{N}_i \setminus j} \tilde{m}_{ki}^{(t-1)}(x_i).$  (27)

The beliefs made by the sum-product algorithm are denoted $\tilde{b}_i^{(t)}(x_i)$ in this paper and calculated by (8) in which $m_{ji}^{(t)}$ is replaced by $\tilde{m}_{ji}^{(t)}$. The sum-product algorithm approximates the marginal distributions of the random variables of interest, i.e., $\tilde{b}_i^{(t)}(x_i) \approx p(x_i \mid \mathbf{y})$. The detection process is conducted by comparing the resulting decision variable to a predefined threshold, i.e.,

$\tilde{\Lambda}_i^{(T)} \triangleq \log \frac{\tilde{b}_i^{(T)}(x_i = 1)}{\tilde{b}_i^{(T)}(x_i = -1)} \;\gtrless\; \tilde{\tau}_i,$  (28)

where $\tilde{\tau}_i$ denotes the detection threshold. Similar to (21), we have

$\tilde{\Lambda}_i^{(T)} = \lambda_i + \sum_{j \in \mathcal{N}_i} \tilde{\nu}_{ji}^{(T)},$  (29)

where $\tilde{\nu}_{ji}^{(t)} \triangleq \log \big[ \tilde{m}_{ji}^{(t)}(x_i = 1) / \tilde{m}_{ji}^{(t)}(x_i = -1) \big]$ while $\tilde{m}_{ji}^{(t)}$ denotes the sum-product message sent from node $j$ to node $i$. Through some algebraic manipulations we obtain

$\tilde{\nu}_{ij}^{(t)} = \Omega_{ij}\Big( \lambda_i + \sum_{k \in \mathcal{N}_i \setminus j} \tilde{\nu}_{ki}^{(t-1)} \Big),$  (30)

where $\Omega_{ij}(\cdot)$ denotes the transformation applied on the received messages at node $i$ to build the message sent to node $j$.

In [13, 24], a simple learning process is used to adapt the MRF parameters $J_{ij}$ based on the detection outcomes. In this method, the $J_{ij}$'s are set by an empirical estimation of the correlations between the neighboring nodes in a window of past time slots, i.e.,

$J_{ij} = \frac{\eta}{T_s} \sum_{t=1}^{T_s} \mathbb{1}\big\{ \hat{x}_i[t] = \hat{x}_j[t] \big\},$  (31)

where $\eta$, referred to as the learning factor in this paper, is a constant and $\mathbb{1}\{\cdot\}$ denotes the indicator function, which returns 1 if its argument is true and 0 otherwise. $T_s$ denotes the number of samples used in this training process.
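A minimal sketch of this learning rule (our own illustration); the learning factor and the decision histories below are made-up values:

```python
import numpy as np

def learn_coupling(xhat_i, xhat_j, eta=0.4):
    """Empirical J_ij per (31): eta times the fraction of time slots in which
    the decisions at the two neighboring nodes agree."""
    return eta * np.mean(xhat_i == xhat_j)

xhat_i = np.array([1, 1, -1, 1, -1, -1, 1, 1])   # +/-1 decisions over T_s = 8 slots
xhat_j = np.array([1, 1, -1, -1, -1, -1, 1, -1])
print(learn_coupling(xhat_i, xhat_j))            # 0.4 * 6/8 = 0.3
```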

In [2], we argue that the $J_{ij}$'s can be used as optimization parameters to enhance the performance of the sum-product algorithm in distributed detection. We use the same argument in this paper considering the max-product algorithm. We show that the same optimization framework can be applied to optimize the system performance when the underlying message-passing mechanism is realized as a max-product algorithm.

In order to properly determine the detection threshold and to characterize the system performance in terms of commonly-used performance metrics, we need to formulate the statistical behavior of the decision variables $\Lambda_i^{(T)}$ and $\tilde{\Lambda}_i^{(T)}$. This appears to be a challenging task due to the apparent nonlinearity in both of the message-passing operations discussed. We discuss how to tackle this challenge in the following.

II-C Linear Fusion via Message Passing

It is now clear that the nonlinearity in the sum-product messages stems from $\Omega_{ij}(\cdot)$ in (30). We replace this function with a linear transformation and then optimize the linear message-passing algorithm obtained. Specifically, we use the first-order Taylor series expansion of $\Omega_{ij}(\cdot)$ around zero to linearize the message-update rule as [2]

$\tilde{\nu}_{ij}^{(t)} \approx \omega_{ij} \Big( \lambda_i + \sum_{k \in \mathcal{N}_i \setminus j} \tilde{\nu}_{ki}^{(t-1)} \Big),$  (32)

where $\omega_{ij} \triangleq \tanh(J_{ij})$, which is obtained by using $\Omega_{ij}(z) \approx \Omega_{ij}(0) + \Omega'_{ij}(0)\, z$, where $\Omega_{ij}(0) = 0$ and $\Omega'_{ij}(0) = \tanh(J_{ij})$.
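The sketch below compares the transformation $\Omega_{ij}$ in (30) (whose closed form is given in (35) below) with its first-order Taylor linearization; the coupling value and evaluation points are arbitrary assumptions:

```python
import numpy as np

def Omega(z, J):  # the sum-product message nonlinearity, cf. (35)
    return np.log((np.exp(J + z) + np.exp(-J)) / (np.exp(-J + z) + np.exp(J)))

J, z = 0.4, np.linspace(-1.0, 1.0, 5)
print(Omega(z, J))        # exact nonlinearity
print(np.tanh(J) * z)     # linearized messages, accurate for small |z|
```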

Then, by applying this linear sum-product iteration to (29), we obtain an approximate expression for the decision variable at node $i$, i.e., after $T$ iterations, we have

$\tilde{\Lambda}_i^{(T)} \approx \sum_{j=1}^{N} w_{ij}\, \lambda_j,$  (33)

which shows that the sum-product algorithm builds the decision variable at node $i$ (approximately) as a linear combination of all the local LLRs obtained throughout the entire network. Consequently, since the local LLRs are normal random variables, given the states of the $x_j$'s, the decision variable behaves as a normal random variable as well. We can express this linear fusion in compact form as $\tilde{\boldsymbol{\Lambda}} = \mathbf{W} \boldsymbol{\lambda}$, where $[\mathbf{W}]_{ij} = w_{ij}$ determines the impact of $\lambda_j$ on $\tilde{\Lambda}_i^{(T)}$.
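Because the linearized recursion (32) is linear in the local LLRs, the matrix $\mathbf{W}$ can be read off by propagating unit LLR vectors through the iteration. A sketch under an assumed three-node chain with arbitrary coupling values:

```python
import numpy as np

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # assumed 3-node chain
J = 0.4 * adj                                      # couplings on the edges
N, T = 3, 4

def linear_decision(lam):
    nu = {(i, j): 0.0 for i in range(N) for j in range(N) if adj[i, j]}
    for _ in range(T):                             # linearized update, eq. (32)
        nu = {(i, j): np.tanh(J[i, j]) * (lam[i] + sum(nu[(k, i)]
              for k in range(N) if adj[k, i] and k != j)) for (i, j) in nu}
    return np.array([lam[i] + sum(nu[(j, i)] for j in range(N) if adj[j, i])
                     for i in range(N)])           # decision variables, eq. (29)

W = np.column_stack([linear_decision(e) for e in np.eye(N)])
print(W)  # [W]_ij is the impact of lambda_j on the decision variable at node i
```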

Viewing the sum-product algorithm as a linear fusion scheme allows us to formulate the system performance in terms of the false-alarm and detection probabilities. By using these metrics, we formulate the system performance optimization effectively and obtain a better detection scheme through optimizing the fusion coefficients $w_{ij}$ in (33) along with the detection thresholds $\tilde{\tau}_i$. In Section III, we show that the max-product algorithm is a linear fusion scheme as well. This observation leads to two important results. Firstly, assuming the local LLRs to be Gaussian random variables, the decision variables generated by the max-product algorithm are all Gaussian random variables. Secondly, the optimal linear sum-product algorithm proposed in [2] serves as the optimal max-product algorithm for distributed detection as well.

To see the connection between the sum-product and max-product operations, let us take a closer look at the messages in the sum-product algorithm which, in LLR form, are given by

$\tilde{\nu}_{ij}^{(t)} = \log \dfrac{\phi_i(1)\, e^{J_{ij}} \prod_{k \in \mathcal{N}_i \setminus j} \tilde{m}_{ki}^{(t-1)}(1) + \phi_i(-1)\, e^{-J_{ij}} \prod_{k \in \mathcal{N}_i \setminus j} \tilde{m}_{ki}^{(t-1)}(-1)}{\phi_i(1)\, e^{-J_{ij}} \prod_{k \in \mathcal{N}_i \setminus j} \tilde{m}_{ki}^{(t-1)}(1) + \phi_i(-1)\, e^{J_{ij}} \prod_{k \in \mathcal{N}_i \setminus j} \tilde{m}_{ki}^{(t-1)}(-1)},$  (34)

which shows that the messages, received at node $i$ and combined with the likelihoods $\phi_i(1)$ and $\phi_i(-1)$, pass through the following transformation to form the message sent to node $j$:

$\Omega_{ij}(z) = \log \dfrac{e^{J_{ij} + z} + e^{-J_{ij}}}{e^{-J_{ij} + z} + e^{J_{ij}}}, \qquad z = \lambda_i + \sum_{k \in \mathcal{N}_i \setminus j} \tilde{\nu}_{ki}^{(t-1)}.$  (35)

Fig. 1: The piece-wise linear function in (36) provides an approximation for $\Omega_{ij}(z)$.

As shown in Fig. 1, due to the highly selective nature of the exponential function, $\log(e^a + e^b)$ behaves like a $\max\{a, b\}$ operator. Specifically, we see that $\max\{z + J_{ij}, -J_{ij}\} - \max\{z - J_{ij}, J_{ij}\}$ provides a piece-wise linear approximation for $\Omega_{ij}(z)$. That is,

$\Omega_{ij}(z) \approx \max\{z + J_{ij}, -J_{ij}\} - \max\{z - J_{ij}, J_{ij}\} = \mathrm{sgn}(z) \min\{|z|, 2J_{ij}\},$  (36)

where the last equality holds for $J_{ij} > 0$.
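A quick numerical check of (36) with arbitrary values: the approximation tracks $z$ in the linear region and saturates at $\pm 2 J_{ij}$ for large $|z|$:

```python
import numpy as np

def Omega(z, J):       # exact transformation, eq. (35)
    return np.log((np.exp(J + z) + np.exp(-J)) / (np.exp(-J + z) + np.exp(J)))

def Omega_pwl(z, J):   # piece-wise linear approximation, eq. (36)
    return max(z + J, -J) - max(z - J, J)

J = 1.5
for z in (-10.0, -1.0, 0.5, 10.0):
    print(z, round(Omega(z, J), 3), Omega_pwl(z, J))  # saturation near +/- 2J = 3
```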

Consequently, we can approximate the message-update rule in the sum-product algorithm as

$\tilde{\nu}_{ij}^{(t)} \approx \max\{z + J_{ij}, -J_{ij}\} - \max\{z - J_{ij}, J_{ij}\},$  (37)

whose right-hand side coincides with the LLR form of the max-product message update. This clearly shows that the message-update rule in the sum-product algorithm is almost the same as its counterpart in the max-product algorithm.

Therefore, we expect the max-product algorithm to work as a linear fusion scheme as well. More specifically, we expect to have $\boldsymbol{\Lambda} = \bar{\mathbf{W}} \boldsymbol{\lambda}$, where $[\bar{\mathbf{W}}]_{ij} = \bar{w}_{ij}$ determines the impact of $\lambda_j$ on $\Lambda_i^{(T)}$. In the next section, we formally establish that the max-product algorithm is a distributed linear fusion scheme.

III Analysis of the Max-Product Operation

Performance of a binary hypothesis test is commonly measured by two parameters: the probability of detection and the probability of false alarm. These performance metrics are calculated based on the statistical behavior of the decision variable $\Lambda_i^{(T)}$. Specifically, at node $i$ we have

$P_{\mathrm{f},i} \triangleq \Pr\big\{ \Lambda_i^{(T)} > \tau_i \mid x_i = -1 \big\},$  (38)
$P_{\mathrm{d},i} \triangleq \Pr\big\{ \Lambda_i^{(T)} > \tau_i \mid x_i = +1 \big\},$  (39)

where $P_{\mathrm{f},i}$ denotes the false-alarm probability of node $i$ and $P_{\mathrm{d},i}$ denotes the corresponding detection probability.
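Anticipating the linearity result established below, the decision variable is conditionally Gaussian, so (38) and (39) reduce to Gaussian tail probabilities. A sketch with illustrative fusion weights and per-node LLR statistics, assuming conditionally independent local LLRs for this toy example:

```python
import numpy as np
from scipy.stats import norm

w = np.array([1.0, 0.6, 0.3])        # assumed fusion weights toward node i
mu0 = np.array([-0.5, -0.4, -0.3])   # E[lambda_j | x_i = -1] (illustrative)
mu1 = np.array([0.5, 0.4, 0.3])      # E[lambda_j | x_i = +1]
var = np.array([1.0, 0.8, 0.6])      # Var[lambda_j] given the state

m0, m1, s = w @ mu0, w @ mu1, np.sqrt(w**2 @ var)
tau = m0 + s * norm.isf(0.05)        # threshold set for P_f = 0.05
print(norm.sf((tau - m0) / s), norm.sf((tau - m1) / s))   # (38) P_f, (39) P_d
```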

Hence, we need to find the probability distribution of $\Lambda_i^{(T)}$ to measure the system performance analytically. To realize this goal, we calculate the outcome of each iteration and show that even though the iteration process involves nonlinear transformations, its outcome is a linear combination of the local LLRs. The analysis of the max-product process provided in this section does not limit the node variables to be binary-valued. We only use binary-valued $x_i$'s when evaluating the result of the proposed analysis.

Recall that the local observation at node $i$ is represented by $\phi_i(x_i)$ while the correlation between the observations at nodes $i$ and $j$ is captured by $\psi_{ij}(x_i, x_j)$. Since in the beginning there are no messages received, i.e., $\mu_{ki}^{(0)}(x_i) = 0$, each node builds its message only based on its own local observation and the correlation of its random variable with the ones of the neighboring nodes. That is, at $t = 1$ the messages are created based on

$\mu_{ij}^{(1)}(x_j) = \max_{x_i} \big[ \log\phi_i(x_i) + J_{ij} x_i x_j \big],$  (40)

where the maximizer $x_i^{*(1)}$ is found by solving $\frac{\partial}{\partial x_i} \big[ \log\phi_i(x_i) + J_{ij} x_i x_j \big] = 0$, treating $x_i$ as a continuous variable, which leads to

$x_i^{*(1)} = \alpha_i \lambda_i + \beta_{ij}^{(1)} x_j,$  (41)

where, by using the Gaussian model in (17) and the received SNR $\gamma_i$ defined in (24), we have

$\alpha_i = \frac{2}{\gamma_i},$  (42)
$\beta_{ij}^{(1)} = \frac{4 J_{ij}}{\gamma_i}.$  (43)

Consequently, at the beginning of the iteration, the maximizer that builds the message sent from node $i$ to node $j$ is a linear function of two factors: i) the local LLR at node $i$, denoted $\lambda_i$, and ii) the realization of the random variable concerned at node $j$, i.e., $x_j$. We see $x_i^{*(1)}$ as the outcome of a maximum-likelihood estimation (MLE) process at node $i$. This estimation provides a point at which the likelihood functions at node $i$ are evaluated to build a message sent to node $j$. Note the distinction between the MLE performed locally at each node and the MAP estimation discussed earlier.

As we show in the following, the linear behavior observed in (41) propagates throughout the entire iteration. Specifically, at the $t$'th iteration, the MLE results at node $i$, which build the messages sent to node $j$, are in the form of linear combinations of $x_j$ and the local LLRs obtained at nodes located within less than $t$ hops from node $i$. Consequently, given the states of the $x_j$'s, the decision variable at node $i$ (i.e., $\Lambda_i^{(T)}$) is built by a linear fusion of the local LLRs obtained at node $i$ and at all the nodes located within less than $T$ hops from node $i$. In other words, the hypothesis test result obtained by the max-product algorithm is equivalent to the one obtained by a distributed linear data-fusion scheme whose scope is increased by every iteration.

We now clarify this observation by solving the iterative optimizations in (10). For $t = 2$, we have

$\mu_{ij}^{(2)}(x_j) = \max_{x_i} \Big[ \log\phi_i(x_i) + J_{ij} x_i x_j + \sum_{k \in \mathcal{N}_i \setminus j} \mu_{ki}^{(1)}(x_i) \Big],$  (44)

which leads to

$\mu_{ij}^{(2)}(x_j) = \log\phi_i\big(x_i^{*(2)}\big) + J_{ij} x_i^{*(2)} x_j + \sum_{k \in \mathcal{N}_i \setminus j} \mu_{ki}^{(1)}\big(x_i^{*(2)}\big),$  (45)

where $x_i^{*(2)}$ is found by solving

$\frac{\partial}{\partial x_i} \Big[ \log\phi_i(x_i) + J_{ij} x_i x_j + \sum_{k \in \mathcal{N}_i \setminus j} \mu_{ki}^{(1)}(x_i) \Big] = 0,$  (46)

which leads to

$x_i^{*(2)} = \beta_{ij}^{(2)} x_j + \Gamma_{ij}^{(2)},$  (47)

where

$\beta_{ij}^{(2)} = \frac{J_{ij}}{D_{ij}^{(2)}}, \qquad D_{ij}^{(2)} \triangleq \frac{\gamma_i}{4} - \sum_{k \in \mathcal{N}_i \setminus j} \frac{4 J_{ki}^2}{\gamma_k},$  (48)

$\Gamma_{ij}^{(2)} = \frac{1}{D_{ij}^{(2)}} \Big( \frac{\lambda_i}{2} + \sum_{k \in \mathcal{N}_i \setminus j} \frac{2 J_{ki}}{\gamma_k} \lambda_k \Big).$  (49)

Consequently, $x_i^{*(2)}$ is built as a linear function of $x_j$ plus a linear combination of the local LLRs obtained at node $i$ and at its one-hop neighbors. More specifically, from (47), (48), and (49), we see that $x_i^{*(2)}$ is formed as a linear combination of $x_j$ and the $\lambda_k$'s for $k \in \{i\} \cup (\mathcal{N}_i \setminus j)$. In addition, note that $\beta_{ij}^{(2)}$ is a constant whereas $\Gamma_{ij}^{(2)}$ is a random variable which captures the statistical behavior of the local observations.

Through iterative calculations for $t = 3, 4, \dots$, we see that the MLE result has similar components, i.e., a linear function of $x_j$ plus a linear combination of the local LLRs obtained within less than $t$ hops from node $i$, i.e.,

$x_i^{*(t)} = \beta_{ij}^{(t)} x_j + \Gamma_{ij}^{(t)},$  (50)

where $\beta_{ij}^{(t)}$ is a constant and $\Gamma_{ij}^{(t)}$ can be expressed as a linear combination of the local likelihood values, i.e.,

$\Gamma_{ij}^{(t)} = \sum_{k=1}^{N} g_{k,ij}^{(t)} \lambda_k,$  (51)

where $g_{k,ij}^{(t)}$ denotes the weight of $\lambda_k$ in this linear combination. Moreover, $g_{k,ij}^{(t)}$ is zero if node $k$ is located more than $t - 1$ hops away from node $i$. Therefore, by increasing $t$, we expand the maximum radius around node $i$ within which the local likelihoods are combined to build $x_i^{*(t)}$. Consequently, to include all the local LLRs in the fusion process, the maximum number of iterations does not need to be greater than the length of the longest path in the network graph. This justifies the observation in [13] where the desired detection performance is achieved by only a few iterations.

It is worth noting that one does not need to perform many iterative calculations to see the linearity of the final result. Starting from $\mu_{ij}^{(1)}$ in (40), we see that the term $\log\phi_i(x_i) + J_{ij} x_i x_j$ is concave quadratic in $x_i$. Hence, its partial derivative leads to a linear equation which, in turn, leads to a linear expression for $x_i^{*(1)}$ in terms of $\lambda_i$ and $x_j$. Moreover, in order to build $\mu_{ij}^{(t)}$ from $\mu_{ki}^{(t-1)}$, one needs to add some terms, inside the max operator in (10), with similar concave quadratic attributes. The only difference is that these new terms have as their arguments some linear expressions with positive coefficients. Note that each $\mu_{ki}^{(t-1)}$ in (10) is evaluated at a maximizer of the form (50). Since these linear transformations preserve the concave quadratic nature of the whole expression inside the max operator, the maximum in (10) is found by solving a linear equation, which leads to a linear expression in terms of the $\lambda_k$'s and $x_j$.

Based on this observation, we propose a set of formulas to recursively calculate $\mu_{ij}^{(t)}$. This calculation is realized by using a quadratic form to represent the messages, which can be expressed as

$\mu_{ij}^{(t)}(x_j) = A_{ij}^{(t)} x_j^2 + B_{ij}^{(t)} x_j + C_{ij}^{(t)}.$  (52)

The partial derivative of the messages is then a linear expression, i.e.,

$\frac{\partial}{\partial x_i} \mu_{ki}^{(t-1)}(x_i) = 2 A_{ki}^{(t-1)} x_i + B_{ki}^{(t-1)}.$  (53)

Now, by solving the following equation

$\frac{\partial}{\partial x_i} \Big[ \log\phi_i(x_i) + J_{ij} x_i x_j + \sum_{k \in \mathcal{N}_i \setminus j} \mu_{ki}^{(t-1)}(x_i) \Big] = 0,$  (54)

we link $A_{ij}^{(t)}$ and $B_{ij}^{(t)}$ to $A_{ki}^{(t-1)}$ and $B_{ki}^{(t-1)}$ as

$A_{ij}^{(t)} = \frac{J_{ij}^2}{2 D_{ij}^{(t)}}, \qquad D_{ij}^{(t)} \triangleq \frac{\gamma_i}{4} - 2 \sum_{k \in \mathcal{N}_i \setminus j} A_{ki}^{(t-1)},$  (55)

$B_{ij}^{(t)} = \frac{J_{ij}}{D_{ij}^{(t)}} \Big( \frac{\lambda_i}{2} + \sum_{k \in \mathcal{N}_i \setminus j} B_{ki}^{(t-1)} \Big).$  (56)

Hence, by using $A_{ki}^{(t-1)}$ and $B_{ki}^{(t-1)}$ we calculate $A_{ij}^{(t)}$ and $B_{ij}^{(t)}$, which are then used to obtain $\beta_{ij}^{(t)} = J_{ij}/D_{ij}^{(t)}$ and $\Gamma_{ij}^{(t)} = B_{ij}^{(t)}/J_{ij}$. This recursive calculation starts from $A_{ij}^{(1)}$ and $B_{ij}^{(1)}$, which are determined by the coefficients in (42) and (43).
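A compact sketch of this recursion under the notational conventions adopted in this section; the graph, SNR levels, couplings, and LLR realizations are illustrative assumptions, and the assertion checks the concavity condition that keeps each maximization well-defined:

```python
import numpy as np

adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])  # assumed chain: 1 - 2 - 3
gamma = np.array([4.0, 6.0, 5.0])                  # per-node received SNRs
J = 0.2 * adj
lam = np.array([0.7, -0.3, 1.1])                   # local LLR realizations
edges = [(i, j) for i in range(3) for j in range(3) if adj[i, j]]

A = {e: 0.0 for e in edges}                        # A_ij^(0) = B_ij^(0) = 0
B = {e: 0.0 for e in edges}
for t in range(1, 4):
    A_new, B_new = {}, {}
    for (i, j) in edges:
        nbrs = [k for k in range(3) if adj[k, i] and k != j]
        D = gamma[i] / 4 - 2 * sum(A[(k, i)] for k in nbrs)
        assert D > 0, "concavity condition violated"   # the 'practical conditions'
        A_new[(i, j)] = J[i, j] ** 2 / (2 * D)         # eq. (55)
        B_new[(i, j)] = J[i, j] / D * (lam[i] / 2 +    # eq. (56)
                                       sum(B[(k, i)] for k in nbrs))
    A, B = A_new, B_new

# Decision variable (21) with nu_ji = 2 * B_ji, a linear function of the LLRs.
Lam = [lam[i] + sum(2 * B[(j, i)] for j in range(3) if adj[j, i]) for i in range(3)]
print(Lam)
```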

The fusion weights in (51) are determined in terms of the $\beta_{ij}^{(t)}$'s. As we saw in (43) and (49), the $\beta_{ij}^{(t)}$'s are determined in terms of the $J_{ij}$'s, which capture the inter-dependencies of the random variables in the MRF. We will further discuss this point and its implications on the system design later.

Eqs. (55) and (56) indicate that the higher the received SNR level $\gamma_i$ at node $i$, the lower the impact of other nodes on the data sent from node $i$ to node $j$. Hence, node $i$ relies more on its own local observation when it is operating under good SNR conditions. Otherwise, it relies more on the data received from its neighbors. In addition, according to (48) and (49), each LLR received from a neighbor is scaled according to the SNR level perceived at that neighbor. Consequently, the message-update rule in (10) works like a maximal-ratio combining (MRC) scheme.

Since the outcomes of the MLEs are derived in closed form, we can now see their impact on the binary hypothesis test. To this end, we show that $\Lambda_i^{(T)}$ is a linear combination of the local LLRs. First note that for all $t$ we have

$\mu_{ij}^{(t)}(1) = A_{ij}^{(t)} + B_{ij}^{(t)} + C_{ij}^{(t)},$  (57)
$\mu_{ij}^{(t)}(-1) = A_{ij}^{(t)} - B_{ij}^{(t)} + C_{ij}^{(t)},$  (58)

which are linear expressions in $A_{ij}^{(t)}$, $B_{ij}^{(t)}$, and $C_{ij}^{(t)}$. Then, recall that $\nu_{ij}^{(t)} = \mu_{ij}^{(t)}(1) - \mu_{ij}^{(t)}(-1)$. Consequently, $\nu_{ji}^{(T)}$ in (23) contains expressions, in the form of (57) and (58), which are linear functions of the $\lambda_k$'s. To clarify this observation, we focus on $B_{ij}^{(t)}$ here. A similar argument can be made for $C_{ij}^{(t)}$. We see that,