I Introduction
Many statistical inference problems deal with a large collection of random variables and their statistical interactions. Graphical models, a.k.a., factor graphs, which are commonly used to capture the interdependencies between correlated random variables, are known to provide a powerful framework for developing effective lowcomplexity inference algorithms in various applications found in diverse areas such as wireless communications, image processing, combinatorial optimization, and machine learning, see e.g.,
[13, 18, 15] and the references therein.In many cases of practical importance, a statistical inference problem is solved by calculating marginal distribution functions which characterize the behavior of the random variables or a subset of the random variables represented within a factor graph. These marginals can be well approximated with low computational complexity by using the socalled belief propagation (BP) algorithm, a.k.a., the sumproduct algorithm [17, 36]. This algorithm works as a messagepassing mechanism within a network of interconnected nodes structured according to the graphical model which encodes Bayesian or Markovian interdependencies between the random variables concerned. Specifically, the messages are updated and passed across the network to enable the nodes to exchange their local inference outcomes in order to obtain a more informative global inference result taking into account the statistical interdependencies between the observations made throughout the entire network.
In practice, there are certain obstacles in the application of the BP algorithm to concrete problems [17]. We know that if the BP converges, it is not clear in general whether the results are a good approximation of the exact marginals. Even sometimes the BP does not converge and in these cases gives no approximations at all. Due to the nonlinearity of the messageupdate rule in the BP algorithm, analyzing the statistical behavior of the messages (and beliefs) is certainly a challenge. The lack of understanding of the behavior of the BP algorithm makes the optimization of a BPbased inference system difficult. As explained in detail in Section III
, this messagepassing algorithm works based on factorizing the joint probability distribution of the random variables, observed in the network, whose statistical behavior is described by a factor graph. However, fitting a proper factor graph into a network behavior is not an easy task in general. In
[16] the major challenges in learning a graphical model from a number of statistically independent data samples are explained.Even though several works in distributed inference assume a factor graph to model the network statistics, they often use simple heuristic approaches to learning the parameters of such graphs in the network without directly taking into account the impact of those parameters when optimizing the system performance, see e.g.,
[37, 14, 20, 36]. In this paper, we tackle this issue by using linear datafusion techniques [24, 23, 25, 29] to develop a systematic framework for optimizing the performance of a BPbased distributed inference system. Specifically, through linear approximations, we simplify the messageupdate rule in the BP algorithm and reveal some important aspects of its behavior and demonstrate how its outcome is affected by the network configuration. We then argue that fitting a proper factor graph to the network and running a BP algorithm based on that graph is (approximately) equivalent to designing a linear datafusion scheme. The link between the linear fusion and the BP algorithm enables us to optimize the performance of the BPbased statistical inference systems in different design scenarios.To clarify the proposed approach, we consider the problem of distributed spectrum sensing in cognitive radio (CR) networks. In these networks, the wireless nodes perform spectrum sensing, in bands allocated to the socalled primary users (PU), in order to discover vacant parts of the radio spectrum and establish communication on those temporarily or spatiallyavailable spectral opportunities [2]. In this context, CRs are considered secondary users (SU) in the sense that they have to vacate the spectrum, to avoid making any harmful interference, once the PUs are active. Hence, the spectrum sensing capability plays a crucial role in CR networks. The use of BP for distributed spectrum sensing is discussed in [20, 36] where the dependency among the heterogeneous spectral occupancy states observed by the network nodes is modeled by a binaryvalued pairwise Markov random field (MRF) [33]. We use a similar graphical model and design a distributed spectrum sensing scheme based on a linear BP algorithm optimized in different design scenarios. We show that a linear messagepassing algorithm obtained through linearizing the nonlinear transfer functions in a BP algorithm provides an upper bound on the performance level of the BPbased distributed detection. Consequently, by optimizing the linear version of the BP we can achieve an optimal BP algorithm for distributed detection. In particular, we offer the following contributions:

We show that a distributed inference based on the BP algorithm can be approximated by a distributed linear data fusion scheme.

We propose a linear BP algorithm, derive its convergence conditions, and provide an effective framework for distributed optimization of its performance.

We design a novel cooperative spectrum sensing scheme in CR networks based on the proposed framework and show that it closely obtains the optimal detection performance.

We develop a blind learningoptimization scheme for performance optimization of the BPbased distributed spectrum sensing when no information is available a priori regarding the wireless environment.

We build a blind threshold adaptation mechanism, independent of the proposed linear BP, by which a certain detection performance level is guaranteed when no information is available a priori regarding the wireless environment.
The rest of the paper is organized as follows. In Section II, we review the exiting works to clarify our position with respect to the state of the art. In Section III, we give a brief overview of the BP algorithm and demonstrate the link between the BP and distributed linear datafusion. In Section IV, we discuss how to design a BPbased linear fusion scheme in both centralized and decentralized network configurations. In Section V, we show how to implement the proposed optimization framework when no statistical information is available a priori. In Section VI, we derive the convergence conditions for the proposed linear BP. We illustrate the performance of the proposed method via computer simulations in Section VIII and provide the concluding remarks in Section IX.
Ii Related Work
Messagepassing algorithms offer effective lowcomplexity alternatives to standard optimization methods and have long been used in numerous signalprocessing applications, such as channel decoding, image processing, CDMA systems, distributed detection, etc. These algorithms are typically used when dealing with a group of correlated random variables whose collective impact on the system performance is either difficult to analyze or is computationally expensive to be characterized numerically. A better understating of the statistical behavior of those algorithms normally leads to inference systems with better performance. To this end, several inference techniques have been proposed in the literature, based on analyzing the behavior of the messages, which use some sort of approximation.
Gaussian approximations for message probability density functions (pdf) are used to simplify the analysis of the BPbased decoding algorithm for lowdensity paritycheck (LDPC) codes. The Gaussian approximations used in
[35] are mainly based on empirical observations. In [5, 28], the code block length is assumed long enough in order to approximate the structure of the decoding subgraphs with trees. Consequently, the messages are built as sums of many independent random variables and are approximated as Gaussian random variables. For turbo codes, a onedimensional Gaussian approximation based on extrinsic information transfer chart (EXIT chart) is used in [30, 31, 8, 9]. Since there is no simple formula for updating message pdf’s for turbo decoding, Monte Carlo simulations are used in those works to analyze approximate trajectories for Gaussian messages under turbo decoding.Although the works in channel coding make us more motivated in using an approximate model for the BP algorithm, their results cannot be used in the distributed detection scenario considered in this paper for two strong reasons: first, we cannot rely on empirical results which describe the behavior of the BP algorithm in channel decoding schemes. We are dealing with a different application here. And, second, we do not assume that the factor graph is cyclefree.
In [34]
small perturbations are inserted into a BP algorithm whose messages are formulated by first and secondorder linear approximations. Through analyzing the behavior of the perturbations and their relations to the underlying graph structure, the joint probability densities of the nodes located far away from each other are estimated. In our detection scenario, the main goal is to properly utilize the possible correlations observed between the sensing outcomes made at nodes which are possibly located close to each other.
In [3]
the use of a linear approximate messagepassing (AMP) algorithm is discussed assuming that a linear mixing problem is to be solved. Specifically, the aim in that problem is to reconstruct a sparse vector
from a small vector of linear observations which are disturbed by linear additive noise components , i.e., . It is shown in [3], as an extension of the discussions made in [6, 7], that when the mixing structure is known, the behavior of the AMP algorithm is accurately described by a simple recursive formula referred to as state evolution, provided that the dimensions of the problem are large. The state evolution offers a convenient way to formulate AMP performance. [26, 27] offer a generalization of the AMP, termed GAMP, which can be used when the output channels involve arbitrary, possibly nonlinear, known operations. The BP messages are approximated as i.i.d Gaussian random variables in [26, 27]. These approximations are not justified analytically.It is worth noting that, the AMP and GAMP algorithms are built on a graph structure determined by . That is, in those works the main intention of the messagepassing algorithm is to impose the impact of the mixing process on the collective behavior of the estimates rather than on exploiting the correlations between the elements of (or ). This can be further clarified by noting that all random variables concerned are assumed mutually independent in [26] when formulating the BP algorithm. This assumption, which cannot be justified, serves as the core of the GAMP technique by reducing the vector MAP estimation problem to a sequence of scalar MAP estimations at the inputs and outputs. In our case, such an approach in modeling the behavior of an adhoc sensor network would mean ignoring a major part of the correlations between the observations made by different nodes in the network.
In our distributed detection scenario, if we model the state of the PU transmitters by and the sensing outcomes by then the graph structure depends on the network topology which is not known. In other words, is not known in our problem since specifying its values requires the knowledge of the network topology in detail. Specifically, depends on the channel coefficients experienced at different locations throughout the entire network. The average channel power gains, which specify the average received SNR values, can be obtained if either the PU transmit power levels are known or the sensor locations with respect to the PU transmitters are available. We do not know the sensor locations nor do we know the PU transmit power levels. Hence, in a practicallyappealing detection scenario, we cannot make such assumptions about the network topology or about the behavior of the sources of the signals which are to be detected. Moreover, the AMP algorithm in [3] assumes that the probability distribution of is known. In contrast, we do not assume any prior knowledge about the behavior of the PU transmitters.
We overcome these challenges by assuming Markovian dependencies between the random variables as in [20, 36]. Specifically, we use an MRF to model the statistical behavior of the network and design a BP algorithm with the aim of fitting a pairwise MRF to the network behavior. A pairwise MRF models the joint probability distribution of the random variables of interest in terms of the pairwise correlations observed in the network. These correlations are easy to obtain in practice. In addition, pairwise MRFs fit well into the commonlyused adhoc network configurations in which major network functionalities are conducted through pairwise i.e., onehop, links between the nodes located close to each other. This design method is based on the common assumption that nodes located close enough to each other for onehop communication, experience some levels of correlation between their sensor outcomes.
In Section V, we show how to blindly estimate the required statistics to characterize the radio environment according to the MRF structure discussed. One might argue that the blind estimation proposed in this paper provides the (unknown) mixing structure that we need in the AMP algorithm. This is not the case since the analysis in [3] is proposed for systems with composed of i.i.d entries. Due to the inherent spatiotemporal diversity in a network of distributed agents, we do not view the mixing coefficients as i.i.d random variables. Here we are dealing with correlations observed both in time and in space.
Finally, note that the work in this paper has some similarities, in its overall structure to the works in channel coding which jointly realize codeword estimation with channel parameter estimation, see, e.g., [32]. This joint estimation approach is used in BPbased spectrum sensing systems as well. That is, the model describing the radio environment is updated based on the results of the distributed detection. The use of BP in such systems is mainly intended to reduce the computational complexity of the joint estimation process. We do not intend to alter the overall structure of this joint estimation. What we propose here is to better implement it by better exploitation of the available resources.
Iii Belief Propagation on a Factor Graph
We consider a pairwise MRF defined on a graph composed of a set of vertices or nodes and a set of edges . Node corresponds to a random variable and edge connects nodes and and represents a possible correlation between random variables and . This graph can be used to model the interdependency of local inference results in a network of distributed agents such as a wireless sensor network. In this network, spatiallydistributed nodes exchange information with each other in order to solve a statistical inference problem.
Let denote the vector of random variables and denote the vector of observations in the network where contains observation samples at node . An MRF defined on factor graph is used to factorize the a posteriori distribution function into singlevariable and pairwise terms, i.e.,
(1) 
where denotes proportionality up to a multiplicative constant. The product over couples is only over , for which where refers to the set of neighbors of node in the graph, i.e., . Each singlevariable term captures the impact of
in the joint distribution whereas each pairwise term
represents the interdependency of and connected by an edge in the graph. Please note that, as we will see in the following, is implicitly included in the right side of (1) within the structures of ’s and ’s and also in the multiplicative constant.The main goal of each node, say node , is to find its marginal a posteriori distribution . Since calculating the marginal distributions from the joint distribution function requires multiple integrations, its computational complexity grows prohibitively with the number of variables . However, this goal can be achieved in a distributed lowcomplexity fashion by using the BP algorithm based on factorizing the joint distribution function as in (1). In particular, BP is a parallel messagepassing algorithm where the messages sent from node to node in the network are built based on multiplying three factors together: first, the local inference result at node , which is , second, the correlation between and which is modeled by , and third, the product of all messages received from the neighbors of node except for node . This term is then summed over all values of to create the message. More specifically, the BP message from node to node , at the th iteration, is formulated as
(2) 
where denotes all neighbors of node except for node .
The socalled belief of node regarding , denoted , is formed by multiplying the local inference result at node by all the messages received from its neighbors, i.e.,
(3) 
which is used to estimate (i.e., to approximate) the desired marginal distribution, i.e., . The proportionality signs in (2) and (3) indicate that the beliefs and messages are expressed up to a constant, which can be found by normalizing and so as to sum to 1.
To represent the probability measure defined on , we adopt the commonlyused exponential model [33], i.e.,
(4) 
Note that, when and is a constant for all , this model turns into the wellknown Ising model [33, 16].
An inference scenario of high practical importance is a binary hypothesis test in which all random variables are binaryvalued, i.e., for all . The test can be formulated as . Such an optimization, even if all the joint distributions are available, incurs prohibitive computational complexity. In practice, the test is conducted by estimating the marginal distributions which are then used in a likelihoodratio test (LRT). That is, is solved, which can be realized by evaluating the socalled likelihood ratio of , i.e., if and otherwise.
The BP algorithm is used in such an inference scenario as described in the following. Given
, the local observations are commonly assumed independent from each other. Consequently, we can factorize their conditional probability distribution as
. This model is used in [36, 20, 22, 21] based on the fact that is evaluated at node , when calculating the loglikelihood ratio (LLR), solely based on the status of . Consequently, in this detection structure works as a sufficient statistic for , i.e., . Since the a posteriori probability distribution of can be stated as , from (4) we have(5) 
which leads to
(6) 
and
(7) 
The proportionality sign in (5) includes but is not limited to . The message update rule can be expressed in terms of the belief and message loglikelihood ratios (LLRs) defined, respectively, as
(8)  
(9) 
where can be used as an estimation of the likelihood ratio of and can be used as the message sent from node to node . By this change of variables, we have^{1}^{1}1In (10), we have replaced by and by since can be merged into and can be merged into with no impact on the rest of the analysis.
(10)  
(11) 
where while denotes the LLR of the local observations at node . We refer to as the local sensing outcome at node . After iterations, is used at node as a decision variable to perform an LRT, i.e., if and otherwise.
In order to properly determine the threshold and to formulate the system performance in terms of the commonly used metrics, the probability distribution of is needed. This is a major challenge since, first, the probability distribution of ’s are not known a priori. And second, even if they are known somehow, the nonlinearity of the messageupdate rule in (10) does not allow us to find the distribution of and the resulting performance in closed form. In [20], a BPbased distributed detection method is introduced. However, due to these challenges, the detection threshold is determined through a heuristic approach and the system performance optimization is not discussed. In this paper we build an analytical framework to address these issues.
Eq. (11) shows that the decision variable at node , i.e., , is built by the local LLR obtained at node plus a linear combination of the messages received from the neighboring nodes. The messages, however, pass through a nonlinear transformation , see Fig. 1. In (10) and (11) we clearly see how the nonlinearity of the BP algorithm transforms the likelihood values in the factor graph.
A closer look at (10) reveals that
plays a role similar to the role of the sigmoid activation functions typically used in artificial neural networks
[10]. In certain neuralnetwork structures, each neuron generates an outcome by applying a bounded nonlinear transformation on the sum of its inputs received from other neurons in the network. This transfer function keeps the network stable by smoothening the neuron outcomes as well as by restricting them to reside between an upper and a lower bound. Consequently, the signals passed from one neuron to another become less sensitive to sharp variations in the inputs and the mapping learned by the network does not result in unbounded values for the variables of interest.
We propose to replace the nonlinear functions in the BP algorithm by linear ones. However, we design the proposed linear transformations in the network while taking into account the convergence condition of the distributed messagepassing algorithm obtained. Having linear messageupdate rules in the BP algorithm enables us to analyze the network dynamics and develop effective optimization techniques to enhance system performance.
We realize the proposed linearization by approximating the message transfer function . Specifically, we use the firstorder Taylor series expansion of with respect to its second argument, i.e., , where . This leads to a linear messageupdate rule as
(12) 
where .
By applying this approximation on (10) and (11), while iterating the resulting linear messageupdate rule, we obtain
(13) 
which shows that, given enough time, the overall result of the BP algorithm at a node is obtained, approximately, by a linear combination of all the local LLRs in the network. Note that we do not need the iteration index in this expression since (III) denotes the final result of the iterations.
Let denote a normed space where is defined and let denote a vector containing all the messages at iteration . A general messagepassing algorithm is defined by where indicates the messageupdate rule. Convergence of such an iteration to a socalled fixed point means that the sequence obtained by iterating converges to a unique fixed point for any . Consequently, in (III) denotes (the decision variable made by) the fixed point of the linear BP algorithm in (12) and an approximation of (the decision variable made by) the fixed point attained via the BP in (10).
The linearization in (12) can be interpreted in two ways. One can see it as a linear approximation to the messageupdate rule in (10). Such an interpretation is valid since when
’s are Gaussian (which is a common practical case), one can normalize them without affecting the detection result. Specifically, when the decision variable follows a Gaussian distribution, for a constant falsealarm probability the probability of detection does not change by normalizing the decision variable. Consequently, by proper normalization of
’s, the messages can be made small enough to experience a linear behavior in the transfer functions.Alternatively, one can opt not to normalize the local sensing outcomes and to consider (12) as a new messageupdate rule which is obtained directly by replacing the nonlinear transfer functions in the network by linear ones. Note that when the message values are too large to fall within the linear part of the transfer functions, the saturation behavior in acts as a thresholding process leading to a socalled hard fusion of the local LLRs. By using the linear transfer function in (12), the hard fusion rule is replaced by a soft one in which the sensing outcomes are directly combined without passing through that threshold. Since a soft fusion method outperforms a hard fusion one in general, one can expect a better overall detection result obtained by this linearization. A hard fusion scheme refers to a combination of binaryvalued variables obtained by comparing the local sensing outcomes against (local) decision thresholds. See [4] for a comparison between the hard and soft decision methods.
Remark 1: Regardless of how (12) is interpreted, since the sigmoid functions are now removed, the linear transfer functions have to be designed properly to guarantee the convergence of the new messageupdate rule. We will discuss the convergence conditions in Section VI.
Remark 2: We have configured the linear combination in (III) such that it reveals the effect of the network topology on how the fusion coefficients are arranged. Specifically, for the onehop neighbors of node we have one coefficient affecting the local likelihoods received, for the twohop neighbors we have two coefficients and so on. We will use this observation in Section IVB to develop a decentralized distributed inference method.
From (III) it is clear that, determining ’s for all in the network, based on a set of performance criteria, can be viewed as determining the values of ’s in the MRF. In [20, 36], the BP is designed and analyzed without directly considering the problem of fitting an MRF to the network behavior. Specifically, in those works, ’s are set by an empirical estimation of the correlations between the neighboring nodes in a window of time slots, i.e.,
(14) 
where , referred to as the learning factor in this paper, is considered as a small constant and equals if is true and otherwise. denotes the number of samples used in this training process. We will see in Section VIII why should be small.
Such an approach does not directly take into account the impact of the learned factor graph on the system performance. In this paper, however, we argue that
’s provide extra degrees of freedom which can be used to optimize the system performance. Specifically, (
10) shows that ’s determine the behavior of the transfer functions applied on the BP messages. Consequently, optimizing ’s can be viewed as finding an optimal set of transfer functions for the BP algorithm. Since we can normalize ’s without affecting the local detection performance, the proposed linear transfer functions are included in the feasibility region of this optimization. And, since a soft fusion method is better than a fusion method which clips the local LLRs, the performance of a BP iteration with the proposed linear transfer functions gives an upper bound on the detection performance of the BP algorithm defined by (10). Hence, by using linear transfer functions and by optimizing the resulting messagepassing algorithm, we are optimizing the parameters of the BP algorithm in a distributed detection scenario.Alternatively, note that the proposed linear BP algorithm defined by (12) is the same as the BP algorithm with (10) where ’s are large enough. And, ’s are the degrees of freedom in this design. Consequently, from an optimization perspective, there is no difference between the two messagepassing algorithms.
Iv Linear Fusion Based on Belief Propagation
In this section, we use the link between the BP and linear datafusion to design distributed inference methods in both centralized and decentralized network configurations. In a centralized setting [2], there is a special node in the network, referred to as the fusion center (FC), which collects data from the sensing nodes, coordinates the distributed inference, and builds the overall inference outcome based on the collected data. In a decentralized (i.e., adhoc) network, however, there is no FC and the statistical inference is realized through the socalled peertopeer communication between the nodes.
Iva Centralized Distributed Inference
Eq. (III) can compactly be stated as or, in matrix form, as
(15) 
where , , and where . Assuming as the objective function corresponding to a certain application and as one of constraint functions indicating the requirements to be met by the system design, the optimal linear fusion in the network can be obtained by
(P1) 
This is a generic linear fusion problem whose structure depends on the application at hand. We further clarify our points by considering a distributed spectrum sensing scenario where each node, say node , performs a binary hypothesis test on the decision variable to decide the availability of the radio spectrum.
In this scenario, is a binaryvalued random variable representing the occupancy state of the radio spectrum where node is located. Let’s say, indicates that the spectrum band sensed by node is occupied whereas
indicates the availability of that band for the secondary use. The signal samples used to form the test statistics at node
can be expressed as(16) 
where
denotes a zeromean complexvalued Gaussian random variable which models the additive white noise at node
while , also a zeromean complexvalued Gaussian random variable, represents the signal transmitted by the th PU transmitter with . The channel gain from the th PU transmitter to node , denoted , is assumed constant during the sensing time interval. denotes a binary random variable which represents the state of the th PU transmitter, i.e., the th PU transmitter is on when and is off otherwise. is also assumed constant during the sensing time interval. According to this model, means that no PU signal is received at node , i.e., for all and the spectrum is available for the secondary use.Given the state of , if we assume as in [20] that is a vector of i.i.d. complex Gaussian signal samples, then we have
(17) 
where denotes the received signaltonoise ratio (SNR) at node . Therefore, one needs to know the channel (power) gain, the transmitted signal power, and the noise level to obtain the probability distribution of .
In order to keep the resulting falsealarm rate below a predefined constraint , in [20] in (17) is replaced by
(18) 
where is set such that , which leads to
(19) 
This approach is practically appealing in the sense that, can be simply calculated in terms of the noise level and without the need for the channel gain and the transmitted power level.
The proposed structure remains intact if we use instead of (17). All other variables in (17) along with in (18) can be absorbed either into the coefficients in (III) or into the detection threshold . Consequently, we can justify here why the BP algorithm works well in [20] where energy detection is adopted as the local sensing method. As we show in Section VIII, in (18) works well only for small values of . We propose a new threshold adaptation scheme in Section VB, which can always guarantee the falsealarm rate of the BP algorithm fall below a predefined upper bound without the need for any statistics other than the local noise level.
The performance of the binary hypothesis test discussed is formulated by the socalled falsealarm and detection probabilities defined, respectively, at node as and . The proposed transfer functions are linear and, given the status of ’s, our ’s are Gaussian random variables. Consequently, we can express the proposed system falsealarm and detection probabilities in closed form. These probabilities at node depend not only on the state of but also, in general, on the state of all other ’s being sensed throughout the entire network. We take these interdependencies into account by the total probability theorem. Specifically, at node and for we have
(20) 
where , , and for , and . is the socalled function. Note that contains all ’s except for . It is clear that and . By solving or we obtain a value for which guarantees the falsealarm or detection probability at node be, respectively, equal to or . The fact that is Gaussian comes from (III).
In a CR network, these parameters determine the secondary network throughput and the interference level caused on the PUs. Specifically, the higher the falsealarm probability the more vacant spectrum bands are mistakenly treated as occupied, leading to a loss in the network throughput. Consequently, the opportunistic spectral utilization of the th band is measured by . In addition, the higher the misdetection probability the higher the interference level imposed on the primary users. Hence, the interference level on the th band is measured by . In a rational design strategy, interference on PUs is associated with a cost.
We collect these performance metrics regarding the entire network in vectors and . Consequently, we can express the network performance by the following quantities
(21)  
(22) 
where denotes an allone vector, denotes the aggregate network throughput calculated over all spectrum bands sensed across the whole network and denotes the aggregate interference caused on the PUs operating on those bands, where and while denotes the throughput achieved by using the th band. The cost of interference caused on the th band is denoted by which is included in .
Now, the optimal sensing performance is obtained by maximizing the aggregate system throughput subject to a constraint on the aggregate interference as well as the per channel falsealarm and misdetection constraints, i.e.,
(P2)  
where and impose perchannel constraints on the falsealarm and misdetection probabilities. The sequential optimization method in [23], which maximizes the socalled deflection coefficient [12] of the detector, can be used to solve (P2) for a good suboptimal performance.
The deflection coefficient associated with is defined as
(23) 
where, from (15), and . Hence, to maximize the deflection coefficients in the network, we only need and for all and . These statistics are clearly easier to obtain than , , and in (20).
Availability of the statistics discussed may require a centralized configuration in which an FC collects data from the nodes. As explained in [19, Sec. IIC], the standard preambles or synchronization symbols, which typically reside within the PU signals, can facilitate the estimation of the desired statistics in a centralized setting. In Section IVB, we show how to optimize the system performance in a decentralized network and when those statistics are not available a priori.
Remark 3: Due to the nonlinearity of the message update rule in (10), the system falsealarm and detection probabilities are not available in closed form when the BP algorithm is used and, therefore, the performance optimization in (P2) or even guaranteeing a predefined level of performance in a BPbased system analytically is a very challenging task, if possible.
IvB Decentralized Distributed Inference
Messagepassing algorithms are of special interest in the decentralized network configurations where there is no FC, the communication between nodes is limited to onehop neighbors, and, typically, there are no statistics available a priori regarding the wireless environment.
In order to build a decentralized distributed inference, we make three particular observations here. Firstly, note that for all . Moreover, as shown in Section VII, we can use ’s as the fusion coefficients, where , without affecting the detection probability obtained. Scaling down the coefficients is an effective method to mitigate the impact of graph cycles on the system performance.
Secondly, our MRF in (1) captures the correlations between the random variables by a set of pairwise links. In particular, if node is connected to node through node , the correlation between and is accounted for in (1) by two factors and multiplied together within . A similar process is observed in (III) where the impact of on , which depends on the correlation between and , is determined by multiplying two factors and corresponding, respectively, to the link from node to node and the link from node to node .
Thirdly, the fusion process in (III) is inherently symmetric. That is, each node performs the same local fusion on the data received from its neighbors and passes the result on along the path towards node . We use this symmetric structure, which is clarified in (24), to distribute the load of the desired optimization across all nodes in the network. Specifically, we assign each node the task of optimizing its own local fusion process. In this manner, each node has to deal with a few variables and with the statistics received from its onehope neighbors only. The fact that makes this collection of localized optimizations a reliable distributed approach to the system performance optimization. The reason is that ’s received from neighbors with longer distance are multiplied by higherorder terms in the datafusion process. Consequently, the system inherently favors, at each node, the data received from the close neighbors over the data from the moredistant ones.
(24) 
Therefore, firstly, we further approximate (III) by keeping the firstorder terms, i.e.,
(25) 
We use , where , when optimizing the coefficients. In the vector form we have where contains all ’s such that and contains the local likelihood ratios at node and at all its neighbors.
Remark 4: The approximation in (25) is only used to realize the optimization of ’s in a decentralized configuration. The resulting coefficients are then used in a linear messageupdate rule as in (12). Hence, the overall impact of the distributed linear BP obtained is still approximated by (III).
Secondly, we optimize by considering the onehop neighbors of node . In this manner, the correlation between and is captured in while the correlation between and is accounted for by . Hence, both of the correlations concerned are taken into account in the system design by the multiplication in (III) while each node sees its immediate neighbors only. This optimization can be performed conveniently in a distributed fashion since each node has to deal with only a few variables.
Contrary to the linear fusion method discussed in Section IVA, here we have an iterative messagepassing algorithm whose convergence needs to be guaranteed. In Section VI, we show that the convergence is guaranteed when we have which is a convex constraint on the coefficients. Therefore, the proposed decentralized linear fusion scheme is obtained by solving the following optimization at node , ,
(P3)  
where , , and denote the applicationspecific local objective function, the local constraint functions, and the number of constraints at node , respectively. The major advantage of this optimization is that the local objective and constraint functions are calculated in terms of the statistics of and these statistics are now conveniently obtained due to the linearity of the proposed message update rule.
It is worth noting that, considering the overall system performance, the use of (25) does not mean that the impact of the higherorder terms on is ignored. Instead, the task of optimizing those terms are assigned to the other nodes in the network. It is clear that by this approximation we achieve a suboptimal solution. However, this solution distributes the optimization load well over the entire network by exploiting the symmetry in the overall datafusion process achieved. Moreover, the use of (25) is supported by the observations made in [4] which investigates the performance of different fusion methods operating on erroneous data. In particular, it is shown in [4] that a soft fusion scheme is rather robust against the errors in the local test summaries reported by the cooperating nodes. Accordingly, we consider the impact of the truncation in (25) as a reporting error in the network and expect the system to tolerate it significantly. Taking advantage of this robustness in the linear fusion scheme, we break down the desired optimization into a set of small local optimizations which can be conducted collectively in a decentralized distributed setting. The numerical results in Section VIII support the effectiveness of the proposed optimization.
In the rest of the paper, we use to denote the coefficients in the linear approximation of the BP algorithm, i.e., , to distinguish those coefficients from the ones obtained by the proposed optimization. In addition, we use to contain all ’s such that .
In a spectrum sensing scenario, the proposed optimization is built by the wellknown NeymanPearson hypothesis test [12] at each node. Specifically, at each node we aim at maximizing the detection probability subject to an upper bound on the falsealarm probability, i.e.,
(P4)  
where and are derived, approximately, in a decentralized fashion as described in the following.
The conditional probabilities in (20), which govern the statistical behavior of the test summary at node , take into account any possible correlations between and all other ’s. Some nodes may, in fact, be located far away from node and their observations may have, if any, a little or negligible impact on the detection result at node . Hence, we simplify the calculations by limiting the summation in (20) to include only the correlations between and ’s observed by the onehop neighbors of node . These statistics can be obtained by each node in a network with a decentralized configuration. The proposed approximation is expressed as
(26) 
where is a vector which contains ’s with while for , we have and . Note that denotes the number of onehop neighbors of node while and denote an estimation of the first and secondorder conditional statistics of the test summary at node given the value of and its immediate neighbors. And, is approximated as in (25).
Based on (25) we have and . Hence, the use of the deflection coefficient here can simplify the design even further compared to solving (P4). Maximizing the deflection coefficient is a classic lowcomplexity approach to optimal linear fusion which is considered in e.g., [24, 23, 25].
Remark 5: Since each node is typically connected to only a few neighbors, regardless of the scale of the network, each node has to deal only with a few variables when running the proposed optimization (and, as discussed later, when estimating the required statistics). Consequently, the scalability of the proposed detection scheme does not depend on the lowcomplexity optimization techniques in [24, 23, 25].
IvC Discussion
The use of the BP algorithm, linear datafusion, and MRFs in statistical inference systems cover a wide range of applications and design scenarios in the literature. However, these methods are normally investigated separately and in different contexts. In this paper, we intend to provide a more comprehensive view and shed light on the potential links between these effective tools. Therefore, we aim at presenting our ideas in an as generic of a framework as possible to help a reader possibly interested in a different application better understand the links discussed. The use of generic expressions to denote the objective and cost functions in (P1) and (P3) serves this purpose.
However, when interpreting the results of the proposed optimization framework, it should be noted that we focus on distributed detection in a wireless network. Consequently, some limitations may arise from the use of certain tools in this paper such as the type of the MRF model adopted or as the falsealarm and detection probabilities used to measure the system performance. Nevertheless, in the context of distributed detection, the models we use in this paper do not restrict the practicality of the design proposed. All the assumptions regarding the MRF, local sensing method, performance metrics, optimization scenarios, availability of channel statistics, network configuration, and the possibility of onehop communications, etc. are in line with the design scenarios widelyused in the literature.
In fact, the proposed distributed optimization is based on the linear fusion structure in (III) and its symmetric nature as clarified in (24). The linear fusion discussed is mainly based on (1) and (4) which together build a relatively generic model for a distributed binary hypothesis test. Due to the reasons discussed in Section II, modeling the joint pdf of the variables in terms of pairwise correlations as in (1) is a commonlyused approach in distributed detection. In addition, the use of an exponential form in (4) to model the behavior of correlated binaryvalued random variables is a popular choice since, first, it is supported by the principle of maximum entropy [33] and, second, when the graphical model is built in terms of products of functions, these products turn into additive decompositions in the log domain.
The use of the falsealarm and detection probabilities, as performance metrics, only affects the structure of the objective and constraint functions discussed. As clarified in Remark 5, when using the proposed distributed optimization, each deals with a relatively small optimization problem and, therefore, the type of the objective and constraint functions in (P4) are not of major concern. Hence, we do not lose the generality of the proposed distributed optimization by the use of these metrics. Our main goal in such formulations is to facilitate the design of the CR network since the falsealarm and detection probabilities are directly linked to the CR network throughput and the level of interference experienced by the PUs, respectively. This link is explained in Section IVA.
V Blind Distributed Detection
We propose the notions of adaptive linear belief propagation and detector calibration here to realize a nearoptimal distributed detection performance when there is no information available a priori regarding the statistical behavior of the radio environment.
Va Offline Learning and Optimization
As we saw in Section IV, in order to optimize the fusion weights in the proposed linear messagepassing algorithm, we need the first and secondorder statistics of the local sensing outcomes. Since we do not assume any prior knowledge about the channel statistics or the behavior of the PUs, we have to estimate those parameters based on the detection outcome. This is a challenging task since the detection outcome itself depends on those estimates. Erroneous detection outcomes lead to erroneous estimates of the desired statistics which, in turn, deteriorate the detection results. We overcome this challenge by implanting an adaptation process within the linear BP algorithm, which is run offline on a set of data samples collected and processed for channel estimation and detector optimization purposes. This adaptation is realized by an offline linear BP algorithm whose outcome is a set of nearoptimal coefficients used in the main linear BP algorithm which is responsible for spectrum sensing in real time.
In a blind detection scheme the only source of information is the detector outcome. Hence, we have to use and to estimate and for all for all . These estimates are erroneous in nature due to the nonzero falsealarm and misdetection probabilities in the system. As an example note that, to estimate node needs to calculate an average over the samples of received when . However, node does not exactly know when and it can only calculate . When the sensing outcome indicates , it may be, in fact, a misdetection and including the sample in the averaging process may lead to a significant error in that case. Note that samples can vary widely in size depending on the status of . Similar errors can also occur due to the falsealarm incidents.
Therefore, we need to enhance the quality of sensing in order to have highquality estimates of the desired statistics which, in turn, are used to achieve a better sensing quality. Clearly, we are facing a loop here. Hence, we design an iterative learningoptimization loop to optimize the proposed messagepassing algorithm. This loop works as an offline linear BP algorithm which operates on a window of samples of .
In the following, we use (instead of ) to denote the iteration index of the offline linear BP algorithm in order to distinguish this iteration from the main linear BP algorithm. In addition, we use to denote a window of samples of stored for the proposed offline data processing and to denote the offline sensing outcomes, while we use and to denote respectively the fusion coefficients and the detection threshold at node at (offline) iteration . For notational convenience, we do not explicitly show the timing of the samples here. However, it should be noted that by and we are referring, respectively, to samples of and corresponding to . It is also worth noting the difference between and . Throughout the paper, we use to denote the sensing outcomes resulted from the main linear BP. However, is used here to denote the offline sensing outcomes based on which the desired statistics are estimated.
Adaptive Linear Belief Propagation for Detector Optimization 
Input: , , , 
Output: Nearoptimal coefficients for linear BP 

1. Let and initialize by comparing against ; 
2. while 
3. for node 
4. Calculate and for all ; 
5. Solve (P4) to find coefficients and threshold ; 
6. end 
7. Run the resulting linear BP on to find ; 
8. ; 
9. end 
10. for node 
11. for 
12. if 
13. ; 
14. else 
15. ; 
16. end 
17. end 
18. end 
19. Output for ; 
Algorithm I provides a pseudocode of the proposed adaptation algorithm in which pernode computations are clarified. The adaptation is designed in a way that each node processes samples of only from its immediate neighbors. Hence, in the proposed distributed optimization no further information is used at each node other than those assumed available in [20].
The loop starts with initializing by the local sensing process on . Specifically, for , initial offline sensing outcome is created at node by comparing against the corresponding local threshold calculated by (19
) which only requires the noise variance. The noise levels can be measured in practice
[20]. At iteration , all samples of are used to calculate and for all , . These estimates are then fed into (P4) to find the coefficients and thresholds . The resulting linear BP is then run over to find . The iteration stops after either a predetermined number of cycles or when there is no significant change in the updated parameters and statistics concerned. Lines 1 to 9 in Algorithm I describe this iteration.Due to the multipath fading or shadowing effects typically associated with the wireless environment, we may experience very low SNR levels at some sensing nodes. In such cases, the signals to be detected may be buried under a heavy noise and the channel estimation process may not be able to reliably capture the correlations between the desired random variables. In order to ensure that the system performance is not degraded in such conditions – to below the performance level of the BP algorithm – we compare the coefficients obtained by the adaptive linear BP against their counterparts in the linearized version of the main BP algorithm. If a coefficient obtained by the offline iteration is too small compared to its counterpart in the main BP, then that coefficient is not used in the adaptation and the one from the main BP is used instead. Lines 10 to 18 in Algorithm I illustrate this process. Specifically, if is greater than a predefined value denoted by , then is used as the final fusion weight. Otherwise, is used. The final coefficients , , are then used in the main linear BP algorithm for spectrum sensing.
The proposed adaptation scheme enables us to rely on the effectiveness of the BP algorithm in capturing the interdependencies of the random variables in low SNR regimes while enhancing its performance by the linear fusion techniques whenever there is a room for such an enhancement. This is a hybrid design in which the two detection techniques work together. The BP algorithm properly takes into account the outcome of the nodes in low SNR conditions while the linear fusion enhances the performance by injecting more information about the environment into the BP algorithm and this information is obtained from the nodes operating in better conditions. It is reasonable to assume that different nodes operate in different conditions due to the inherent spatial diversity in a wireless network.
It is worth noting that, the time is modeled as slotted in this paper and we refer to the proposed adaptation process as offline to distinguish it from the main linear BP algorithm which is executed at the beginning of every time slot. Unlike the main linear BP iteration (or the legacy BP), Algorithm I is not run at every time slot. Instead, it is executed with a longer periodicity and in step with the changing characteristics of the radio environment. Since we assume that the statistical behavior of the environment changes slowly, the period over which Algorithm I is repeated is much longer than the timeslot duration. Specifically, if the spectrum sensing is performed at , , with being the slot duration, then Algorithm I is executed at for , while
Comments
There are no comments yet.