Neural Enhanced Belief Propagation on Factor Graphs

03/04/2020 ∙ by Victor Garcia Satorras, et al.

A graphical model is a structured representation of locally dependent random variables. A traditional method to reason over these random variables is to perform inference using belief propagation. When provided with the true data generating process, belief propagation can infer the optimal posterior probability estimates in tree structured factor graphs. However, in many cases we may only have access to a poor approximation of the data generating process, or we may face loops in the factor graph, leading to suboptimal estimates. In this work we first extend graph neural networks to factor graphs (FG-GNN). We then propose a new hybrid model that conjointly runs an FG-GNN with belief propagation. The FG-GNN receives as input messages from belief propagation at every inference iteration and outputs a corrected version of them. As a result, we obtain a more accurate algorithm that combines the benefits of both belief propagation and graph neural networks. We apply our ideas to error correction decoding tasks, and we show that our algorithm can outperform belief propagation for LDPC codes on bursty channels.


1 Introduction

Graphical models (bishop2006pattern; murphy2012machine) are a structured representation of locally dependent random variables, combining concepts from probability and graph theory. A standard way to reason over these random variables is to perform inference on the graphical model using message passing algorithms such as Belief Propagation (BP) (pearl2014probabilistic; murphy2013loopy). Provided that the true generative process of the data is given by a non-loopy graphical model, BP is guaranteed to compute the optimal (posterior) marginal probability distributions.

However, in real world scenarios we may only have access to a poor approximation of the true distribution of the graphical model, leading to sub-optimal estimates. In addition, an important limitation of belief propagation is that, on graphs with loops, it only computes an approximation to the desired posterior marginals, or may fail to converge at all.

In this paper we present a hybrid inference model to tackle these two limitations. We cast our model as a message passing method on a factor graph that combines messages from BP and from GNNs. The GNN messages are learned from data and complement the BP messages. The GNN receives as input the BP messages at every inference iteration and delivers as output a refined version of them back to BP. As a result, given a labeled dataset, we obtain a more accurate algorithm that outperforms both Belief Propagation and Graph Neural Networks run in isolation, in cases where Belief Propagation is not guaranteed to obtain the optimal marginals.

Belief Propagation has demonstrated empirical success in a variety of applications: error correction decoding (mceliece1998turbo), combinatorial optimization, in particular graph coloring and satisfiability (braunstein2004survey), and inference in Markov logic networks (richardson2006markov); the Kalman Filter is also a special case of the BP algorithm (yedidia2003understanding; welch1995introduction). One of its most successful applications is Low Density Parity Check codes (LDPC) (gallager1962low), an error correction decoding algorithm that runs BP on a loopy bipartite graph. LDPC is currently part of the Wi-Fi 802.11 standard as an optional part of 802.11n and 802.11ac, and it has been adopted for 5G, the fifth generation of wireless technology that began wide deployment in 2019. Despite being a loopy algorithm, its bipartite graph is typically very sparse, which reduces the number of loops and increases the cycle length. As a result, in practice LDPC shows excellent error correction performance and operates close to the Shannon limit on Gaussian channels.

However, a Gaussian channel is only an approximation of the more complex noise distributions encountered in the real world. Many of these distributions have no analytical form, but we can approximate them from data. In this work we show the robustness of our algorithm on LDPC codes when the channel takes such a non-analytical form. Our hybrid method closely matches the performance of LDPC in Gaussian channels while outperforming it under deviations from this assumption (i.e. a bursty noise channel (gilbert1960capacity; kim2018communication)). The three main contributions of our work are:

  • We define a graph neural network that operates on factor graphs (FG-GNN).

  • We present a new hybrid inference algorithm, Neural Enhanced Belief Propagation (NEBP), that refines BP messages using the FG-GNN.

  • We apply our method to an error correction decoding problem for a non-Gaussian (bursty) noise channel and show clear improvement on the Bit Error Rate over existing LDPC codes.

2 Background

2.1 Factor Graphs

Factor graphs are a convenient way of representing graphical models. In a factor graph, each factor $f_s$ defines the dependencies between a subset of variables $\mathbf{x}_s$. The global probability distribution is defined as the product of all of these factors, as shown in equation 1, where $Z$ is the normalization constant of the probability distribution. A visual representation of a factor graph is shown in the left image of Figure 2.

p(\mathbf{x}) = \frac{1}{Z} \prod_{s} f_s(\mathbf{x}_s)    (1)
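To make equation 1 concrete, the following sketch (hypothetical factor tables, NumPy assumed) evaluates the unnormalized product of factors for a tiny factor graph over three binary variables and then normalizes it by Z.

```python
import itertools
import numpy as np

# Tiny factor graph over three binary variables x1, x2, x3 with two factors:
# f_a(x1, x2) and f_b(x2, x3). The tables are arbitrary, chosen for illustration.
f_a = np.array([[1.0, 0.5],
                [0.5, 2.0]])   # f_a[x1, x2]
f_b = np.array([[2.0, 1.0],
                [1.0, 3.0]])   # f_b[x2, x3]

def unnormalized_p(x1, x2, x3):
    # Product of all factors (equation 1 without the 1/Z normalization).
    return f_a[x1, x2] * f_b[x2, x3]

# Normalization constant Z: sum of the factor product over all configurations.
Z = sum(unnormalized_p(*cfg) for cfg in itertools.product([0, 1], repeat=3))
p = {cfg: unnormalized_p(*cfg) / Z for cfg in itertools.product([0, 1], repeat=3)}
print(Z, p[(0, 1, 1)])
```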

2.2 Belief Propagation

Belief Propagation (bishop2006pattern), also known as the sum-product algorithm, is a message passing algorithm that performs inference on graphical models by locally marginalizing over random variables. It exploits the structure of the factor graph, allowing more efficient computation of the marginals.

Belief Propagation operates directly on factor graphs by sending messages (real valued functions) along its edges. These messages exchange the beliefs of the sender nodes about the receiver nodes, thereby transporting information about the variables' probabilities. We can distinguish two types of messages: messages that go from variable to factor and messages that go from factor to variable.

Variable to factor: $\mu_{x \to f}(x)$ is the product of all incoming messages to variable $x$ from all neighboring factors except factor $f$.

\mu_{x \to f}(x) = \prod_{h \in N(x) \setminus \{f\}} \mu_{h \to x}(x)    (2)

Factor to variable: $\mu_{f \to x}(x)$ is the product of the factor itself with all of its incoming messages from neighboring variable nodes except $x$, marginalized over all associated variables except $x$.

\mu_{f \to x}(x) = \sum_{\mathbf{x}_f \setminus x} f(\mathbf{x}_f) \prod_{y \in N(f) \setminus \{x\}} \mu_{y \to f}(y)    (3)

To run the Belief Propagation algorithm, messages are initialized with uniform probabilities, and the two operations above are then run recursively until convergence. One can subsequently obtain the marginal estimate $p(x)$ by multiplying all incoming messages from the neighboring factors:

p(x) \propto \prod_{h \in N(x)} \mu_{h \to x}(x)    (4)

From now on, we simplify notation by dropping the argument of the message functions. In Figure 1 we show the defined messages on a factor graph where black squares represent factors and blue circles represent variables.

Figure 1: Belief Propagation on a Factor Graph
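As a minimal illustration of equations 2-4 (not the implementation used in this paper), the following sketch runs sum-product message passing on a toy factor graph with binary variables; the factor tables and iteration count are arbitrary.

```python
import itertools
import numpy as np

# Toy factor graph: a list of (factor_table, variable_indices) pairs.
factors = [
    (np.array([[1.0, 0.5], [0.5, 2.0]]), (0, 1)),   # f_a(x0, x1)
    (np.array([[2.0, 1.0], [1.0, 3.0]]), (1, 2)),   # f_b(x1, x2)
]
n_vars = 3

# Messages initialized uniform: mu_vf[(v, f)] variable->factor, mu_fv[(f, v)] factor->variable.
mu_vf = {(v, f): np.ones(2) for f, (_, vs) in enumerate(factors) for v in vs}
mu_fv = {(f, v): np.ones(2) for f, (_, vs) in enumerate(factors) for v in vs}

for _ in range(10):
    # Variable-to-factor messages (eq. 2): product of the other incoming factor messages.
    for (v, f) in mu_vf:
        msg = np.ones(2)
        for (g, u) in mu_fv:
            if u == v and g != f:
                msg *= mu_fv[(g, u)]
        mu_vf[(v, f)] = msg / msg.sum()
    # Factor-to-variable messages (eq. 3): multiply the factor by the other incoming
    # variable messages and marginalize out all variables except the target one.
    for (f, v) in mu_fv:
        table, vs = factors[f]
        msg = np.zeros(2)
        for assignment in itertools.product([0, 1], repeat=len(vs)):
            val = table[assignment]
            for u, a in zip(vs, assignment):
                if u != v:
                    val *= mu_vf[(u, f)][a]
            msg[assignment[vs.index(v)]] += val
        mu_fv[(f, v)] = msg / msg.sum()

# Marginals (eq. 4): product of all incoming factor-to-variable messages.
for v in range(n_vars):
    marg = np.ones(2)
    for (f, u) in mu_fv:
        if u == v:
            marg *= mu_fv[(f, u)]
    print(v, marg / marg.sum())
```

Because this toy graph is a tree, the estimates converge to the exact marginals, illustrating the guarantee mentioned in the introduction.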

2.3 LDPC codes

In this paper we apply our proposed method to error correction decoding using LDPC codes. LDPC (Low Density Parity Check) codes (gallager1962low; mackay2003information) are linear codes used to correct errors in data transmitted through noisy communication channels. The sender encodes the data with redundant bits, while the receiver has to decode the original message. In an LDPC code, a sparse parity check matrix $H$ is designed such that, given a code-word $\mathbf{x}$ of $n$ bits, the product of $H$ and $\mathbf{x}$ is constrained to equal zero: $H\mathbf{x} = \mathbf{0} \ (\mathrm{mod}\ 2)$. $H$ can be interpreted as an adjacency matrix that connects variables (i.e. the transmitted bits) with factors, i.e. the parity checks that must sum to 0. The entries of $H$ are 1 if there is an edge between a factor and a variable, where rows index factors and columns index variables.
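A minimal sketch of this parity check constraint, using a small hypothetical matrix $H$ chosen only for illustration:

```python
import numpy as np

# Hypothetical toy parity check matrix H (rows = factors/checks, columns = variable bits).
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])

def is_codeword(x):
    # A bit vector x is a valid code-word iff every parity check sums to 0 mod 2.
    return np.all(H @ x % 2 == 0)

print(is_codeword(np.array([0, 0, 0, 0, 0, 0])))  # True: all-zeros always satisfies H x = 0
print(is_codeword(np.array([1, 0, 0, 0, 0, 0])))  # False: flipping one bit violates checks
```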

For a linear code with parity check matrix $H \in \{0,1\}^{m \times n}$, $n$ is the number of variables and $m$ the number of factors. The prior probability of a transmitted code-word $\mathbf{x}$ is:

p(\mathbf{x}) \propto \mathbb{1}\big[ H\mathbf{x} \equiv \mathbf{0} \ (\mathrm{mod}\ 2) \big]    (5)

which can be factorized as

p(\mathbf{x}) = \frac{1}{Z} \prod_{s} f_s(\mathbf{x}_s), \qquad f_s(\mathbf{x}_s) = \mathbb{1}\Big[ \bigoplus_{i \in N(s)} x_i = 0 \Big]    (6)

At the receiver we get a noisy version of the code-word, $\mathbf{r}$. The noise is assumed to be i.i.d., therefore we can express the likelihood of the received code-word as $p(\mathbf{r} \mid \mathbf{x}) = \prod_i p(r_i \mid x_i)$. Finally, we can express the posterior distribution of the transmitted code-word given the received one as:

p(\mathbf{x} \mid \mathbf{r}) \propto \frac{1}{Z} \prod_{s} f_s(\mathbf{x}_s) \prod_{i} p(r_i \mid x_i)    (7)

Equation 7 is a product of factors, where some factors $f_s$ from (eq. 6) are connected to multiple variables, expressing a constraint among them. Other factors are connected to a single variable, expressing a prior distribution for that variable. A visual representation of this factor graph is shown in the left image of Figure 2.

Finally, in order to infer the transmitted code-word given $\mathbf{r}$, we can simply run (loopy) Belief Propagation, as described in Section 2.2, on the factor graph defined above (equation 7). In other words, error correction with LDPC codes can be interpreted as an instance of Belief Propagation applied to its associated factor graph.
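The following sketch illustrates how the factor graph of equation 7 could be assembled for a hypothetical BPSK/AWGN setup: singleton factors come from the channel likelihood and multi-variable factors from the rows of $H$. The matrix, noise level, received vector, and BPSK convention are illustrative assumptions, not the paper's exact setup.

```python
import numpy as np

# Toy parity check matrix (rows = checks, columns = bits) and a received noisy vector.
H = np.array([[1, 1, 0, 1, 0, 0],
              [0, 1, 1, 0, 1, 0],
              [1, 0, 0, 0, 1, 1]])
sigma = 0.8                                       # assumed channel noise std
r = np.array([0.9, -1.1, 1.2, 0.8, -0.7, 1.0])    # received symbols (BPSK: bit 0 -> +1, bit 1 -> -1)

def channel_likelihood(r_i):
    # Singleton factor p(r_i | x_i) for x_i in {0, 1} under Gaussian noise.
    lik = np.array([np.exp(-(r_i - 1.0) ** 2 / (2 * sigma ** 2)),
                    np.exp(-(r_i + 1.0) ** 2 / (2 * sigma ** 2))])
    return lik / lik.sum()

singleton_factors = [channel_likelihood(ri) for ri in r]   # one factor per variable
parity_factors = [np.flatnonzero(row) for row in H]        # variables touched by each check
print(singleton_factors[0], parity_factors[0])
# Running the sum-product updates of Section 2.2 on this graph is exactly LDPC decoding.
```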

3 Related Work

One of the closest works to our method is (satorras2019combining), which also combines graphical inference with graph neural networks. However, in that work the model is only applied to the Kalman Filter, a hidden Gaussian Markov model for time sequences, and all factor graphs are assumed to be pair-wise. In our case, we run the GNN on arbitrary factor graphs and hybridize Belief Propagation, which allows us to enhance one of its main applications (LDPC codes). Other works also learn an inference model from data, such as Recurrent Inference Machines (putzky2017recurrent) and Iterative Amortized Inference (marino2018iterative). However, we present a hybrid algorithm instead of a fully learned one, and in (putzky2017recurrent) graphical models play no role.

Our work is also related to meta learning (schmidhuber1987evolutionary; andrychowicz2016learning) in the sense that it learns a more flexible algorithm on top of an already existing one. It also has interesting connections to the ideas of the consciousness prior (bengio2017consciousness), since our model is an actual implementation of a sparse factor graph that encodes prior knowledge about the task to solve.

Another interesting line of research concerns the convergence of graphical models with neural networks. In (mirowski2009dynamic), the conditional probability distributions of a graphical model are replaced with trainable factors. (johnson2016composing) learns a graphical latent representation and runs Belief Propagation on it. Combining the strengths of convolutional neural networks and conditional random fields has been shown to be effective in image segmentation tasks (chen2014semantic; zheng2015conditional).

Closer to our work, (yoon2018inference) trains a graph neural network to estimate the marginals in binary Markov Random Fields (Ising models) and compares its performance with Belief Propagation on loopy graphs. In our work we propose a hybrid method that combines the benefits of both GNNs and BP in a single model. In (nachmani2016learning), weights are learned on the edges of the Tanner graph for High Density Parity Check codes; in our case we use a GNN on the defined graphical model and test our model on Low Density Parity Check codes, one of the standards in communications for error decoding. A subsequent work (liu2019neural) uses the model from (nachmani2016learning) for quantum error correcting codes.

Recently, (zhang2019factor) also presented a model to run graph neural networks on factor graphs. In our case, however, we simply adjust the Graph Neural Network equations to the factor graph scenario as a building block for our hybrid model (NEBP).

4 Method

Figure 2: Visual representation of an LDPC factor graph (left) and its equivalent representation for the Graph Neural Network (right). In the factor graph, factors are displayed as black squares and variables as blue circles. In the Graph Neural Network, nodes associated with factors are displayed as black circles and nodes associated with variables as blue circles.

4.1 Graph Neural Network for Factor Graphs

We propose a hybrid method that improves belief propagation by combining it with Graph Neural Networks (GNNs). Both methods can be seen as message passing on a graph. However, where BP sends messages that follow directly from the definition of the graphical model, the messages sent by GNNs must be learned from data. To achieve seamless integration of the two message passing algorithms, we first extend GNNs to factor graphs.

Graph Neural Networks (bruna2013spectral; defferrard2016convolutional; kipf2016semi) operate on graph-structured data by modelling interactions between pairs of nodes. A graph is defined as a tuple $\mathcal{G} = (\mathcal{V}, \mathcal{E})$, with nodes $v_i \in \mathcal{V}$ and edges $e_{ij} \in \mathcal{E}$. Employing a similar notation as (gilmer2017neural), a GNN defines the following edge and node operations on the graph:

GNN:
$m_{ij} = \phi_e(h_i, h_j, a_{ij})$   (edge operation)
$m_i = \sum_{j \in N(i)} m_{ij}$   (message aggregation)
$h_i' = \phi_v(h_i, m_i, a_i)$   (node update)

Table 1: Graph Neural Network equations.

The message passing procedure is divided into two main steps: from node embeddings to edge embeddings, and from edges back to nodes. Here $h_i$ is the embedding of a node $v_i$, $\phi_e$ is the edge operation, and $m_{ij}$ is the embedding of the edge $e_{ij}$. First, the edge embeddings are computed, which one can interpret as messages; next we sum all incoming messages of each node. After that, the embedding representation of node $v_i$, $h_i$, is updated through the node function $\phi_v$. The values $a_{ij}$ and $a_i$ are optional edge and node attributes respectively.
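A minimal PyTorch sketch of the message passing step in Table 1 (our own simplified layer, not the authors' code); the toy graph at the end is purely illustrative, and the node attribute is omitted for brevity.

```python
import torch
import torch.nn as nn

class GNNLayer(nn.Module):
    """Sketch of one Table-1 style message passing step.

    edges: LongTensor [E, 2] with (sender i, receiver j) pairs; h: [N, d] node embeddings.
    """
    def __init__(self, d, d_edge_attr=0):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * d + d_edge_attr, d), nn.ReLU(), nn.Linear(d, d))
        self.phi_v = nn.Sequential(nn.Linear(2 * d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, h, edges, edge_attr=None):
        send, recv = edges[:, 0], edges[:, 1]
        inp = [h[send], h[recv]] + ([edge_attr] if edge_attr is not None else [])
        m_ij = self.phi_e(torch.cat(inp, dim=-1))                # edge operation
        m_i = torch.zeros_like(h).index_add_(0, recv, m_ij)      # sum incoming messages per node
        return self.phi_v(torch.cat([h, m_i], dim=-1))           # node update

# Usage on a toy graph with 4 nodes and 3 directed edges (hypothetical).
h = torch.randn(4, 64)
edges = torch.tensor([[0, 1], [1, 2], [2, 3]])
h_new = GNNLayer(64)(h, edges)
```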

We can easily run a GNN on factor graphs with only pair-wise factors (i.e. a factor graph where each factor contains only two variables). For example, in (yoon2018inference) a GNN on pair-wise factor graphs was defined, where each variable from the factor graph was a node in the GNN, and each factor connecting two variables was an edge in the GNN. Properties of the factors were associated with the edge attributes $a_{ij}$.

The mapping between GNNs and factor graphs becomes less trivial when a factor may contain more than two variables: we can then no longer consider each factor as an edge of the GNN. In this work we propose a special case of GNNs that runs on factor graphs with an arbitrary number of variables per factor.

Similarly to Belief Propagation, we first treat the factor graph as a bipartite graph with two types of nodes, variable-nodes $v_x$ and factor-nodes $v_f$, and two types of edge interactions, depending on whether they go from factor-node to variable-node or vice-versa. With this graph definition, all interactions are again pair-wise (between factor-nodes and variable-nodes in the bipartite graph).

A mapping between a factor graph and the graph we use in our GNN is shown in Figure 2. All factors from the factor graph are assumed to be factor-nodes in the GNN. We make an exception for factors connected to only a single variable which we simply consider as attributes of that variable in order to avoid redundant nodes.

Once we have defined our graph, we take the GNN notation from Table 1 and re-write it specifically for this new graph in Table 2. From now on we refer to these new equations as the FG-GNN.

FG-GNN:
$m_{x \to f} = \phi_{e_x}(h_x, h_f, a_{x \to f})$,  $m_{f \to x} = \phi_{e_f}(h_f, h_x, a_{f \to x})$   (edge operations)
$m_f = \sum_{x \in N(f)} m_{x \to f}$,  $m_x = \sum_{f \in N(x)} m_{f \to x}$   (message aggregation)
$h_f' = \phi_{v_f}(h_f, m_f, a_f)$,  $h_x' = \phi_{v_x}(h_x, m_x, a_x)$   (node updates)

Table 2: Factor Graph Neural Network equations.

Notice that in the GNN we did not have two different kinds of nodes in the graph, and hence we only needed one edge function $\phi_e$ (note, however, that the order of the arguments of this function matters, so that a message from $v_i$ to $v_j$ is potentially different from the message in the reverse direction). For the FG-GNN we now have two types of nodes, which necessitates two types of edge functions, $\phi_{e_x}$ and $\phi_{e_f}$, depending on whether the message is sent by a variable-node or a factor-node.

In addition, we also have two types of node embeddings, $h_x$ and $h_f$, for the two types of nodes $v_x$ and $v_f$.

Again we sum over all incoming messages for each node, but now the node update uses two different functions: $\phi_{v_f}$ for the factor-nodes and $\phi_{v_x}$ for the variable-nodes. The optional edge attributes are now labeled $a_{x \to f}$ and $a_{f \to x}$, and the node attributes $a_x$ and $a_f$.
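Below is a minimal sketch of one FG-GNN step on the bipartite variable/factor graph of Table 2, with separate edge and node functions for the two node types. Names, dimensions, and the omission of the edge and node attributes are our simplifications, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class FGGNNLayer(nn.Module):
    """Sketch of one FG-GNN step (Table 2): two edge functions and two node
    update functions for the bipartite variable/factor graph."""
    def __init__(self, d):
        super().__init__()
        mlp = lambda n_in: nn.Sequential(nn.Linear(n_in, d), nn.ReLU(), nn.Linear(d, d))
        self.phi_e_x, self.phi_e_f = mlp(2 * d), mlp(2 * d)    # variable->factor, factor->variable
        self.phi_v_x, self.phi_v_f = mlp(2 * d), mlp(2 * d)    # node updates

    def forward(self, h_v, h_f, edges):
        # edges: LongTensor [E, 2] with (variable index, factor index) pairs.
        v, f = edges[:, 0], edges[:, 1]
        m_vf = self.phi_e_x(torch.cat([h_v[v], h_f[f]], dim=-1))   # variable -> factor messages
        m_fv = self.phi_e_f(torch.cat([h_f[f], h_v[v]], dim=-1))   # factor -> variable messages
        agg_f = torch.zeros_like(h_f).index_add_(0, f, m_vf)       # sum into factor-nodes
        agg_v = torch.zeros_like(h_v).index_add_(0, v, m_fv)       # sum into variable-nodes
        h_f_new = self.phi_v_f(torch.cat([h_f, agg_f], dim=-1))
        h_v_new = self.phi_v_x(torch.cat([h_v, agg_v], dim=-1))
        return h_v_new, h_f_new, m_fv

# Toy bipartite graph: 3 variable-nodes, 2 factor-nodes, 4 edges (hypothetical).
h_v, h_f = torch.randn(3, 64), torch.randn(2, 64)
edges = torch.tensor([[0, 0], [1, 0], [1, 1], [2, 1]])
h_v, h_f, m_fv = FGGNNLayer(64)(h_v, h_f, edges)
```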

4.2 Neural Enhanced Belief Propagation

Figure 3: Graphical illustration of our Neural Enhanced Belief Propagation algorithm. Three modules {BP, FG-GNN, Comb.} are depicted in each iteration. Each module is associated with one of the three lines of equation 8.

Now that we have defined the FG-GNN, we can introduce our hybrid method, which runs conjointly with Belief Propagation on a factor graph; we call this new method Neural Enhanced Belief Propagation (NEBP). At a high level, the procedure is as follows: after every Belief Propagation iteration, we input the BP messages into the FG-GNN. The FG-GNN then runs for two iterations and updates the BP messages. This step is repeated recursively for N iterations, after which we compute the marginals from the refined BP messages.

We first define the two functions BP(·) and FG-GNN(·). BP(·) takes as input the current factor-to-variable messages $\tilde{\mu}_{f \to x}$, runs the two BP updates (eqns. 2 and 3 respectively), and outputs the resulting messages $\mu = \{\mu_{x \to f}, \mu_{f \to x}\}$. We initialize $\tilde{\mu}_{f \to x}$ as uniform distributions.

The function FG-GNN(·) runs the equations displayed in Table 2. At every iteration we give it as input the current node embeddings $h$ together with the messages $\mu$. The embeddings $h$ are initialized randomly by sampling from a normal distribution. Moreover, the edge attributes $a_{x \to f}$ and $a_{f \to x}$ are provided to the function as the messages $\mu_{x \to f}$ and $\mu_{f \to x}$ obtained from BP(·). The outputs of FG-GNN(·) are the updated latent vectors $h$ and the messages $m_{f \to x}$ computed as part of the FG-GNN algorithm of Table 2. All other variables computed inside FG-GNN(·) are kept internal to this function. We define FG-GNN(·) as two iterations of the algorithm to match the way information is propagated in BP.

The node attributes $a_f$ and $a_x$ may contain properties of the factor and the variable respectively. The node attributes $a_x$ also contain the message from a singleton factor to its variable, i.e. a factor that is only connected to one variable. As shown in Figure 2, factors that are connected to a single variable do not get their own node in the FG-GNN, since such a node would be redundant with the variable node. For this reason, we input these messages as a node attribute.

Finally, Comb(·) takes as input the node embeddings $h_x$ and the edge embeddings $m_{f \to x}$, and outputs the refinement of the current message estimates $\tilde{\mu}_{f \to x}$. This function encompasses two MLPs: one takes as input the node embeddings and outputs the refinement of the singleton factor messages; the second takes as input the edge embeddings and outputs the refinement of the remaining messages.

In summary, the hybrid algorithm thus looks as follows:

\mu^{(t)} = \mathrm{BP}\big(\tilde{\mu}^{(t-1)}_{f \to x}\big)
h^{(t)},\; m^{(t)}_{f \to x} = \text{FG-GNN}\big(h^{(t-1)},\, \mu^{(t)}\big)
\tilde{\mu}^{(t)}_{f \to x} = \big|\, \mu^{(t)}_{f \to x} + \mathrm{Comb}\big(h^{(t)},\, m^{(t)}_{f \to x}\big) \big|    (8)

Since $\tilde{\mu}_{f \to x}$ must be proportional to a probability distribution, we take the absolute value after summing the BP message with the learned refinement. A visual representation of the Neural Enhanced Belief Propagation algorithm is displayed in Figure 3. Each of the modules in the figure {BP, FG-GNN, Comb.} is associated with one of the lines of equation 8 respectively.

After running the algorithm for N iterations, we obtain the marginal estimates by the same operation as in Belief Propagation (eq. 4), which amounts to taking the product of all incoming messages $\tilde{\mu}_{f \to x}$ to each variable node $x$. From these marginal distributions we can compute any desired quantity of a node.
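A skeleton of the NEBP loop as we read equation 8 (callables for the three modules are assumed; this is a sketch of the control flow, not the authors' implementation):

```python
import torch

def nebp_decode(bp_step, fg_gnn, comb, mu_fv_init, h_init, n_iters=20):
    """Skeleton of the NEBP iteration in eq. 8.

    bp_step(mu_fv)  -> (mu_vf, mu_fv): one round of BP updates (eqs. 2-3).
    fg_gnn(h, mu)   -> (h, m_fv): two FG-GNN iterations (Table 2).
    comb(h, m_fv)   -> refinement added to the BP factor-to-variable messages.
    """
    mu_fv_tilde, h = mu_fv_init, h_init
    for _ in range(n_iters):
        mu_vf, mu_fv = bp_step(mu_fv_tilde)             # BP module
        h, m_fv = fg_gnn(h, (mu_vf, mu_fv))             # FG-GNN module
        mu_fv_tilde = torch.abs(mu_fv + comb(h, m_fv))  # Comb. module; abs keeps messages non-negative
    return mu_fv_tilde                                  # marginals then follow from eq. 4
```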

4.3 Training and Loss

The loss is computed from the estimated marginals $\hat{p}(\mathbf{x})$ and the ground truth code-word $\mathbf{x}^{gt}$, which we assume known during training.

\mathcal{L} = \mathrm{CE}\big(\hat{p}(\mathbf{x}),\ \mathbf{x}^{gt}\big)    (9)

During training we back-propagate through the whole multi-layer estimation model (with each layer one iteration of the hybrid model), updating the weights of the FG-GNN and of Comb(·). The number of training iterations is chosen by cross-validation. In our experiments we use the binary cross entropy loss.
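A hypothetical training-step sketch that unrolls the hybrid model and back-propagates a binary cross entropy loss through all iterations (the model, optimizer, and tensor shapes are assumed for illustration):

```python
import torch
import torch.nn as nn

def train_step(model, optimizer, r, x_true, n_iters=20):
    # r: received code-word, x_true: transmitted bits in {0, 1}; shapes are illustrative.
    optimizer.zero_grad()
    p_hat = model(r, n_iters=n_iters)          # estimated per-bit marginals p(x_i = 1)
    loss = nn.functional.binary_cross_entropy(p_hat, x_true.float())
    loss.backward()                            # gradients flow through every unrolled iteration
    optimizer.step()
    return loss.item()
```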

Figure 4: Bit Error Rate (BER) with respect to the Signal to Noise Ratio (SNR) for different bursty noise values $\sigma_b$.

5 Experiments

In this section we compare the performance of Belief Propagation, the FG-GNN, and our Neural Enhanced BP (NEBP) on the error correction task from Section 2.3, where running Belief Propagation corresponds to LDPC decoding.

Model details: In both the FG-GNN and the NEBP, we use two-layer multilayer perceptrons (MLPs) for the edge functions $\phi_{e_x}$ and $\phi_{e_f}$ defined in Section 4. The node update functions $\phi_{v_x}$ and $\phi_{v_f}$ are also composed of two-layer MLPs, this time followed by a Gated Recurrent Unit (chung2014empirical). In the hybrid model (NEBP), the two MLPs encompassed in Comb(·) are also two-layer MLPs. In all cases, the number of hidden features is 64, and all activation functions are SELUs (klambauer2017self) except for Comb(·), which uses ReLUs (xu2015empirical).

5.1 Low Density Parity Check codes

LDPC codes, explained in Section 2.3, are a particular case of Belief Propagation run on a bipartite graph for error correction decoding. These bipartite graphs contain cycles, hence Belief Propagation is no longer guaranteed to converge or to provide the optimal estimates. Despite this lack of guarantees, LDPC has shown excellent results near the Shannon limit (mackay1996near) for Gaussian channels.

LDPC assumes a channel with an analytical solution, commonly a Gaussian channel. In real world scenarios the channel may differ from Gaussian, leading to sub-optimal estimates, and some channels may not even have an analytical form on which to run Belief Propagation. An advantage of neural networks is that, in such cases, they can learn a decoding algorithm from data.

In this experiment we consider the bursty noisy channel from (kim2018communication), where a signal is transmitted through a standard Gaussian channel; this time, however, a larger noise signal is added with a small probability $\rho$. More formally:

r_i = x_i + z_i + \beta_i w_i, \qquad z_i \sim \mathcal{N}(0, \sigma^2), \quad w_i \sim \mathcal{N}(0, \sigma_b^2)    (10)

where $r_i$ is the received signal and $\beta_i$ follows a Bernoulli distribution, i.e. $\beta_i = 1$ with probability $\rho$ and $\beta_i = 0$ with probability $1 - \rho$. In our experiments, we set $\rho$ as done in (kim2018communication). This bursty channel describes how unexpected signals may cause interference in the middle of a transmitted frame. For example, radars may cause bursty interference in wireless communications. In LDPC, the SNR is assumed to be known and fixed for a given frame; in practice, however, it must be estimated from a known preamble (the pilot sequence) transmitted before the frame. If bursty noise occurs in the middle of the transmission, the estimated SNR is blind to this new noise level.
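A simple simulation of the bursty channel in equation 10 (the BPSK mapping and the SNR-to-noise-std convention are our assumptions, and the parameter values in the usage line are purely illustrative):

```python
import numpy as np

def bursty_channel(x_bits, snr_db, sigma_b, rho):
    """Simulate eq. 10: Gaussian channel noise plus an occasional larger burst."""
    s = 1.0 - 2.0 * x_bits                     # BPSK mapping: bit 0 -> +1, bit 1 -> -1
    sigma = np.sqrt(10 ** (-snr_db / 10.0))    # assumed AWGN std for unit-energy symbols
    z = sigma * np.random.randn(len(s))        # Gaussian channel noise z_i
    burst = (np.random.rand(len(s)) < rho)     # Bernoulli indicator beta_i
    w = sigma_b * np.random.randn(len(s))      # larger bursty noise w_i
    return s + z + burst * w

# Illustrative values only (not necessarily those used in the paper).
r = bursty_channel(np.random.randint(0, 2, 96), snr_db=3, sigma_b=2.0, rho=0.05)
```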

Dataset: We use the parity check matrix "96.3.963" from (mackay2009david) for all experiments, with $n = 96$ variable nodes and their corresponding factor nodes, i.e. a transmitted code-word contains 96 bits. The training dataset consists of pairs of received and transmitted code-words $(\mathbf{r}, \mathbf{x})$. The transmitted code-words $\mathbf{x}$ are used as ground truth for training the decoding algorithm. The received code-words $\mathbf{r}$ are obtained by transmitting $\mathbf{x}$ through the bursty channel from equation 10. We generate samples for SNR = {0, 1, 2, 3, 4}. Regarding the bursty noise $w_i$, we randomly sample its standard deviation $\sigma_b$ from a uniform distribution. We generate a validation partition of 500 code-words (100 code-words per SNR value). For the training partition we keep generating samples until convergence, i.e. until we see no further improvement in the validation accuracy.

Training procedure: We provide as input to the model the received code-word and the SNR for that code-word. These values are provided as the node attributes described in Section 4. We run the algorithms for 20 iterations, and the loss is computed as the cross entropy between the estimated marginals and the ground truth code-word. We use an Adam optimizer (kingma2014adam) with a batch size of 1. As an evaluation metric we compute the Bit Error Rate (BER), which is the number of erroneous bits divided by the total number of transmitted bits. The number of test code-words used to evaluate each point in our plots (Figure 4) scales inversely with $n \cdot \mathrm{BER}$, where $n$ is the number of bits per code-word and BER is the estimated Bit Error Rate for that point.
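For reference, the BER metric is simply the fraction of decoded bits that differ from the transmitted ones, e.g.:

```python
import numpy as np

def bit_error_rate(x_hat, x_true):
    # BER: number of erroneous bits divided by the total number of transmitted bits.
    return np.mean(x_hat != x_true)

print(bit_error_rate(np.array([0, 1, 1, 0]), np.array([0, 1, 0, 0])))  # 0.25
```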

Baselines: Besides the already mentioned methods (FG-GNN and standard LDPC error correction decoding), we also run two extra baselines. The first one, which we call "Bits baseline", consists of independently estimating each bit as the value that maximizes the per-bit channel likelihood $p(r_i \mid x_i)$. The other baseline, called "LDPC-bursty", is a variation of LDPC where, instead of considering only the Gaussian noise level given by the SNR, we consider the noise distribution from equation 10, such that the assumed noise variance also accounts for the expected contribution of the bursty component. We do this to provide a fairer comparison with our learned methods, because even if we are blind to the value of $\sigma_b$, we know there may be bursty noise with probability $\rho$.

Results: In Figure 4 we show six different plots, one for each of the $\sigma_b$ values {0, 1, 2, 3, 4, 5}. In each plot we sweep the SNR from 0 to 4. Notice that for $\sigma_b = 0$ the bursty noise is non-existent and the channel is equivalent to an Additive White Gaussian Noise (AWGN) channel. LDPC has been analytically designed for this channel and obtains its best performance here. The aim of our algorithm is to outperform LDPC for $\sigma_b > 0$ while still matching its performance for $\sigma_b = 0$. As shown in the plots, as $\sigma_b$ increases, the performance of NEBP and the FG-GNN improves relative to the other methods, with NEBP always achieving the best performance and getting close to the LDPC performance for the AWGN channel ($\sigma_b = 0$). In summary, the hybrid method is more robust than LDPC, obtaining competitive results on AWGN channels while outperforming it when bursty interference is present. The FG-GNN alone obtains relatively poor performance compared to LDPC for small $\sigma_b$, demonstrating that belief propagation is still a very powerful tool compared to purely learned inference for this task. Despite its poor performance, however, the FG-GNN remains robust as $\sigma_b$ increases, and our hybrid method, NEBP, combines the benefits of both Belief Propagation and the FG-GNN to achieve the best performance. Finally, LDPC-bursty is more robust than LDPC as $\sigma_b$ increases, but it is significantly outperformed by NEBP on bursty channels, and it also performs slightly worse than LDPC on the AWGN channel ($\sigma_b = 0$).

Figure 5: Bit Error Rate (BER) with respect to the $\sigma_b$ value for a fixed SNR = 3.

In order to better visualize the decrease in performance as the burst variance increases, we sweep over different $\sigma_b$ values for a fixed SNR = 3. The result is shown in Figure 5. The larger $\sigma_b$, the larger the BER. However, the performance degrades much less for our NEBP method than for LDPC and LDPC-bursty. In other words, NEBP is more robust as we move away from the AWGN assumption. We want to emphasize that in real world scenarios the channel may always deviate from Gaussian; even when assuming an AWGN channel, its parameters (SNR) must be estimated in practice. These potential deviations make hybrid methods a very promising approach.

6 Conclusions

In this work, we presented a hybrid inference method that enhances Belief Propagation by conjointly running a Graph Neural Network designed for factor graphs. In cases where the data generating process is not fully known (e.g. the parameters of the graphical model need to be estimated from data), belief propagation does not perform optimally. Our hybrid model, in contrast, combines the prior knowledge encoded in the graphical model (albeit with the wrong parameters) with a (factor) graph neural network whose parameters are learned from labeled data on a representative distribution of channels. Note that we can think of this as meta-learning, because the FG-GNN is not trained on one specific channel but on a distribution of channels, and therefore must perform well on any channel sampled from this distribution without knowing its specific parameters.

We tested our ideas on a state-of-the-art LDPC implementation with realistic bursty noise distributions. Our experiments clearly show that the neural enhancement of LDPC improves performance both relative to LDPC and relative to FG-GNN as the variance in the bursts gets larger.

We believe that 'neural augmentation' of existing, hand-designed engineering solutions is a very powerful paradigm, especially in sectors of our economy that are 'engineering heavy' (e.g. manufacturing, chip design, etc.). Hybrid methods of this kind, meta-learned to perform well on a wide variety of tasks while embracing excellent existing engineering solutions, are robust, explainable, and data efficient.

References