Blind Channel Equalization using Variational Autoencoders

03/05/2018 ∙ by Avi Caciularu, et al. ∙ Tel Aviv University

A new maximum likelihood estimation approach for blind channel equalization, using variational autoencoders (VAEs), is introduced. Significant and consistent improvements in the error rate of the reconstructed symbols, compared to constant modulus equalizers, are demonstrated. In fact, for the channels that were examined, the performance of the new VAE blind channel equalizer was close to the performance of a nonblind adaptive linear minimum mean square error equalizer. The new equalization method enables a significantly lower latency channel acquisition compared to the constant modulus algorithm (CMA). The VAE uses a convolutional neural network with two layers and a very small number of free parameters. Although the computational complexity of the new equalizer is higher compared to CMA, it is still reasonable, and the number of free parameters to estimate is small.




I Introduction

Following the amazing success of deep learning methods in various tasks, these techniques have recently been considered for some communication problems. For example, in [1, 2, 3, 4] deep learning methods were applied to the problem of channel decoding, in [5] the authors proposed an autoencoder as a communication system for short block codes, and in [6] deep learning-based detection algorithms were used when the channel model is unknown.

Our work considers transmission over a noisy intersymbol interference (ISI) channel with an unknown impulse response. Equalization methods for ISI channels using neural networks have been studied extensively in the literature [7]. In this paper we consider the case where the input sequence is also unknown, and blind channel equalization is required. Following the blind equalization step, one can apply decision directed equalization, using the blind equalization estimate as an initial value. Blind channel equalization is a special type of blind deconvolution where the input is constrained to lie in some known discrete constellation with known statistics. The standard approach to this problem is the constant modulus algorithm (CMA) [8, 9, 10]. Blind neural network-based algorithms using the constant modulus (CM) criterion have also been proposed in the literature [11].

In this work we propose a new approach to blind channel equalization using the maximum likelihood (ML) criterion. The ML criterion has already been used for blind channel equalization [12, 13, 14] (and references therein). However, the proposed solutions use the expectation maximization (EM) algorithm or an approximated EM, which require an iterative application of the forward-backward or Viterbi algorithms. The complexities of these algorithms are exponential in the channel memory size, which may be prohibitive. As an alternative, in this paper we propose an approximated ML estimate using the variational autoencoder (VAE) method [15, 16]. VAEs are widely used in the deep learning literature for unsupervised and semi-supervised learning, and as generative models for given observation data. We demonstrate significant and consistent improvements in the quality of the detected symbols compared to the baseline blind equalization algorithms. In fact, for the channels that were examined, the performance of the new VAE blind channel equalizer (VAEBCE) was close to the performance of a non-blind adaptive linear minimum mean square error (MMSE) equalizer [17]. The new equalization method enables lower latency acquisition of an unknown channel response. Although the computational complexity of the new VAEBCE is higher than that of CMA, it is still reasonable, and the number of free parameters to estimate is small.

II Problem setup

The communication channel is modeled as a convolution of the input, $x_n$, with some causal, finite impulse response (FIR), time invariant filter, $\mathbf{h} = (h_0, h_1, \ldots, h_{M-1})$, of size $M$, followed by the addition of white Gaussian noise, $w_n$:

$$y_n = \sum_{k=0}^{M-1} h_k x_{n-k} + w_n \qquad (1)$$

This is the equivalent model of the end-to-end communication system shown in Fig. 1, where the sampling is performed at the symbol rate.

Fig. 1: End to end communication system model
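The channel model above can be simulated in a few lines. The following is only a minimal sketch under stated assumptions: the function name `simulate_isi_channel`, the example impulse response `h_example`, the QPSK scaling (real and imaginary parts in {-1, +1}) and the particular SNR convention are all illustrative choices, not values taken from the paper.

```python
import numpy as np

def simulate_isi_channel(num_symbols, h, snr_db, rng):
    """Pass random QPSK symbols through an FIR channel and add complex AWGN."""
    # QPSK symbols with real and imaginary parts in {-1, +1} (assumed scaling).
    bits = rng.integers(0, 2, size=(num_symbols, 2))
    x = (2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)
    # Causal convolution truncated to the observation window (input is zero
    # before the start of the block).
    clean = np.convolve(x, h)[:num_symbols]
    # Noise power set relative to the average power of the noiseless output
    # (one possible SNR convention, assumed here).
    noise_var = np.mean(np.abs(clean) ** 2) / 10 ** (snr_db / 10)
    w = np.sqrt(noise_var / 2) * (rng.standard_normal(num_symbols)
                                  + 1j * rng.standard_normal(num_symbols))
    return x, clean + w

rng = np.random.default_rng(0)
h_example = np.array([0.3, 1.0, 0.3])  # hypothetical impulse response
x, y = simulate_isi_channel(2000, h_example, snr_db=10.0, rng=rng)
```

The noise is split evenly between the real and imaginary components, so the complex noise samples are circularly symmetric.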

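The equalizer in Fig. 1 reconstructs an estimate, $\hat{x}_n$, of the transmitted symbol sequence. Now, suppose that we observe a finite window of measurement data, $\mathbf{y} = (y_0, \ldots, y_{N-1})$. For clarity of presentation we assume that the input signal is causal ($x_n = 0$ for $n < 0$). We return to this assumption later. Equation (1) can be written compactly for the measurements collected in $\mathbf{y}$ as

$$\mathbf{y} = \mathbf{x} * \mathbf{h} + \mathbf{w} \qquad (2)$$

where $\mathbf{x} = (x_0, \ldots, x_{N-1})$ is the transmitted message, $*$ denotes convolution, and $\mathbf{w}$ is an i.i.d. sequence of additive white Gaussian noise. Throughout the paper we assume QPSK modulation, although the derivation can be extended to other constellations. Hence, the real and imaginary parts of each symbol $x_n$ are binary valued, and the above vectors can be written as combinations of real (superscript $R$) and imaginary (superscript $I$) components, so that $\mathbf{x} = \mathbf{x}^R + j\mathbf{x}^I$, $\mathbf{y} = \mathbf{y}^R + j\mathbf{y}^I$, and $\mathbf{h} = \mathbf{h}^R + j\mathbf{h}^I$. Each element of the noise sequence, $w_n$, is complex Gaussian with statistically independent real and imaginary components, and with variance $\sigma_w^2$ (that is, variance $\sigma_w^2/2$ per component). Given $\mathbf{x}$, $\mathbf{y}^R$ and $\mathbf{y}^I$ are statistically independent and normally distributed. The conditional density function of $\mathbf{y}^R$ is $\mathcal{N}(\mathbf{x}^R * \mathbf{h}^R - \mathbf{x}^I * \mathbf{h}^I, (\sigma_w^2/2)\mathbf{I})$. The conditional density function of $\mathbf{y}^I$ is $\mathcal{N}(\mathbf{x}^R * \mathbf{h}^I + \mathbf{x}^I * \mathbf{h}^R, (\sigma_w^2/2)\mathbf{I})$. Thus, for $\theta = \{\mathbf{h}, \sigma_w^2\}$, the conditional density of $\mathbf{y}$ given $\mathbf{x}$ can be expressed as

$$p_\theta(\mathbf{y}|\mathbf{x}) = (\pi\sigma_w^2)^{-N} \exp\left(-\|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2 / \sigma_w^2\right) \qquad (3)$$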


III Proposed model

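We propose using ML estimation of the channel impulse response, $\mathbf{h}$. That is, we search for the vector $\mathbf{h}$ and channel noise variance, $\sigma_w^2$, that maximize $\log p_\theta(\mathbf{y})$ (the default base of the logarithms in this paper is $e$). The ML estimate has strong asymptotic optimality properties, and in particular asymptotic efficiency [13]. For the CMA criterion, on the other hand, one can only claim asymptotic consistency [18]. However, applying the exact ML criterion to our problem is very difficult since $p_\theta(\mathbf{y})$ should first be expressed as a multi-dimensional integral

$$p_\theta(\mathbf{y}) = \int p(\mathbf{x})\, p_\theta(\mathbf{y}|\mathbf{x})\, d\mathbf{x}$$

where we integrate over all possible input sequences and where

$$p(\mathbf{x}) = 4^{-N} \qquad (4)$$

since we assume a uniformly distributed transmitted sequence of $N$ QPSK symbols. However, for this kind of problem, it has been shown in various applications that it is possible to dramatically simplify the estimation problem by using the variational approach for ML estimation [15, 16]. By the VAE approach, instead of directly maximizing $\log p_\theta(\mathbf{y})$ over $\theta$, one maximizes a lower bound as follows. It can be shown [15] that

$$\log p_\theta(\mathbf{y}) \ge E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[-\log q_\Phi(\mathbf{x}|\mathbf{y}) + \log p_\theta(\mathbf{x},\mathbf{y})\right] = \underbrace{-D_{KL}\left[q_\Phi(\mathbf{x}|\mathbf{y}) \,\|\, p(\mathbf{x})\right]}_{A} + \underbrace{E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[\log p_\theta(\mathbf{y}|\mathbf{x})\right]}_{B} \triangleq -\mathcal{L}(\theta, \Phi, \mathbf{y})$$

where $D_{KL}[\cdot \| \cdot]$ denotes the Kullback-Leibler distance between two density functions, and $q_\Phi(\mathbf{x}|\mathbf{y})$ is an arbitrary parametrized (by $\Phi$) conditional density function. Now, instead of directly maximizing $\log p_\theta(\mathbf{y})$, one maximizes the lower bound $-\mathcal{L}(\theta, \Phi, \mathbf{y})$ over $\theta$ and $\Phi$. In fact, it can be shown [15] that by searching over $\theta$ and all possible conditional densities $q(\mathbf{x}|\mathbf{y})$, one obtains the ML estimate of $\theta$. Typically, both $q_\Phi(\mathbf{x}|\mathbf{y})$ and $p_\theta(\mathbf{y}|\mathbf{x})$ are implemented by neural networks. In our problem, $p(\mathbf{x})$ is given in (4), and the encoder, $p_\theta(\mathbf{y}|\mathbf{x})$, is given in (3). We use the following model for the decoder, $q_\Phi(\mathbf{x}|\mathbf{y})$.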

Recalling that the real and imaginary parts of each symbol are binary valued, this is a multivariate Bernoulli distribution with statistical independence between components.
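The lower bound can be checked numerically on a toy problem that is small enough to enumerate every input sequence. The sketch below is illustrative only: the channel `h`, the noise variance `sigma2`, the QPSK scaling (parts in {-1, +1}), the convention that `sigma2` is the total variance of the complex noise, and all function names are assumptions made for the example. It evaluates the bound for a randomly chosen factorized Bernoulli decoder and for the exact posterior, for which the bound becomes tight.

```python
import itertools
import numpy as np

def log_likelihood(y, x, h, sigma2):
    """log p(y|x) for circular complex Gaussian noise of total variance sigma2."""
    resid = y - np.convolve(x, h)[:len(y)]
    return -len(y) * np.log(np.pi * sigma2) - np.sum(np.abs(resid) ** 2) / sigma2

def elbo(log_joint, q):
    """E_q[log p(x, y) - log q(x)] for a distribution q over enumerated sequences."""
    mask = q > 0
    return np.sum(q[mask] * (log_joint[mask] - np.log(q[mask])))

rng = np.random.default_rng(0)
N, sigma2 = 3, 0.5
h = np.array([1.0, 0.5 + 0.2j])                           # hypothetical channel
qpsk = [1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]
seqs = np.array(list(itertools.product(qpsk, repeat=N)))  # all 4**N inputs
x_true = seqs[rng.integers(len(seqs))]
noise = np.sqrt(sigma2 / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
y = np.convolve(x_true, h)[:N] + noise

# log p(x, y) = log p(x) + log p(y|x), with the uniform prior p(x) = 4**(-N).
log_joint = np.array([log_likelihood(y, x, h, sigma2) for x in seqs]) - N * np.log(4)
log_evidence = np.logaddexp.reduce(log_joint)             # exact log p(y)

# A factorized Bernoulli q: independent probabilities that each part equals +1.
p_re, p_im = rng.uniform(0.05, 0.95, N), rng.uniform(0.05, 0.95, N)
q_fact = np.prod(np.where(seqs.real > 0, p_re, 1 - p_re)
                 * np.where(seqs.imag > 0, p_im, 1 - p_im), axis=1)
posterior = np.exp(log_joint - log_evidence)              # exact p(x|y)

print(elbo(log_joint, q_fact) <= log_evidence)               # the bound holds
print(np.isclose(elbo(log_joint, posterior), log_evidence))  # tight at the posterior
```

Enumeration is only feasible here because the toy block has 64 possible sequences; the number of sequences grows as 4 to the power of the block length, which is why a tractable bound is needed in practice.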

In our implementation of the decoder, which acts as the equalizer, we used a fully convolutional network (FCN) architecture with two convolutional layers, each with two output channels, corresponding to the real and imaginary parts of the convolution as in [19]. The input and output layers are also separated into two channels, corresponding to the real and imaginary components of the input, $\mathbf{y}$, and of the output probabilities. The convolutional layers are both one dimensional (1D) as in [19], with a residual connection. The non-linear activation function of the first layer is a SoftSign function, defined by $f(x) = x / (1 + |x|)$, which, in our experiments, proved to converge faster than other functions such as LeakyReLU and tanh. The non-linear activation function of the second layer is a sigmoid function, which ensures that the outputs are in $[0, 1]$, so that they represent valid probability values. Our decoder neural network is depicted in Fig. 2.
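A forward pass through such a decoder can be sketched with plain NumPy. This is a rough illustration rather than the authors' implementation: the function names are hypothetical, the filter lengths (five and two complex taps) follow the experimental section, and the placement of the residual connection (adding the network input before the sigmoid) is an assumption, since the text does not specify it.

```python
import numpy as np

def softsign(v):
    return v / (1.0 + np.abs(v))

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def complex_conv_same(u_re, u_im, w_re, w_im):
    """Complex 1D convolution carried out on separate real/imaginary channels."""
    conv = lambda a, b: np.convolve(a, b, mode="same")
    return (conv(u_re, w_re) - conv(u_im, w_im),
            conv(u_re, w_im) + conv(u_im, w_re))

def equalizer_forward(y, w1, w2):
    """Two-layer fully convolutional equalizer; returns P(x_re=+1), P(x_im=+1)."""
    a_re, a_im = complex_conv_same(y.real, y.imag, w1.real, w1.imag)
    a_re, a_im = softsign(a_re), softsign(a_im)      # first-layer activation
    b_re, b_im = complex_conv_same(a_re, a_im, w2.real, w2.imag)
    # Residual connection: the skip path from the input is an assumed placement.
    return sigmoid(b_re + y.real), sigmoid(b_im + y.imag)

rng = np.random.default_rng(0)
y = rng.standard_normal(64) + 1j * rng.standard_normal(64)
w1 = 0.1 * (rng.standard_normal(5) + 1j * rng.standard_normal(5))
w2 = 0.1 * (rng.standard_normal(2) + 1j * rng.standard_normal(2))
p_re, p_im = equalizer_forward(y, w1, w2)
num_real_params = 2 * (w1.size + w2.size)            # 14 real parameters
```

Note that `np.convolve` flips the kernel, whereas deep learning frameworks usually implement cross-correlation; for learned filters the two differ only in the ordering of the taps.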

Fig. 2: Our equalizer’s architecture using a simple fully convolutional ResNet block. Each convolution output is listed as


We now derive an explicit expression for the loss, $\mathcal{L}(\theta, \Phi, \mathbf{y})$, that needs to be minimized with respect to both $\theta$ and $\Phi$ (alternatively, $-\mathcal{L}(\theta, \Phi, \mathbf{y})$ needs to be maximized).


For the term $A$ we have


where the entropy of $q_\Phi(\mathbf{x}|\mathbf{y})$ is given by


For the term $B$ we have


We now compute the term $E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[\|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2\right]$ analytically. This is possible due to the special structure of the problem, since the generator model is linear. This analytic computation cannot be implemented for VAEs in general. Instead, when the latent random variable in the model is continuous (e.g., a Gaussian random variable), the reparameterization trick is used [15]. For a discrete latent variable (as in our problem), the reparameterization trick cannot be applied. Recently, approximations for discrete latent variables have been proposed in [21]. First, by the definition of the squared norm we have,


where denotes the complex conjugate. Now,


Hence, for the case where we have


We also have


Using (16), (19) and (20) in (15), it is straight-forward to obtain an explicit expression for . However, in order to compute the third term in the summation over efficiently, we use the fact that


which follows from (17) and (20). It is now straightforward to use (16), (19), (20) and (23) in (15), and obtain


where and , the second and third terms in the summation over in (15), are given by




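Now, we need to minimize $\mathcal{L}(\theta, \Phi, \mathbf{y})$ with respect to $\Phi$ and $\theta = \{\mathbf{h}, \sigma_w^2\}$. We start with the minimization with respect to $\sigma_w^2$. Now, $A$ is independent of $\sigma_w^2$, and $B$ depends on $\sigma_w^2$ as described in (13). It is easy to see (by setting the derivative of $B$ with respect to $\sigma_w^2$ to zero) that the optimal value of $\sigma_w^2$ is proportional to $E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[\|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2\right]$. Hence, up to an additive constant (which does not influence the gradients of the learned parameters, $\Phi$ and $\mathbf{h}$), the loss function $\mathcal{L}$ (using the optimal $\sigma_w^2$) is given by

$$\mathcal{L} = N \log E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[\|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2\right] - A \qquad (31)$$

where $N$ is the number of measurements in $\mathbf{y}$, $A$ is given in (7)-(10), and $E_{q_\Phi(\mathbf{x}|\mathbf{y})}\left[\|\mathbf{y} - \mathbf{x} * \mathbf{h}\|^2\right]$ is given in (24), (26) and (30).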
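The closed-form structure of the loss can be illustrated as follows. This is a sketch under assumptions, not a transcription of the paper's equations: it assumes QPSK symbols with real and imaginary parts in {-1, +1}, a decoder that outputs independent probabilities `p_re` and `p_im` that each part equals +1, and hypothetical function names. The expected squared error splits into the error at the mean symbol sequence plus a variance term, which is what makes the expectation computable without sampling.

```python
import itertools
import numpy as np

def binary_entropy(p):
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -(p * np.log(p) + (1 - p) * np.log(1 - p))

def expected_sq_error(y, h, p_re, p_im):
    """Closed-form E_q ||y - x*h||^2 for independent symbols with parts in {-1,+1}."""
    n = len(y)
    mu = (2 * p_re - 1) + 1j * (2 * p_im - 1)        # E_q[x_n]
    var = 2.0 - np.abs(mu) ** 2                      # E_q|x_n - mu_n|^2
    bias = np.sum(np.abs(y - np.convolve(mu, h)[:n]) ** 2)
    spread = np.sum(np.convolve(var, np.abs(h) ** 2)[:n])
    return bias + spread

def vae_loss(y, h, p_re, p_im):
    """N log C - A: the loss after the noise variance has been eliminated."""
    n = len(y)
    a_term = np.sum(binary_entropy(p_re) + binary_entropy(p_im)) - n * np.log(4)
    return n * np.log(expected_sq_error(y, h, p_re, p_im)) - a_term

# Sanity check against brute-force enumeration on a tiny block.
rng = np.random.default_rng(0)
n = 4
h = np.array([1.0, 0.4 - 0.3j, 0.2j])                # hypothetical channel
y = rng.standard_normal(n) + 1j * rng.standard_normal(n)
p_re, p_im = rng.uniform(0.1, 0.9, n), rng.uniform(0.1, 0.9, n)
brute = 0.0
for s in itertools.product([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j], repeat=n):
    s = np.array(s)
    q = np.prod(np.where(s.real > 0, p_re, 1 - p_re)
                * np.where(s.imag > 0, p_im, 1 - p_im))
    brute += q * np.sum(np.abs(y - np.convolve(s, h)[:n]) ** 2)
print(np.isclose(brute, expected_sq_error(y, h, p_re, p_im)))
```

In a training loop, `p_re` and `p_im` would be the decoder outputs and `h` a trainable vector, so the loss is differentiable with respect to both.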

We assumed that the input signal is causal. In reality, we are considering a block of measurements of the signal at some arbitrary time, so the causality assumption does not hold. However, the edge effect decays as the block length $N$ increases. The causality assumption is equivalent to zero-padding $\mathbf{x}$ with $M-1$ zeros on the left (where $M$ is the length of $\mathbf{h}$), such that the convolution with $\mathbf{h}$ according to (2) results in $\mathbf{y}$ of size $N$. Alternatively (supposing odd $M$ for simplicity), the padding can be split symmetrically: we zero-pad $\mathbf{x}$ by $(M-1)/2$ both on the left and on the right. After the convolution in (2), $\mathbf{y}$ is once again of size $N$. We used this second approach in our experiments, although the performance was similar to the performance of the first approach.
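The two padding options can be compared directly. The helper names below are hypothetical; the sketch only shows that both choices return an output with the same length as the input block, and that they correspond to the truncated causal convolution and to a centered ("same") convolution, respectively.

```python
import numpy as np

def conv_left_padded(x, h):
    """Causal option: len(h) - 1 zeros on the left, then a 'valid' convolution."""
    padded = np.concatenate([np.zeros(len(h) - 1, dtype=complex), x])
    return np.convolve(padded, h, mode="valid")

def conv_symmetric_padded(x, h):
    """Symmetric option for an odd-length h: (len(h) - 1) / 2 zeros on each side."""
    half = (len(h) - 1) // 2
    padded = np.concatenate([np.zeros(half, dtype=complex), x,
                             np.zeros(half, dtype=complex)])
    return np.convolve(padded, h, mode="valid")

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], 16) + 1j * rng.choice([-1.0, 1.0], 16)
h = np.array([0.2, 1.0, 0.5 - 0.1j])                 # hypothetical odd-length channel
y_causal = conv_left_padded(x, h)
y_centered = conv_symmetric_padded(x, h)
```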

In all our experiments we used a mini-batch operation mode: for each gradient descent parameter update step, we considered the gradient of the loss (31) using only a sub-sequence of the training data (each update with a different sub-sequence).

Note that our loss function, $\mathcal{L}$, consists of a data entropy term, which we wish to maximize due to the i.i.d. assumption on the symbols, and an autoencoder distortion term. Note also that our method provides an estimated channel response as part of the learning process.

IV Simulation results

We implemented our blind equalizer using the TensorFlow framework [22], which provides automatic differentiation of the loss function. Our algorithm was compared with the adaptive CMA [23], and with the neural network CMA (NNCMA) [11] blind equalization algorithms. We also compared to the linear neural network in [24], but for clarity we did not include these results in the graphs since the blind NNCMA outperformed the linear neural network. In addition, we compared the performance to the adaptive MMSE [17] non-blind equalizer that observes the actual transmitted sequence. The baseline algorithms use a single pass over the data for training. In order to improve their performance, they were modified to make sufficiently many passes over the data. In all our experiments, we used the Adam algorithm [25] to minimize our loss function. The complexity of Adam is similar to that of plain gradient descent. Note that, for all experiments and all blind equalization methods, one can recover the transmitted bits only up to some unknown delay and rotation of the constellation, which for QPSK means that we need to examine four different possible rotations ($0^\circ$, $90^\circ$, $180^\circ$ and $270^\circ$). The results presented in the following experiments were obtained by averaging over 20 independently generated training data sets. For each training data set, we used 10,000 test data symbols to calculate the symbol error rate (SER), defined by

$$\mathrm{SER} = \frac{1}{N_{\mathrm{test}}} \sum_{n=1}^{N_{\mathrm{test}}} \mathbb{1}\left[\hat{x}_n \ne x_n\right]$$

where $N_{\mathrm{test}} = 10{,}000$, $x_n$ is a single transmitted QPSK test symbol, $\hat{x}_n$ is the corresponding estimated symbol, and $\mathbb{1}[\cdot]$ is the indicator function.
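One way to score a blind equalizer despite the delay and rotation ambiguity is to take the smallest SER over the four rotations and a small range of delays. The sketch below assumes QPSK symbols with real and imaginary parts in {-1, +1}; the function names and the `max_delay` parameter are hypothetical and not part of the paper.

```python
import numpy as np

def hard_qpsk(z):
    """Nearest QPSK point with real and imaginary parts in {-1, +1}."""
    return np.where(z.real >= 0, 1.0, -1.0) + 1j * np.where(z.imag >= 0, 1.0, -1.0)

def ser_up_to_rotation(x_true, x_est, max_delay=0):
    """Smallest SER over the four 90-degree rotations and a range of delays."""
    best = 1.0
    for delay in range(-max_delay, max_delay + 1):
        shifted = np.roll(x_est, delay)
        valid = slice(max_delay, len(x_true) - max_delay)  # drop wrapped samples
        for k in range(4):
            decided = hard_qpsk(shifted[valid] * 1j ** k)
            best = min(best, np.mean(decided != x_true[valid]))
    return best

rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(200, 2))
x = (2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)
# Equalizer output that is rotated by 90 degrees, delayed by two symbols, and noisy.
x_est = np.roll(x * 1j, 2) + 0.05 * (rng.standard_normal(200)
                                     + 1j * rng.standard_normal(200))
ser = ser_up_to_rotation(x, x_est, max_delay=3)
```

Taking the minimum over candidates is slightly optimistic at very high error rates, but it is harmless in the regime where the equalizer has converged.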

In all our experiments, we used the same FCN decoder architecture in Fig. 2, with a filter with five complex coefficients in the first layer, and a filter with two complex coefficients in the second layer. Hence, the total number of free parameters in the model was the size of the estimated channel impulse response in the encoder, in addition to only 14 ($2 \cdot 5 + 2 \cdot 2$) real parameters in the FCN decoder.

In our first set of experiments, we compared our model to the baseline algorithms at various noise levels, using the following non-minimum phase channels taken from [26, 24, 11],

We generated 2000 QPSK random symbols as the training set. Then we applied convolution with the channel impulse response, and added white Gaussian noise at a signal to noise ratio (SNR) in the range dB – dB. The SNR is defined by . To train the model, for each update step, we sampled from the training set a mini-batch of a single sub-sequence of length . Figs. 3, 4 and 5 present SER results for , and , respectively.

Fig. 3: SER vs. SNR for the equalization algorithms. The channel is .
Fig. 4: SER vs. SNR for the equalization algorithms. The channel is .
Fig. 5: SER vs. SNR for the equalization algorithms. The channel is .

As can be seen, the new VAEBCE significantly outperforms the baseline blind equalizers, and is quite close to the performance of the non-blind adaptive MMSE.

In our following experiment, we compared the SER of the equalization algorithms as the number of training symbols varied from to . For each update step we sampled from the training set a mini-batch of a single sub-sequence of length . We used the channel impulse response above. Fig. 6 presents the results for SNR=10dB.

Fig. 6: SER vs. number of training symbols for the equalization algorithms. SNR = 10 dB. The channel used is .

Again, the new VAEBCE algorithm significantly outperforms the baseline blind equalization algorithms.

Recall that, in accordance with our loss function, as part of the model training we also learn an estimated channel impulse response. We now assess the robustness to the length of the estimated channel impulse response. Denote by $\mathbf{h}$ and $\hat{\mathbf{h}}$ the actual and estimated channel impulse responses, respectively. First suppose that the length of $\hat{\mathbf{h}}$ is set equal to the length of $\mathbf{h}$. In general (up to delay and rotation, as was noted above), when the SER after equalization was small, we observed a small distance between $\hat{\mathbf{h}}$ and $\mathbf{h}$. This distance was monotonically decreasing with the SNR. When the length of $\hat{\mathbf{h}}$ was smaller than the length of $\mathbf{h}$, the model appeared to learn $\hat{\mathbf{h}}$ such that it was nearly equal to the central part of $\mathbf{h}$. When the length of $\hat{\mathbf{h}}$ was larger than the length of $\mathbf{h}$, the model appeared to learn an approximately zero-padded (both on the left and on the right) version of $\mathbf{h}$. In Fig. 7 we reevaluated the SER results when the length of $\hat{\mathbf{h}}$ was twice the length of $\mathbf{h}$, and did not observe a significant degradation.

Fig. 7: SER vs. SNR for different lengths of the estimated channel impulse response, for the VAEBCE. The channel used is .

Finally, we report on the number of parameter updates required for convergence of our VAEBCE algorithm. We generated the data as described in the first experiment. To train the model, we sampled a mini-batch of a single sub-sequence of length out of the given training symbols. Then we let the algorithm train until convergence was achieved. The number of iterations for the channel is reported in Fig. 8. As either or the SNR increase, the number of required iterations decreases.

Fig. 8: Number of parameter updates vs. SNR for different . The channel used is .

For the channel , the number of iterations was similar. Note that using the ML equalization algorithms in [12, 13, 14], the Viterbi or forward-backward algorithms would require, at each time step, a trellis diagram whose number of states grows exponentially with the channel memory size. Hence, our method provides an attractive alternative.


V Conclusion

We introduced a novel algorithm for blind channel equalization using VAE (VAEBCE). We showed significantly improved SER performance compared to the baseline CMA-based blind channel equalization algorithms. In particular, VAEBCE required significantly fewer training symbols for the same SER. In fact, the performance of the new VAEBCE equalizer was close to the performance of the supervised linear adaptive MMSE equalizer. Our equalizer is a simple FCN. This should be contrasted with alternative ML blind equalization methods, which require a trellis-based equalizer that may be much more costly to implement. Future research should extend our method to more general setups, such as higher-order constellations (by replacing the output sigmoid in our FCN with a softmax).


This research was supported by the Israel Science Foundation, grant no. 1082/13. We would like to thank Sarvraj Singh Ranhotra for sharing with us the simulations code in [26].