I. Introduction
Following the remarkable success of deep learning methods in various tasks, these techniques have recently been considered for some communication problems. For example, in [1, 2, 3, 4] deep learning methods were applied to the problem of channel decoding, in [5] the authors proposed an autoencoder as a communication system for short block codes, and in [6] deep learning-based detection algorithms were used when the channel model is unknown.
Our work considers transmission over a noisy intersymbol interference (ISI) channel with an unknown impulse response. Equalization methods for ISI channels using neural networks have been dealt with extensively in the literature [7]. In this paper we consider the case where the input sequence is also unknown, so that blind channel equalization is required. Following the blind equalization step, one can apply decision-directed equalization, using the blind equalization estimate as an initial value. Blind channel equalization is a special type of blind deconvolution where the input is constrained to lie in some known discrete constellation with known statistics. The standard approach to this problem is the constant modulus algorithm (CMA) [8, 9, 10]. Blind neural network-based algorithms using the constant modulus (CM) criterion have also been proposed in the literature [11].
In this work we propose a new approach to blind channel equalization using the maximum likelihood (ML) criterion. The ML criterion has already been used for blind channel equalization [12, 13, 14] (and references therein). However, the proposed solutions use the expectation maximization (EM) algorithm or an approximated EM, which require iterative application of the forward-backward or Viterbi algorithms. The complexities of these algorithms are exponential in the channel memory size, which may be prohibitive. As an alternative, in this paper we propose an approximated ML estimate using the variational autoencoder (VAE) method [15, 16]. VAEs are widely used in the deep learning literature for unsupervised and semi-supervised learning, and as generative models for observed data. We demonstrate significant and consistent improvements in the quality of the detected symbols compared to the baseline blind equalization algorithms. In fact, for the channels that were examined, the performance of the new VAE-based blind channel equalizer (VAE-BCE) was close to the performance of a non-blind adaptive linear minimum mean square error (MMSE) equalizer [17]. The new equalization method enables lower latency acquisition of an unknown channel response. Although the computational complexity of the new VAE-BCE is higher than that of CMA, it is still reasonable, and the number of free parameters to estimate is small.

II. Problem Setup
The communication channel is modeled as a convolution of the input, $x_j$, with some causal, finite impulse response (FIR), time-invariant filter, $\mathbf{h} = (h_0, \ldots, h_{L-1})$, of size $L$, followed by the addition of white Gaussian noise $w_j$:

$$y_j = \sum_{l=0}^{L-1} h_l x_{j-l} + w_j \qquad (1)$$

This is the equivalent model of the end-to-end communication system shown in Fig. 1, where the sampling is performed at the symbol rate.
The equalizer in Fig. 1 reconstructs an estimate, $\hat{x}_j$, of the transmitted symbol sequence. Now, suppose that we observe a finite window of $N$ measurements, $\mathbf{y} = (y_0, \ldots, y_{N-1})$. For clarity of presentation we assume that the input signal is causal ($x_j = 0$ for $j < 0$). We refer to this assumption later. Equation (1) can be written compactly for the measurements collected in $\mathbf{y}$ as

$$\mathbf{y} = \mathbf{h} \ast \mathbf{x} + \mathbf{w} \qquad (2)$$
where $\mathbf{x} = (x_0, \ldots, x_{N-1})$ is the transmitted message, and $\mathbf{w}$ is an i.i.d. sequence of additive white Gaussian noise. Throughout the paper we assume QPSK modulation, although the derivation can be extended to other constellations. Hence, $x_j \in \{(\pm 1 \pm i)/\sqrt{2}\}$, and the above vectors can be written as combinations of real ($I$) and imaginary ($Q$) components, so that $\mathbf{x} = \mathbf{x}_I + i\mathbf{x}_Q$, $\mathbf{y} = \mathbf{y}_I + i\mathbf{y}_Q$, and $\mathbf{w} = \mathbf{w}_I + i\mathbf{w}_Q$. Each element of the noise sequence, $w_j$, is complex Gaussian with statistically independent real and imaginary components, each with variance $\sigma_w^2/2$. Given $\mathbf{x}$, $\mathbf{y}_I$ and $\mathbf{y}_Q$ are statistically independent, normally distributed. The conditional density function of $\mathbf{y}_I$ is $\mathcal{N}\left(\mathrm{Re}(\mathbf{h}\ast\mathbf{x}), (\sigma_w^2/2)\mathbf{I}\right)$. The conditional density function of $\mathbf{y}_Q$ is $\mathcal{N}\left(\mathrm{Im}(\mathbf{h}\ast\mathbf{x}), (\sigma_w^2/2)\mathbf{I}\right)$. Thus, for $\theta = (\mathbf{h}, \sigma_w^2)$, the conditional density of $\mathbf{y}$ given $\mathbf{x}$ can be expressed as

$$p_\theta(\mathbf{y} \mid \mathbf{x}) = \frac{1}{(\pi\sigma_w^2)^N} e^{-\|\mathbf{y} - \mathbf{h}\ast\mathbf{x}\|^2/\sigma_w^2} \qquad (3)$$
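The channel model (1)-(2) is straightforward to simulate. The following sketch generates unit-energy QPSK symbols and passes them through a noisy ISI channel; the impulse response below is illustrative only and is not one of the channels used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

def qpsk_symbols(n, rng):
    """n i.i.d. unit-energy QPSK symbols, x_j in {(+-1 +- i)/sqrt(2)}."""
    bits = rng.integers(0, 2, size=(n, 2))
    return ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)

def isi_channel(x, h, sigma2_w, rng):
    """y = h * x + w as in (1)-(2): causal convolution (x zero-padded on
    the left) plus complex AWGN with variance sigma2_w/2 per component."""
    n = len(x)
    y = np.convolve(x, h)[:n]
    w = rng.normal(scale=np.sqrt(sigma2_w / 2), size=n) \
        + 1j * rng.normal(scale=np.sqrt(sigma2_w / 2), size=n)
    return y + w

h = np.array([0.7 + 0.2j, 0.4, 0.2 - 0.1j])   # illustrative taps, not from the paper
x = qpsk_symbols(1000, rng)
y = isi_channel(x, h, sigma2_w=0.1, rng=rng)
```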
III. Proposed Model
We propose using ML estimation of the channel impulse response, $\mathbf{h}$. That is, we search for the vector $\mathbf{h}$ and channel noise variance, $\sigma_w^2$, that maximize $\log p_\theta(\mathbf{y})$ (the default base of the logarithms in this paper is $e$). The ML estimate has strong asymptotic optimality properties, and in particular asymptotic efficiency [13]. For the CMA criterion, on the other hand, one can only claim asymptotic consistency [18]. However, applying the exact ML criterion to our problem is very difficult, since $p_\theta(\mathbf{y})$ must first be expressed as a high-dimensional sum,

$$p_\theta(\mathbf{y}) = \sum_{\mathbf{x}} p(\mathbf{x})\, p_\theta(\mathbf{y}\mid\mathbf{x}),$$

over all possible input sequences, where

$$p(\mathbf{x}) = 4^{-N} \qquad (4)$$

since we assume a uniformly distributed transmitted sequence. However, for this kind of problem, it has been shown in various applications that the estimation problem can be dramatically simplified by using the variational approach to ML estimation
[15, 16]. In the VAE approach, instead of directly maximizing $\log p_\theta(\mathbf{y})$ over $\theta$, one maximizes a lower bound as follows. It can be shown [15] that

$$\log p_\theta(\mathbf{y}) \ge E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[-\log q_\Phi(\mathbf{x}\mid\mathbf{y}) + \log p_\theta(\mathbf{x},\mathbf{y})\right] = -\underbrace{D_{KL}\left[q_\Phi(\mathbf{x}\mid\mathbf{y}) \,\|\, p(\mathbf{x})\right]}_{A} + \underbrace{E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\log p_\theta(\mathbf{y}\mid\mathbf{x})\right]}_{B} \triangleq -\mathcal{L}(\theta,\Phi,\mathbf{y})$$

where $D_{KL}[\cdot\,\|\,\cdot]$ denotes the Kullback-Leibler divergence between two density functions, and $q_\Phi(\mathbf{x}\mid\mathbf{y})$ is an arbitrary conditional density function parametrized by $\Phi$. Now, instead of directly maximizing $\log p_\theta(\mathbf{y})$, one maximizes the lower bound $-\mathcal{L}(\theta,\Phi,\mathbf{y})$ over $\theta$ and $\Phi$. In fact, it can be shown [15] that by searching over $\theta$ and all possible conditional densities $q_\Phi(\mathbf{x}\mid\mathbf{y})$, one obtains the ML estimate of $\theta$. Typically, both $p_\theta(\mathbf{y}\mid\mathbf{x})$ and $q_\Phi(\mathbf{x}\mid\mathbf{y})$ are implemented by neural networks. In our problem, $p(\mathbf{x})$ is given in (4), and the encoder, $p_\theta(\mathbf{y}\mid\mathbf{x})$, is given in (3). We use the following model for the decoder, $q_\Phi(\mathbf{x}\mid\mathbf{y})$:

$$q_\Phi(\mathbf{x}\mid\mathbf{y}) = \prod_{j=0}^{N-1} q_\Phi(x_{I,j}\mid\mathbf{y})\, q_\Phi(x_{Q,j}\mid\mathbf{y}).$$

Recalling that $x_{I,j} \in \{\pm 1/\sqrt{2}\}$ and $x_{Q,j} \in \{\pm 1/\sqrt{2}\}$, this is a multivariate Bernoulli distribution with statistical independence between components.
In our implementation of the decoder, which acts as the equalizer, we used a fully convolutional network (FCN) architecture with two convolutional layers, each with two output channels corresponding to the real and imaginary parts of the convolution, as in [19]. The input and output layers are likewise separated into two channels, corresponding to the real and imaginary components of the input, $\mathbf{y}$, and of the output probabilities, $q_\Phi(x_{I,j}\mid\mathbf{y})$ and $q_\Phi(x_{Q,j}\mid\mathbf{y})$. The convolutional layers are both one-dimensional (1D) as in [19], with a residual connection as in [20]. The nonlinear activation function of the first layer is a SoftSign function, defined by $f(z) = z/(1+|z|)$, which, in our experiments, proved to converge faster than other functions such as LeakyReLU and tanh. The nonlinear activation function of the second layer is a sigmoid, which ensures that the outputs are in $(0,1)$, so that they represent valid probability values. Our decoder neural network is depicted in Fig. 2.

We now derive an explicit expression for the loss $\mathcal{L}(\theta,\Phi,\mathbf{y})$ that needs to be minimized with respect to both $\theta$ and $\Phi$ (alternatively, $-\mathcal{L}$ needs to be maximized).
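The decoder can be sketched as a plain NumPy forward pass. This is a simplified, untrained illustration: the filter taps `w1` and `w2` are placeholder values, and each real/imaginary channel is filtered independently, which simplifies the complex-valued wiring and residual placement of the actual network in Fig. 2.

```python
import numpy as np

def softsign(z):
    """SoftSign activation f(z) = z / (1 + |z|)."""
    return z / (1 + np.abs(z))

def sigmoid(z):
    """Sigmoid activation; maps outputs into (0, 1)."""
    return 1 / (1 + np.exp(-z))

def conv1d_same(x, w):
    """1-D convolution with zero padding so the output length equals len(x)."""
    pad = len(w) // 2
    return np.convolve(np.pad(x, pad), w, mode="valid")[:len(x)]

def fcn_decoder(y, w1, w2):
    """Simplified two-layer FCN equalizer: complex input split into
    real/imag channels, SoftSign after layer 1 with a residual connection,
    sigmoid output probabilities (a sketch, not the exact Fig. 2 wiring)."""
    ch = np.stack([y.real, y.imag])                        # 2 x N channels
    hid = softsign(np.stack([conv1d_same(c, w1) for c in ch]))
    hid = hid + ch                                         # residual connection
    return sigmoid(np.stack([conv1d_same(c, w2) for c in hid]))

rng = np.random.default_rng(1)
y = rng.normal(size=64) + 1j * rng.normal(size=64)
w1 = 0.1 * rng.normal(size=5)    # 5 taps in layer 1, as in the paper
w2 = 0.1 * rng.normal(size=2)    # 2 taps in layer 2
probs = fcn_decoder(y, w1, w2)   # Bernoulli parameters q_{s,j}
```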
Denote $q_{s,j} \triangleq q_\Phi(x_{s,j} = 1/\sqrt{2} \mid \mathbf{y})$ for $s \in \{I, Q\}$ and $j = 0, \ldots, N-1$.
For the term $A$ we have

$$A = D_{KL}\left[q_\Phi(\mathbf{x}\mid\mathbf{y}) \,\|\, p(\mathbf{x})\right] \qquad (5)$$
$$= E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\log q_\Phi(\mathbf{x}\mid\mathbf{y}) - \log p(\mathbf{x})\right] \qquad (6)$$
$$= -H_q(\mathbf{x}\mid\mathbf{y}) + 2N\log 2 \qquad (7)$$

where $H_q(\mathbf{x}\mid\mathbf{y})$, the entropy of $q_\Phi(\mathbf{x}\mid\mathbf{y})$, is given by

$$H_q(\mathbf{x}\mid\mathbf{y}) = -E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\log q_\Phi(\mathbf{x}\mid\mathbf{y})\right] \qquad (8)$$
$$= -\sum_{j=0}^{N-1}\sum_{s\in\{I,Q\}} E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\log q_\Phi(x_{s,j}\mid\mathbf{y})\right] \qquad (9)$$
$$= -\sum_{j=0}^{N-1}\sum_{s\in\{I,Q\}} \left[q_{s,j}\log q_{s,j} + (1-q_{s,j})\log(1-q_{s,j})\right] \qquad (10)$$
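Since the KL term reduces to the entropy of $2N$ independent Bernoulli variables plus a constant, it is simple to evaluate numerically. A minimal sketch (in nats), assuming the uniform QPSK prior of (4):

```python
import numpy as np

def bernoulli_entropy(q):
    """Entropy (in nats) of independent Bernoulli components with
    success probabilities q, cf. the summation in (10)."""
    q = np.clip(q, 1e-12, 1 - 1e-12)        # avoid log(0)
    return -np.sum(q * np.log(q) + (1 - q) * np.log(1 - q))

def kl_term_A(q):
    """A = D_KL[q_Phi(x|y) || p(x)] = -H + 2N log 2 for a uniform QPSK
    prior; q holds the 2N success probabilities (I and Q per symbol)."""
    return -bernoulli_entropy(q) + q.size * np.log(2)

q_uniform = np.full(10, 0.5)                    # N = 5 symbols, 2N components
q_confident = np.array([0.99, 0.01, 0.9, 0.1])  # nearly deterministic posterior
```

A uniform posterior gives zero KL divergence (the bound is tight in the prior term), while a confident posterior gives a strictly positive value.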
For the term $B$ we have

$$B = E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\log p_\theta(\mathbf{y}\mid\mathbf{x})\right] \qquad (11)$$
$$= E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[-N\log(\pi\sigma_w^2) - \frac{\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2}{\sigma_w^2}\right] \qquad (12)$$
$$= -N\log(\pi\sigma_w^2) - \frac{1}{\sigma_w^2}\, E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right] \qquad (13)$$
We now compute the term $E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right]$ analytically. This is possible due to the special structure of the problem, since the generator model is linear. This analytic computation cannot be implemented for VAEs in general. Instead, when the random variable $\mathbf{x}$ in the model is continuous (e.g., a Gaussian random variable), the reparameterization trick is used [15]. For discrete $\mathbf{x}$ (as in our problem), the reparameterization trick cannot be applied. Recently, approximations for discrete $\mathbf{x}$ have been proposed in [21]. First, by definition we have

$$\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2 = \sum_j \left|y_j - \sum_l h_l x_{j-l}\right|^2 \qquad (14)$$
$$= \sum_j \left[|y_j|^2 - 2\,\mathrm{Re}\left\{y_j^* \sum_l h_l x_{j-l}\right\} + \left|\sum_l h_l x_{j-l}\right|^2\right] \qquad (15)$$

where $(\cdot)^*$ denotes the complex conjugate. Now,

$$E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[x_j\right] = \frac{1}{\sqrt{2}}\left[(2q_{I,j}-1) + i\,(2q_{Q,j}-1)\right] \qquad (16)$$
Hence, for the case where $l \neq l'$ we have
(17)  
(18)  
(19) 
We also have

$$E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[|x_j|^2\right] = 1. \qquad (20)$$
Using (16), (19) and (20) in (15), it is straightforward to obtain an explicit expression for $E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right]$. However, in order to compute the third term in the summation over $j$ efficiently, we use the fact that
(21)  
(22)  
(23) 
which follows from (17) and (20). It is now straightforward to use (16), (19), (20) and (23) in (15), and obtain
(24) 
where the second and third terms in the summation over $j$ in (15) are given by
(25)  
(26) 
and
(27)  
(28)  
(29)  
(30) 
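The closed-form expectation can be sanity-checked against Monte Carlo sampling. The sketch below uses only the facts established above — independence of the symbol components under $q_\Phi$, the mean (16), and $E|x_j|^2 = 1$ in (20) — rearranged into a bias-plus-propagated-variance form; it evaluates the same quantity, though not necessarily with the paper's exact grouping of terms in (24)-(30).

```python
import numpy as np

def qpsk_mean(qI, qQ):
    """E[x_j] under the factorized Bernoulli posterior, cf. (16)."""
    return ((2 * qI - 1) + 1j * (2 * qQ - 1)) / np.sqrt(2)

def expected_sq_error(y, h, qI, qQ):
    """Closed-form E_q ||y - h*x||^2: per-sample squared bias plus the
    variance propagated through the convolution, using E|x_j|^2 = 1."""
    n = len(y)
    m = qpsk_mean(qI, qQ)
    var = 1.0 - np.abs(m) ** 2                 # Var(x_j) under q
    mean_conv = np.convolve(m, h)[:n]          # (h * E[x])_j, causal
    var_conv = np.convolve(var, np.abs(h) ** 2)[:n]
    return np.sum(np.abs(y - mean_conv) ** 2 + var_conv)

# Monte Carlo check of the closed form
rng = np.random.default_rng(2)
n = 6
h = np.array([0.8, 0.3 - 0.2j])                # illustrative 2-tap channel
qI, qQ = rng.uniform(size=n), rng.uniform(size=n)
y = rng.normal(size=n) + 1j * rng.normal(size=n)

draws = 200_000
bI = rng.random((draws, n)) < qI               # P(x_{I,j} = +1/sqrt2) = qI
bQ = rng.random((draws, n)) < qQ
x = ((2 * bI - 1) + 1j * (2 * bQ - 1)) / np.sqrt(2)
conv = np.zeros((draws, n), dtype=complex)
for l, hl in enumerate(h):                     # causal convolution per draw
    conv[:, l:] += hl * x[:, :n - l]
mc = np.mean(np.sum(np.abs(y - conv) ** 2, axis=1))
analytic = expected_sq_error(y, h, qI, qQ)
```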
Now, we need to minimize $\mathcal{L}(\theta,\Phi,\mathbf{y}) = A - B$ with respect to $\theta$ and $\Phi$. We start with the minimization with respect to $\sigma_w^2$. Now, $A$ is independent of $\sigma_w^2$, and $B$ depends on $\sigma_w^2$ as described in (13). It is easy to see (by setting the derivative of $\mathcal{L}$ with respect to $\sigma_w^2$ to zero) that the optimal value of $\sigma_w^2$ is given by $\sigma_w^2 = E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right]/N$. Hence, up to an additive constant (which does not influence the gradients of the learned parameters), the loss function (using the optimal $\sigma_w^2$) is given by

$$\mathcal{L}(\theta,\Phi,\mathbf{y}) = N \log E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right] - H_q(\mathbf{x}\mid\mathbf{y}) \qquad (31)$$

where $H_q(\mathbf{x}\mid\mathbf{y})$ is given in (7)-(10), and $E_{q_\Phi(\mathbf{x}\mid\mathbf{y})}\left[\|\mathbf{y}-\mathbf{h}\ast\mathbf{x}\|^2\right]$ is given in (24), (26) and (30).
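The closed-form minimizer $\sigma_w^2 = D/N$ (writing $D$ for the expected squared error) can be verified numerically by a grid search over the $\sigma_w^2$-dependent part of the loss, $-B$; the values of $N$ and $D$ below are illustrative.

```python
import numpy as np

def neg_B(sigma2, n, d):
    """The sigma-dependent part of the loss: -B = N log(pi sigma^2) + D/sigma^2,
    cf. (13), with D standing for E_q ||y - h*x||^2."""
    return n * np.log(np.pi * sigma2) + d / sigma2

n, d = 100, 37.0                       # illustrative values of N and D
sigma2_opt = d / n                     # closed-form minimizer sigma^2 = D/N
grid = np.linspace(0.5 * sigma2_opt, 2.0 * sigma2_opt, 1001)
best = grid[np.argmin(neg_B(grid, n, d))]
```

Setting the derivative $N/\sigma^2 - D/\sigma^4$ to zero gives the same point the grid search finds.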
We assumed that the input signal is causal. In reality, we are considering a block of measurements of the signal at some arbitrary time, so the causality assumption does not hold. However, the edge effect decays as $N$ increases. The causality assumption is equivalent to zero-padding of $\mathbf{x}$ by $L-1$ zeros on the left, such that the convolution with $\mathbf{h}$ according to (2) results in $\mathbf{y}$ of size $N$. Alternatively (supposing odd $L$ for simplicity), we apply zero-padding of $\mathbf{x}$ by $(L-1)/2$ zeros both on the left and on the right. After the convolution in (2), $\mathbf{y}$ is once again of size $N$. We used this second approach in our experiments, although the performance was similar to that of the first approach.

In all our experiments we used a minibatch operation mode, where for each gradient descent parameter update step, we considered the gradient of the loss (31) using only a subsequence of the training data (each update with a different subsequence).
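The two padding conventions can be sketched as follows: `conv_causal` implements the left-padded causal assumption and `conv_centered` the symmetric padding used in our experiments; both return an output of length $N$.

```python
import numpy as np

def conv_causal(x, h):
    """Causality assumption: x is zero-padded by L-1 zeros on the left,
    so y_j = sum_l h_l x_{j-l} and y keeps length N."""
    return np.convolve(x, h)[:len(x)]

def conv_centered(x, h):
    """Symmetric zero-padding by (L-1)/2 zeros on each side (odd L);
    the 'valid' part of the convolution again has length N."""
    assert len(h) % 2 == 1, "centered padding assumes odd filter length"
    pad = (len(h) - 1) // 2
    return np.convolve(np.pad(x, pad), h, mode="valid")

h = np.array([1.0, 2.0, 1.0])          # illustrative 3-tap filter
x = np.array([1.0, 0.0, 0.0, 0.0])
```

Applying either function to a unit impulse recovers the filter taps at the corresponding alignment, which makes the difference between the two conventions easy to inspect.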
Note that our loss function, $\mathcal{L}$, consists of a data entropy term, which we wish to maximize due to the i.i.d. assumption on the symbols, and an autoencoder distortion term. Note also that our method provides an estimate of the channel response as part of the learning process.
IV. Simulation Results
We implemented our blind equalizer using the TensorFlow framework [22], which provides automatic differentiation of the loss function. Our algorithm was compared with the adaptive CMA [23] and with the neural network CMA (NN-CMA) [11] blind equalization algorithms. We also compared with the linear neural network in [24], but for clarity we did not include these results in the graphs, since the blind NN-CMA outperformed the linear neural network. In addition, we compared the performance to the adaptive MMSE [17] non-blind equalizer, which observes the actual transmitted sequence. The baseline algorithms use a single pass over the data for training; in order to improve performance, they were modified to make sufficiently many passes over the data. In all our experiments, we used the Adam algorithm [25] to minimize our loss function. The complexity of Adam is similar to that of plain gradient descent. Note that, for all experiments and all blind equalization methods, one can recover the transmitted bits only up to some unknown delay and rotation of the constellation, which for QPSK means that we need to examine four different possible rotations. The results presented in the following experiments were obtained by averaging over 20 independent training sets. For each training set, we used 10,000 test symbols to calculate the symbol error rate (SER), defined as the fraction of test symbols for which the estimated symbol differs from the transmitted QPSK symbol.
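The rotation ambiguity in the SER evaluation can be handled by scoring all four rotations and keeping the best. The sketch below ignores the delay ambiguity, which would add a similar search over shifts.

```python
import numpy as np

def ser_up_to_rotation(x_true, x_hat):
    """SER for QPSK when the equalizer output is recovered only up to a
    constellation rotation: evaluate all four rotations, keep the best."""
    rotations = np.exp(1j * np.pi / 2 * np.arange(4))      # 1, i, -1, -i
    return min(np.mean(np.abs(x_true - r * x_hat) > 1e-6)
               for r in rotations)

rng = np.random.default_rng(3)
bits = rng.integers(0, 2, size=(100, 2))
x_true = ((2 * bits[:, 0] - 1) + 1j * (2 * bits[:, 1] - 1)) / np.sqrt(2)
x_hat = 1j * x_true                    # a globally rotated estimate
x_hat[:3] = -x_hat[:3]                 # corrupt 3 of the 100 symbols
ser = ser_up_to_rotation(x_true, x_hat)
```

A perfectly recovered but globally rotated sequence scores an SER of zero; here only the three corrupted symbols count as errors.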
In all our experiments, we used the same FCN decoder architecture of Fig. 2, with a filter with five complex coefficients in the first layer, and a filter with two complex coefficients in the second layer. Hence, the total number of free parameters in the model was the size of the estimated channel impulse response in the encoder, in addition to only 14 ($2 \times (5+2)$) real parameters in the FCN decoder.
In our first set of experiments, we compared our model to the baseline algorithms at various noise levels, using the following nonminimum-phase channels taken from [26, 24, 11].

We generated 2000 random QPSK symbols as the training set. We then applied convolution with the channel impulse response and added white Gaussian noise over a range of signal-to-noise ratios (SNRs), where the SNR is defined as the ratio of the received signal power to the noise power. To train the model, for each update step we sampled from the training set a minibatch consisting of a single subsequence of the training data. Figs. 3, 4 and 5 present the SER results for the three channels, respectively.
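Setting the noise variance for a target SNR can be sketched as below, assuming unit-energy symbols and the common convention SNR $= \|\mathbf{h}\|^2/\sigma_w^2$ (an assumption here; the paper's exact definition is not reproduced).

```python
import numpy as np

def noise_var_for_snr(h, snr_db):
    """sigma_w^2 that realizes a target SNR, assuming unit-energy QPSK
    symbols and SNR = ||h||^2 / sigma_w^2 (an assumed convention)."""
    return np.sum(np.abs(h) ** 2) / 10 ** (snr_db / 10)

h = np.array([0.5, 1.0, 0.5])                 # illustrative channel taps
sigma2 = noise_var_for_snr(h, snr_db=10.0)    # ||h||^2 = 1.5, so 1.5 / 10
```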
As can be seen, the new VAE-BCE significantly outperforms the baseline blind equalizers, and is quite close in performance to the non-blind adaptive MMSE equalizer.
In our next experiment, we compared the SER of the equalization algorithms as the number of training symbols was varied. For each update step we sampled from the training set a minibatch consisting of a single subsequence. We used the channel impulse response above. Fig. 6 presents the results for an SNR of 10 dB.
Again, the new VAE-BCE algorithm significantly outperforms the baseline blind equalization algorithms.
Recall that, in accordance with our loss function, as part of the model training we also learn an estimate of the channel impulse response. We now assess the robustness to the length of the estimated channel impulse response. Denote by $\mathbf{h}$ and $\hat{\mathbf{h}}$ the actual and estimated channel impulse responses, respectively. First, suppose that the length of $\hat{\mathbf{h}}$ is set equal to the length of $\mathbf{h}$. In general (up to delay and rotation, as noted above), when the SER after equalization was small, we observed a small distance $\|\mathbf{h}-\hat{\mathbf{h}}\|$. This distance decreased monotonically with the SNR. When the length of $\hat{\mathbf{h}}$ was smaller than the length of $\mathbf{h}$, the model appeared to learn an $\hat{\mathbf{h}}$ nearly equal to the central part of $\mathbf{h}$. When the length of $\hat{\mathbf{h}}$ was larger than the length of $\mathbf{h}$, the model appeared to learn an approximately zero-padded (both on the left and on the right) version of $\mathbf{h}$. In Fig. 7 we reevaluated the SER results when the length of $\hat{\mathbf{h}}$ was twice the length of $\mathbf{h}$, and did not observe significant degradation.
Finally, we report the number of parameter updates required for convergence of our VAE-BCE algorithm. We generated the data as described in the first experiment. To train the model, we sampled a minibatch consisting of a single subsequence of the given training symbols, and let the algorithm train until convergence was achieved. The number of iterations for the channel considered is reported in Fig. 8. As either the number of training symbols or the SNR increases, the number of required iterations decreases.
V. Conclusion
We introduced a novel algorithm for blind channel equalization using a VAE (VAE-BCE). We showed significantly improved SER performance compared to the baseline CMA-based blind channel equalization algorithms. In particular, VAE-BCE required significantly fewer training symbols for the same SER. In fact, the performance of the new VAE-BCE equalizer was close to that of the supervised linear adaptive MMSE equalizer. Our equalizer is a simple FCN. This should be contrasted with alternative ML-based blind equalization methods, which require a trellis-based equalizer that may be much more costly to implement. Future research should extend our method to generalized setups such as higher-order constellations (by replacing the output sigmoid in our FCN with a softmax).
Acknowledgment
This research was supported by the Israel Science Foundation, grant no. 1082/13. We would like to thank Sarvraj Singh Ranhotra for sharing with us the simulation code of [26].
References
[1] E. Nachmani, Y. Be'ery, and D. Burshtein, "Learning to decode linear codes using deep learning," in 54th Annual Allerton Conference on Communication, Control and Computing, September 2016, pp. 341–346, arXiv preprint arXiv:1607.04793.
 [2] E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes,” IEEE J. Selected Topics in Signal Proc., Special Issue on Machine Learning for Cognition in Radio Communication and Radar, 2017, accepted for publication, arXiv preprint arXiv:1706.07043.
[3] T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, "On deep learning-based channel decoding," Conference on Information Sciences and Systems, 2017.
[4] S. Cammerer, T. Gruber, J. Hoydis, and S. ten Brink, "Scaling deep learning-based decoding of polar codes via partitioning," arXiv preprint arXiv:1702.06901, 2017.
 [5] T. J. O’Shea and J. Hoydis, “An introduction to machine learning communications systems,” arXiv preprint arXiv:1702.00832, 2017.
 [6] N. Farsad and A. Goldsmith, “Detection algorithms for communication systems using deep learning,” arXiv preprint arXiv:1705.08044, 2017.
 [7] K. Burse, R. N. Yadav, and S. C. Shrivastava, “Channel equalization using neural networks: A review,” IEEE Transactions on Systems, Man, and Cybernetics – Part C: Applications and Reviews, vol. 40, no. 3, pp. 352–357, 2010.
 [8] D. Godard, “Selfrecovering equalization and carrier tracking in twodimensional data communication systems,” IEEE Transactions on Communication, vol. 28, no. 11, pp. 1867–1875, 1980.
 [9] J. Treichler and B. Agee, “A new approach to multipath correction of constant modulus signals,” IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 31, no. 2, pp. 459–472, 1983.
 [10] R. Johnson, P. Schniter, T. J. Endres, J. D. Behm, D. R. Brown, and R. A. Casas, “Blind equalization using the constant modulus criterion: A review,” Proceedings of the IEEE, vol. 86, no. 10, pp. 1927–1950, 1998.
 [11] C. You and D. Hong, “Nonlinear blind equalization schemes using complexvalued multilayer feedforward neural networks,” IEEE Transactions on Neural Networks, vol. 9, no. 6, pp. 1442–1455, 1998.
 [12] M. Ghosh and C. L. Weber, “Maximumlikelihood blind equalization,” Optical Engineering, vol. 31, no. 6, pp. 1224–1229, 1992.
 [13] L. Tong and S. Perreau, “Multichannel blind identification: From subspace to maximum likelihood methods,” Proceedings of the IEEE, vol. 86, no. 10, pp. 1951–1968, 1998.
 [14] H. A. Cirpan and M. K. Tsatsanis, “Maximum likelihood blind channel estimation in the presence of doppler shifts,” IEEE Transactions on Signal Processing, vol. 47, no. 6, pp. 1559–1569, 1999.
 [15] D. P. Kingma and M. Welling, “Autoencoding variational Bayes,” International Conference on Learning Representations, 2014.

[16] D. J. Rezende, S. Mohamed, and D. Wierstra, "Stochastic backpropagation and approximate inference in deep generative models," International Conference on Machine Learning, 2014.
[17] Y. Gong, X. Hong, and K. F. Abu-Salim, "Adaptive MMSE equalizer with optimum tap-length and decision delay," 2010.
[18] O. Shalvi and E. Weinstein, "New criteria for blind deconvolution of nonminimum phase systems (channels)," IEEE Transactions on Information Theory, vol. 36, no. 2, pp. 312–321, 1990.

[19] T. J. O'Shea, L. Pemula, D. Batra, and T. C. Clancy, "Radio transformer networks: Attention models for learning to synchronize in wireless systems," in Signals, Systems and Computers, 2016 50th Asilomar Conference on. IEEE, 2016, pp. 662–666.
[20] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[21] C. J. Maddison, A. Mnih, and Y. W. Teh, "The concrete distribution: A continuous relaxation of discrete random variables," International Conference on Learning Representations, 2017.
[22] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin et al., "TensorFlow: Large-scale machine learning on heterogeneous distributed systems," arXiv preprint arXiv:1603.04467, 2016.
 [23] S. Abrar and A. K. Nandi, “An adaptive constant modulus blind equalization algorithm and its stochastic stability analysis,” IEEE Signal Processing Letters, vol. 17, no. 1, pp. 55–58, 2010.
[24] Y. Fang and T. W. S. Chow, "Blind equalization of a noisy channel by linear neural network," IEEE Transactions on Neural Networks, vol. 10, no. 4, pp. 918–924, 1999.
 [25] D. Kingma and J. Ba, “Adam: A method for stochastic optimization,” International Conference on Learning Representations, 2015.
[26] S. S. Ranhotra, A. Kumar, M. Magarini, and A. Mishra, "Performance comparison of blind and non-blind channel equalizers using artificial neural networks," in 2017 9th International Conference on Ubiquitous and Future Networks (ICUFN). IEEE, 2017, pp. 243–248.