I-A Background and Motivations
Detection of modulated signals based on noisy channel observations in the presence of interference is one of the most basic building blocks in communication systems. It has been a long-standing challenge to design a high-accuracy and low-complexity signal detection method that performs well for general communication systems. Intensive research efforts have focused on exploiting special structures of communication systems to design efficient signal detection methods. For example, the channel sparsity in massive multiple-input multiple-output (MIMO) systems [zhang2017blind] and cloud radio access networks [fan2017scalable] has been utilized to design message-passing-based detection algorithms with low complexity. Liu et al. proposed a discrete first-order detection method for large-scale MIMO detection with provable guarantees that rely on independent and identically distributed (i.i.d.) channel coefficients [liu2017discrete]. In this paper, we focus on banded linear systems, in which the channel matrices are banded matrices. The banded structure of a system can be caused by, e.g., inter-carrier interference (ICI) and inter-symbol interference (ISI). For instance, in frequency-selective channels, ISI arises between adjacent received symbols, yielding a banded channel matrix [leus2011estimation]. Similarly, two-dimensional magnetic recording (TDMR) systems typically suffer from 2-D banded ISI caused by a combination of down-track ISI and inter-track interference at the read head [carosino2015iterative]. In doubly selective channels, orthogonal frequency division multiplexing (OFDM) systems may experience significant ICI from adjacent subcarriers, which implies that the channel matrix in the frequency domain can be approximated as a banded matrix [liu2015banded]. Traditional detectors that ignore the banded structure lead to inferior performance. For example, detectors designed for interference-free systems yield low estimation accuracy. Meanwhile, detectors designed for general interference systems usually have very high computational complexity.
The banded structure of the channel has been extensively studied to reduce the complexity of signal detection. For example, the well-known Bahl-Cocke-Jelinek-Raviv (BCJR) algorithm [cocke1974optimal] can be employed in a banded system to achieve the optimal maximum a posteriori probability (MAP) detection. Nevertheless, this approach is ill-suited to communication systems with large signal dimensions, because the algorithm is inherently serial and its complexity is exponential in the bandwidth. In [rugini2005simple], Rugini et al. proposed to reduce the complexity of the linear minimum mean square error (LMMSE) detector through LDL factorization. However, there exists a considerable performance gap between the linear detector and the MAP detector. Iterative algorithms including iterative MMSE [schniter2004low] and belief propagation [ochandiano2011iterative] have been proposed as near-optimal solutions. These iterative algorithms typically require a large number of iterations to obtain an estimate with high accuracy. Moreover, it is difficult to implement the iterative algorithms efficiently in parallel, which significantly limits the computational efficiency. In a nutshell, there is a fundamental tradeoff between computational complexity and detection accuracy in signal detection problems. It is highly desirable to design a detection algorithm that achieves both high accuracy and low complexity for banded linear systems, which is the focus of this paper.
Motivated by the recent advances in deep learning [lecun2015deep], we aim to design high-accuracy low-complexity signal detectors based on deep neural networks (DNNs). Instead of using a general DNN, we propose to design the detector based on a convolutional neural network (CNN) that consists of only convolutional layers. The reasons for CNN-based signal detection are explained as follows. First, it is well known that DNNs with fully-connected layers suffer from the curse of dimensionality, i.e., the number of tunable parameters grows significantly as the system size increases. In a CNN, all neurons in a layer share the same set of tunable parameters, which addresses the curse of dimensionality. Second, a DNN with fully-connected layers has to be retrained once the system size changes. In contrast, when the tunable parameters are well trained, a CNN can be applied to systems with different sizes without retraining.
Despite the advantages of being scalable and robust to the system size, it is nontrivial to employ a CNN for signal detection. The success of CNNs is based on the assumption that if one set of parameters is useful to extract a feature at a certain spatial position, then the same set of parameters is also useful to extract the feature at other positions. Such a shift-invariance assumption, although it holds in many computer vision problems, does not hold in a signal detection problem. To address this challenge, we propose a novel CNN-based detection architecture consisting of three modules: an input preprocessing module, a CNN module, and an output postprocessing module. The input preprocessing module reorganizes the input (i.e., the channel matrix and the received signals in this paper) based on the banded structure to obtain the shift-invariance property. Then, the shift-invariant input is fed into the CNN, the output of which is processed through the output postprocessing module to give an estimate of the transmitted signals. To the best of our knowledge, our work is the first attempt to design a CNN-based detector for banded linear systems.
We conduct extensive numerical experiments to show that the CNN-based detector performs much better than existing detectors with comparable complexity. Moreover, the proposed CNN demonstrates outstanding robustness for different system sizes. It achieves a high accuracy even if there is a mismatch between the system sizes in the training set and the testing set. In addition, we extend the proposed CNN-based detector to near-banded channels, such as 1-D near-banded channels in doubly selective OFDM systems and 2-D near-banded channels in TDMR systems with 2-D ISI. Specifically, we propose a cyclic CNN (CCNN) for 1-D near-banded channels, and propose a 2-D CNN-based detector for 2-D near-banded channels. Through simulations, we show that the proposed detector still performs well in these systems, where the channel matrix is not in a strictly banded structure.
In summary, the benefit of the proposed CNN-based detector is at least fourfold.
The proposed CNN approach relieves the burden of establishing a sophisticated mathematical model for the communication system, since it provides a universal detector that automatically adapts to any channel and noise distribution.
The CNN-based detector achieves much better error performance than other detectors with comparable computational complexity, and is well suited to parallel computing.
Thanks to the parameter-sharing property, the proposed CNN is robust to a mismatch between the system sizes in the training set and the testing set.
The CNN-based detector can be readily extended to systems without a strictly banded structure. As such, the proposed CNN approach sheds light on how to design CNN-based algorithms for other problems in communication systems with a near-banded structure.
I-C Related Work
Recently, there have been two threads of research on the application of deep learning for signal detection in communication systems. The first thread is to design deep-learning-based detectors by unfolding existing iterative detection algorithms. That is, the iterations of the original algorithm are unfolded into a DNN with each iteration being mimicked by a layer of the neural network. Instead of being predetermined by the communication model (i.e., the channel matrix, the modulation scheme, the distribution of noise, etc.), the updating rule at each layer is controlled by some tunable parameters, which are learned based on the training data. For example, [gregor2010learning] unfolded two well-known algorithms, namely the iterative shrinkage and thresholding algorithm (ISTA) [beck2009fast] and approximate message passing (AMP) [rangan2011generalized], for a fixed channel matrix. It is shown that the proposed neural networks significantly outperform the original algorithms in both computational time and accuracy [gregor2010learning]. The second thread is to treat the transmission procedure as a black box, and utilize conventional DNNs for signal detection. [ye2018power] showed that a fully-connected neural network is able to detect signals for various channel realizations. Specifically, [ye2018power] utilized deep learning to realize joint channel estimation and signal detection in OFDM systems, where the channel matrix is diagonal. It is demonstrated that the deep learning approach achieves a higher detection accuracy than existing model-based detection approaches with comparable complexity. In [farsad2018neural], Farsad et al. presented a recurrent neural network (RNN) for detection of data sequences in a Poisson channel model, which is applicable to both optical and chemical communication systems. The proposed RNN can achieve a performance close to that of the Viterbi detector with perfect CSI.
Besides signal detection, deep learning has demonstrated its potential in other areas of communication systems. Nachmani et al. studied the problem of channel decoding through unfolding traditional belief propagation (BP) decoders [nachmani2016learning]. Most recently, Liang et al. proposed an iterative belief propagation-CNN architecture for channel decoding under a certain correlated noise model [liang2018iterative]. A standard BP decoder is used to estimate the coded bits, followed by a CNN that removes the estimation errors of the BP decoder to obtain a more accurate estimate. In [dorner2017deep], Dorner et al. presented an end-to-end communication system to demonstrate the feasibility of over-the-air communication with deep neural networks. As shown in [dorner2017deep], the performance is comparable with that of traditional model-based communication systems.
The rest of the paper is organized as follows. In Section II, we present the system model as well as its extensions to near-banded systems, and discuss the challenges of utilizing traditional DNNs to detect signals. In Section III, we propose the CNN-based detector based on the banded structure of the channel matrix, and illustrate the robustness of the proposed detector. In Section IV, we extend the proposed detector to near-banded systems. In Section V, the performance of the proposed deep learning approach is evaluated in different channel models, and is compared with existing algorithms. In Section V, we also show the performance of the proposed CNN in practical OFDM systems and TDMR systems. Conclusions and future work are presented in Section VI.
II System Model
II-A Linear Banded Systems
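In this paper, we consider a linear channel model with the received signal written as
$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \qquad (1)$$
where $\mathbf{H}$ is the channel matrix, $\mathbf{x} = [x_1, \ldots, x_N]^T$ is the vector of transmitted signals (footnote 1), and $\mathbf{n}$ is the noise vector. Furthermore, we assume that the channel matrix $\mathbf{H}$ is a banded matrix with bandwidth $B$. That is, the $(i,j)$th entry $h_{i,j}$ of $\mathbf{H}$ satisfies
$$h_{i,j} = 0 \quad \text{for } |i - j| > B, \qquad (2)$$
so that
$$y_i = \sum_{j=i-B}^{i+B} h_{i,j}\, x_j + n_i, \qquad (3)$$
where $y_i$ and $n_i$ are the $i$th entries of $\mathbf{y}$ and $\mathbf{n}$, respectively (footnote 2). We assume perfect channel state information at the receiver, i.e., the channel matrix is exactly known by the receiver.
Footnote 1: For simplicity, we use BPSK as the modulation method, but the proposed deep learning approach can be readily extended to systems with other modulation methods.
Footnote 2: In (3), we assume $x_j = 0$ for $j < 1$ and $j > N$.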
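As a minimal illustration of the banded model, the following NumPy sketch draws a banded channel matrix, sends BPSK symbols through it, and adds noise. The function names `banded_channel` and `transmit`, the complex Gaussian channel and noise distributions, and the sizes `N = 8` and `B = 2` are assumptions made for illustration only; the text does not prescribe them.

```python
import numpy as np

def banded_channel(n, b, rng):
    """Draw an n-by-n complex Gaussian matrix and zero every entry with |i - j| > b."""
    h = (rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))) / np.sqrt(2)
    i, j = np.indices((n, n))
    h[np.abs(i - j) > b] = 0.0
    return h

def transmit(h, x, noise_std, rng):
    """Return y = H x + n, with complex Gaussian noise of standard deviation noise_std."""
    n = h.shape[0]
    noise = noise_std * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return h @ x + noise

rng = np.random.default_rng(0)
N, B = 8, 2                            # illustrative system size and bandwidth
H = banded_channel(N, B, rng)
x = rng.choice([-1.0, 1.0], size=N)    # BPSK symbols
y = transmit(H, x, noise_std=0.1, rng=rng)
```

Because every entry with |i - j| > B is zero, each received sample depends only on the 2B + 1 transmitted symbols nearest to it.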
The banded system in (3) may be an idealization of practical scenarios. We next introduce two near-banded systems with the channel matrices obtained from real applications. We will show that the CNN-based detector can be readily modified to handle the near-banded systems.
II-B Near-Banded Systems
II-B1 1-D Near-Banded Systems
In certain systems, such as systems with a doubly selective OFDM channel, in addition to the non-zero entries on the diagonal band, the channel matrix has non-zero entries in the bottom-left corner and the top-right corner due to the non-negligible ICI [schniter2004low]. The structure of the channel matrix is shown in Figure 2, where the entries of the channel matrix satisfy
$$h_{i,j} = 0 \quad \text{for } B < |i - j| < N - B, \qquad (4)$$
with $h_{i,j}$ denoting the $(i,j)$th entry of the $N \times N$ channel matrix and $B$ the bandwidth.
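The near-banded structure can be sketched as a Boolean mask that marks which entries of the channel matrix may be non-zero. This is a minimal sketch, assuming that the corner entries arise from a cyclic wrap-around of the diagonal band with the same width `b`; the function name `near_banded_mask` and the sizes are hypothetical.

```python
import numpy as np

def near_banded_mask(n, b):
    """True where an entry may be non-zero: cyclic distance between i and j is at most b."""
    i, j = np.indices((n, n))
    d = np.abs(i - j)
    # The cyclic distance treats index n - 1 as a neighbour of index 0.
    return np.minimum(d, n - d) <= b

mask = near_banded_mask(6, 1)
```

Under this assumption, every row has exactly 2b + 1 potentially non-zero entries, including the rows whose band wraps into the top-right or bottom-left corner.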
II-B2 2-D Near-Banded Systems
A TDMR system usually suffers from 2-D banded ISI modeled by convolving the data with a 2-D spatial impulse response [wu2003iterative]. The output of the channel is a matrix with the $(i,j)$th element given by
$$y_{i,j} = \sum_{k=0}^{L-1} \sum_{l=0}^{L-1} h_{k,l}\, x_{i-k,\,j-l} + n_{i,j}, \qquad (5)$$
where $n_{i,j}$ is the noise, $h_{k,l}$ is a 2-D read head impulse response, and $L$ is the number of elements over which the ISI extends in each dimension. As shown in Figure 3, the TDMR ISI system is actually a 2-D extension of the banded linear system. That is, each received signal in a TDMR system is a linear combination of the neighbouring transmitted signals in the 2-D space.
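The 2-D ISI model can be sketched as a direct 2-D convolution of the data with the impulse response. The sketch below assumes a causal indexing (each output combines the current symbol with its predecessors in both dimensions) and treats symbols outside the grid as zero; the function name `tdmr_output` and these boundary conventions are illustrative assumptions, not taken from the text.

```python
import numpy as np

def tdmr_output(x, h, noise):
    """y[i, j] = sum over k, l of h[k, l] * x[i - k, j - l] + noise[i, j].

    x:     (rows, cols) grid of transmitted symbols
    h:     (L, L) impulse response, with L no larger than rows and cols
    noise: (rows, cols) additive noise
    """
    rows, cols = x.shape
    L = h.shape[0]
    y = np.zeros((rows, cols))
    for k in range(L):
        for l in range(L):
            # Shift x by (k, l) and accumulate; symbols outside the grid count as zero.
            y[k:, l:] += h[k, l] * x[:rows - k, :cols - l]
    return y + noise

x = np.array([[1.0, -1.0], [1.0, 1.0]])
h = np.array([[1.0, 0.5], [0.25, 0.0]])
y = tdmr_output(x, h, np.zeros((2, 2)))
```

Each output sample is a weighted sum of at most L x L neighbouring symbols, which is the 2-D counterpart of the diagonal band in the 1-D model.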
Signal detection in a near-banded system is usually more challenging due to the more complicated structure of the interference. As shown in Section IV, the proposed CNN-based detector can be readily extended to these near-banded systems, and hence is more flexible than traditional model-based detectors.
II-C Architecture of a DNN-Based Detector
In this subsection, we briefly introduce the architecture of a DNN-based detector. As shown in Figure 4, the DNN-based detector treats both the channel matrix and the received signal as input and outputs a vector of estimated symbols. This implies that, once well trained, the proposed DNN-based detector can adapt to various channel realizations. Moreover, unlike most existing detection approaches based on the probability model of the system in (1), the DNN-based approach does not rely on the probability distributions of the channel coefficients and the noise. Instead, the proposed neural networks are able to learn the model information from the training data.
Typically, a DNN may consist of fully-connected layers, densely-connected layers, convolutional layers, or a mixture of them. Due to the large number of connections between neurons, a DNN with fully-connected or densely-connected layers suffers from the curse of dimensionality, and does not scale well to large systems. More specifically, the number of weights and biases associated with each fully-connected or densely-connected neuron grows linearly with the size of the input. Since the number of neurons in a layer typically also grows with the size of the input, the total number of tunable parameters increases quadratically with the size of the input, which renders it difficult to train a DNN for a large system. Moreover, a DNN has to be retrained once the system size changes, because the number of tunable parameters varies with the system size. Notably, DNN training is a time-consuming task, as it usually involves a large amount of data and requires high computational complexity. To deal with these challenges, we propose to detect signals through a DNN that consists of only convolutional layers (i.e., a CNN). In a CNN, all neurons in a layer share the same set of tunable parameters, implying that the number of tunable parameters does not scale with the system size. Nonetheless, to achieve good performance with a CNN, the input is required to have shift-invariant properties, and the convolutional filter is required to be carefully designed. In the next section, we introduce the proposed CNN-based detector for strictly banded linear systems. The extension to near-banded systems will be discussed in Section IV.
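The scaling argument can be checked with simple arithmetic. The sketch below counts the parameters of one fully-connected layer and one 1-D convolutional layer; the helper names, the choice of a hidden width equal to the input size, and the filter dimensions are hypothetical values chosen only to make the comparison concrete.

```python
def fully_connected_params(input_size, hidden_size):
    """Weights plus biases of one fully-connected layer."""
    return input_size * hidden_size + hidden_size

def conv1d_params(filter_size, in_channels, out_channels):
    """Weights plus biases of one 1-D convolutional layer (independent of input length)."""
    return filter_size * in_channels * out_channels + out_channels

for n in (100, 1000):
    # Hypothetical sizing: the hidden width equals the input size n.
    print(n, fully_connected_params(n, n), conv1d_params(9, 1, 16))
```

With these illustrative numbers, a tenfold increase in input size raises the fully-connected count roughly a hundredfold, while the convolutional count stays fixed.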
III CNN-Based Detector
In this section, we first describe the design details of the CNN-based detector. Then, we demonstrate the robustness of the proposed detector in the sense of adapting to various system sizes.
To address the challenges discussed in Section II, we propose to use a CNN consisting of only convolutional layers for signal detection in a banded linear system. In a convolutional layer, each neuron is only connected to a small portion of the neurons in the previous layer, and all neurons in a layer share the same set of parameters (i.e., weights and biases). This significantly reduces the total number of parameters to be learned. CNNs are a very efficient class of DNNs for solving problems with a large-sized input, such as image/video recognition [he2016deep, kim2014convolutional], speech recognition [abdel2014convolutional], etc. Nonetheless, the success of CNNs is based on the shift-invariance assumption. That is, if one set of parameters is useful to extract a feature at a certain spatial and temporal position, then it is also useful to extract the feature at other positions. Such an assumption generally holds for image, video, and audio inputs. However, it does not necessarily hold in the signal detection problem over the channel in (1). For example, directly shifting the channel matrix will significantly change the transmission model and thereby change the detection result. Hence, in the proposed CNN-based detector, the input as well as the tunable convolutional filter needs to be appropriately organized before being fed into the CNN. As illustrated in Figure 5, we propose a CNN-based detector that consists of three modules: an input preprocessing module, a CNN module, and an output postprocessing module. The input preprocessing module is used to reorganize the input to ensure the shift-invariance property. The CNN module extracts the features from the shift-invariant input. The output postprocessing module is applied to obtain an estimate of the transmitted signals based on the features extracted by the CNN. In the following subsections, we will discuss the detailed design of the three modules.
III-A Input Preprocessing
In the input preprocessing module, we use an input reshaping approach to ensure the shift-invariance property of the input, i.e., the channel matrix and the received signal (footnote 3).
Footnote 3: The realization of the input preprocessing module to achieve shift-invariance is not unique. The input reshaping approach proposed in this paper is just one example.
As illustrated in Figure 6(a), we reshape the channel coefficients and the received signals into a vector . Recall that the channel matrix is a banded matrix, with the non-zero entries confined to a diagonal band. Hence, we only need to store the non-zero entries on the band into the vector . Specifically, the non-zero channel coefficients and the received signal corresponding to receiving position are stored in a vector with the entries given by
where $\Re(\cdot)$ and $\Im(\cdot)$ represent the real and imaginary parts of the complex input, respectively. Then, the resulting vector is fed as an input into the subsequent CNN module. With the above preprocessing, the input vector has a certain shift-invariance property. For example, if we shift the input vector by the length of one subvector, we only need to shift the output vector by one entry to obtain the same input-output relationship. With the preprocessing, a CNN can be employed to extract features of the input.
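The reshaping step can be sketched as follows. This is a minimal sketch under stated assumptions: for each receiving position, the subvector holds the 2b + 1 band entries of the corresponding row followed by the received sample, stored as interleaved real and imaginary parts, and band entries that fall outside the matrix are set to zero. The function name `reshape_input` and this particular ordering are illustrative choices, not necessarily those of Figure 6(a).

```python
import numpy as np

def reshape_input(h, y, b):
    """Return the flat input vector and the subvector length.

    Subvector i = [Re, Im of h[i, i-b], ..., Re, Im of h[i, i+b], Re, Im of y[i]].
    """
    n = len(y)
    sub_len = 2 * (2 * b + 1) + 2
    v = np.zeros((n, sub_len))
    for i in range(n):
        band = np.zeros(2 * b + 1, dtype=complex)
        for k, j in enumerate(range(i - b, i + b + 1)):
            if 0 <= j < n:              # entries outside the matrix stay zero
                band[k] = h[i, j]
        entries = np.append(band, y[i])
        v[i, 0::2] = entries.real
        v[i, 1::2] = entries.imag
    return v.reshape(-1), sub_len

h = np.array([[1 + 1j, 2, 0], [3, 4, 5j], [0, 6, 7]], dtype=complex)
y = np.array([1j, 2, 3], dtype=complex)
v, sub_len = reshape_input(h, y, b=1)
```

Every subvector has the same layout regardless of its position, so a filter that slides over the flat vector with a stride of `sub_len` sees the same kind of data at every step.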
As shown in Figure 6(b), the CNN module consists of multiple convolutional hidden layers and one convolutional output layer. The input is and the output is the symbols , where
where is the input, and is the output of the activation function. To map the output to the interval $(0, 1)$, we choose the sigmoid function as the activation function for the output layer:
$$\sigma(z) = \frac{1}{1 + e^{-z}}.$$
where and are the learnable weight and bias of the first layer, and with for or . As such, each filter takes subvectors as the input. This setting is based on the observation that each subvector is strongly correlated with neighbouring subvectors due to the banded structure of channel . Hence, we propose to extract features from every consecutive subvectors. Similarly, in the th layer (), the filter is performed over subvectors with stride size and filter size . To summarize, the structure of a CNN is determined by its number of layers and the filter depth in each layer. These parameters need to be decided before training the network. As shown in simulations later, such a CNN outperforms a DNN consisting of fully-connected layers in both accuracy and complexity.
A conventional CNN typically consists of convolutional layers as well as pooling layers and fully-connected layers. However, the fully-connected layers and pooling layers are not used in our design for the following reasons. First, a pooling layer is typically used after a convolutional layer to perform a downsampling operation along the spatial dimensions. Recall that in the proposed CNN, the filter in the convolutional layer is used to extract features for each receiving position, which means that every output of the filter is useful. Discarding features will cause performance loss. Second, the fully-connected layers involve high complexity and are also difficult to train. As shown in the simulation section, the fully-connected layers do not provide any performance gain over the convolutional layers. Hence, we do not include any pooling or fully-connected layers in the proposed CNN. Dropout and batch normalization are also important components of conventional CNN architectures. However, we tested them and found that they do not provide any gain either.
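A forward pass through an all-convolutional network of this kind can be sketched in plain NumPy. The sketch assumes that the input is arranged as one row per receiving position (equivalent to sliding over the flat vector with a stride of one subvector), that boundaries are zero-padded, that hidden layers use ReLU, and that the output layer uses the sigmoid. The names `conv_layer` and `cnn_forward`, the filter spans, and the layer widths are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_layer(inp, weights, bias, activation):
    """Zero-padded convolution along the position axis.

    inp:     (positions, in_features)
    weights: (span, in_features, out_features), with span odd
    bias:    (out_features,)
    """
    span = weights.shape[0]
    pad = span // 2
    padded = np.pad(inp, ((pad, pad), (0, 0)))
    out = np.stack([
        # The same weights are applied at every position (parameter sharing).
        np.tensordot(padded[p:p + span], weights, axes=([0, 1], [0, 1])) + bias
        for p in range(inp.shape[0])
    ])
    return activation(out)

def cnn_forward(subvectors, layers):
    """Hidden layers use ReLU; the last layer uses sigmoid and has one output feature."""
    a = subvectors
    for w, b in layers[:-1]:
        a = conv_layer(a, w, b, relu)
    w, b = layers[-1]
    return conv_layer(a, w, b, sigmoid)[:, 0]
```

Since no pooling or fully-connected layer is involved, the network produces one output per receiving position, and the same `layers` can be applied to inputs with any number of positions.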
III-C Output Postprocessing
In the output postprocessing module, we map the output of the CNN to the estimate of the transmitted signals. Recall that we use the sigmoid function as the activation function of the output layer. As such, the output of the CNN lies in the interval $(0, 1)$. Here, we use an indicator function to map the continuous value of the output to a discrete estimate of the transmitted signal: