Recently, deep learning (DL) has made remarkable achievements in computer vision and natural language processing, and it has also been adopted for channel decoding. The data-driven DL approach converts the decoding task into a pure learning-to-decode problem by optimizing a general black-box fully connected deep neural network (FC-DNN). Despite the advantage of one-shot decoding (i.e., no iterations), the FC-DNN-based decoder lacks expert knowledge, which renders it unaccountable and fundamentally restricted by dimensionality. Training such a network in practice is infeasible because the training complexity increases exponentially with the block length (e.g., the number of distinct codewords of a turbo code grows exponentially with the number of information bits) [3]. These data-driven decoding methods rely on large amounts of data to train numerous parameters, and therefore converge slowly and suffer from high computational complexity.
To address these issues, the model-driven DL approach can be used instead. The concept of a “soft” Tanner graph was proposed in , where weights were assigned to the Tanner graph of the belief propagation (BP) algorithm to obtain a deep neural network (DNN). These weights were learned to properly scale the messages passed on the Tanner graph, thereby improving the performance of the BP algorithm. Because a large number of multiplications were required in , the authors in  proposed a min-sum algorithm with trainable offset parameters to reduce the computational complexity. The DNN-based BP decoder was further transformed into a recurrent neural network (RNN) architecture in , named the BP-RNN decoder, by tying the weights across iterations, thereby reducing the number of parameters without sacrificing performance. In addition, a trainable relaxation factor was introduced to improve the performance of this BP-RNN decoder.
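The offset min-sum idea can be sketched briefly. The following is a minimal single check-node update with a fixed offset, not the trainable decoder of the cited work, where the offsets are learned parameters:

```python
import numpy as np

def offset_min_sum_check(incoming, offset):
    """One check-node update of the offset min-sum algorithm.

    incoming : LLR messages from all other variable nodes on this check node
    offset   : correction term (a trainable parameter in neural decoders)
    """
    sign = np.prod(np.sign(incoming))                 # parity of the signs
    magnitude = max(np.min(np.abs(incoming)) - offset, 0.0)
    return sign * magnitude

# Example: three incoming messages and an offset of 0.5
msg = offset_min_sum_check(np.array([2.0, -3.0, 4.0]), 0.5)
```

Replacing the multiplications of the weighted BP decoder with a subtraction is what makes this variant attractive in hardware.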
In sum, two central limitations are inherent in current DL-based decoding methods. First, existing data-driven approaches rely on vast numbers of trainable parameters. Second, the aforementioned model-driven decoding algorithms are all based on the BP algorithm; whether they can be applied to sequential codes (e.g., turbo codes) to improve performance remains unknown. To address these limitations, this letter presents TurboNet, a novel model-driven DL architecture for turbo decoding that combines DL with the traditional max-log maximum a posteriori (max-log-MAP) algorithm. TurboNet is constructed from the domain knowledge embedded in turbo decoding algorithms and employs only a small number of learnable parameters. More specifically, the original iterative structure is unfolded to obtain an “unrolled” structure (i.e., each iteration is treated separately), and the max-log-MAP algorithm is parameterized. With this design, the parameters can be determined from training data far more efficiently than in the existing black-box FC-DNN  and RNN  architectures. The TurboNet decoder outperforms the traditional max-log-MAP algorithm for turbo decoding at different code rates and contains considerably fewer parameters than the neural BCJR decoder proposed in . Furthermore, TurboNet shows strong generalization: trained at a single signal-to-noise ratio (SNR), it outperforms the max-log-MAP algorithm over a wide range of SNRs.
To obtain the model-driven DL architecture for turbo decoding, we briefly describe the system model in Section II-A and the traditional max-log-MAP algorithm in Section II-B. The architecture and details of TurboNet, including a redefined loss function, are elaborated in Section II-C.
II-A System Model
At the transmitter, a binary information sequence is encoded by a turbo encoder that contains two identical recursive systematic convolutional encoders (RSCEs). The generator matrix of the RSCE is , where and . The feedthrough passes one block of information bits , , to the output of the encoder; these are referred to as systematic bits . The first RSCE generates a sequence of parity bits from the systematic bits, and the second RSCE generates a sequence of parity bits from , an interleaved version of the systematic bits. is the set of all RSCE states. The codeword consisting of bits is then modulated and transmitted over an additive white Gaussian noise (AWGN) channel. At the receiver, a soft-output detector computes reliability information in the form of log-likelihood ratios (LLRs) for the transmitted bits. The resulting LLRs indicate the probability of the corresponding bits being a binary 1 or 0.
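For BPSK transmission over the AWGN channel, these channel LLRs have a simple closed form. A minimal sketch, assuming the mapping bit 0 → +1, bit 1 → −1 and noise variance sigma², so that a positive LLR favors bit 0:

```python
import numpy as np

def channel_llr(y, sigma2):
    """LLR log p(y | b = 0) / p(y | b = 1) for BPSK (0 -> +1, 1 -> -1)
    over an AWGN channel with noise variance sigma2.

    The Gaussian likelihood ratio reduces to the linear form 2*y/sigma2.
    """
    return 2.0 * y / sigma2

# Received values near +1 strongly favor bit 0 when the noise is weak
llrs = channel_llr(np.array([0.9, -1.1, 0.2]), 0.5)
```

These LLRs are exactly the soft inputs consumed by the SISO decoders described next.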
II-B Max-Log-MAP Algorithm
A traditional turbo decoder, introduced in , contains two soft-input soft-output (SISO) decoders with the same structure; therefore, we only describe decoder 1 in detail. The MAP algorithm is used in decoder 1 to compute the a posteriori LLR for information bit as follows:
where and represent the states of the encoder at time and , respectively, and the sequence is made up of the LLRs of the systematic bits and the corresponding parity bits. is the set of ordered pairs corresponding to all state transitions caused by data input , and is defined similarly for . All of these state transitions are listed in Table I.
On the basis of Bayes' formula, we obtain
where and can be computed through the forward and backward recursions:
with initial conditions , for , and , for . Moreover, is computed as follows:
where is the a priori probability LLR for the bit . Given that is the sum of the systematic bit LLR , the a priori probability LLR , and the extrinsic LLR , we obtain
which can be used as the a priori probability LLR input of the subsequent SISO decoder 2 after it is interleaved.
The log-MAP algorithm introduced in  evaluates and in logarithmic terms using the Jacobian logarithmic function , as follows:
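The difference between the log-MAP and max-log-MAP algorithms reduces to this single operation: the exact Jacobian logarithm max*(x, y) = max(x, y) + log(1 + e^{-|x-y|}) versus the plain max. A minimal sketch:

```python
import math

def max_star(x, y):
    """Exact Jacobian logarithm: log(exp(x) + exp(y)), computed stably."""
    return max(x, y) + math.log1p(math.exp(-abs(x - y)))

def max_log(x, y):
    """Max-log approximation: the correction term is simply dropped."""
    return max(x, y)

# The correction term is largest when x == y, where it equals log 2
gap = max_star(1.0, 1.0) - max_log(1.0, 1.0)
```

The max-log-MAP algorithm trades this correction term (at most log 2 per operation) for a large reduction in complexity; compensating for the discarded term is precisely where trainable weights can help.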
where , , and represent the logarithmic values of , , and , respectively. The a posteriori LLRs for information bits are computed as
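To make the recursions concrete, the following is a self-contained max-log-MAP sketch in numpy. It is an illustration, not the paper's implementation: it assumes a hypothetical 2-state recursive code (next state s XOR u, with the parity bit equal to the next state) instead of the 8-state LTE constituent encoder, an unterminated trellis, and the LLR convention L = log p(bit = 1) / p(bit = 0):

```python
import numpy as np

# Hypothetical 2-state recursive code for illustration (not the LTE RSCE).
# Each entry: (state, next_state, input_bit, parity_bit)
TRELLIS = [(s, s ^ u, u, s ^ u) for s in (0, 1) for u in (0, 1)]
NEG = -1e9  # stands in for minus infinity

def max_log_map(l_sys, l_par, l_a):
    """Max-log-MAP a posteriori LLRs for one constituent decoder.

    l_sys, l_par, l_a : length-K arrays of systematic, parity, and a priori
    LLRs, all with the convention L = log p(bit = 1) / p(bit = 0).
    """
    K = len(l_sys)

    # Branch metric (log-domain gamma, per-step constants dropped)
    def gamma(k, u, p):
        return u * (l_a[k] + l_sys[k]) + p * l_par[k]

    # Forward recursion (alpha); the encoder starts in state 0
    alpha = np.full((K + 1, 2), NEG)
    alpha[0, 0] = 0.0
    for k in range(K):
        for s, s_next, u, p in TRELLIS:
            alpha[k + 1, s_next] = max(alpha[k + 1, s_next],
                                       alpha[k, s] + gamma(k, u, p))

    # Backward recursion (beta); unterminated trellis, uniform end states
    beta = np.zeros((K + 1, 2))
    for k in range(K - 1, -1, -1):
        beta[k, :] = NEG
        for s, s_next, u, p in TRELLIS:
            beta[k, s] = max(beta[k, s],
                             gamma(k, u, p) + beta[k + 1, s_next])

    # A posteriori LLR: best path with u = 1 minus best path with u = 0
    llr = np.empty(K)
    for k in range(K):
        best = {0: NEG, 1: NEG}
        for s, s_next, u, p in TRELLIS:
            best[u] = max(best[u],
                          alpha[k, s] + gamma(k, u, p) + beta[k + 1, s_next])
        llr[k] = best[1] - best[0]
    return llr
```

Feeding the extrinsic part of these LLRs back, after interleaving, as the a priori input of the other constituent decoder yields the iterative turbo structure described above.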
In Section II-C, we provide an alternative graphical representation, called the neural max-log-MAP algorithm, to replace the traditional max-log-MAP algorithm.
II-C TurboNet Architecture
The traditional iterative structure is unfolded, and each iteration is represented by a DNN decoding unit to obtain the “unrolled” structure shown in Fig. 1, which is equivalent to iterations. denotes the a priori LLR calculated by the max-log-MAP algorithm with iterations, and denotes the a posteriori LLR calculated by the max-log-MAP algorithm with iterations, where . The structure of the DNN decoding unit in Fig. 1 is shown in Fig. 2.
Fig. 3 shows that subnet 1, which is based on the neural max-log-MAP algorithm, consists of layers, of which are hidden layers. Subnet 2 has the same structure as subnet 1. The details of the subnet architecture are as follows:
II-C1 Input Layer
The input layer of the proposed network consists of neurons, and the output of all neurons constitutes the set , where .
II-C2 Hidden Layer 1
The first hidden layer consists of neurons, and the output of all neurons constitutes the set , where is the set of ordered pairs corresponding to all state transitions caused by data input . The neuron corresponding to in this layer is connected to the neurons corresponding to , , and in the input layer, where and .
II-C3 Hidden Layers 2 to K
Each of the following hidden layers contains 16 neurons. For the th hidden layer, the output of all neurons constitutes the set , where
is the set of neuron outputs at all odd positions in the th hidden layer, is the set of neuron outputs at all even positions in the th hidden layer, and . For some , the neuron corresponding to in the th layer is connected to all neurons corresponding to elements of the set in layer and to all neurons corresponding to elements of the set in the first hidden layer, where and ; the neuron corresponding to in the th layer is connected to all neurons corresponding to elements of the set in layer and to all neurons corresponding to elements of the set in the first hidden layer, where and .
II-C4 Hidden Layer
The last hidden layer consists of neurons, and the output of all neurons constitutes the set . The neuron corresponding to in the last hidden layer is connected to all neurons corresponding to elements of the sets , , and , where .
II-C5 Output Layer
The output layer consists of neurons, and the output of all neurons constitutes the set . The neuron corresponding to is connected to the neuron corresponding to in hidden layer and to the neurons corresponding to , in the input layer.
We assign weights to a subset of the edges in Fig. 3. These weights are trained using the stochastic gradient descent (SGD) algorithm. Therefore, we can calculate , , and as follows:
Turbo codes usually have large block sizes. For example, the minimum message length of a turbo code in the Long-Term Evolution (LTE) standard is 40 bits, and the maximum is 6144 bits. Parameterizing (10) and (11) would therefore make the neural network in Fig. 3 extremely “deep”, which may lead to vanishing or exploding gradients. For this reason, we introduce no trainable parameters into (10) and (11).
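The flavor of this parameterization can be illustrated with a short sketch. The weights and the equation below are hypothetical stand-ins (the actual TurboNet assigns weights to particular edges of the network in Fig. 3), but they show the key property that setting all weights to 1 recovers the unweighted algorithm:

```python
import numpy as np

def weighted_extrinsic(l_post, l_sys, l_a, w):
    """Hypothetical weighted extrinsic-LLR computation.

    In the unweighted max-log-MAP decoder the extrinsic LLR is
    Le = L_posterior - L_systematic - L_apriori; here each term is
    scaled by a trainable weight. w = [w1, w2, w3]; setting all
    three to 1.0 recovers the original update.
    """
    w1, w2, w3 = w
    return w1 * l_post - w2 * l_sys - w3 * l_a

# With all weights equal to 1 this reduces to the classical subtraction
le = weighted_extrinsic(np.array([4.0]), np.array([1.5]), np.array([0.5]),
                        [1.0, 1.0, 1.0])
```

Because the all-ones initialization reproduces the classical decoder exactly, training can only move the weights toward configurations that perform at least as well on the training distribution.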
Given that the output of the th DNN decoding unit is , the sigmoid function is added such that the final network output lies in the range . Generally, the mean squared error or binary cross-entropy between and could be used as the network loss, but both are unsuitable here for the following two reasons:
The magnitude of the a posteriori LLR calculated by the traditional max-log-MAP algorithm is usually greater than 10, whereas the sigmoid function is very close to 1 or 0 when . Therefore, vanishing gradients are likely to occur if the loss is calculated with ;
The loss of the network mainly comes from a small number of error bits. Hence, the loss becomes extremely small, making the entire network difficult to train.
A redefined loss function, computed as (16), is used to evaluate the loss of TurboNet
where represents the a posteriori LLR obtained by TurboNet consisting of decoding units, and represents the a posteriori LLR calculated by the traditional log-MAP algorithm with a given number of iterations.
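A minimal numpy sketch of this idea, assuming the redefined loss takes a mean-squared-error form directly on the LLRs:

```python
import numpy as np

def turbonet_loss(llr_net, llr_target):
    """Loss between TurboNet's a posteriori LLRs and target LLRs
    produced by the log-MAP algorithm (assumed mean-squared-error form).

    Matching LLRs directly, instead of pushing sigmoid(LLR) toward the
    transmitted bits, avoids the vanishing gradients caused by
    saturated sigmoids at large LLR magnitudes.
    """
    return np.mean((llr_net - llr_target) ** 2)

# Identical LLRs give zero loss; small LLR gaps give small, nonzero loss
loss = turbonet_loss(np.array([12.0, -8.0]), np.array([14.0, -8.0]))
```

The gradient of this loss stays informative even when the LLR magnitudes are large, which addresses both problems listed above.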
The goal is to make the network loss as small as possible by training the parameters . The final decoding results are then obtained by a hard decision on the sign of each a posteriori LLR.
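Such a hard decision simply thresholds each a posteriori LLR at zero (assuming the convention that a positive LLR favors bit 1):

```python
import numpy as np

def hard_decision(llr):
    """Map a posteriori LLRs to bit decisions: positive LLR -> bit 1."""
    return (llr > 0).astype(int)

bits = hard_decision(np.array([12.3, -0.7, 4.1]))
```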
When all weights are set to 1, the results of (13)-(15) are identical to those of the original max-log-MAP algorithm. Hence, with trained parameters, the neural max-log-MAP algorithm performs no worse than the max-log-MAP algorithm. Moreover, the complexity of TurboNet is similar to that of a turbo decoder using the max-log-MAP algorithm.
III Simulation Results
III-A Parameter Settings
TurboNet was constructed on top of the TensorFlow framework, and an NVIDIA GeForce GTX 1080 Ti GPU was used to accelerate training. We trained TurboNet for the (40, 132) and (40, 92) turbo codes on randomly generated training data transmitted over an AWGN channel at an SNR of 0 dB. TurboNet was composed of three DNN decoding units, corresponding to three full iterations. The loss function (16) was used, with the target LLRs produced by the log-MAP algorithm with six iterations. We trained TurboNet with the Adam optimizer  and a batch size of 500. The learning rate was .
III-B Performance Analysis
III-B1 BER Performance
Fig. 4 indicates that the BER of TurboNet with three decoding units is lower than those of the max-log-MAP and log-MAP algorithms with three iterations over the entire SNR range. Notably, TurboNet also outperforms the max-log-MAP algorithm with five iterations in almost all cases. Fig. 5 shows that TurboNet with three decoding units outperforms the max-log-MAP algorithm with three iterations over the entire SNR range, and its BER performance is comparable to those of the log-MAP algorithm with three iterations and the max-log-MAP algorithm with five iterations under most circumstances. These results suggest that TurboNet still works when handling punctured turbo codes with high code rates.
Here, we discuss the SNR of the training data and the iteration number of the log-MAP algorithm in (16), both of which are closely related to the training of TurboNet.
Ideally, the training data and the test data should have a similar distribution, which suggests using the same SNR for training and testing; in practice, however, the precise test SNR may not be available. Moreover, TurboNet is equivalent to the traditional max-log-MAP algorithm when all weights are set to 1, so if the training SNR is too high, few errors occur and TurboNet cannot learn to handle noise. Conversely, if the SNR is set too low, the max-log-MAP algorithm has poor error-correction capability, and TurboNet again cannot learn effectively. We therefore fix the SNR of the training data at 0 dB, which helps TurboNet learn to correct errors as much as possible.
Notably, the target LLR values in (16) are generated by the log-MAP algorithm with a fixed iteration number. If the iteration number is large, TurboNet can learn accurate information. However, it should not be too large because TurboNet contains only three decoding units; an excessive iteration number would create a large gap between TurboNet and the log-MAP algorithm, thereby decreasing the generalization capability of TurboNet. Therefore, we set the iteration number to six, exactly twice the number of decoding units.
The BER improvement is achieved by properly configured weights, which appropriately compensate for the logarithmic correction term of the Jacobian logarithm that the max-log approximation discards. In addition, the LLRs are related to the channel conditions; thus, part of the channel information might be learned by TurboNet, allowing these LLRs to be used more precisely.
| Decoder | # of parameters | Time |
| --- | --- | --- |
| Max-Log-MAP (5 iterations) | - | 2.3e-4 s |
| Neural BCJR in  (3 units) | 3.85M | 5.89e-3 s |
| TurboNet (3 decoding units) | 17.8K | 1.39e-4 s |
III-B2 Computational Complexity
Table II compares the complexity of the decoders in terms of the number of parameters and the time required to complete a single forward pass of one codeword. TurboNet has a lower computational cost and relatively faster computation with considerably fewer parameters than the data-driven neural BCJR decoder , in which each SISO decoder is replaced by two bidirectional LSTM layers with 800 hidden units per layer. In addition, TurboNet shows lower latency than the max-log-MAP algorithm with five iterations.
In this work, we demonstrated the benefits of the proposed TurboNet decoder architecture compared with the traditional turbo decoder based on the max-log-MAP algorithm. In TurboNet, the original iterative structure is unfolded, and each iteration is represented as a DNN decoding unit. We obtain a neural max-log-MAP algorithm by assigning weights to the max-log-MAP algorithm, and a redefined loss function is used to improve the training process. The BER performance of TurboNet is improved over the max-log-MAP algorithm without increasing computational complexity. The error-correction capability of TurboNet can be further improved by applying advanced DL techniques, and we hope this letter encourages future research in this direction.
-  T. Gruber, S. Cammerer, J. Hoydis, and S. ten Brink, “On deep learning-based channel decoding,” in Proc. IEEE 51st Annu. Conf. Inf. Sciences Syst., Mar. 2017, pp. 1-6.
-  X.-A. Wang and S. B. Wicker, “An artificial neural net Viterbi decoder,” IEEE Trans. Commun., vol. 44, no. 2, pp. 165-171, Feb. 1996.
-  H. Kim, Y. Jiang, R. B. Rana, S. Kannan, S. Oh, and P. Viswanath, “Communication algorithms via deep learning,” arXiv preprint arXiv:1805.09317, 2018.
-  E. Nachmani, Y. Be’ery, and D. Burshtein, “Learning to decode linear codes using deep learning,” in Proc. IEEE Annu. Allerton Conf. Commun., Control, and Computing, 2016, pp. 341-346.
-  L. Lugosch and W. J. Gross, “Neural offset min-sum decoding,” in Proc. 2017 IEEE Int. Symp. Inf. Theory, Jun. 2017, pp. 1361-1365.
-  E. Nachmani, E. Marciano, L. Lugosch, W. J. Gross, D. Burshtein, and Y. Be’ery, “Deep learning methods for improved decoding of linear codes”, IEEE J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119-131, Feb. 2018.
-  3rd Generation Partnership Project; Technical Specification; Evolved Universal Terrestrial Radio Access (E-UTRA); Multiplexing and Channel Coding (Release 9) 3GPP Organizational Partners TS 36.212, Rev. 8.3.0, May 2008.
-  C. Berrou, A. Glavieux, and P. Thitimajshima, “Near Shannon limit error-correcting coding and decoding: Turbo-codes,” in Proc. Int. Conf. Communications, May 1993, pp. 1064-1070.
-  L. Bahl, J. Cocke, F. Jelinek, and J. Raviv, “Optimal decoding of linear codes for minimizing symbol error rate,” IEEE Trans. Inf. Theory, vol. IT-20, no. 2, pp. 284-287, Mar. 1974.
-  S. Talakoub, L. Sabeti, B. Shahrrava, and M. Ahmadi, “An improved Max-Log-MAP algorithm for turbo decoding and turbo equalization,” IEEE Trans. Instrum. Meas., vol. 56, no. 3, pp. 1058-1063, June 2007.
-  P. Robertson, P. Hoeher, and E. Villebrun, “Optimal and sub-optimal maximum a posteriori algorithms suitable for turbo decoding,” Eur. Trans. Telecommun., vol. 8, no. 2, pp. 119-125, 1997.
-  J. A. Erfanian, S. Pasupathy, and G. Gulak, “Reduced complexity symbol detectors with parallel structures for ISI channels,” IEEE Trans. Commun., vol. 42, no. 2-4, pp. 1661-1671, Feb.-Apr. 1994.
-  D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.