In communication systems, many physical components have nonlinear responses, which distort the signals traveling through them. For example, the power amplifier in radio transmitters, the Mach-Zehnder interferometer (MZI) modulator in optical communication systems, the driving amplifiers for the MZI modulator, digital-to-analog and analog-to-digital converters, and the optical fiber channel itself are all sources of nonlinear distortion.
In these scenarios, digital signal processing can be used to equalize nonlinearity-induced distortions at the transmitter and at the receiver. When compensating for the nonlinearity at the transmitter, channel noise is not present, and hence, noise cannot be enhanced as part of equalization. However, since the transmitter does not have any means to measure the distorted signal, additional feedback communication may be required for weight optimization. On the other hand, placing the nonlinearity equalizer at the receiver does not require feedback. For example, such an approach can compensate at the receiver for time-varying nonlinear distortion caused by a change in operation, temperature of the physical components, or stress on the fiber.
Nonlinear equalization at the receiver has to compensate for nonlinearities in the presence of noise, and this induces a noise enhancement problem. The noise enhancement in the digital filter is similar to that in nonlinear refractive index material, which is caused by four-wave mixing in optical fiber [1] or modulation instability [2]. More recently, digital back-propagation (DBP) [3] has been well studied as a compensation method for optical fiber link nonlinearity. In particular, the noise enhancement in a DBP equalizer for fiber has been evaluated in [4]. Thus, reducing the noise enhancement in the nonlinear equalizer is key to improving its performance. However, this can be a challenging problem since there is a trade-off between nonlinearity compensation and noise enhancement.
To add to the complexity, the nonlinear response in the transmission system components can add a memory effect. Therefore, an algorithm that compensates for this memory effect is required. A common technique used for this compensation is the Volterra series approximation. A Volterra series approximation is a natural extension of the classical Taylor series approximation of linear systems to nonlinear systems. In the Volterra series approximation for the system output, in addition to the convolution of the input signal with the system’s linear impulse response, the system output includes a series of nonlinear terms that contain products of increasing order of the input signal with itself. It can be shown that these polynomial extension terms allow for close approximations to the output for a large class of nonlinear systems, which basically encompasses all systems with scalar outputs that are time-invariant and have finite memory [5]. For this reason, the Volterra series approximation for the system output has been used in methods to compensate for the nonlinearity introduced by the transmitter components [6], and by the fiber optical channel itself [7]. Such methods are referred to as Volterra equalizers.
Recently, there have been many works on applying machine learning and neural networks (NNs) to digital communication systems. For example, machine learning has been used for sequence detection [8], channel decoding of low-density parity-check (LDPC) codes [9], and joint source-channel coding [10, 11]. Machine learning has also been used to compensate for the nonlinearity that may be introduced during communication. Some examples include compensating for the nonlinearity caused by transmitter clipping effects [12] as well as that caused by the fiber channel itself [13]. There are also several works that consider iterative integration of NNs with belief propagation (BP) channel decoding. In [14], convolutional NNs (CNNs) are used to remove correlated noise from the output of the BP decoder. In [12], iterative decoding with NNs and LDPC codes is proposed, which helps to compensate for severe inter-carrier distortion of orthogonal frequency division multiplexing (OFDM) signals caused by transmitter clipping.
In this paper, we focus on nonlinearity compensation at the receiver. We first focus on a Volterra equalizer and derive an expression for the noise figure for distortion compensation that results in a common measure of signal-to-noise ratio (SNR) degradation. This derivation provides a mathematical expression for the noise enhancement effect of this equalizer, and supports our numerical analysis of the trade-off between nonlinearity compensation and noise enhancement. Using these results, we then propose a method to optimize the training SNR for the Volterra equalizer. We then propose an alternative approach to the Volterra equalizer that results in the best system performance. This approach alternates between NN equalization for compensation of the nonlinearity and BP for noise removal and channel decoding. In particular, we implement BP iterations as a nontrainable NN layer. This allows us to define the loss function that is used for training the NN equalizer as the cross entropy loss at the output of the BP step, which results in considerable gains in terms of performance. Finally, we evaluate the performance of this newly proposed approach, and demonstrate that it leads to a 1.7 dB gain versus no equalization, and outperforms the Volterra equalizer with the optimal training SNR by 0.6 dB.
II System Description
Figure 1(a) shows the digital transmission system assumed in this paper. The transmitter has a forward error correction (FEC) encoder, a bit-to-symbol mapping module, and a pulse shaping block. Let be the number of information bits per symbol for each of the in phase, , and quadrature, , components of the signal. The binary information bits for the -th bit of the -th symbol at the FEC encoder output is denoted as , and the amplitude for the -th symbol at the bit-to-symbol mapper output is defined as . Then, the pulse shaping block converts the single sample per symbol (SPS) waveform into multiple SPS. In this paper, 2 SPSs are used.
The pulse shaped signal is then fed into a nonlinearity block, which captures the nonlinearity that could be introduced during transmission and signal propagation. This block could either introduce a memory-less nonlinearity or a nonlinearity with memory. In this paper, a memory-less nonlinearity is modeled as a sinusoidal transfer function given the simplicity of such functions as well as their generality in modeling nonlinearities in the system. Note that although the introduced nonlinearity is memoryless, when considered in combination with pulse shaping at the transmitter and the linear filter at the receiver, the nonlinear system response will have memory.
White Gaussian noise (WGN) is added to the signal in the channel, and at the receiver, a linear filter, a nonlinear equalizer, and a BP FEC decoder are used to recover the bits. The linear filter uses a pulse shaping function and an adaptive finite impulse response (FIR) filter to convert multiple-SPS signals into a single SPS signal. The amplitude level of the -th symbol of the linear filter output is denoted as . Another function of the adaptive FIR filter is to eliminate linear distortion. More precisely, this filter mitigates inter-symbol interference (ISI) by minimizing the mean square error (MSE) of its output. In this paper, it is assumed that the pulse shaping functions of the transmitter and the receiver are both root raised cosine functions. Since the nonlinearity is modeled by the sinusoidal function, the total transfer function including the channel and linear filters is symmetric.
We consider two different nonlinear equalization techniques: (A) A Volterra equalizer which is applied to the amplitude level signal, and (B) NNs with and without BP decoder feedback. The output amplitude levels of the Volterra equalizer and the soft decision (SD) inputs are denoted by . The outputs of the SD, NN, and BP FEC decoder have the form of a bit-wise log likelihood ratio (LLR), and they are denoted by , , and , respectively. In SD, the output LLR is calculated as , where is a parameter chosen based on the noise power, and and denote the sets of all possible for which or , respectively.
In the rest of the paper, the sequences of , ,
are denoted by column vectors, , and , respectively.
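The soft-decision LLR computation described above can be illustrated with a minimal sketch. Several details are assumptions made only for this illustration and are not taken from the paper: the 8-level alphabet `LEVELS`, the Gray labeling `LABELS` (chosen so that the most significant bit encodes the sign), the convention that the LLR is log P(b=0)/P(b=1), and the name `beta` for the noise-dependent scale parameter.

```python
import numpy as np

# Hypothetical 8-level amplitude alphabet and Gray labels (assumptions).
LEVELS = np.array([-7, -5, -3, -1, 1, 3, 5, 7], dtype=float)
# Row k holds the 3 bits of LEVELS[k]; column 0 (MSB) encodes the sign.
LABELS = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 1], [1, 1, 0],
                   [0, 1, 0], [0, 1, 1], [0, 0, 1], [0, 0, 0]])


def soft_decision_llr(y, beta):
    """Bit-wise LLR, log P(b=0|y) - log P(b=1|y), for each sample in y."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    # metric[n, k] = -beta * (y_n - level_k)^2 (Gaussian log-likelihood up to a constant)
    metric = -beta * (y[:, None] - LEVELS[None, :]) ** 2
    llr = np.empty((y.size, LABELS.shape[1]))
    for j in range(LABELS.shape[1]):
        m0 = metric[:, LABELS[:, j] == 0]   # levels whose j-th bit is 0
        m1 = metric[:, LABELS[:, j] == 1]   # levels whose j-th bit is 1
        llr[:, j] = (np.logaddexp.reduce(m0, axis=1)
                     - np.logaddexp.reduce(m1, axis=1))
    return llr


# Two received amplitudes close to the outermost levels.
llr = soft_decision_llr([6.8, -6.8], beta=0.5)
```

For Gaussian noise of variance sigma^2, `beta = 1 / (2 * sigma**2)` would be the natural choice in this sketch; the log-sum-exp form keeps the computation stable when `beta` is large.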
Fig. 1(b) shows an approximate system setup for the blocks inside the red dashed line in Fig. 1(a), which is used to derive analytic expressions for the noise figure in Section IV. In this setup, noise is added after the linear filter. The signal component of the output of the linear filter is denoted as W, and the noise component that is added to W is denoted as Z. The output of this equivalent approximation uses the same notation as in the original system setup.
Fig. 1(c) shows a system setup where the blocks inside the light blue dashed lines in Fig. 1(a) are replaced by those in Fig. 1(c). In this setup, the noise component, which is added in the channel, is removed just before the Volterra equalizer and is added again to the output of the Volterra equalizer. That is, the noise is added only after the Volterra equalizer. Therefore, there will be no noise enhancement by the Volterra equalizer. Note that in order to keep the operating conditions of the adaptive FIR filter exactly the same between the two systems, the FIR filter is applied to both the received noisy signal as well as to the noise component itself before the filter outputs are subtracted to remove the noise. We use this system setup in Section V to evaluate the noise enhancement in Volterra equalization.
III Algorithms for Nonlinearity Equalization
In this section, we discuss different techniques for nonlinear equalization. First, we describe the Volterra equalizer, which is currently the most widely used technique. Then, we describe a new nonlinear equalization technique based on neural networks.
III-A Volterra Equalizer
Since the Volterra equalizer considers the memory order of the nonlinearity, the output symbol of the Volterra equalizer is generated from consecutive input symbols of the index range of with the -th symbol at its center. Therefore, the filter length of the equalizer is defined as
symbols. Because here we assume a sinusoidal nonlinear transfer function as described in the previous section, which is odd symmetric, 2nd order product terms can be neglected. Thus, the input vectors required for the-th output symbol generation are a set of 1st-order terms and 3rd-order product terms, which are row vectors of and given by
where superscript and are used to refer to the terms being of 1st and 3rd order, respectively, and denotes the set of all possible combinations of 3 integers in the range of , defined as . The notation denotes the vector elements . By defining the weights of the 1st and 3rd order product terms as column vectors and , the output of the Volterra equalizer is given by the inner product of the weights and the input vectors as follows:
All the input terms of the Volterra equalizer can be orthogonalized to each other, as was shown in [15]. Therefore, when and
are optimized by minimizing the MSE between the estimated value and the transmitted value,, the optimum weights satisfy the following condition:
The sequence of the estimated symbols is given by , where two matrices defined by , and are introduced. Using this representation, the optimum weights satisfying (2) are given by
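As an illustration of the structure described in this subsection, the following is a minimal sketch of an equalizer built from 1st-order terms and 3rd-order product terms, with weights obtained by least squares. It is not the paper's implementation: the function names, the choice of non-decreasing index triples for the product terms, the zero padding at the sequence edges, the use of `np.linalg.lstsq` as the MSE minimizer, and the toy memoryless sinusoidal channel are all assumptions made for this example.

```python
import numpy as np
from itertools import combinations_with_replacement


def volterra_features(y, half_len):
    """Stack 1st- and 3rd-order terms for windows of 2*half_len+1 symbols."""
    L = 2 * half_len + 1
    # Index triples (i <= j <= k) inside the window (assumed index set).
    triples = list(combinations_with_replacement(range(L), 3))
    ypad = np.pad(y, half_len)  # zero-pad both edges
    win = np.lib.stride_tricks.sliding_window_view(ypad, L)  # shape (N, L)
    third = np.stack([win[:, i] * win[:, j] * win[:, k]
                      for i, j, k in triples], axis=1)
    return np.hstack([win, third])


def fit_weights(A, x):
    """Least-squares weights minimizing the MSE between A @ w and x."""
    w, *_ = np.linalg.lstsq(A, x, rcond=None)
    return w


# Toy check: 8-level symbols through a sinusoidal nonlinearity plus weak noise.
rng = np.random.default_rng(0)
x = rng.choice(np.arange(-7, 8, 2), size=4000) / 7.0
y = np.sin(1.2 * x) + 0.01 * rng.standard_normal(x.size)

half_len = 1
A = volterra_features(y, half_len)
n_lin = 2 * half_len + 1
mse_linear = np.mean((A[:, :n_lin] @ fit_weights(A[:, :n_lin], x) - x) ** 2)
mse_volterra = np.mean((A @ fit_weights(A, x) - x) ** 2)
```

Because the linear terms are a subset of the Volterra terms, the training MSE of the full model cannot exceed that of the linear-only model; in this toy case the 3rd-order terms absorb most of the sinusoidal compression.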
III-B Neural network with belief propagation feedback
In order to suppress the noise enhancement phenomenon, a method to remove or cancel out the white noise components of the input signal before or during the nonlinear equalization is required. One promising method to remove white noise is BP decoding using the parity bit information. In SD decoding, the BP process is used iteratively to improve the LLR of each bit using those of other bits connected to identical check nodes. By alternating NN stages with BP decoder steps, noise enhancement during nonlinear equalization is expected to be suppressed. This suppression of noise enhancement in nonlinear equalization leads to improvements in the input LLR at each BP step, and hence better performance.
Fig. 2 shows the block diagram of nonlinear equalization operating with the BP decoder.
The input to the NN equalizer has 2 streams of LLRs: the LLR after the linear filter , and the LLR feedback from the previous BP decoder step . The first NN stage is fed with only . Then the LLR output from NN is fed into the BP decoder. The calculation flow of each stage of the NN is explained below. Since the goal is equalization of the nonlinearity with memory, the NN output LLR of the -th symbol is generated by considering the input LLRs of the time window of consecutive symbols centered on the -th symbol:
Here, integers and are used for the number of bit and symbol positions, respectively, and the same notation is used hereafter. Since the input to the NN consists of consecutive symbols, a CNN is used in the first layer of the network as a sliding window that slides across the symbol sequence. The length of the window is designed depending on the memory length . The output of the CNN layer is defined as , where identifies the node in the next layer, and is given by:
is the activation function. Here, the function is used to transform and limit the NN input variable range to within . This is because the LLR value can range from negative to positive infinity, and additionally its probability distribution can vary depending on link conditions and the number of BP decoder iterations.
The output of the first layer is then fed into a second layer, which is a fully connected layer. The output of the second layer is defined as , where denotes the node in the next layer, is the number of output nodes from the first layer, and is the activation function. Finally, the output layer is another fully connected layer which is given by , where is the number of outputs from the previous layer, and is the activation function.
In this paper, the ReLU function is used forand , and linear activation is used for the final layer; .
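The three-layer structure described above (a sliding-window convolutional layer, a fully connected layer, and a linear output layer) can be sketched as a plain forward pass. This is only a sketch under stated assumptions: `tanh(llr / 2)` is used as a stand-in for the range-limiting input function, whose exact form is not recoverable here, and the layer sizes, the random initialization, and the names `init_params` and `nn_equalizer_forward` are hypothetical.

```python
import numpy as np


def relu(a):
    return np.maximum(a, 0.0)


def init_params(rng, n_ch, half_len, n_h1, n_h2, n_out):
    """Random weights and zero biases for the three layers (illustrative sizes)."""
    L = 2 * half_len + 1
    params = []
    for fan_in, fan_out in [(L * n_ch, n_h1), (n_h1, n_h2), (n_h2, n_out)]:
        params.append(rng.normal(0.0, 1.0 / np.sqrt(fan_in), (fan_in, fan_out)))
        params.append(np.zeros(fan_out))
    return params


def nn_equalizer_forward(llr_in, params, half_len):
    """llr_in: (N, C) input LLRs per symbol. Returns (N, n_out) output LLRs."""
    W1, b1, W2, b2, W3, b3 = params
    z = np.tanh(llr_in / 2.0)  # assumed squashing of unbounded LLRs into (-1, 1)
    zpad = np.pad(z, ((half_len, half_len), (0, 0)))  # zero-pad the symbol axis
    L = 2 * half_len + 1
    # Row n holds the L consecutive symbols centred on symbol n, flattened.
    windows = np.stack([zpad[n:n + L].ravel() for n in range(z.shape[0])])
    h1 = relu(windows @ W1 + b1)   # sliding-window (convolutional) layer
    h2 = relu(h1 @ W2 + b2)        # fully connected layer
    return h2 @ W3 + b3            # linear output layer


rng = np.random.default_rng(0)
params = init_params(rng, n_ch=3, half_len=2, n_h1=16, n_h2=8, n_out=1)
out = nn_equalizer_forward(5.0 * rng.standard_normal((50, 3)), params, half_len=2)
```

Sharing `W1` across all window positions is what makes the first layer convolutional; the second input stream (the LLR fed back from the BP decoder) could be appended as additional columns of `llr_in` in this sketch.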
In order to train the NN and BP as a combined block, each iteration step of the BP decoder is also implemented as an NN layer. At each step the BP receives three LLRs: an LLR from the NN , the updated LLRs , and from the previous iteration of the BP step. The output LLR of the BP step is calculated by summing and , which is then passed to the next NN nonlinearity equalization step. Therefore, the BP step can be represented as
We can use pairs of non-negative integers (,) to identify the variable node for the input LLR of the -th bit of the -th symbol, one of whose edges is connected to a check node of identification number , as . Similarly, we define a set of non-negative numbers for the check node, one of whose edges is connected to the variable node of identification pair number , as . Additionally, the LLR denotes a belief message passing from the check node of identification number to the variable node of . Similarly, denotes a belief message passing from the variable node of to the check node of . The BP decoder is repeated times for each NN stage, and is repeated times after the final NN stage.
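A single BP step of the kind described above can be sketched in the LLR domain with the standard sum-product (tanh) rule. The sketch makes several assumptions that are not from the paper: a small (7,4) Hamming parity-check matrix replaces the DVB-S2 LDPC code, a flooding schedule is used, a positive LLR favors bit 0, and the function name `bp_iteration` and the clipping constants are hypothetical. The step contains no trainable parameters, consistent with treating it as a fixed layer.

```python
import numpy as np


def bp_iteration(H, llr_in, msg_c2v):
    """One flooding BP step.

    H: (M, N) 0/1 parity-check matrix, llr_in: (N,) input LLRs,
    msg_c2v: (M, N) check-to-variable messages from the previous step.
    Returns the output LLRs and the updated check-to-variable messages.
    """
    mask = H.astype(bool)
    # Variable-to-check: total belief minus the message from the target check.
    total = llr_in + (msg_c2v * mask).sum(axis=0)
    msg_v2c = np.where(mask, total[None, :] - msg_c2v, 0.0)
    # Check-to-variable: tanh rule over all other connected variable nodes.
    t = np.where(mask, np.tanh(np.clip(msg_v2c, -30, 30) / 2.0), 1.0)
    new_c2v = np.zeros_like(msg_c2v, dtype=float)
    for c, v in zip(*np.nonzero(mask)):
        others = np.prod(np.delete(t[c], v))
        new_c2v[c, v] = 2.0 * np.arctanh(np.clip(others, -0.999999, 0.999999))
    # Output LLR: input LLR plus all incoming check messages.
    llr_out = llr_in + new_c2v.sum(axis=0)
    return llr_out, new_c2v


# All-zero codeword with one weakly wrong input LLR (bit 0).
H = np.array([[1, 0, 1, 0, 1, 0, 1],
              [0, 1, 1, 0, 0, 1, 1],
              [0, 0, 0, 1, 1, 1, 1]])
llr_nn = np.full(7, 2.0)
llr_nn[0] = -1.0
msg = np.zeros(H.shape)
for _ in range(5):
    llr_bp, msg = bp_iteration(H, llr_nn, msg)
```

Carrying `msg` from one call to the next mirrors the description above, in which each BP step receives the messages of the previous iteration together with the LLR from the NN.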
The loss function is defined as a cross entropy between Tx bit-wise probability and bit-wise conditional probability of for a given BP output LLR . Thus, the loss of the NN for the -th bit is defined by ensemble over and summation over of , and it is given by:
where is used.
The final loss is defined by the sum of all losses for all bits. In this paper, the modulation format of 64QAM is used, where each of the and dimensions has 8 levels, and yields =3. Since the most significant bit, , is mapped to the sign of the corresponding symbol amplitude , and its penalty of nonlinear distortion is negligible, the NN for the most significant bit is not implemented in this paper. Thus, NNs are configured for the second significant bit and the least significant bit only, and the final loss for each stage of the NN is given by .
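The bit-wise cross entropy evaluated on an LLR can be sketched as follows. The sign convention (LLR = log P(b=0)/P(b=1), so that P(b=0) is the logistic function of the LLR) and the function name `llr_cross_entropy` are assumptions for this illustration; the paper's exact expression is not reproduced here.

```python
import numpy as np


def llr_cross_entropy(llr, bits):
    """Mean bit-wise cross entropy, assuming llr = log(P(b=0) / P(b=1))."""
    llr = np.asarray(llr, dtype=float)
    sign = 1.0 - 2.0 * np.asarray(bits, dtype=float)  # +1 for bit 0, -1 for bit 1
    # -log(sigmoid(sign * llr)), evaluated stably as log(1 + exp(-sign * llr)).
    return np.mean(np.logaddexp(0.0, -sign * llr))


loss_uninformative = llr_cross_entropy(np.zeros(4), [0, 1, 0, 1])
loss_correct = llr_cross_entropy([10.0, -10.0], [0, 1])
loss_wrong = llr_cross_entropy([10.0, -10.0], [1, 0])
```

Under this convention the final loss of a stage would be the sum of `llr_cross_entropy` over the trained bit positions, here the second significant bit and the least significant bit.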
IV Noise enhancement of nonlinear equalization
Noise enhancement can occur during nonlinear equalization. For example, assume that the input of the nonlinear equalizer has a signal component and a noise component , with , , and , where the noise power is much lower than the signal power. Then the 3rd order product includes the pure signal term , and all other terms have a noise component. Since we have assumed that the signal power is much larger than the noise power, the dominant noise terms will be , , and . Since each of these terms includes a product of noise and signal terms, they can be viewed as noise amplification or enhancement by the signal.
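The claim that the noise terms containing a single noise factor dominate can be checked numerically. In the sketch below, the symbols `s` and `n`, the unit-power 8-level alphabet, and the noise standard deviation of 0.05 are assumptions chosen for illustration. The noise part of a 3rd-order product of three noisy inputs is split into the three first-order terms (one noise factor multiplied by two signal factors) and the remainder.

```python
import numpy as np

rng = np.random.default_rng(1)
N, sigma = 200_000, 0.05

levels = np.arange(-7, 8, 2) / np.sqrt(21.0)   # unit-power 8-level alphabet
s = rng.choice(levels, size=(3, N))            # three independent signal inputs
n = sigma * rng.standard_normal((3, N))        # independent zero-mean noise
x = s + n

# Everything in the 3rd-order product except the pure signal term.
total_noise = x.prod(axis=0) - s.prod(axis=0)
# Terms with exactly one noise factor: signal-amplified noise.
first_order = n[0] * s[1] * s[2] + s[0] * n[1] * s[2] + s[0] * s[1] * n[2]
remainder = total_noise - first_order

# Fraction of the noise power carried by the higher-order noise terms.
ratio = np.mean(remainder ** 2) / np.mean(total_noise ** 2)
```

For independent unit-power signals and noise variance sigma^2, the first-order terms carry a power of about 3 sigma^2 and the remainder about 3 sigma^4, so `ratio` is on the order of sigma^2 in this setting.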
In this section, we analyze the approximate model shown in Fig. 1(b). Recall that the input signal to the nonlinear filter consists of pure signal component and noise component . Since we consider symmetric channels and an odd-symmetric nonlinear response, the mean of the signal is zero, i.e., . We assume that the additive noise is zero mean, is independent for different symbols, i.e., for , and is independent of the signal, i.e., for any , . With these assumptions, the system model will be an approximation of the model in Fig. 1(a), since the linear filter can introduce dependence between and for . This approximation allows us to analytically evaluate the noise enhancement in the Volterra equalizer and hence provides an analytical result to compare against our numerical simulation. By substituting in (1) with , the Volterra equalizer output is given by
The output can be separated into the compounded signal component , which consists of only pure signal components of , and the compounded noise component , which includes only the terms that consist of the noise terms . These compounded signal and noise terms are given by
The signal power and the noise power of the Volterra equalizer output are given by the ensembles of their field squares, and , respectively, as follows:
where we have dropped the terms that included the square of under the assumption that , which is reasonable considering that the ratio is observed to be 0.067 under the assumed nonlinearity.
We use the noise figure [16], which is defined as the ratio of the SNR at the input of the filter to the SNR at the output of the filter, to describe the noise enhancement feature of the nonlinear equalizer. Using (4), the noise figure for the Volterra equalizer is given by
where , , , and are used. The notations and denote the signal component of and , and they are defined by and as row vectors, respectively. In the derivation of (5), we used the fact that for and for .
Note that the noise figure depends only on the statistics of the input signal, since the Volterra equalizer weights, and in (3), depend on statistics of the input signal.
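The definition of the noise figure as the input SNR divided by the output SNR can be illustrated on a toy memoryless cubic equalizer. This sketch does not reproduce expression (5): the weights `w1` and `w3`, the signal alphabet, the noise level, and the function name `noise_figure_db` are hypothetical. The output is split into the part produced by the signal alone and everything else, and the measured value is compared with a linearized prediction that keeps only the terms that are first order in the noise.

```python
import numpy as np


def noise_figure_db(s, n, equalizer):
    """Empirical noise figure (dB): input SNR over output SNR."""
    out_sig = equalizer(s)                  # compounded signal component
    out_noise = equalizer(s + n) - out_sig  # all terms containing noise
    snr_in = np.mean(s ** 2) / np.mean(n ** 2)
    snr_out = np.mean(out_sig ** 2) / np.mean(out_noise ** 2)
    return 10.0 * np.log10(snr_in / snr_out)


rng = np.random.default_rng(2)
N, sigma = 200_000, 0.05
s = rng.choice(np.arange(-7, 8, 2) / np.sqrt(21.0), size=N)  # unit-power symbols
n = sigma * rng.standard_normal(N)

w1, w3 = 1.0, 0.1  # hypothetical 1st- and 3rd-order weights


def equalizer(v):
    return w1 * v + w3 * v ** 3


nf_db = noise_figure_db(s, n, equalizer)

# Linearized prediction: output noise is approximately (w1 + 3*w3*s^2) * n.
gain_sq = np.mean((w1 + 3.0 * w3 * s ** 2) ** 2)
nf_lin_db = 10.0 * np.log10(np.mean(s ** 2) * gain_sq
                            / np.mean(equalizer(s) ** 2))
```

In this toy case the linearized prediction depends only on signal statistics and on the weights, which is the same qualitative property noted above for the Volterra equalizer.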
V Simulation results
In this section, we evaluate the performance of the nonlinear equalization techniques described in the previous sections. First, we use the analytical results on the noise enhancement properties of the Volterra equalizer to evaluate the trade-off between noise enhancement and nonlinearity reduction, and propose a new and better guideline for training the Volterra equalizer. We then compare the Volterra equalizer to our proposed approach: iterative NN equalization combined with BP noise removal.
All our numerical simulations are performed using the system model shown in Fig. 1(a). For the nonlinear equalizer, we use for both the Volterra and NN equalizers, and we use for the NN. We use the standardized DVB-S2 LDPC channel code with a code rate of 0.8 [17]. The total number of iterations of the BP decoder is 50, with . For the linear filters, an adaptive FIR filter length of 17 is used, and the roll-off factor of both the Tx and Rx pulse shapes is 0.2.
V-A Nonlinear equalization and impact of white noise
In order to demonstrate the noise enhancement property of the Volterra equalizer, we consider two systems. In particular, the systems in Fig. 1(a) and 1(c) are considered and compared. Recall that in Fig. 1(c) noise is added after the Volterra equalizer, and hence, the Volterra equalizer only compensates for the nonlinearity without enhancing the noise.
Fig. 3(a) shows the result of this comparison. The black curve shows the BER of a system with no nonlinearity and no Volterra equalization. The blue curve on the right corresponds to the system in Fig. 1(a), where the nonlinearity is compensated by the Volterra equalizer. In this system the Volterra equalizer also enhances the noise. The red curve in the middle corresponds to the system in Fig. 1(c), where the noise is added after the Volterra equalizer. Therefore, there is no noise enhancement by the Volterra equalizer; it only compensates for the nonlinearity. Thus, the difference between the black and the red curves corresponds to a nonlinearity penalty (NL-penalty), which cannot be removed by the Volterra equalizer, while the difference between the red and blue curves shows the Volterra equalizer noise enhancement penalty (NE-penalty).
In the rest of this subsection, we define the required SNR as the SNR required to achieve a post-BP BER performance of . Fig. 3(a) is one example of a post-BP BER plot, where the SNR that is used in training the Volterra equalizer weights is 19 dB. In this particular example, the NE-penalty, and the NL-penalty are observed to be 0.35 dB, and 0.37 dB, respectively, and their total is 0.72 dB.
Fig. 3(b) shows the NL-penalty, the NE-penalty, and their sum for various training SNR values. We thus see the trade-off between the NE-penalty and the NL-penalty as a function of the SNR that is used to train the weights of the Volterra equalizer. In particular, we observe that when a high SNR (e.g., 35 dB) is used to train the Volterra equalizer weights, it removes most of the signal distortion associated with the nonlinearity, at the cost of significantly enhancing the noise. Similarly, when a lower SNR (e.g., 16.5 dB) is used for training the Volterra equalizer, the noise enhancement is reduced at the cost of not removing all the signal distortion associated with the nonlinearity. Interestingly, the total penalty remains relatively constant for training SNR values of more than 17 dB, suggesting that it is best to train the Volterra equalizer at a high SNR. In Fig. 3(b), we also plot the noise figure derived in Section IV (i.e., the dashed line). As can be seen, the derived noise figure is a good approximation to the NE-penalty of the Volterra equalizer. Finally, this figure demonstrates that there is a training SNR for the Volterra equalizer such that the total SNR penalty is minimized.
V-B Noise figure suppression by NN and BP
In this subsection, a comparison between the NN with BP feedback, which was proposed in Section III, and the Volterra equalizer is presented based on simulations. Figure 4 shows the post-BP BER as a function of the received signal SNR. The solid line with triangles corresponds to a system with no nonlinearity. The dashed line with triangles shows the case where only the linear filter is used without any nonlinear equalization. Two results are shown for the Volterra equalizer: one where the Volterra equalizer is trained at each received SNR, and one where it is trained at the optimal SNR. We also consider two NN equalizers to compensate for the nonlinearity. First, we consider a single-stage NN equalizer that is applied only once, before BP decoding, without BP feedback. Second, we consider a three-stage iterative NN, where the first stage is before BP, and the second and third NN stages are each applied after a few BP steps and are followed by more BP steps. As can be seen, the proposed iterative NN-BP equalization achieves the best performance, with a 0.6 dB gain compared to the best Volterra equalizer, and a 1.7 dB gain compared to the case where there is no equalization to compensate for the nonlinearity.
We derived an analytic model of the noise figure for Volterra equalizers, which can be used to evaluate its noise enhancement. Using this model, the training SNR for the Volterra equalizer, which results in a better performance compared to training at each specific SNR, can be obtained. Next, we proposed a new NN scheme for nonlinear equalization, where a BP step is implemented as a non-trainable NN layer, followed by another NN equalizer. This allows us to jointly decode and equalize to compensate for the nonlinearity. We show that the 3 stage NN equalizer with BP is better than the Volterra equalizer with optimal training SNR by 0.6 dB and has 1.7 dB gain compared to a nonlinear system with no nonlinearity compensation.
- [1] D. Marcuse, “Bit-error rate of lightwave systems at the zero-dispersion wavelength,” J. Lightwave Technol., vol. 9, no. 10, pp. 1330–1334, 1991.
- [2] V. Zakharov and L. Ostrovsky, “Modulation instability: The beginning,” Physica D: Nonlinear Phenomena, vol. 238, no. 5, pp. 540–548, 2009.
- [3] E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear …” J. Lightwave Technol., vol. 26, no. 20, pp. 3416–3425, 2008.
- [4] P. Serena, “Nonlinear signal–noise interaction in optical links with nonlinear …” J. Lightwave Technol., vol. 34, no. 6, pp. 1476–1483, 2016.
- [5] M. Schetzen, The Volterra and Wiener theories of nonlinear systems. Wiley, 1980.
- [6] A. Zhu, P. J. Draxler et al., “Open-loop digital predistorter for RF power …” Trans. Microw. Theory Technol., vol. 56, no. 7, pp. 1524–1534, 2008.
- [7] G. Shulkind and M. Nazarathy, “Nonlinear digital back propagation compensator …” Opt. Express, vol. 21, no. 11, pp. 13145–13161, 2013.
- [8] N. Farsad and A. Goldsmith, “Neural network detection of data …” Trans. Signal Process., vol. 66, no. 21, pp. 5663–5678, 2018.
- [9] E. Nachmani, E. Marciano et al., “Deep learning methods for improved …” J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 119–131, 2018.
- [10] N. Farsad, M. Rao, and A. Goldsmith, “Deep learning for joint source-channel coding of text,” in 2018 IEEE ICASSP, 2018, pp. 2326–2330.
- [11] E. Bourtsoulatze, D. B. Kurka, and D. Gündüz, “Deep joint source-channel coding for wireless …” CoRR, vol. abs/1809.01733, 2018.
- [12] Y. He, M. Jiang, and C. Zhao, “A neural network aided approach for LDPC coded DCO-OFDM with …” CoRR, vol. abs/1809.01022, 2018.
- [13] T. Koike-Akino, D. S. Millar et al., “Fiber nonlinearity equalization with multi-label …” in Advanced Photonics 2018. OSA, 2018, p. SpM4G.1.
- [14] F. Liang, C. Shen, and F. Wu, “An iterative BP-CNN architecture …” J. Sel. Topics Signal Process., vol. 12, no. 1, pp. 144–159, 2018.
- [15] C. . Tseng and E. J. Powers, “Application of orthogonal-search method to Volterra …” in 1993 IEEE ICASSP, vol. 4, April 1993, pp. 512–515.
- [16] H. T. Friis, “Noise figures of radio receivers,” Proceedings of the IRE, vol. 32, no. 7, pp. 419–422, 1944.
- [17] ETSI, “Digital video broadcasting (DVB); second generation framing structure, channel coding …” EN 302 307-1, v.1.3.1, 2013.