In a single-mode optical fiber, narrowband signals propagate according to the nonlinear Schrödinger equation (NLSE) [1, p. 40]. This is schematically illustrated in Fig. 1. In the absence of noise, the transmitted signal can thus be recovered by solving an initial value problem (IVP) using the received signal as a boundary condition. In practice, the received signal first passes through an analog-to-digital converter, and the IVP can then be solved via receiver digital signal processing (DSP). This approach is referred to as digital backpropagation (DBP) and was inspired by a similar idea where optical components were used for the processing [2]. DBP was first studied as a transmitter pre-distortion technique [3, 4].
A major issue with DBP is the large computational burden associated with a real-time DSP implementation. Thus, various techniques have been proposed to reduce its complexity [5, 6, 7, 8, 9, 10, 11, 12, 13]. In essence, the task is to approximate the solution of a partial differential equation using as few computational resources as possible. We approach this problem from a machine-learning perspective. In contrast to, e.g., [14, 7, 10], we focus on deep learning and deep neural networks (NNs), which have attracted tremendous interest in recent years [15]. Our approach is to obtain a multi-layer computation graph similar to a deep NN by applying the split-step Fourier method (SSFM). This can be seen as an example of a more general methodology where domain knowledge is used to generate computation graphs with many layers [16].
Deep NNs have achieved record-breaking performance for various tasks such as speech or object recognition. In order to explain this success, the authors in [17] argue that most data of practical interest is generated by some form of hierarchical or Markov process, often obeying physical principles such as locality and symmetry. This makes it plausible that there exist efficient multi-layer computation graphs that can approximate these processes with few parameters. Our design choices are directly motivated by such considerations. In particular, our computation graph exploits the hierarchical problem structure that is introduced by the transmission process. Moreover, we choose the linear operators in the graph to be short and symmetric finite impulse response (FIR) filters.
This paper is a continuation of our work outlined in a recent summary paper [18]. It contains several novel contributions. Most importantly, we provide a theoretical justification for the deep learning approach. In particular, while FIR filter design for chromatic dispersion has been studied extensively in the past [19, 5, 20, 21, 22, 23], the designed filters have shown relatively poor efficiency when used in a split-step method for DBP due to truncation errors. Indeed, we argue that, for computational efficiency, the filters used in each step should be different and that they should be optimized jointly. We also compare our approach with multiple truncation methods (for the filter coefficients) and with “few-step” perturbation approaches.
II. Digital Backpropagation
We assume that the signal is launched into an optical fiber where it propagates according to the NLSE as shown in Fig. 1. After distance , the received signal is low-pass (LP) filtered and sampled at to give a sequence of samples . Our goal is to efficiently recover the signal (or a sampled version thereof) from .
II-A. Split-Step Fourier Method
The popular SSFM is based on a block-wise receiver processing. To that end, assume that we collect received samples into a vector. Consider now the time-discretized NLSE
where represents the sampled waveform at position along the fiber, , is the discrete Fourier transform (DFT) matrix, , is the -th DFT angular frequency, and is defined as the element-wise application of . To derive the SSFM, the fiber is conceptually divided into segments of length . Then, it is assumed that for sufficiently small , the effects stemming from the two terms on the right-hand side of (1) can be separated. More precisely, for , (1) is linear with solution , where . For , the solution is , where is the element-wise application of . Alternating between these two operators for leads to the block diagram shown in the top part of Fig. 2.
The degree to which the obtained vector constitutes a good approximation of is now a question of choosing , , and . In practice, is typically an integer multiple of the baud rate and is chosen to minimize the overhead in overlap-and-save techniques for continuous data transmission. Increasing leads to a more accurate approximation, but also increases complexity, as discussed in the next section.
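To make the alternation between the linear and nonlinear steps concrete, the following is a minimal NumPy sketch of SSFM-based backpropagation over one span. The parameter names (`beta2`, `gamma`, `fs`) and sign conventions are illustrative assumptions, not the paper's exact notation.

```python
import numpy as np

def ssfm_backprop(y, n_steps, delta, beta2, gamma, fs):
    """Backpropagation via the split-step Fourier method: alternate a
    linear (dispersion) step applied in the frequency domain with a
    pointwise nonlinear phase rotation. Parameter names and the sign
    conventions below are illustrative assumptions."""
    u = np.asarray(y, dtype=complex)
    n = len(u)
    # DFT angular frequencies for a sampling rate fs
    omega = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)
    # All-pass response of one linear step of length delta
    h_lin = np.exp(1j * (beta2 / 2) * omega**2 * delta)
    for _ in range(n_steps):
        u = np.fft.ifft(h_lin * np.fft.fft(u))              # dispersion step
        u = u * np.exp(-1j * gamma * delta * np.abs(u)**2)  # Kerr phase step
    return u
```

With the nonlinearity switched off (`gamma=0`), the cascade collapses to a single all-pass dispersion filter, which illustrates why frequency-domain per-step filtering is exact for the linear part.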
II-B. Implementation Complexity and Few-Step Approaches
Ignoring the complexity of the nonlinear steps, the SSFM can be implemented using DFT/IDFT pairs, utilizing the fast Fourier transform. On the other hand, a linear equalizer can be implemented using a single DFT/IDFT pair. Based on this reasoning, the SSFM is at least times more complex than linear equalization. This motivates a number of approaches that focus on reducing the number of steps, see, e.g., [6, 8, 11] and references therein.
III. Deep Feed-Forward Neural Networks
Deep feed-forward NNs map an input vector to an output vector by alternating between affine transformations and pointwise nonlinearities [17, Eq. (6)]. This is illustrated in the bottom part of Fig. 2. The matrices and vectors are the network weights and biases, respectively.
While the similarity between the two computation graphs in Fig. 2 is apparent, there are, however, important differences. The one that is most relevant for this paper is the sparsity level of the linear operators. In order to be computationally efficient, deep NNs are typically designed to have very sparse weight matrices in most of the layers, whereas the linear propagation operator in the SSFM is a dense matrix.
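The layered structure can be sketched in a few lines of NumPy; `tanh` and the shapes below are illustrative stand-ins for the generic nonlinearity and weight matrices.

```python
import numpy as np

def nn_forward(x, weights, biases, sigma=np.tanh):
    """Deep feed-forward NN: alternate affine maps W @ a + b with a
    pointwise nonlinearity sigma, mirroring the layered structure in
    Fig. 2. tanh is an illustrative choice for sigma."""
    a = x
    for W, b in zip(weights, biases):
        a = sigma(W @ a + b)
    return a
```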
In that regard, one may argue that (1) is a “computationally inefficient” time-discretization of the NLSE, in the sense that it relates local propagation changes to all time instances. A different time-discretization approach is via partial discretization or finite-difference methods. Indeed, finite-difference methods can be more computationally efficient than the SSFM in some applications [1, Sec. 2.4.2]. However, to the best of our knowledge, finite-difference methods have not been studied for real-time DBP. One reason for this might be that many methods that show good performance are implicit, i.e., they require solving a system of equations at each step. This makes it challenging to satisfy a real-time constraint.
IV. Filter Design for Chromatic Dispersion
According to the NLSE, chromatic dispersion acts as an all-pass filter with frequency response , where , , and is the transmission distance. Various approaches have been proposed to approximate this response (over a fixed bandwidth) with an FIR filter. For example, since the inverse Fourier transform of can be computed analytically, filter coefficients may be obtained through direct sampling and truncation [19]. Other approaches include frequency-domain sampling (FDS) [20], wavelets [21], and least-squares (LS) [22, 23].
IV-A. Parameter Efficiency in Split-Step Methods
Time-domain FIR filtering has been suggested for DBP in, e.g., [5, 20, 21, 12, 13]. In the SSFM, approximating with a short FIR filter can be interpreted as a truncation of to obtain a sparse banded matrix.
To estimate the required filter length, one may use the fact that chromatic dispersion leads to a group delay difference of over a bandwidth and distance . Normalizing by the sampling interval , this confines the memory to
samples. For example, we have ps/km, km, and GHz for the system studied in [5]. The receiver bandwidth is GHz, but it is limited by an LP filter with 3-dB cutoff at GHz. Thus, FIR filters with 4–12 taps should be sufficient. However, 70-tap filters are required to obtain acceptable accuracy using FDS [5]. Similar observations apply to the results in [21, 12, 13, 20], i.e., the required filter length is significantly longer than predicted by (2).
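As a sketch of this memory estimate, the helper below computes the group-delay spread induced by chromatic dispersion and normalizes it by the sampling interval. The unit conventions (ps²/km, km, GHz) are our own choice for convenience; the 2π factor reflects that the dispersion parameter is defined per angular frequency.

```python
import numpy as np

def dispersion_memory_samples(beta2_ps2_km, length_km, bandwidth_ghz, f_samp_ghz):
    """Estimate the channel memory (in samples) induced by chromatic
    dispersion: group-delay spread 2*pi*|beta2|*B*z, normalized by the
    sampling interval 1/f_samp. Unit conventions are illustrative."""
    beta2 = abs(beta2_ps2_km)  # ps^2/km
    # 1 GHz corresponds to 1e-3 cycles per ps
    delay_spread_ps = 2 * np.pi * beta2 * (bandwidth_ghz * 1e-3) * length_km
    return delay_spread_ps * (f_samp_ghz * 1e-3)
```

For instance, with hypothetical values of 20 ps²/km over 1000 km and a 20-GHz bandwidth sampled at 40 GHz, the estimate comes out to roughly a hundred samples of memory.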
IV-B. Joint Filter Optimization
In previous work, a single filter or filter pair is designed and then used repeatedly in the SSFM. In this case, the truncation error accumulates coherently, leading to an undesired overall magnitude response as illustrated in Fig. 3. The effect is well known and a simple way to control it is by increasing the filter length. We propose instead to optimize all filters jointly.
In [12, 13], filter coefficient quantization is studied for time-domain DBP. These works highlight the effect of correlated quantization errors and propose random dithering [12] and co-optimization of the quantization levels of filter pairs [13]. While this does not address the truncation error problem directly, it does alleviate it somewhat.
In this section, we illustrate how a joint filter optimization can be done in a way such that the problem admits a (possibly suboptimal) solution strategy via iteratively solving a set of weighted LS problems. This approach is simple and provides valuable insight into the problem. The optimized coefficients are then used as the initial starting point for the gradient-based deep learning approach discussed in the next section.
For simplicity, it is assumed that each of the filters for has taps. The generalization to unequal filter lengths is straightforward. Let be the discrete-time Fourier transform of . We use to denote an objective, i.e., the symbol may be interpreted as “should be close to”. The standard filter design uses the same objective for each of the filters, i.e., each filter should approximate, as closely as possible, the chromatic dispersion transfer function over some frequency range. In this case, one finds that all filters should be the same. In particular, after discretizing the problem with for , one may use standard techniques to solve the linear LS problem , where with and is an DFT matrix.
On the other hand, by sacrificing some accuracy for the individual frequency responses, it may be possible to achieve a better combined response of neighboring filters and also a better overall response. This leads to the set of objectives
Keeping the coefficients for all but one filter constant, (3) can be written as a standard weighted LS problem. Since, e.g., , we have in the discretized problem, where denotes element-wise multiplication. Hence, one obtains
where is the number of objectives, are weights, are constant vectors representing the influence of other filters and are the discretized objective vectors. A simple strategy for the joint optimization is then to solve (4) for each of the filters in an iterative fashion. The weights can be chosen based on a suitable system criterion.
We assume , where are the data symbols, is the pulse shape, and is the baud rate. For the block-wise processing, the estimated symbol vector is obtained by passing (see Fig. 2) through a digital matched filter (MF) followed by a phase-offset rotation. The mean squared error is then used as a criterion to be minimized. Assuming that is constant for all , this is equivalent to maximizing the effective signal-to-noise ratio (SNR) .
V. Learned Digital Backpropagation
In [18], we have proposed to use deep learning for the joint filter optimization. The resulting method is referred to as learned DBP (LDBP). For LDBP, the computation graph of the SSFM is modified by interpreting all matrices as tunable parameters corresponding to the filters , similar to the weight matrices in a deep NN. The nonlinearities are changed to which act element-wise using , where is a tunable parameter.
The computation graph including the MF and phase-offset rotation is implemented in TensorFlow. All parameters are optimized by using many pairs of input and desired-output examples and adjusting the parameters such that the loss decreases. For this, we use the built-in Adam optimizer with a mini-batch size of 30 and a fixed learning rate. To find a good starting point for the filter coefficients, we employ the LS method described in Sec. IV-B. While it is possible to use random starting points, we observe that a better final solution can be obtained with pre-optimized coefficients.
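A minimal sketch of the LDBP forward pass is given below, in NumPy rather than TensorFlow and with illustrative parameter names. It only shows the structure of the graph whose parameters are trained; the training loop itself is omitted.

```python
import numpy as np

def ldbp_forward(y, filters, gammas):
    """Forward pass of the LDBP computation graph (sketch): alternate
    short symmetric FIR filters (the tunable linear steps) with the
    pointwise nonlinearity u -> u * exp(-1j * g * |u|^2), where each
    scalar g is also tunable. Names and signs are assumptions."""
    u = np.asarray(y, dtype=complex)
    for h, g in zip(filters, gammas):
        u = np.convolve(u, h, mode="same")         # tunable linear FIR step
        u = u * np.exp(-1j * g * np.abs(u) ** 2)   # tunable nonlinear step
    return u
```

In a trainable implementation, both the tap vectors and the scalars would be declared as optimizer variables and updated by gradient descent on the loss.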
VI. Results and Discussion
We revisit the parameters in [5], using a different LP filter and transmit signals. Extensions to wavelength division multiplexing (WDM) systems and higher baud rates are discussed below. The optical link consists of spans of km fiber, and an amplifier is inserted after each span to compensate for the signal attenuation. All parameters are summarized in Fig. 4. (A Butterworth LP filter and QPSK modulation are assumed in [5].) Forward propagation is simulated with 6 samples/symbol using the SSFM with 50 steps per span (StPS), i.e., .
LDBP uses 1 StPS (i.e., ), alternating between 5-tap and 3-tap filters. The effective SNR after training is shown in Fig. 4 by the green line (triangles). As a reference, we show the performance of linear equalization (red) and DBP with 1 StPS using frequency-domain filtering (blue). The linear equalizer uses LS-optimal coefficients with constrained out-of-band gain (LS-CO) . LDBP achieves a peak SNR of 21.9 dB using total taps. After increasing the filter lengths to 7 and 5 (127 total taps), one obtains essentially the same peak SNR as frequency-domain filtering.
VI-A. Comparison to Other Truncation Methods
The performance of FDS (circles) and LS-CO (squares) is shown in Fig. 4 as a comparison. The same filter is used in each step and the length is chosen such that the peak SNR is around 22 dB. For this, 15-tap filters are required for FDS (351 total taps) and 9-tap filters for LS-CO (201 total taps). This is roughly 5 and 3 times more than required for LDBP.
In [5], 70-tap filters based on FDS are required for similar accuracy. This is likely due to the higher oversampling factor used (3 samples/symbol). While a higher oversampling factor may increase the maximum SNR achievable via DBP, it can also adversely affect the performance if truncation errors are taken into account. In general, it is difficult to predict how truncation errors affect the SNR in a nonlinear system.
VI-B. Complexity Compared to Linear Equalization
We use multiplications as a surrogate for complexity and assume that the exponential function is implemented with a look-up table, similar to . For the nonlinear steps, one needs to square each sample (2 real multiplications), multiply by , and compute the phase rotation (4 real multiplications). This gives real multiplications per sample. For the linear steps, one has to account for 13 filters with 5 taps and 12 filters with 3 taps. All filters have symmetric coefficients and can be implemented using a folded structure with -normalization as shown in Fig. 5. This gives real multiplications per sample. In comparison, the fractionally-spaced linear equalizer in  requires 188 real multiplications per data symbol operating at samples/symbol. Thus, LDBP requires 3.5 times more multiplications per symbol. For the same oversampling factor as LDBP, the linear equalizer has taps (cf. (2) with GHz). This leads to real multiplications with a folded implementation. Hence, LDBP requires around 2 times more operations. If the linear equalizer is implemented in the frequency domain, the number of real multiplications is reduced to per sample (see, e.g., [11, Sec. 4]), which increases the estimated complexity overhead factor to 6.
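The folded-structure argument can be illustrated with the reference implementation below: for a symmetric filter, samples that share a coefficient are pre-added so that a K-tap filter needs only about half as many multiplications per output sample. This is a behavioral sketch, not an optimized hardware description.

```python
import numpy as np

def folded_fir(x, h):
    """Symmetric-FIR filtering with a folded structure: mirrored
    samples are added *before* multiplying, so a K-tap symmetric
    filter needs only ceil(K/2) multiplications per output sample
    (one fewer again if the center tap is normalized to 1)."""
    h = np.asarray(h)
    K = len(h)
    assert np.allclose(h, h[::-1]), "folded structure assumes symmetric taps"
    pad = K // 2
    xp = np.pad(np.asarray(x, dtype=complex), (pad, pad))
    y = np.zeros(len(x), dtype=complex)
    for n in range(len(x)):
        win = xp[n:n + K]
        acc = h[pad] * win[pad] if K % 2 else 0.0  # center tap (odd K)
        for k in range(pad):
            # fold: pre-add the two samples sharing coefficient h[k]
            acc += h[k] * (win[k] + win[K - 1 - k])
        y[n] = acc
    return y
```

For an odd-length symmetric filter this reproduces ordinary convolution while performing (K+1)/2 multiplications per sample instead of K.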
VI-C. Comparison to Few-Step Approaches
The enhanced SSFM (ESSFM) modifies the nonlinear step based on a logarithmic perturbation [11]. As a result, the sampled intensity waveform is filtered before applying the nonlinear phase shift. This gives the same functional form as previous approaches (e.g., [8, 6]), albeit with potentially different performance due to different choices or heuristics for the filter coefficients used in the modified nonlinear steps.
Excluding the overhead due to overlap-and-save techniques, one ESSFM step requires real multiplications per sample, where is the filter length in the modified nonlinear steps [11, Sec. 4, single pol.]. We perform 4 ESSFM steps with , which gives roughly the same number of multiplications as LDBP. The filter coefficients are optimized from data as suggested in [11]. The performance is shown by the grey line (diamonds) in Fig. 4. The peak SNR achieved by the ESSFM is around dB lower than that of LDBP.
VI-D. Deep Learning Interpretation
The performance of only the linear steps in LDBP after training reverts approximately to that of the linear equalizer, as shown by the dotted green line (crosses) in Fig. 4. This leads to an intuitive interpretation of the task that is accomplished by deep learning. In particular, the optimized filter coefficients represent an approximate factorization of the overall linear inverse fiber response. At first, this may seem trivial because the linear matrix operator can be factored as with for arbitrary to represent shorter propagation distances. However, the factorization task becomes nontrivial if we also require the individual operators to be “cheap”, i.e., implementable using short filters.
We also experimented with factoring the z-transform polynomial of the -tap linear equalizer into a cascade of -tap filters. However, this gives no control over the individual filter responses, other than the choice of how to distribute the overall gain factor. Moreover, it is not obvious how to achieve a good ordering of sub-filters in the SSFM.
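The root-grouping experiment can be sketched as follows; the greedy magnitude-based grouping and the even split of the overall gain are illustrative choices, and the lack of control over the resulting sub-filter responses is exactly the limitation noted above.

```python
import numpy as np

def factor_fir(h, taps_per_subfilter):
    """Factor the z-transform polynomial of a long FIR filter into a
    cascade of shorter FIR filters by grouping its roots (sketch).
    Roots are sorted by magnitude and grouped greedily; the leading
    coefficient (overall gain) is split evenly across stages."""
    h = np.asarray(h, dtype=complex)
    roots = np.roots(h)
    roots = roots[np.argsort(np.abs(roots))]
    degree = taps_per_subfilter - 1
    groups = [roots[i:i + degree] for i in range(0, len(roots), degree)]
    gain = h[0] ** (1.0 / len(groups))  # distribute the leading coefficient
    # np.poly rebuilds a monic polynomial from each root group
    return [gain * np.poly(g) for g in groups]
```

Convolving the sub-filters back together recovers the original taps (up to root-finding precision), confirming that the factorization is exact but leaves the individual stage responses arbitrary.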
VI-E. Wavelength Division Multiplexing and Higher Baud Rates
In a WDM system, the performance improvements of ideal single-channel (or few-channel) DBP are limited due to nonlinear interference from neighboring channels. This implies that it may be desirable to sacrifice some accuracy (i.e., target a lower effective SNR), and further simplify the design of LDBP, e.g., by pruning additional filter taps. A relaxed accuracy requirement also leaves some margin for practical impairments such as noise caused by filter coefficient quantization .
The memory introduced by chromatic dispersion increases quadratically with the considered bandwidth and linearly with the transmission distance, see (2). For longer links and/or higher baud rates, this seems to favor frequency-domain equalization (e.g., a DFT-based linear equalizer) over time-domain equalization in terms of complexity. On the other hand, the Kerr effect and its compensation are naturally described in the time domain. One possible approach to achieve a good performance–complexity trade-off is through digital sub-band processing. This entails a potential performance loss (due to possibly uncompensated sub-band interference), but it also reduces the effective system memory per sub-band. A closer investigation of this trade-off for LDBP is the subject of ongoing research.
VII. Conclusion
We have considered the problem of reducing the complexity of DBP to facilitate a real-time DSP implementation. Our approach, called learned DBP (LDBP), is based on a multi-layer computation graph generated by the SSFM with many steps. Computational efficiency is achieved by using, in each step, very short and symmetric FIR filters that are jointly optimized with deep learning. Numerical results show that for a single-channel transmission scenario, LDBP can achieve a favorable performance–complexity trade-off compared to other filter design methods and perturbation-based “few-step” DBP.
References
[1] G. P. Agrawal, Nonlinear Fiber Optics, 4th ed. Academic Press, 2006.
[2] C. Paré, A. Villeneuve, P.-A. A. Bélanger, and N. J. Doran, “Compensating for dispersion and the nonlinear Kerr effect without phase conjugation,” Optics Letters, vol. 21, no. 7, pp. 459–461, 1996.
[3] R.-J. Essiambre and P. J. Winzer, “Fibre nonlinearities in electronically pre-distorted transmission,” in Proc. European Conf. Optical Communication (ECOC), Glasgow, UK, 2005.
[4] K. Roberts, C. Li, L. Strawczynski, M. O’Sullivan, and I. Hardcastle, “Electronic precompensation of optical nonlinearity,” IEEE Photon. Technol. Lett., vol. 18, no. 2, pp. 403–405, Jan. 2006.
[5] E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightw. Technol., vol. 26, pp. 3416–3425, Oct. 2008.
[6] L. B. Du and A. J. Lowery, “Improved single channel backpropagation for intra-channel fiber nonlinearity compensation in long-haul optical communication systems,” Opt. Express, vol. 18, no. 16, pp. 17075–17088, Jul. 2010.
[7] T. S. R. Shen and A. P. T. Lau, “Fiber nonlinearity compensation using extreme learning machine for DSP-based coherent communication systems,” in Proc. Optoelectronics and Communications Conf. (OECC), Kaohsiung, Taiwan, 2011.
[8] D. Rafique, M. Mussolin, M. Forzati, J. Mårtensson, M. N. Chugtai, and A. D. Ellis, “Compensation of intra-channel nonlinear fibre impairments using simplified digital back-propagation algorithm,” Opt. Express, vol. 19, no. 10, pp. 9453–9460, Apr. 2011.
[9] A. Napoli, Z. Maalej, V. A. J. M. Sleiffer, M. Kuschnerov, D. Rafique, E. Timmers, B. Spinnler, T. Rahman, L. D. Coelho, and N. Hanik, “Reduced complexity digital back-propagation methods for optical communication systems,” J. Lightw. Technol., vol. 32, no. 7, 2014.
[10] A. M. Jarajreh, E. Giacoumidis, I. Aldaya, S. T. Le, A. Tsokanos, Z. Ghassemlooy, and N. J. Doran, “Artificial neural network nonlinear equalizer for coherent optical OFDM,” IEEE Photon. Technol. Lett., vol. 27, no. 4, pp. 387–390, Feb. 2015.
[11] M. Secondini, S. Rommel, G. Meloni, F. Fresi, E. Forestieri, and L. Poti, “Single-step digital backpropagation for nonlinearity mitigation,” Photon. Netw. Commun., vol. 31, no. 3, pp. 493–502, 2016.
[12] C. Fougstedt, M. Mazur, L. Svensson, H. Eliasson, M. Karlsson, and P. Larsson-Edefors, “Time-domain digital back propagation: Algorithm and finite-precision implementation aspects,” in Proc. Optical Fiber Communication Conf. (OFC), Los Angeles, CA, 2017.
[13] C. Fougstedt, L. Svensson, M. Mazur, M. Karlsson, and P. Larsson-Edefors, “Finite-precision optimization of time-domain digital back propagation by inter-symbol interference minimization,” in Proc. European Conf. Optical Communication (ECOC), Gothenburg, Sweden, 2017.
[14] C. Monterola and C. Saloma, “Solving the nonlinear Schroedinger equation with an unsupervised neural network,” Opt. Express, vol. 9, no. 2, pp. 72–84, Jul. 2001.
[15] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[16] K. Gregor and Y. LeCun, “Learning fast approximations of sparse coding,” in Proc. Int. Conf. Mach. Learning, 2010.
[17] H. W. Lin, M. Tegmark, and D. Rolnick, “Why does deep and cheap learning work so well?” J. Stat. Phys., vol. 168, no. 6, 2017.
[18] C. Häger and H. D. Pfister, “Nonlinear interference mitigation via deep neural networks,” in Proc. Optical Fiber Communication Conf. (OFC), San Diego, CA, 2018.
[19] S. J. Savory, “Digital filters for coherent optical receivers,” Opt. Express, vol. 16, no. 2, pp. 804–817, 2008.
[20] L. Zhu, X. Li, E. Mateo, and G. Li, “Complementary FIR filter pair for distributed impairment compensation of WDM fiber transmission,” IEEE Photon. Technol. Lett., vol. 21, no. 5, pp. 292–294, Mar. 2009.
[21] G. Goldfarb and G. Li, “Efficient backward-propagation using wavelet-based filtering for fiber backward-propagation,” Opt. Express, vol. 17, no. 11, pp. 814–816, May 2009.
[22] A. Eghbali, H. Johansson, O. Gustafsson, and S. J. Savory, “Optimal least-squares FIR digital filters for compensation of chromatic dispersion in digital coherent optical receivers,” J. Lightw. Technol., vol. 32, no. 8, pp. 1449–1456, Apr. 2014.
[23] A. Sheikh, C. Fougstedt, A. Graell i Amat, P. Johannisson, P. Larsson-Edefors, and M. Karlsson, “Dispersion compensation FIR filter with improved robustness to coefficient quantization errors,” J. Lightw. Technol., vol. 34, no. 22, pp. 5110–5117, Nov. 2016.