Machine learning techniques[1, 2, 3] have recently been applied to optical communications systems to deal with various issues such as network monitoring[4, 5, 6], traffic control[7, 8, 9, 10], signal design[11, 12, 13, 14, 15], and nonlinearity compensation[16, 17, 18, 19, 20, 21]. Since fiber nonlinearity is a major limiting factor for the achievable information rates[22, 23, 24], mitigating nonlinearity is of great importance for realizing high-speed, reliable, and long-reach optical communications. Conventionally, a number of model-based nonlinear equalizers have been investigated to compensate for fiber distortion, e.g., maximum-likelihood sequence equalizer (MLSE)[25, 26, 27], turbo equalizer (TEQ)[28, 29, 30], Volterra series transfer function (VSTF)[33, 32], and digital backpropagation (DBP)[35, 38, 37, 36]. However, those nonlinear equalizers are in general computationally complex and susceptible to model parameter mismatch. Recent data-driven approaches motivated by deep learning can favorably replace such traditional model-based methods, as the use of deep neural networks (DNN) allows flexible statistical analysis of complicated fiber-optic systems without relying on specific models. In the past few years, DNN has shown high potential for nonlinear performance improvement, e.g., [16, 17, 18, 19, 20, 21, 12, 13, 14, 15].
Nonetheless, most existing work did not appropriately account for practical interaction with forward error correction (FEC) codes. For example, multi-class softmax cross-entropy loss is often used to train DNN, which is relevant only when nonbinary FEC codes are assumed. For more practical bit-interleaved coded modulation (BICM) systems, it was previously found that binary cross-entropy (BCE) loss can improve accuracy and scalability to high-order quadrature-amplitude modulation (QAM). In this paper, we propose a novel DNN application to perform TEQ for nonlinear mitigation in the context of BICM with iterative demodulation (ID). Although DNN is already popular in nonlinear compensation, our paper is the first attempt to adopt DNN for TEQ in the framework of BICM-ID, which takes soft-decision feedback from the FEC decoder to refine the DNN output for improved equalization accuracy. We analyze the extrinsic information transfer (EXIT) of turbo DNN, and demonstrate that the proposed DNN paired with the irregular low-density parity-check (LDPC) codes used in DVB-S2 standards offers a significant performance gain by accelerating the decoder convergence in nonlinear transmissions.
The contributions of this paper are summarized as follows:
-  Trend overview: We first overview the recent trend of deep learning in the optical communications community.
-  Multi-label DNN: We then verify that nonbinary cross-entropy is not scalable to high-order QAM signals, whereas a DNN trained with BCE loss can appropriately compensate for fiber nonlinearity.
-  Turbo DNN: We propose a nested residual DNN architecture for TEQ to further improve performance.
-  EXIT analysis: We analyze the EXIT chart of our DNN-TEQ and show that DNN-TEQ accelerates decoding convergence.
-  LDPC design: We optimize the degree distribution of LDPC codes to match the EXIT chart of DNN-TEQ, achieving higher throughput.
Note that, owing to the above contributions, in particular the demonstration of rate improvement with optimized LDPC codes for DNN-TEQ, this paper is distinguished from our preliminary reports[48, 20, 21]. To the best of the authors’ knowledge, no other literature has applied DNN to TEQ for nonlinear compensation.
II Machine Learning for Optical Communications
II-A Trend Overview
Fiber-optic communications suffer from various linear and nonlinear impairments, such as laser linewidth, amplified spontaneous emission (ASE) noise, chromatic dispersion (CD), polarization mode dispersion (PMD), self-phase modulation (SPM), cross-phase modulation (XPM), four-wave mixing (FWM), and cross-polarization modulation (XPolM)[22, 23, 24]. Although the physics is well described by the nonlinear Schrödinger equation (NLSE) model, solving lightwave propagation numerically may require the high-complexity split-step Fourier method (SSFM). It is hence natural that the nonlinear physics necessitates nonlinear signal processing to appropriately deal with the nonlinear distortions in practice.
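To give a feel for why SSFM is costly, the following is a minimal single-polarization sketch in Python. The function name `ssfm`, the scalar form of the NLSE, the sign conventions, and the example parameter values are all assumptions made for illustration; they are not the simulation setup used in this paper.

```python
import numpy as np

def ssfm(a0, dt, length, n_steps, beta2, gamma, alpha=0.0):
    """Symmetric split-step Fourier propagation of a sampled optical field.

    Assumed scalar NLSE (illustrative convention):
        dA/dz = -(alpha/2) A - 1j*(beta2/2) d^2A/dt^2 + 1j*gamma*|A|^2 A
    """
    a = np.asarray(a0, dtype=complex)
    dz = length / n_steps
    omega = 2.0 * np.pi * np.fft.fftfreq(a.size, d=dt)
    # Dispersion and loss over half a step, applied in the frequency domain
    half_linear = np.exp((1j * beta2 / 2.0 * omega**2 - alpha / 2.0) * dz / 2.0)
    for _ in range(n_steps):
        a = np.fft.ifft(half_linear * np.fft.fft(a))       # half linear step
        a = a * np.exp(1j * gamma * np.abs(a) ** 2 * dz)   # full Kerr step
        a = np.fft.ifft(half_linear * np.fft.fft(a))       # half linear step
    return a

# Hypothetical usage: a Gaussian pulse over 80 km (time in ps, length in km)
t = np.linspace(-50.0, 50.0, 512)
pulse = np.sqrt(1e-3) * np.exp(-t**2 / (2 * 10.0**2))
out = ssfm(pulse, dt=t[1] - t[0], length=80.0, n_steps=100,
           beta2=-5.0, gamma=1.5, alpha=0.046)
```

Every step needs two FFT/IFFT pairs over the whole waveform, and the step count grows with distance and launch power, which is where the complexity of SSFM-based processing comes from.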
In place of conventional model-based nonlinear signal processing, the application of machine learning techniques[1, 2, 3] to optical communication systems has recently received increased attention[4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21]. The promise of such data-driven approaches is that learning a black-box DNN could potentially overcome situations where limited models are inaccurate and complex theory is computationally intractable.
Fig. 1 shows the trend of machine learning applications in the optical communications community over the past two decades. Here, we plot the number of articles in each year according to a Google Scholar search of the keyword combinations “machine learning” + “optical communication” or “deep learning” + “optical communication.” As we can see, machine learning has been used for optical communications for the past twenty years. Interestingly, we observe a Moore’s-law-like trend in which the number of applications grows exponentially by a factor of nearly per year. For deep learning applications, a more rapid annual increase by a factor of can be found in the past half decade. As of today, there are nearly thousand articles on deep learning applications. Note that the author’s article in 2014 is one of the very first papers discussing the application of deep learning to optical communications.
II-B Statistical Learning Techniques
We briefly overview some learning techniques to analyze nonlinear statistics applied to optical communications, as shown in Fig. 2. For example, density estimation trees (DET), kernel density estimation (KDE), and Gaussian mixture model (GMM) can be alternatives to histogram analysis. Principal component analysis (PCA) and independent component analysis (ICA) are useful for analyzing important factors of data. For high-dimensional data sets, we may use Markov-chain Monte Carlo (MCMC) and importance sampling (IS). To analyze stochastic sequence data, the extended Kalman filter (EKF), unscented Kalman filter (UKF), and particle filter (PF) based on the hidden Markov model (HMM) may be used.
Since the mid-1970s, artificial neural networks (ANN) have led machine learning research. Various topologies including multi-layer perceptron (MLP), Hopfield neural networks (HNN), restricted Boltzmann machines (RBM), convolutional neural networks (CNN), and recurrent neural networks (RNN) have been investigated. Since the mid-1990s, support vector machine (SVM) has taken over the lead in machine learning. One of the important techniques for analyzing nonlinear statistics is the kernel trick, in which we analyze higher-dimensional linearized feature spaces, called reproducing kernel Hilbert spaces (RKHS), with kernel functions including the radial basis function (RBF). Since 2006, deep learning based on DNN has been a major breakthrough in media signal processing fields. In deep learning, many-layer deep belief networks (DBN) are trained with massively large datasets.
II-C Classic Machine Learning Applications
Now, we show a few examples of machine learning approaches applied to nonlinear fiber-optic communications. Xie et al. proposed the use of ICA for polarization recovery as an alternative to constant-modulus adaptation (CMA). Shallow ANN-based nonlinear equalizers have been studied in the literature[40, 41, 42]. We have investigated GMM-based sliding MLSE and TEQ receivers, where up to dB performance improvement was achieved compared to DBP. SVM has also been studied as another nonlinear equalizer[43, 44], in which a complicated decision rule like a Yin–Yang spiral boundary can be learned by kernel SVM. RBF kernels have been studied in other literature. HMM-based turbo cycle-slip recovery offers greater than dB gain. A stochastic DBP exhibits outstanding performance by solving the inverse NLSE with SSFM, adopting an MCMC particle representation of stochastic noise.
II-D Modern Deep Learning Applications
As shown in Fig. 1, there exist many deep learning applications, among which a limited number of examples are listed below. DNN was introduced for optical signal-to-noise ratio (OSNR) monitoring in [4]. Modulation classification as well as OSNR monitoring was considered in [5], and a deep CNN showed accurate performance in [6]. Deep learning-based network management and resource allocation were studied in [7] and [8]. Analogously, traffic optimization based on deep reinforcement learning (DRL) was also considered in [9, 10]. Various end-to-end deep learning methods that jointly optimize signal constellation and detection have been proposed, e.g., [11, 12, 13, 14, 15], where a denoising auto-encoder (AE) architecture is trained through nonlinear fiber channels. Also for receiver-end design, many DNN equalizers to compensate for fiber nonlinearity were introduced for coherent or non-coherent optical links, e.g., [16, 17, 18, 19, 20, 21].
Note that the big data necessary for deep learning are readily available in high-speed optical communications, where we can obtain gigabits or terabits of data per second. In addition, the DNN is massively parallelizable in hardware implementation, which is well suited for future optical communications. In modern DNN, various techniques have been introduced, e.g., pre-training, mini-batch, rectified linear unit (ReLU), dropout, batch normalization, skip connection, inception, adaptive-moment (Adam) stochastic gradient, adversarial, and long short-term memory (LSTM) architectures.
III Deep Learning for Nonlinear Compensation
Similar to the other DNN equalizers, we focus on deep learning for fiber nonlinearity compensation. This paper has a distinguished contribution over existing literature as we propose a novel DNN-based TEQ suited for BICM-ID systems where state-of-the-art LDPC codes are employed.
III-A Nonlinear Fiber-Optic Communications System
The optical communications system under consideration is depicted in Fig. 3. Three-channel DP-QAM signals for GBd baud rate and GHz channel spacing are sent over fiber plants towards coherent receivers. We consider spans of dispersion managed (DM) links with km non-zero dispersion-shifted fiber (NZDSF) at a residual dispersion per span (RDPS) of %. The NZDSF has a dispersion parameter of ps/nm/km, a nonlinear factor of /W/km, and an attenuation of dB/km. The span loss is compensated by Erbium-doped fiber amplifiers (EDFA) with all ASE noise added just before the receiver assuming the noise figure of dB. We use digital root-raised cosine filters with % rolloff at both transmitter and receiver. The receiver employs standard phase recovery and linear equalization (LE) to compensate for linear dispersion. Due to fiber nonlinearity, residual distortion after LE will limit the achievable information rates.
Fig. 4 shows an example of residual distortion of DP-16QAM constellation after -tap least-squares LE for -span transmissions. We can see that the constellation is more seriously distorted with the increased launch power due to Kerr fiber nonlinearity. To compensate for the residual nonlinear distortion, we introduce DNN-based TEQ, which exploits soft-decision feedback from FEC decoder as shown in Fig. 3.
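The least-squares LE mentioned above can be sketched as a symbol-spaced FIR filter fitted to training symbols. The helper name `ls_equalizer`, the single-polarization setting, and the odd tap count are assumptions for illustration only; the receiver in this paper operates on dual-polarization signals.

```python
import numpy as np

def ls_equalizer(rx, tx, n_taps):
    """Fit an FIR equalizer (odd n_taps) by least squares and apply it.

    rx: received symbols, tx: known training symbols of the same length.
    Returns the tap vector and the equalized symbols.
    """
    half = n_taps // 2
    pad = np.zeros(half, dtype=complex)
    padded = np.concatenate([pad, np.asarray(rx, dtype=complex), pad])
    # Row i holds rx[i-half] ... rx[i+half], i.e. the window centered on symbol i
    windows = np.lib.stride_tricks.sliding_window_view(padded, n_taps)
    taps, *_ = np.linalg.lstsq(windows, np.asarray(tx, dtype=complex), rcond=None)
    return taps, windows @ taps

# Hypothetical usage on a toy linear channel with one echo
rng = np.random.default_rng(1)
tx = (rng.choice([-1.0, 1.0], 2000) + 1j * rng.choice([-1.0, 1.0], 2000)) / np.sqrt(2)
rx = tx + 0.3 * np.roll(tx, 1)
taps, eq = ls_equalizer(rx, tx, n_taps=7)
```

Such a filter can only undo linear distortion; whatever remains after it is the residual nonlinear distortion that the DNN equalizer targets.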
III-B Scalable Deep Neural Network Equalization
Before introducing DNN-TEQ, we discuss the loss function used to train DNN equalizers suited for BICM. Consider DP-16QAM equalization, where there are 8 bits per symbol, leading to $2^{8} = 256$ classes to identify. For such multi-class learning, we may use a single nonbinary softmax classification as shown in Fig. 5. However, this nonbinary (NB) DNN does not perform well for higher-order DP-QAM, in particular with a limited amount of training data. For example, DP-64QAM requires $2^{12} = 4096$ classes to identify per symbol, which necessitates unrealistically huge data sets for training.
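The scaling problem can be made concrete by counting outputs. In the short sketch below, `label_counts` is a hypothetical helper name; it only assumes a square QAM format carried on two polarizations.

```python
def label_counts(qam_order, n_pol=2):
    """Return (bits per symbol, number of nonbinary classes) for DP-QAM."""
    bits_per_pol = qam_order.bit_length() - 1  # exact log2 for powers of two
    bits = n_pol * bits_per_pol
    return bits, 2 ** bits

for order in (4, 16, 64):
    bits, classes = label_counts(order)
    print(f"DP-{order}QAM: {bits} binary labels vs {classes} softmax classes")
```

The number of binary labels grows linearly with the number of bits per symbol, while the number of nonbinary classes grows exponentially, which is why a nonbinary softmax output needs far more training data per class.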
To be scalable in high-order QAM, we shall use multi-label classification, which employs multiple BCE losses as shown in Fig. 5. The multi-label DNN produces log-likelihood ratios (LLR), which can be directly fed into the soft-decision FEC (SD-FEC) decoder without external processing such as [16, 49]. This is a great advantage in practice because LLR calculation is cumbersome, especially for high-order and high-dimensional modulation. Note that minimizing the sum of cross-entropy losses is equivalent to maximizing a lower bound of the generalized mutual information (GMI), which is used as the SD-FEC performance metric.
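The link between the BCE loss and GMI can be illustrated numerically. The sketch below assumes equiprobable bits, the LLR convention log(P(b=0)/P(b=1)), and array shapes of (symbols, bits per symbol); the function names are hypothetical and the estimator is the usual sample-average form, not code from this paper.

```python
import numpy as np

def bce_bits(llr, bits):
    """Per-label binary cross-entropy in bits, with the LLR used as the logit."""
    signed = (1 - 2 * bits) * llr            # positive when the LLR agrees with the bit
    return np.mean(np.logaddexp(0.0, -signed), axis=0) / np.log(2.0)

def gmi_estimate(llr, bits):
    """Sample estimate of GMI in bits per symbol: sum over labels of 1 - BCE."""
    return float(np.sum(1.0 - bce_bits(llr, bits)))

# Hypothetical usage: 8 labels per symbol with noisy but informative LLRs
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=(10000, 8))
llr = (1 - 2 * bits) * 4.0 + 2.0 * rng.standard_normal(bits.shape)
print(gmi_estimate(llr, bits))
```

Lowering the summed BCE raises this estimate directly, so training with BCE and evaluating with GMI pull in the same direction.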
III-C Nonbinary vs. Binary DNN Equalization
We compare DNN and LSTM with classical machine learning methods, specifically, linear discriminant analysis (LDA), naïve Bayes (NB), quadratic discriminant analysis (QDA), and SVM. For multi-class SVM, we use the one-vs-one rule with a linear kernel, as it worked best among several variants such as one-vs-all and polynomial kernels. The DNN weights are trained by Adam with a dropout ratio of and a batch size of symbols to minimize the sum of softmax cross-entropy losses across all labels, using approximately training symbols. Figs. 6, 7, and 8 show the Q factor versus launch power of DP-4QAM, DP-16QAM, and DP-64QAM, respectively, for , , and spans times km fiber configurations. It is observed that DNN offers the best performance among all methods, achieving greater than dB gain over LE in highly nonlinear regimes. More importantly, the conventional DNN with nonbinary softmax cross-entropy does not perform well for high-order QAMs. This suggests that DNN equalizers using the BCE loss function have a great advantage not only in BICM compatibility but also in high-order QAM scalability.
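For readers less familiar with the Q factor, the sketch below converts a measured BER into a Q factor in dB using the common definition Q = sqrt(2) erfcinv(2 BER). This definition is an assumption here, since the text does not spell out which one is used for the figures, and `q_factor_db` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def q_factor_db(ber):
    """Q factor in dB for a BER in (0, 0.5), assuming Q = sqrt(2) * erfcinv(2 * BER)."""
    q_linear = -NormalDist().inv_cdf(ber)  # same value as sqrt(2) * erfcinv(2 * ber)
    return 20.0 * math.log10(q_linear)

for ber in (1e-2, 3.8e-3, 1e-3):
    print(f"BER {ber:.1e} -> Q = {q_factor_db(ber):.2f} dB")
```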
IV Neural Turbo Equalization: DNN-TEQ
IV-A Nested Residual Network Architecture
Fig. 9 shows the architecture of our turbo DNN equalizer, which takes distorted DP-QAM signals over consecutive -tap symbols as input to generate soft-decision LLR values for FEC decoding. The major extension from the conventional DNN lies in the input layer, which takes a priori (APR) side information along with the DP-QAM symbols. The APR side information comes from the FEC decoder and represents intermediate soft-decision LLRs at run time. For efficient DNN training, the APR values having mutual information of $I_\mathrm{A}$ are synthetically generated via a Gaussian distribution following $\mathcal{N}\big((-1)^{b}\sigma^{2}/2,\ \sigma^{2}\big)$, where $b$ is an original bit and $\sigma = J^{-1}(I_\mathrm{A})$ with $J^{-1}(\cdot)$ being ten Brink’s J-inverse function, instead of considering a particular FEC decoder feedback.
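A minimal sketch of this synthetic a priori generation is given below. It assumes the consistent Gaussian LLR model with mean of magnitude sigma^2/2 and variance sigma^2, the LLR convention log(P(b=0)/P(b=1)), and a widely used closed-form curve fit of the J-function (constants `H1`, `H2`, `H3`) in place of ten Brink's original approximation; all names are hypothetical.

```python
import numpy as np

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # constants of the assumed closed-form J fit

def j_func(sigma):
    """Approximate mutual information of a consistent Gaussian LLR with std sigma."""
    return (1.0 - 2.0 ** (-H1 * np.power(sigma, 2.0 * H2))) ** H3

def j_inv(mi):
    """Inverse of j_func: LLR standard deviation for a target mutual information."""
    mi = np.clip(mi, 1e-9, 1.0 - 1e-9)  # avoid zero and infinite sigma
    return (-np.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))

def synth_apr_llr(bits, mi, rng):
    """Draw synthetic a priori LLRs carrying roughly `mi` bits about `bits`."""
    sigma = j_inv(mi)
    mean = (1 - 2 * bits) * sigma**2 / 2.0   # +sigma^2/2 for bit 0, -sigma^2/2 for bit 1
    return mean + sigma * rng.standard_normal(bits.shape)

# Hypothetical usage: a priori LLRs at a target mutual information of 0.5
rng = np.random.default_rng(0)
bits = rng.integers(0, 2, size=200000)
apr = synth_apr_llr(bits, 0.5, rng)
```

Sweeping the target mutual information from 0 to 1 during training exposes the network to the whole range of decoder feedback reliability without running an actual FEC decoder in the loop.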
The last layer has two branches, i.e., the extrinsic (EXT) output and the a posteriori probability (APP) output, where the latter uses a skip connection from the input layer to sum up EXT and APR at a target symbol. This nested residual network is intended to learn extrinsic message passing for TEQ realization. It was found that training the DNN model to minimize the APP cross-entropy loss does not always minimize the EXT cross-entropy loss accordingly, and vice versa. In order to keep both APP and EXT outputs reliable, we apply a max-pooling layer after the sigmoid cross-entropy losses.
The DNN uses four hidden layers, each of which consists of batch normalization, ReLU activation, and a fully-connected linear layer with skip connections and % dropout for neuron nodes. The DNN is trained with Adam for a mini-batch size of symbols to minimize the worst sigmoid cross-entropy losses between APP and EXT outputs, using training datasets of approximately symbols. An early stopping with a patience of is carried out up to a maximum of epochs.
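The output stage and the worst-case loss described above can be sketched as follows. The sketch assumes that the maximum is taken over the two batch-averaged losses and that LLRs follow the convention log(P(b=0)/P(b=1)); the actual pooling granularity is not stated in the text, and the function names are hypothetical.

```python
import numpy as np

def sigmoid_bce(llr, bits):
    """Batch-averaged sigmoid cross-entropy (in nats) with the LLR as logit."""
    return float(np.mean(np.logaddexp(0.0, -(1 - 2 * bits) * llr)))

def turbo_outputs_and_loss(ext_llr, apr_llr, bits):
    """Form the APP output via the skip connection and take the worse of the two losses."""
    app_llr = ext_llr + apr_llr                 # APP = EXT + APR at the target symbol
    loss_ext = sigmoid_bce(ext_llr, bits)
    loss_app = sigmoid_bce(app_llr, bits)
    return app_llr, max(loss_ext, loss_app)     # assumed "max-pooling" over the two losses

# Hypothetical usage with two bits
bits = np.array([0, 1])
app, loss = turbo_outputs_and_loss(np.array([2.0, -2.0]), np.array([1.0, -1.0]), bits)
```

Minimizing the larger of the two losses keeps the optimizer from improving one output at the expense of the other.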
IV-B EXIT Chart Analysis
Fig. 10 shows the EXIT chart of DNN-TEQ given LLRs having a certain mutual information from the FEC decoder. It is clearly observed that the DNN outputs can be greatly improved by feeding in the FEC soft-decision. An almost linear slope towards in the EXIT curve is achieved, implying that the cross-entropy loss is mitigated linearly with the FEC feedback reliability. This steep slope in the EXIT curve of DNN-TEQ can eventually make a significant improvement in LDPC decoding performance, as shown in Fig. 11, where we present the decoding trajectory between the variable-node decoder (VND) and the check-node decoder (CND) in the LDPC decoder. Here, we use a combined EXIT chart of DNN-TEQ and the LDPC decoder, for DP-16QAM 16-span DM links at dBm launch power and DVB-S2 LDPC codes with a code rate of . As shown, the conventional DNN equalizer without FEC feedback requires a large number of decoder iterations to reach an error-free mutual information of . For DNN-TEQ, in contrast, we can open up an EXIT tunnel between the VND and CND curves, which leads to a considerable acceleration of the decoder convergence, reaching the error-free condition within only a few iterations.
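The effect of the equalizer slope on the decoding trajectory can be reproduced with a toy EXIT model. The sketch below uses the standard Gaussian-approximation EXIT functions of a regular LDPC code combined with a detector, a closed-form J-function fit, and a hypothetical linear equalizer curve; the degrees (3, 6), the intercept 0.65, and the slope 0.3 are illustrative assumptions, not values from this paper or from the DVB-S2 codes.

```python
import numpy as np

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # constants of the assumed closed-form J fit

def j_func(sigma):
    return (1.0 - 2.0 ** (-H1 * np.power(sigma, 2.0 * H2))) ** H3

def j_inv(mi):
    mi = np.clip(mi, 1e-9, 1.0 - 1e-9)
    return (-np.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))

def iterations_to_converge(detector_exit, dv=3, dc=6, target=0.999, max_iter=1000):
    """Count VND/CND iterations until the VND output reaches `target`, else None."""
    i_cnd = 0.0                                   # CND-to-VND mutual information
    for it in range(1, max_iter + 1):
        s = j_inv(i_cnd)
        # Feedback seen by the equalizer comes from all dv edges of a variable node
        i_det = detector_exit(j_func(np.sqrt(dv) * s))
        i_vnd = j_func(np.sqrt((dv - 1) * s**2 + j_inv(i_det) ** 2))
        if i_vnd >= target:
            return it
        i_cnd = 1.0 - j_func(np.sqrt(dc - 1) * j_inv(1.0 - i_vnd))
    return None

n_flat = iterations_to_converge(lambda ia: 0.65)              # no FEC feedback
n_turbo = iterations_to_converge(lambda ia: 0.65 + 0.3 * ia)  # linear EXIT slope
print(n_flat, n_turbo)
```

With a flat equalizer curve the trajectory has to squeeze through a narrow tunnel, whereas a positive slope widens the tunnel as the decoder becomes more reliable, so fewer iterations are needed in this toy model.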
IV-C BER Performance
We assume the use of an outer Bose–Chaudhuri–Hocquenghem (BCH) code with a rate of , having a minimum Hamming distance of . Based on the union (upper) bound, the bit-error rate (BER) threshold for this outer BCH code is at or above an input BER of to achieve an output BER below . Hence, a post-LDPC BER below can be successfully decoded to a BER below when this outer BCH code is used.
For FEC codes, we consider variable-rate irregular LDPC codes of block length bits, used in DVB-S2 standards. The LDPC codes have a different degree distribution for each code rate. For instance, at a code rate of , the variable degree polynomial (node perspective) is given as , whereas the check degree polynomial is . At a code rate of , the variable and check degree polynomials are and , respectively. We also consider an optimized degree distribution for DNN-TEQ, where the EXIT chart of DNN-TEQ in Fig. 10 is modeled with cubic functions and the EXIT curves of the combined VND and DNN-TEQ are optimized for a triple-degree check-concentrated distribution, which has two degrees of freedom to search for the best distribution. For example, the optimized LDPC code for a code rate of at a launch power of dBm for DP-64QAM systems has a degree distribution of .
Figs. 12 and 13 show the post-LDPC BER performance versus launch power of DP-16QAM and DP-64QAM, respectively, for , and spans of NZDSF links. We compare DVB-S2 LDPC codes for LE, DNN and DNN-TEQ and our optimized LDPC code for DNN-TEQ. From the figures, we can observe the following results:
-  Although DNN nonlinear compensation can improve the BER performance of LE, it mostly fails to achieve the BCH threshold BER.
-  DNN-TEQ can significantly improve the BER performance of DNN to reach the threshold, and about dB margin around the optimal launch power is realized.
-  Optimizing LDPC codes for DNN-TEQ can offer an additional marginal improvement over the standard DVB-S2 LDPC codes for the whole range of launch power.
IV-D Achievable Rate Performance
The BER improvement with our proposed DNN-TEQ implies that we can increase the achievable throughput when the code rate is adaptively optimized. Fig. 14 shows achievable rate performance for DP-64QAM at -span NZDSF links. Here, we use the same variable node degree of DVB-S2 rate and plot the largest code rate such that the post-LDPC BER meets the BCH threshold by varying the check node degree to be a target rate. From this figure, we can see that the DNN nonlinear compensation can improve the performance of LE by b/s/Hz in the nonlinear regimes, and the achieved gain in the peak throughput is about b/s/Hz. Our DNN-TEQ offers a remarkable BICM-ID gain over the whole range of launch power, achieving a throughput improvement of b/s/Hz over the DNN when LDPC code is optimized. A total throughput improvement of b/s/Hz from the standard LE was achieved by the proposed DNN-TEQ.
V Conclusions
We extended DNN machine learning techniques to TEQ for improved nonlinear compensation in coherent fiber communications. We first verified that a DNN trained with binary cross-entropy loss can outperform various machine learning techniques in compensating for fiber nonlinearity. Through EXIT chart analysis, we then confirmed that the proposed DNN-TEQ accelerates the decoder by feeding back intermediate soft-decision LLRs from the LDPC decoder. Our DNN-TEQ significantly improves BER performance through the turbo iteration. We also investigated LDPC code design to match the EXIT chart of DNN-TEQ, and demonstrated that the proposed DNN-TEQ with optimized LDPC codes can improve the achievable throughput by b/s/Hz over linear equalization with standard LDPC codes. To the best of the authors’ knowledge, this is the first paper investigating TEQ based on DNN for fiber nonlinearity mitigation.
-  G. Hinton, S. Osindero, and Y.-W. Teh, “A fast learning algorithm for deep belief nets,” Neural Computation, vol. 18, no. 7, pp. 1527–1554, July 2006.
-  Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, May 2015.
-  S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, Nov. 1997.
-  T. Tanimura, T. Hoshida, T. Kato, S. Watanabe, J. C. Rasmussen, M. Suzuki, and H. Morikawa, “Deep learning based OSNR monitoring independent of modulation format, symbol rate and chromatic dispersion,” European Conf. Optical Commun. (ECOC), Sep. 2016.
-  F. N. Khan, K. Zhong, X. Zhou, W. H. Al-Arashi, C. Yu, C. Lu, and A. P. Lau, “Joint OSNR monitoring and modulation format identification in digital coherent receivers using deep neural networks,” Opt. Expr., vol. 25, no. 15, pp. 17767–17776, July 2017.
-  D. Wang, M. Zhang, J. Li, Z. Li, J. Li, C. Song, and X. Chen, “Intelligent constellation diagram analyzer using convolutional neural network-based deep learning,” Opt. Expr., vol. 25, no. 15, pp. 17150-17166, July 2017.
-  J. Guo and Z. Zhu, “When deep learning meets inter-datacenter optical network management: Advantages and vulnerabilities,” J. Lightw. Technol., vol. 36, no. 20, pp. 4761–4773, Oct. 2018.
-  A. Yu, H. Yang, W. Bai, L. He, H. Xiao, and J. Zhang, “Leveraging deep learning to achieve efficient resource allocation with traffic evaluation in datacenter optical networks,” Optical Fiber Commun. Conf. (OFC), Mar. 2018.
-  X. Luo, C. Shi, L. Wang, X. Chen, Y. Li, and T. Yang, “Leveraging double-agent-based deep reinforcement learning to global optimization of elastic optical networks with enhanced survivability,” Opt. Expr., vol. 27, no. 6, pp. 7896–7911, Mar. 2019.
-  Y. Tang, H. Guo, T. Yuan, X. Gao, X. Hong, Y. Li, J. Qiu, Y. Zuo, and J. Wu, “Flow Splitter: A deep reinforcement learning-based flow scheduler for hybrid optical-electrical data center network,” IEEE Access, vol. 7, pp. 129955–65, Sep. 2019.
-  C. Ye, D. Zhang, X. Hu, X. Huang, H. Feng, and K. Zhang, “Recurrent neural network (RNN) based end-to-end nonlinear management for symmetrical 50Gbps NRZ PON with 29dB+ loss budget,” European Conf. Optical Commun. (ECOC), Sep. 2018.
-  B. Karanov, M. Chagnon, F. Thouin, T. A. Eriksson, H. Bülow, D. Lavery, P. Bayvel, and L. Schmalen, “End-to-end deep learning of optical fiber communications,” J. Lightw. Technol., vol. 36, no. 20, pp. 4843–55, Aug. 2018.
-  S. Li, C. Häger, N. Garcia, and H. Wymeersch, “Achievable information rates for nonlinear fiber communication via end-to-end autoencoder learning,” European Conf. Optical Commun. (ECOC), Sep. 2018.
-  R. T. Jones, T. A. Eriksson, M. P. Yankov, and D. Zibar, “Deep learning of geometric constellation shaping including fiber nonlinearities,” European Conf. Optical Commun. (ECOC), Sep. 2018.
-  M. Chagnon, B. Karanov, and L. Schmalen, “Experimental demonstration of a dispersion tolerant end-to-end deep learning-based IM-DD transmission system,” European Conf. Optical Commun. (ECOC), Sep. 2018.
-  R. Rios-Müller, J. M. Estarán, and J. Renaudier, “Experimental estimation of optical nonlinear memory channel conditional distribution using deep neural networks,” Optical Fiber Commun. Conf. (OFC), p. W2A5.1, Mar. 2017.
-  C. Y. Chuang, C. C. Wei, T. C. Lin, K. L. Chi, L. C. Liu, J. W. Shi, Y. K. Chen, and J. Chen, “Employing deep neural network for high speed 4-PAM optical interconnect,” European Conf. Optical Commun. (ECOC), Sep. 2017.
-  V. Kamalov, L. Jovanovski, V. Vusirikala, S. Zhang, F. Yaman, K. Nakamura, T. Inoue, E. Mateo, and Y. Inada, “Evolution from 8QAM live traffic to PS 64-QAM with neural-network based nonlinearity compensation on 11000 km open subsea cable,” Optical Fiber Commun. Conf. (OFC), p. Th4D-5, Mar. 2018.
-  P. Li, L. Yi, L. Xue, and W. Hu, “56 Gbps IM/DD PON based on 10G-class optical devices with 29 dB loss budget enabled by machine learning,” Optical Fiber Commun. Conf. (OFC), p. M2B.2, Mar. 2018.
-  T. Koike-Akino, D. S. Millar, K. Parsons, and K. Kojima, “Fiber nonlinearity equalization with multi-label deep learning scalable to high-order DP-QAM,” SPPCom, p. SpM4G.1, July 2018.
-  T. Koike-Akino, Y. Wang, D.S. Millar, K. Kojima, and K. Parsons, “Neural turbo equalization to mitigate fiber nonlinearity,” European Conf. Optical Commun. (ECOC), p. Tu.1.B.1, Sep. 2019.
-  A. D. Ellis, J. Zhao, and D. Cotter, “Approaching the non-linear Shannon limit,” J. Lightw. Technol., vol. 28, no. 4, pp. 423–433, Aug. 2009.
-  M. Secondini, E. Forestieri, and G. Prati, “Achievable information rate in nonlinear WDM fiber-optic systems with arbitrary modulation formats and dispersion maps,” J. Lightw. Technol., vol. 31, no. 23, pp. 3839–3852, Dec. 2013.
-  J. Renaudier, G. Charlet, P. Tran, M. Salsi, and S. Bigo, “A performance comparison of differential and coherent detections over ultra long haul transmission of 10Gb/s BPSK,” Optical Fiber Commun. Conf. (OFC), p. OWM1, Mar. 2007.
-  N. Alić, G. C. Papen, R. E. Saperstein, L. B. Milstein, and Y. Fainman, “Signal statistics and maximum likelihood sequence estimation in intensity modulated fiber optic links containing a single optical preamplifier,” Opt. Express, vol. 13, no. 12, pp. 4568–4579, June 2005.
-  Y. Cai, D. G. Foursa, C. R. Davidson, J. X. Cai, O. Sinkin, M. Nissov, and A. Pilipetskii, “Experimental demonstration of coherent MAP detection for nonlinearity mitigation in long-haul transmissions,” Optical Fiber Commun. Conf. (OFC), p. OTuE1, Mar. 2010.
-  T. Koike-Akino, C. Duan, K. Parsons, K. Kojima, T. Yoshida, T. Sugihara, and T. Mizuochi, “High-order statistical equalizer for nonlinearity compensation in dispersion-managed coherent optical communications,” Opt. Expr., vol. 20, no. 14, pp. 15769–15780, July 2012.
-  I. B. Djordjevic, L. L. Minkov, and H. G. Batshon, “Mitigation of linear and nonlinear impairments in high-speed optical networks by using LDPC-coded turbo equalization,” IEEE JSAC, vol. 26, no. 6, pp. 73–83, Aug. 2008.
-  H. G. Batshon, I. B. Djordjevic, L. Xu, and T. Wang, “Iterative polar quantization based modulation to achieve channel capacity in ultra-high-speed optical communication systems,” IEEE Photon. Journal, vol. 2, no. 4, pp. 593–599, Aug. 2010.
-  C. Duan, K. Parsons, T. Koike-Akino, R. Annavajjala, K. Kojima, T. Yoshida, T. Sugihara, and T. Mizuochi, “A low-complexity sliding-window turbo equalizer for nonlinearity compensation,” Optical Fiber Commun. Conf. (OFC), p. JW2A.59, Mar. 2012.
-  K. V. Peddanarappagari and M. Brandt-Pearce, “Volterra series transfer function of single-mode fibers,” J. Lightw. Technol., vol. 15, no. 12, pp. 2232–2241, Dec. 1997.
-  F. P. Guiomar, J. D. Reis, A. Teixeira, and A. N. Pinto, “Mitigation of intra-channel nonlinearities using a frequency-domain Volterra series equalizer,” European Conf. Optical Commun. (ECOC), p. Tu.6.B.1, Sep. 2011.
-  F. P. Guiomar, S. B. Amado, N. J. Muga, J. D. Reis, A. L. Teixeira, and A. N. Pinto, “Simplified Volterra series nonlinear equalizer by intra-channel cross-phase modulation oriented pruning,” European Conf. Optical Commun. (ECOC), p. We3C6, Sep. 2013.
-  X. Li, X. Chen, G. Goldfarb, E. Mateo, I. Kim, F. Yaman, and G. Li, “Electronic post-compensation of WDM transmission impairments using coherent detection and digital signal processing,” Opt. Express, vol. 16, no. 2, pp. 880–888, Jan. 2008.
-  E. Ip and J. M. Kahn, “Compensation of dispersion and nonlinear impairments using digital backpropagation,” J. Lightw. Technol., vol. 26, no. 20, pp. 3416–3425, Oct. 2008.
-  E. Ip, N. Bai, and T. Wang, “Complexity versus performance tradeoff in fiber nonlinearity compensation using frequency-shaped, multi-subband backpropagation,” Optical Fiber Commun. Conf. (OFC), p. OThF4, Mar. 2011.
-  W. Yan, Z. Tao, L. Dou, L. Li, S. Oda, T. Tanimura, T. Hoshida, and J. C. Rasmussen, “Low complexity digital perturbation back-propagation,” European Conf. Optical Commun. (ECOC), p. Tu.3.A.2, Sep. 2011.
-  N. Irukulapati, H. Wymeersch, and P. Johannisson, “Extending digital backpropagation to account for noise,” European Conf. Optical Commun. (ECOC), p. We.3.C.4, Sep. 2013.
-  X. Xie, F. Yaman, X. Zhou, and G. Li, “Polarization demultiplexing by independent component analysis,” IEEE Photon. Technol. Lett., vol. 22, no. 11, pp. 805–807, June 2010.
-  M. A. Jarajreh, E. Giacoumidis, I. Aldaya, S. T. Le, A. Tsokanos, Z. Ghassemlooy, and N. J. Doran, “Artificial neural network nonlinear equalizer for coherent optical OFDM,” IEEE Photon. Technol. Lett., vol. 27, no. 4, pp. 387–390, Dec. 2014.
-  E. Giacoumidis, S. T. Le, M. Ghanbarisabagh, M. McCarthy, I. Aldaya, S. Mhatli, M. A. Jarajreh, P. A. Haigh, N. J. Doran, A. D. Ellis, and B. J. Eggleton, “Fiber nonlinearity-induced penalty reduction in CO-OFDM by ANN-based nonlinear equalization,” Opt. Lett., vol. 40, no. 21, pp. 5113–5116, Nov. 2015.
-  H. Zhao and J. Zhang, “Adaptively combined FIR and functional link artificial neural network equalizer for nonlinear communication channel,” IEEE Trans. Neural Networks, vol. 20, no. 4, pp. 665–674, Feb. 2009.
-  D. J. Sebald and J. A. Bucklew, “Support vector machine techniques for nonlinear equalization,” IEEE Trans. Signal Processing, vol. 48, no. 11, pp. 3217–3226, Nov. 2000.
-  E. Giacoumidis, S. Mhatli, T. Nguyen, S. T. Le, I. Aldaya, M. McCarthy, and B. Eggleton, “Kerr-induced nonlinearity reduction in coherent optical OFDM by low complexity support vector machine regression-based equalization,” Optical Fiber Commun. Conf. (OFC), p. Th2A-49, Mar. 2016.
-  K.-P. Ho and J. M. Kahn, “Electronic compensation technique to mitigate nonlinear phase noise,” J. Lightw. Technol., vol. 22, no. 3, pp. 779–783, Mar. 2004.
-  S. Chen, B. Mulgrew, and P. M. Grant, “A clustering technique for digital communications channel equalization using radial basis function networks,” IEEE Trans. Neural Networks, vol. 4, no. 4, pp. 570–590, July 1993.
-  T. Koike-Akino, K. Kojima, D. S. Millar, K. Parsons, Y. Miyata, W. Matsumoto, and T. Mizuochi, “Cycle slip-mitigating turbo demodulation in LDPC-coded coherent optical communications,” Optical Fiber Commun. Conf. (OFC), p. M3A-3, Mar. 2014.
-  T. Koike-Akino, “Perspective of statistical learning for nonlinear equalization in coherent optical communications,” SPPCom, p. ST2D-2, July 2014.
-  T. Yoshida, K. Matsuda, K. Kojima, H. Miura, K. Dohi, M. Pajovic, T. Koike-Akino, D. S. Millar, K. Parsons, and T. Sugihara, “Hardware-efficient precise and flexible soft-demapping for multi-dimensional complementary APSK signals,” European Conf. Optical Commun. (ECOC), p. Th.2.P2.SC3.27, Sep. 2016.
-  C. Häger and H. D. Pfister, “Nonlinear interference mitigation via deep neural networks,” Optical Fiber Commun. Conf. (OFC), p. W3A.4, Mar. 2018.
-  D. S. Millar, R. Maher, D. Lavery, T. Koike-Akino, M. Pajovic, A. Alvarado, M. Paskov, K. Kojima, K. Parsons, B. C. Thomsen, S. J. Savory, and P. Bayvel, “Design of a 1 Tb/s superchannel coherent receiver,” J. Lightw. Technol., vol. 34, no. 6, pp. 1453–1463, Mar. 2016.
-  S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE TCOM, vol. 52, no. 4, pp. 670–678, May 2004.