## I Introduction

The growing demand for reliable long-distance communication makes it essential to determine the capacity limits of optical links [Monroe2016, Agrell:2016, Essiambre:2010]. The emergence of coherent optical communication systems enabled DSP and polarization-division multiplexing to achieve higher spectral efficiencies. However, higher data rates come with increased sensitivity to impairments such as PMD and SOP fluctuations, which must be tracked dynamically in the receiver [Savory:2010]. Due to the random variation of both internal and environmental impairments, the SOP fluctuates randomly with time [Poole:1988]. Previous long-term measurements showed that the SOP drift might vary from days and hours in buried fibers [Karlsson:2000, Allen:2003] to microseconds in aerial fibers [Waddy:2001, Krummrich:2016, Zhang:2006].

The conventional DSP solutions for SOP tracking include both blind and data-aided algorithms. The CMA [Godard:1980], thanks to its low complexity and tolerance to PN, has been widely considered for blind polarization tracking [Savory:2007, Savory:2008, Kikuchi:2008]. Many studies have been conducted to overcome the so-called “singularity” problem of CMA [Kikuchi:2011, Liu:2009, Faruk:2010], where the two estimated channels converge to the same polarization. The RDE and its variants were proposed to account for higher-order QAM in [Ready_RDE:1990, Fatadin_RDE:2009, Lavery_RDE:2015]. The MMA [Yang_MMA:2002] is also applicable to higher-order modulations. More reliable convergence can be obtained using data-aided algorithms such as LMS, which was adopted for SOP estimation by [Savory:2008, Fludger_LMS:2008], or standard LS, which has been extensively studied for both wireless and optical applications [Barhumi:2003, Ip:2007, Shieh:2008, Kuschnerov:2010]. While the LMS algorithm performs real-time continuous equalization, LS applies to block-based estimation.

While the DP channel can be represented by a unitary matrix, the majority of the available estimation techniques including LS and LMS have no unitary constraint on the estimated channel, leading to sub-optimal solutions. This makes room for algorithms that provide a unitary estimation of the DP channel. Louchet et al. [Louchet:2014] proposed the Kabsch algorithm [Kabsch:1976] as a joint blind PN and PMD estimation method. In [Czegledi:2016], a blind modulation-format-independent joint PN and polarization tracking algorithm was proposed, and its performance was compared with the blind Kabsch algorithm.

Depending on the estimation algorithm, the speed of the fluctuations, or the additive noise, the polarization tracking might be imperfect. This makes it relevant to ask how the capacity is affected by such an estimation error. While, to the best of our knowledge, there are no capacity studies of polarization drift channels, the literature contains many fiber capacity results. The first finite capacity limit for the single-wavelength optical channel, based on a kind of GN model, was introduced in [Splett:1994, Kurtzke:1995], showing a lower bound that increases with the power until it reaches a maximum and then decreases due to the NLI. Although different nonlinear models were applied, similar results were reported in [Green:2002, Mitra:2002]. Since the GN model cannot truly describe the NLI [Agrell:2014], some authors modeled the NLI as a linear time-variant distortion that is highly correlated over time [Secondini:2012, Secondini:2013]. This made it possible to counteract NLI using the techniques that were conventionally used for linear impairments (e.g., PN, chromatic dispersion, PMD, and SOP [Secondini:2014, Dar:2014_1, Dar:2015]). For example, in [Dar_BlockWise:2014], a tighter lower bound for single-polarization, dispersion-unmanaged systems was achieved by considering the PN.

In this paper, for the first time, we study the capacity of the polarization drift channel in the presence of imperfect knowledge of the channel (e.g., imperfect polarization tracking). Using the mismatched decoding method, we derive an AIR, which is a lower bound on the capacity, in the presence of a channel estimation error. We show that a unitary estimation of the channel leads to a tighter lower bound, which makes it reasonable to seek unitary estimators. A data-aided version of the Kabsch algorithm is proposed for DP channel estimation, for the first time to the best of our knowledge. We compare Kabsch with an LS algorithm in terms of AIR and show that Kabsch outperforms LS throughout the range of considered SNR. For instance, with only eight pilot symbols per channel and block, Kabsch improves the AIR by at least bits per symbol compared to LS.

Notation:

Column vectors are denoted by underlined letters

and matrices by uppercase roman letters . We use bold-face letters for random quantities and the corresponding nonbold letters for their realizations. pdf are denoted as and conditional pdf as, where the subscripts will sometimes be omitted if they are clear from the context. Expectation over random variables is denoted by

. Sets are indicated by uppercase calligraphic letters. The complex zero-mean circularly symmetric Gaussian distribution of a vector is denoted by

, where is the covariance matrix. All logarithms are in base two. In the context of matrix operations, , , , and represent the determinant, transpose, conjugate transpose, and Frobenius norm operators, respectively.

## II Channel Model and Mutual Information

### II-A Channel Model

We consider transmission over channels in the presence of ASE noise at the receiver. The channel is assumed to be constant during a transmission block of length , and to change randomly and independently between the blocks. The assumption that the channel does not change within a block is consistent with the fact that the SOP drifts at a much slower rate than typical transmission rates in optical links [Karlsson:2000, Allen:2003] and is well established in the optical communications literature [Dar:2014_1, Dar_BlockWise:2014]. The block length is chosen based on the application and the drift speed of the channel. In most optical communication systems, there is no feedback channel between the transmitter and the receiver. Therefore, the problem of input distribution optimization, corresponding to the case where the channel is known at the transmitter, is not investigated. It is assumed that PMD is negligible and that all channel impairments, including nonlinearities and chromatic dispersion, are ideally compensated, with the exception of polarization fluctuation and ASE noise, which are modeled by a unitary matrix and AWGN, respectively.

Since different blocks are independent, we just model the symbols in one transmission block in the following. The transmitted signal in each channel at time is an -dimensional random vector that takes on values from a set of zero-mean constellation points. After filtering and resampling the received signals into one sample per symbol, the vector of received samples can be expressed as

(1) |

where the matrix represents a MIMO channel and denotes the complex ASE noise samples at time , which is assumed to be and independent of . In the remainder of this paper, we will omit the time index explicitly for notational convenience. This is possible because the input and noise are independent and identically distributed (i.i.d.) over time. We define the covariance matrix of the input vector as

(2) |

In order to maintain the generality of the results, they are given for an arbitrary number of channels whenever possible. Thus, the MIMO-AWGN channel (1) can describe a wide range of applications and impairments. However, for the purpose of DP optical channel modeling, we are particularly interested in the special cases of and being unitary, denoted by .

### II-B Mutual Information with Perfect Knowledge of Channel at the Receiver

The conditional MI between two random vectors , , when the channel is given, is defined as [CoverBook, Eq. 2.61]

(3) |

The capacity of this channel under an average power constraint is [Shannon:1948]

(4) |

where is the transmission power constraint. For the MIMO-AWGN channel in (1), the channel law given and is characterized by the pdf

(5) |

Then, the pdf of the channel output can be calculated as

(6) |

Given , the capacity-achieving distribution of the MIMO-AWGN channel law (5) is [Telatar:1999]. Therefore, assuming inputs,

(7) |

where

(8) |

is the covariance matrix of the received samples when the channel is given. The MI of a general MIMO system for a given channel is [Telatar:1999]

(9) |

where

(10) |

and hence , where is the SNR of each channel. If the channel matrix is confined to the set of unitary matrices (i.e., ), (9) gives

(11) |

The capacity of a unitary MIMO-AWGN channel is the supremum of (11) for all possible . In general, needs to be optimized for each realization of the channel , if it is known at the transmitter; however, based on (11), the MI is independent of . Since maximizing (11) is equivalent to maximizing and is positive definite, the nondiagonal elements of must be zero, yielding

to be a diagonal positive definite matrix. From the well-known theorem that the geometric mean is always upper-bounded by the arithmetic mean, it is straight-forward to show that uniform power distribution at the transmitter (i.e.,

) maximizes (11). Thus, the capacity of an -dimensional unitary MIMO-AWGN channel, still assuming that the channel is perfectly known at the receiver, is

(12) |

The capacity of the DP channel is given by simply setting in (12).
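The claim that the MI does not depend on which unitary matrix the channel is can be checked numerically. The sketch below assumes that (9) takes the standard log-det form of [Telatar:1999] with uniform power and white noise, so that a unitary channel with per-channel SNR `snr` gives `n * log2(1 + snr)` bits per vector symbol. The function and variable names are illustrative and not taken from the paper.

```python
import numpy as np


def random_unitary(n, rng):
    """Draw an n x n unitary matrix via QR of a complex Gaussian matrix."""
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, r = np.linalg.qr(a)
    # Rescale each column by a unit-modulus phase; q stays unitary.
    d = np.diag(r)
    return q * (d / np.abs(d))


def mimo_mi_bits(H, snr):
    """log2 det(I + snr * H H^dagger): assumed log-det MI with uniform power."""
    n = H.shape[0]
    _, logdet = np.linalg.slogdet(np.eye(n) + snr * (H @ H.conj().T))
    return logdet / np.log(2)


def unitary_capacity_bits(n, snr):
    """Closed form assumed for a unitary channel: n * log2(1 + snr)."""
    return n * np.log2(1 + snr)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    snr = 10.0  # linear per-channel SNR (10 dB)
    for _ in range(3):
        H = random_unitary(2, rng)
        print(mimo_mi_bits(H, snr), unitary_capacity_bits(2, snr))
```

Each printed pair should agree to numerical precision, since `H @ H.conj().T` is the identity for any unitary `H`.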

For a uniform input distribution over a given constellation , the integral in (6) must be replaced by a summation over all the constellation points, i.e.,

(13) |

where

denotes the probability mass function of the input vector and

is defined in (5). Using (3) and (13), the MI of the channel for a uniformly distributed discrete input can be expressed as

(14) |

where the expectations can be estimated numerically.
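As an illustration of estimating these expectations numerically, the following minimal sketch computes a Monte Carlo estimate of the MI for a uniformly distributed discrete input when the receiver knows the channel. It assumes unit-energy DP-QPSK symbols, white complex Gaussian noise with variance `sigma2` per channel, and an SNR defined as `1 / sigma2`; these conventions and all names are assumptions made for the example.

```python
import numpy as np


def dp_qpsk_points():
    """All 16 dual-polarization QPSK vectors, unit energy per polarization."""
    qpsk = np.array([1 + 1j, 1 - 1j, -1 + 1j, -1 - 1j]) / np.sqrt(2)
    a, b = np.meshgrid(qpsk, qpsk, indexing="ij")
    return np.stack([a.ravel(), b.ravel()], axis=1)  # shape (16, 2)


def mi_discrete(H, points, snr_db, num_samples=20000, seed=0):
    """Monte Carlo MI (bits per vector symbol) with the channel known at the receiver."""
    rng = np.random.default_rng(seed)
    m, n = points.shape
    sigma2 = 10 ** (-snr_db / 10)  # assumed: per-channel noise variance
    x = points[rng.integers(m, size=num_samples)]
    z = np.sqrt(sigma2 / 2) * (
        rng.standard_normal((num_samples, n))
        + 1j * rng.standard_normal((num_samples, n))
    )
    y = x @ H.T + z  # each row is H x + z
    candidates = points @ H.T  # noiseless outputs for every constellation point
    d2 = np.sum(np.abs(y[:, None, :] - candidates[None, :, :]) ** 2, axis=2)
    d2_true = np.sum(np.abs(z) ** 2, axis=1)  # |y - H x|^2 for the sent point
    # log of sum_x' p(y|x') / p(y|x), evaluated with log-sum-exp for stability
    expo = -(d2 - d2_true[:, None]) / sigma2
    mx = expo.max(axis=1, keepdims=True)
    lse = mx[:, 0] + np.log(np.sum(np.exp(expo - mx), axis=1))
    return np.log2(m) - np.mean(lse) / np.log(2)


if __name__ == "__main__":
    H = np.eye(2, dtype=complex)
    for snr_db in (-5.0, 5.0, 15.0):
        print(snr_db, mi_discrete(H, dp_qpsk_points(), snr_db))
```

The estimate saturates at `log2(16) = 4` bits per vector symbol at high SNR, as expected for a 16-point constellation.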

## III The Mutual Information in the Presence of Channel Estimation Error

We derived the capacity of the DP channel with the assumption of perfect channel knowledge at the receiver in (12). In this section, we derive a lower bound on the MI of the channel in the presence of an estimation error. As already shown in [Telatar:1999], when the channel is known, the capacity-achieving distribution is a zero-mean input with a power constraint. Thus, keeping the capacity-achieving distribution, i.e., seems reasonable for the imperfectly estimated channel as well. To derive the AIR , which is a lower bound on the MI (3) between and , the mismatched decoding inequality [Agrell_Duality_2017]

(15) |

is used, where stands for the pdf of an auxiliary channel. Note that this inequality holds for an arbitrary distribution of . The mismatched channel law is here assumed to be

(16) |

which is obtained by replacing in (5) with the estimated channel . This leads to

(17) |

Also, the covariance matrix of the auxiliary channel’s output is obtained by replacing with in (8), i.e.,

(18) |

The average AIR when the estimated channel is random can be written as

(19) |

In the following, we first derive an AIR for the MIMO-AWGN channel model with a fixed channel and estimated channel . Then, we extend the derived AIR to -dimensional unitary channels. By assuming a unitary estimate of the channel (i.e., ), a tighter AIR is derived. Finally, we use (19) to consider a random estimated channel .

###### Theorem 1

###### Proof:

One may rewrite (15) as

(21) |

Using (7) and (17), given , the first term can be calculated as

(22) |

where the last step follows the cyclic permutation rule of the trace. For the second term, , when is given, we use (5) and (16) to obtain

(23) | ||||

(24) | ||||

(25) |

where (23) follows from the fact that and are independent, and the cyclic permutation rule of the trace is used in (24). Finally, substituting (22) and (25) into (21) completes the proof.
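The mismatched decoding bound can also be evaluated by simulation, without a closed-form expression. The sketch below is one way to do this under stated assumptions: unit-variance circularly symmetric Gaussian inputs with identity covariance, white noise of variance `sigma2` per channel, and an auxiliary channel that is Gaussian with mean given by the estimated channel. It averages the log-ratio of the auxiliary conditional and output pdfs over samples drawn from the true channel. All names are illustrative.

```python
import numpy as np


def log2_cn_pdf(v, cov):
    """log2 pdf of a zero-mean circularly symmetric complex Gaussian; rows of v are samples."""
    n = cov.shape[0]
    _, logdet = np.linalg.slogdet(cov)
    quad = np.real(np.einsum("si,ij,sj->s", v.conj(), np.linalg.inv(cov), v))
    return (-quad - logdet - n * np.log(np.pi)) / np.log(2)


def air_gaussian_mc(H, H_est, snr_db, num_samples=50000, seed=0):
    """Monte Carlo AIR (bits per vector symbol) for a decoder that assumes H_est."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    sigma2 = 10 ** (-snr_db / 10)  # assumed: per-channel noise variance

    def cn(var):
        shape = (num_samples, n)
        return np.sqrt(var / 2) * (
            rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
        )

    x = cn(1.0)  # Gaussian input with identity covariance
    y = x @ H.T + cn(sigma2)  # samples from the true channel
    eye = np.eye(n)
    # Auxiliary channel law: y given x is Gaussian around H_est x
    log_q_cond = log2_cn_pdf(y - x @ H_est.T, sigma2 * eye)
    # Auxiliary output pdf: zero-mean Gaussian with covariance H_est H_est^dagger + sigma2 I
    log_q_out = log2_cn_pdf(y, H_est @ H_est.conj().T + sigma2 * eye)
    return np.mean(log_q_cond - log_q_out)


if __name__ == "__main__":
    H = np.eye(2, dtype=complex)
    H_bad = H + 0.2 * np.ones((2, 2))  # hypothetical estimation error
    print(air_gaussian_mc(H, H, 10.0), air_gaussian_mc(H, H_bad, 10.0))
```

With `H_est` equal to `H`, the estimate approaches the MI of the matched channel; any mismatch can only lower the expected value, which is what makes the AIR a lower bound.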

To keep Theorem 1 as general as possible, no assumption is made about the transmitter’s knowledge of the estimated channel. However, in the context of optical communications, we are more interested in the case where the transmitter has no channel knowledge. When the channel is not known at the transmitter, the only reasonable choice for is to apply a uniform power distribution across the channels.

###### Corollary 1

Assume that the channel is unitary (i.e., ) and that uniform power distribution takes place at the transmitter. Then the AIR of a unitary channel can be written as

(26) |

###### Proof:

###### Corollary 2

Assume that the channel is a fixed unitary matrix

and that the estimated channel is an arbitrary random matrix

. Then with a uniform power distribution, the average AIR is(27) |

where

(28) |

###### Corollary 3

If has a spherically symmetric distribution, then (27) gives the same for any .

###### Proof:

Since , we can write

(29) |

and since has a spherically symmetric distribution, it is invariant to rotation. Therefore, for any , has the same distribution as and has the same distribution as . Thus, (27) yields the same AIR independently of .

We have not made any assumption on the estimation technique, so the derived lower bounds hold for an arbitrary estimator. It can be seen that (27) depends strongly on the choice of estimation technique, so one can tighten the bound by choosing a suitable estimator.

###### Corollary 4

For a complex unitary channel, a unitary estimated channel, and a uniform power distribution, the AIR is

(30) |

###### Proof:

By applying to (27), the proof is complete.

Note that (30) is independent of , meaning that the AIR is the same for any unitary channel. Interestingly, the unitary estimation of the channel removes two terms on the right-hand side of (27), leading to a simpler bound.

For uniformly distributed discrete inputs, the pdf of the output of the auxiliary channel is

(31) |

where is defined in (16). Using (15), (16), and (31), the average AIR for a uniformly distributed discrete input can be expressed as

(32) |

where the expectation is over , , and , which can be estimated numerically.

## IV Channel Estimation

In this section, first, the well-established LS estimation algorithm is presented. Then, a unitary estimation method is proposed for unitary channels. As illustrated in Fig. 1, to make data-aided channel estimation possible, the first symbols of the transmission block are pilot symbols , which are assumed to be known at both transmitter and receiver. The optimal pilot assignment should have the following properties [Hassibi:2003]:

The complex matrix of received symbols is

(33) |

where is an matrix of i.i.d. noise samples and is an random unitary channel matrix, which is assumed to remain constant during a transmission block.

Note that the channel estimation problem can be translated to finding a certain number of independent real values, often regarded as the DOF and denoted by .

### IV-A LS Algorithm

Conventional optical transmission systems often use LMS for a real-time estimate of the channel, because LMS tracks the channel change with each received symbol. However, in this paper the channel is assumed to be constant during a transmission block, and the LS method is well suited to block-based transmission. The LS estimator estimates the channel by minimizing the squared error between the desired and received signals. For a MIMO-AWGN channel, the LS optimization problem can be expressed as [LS_Biguesh_2006]

(34) |

Knowing and , the solution of (34) is [LS_Biguesh_2006],[Kay:1993, Ch. 8]

(35) |

However, the pilots are chosen in such a way that . Therefore, we can write

(36) |

It can be shown that for the LS algorithm is [LS_Biguesh_2006]

(37) |

showing that the estimation error of the LS algorithm is inversely proportional to the SNR and the pilot length .

It is worth mentioning that the LS algorithm has to find independent real values to make an estimate.
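A minimal sketch of block-based LS estimation follows. It assumes the standard pilot model in which the received pilot block equals the channel times the known pilot block plus noise, and the textbook LS solution given by the received block times the pseudoinverse of the pilot block. The orthogonal QPSK pilot pattern and all function names are hypothetical choices for illustration, not the paper's pilot design.

```python
import numpy as np


def orthogonal_pilots(num_pilots):
    """2 x Np block of unit-energy QPSK pilots with orthogonal rows (Np even)."""
    s = (1 + 1j) / np.sqrt(2)
    row1 = np.full(num_pilots, s)
    row2 = s * np.where(np.arange(num_pilots) % 2 == 0, 1.0, -1.0)
    return np.vstack([row1, row2])


def ls_estimate(Y, X):
    """General LS estimate: Y X^dagger (X X^dagger)^{-1}, computed via the pseudoinverse."""
    return Y @ np.linalg.pinv(X)


def ls_estimate_orthogonal(Y, X):
    """Simplified LS estimate, valid only when X X^dagger = Np * I."""
    return (Y @ X.conj().T) / X.shape[1]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = orthogonal_pilots(8)
    H = np.array([[0.6, 0.8j], [0.8j, 0.6]])  # an example unitary channel
    sigma2 = 0.1  # assumed per-channel noise variance (10 dB SNR)
    Z = np.sqrt(sigma2 / 2) * (
        rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)
    )
    H_ls = ls_estimate(H @ X + Z, X)
    print(np.linalg.norm(H_ls - H) ** 2)  # squared Frobenius estimation error
```

Note that nothing in either function forces the estimate to be unitary, which is the limitation the unitary-constrained estimator addresses.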

### IV-B Kabsch Algorithm

The problem with both blind estimation algorithms (e.g., CMA, RDE, and MMA) and pilot-aided estimation algorithms (e.g., LS and LMS) is that their optimization problems are not adapted to unitary channel estimation, making their solutions suboptimal if is known to be unitary. Thus, in this part, we apply the unitary constraint of the channel to the estimation problem of (34) and write

(38) |

The optimal solution to this problem is given by the Kabsch algorithm [Kabsch:1976] as

(39) |

where

is the singular value decomposition function of

. As can be seen, unlike the conventional blind estimators, Kabsch is independent of the modulation format of the transmission. The Kabsch algorithm was proposed for optical communication by Louchet et al. [Louchet:2014] as a blind polarization tracking algorithm, where decision-directed symbols were used instead of pilots.

In contrast to LS, no analytical result is known for of the Kabsch algorithm. Although we cannot analytically prove it, we can make an intuitive prediction by considering that for a unitary estimation of the channel while for a general estimation of the channel . Since the channel estimation problem is equivalent to finding independent real-valued quantities, one can predict that the estimation error of Kabsch would be half that of LS. More interestingly, for special unitary channels where , the DOF is and the gain of the unitary algorithm can be even higher; however, this gain vanishes for large .
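The data-aided Kabsch step and the predicted error ratio can be sketched as follows. The example assumes the usual solution of the unitary-constrained least-squares (Procrustes) problem: take the SVD of the received pilot block times the conjugate transpose of the pilot block, and multiply the left and right singular-vector factors. The pilot pattern, the SNR convention, and all names are assumptions for illustration; the LS estimator is included only as a baseline.

```python
import numpy as np


def orthogonal_pilots(num_pilots):
    """2 x Np block of unit-energy QPSK pilots with orthogonal rows (Np even)."""
    s = (1 + 1j) / np.sqrt(2)
    row2 = s * np.where(np.arange(num_pilots) % 2 == 0, 1.0, -1.0)
    return np.vstack([np.full(num_pilots, s), row2])


def kabsch_estimate(Y, X):
    """Unitary estimate U V^dagger from the SVD of Y X^dagger."""
    U, _, Vh = np.linalg.svd(Y @ X.conj().T)
    return U @ Vh


def ls_estimate(Y, X):
    """Unconstrained LS baseline."""
    return Y @ np.linalg.pinv(X)


def random_unitary(n, rng):
    a = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    q, _ = np.linalg.qr(a)
    return q


def mean_sq_errors(snr_db, num_pilots=8, trials=500, seed=0):
    """Average squared Frobenius error of LS and Kabsch over random unitary channels."""
    rng = np.random.default_rng(seed)
    X = orthogonal_pilots(num_pilots)
    sigma2 = 10 ** (-snr_db / 10)  # assumed: per-channel noise variance
    err_ls = err_kabsch = 0.0
    for _ in range(trials):
        H = random_unitary(2, rng)
        Z = np.sqrt(sigma2 / 2) * (
            rng.standard_normal(X.shape) + 1j * rng.standard_normal(X.shape)
        )
        Y = H @ X + Z
        err_ls += np.linalg.norm(ls_estimate(Y, X) - H) ** 2
        err_kabsch += np.linalg.norm(kabsch_estimate(Y, X) - H) ** 2
    return err_ls / trials, err_kabsch / trials


if __name__ == "__main__":
    ls_err, kabsch_err = mean_sq_errors(snr_db=10.0)
    print(ls_err, kabsch_err, kabsch_err / ls_err)
```

The printed ratio gives an empirical check of the intuitive DOF argument; the exact value depends on the SNR, the pilot length, and the random seed.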

Note that in the case of the DP channel, the singular value decomposition is applied only to a two-by-two matrix, making it less computationally complex than for higher-dimensional channels.

## V Numerical Results

In this section, through Monte Carlo trials, the AIR of the DP channel (i.e., ) for the estimation algorithms detailed in Section IV is computed. While AIRs are derived for a fixed channel matrix , the estimated channel is dependent on each realization of the channel. A deterministic sequence of QPSK symbols is selected to satisfy the pilot conditions detailed in Section IV.

Numerical results verify that the estimation error of the Kabsch algorithm completely follows our prediction in Sec. IV-B. Thus, to perform a fair comparison between the unitary and nonunitary estimators, we define a new parameter called estimation error per DOF as . Note that for a general nonunitary estimator and for a general unitary estimator .

Using inputs, Fig. 2 shows the Monte-Carlo averaged AIR of the DP channel according to (27) and (30) for . Note that based on Corollary 3, for an estimation error with a spherically symmetric distribution, (27) is independent of . For the blue curves, we have assumed that is distributed according to with . For the dashed magenta curves, is unitary and . The results illustrate that when we have the same , the unitary estimate of the channel leads to a higher AIR. In addition, it can be seen that for a constant estimation error per DOF , the average AIR reaches a maximum for a specific SNR and then decreases. This can be justified according to the right-hand side of (27) and (30), where the first term increases logarithmically with SNR, but the second term decreases linearly with SNR. Therefore, there is an optimum SNR that maximizes the AIR. This behavior comes from the assumption of a constant regardless of the SNR, which, as we shall see next, is not fully realistic.

Fig. 3(a) presents the AIR of LS and Kabsch when inputs are used. The solid red line indicates the MI when the receiver perfectly knows the channel. It can be concluded that with , Kabsch surpasses LS throughout the range of considered SNRs. More specifically, at and dB SNRs (see insets), Kabsch has at least and bits per symbol higher AIR, respectively. Additionally, it is clear from the results that LS is upper-bounded by Kabsch, and increasing the SNR cannot close the gap. This behavior can be justified because, unlike LS, Kabsch guarantees a unitary estimation of the channel, leading to a lower estimation error. Moreover, as the SNR increases, the gaps from Kabsch and LS to the actual channel MI are almost constant. Since the theoretical gap is according to Corollary 4, we conclude that the error covariance of Kabsch is inversely proportional to the SNR, i.e., . Unlike in Fig. 2, the AIR bounds increase monotonically with SNR because the estimation error decreases with SNR.

A comparison between LS and Kabsch with DP--QAM inputs is provided in Fig. 3(b). The MI when the channel is perfectly known at the receiver is marked by the solid red line. Evidently, Kabsch outperforms LS throughout the considered range of SNRs. The results also support the fact that Kabsch upper-bounds LS for various inputs, which completely agrees with Fig. 2, where the unitary estimation of the channel leads to a higher AIR.

Fig. 4 displays the information gap between the AIRs and the capacity of the DP channel, when (a) inputs and (b) DP--QAM inputs are used. The overall dominance of Kabsch throughout the range of considered SNRs can be easily verified. Evidently, it is beneficial to use higher pilot numbers at low SNRs. Given the fact that for LS (37) and that Kabsch in Fig. 4 follows the same trend with respect to , we can empirically conclude that of Kabsch is also inversely proportional to the pilot length and the SNR.

It is beneficial to use Kabsch to limit the rate loss due to the pilots. For example, a reasonable performance is achieved by only a block of pilot symbols, which is relatively small compared to the transmission block size in optical communication systems, implying that the rate loss is negligible. For instance, for a system operating at a rate of Gbaud, even if the channel remains constant for one microsecond (i.e., the SOP drift time is one microsecond), it corresponds to a block length of at least symbols and the rate loss due to pilots is negligible.

## VI Conclusion

The capacity of unitary MIMO-AWGN channels, and specifically of the DP channel, was investigated. With perfect channel knowledge, the capacity of unitary channels is the same as for the regular AWGN channel. An AIR with imperfect channel knowledge was derived, showing that the AIR is highly dependent on the estimation algorithm. In the case of unitary channels, higher AIRs are obtained with a unitary estimation of the channel. The bounds are derived for any number of dimensions, meaning that the results can be applied to other optical channels. In particular, Theorem 1 can be directly applied to space-division multiplexed channels that are impaired by polarization- and mode-dependent loss. In contrast to the conventional estimation algorithms, the proposed data-aided Kabsch algorithm ensures a unitary estimate of the channel. Numerical results showed that for various input distributions, Kabsch outperforms LS in terms of AIR. Also, like LS, Kabsch can perform very well with only a few pilot symbols, making the transmission rate loss due to the pilots negligible.