Polar codes were introduced in [1, 2]. They are the first class of codes that achieve the capacity of binary-input discrete memoryless channels with a deterministic construction. In [3], it was shown that polarization also takes place for non-stationary channels. In this paper, we consider parallel binary-input additive white Gaussian noise (BiAWGN) channels with an average power constraint [4, Section 9.4]. Parallel channels naturally arise for orthogonal frequency-division multiplexing (OFDM) transceivers, where a time-frequency resource block has multiple channels of different quality. The model also describes block-fading channels. Polar codes are a natural choice for parallel channels because the different channels can be interpreted as being pre-polarized.
To develop a basic understanding, we consider the special case of two parallel BiAWGN channels. We address the following two questions: 1) How should the codeword bits be mapped to channels of different quality, or equivalently, how should one design an interleaver between the codeword bits and the channel? This has been partially addressed in the literature, e.g., [5, 6]. Both papers propose a sorted mapping that combines two different channels such that each kernel gets one instance of both channels. We also use this mapping, but we show that it does not necessarily minimize the frame error rate (FER). 2) How should power be allocated for good finite-length performance? To the best of our knowledge, this has not yet been considered in the literature. We show that the information-theoretic approach of maximizing the achievable rate, also known as mercury/waterfilling [7], is suboptimal in terms of FER for finite-length polar codes.
We denote random variables by capital letters and deterministic variables or realizations by small letters. Deterministic vectors are denoted by a bold italic font with small letters, while we use a bold italic font with capital letters for deterministic matrices and random vectors.
II-B System Model
Consider parallel BiAWGN channels
where , , , and denote the receive signal, the channel coefficient, the transmit signal, and additive white Gaussian noise with , respectively. For simplicity, we consider only parallel channels and . We assume that the channel coefficients are known to the encoder and decoder. The input signals are scaled BPSK symbols, i.e., we have
The value is the power of the transmit signal . We consider a common power constraint (see [4, Section 9.4])
We combine uses of each of the two channels to a block of channel uses.
II-C Polar Codes
Polar codes are linear block codes described by three parameters : the block length , dimension , and a set of information bits with . The code rate is . The input has an information bit at position if , and zeros at the remaining positions, i.e., . These bits are called frozen. The codeword is generated from by
denotes the -th Kronecker power of . The codeword is mapped to BPSK transmit symbols which are transmitted over the channel and received as the vector .
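As an illustration, the encoding step can be sketched in a few lines. This is a minimal sketch, not the implementation used for the results: it assumes the standard 2x2 kernel [[1, 0], [1, 1]], a non-bit-reversed generator matrix, and the BPSK convention 0 -> +sqrt(P), 1 -> -sqrt(P). The function names and the `power` parameter are hypothetical.

```python
import numpy as np

def polar_transform_matrix(m):
    """Return the m-th Kronecker power of the assumed kernel [[1, 0], [1, 1]]."""
    kernel = np.array([[1, 0], [1, 1]], dtype=int)
    g = np.array([[1]], dtype=int)
    for _ in range(m):
        g = np.kron(kernel, g)
    return g

def polar_encode(info_bits, info_set, n):
    """Put info bits on the (sorted) information positions, freeze the rest to
    zero, and multiply by the generator matrix over GF(2)."""
    m = n.bit_length() - 1
    if 2 ** m != n:
        raise ValueError("block length must be a power of two")
    u = np.zeros(n, dtype=int)
    u[sorted(info_set)] = info_bits
    return (u @ polar_transform_matrix(m)) % 2

def bpsk(codeword, power):
    """Map bits to scaled BPSK symbols (assumed convention: 0 -> +sqrt(P))."""
    return np.sqrt(power) * (1 - 2 * np.asarray(codeword))
```

For example, `polar_encode([1], {3}, 4)` places the single information bit on the last input position; all other inputs are frozen to zero.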
With successive cancellation (SC) decoding, the information bits are estimated using the received vector and the estimates of the previous bits. The frozen bits are decoded to zero. The mutual information (MI) terms specify the maximum transmission rate over virtual channels whose input is the current bit, whose output is the received vector, and for which the previous bits are known. These MI terms polarize to being either close to one or close to zero for large block lengths. Thus, polar codes are often seen as a transformation of channel uses into virtual channels with MI either close to one or close to zero. The fraction of virtual channels with MI close to one approaches the capacity of the original channel for large block lengths, and thus polar codes are capacity achieving.
The positions in with smallest MI are frozen. Polar code design consists of finding these positions. We use density evolution [8, 9] with a Gaussian approximation  to estimate the bit reliabilities.
The MI terms can be calculated recursively using the transform depicted in Fig. 1. The values are given by:
denotes the probability that the first bit error of a block occurs at bit (i.e., the probability that the SC decoder makes the wrong decision for bit given that all previous decisions were correct), and
denotes the tail distribution function of the normal distribution. The functions in (8) can be approximated numerically.
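The recursion and the resulting FER estimate can be sketched as follows. This is a sketch under stated assumptions, not the exact expressions (7) and (8): it uses a widely used closed-form fit of the J-function (the MI of a BiAWGN channel as a function of the LLR standard deviation) with constants that are assumed here, models every virtual channel as Gaussian, pairs positions j and j + N/2 at the first polarization level (non-bit-reversed representation), and estimates the FER as one minus the product of the per-bit success probabilities. All function names are hypothetical.

```python
import math
from statistics import NormalDist

# Constants of a common J-function fit (an assumption, not taken from the text).
H1, H2, H3 = 0.3073, 0.8935, 1.1064

def j_func(sigma):
    """Approximate MI of a BiAWGN channel whose LLRs have standard deviation sigma."""
    return (1.0 - 2.0 ** (-H1 * sigma ** (2 * H2))) ** H3

def j_inv(mi):
    """Approximate inverse of j_func; mi is clipped away from 0 and 1."""
    mi = min(max(mi, 1e-12), 1.0 - 1e-12)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2 * H2))

def kernel(i1, i2):
    """MI of the worse and the better virtual channel after one 2x2 kernel."""
    worse = 1.0 - j_func(math.hypot(j_inv(1.0 - i1), j_inv(1.0 - i2)))
    better = j_func(math.hypot(j_inv(i1), j_inv(i2)))
    return worse, better

def virtual_channel_mi(channel_mi):
    """Recursively polarize a list of per-position channel MIs (length 2^m)."""
    n = len(channel_mi)
    if n == 1:
        return list(channel_mi)
    half = n // 2
    pairs = [kernel(channel_mi[j], channel_mi[j + half]) for j in range(half)]
    return (virtual_channel_mi([p[0] for p in pairs])
            + virtual_channel_mi([p[1] for p in pairs]))

def design_and_fer(channel_mi, k):
    """Unfreeze the k positions with largest MI and estimate the FER under SC."""
    mi = virtual_channel_mi(channel_mi)
    info = sorted(sorted(range(len(mi)), key=lambda i: mi[i], reverse=True)[:k])
    normal = NormalDist()
    p_ok = 1.0
    for i in info:
        # Bit error probability Q(sigma / 2) of a Gaussian LLR with std sigma.
        p_ok *= 1.0 - normal.cdf(-0.5 * j_inv(mi[i]))
    return info, 1.0 - p_ok
```

Because `virtual_channel_mi` accepts a different MI for every codeword position, the same routine covers identical channels and two parallel channels with an arbitrary mapping.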
Information theory suggests allocating power such that the achievable rate is maximized, i.e.,
This optimization problem was solved in [7] for discrete channel input symbols in (semi-)closed form and is known as mercury/waterfilling. The name is in analogy to the waterfilling solution for Gaussian inputs [4, Section 9.4].
Fig. 2 shows the mercury/waterfilling solution for two parallel BiAWGN channels with channel coefficients and . In the low-power regime, the power is allocated only to the better channel. When this channel’s MI starts to saturate, power is also assigned to the worse channel. For comparison, the waterfilling solution for Gaussian channel inputs is depicted by dashed curves.
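The qualitative behaviour described above can be reproduced numerically. The following sketch does not implement the (semi-)closed-form mercury/waterfilling solution; it replaces it by a brute-force grid search over the power split that maximizes the sum MI of two BiAWGN channels. It assumes unit noise variance and a total power budget `p_total` shared as P1 + P2 = p_total; the function names and the example channel coefficients are hypothetical.

```python
import numpy as np

# Integration grid for a standard normal variable (simple Riemann sum).
_Z = np.linspace(-10.0, 10.0, 4001)
_W = np.exp(-0.5 * _Z ** 2) / np.sqrt(2.0 * np.pi) * (_Z[1] - _Z[0])

def biawgn_mi(snr):
    """MI in bits of a BiAWGN channel with equiprobable BPSK inputs.

    snr is the squared symbol amplitude divided by the noise variance, so the
    LLR given the transmitted symbol is Gaussian with mean 2*snr and variance 4*snr.
    """
    llr = 2.0 * snr + 2.0 * np.sqrt(snr) * _Z
    return 1.0 - np.sum(_W * np.logaddexp(0.0, -llr)) / np.log(2.0)

def max_rate_allocation(h1, h2, p_total, grid=401):
    """Grid search for the split (P1, P2) that maximizes the sum MI."""
    p1 = np.linspace(0.0, p_total, grid)
    rates = np.array([biawgn_mi(h1 ** 2 * a) + biawgn_mi(h2 ** 2 * (p_total - a))
                      for a in p1])
    best = int(np.argmax(rates))
    return p1[best], p_total - p1[best], rates[best]
```

With, e.g., `h1 = 1.0` and `h2 = 0.5`, a small budget puts all power on the better channel, whereas a large budget assigns power to both channels.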
II-E Normal Approximation
To take finite-length effects into account, we resort to the normal approximation (NA) (e.g., [12, Sec. II-F]), which approximates the maximum achievable rate for a finite block length and reads as
where is the capacity of the respective channel and is the dispersion. The dispersion is defined as with being the information density. For the considered example of two parallel BiAWGN channels we have
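A numerical evaluation of the NA for parallel BiAWGN channels can be sketched as follows. The sketch makes several assumptions that are not taken from the text: unit noise variance, equiprobable BPSK inputs, each channel used equally often, the rate measured in bits per channel use averaged over all channels, and the logarithmic correction term of the NA omitted. The function names are hypothetical.

```python
import numpy as np
from statistics import NormalDist

# Integration grid for a standard normal variable (simple Riemann sum).
_Z = np.linspace(-10.0, 10.0, 4001)
_W = np.exp(-0.5 * _Z ** 2) / np.sqrt(2.0 * np.pi) * (_Z[1] - _Z[0])

def capacity_and_dispersion(snr):
    """Mean and variance of the information density (in bits) of a BiAWGN channel."""
    llr = 2.0 * snr + 2.0 * np.sqrt(snr) * _Z
    density = 1.0 - np.logaddexp(0.0, -llr) / np.log(2.0)
    c = np.sum(_W * density)
    v = np.sum(_W * (density - c) ** 2)
    return c, v

def normal_approximation(snrs, uses_per_channel, fer):
    """NA rate per channel use for parallel channels that are used equally often."""
    cs, vs = zip(*(capacity_and_dispersion(s) for s in snrs))
    n_total = uses_per_channel * len(snrs)
    c_avg = sum(cs) / len(snrs)
    v_avg = sum(vs) / len(snrs)
    q_inv = NormalDist().inv_cdf(1.0 - fer)  # inverse Gaussian tail function
    return c_avg - np.sqrt(v_avg / n_total) * q_inv
```

The averaging follows from adding the capacities and the dispersions of independent channel uses and normalizing by the total number of channel uses.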
III Polar Code Design for Parallel Channels
III-A Problem Statement
We design polar codes for two parallel BiAWGN channels. Each channel is used times and a polar code of block length (which we assume to be a power of two) is applied jointly over all channel uses. The objective is to minimize the FER of a polar code under SC decoding. We optimize the mapping of codeword bits to different channels, the set of frozen bits, and the power allocation for and given the average power constraint . The FER under SC decoding can be estimated using (7) and (8), such that no Monte Carlo simulations are necessary.
III-B Channel Mappings
The mapping of codeword bits to channels has been discussed in [5] and [6]. Both works propose to combine two different channels so that each kernel of the polar code gets one instance of the channel and one instance of the channel (see Fig. 4 for the kernel and Fig. 3(a) for an example of a polar code of length ). We denote this mapping as a sorted mapping. The other extreme is a mapping we call an alternating mapping. (Our nomenclature refers to a non-bit-reversal representation of the polar code. In a bit-reversal representation, these two mappings change their roles.) This mapping combines identical channels as long as possible, i.e., during the first polarization levels (from the channel perspective) for two different channels. An example of this mapping for a polar code of length is depicted in Fig. 3(b).
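The two mappings can be made concrete with a short sketch. It assumes the non-bit-reversed representation, in which the first polarization level (from the channel perspective) pairs codeword positions j and j + N/2; the channel labels 0 and 1 and the function names are hypothetical.

```python
def sorted_mapping(n):
    """First half of the codeword on channel 0, second half on channel 1."""
    return [0] * (n // 2) + [1] * (n // 2)

def alternating_mapping(n):
    """Codeword positions alternate between channel 0 and channel 1."""
    return [j % 2 for j in range(n)]

def first_level_pairs(mapping):
    """Channel labels seen by each kernel at the first polarization level."""
    half = len(mapping) // 2
    return [(mapping[j], mapping[j + half]) for j in range(half)]

print(first_level_pairs(sorted_mapping(8)))       # every kernel sees both channels
print(first_level_pairs(alternating_mapping(8)))  # every kernel sees one channel twice
```

With the sorted mapping, each first-level kernel combines one use of each channel. With the alternating mapping, each first-level kernel combines two uses of the same channel, and the two channels only meet at the last polarization level.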
where denotes the Bhattacharyya-parameter of the -th virtual channel after levels of polarization. As solving (12) is not feasible, they resort to solving
i.e., they minimize the sum of even-indexed Bhattacharyya parameters after the first polarization level. The authors argue by numerical simulations that this heuristic leads to good results. The solution to this relaxed optimization problem is the sorted mapping. However, it turns out that in some scenarios the alternating mapping achieves a lower FER than the sorted mapping. Thus, the sorted mapping is not globally optimal. Nevertheless, we use the sorted mapping for the following reasons:
After the first level of polarization (from the channel perspective), one obtains two different virtual channels and , see Fig. (a)a. Thus, after the first level, the code behaves like a “regular” polar code that also creates two different virtual channels after the first level. This is in contrast to the alternating mapping, where after the first level of polarization there are four different virtual channels, see Fig. (b)b. This insight gives an intuition on how to extend the system to more than two parallel channels, namely by aiming for a “regular” polar code after as few levels as possible.
Compared to a polar code over identical channels with MI , the code over two parallel channels always leads to stronger polarization in the sense that after the first level of polarization, the virtual channel has worse quality than the channel that would arise from identical channels, and the virtual channel has better quality than the channel that would arise from identical channels. This is shown in Fig. 5, where the two mappings are compared in terms of achievable code rate at a fixed FER for different channels of constant average MI. When the MI of one channel increases (and thus the MI of the other channel decreases by the same amount), the achievable rate with the sorted mapping increases (for sufficiently large ), whereas the achievable rate with the alternating mapping decreases at first.
III-C Frozen Bit Selection
III-D Power Allocation
Next we consider the allocation of powers and . From an information theoretic perspective, the powers should be allocated such that the achievable rate (i.e., MI) is maximized. This is described in Sec. II-D and the solution is called mercury/waterfilling.
However, it turns out that mercury/waterfilling is not best for finite blocklength polar codes over parallel channels. In particular, we are interested in the power allocation that minimizes the FER of a polar code with fixed parameters (length, dimension, and average power constraint):
where denotes the FER (calculated using (7) and (8)) of the polar code with frozen bit indices optimized for the power allocations and . We assume that the power constraint is fulfilled with equality. Thus, the optimization problem can be re-written as a one dimensional optimization problem in , i.e., we have
Fig. 6 shows an example of the objective for two parallel channels with channel coefficients and . The FER is plotted versus the power allocation (normalized by ). Different curves correspond to different power constraints.
The power allocations that are given by mercury/waterfilling are depicted by asterisks. The dashed vertical line corresponds to the power allocation given by mercury/waterfilling in the Shannon limit, i.e., the point where (in the depicted scenario, the Shannon limit is at ). As one can see, the FER-optimal power allocation is far from the power allocation given by mercury/waterfilling. The difference is several orders of magnitude in FER, or more than . The polar-optimal power allocation pushes the good channel further into saturation, i.e., we obtain channels with a stronger pre-polarization. These effects also occur at very long block lengths. Combining polar codes with CRC-aided successive cancellation list (SCL) decoding [13] leads to similar effects. However, as the FER for SCL has to be obtained using Monte Carlo simulations, the optimization is much more complex, and we thus focus on optimizing the power allocation for SC decoding.
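The one-dimensional search over the power split can be sketched end to end. This is an illustration under assumptions, not the exact procedure behind Fig. 6: unit noise variance, a total budget P1 + P2 = p_total, the sorted mapping in a non-bit-reversed representation, a common closed-form J-function fit with assumed constants, and a FER estimate of one minus the product of the per-bit success probabilities. The frozen set is re-optimized for every candidate split. All names are hypothetical.

```python
import math
from statistics import NormalDist

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # assumed J-function fit constants
_NORMAL = NormalDist()

def j_func(sigma):
    return (1.0 - 2.0 ** (-H1 * sigma ** (2 * H2))) ** H3

def j_inv(mi):
    mi = min(max(mi, 1e-12), 1.0 - 1e-12)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2 * H2))

def polarize(mi):
    """Virtual-channel MIs; positions j and j + N/2 share a first-level kernel."""
    n = len(mi)
    if n == 1:
        return list(mi)
    half = n // 2
    worse = [1.0 - j_func(math.hypot(j_inv(1.0 - mi[j]), j_inv(1.0 - mi[j + half])))
             for j in range(half)]
    better = [j_func(math.hypot(j_inv(mi[j]), j_inv(mi[j + half])))
              for j in range(half)]
    return polarize(worse) + polarize(better)

def fer_sc(h1, h2, p1, p2, n, k):
    """Estimated SC FER with the k most reliable positions unfrozen."""
    i1 = j_func(2.0 * abs(h1) * math.sqrt(max(p1, 0.0)))
    i2 = j_func(2.0 * abs(h2) * math.sqrt(max(p2, 0.0)))
    mi = polarize([i1] * (n // 2) + [i2] * (n // 2))  # sorted mapping
    p_ok = 1.0
    for v in sorted(mi, reverse=True)[:k]:
        p_ok *= 1.0 - _NORMAL.cdf(-0.5 * j_inv(v))
    return 1.0 - p_ok

def best_allocation(h1, h2, p_total, n, k, grid=101):
    """Grid search over P1 in [0, p_total]; returns (estimated FER, P1)."""
    candidates = [p_total * t / (grid - 1) for t in range(grid)]
    return min((fer_sc(h1, h2, p1, p_total - p1, n, k), p1) for p1 in candidates)
```

A call such as `best_allocation(1.0, 0.5, 4.0, 64, 16)` returns the estimated FER and the power on the first channel; the power on the second channel is the remainder of the budget. Comparing the result with the split that maximizes the sum MI shows how far the two criteria differ for a given code length and rate.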
These results raise the question of whether the effects are specific to polar codes or whether they originate from a finite number of channel uses. To answer the question, we first compare with an LDPC code from the 5G enhanced mobile broadband (eMBB) standard [14]. The code is derived from base graph 1 of the respective standard and has a block length of and rate . As shown in Fig. 6 by dashed lines, the optimal power allocation closely follows the assignment given by mercury/waterfilling.
The polar-optimal power allocation (red circle) reduces the achievable rate according to the normal approximation as compared to the mercury/waterfilling solution (black asterisk). Furthermore, the mercury/waterfilling solution is close to the maximum.
From these observations, we conclude that the effects are inherently linked to polar codes. The behaviour may be partly explained by the following: if bits are frozen whose MI is not zero, then their MI is “lost” with SC decoding, as these bits cannot be used for information transmission. On the other hand, bits with an MI not close to one need to be frozen to reach a feasible FER. Fig. 8 depicts this rate loss for the scenario from Fig. 6 with . The rate loss with the polar-optimal power allocation (red circle) is less than half of the rate loss with mercury/waterfilling (black asterisk). Thus, the polar-optimal power allocation is a tradeoff between rate loss (in terms of achievable rate) by sub-optimal power allocation and rate loss by imperfect polarization. Instead of minimizing the frame error rate, one could also maximize the achievable rate of the unfrozen bits, i.e., the rate
This leads to almost the same results as optimizing the FER (14), and brings the power allocation for polar codes back into an information theoretic framework.
IV Numerical Results
We investigate an extreme case of two parallel channels with , and binary phase-shift keying (BPSK). The simulation results are depicted in Fig. 9. A polar code of block length is used. The figure shows the FER versus the average power. With SC decoding, the polar code with optimized power allocation outperforms the polar code with mercury/waterfilling by at an FER of . For SCL decoding [13] with list size , the qualitative behaviour stays the same, but the gap between the two power allocations shrinks to approximately . The SC-decoded polar code with optimized power allocation outperforms the SCL-decoded polar code with mercury/waterfilling. When combining SCL decoding with an outer CRC with , the polar code with power allocation optimized for SC decoding still outperforms the polar code with mercury/waterfilling by . It also outperforms the 5G LDPC code by about . It operates approximately away from the normal approximation .
We proposed a novel approach to allocate power for polar codes over parallel channels with an average power constraint. We showed significant gains in terms of FER as compared to power allocation by mercury/waterfilling. We elaborated on the design of polar codes for parallel channels and the mapping between codeword bits and channels of different quality. Future work involves a study of more than two parallel channels, including the design of the mapping between codeword bits and channels. A further research topic is the power allocation for polar codes with higher order modulation, e.g., using multi-level coding [16, 17].
- [1] N. Stolte, “Rekursive codes mit der Plotkin-konstruktion und ihre decodierung,” Ph.D. dissertation, Technische Universität, Darmstadt, Jan. 2002. [Online]. Available: http://tuprints.ulb.tu-darmstadt.de/183/
- [2] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
- [3] M. Alsan and E. Telatar, “A simple proof of polarization and polarization for non-stationary memoryless channels,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4873–4878, Sept. 2016.
- [4] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed. John Wiley & Sons, Inc., 2006.
- [5] H. Mahdavifar, M. El-Khamy, J. Lee, and I. Kang, “Compound polar codes,” in Inf. Theory and Appl. Workshop, Feb. 2013, pp. 1–6.
- [6] S. Liu, Y. Hong, and E. Viterbo, “Polar codes for block fading channels,” in IEEE Wireless Commun. and Netw. Conf. Workshops, Mar. 2017, pp. 1–6.
- [7] A. Lozano, A. M. Tulino, and S. Verdú, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3033–3051, July 2006.
- [8] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input memoryless channels,” in IEEE Int. Symp. Inf. Theory (ISIT), June 2009, pp. 1496–1500.
- [9] R. Mori and T. Tanaka, “Performance of polar codes with the construction using density evolution,” IEEE Commun. Lett., vol. 13, no. 7, pp. 519–521, July 2009.
- [10] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, Apr. 2004.
- [11] F. Brannström, L. K. Rasmussen, and A. J. Grant, “Convergence analysis and optimal scheduling for multiple concatenated codes,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3354–3364, Sept. 2005.
- [12] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
- [13] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
- [14] “3GPP TS 38.212 V15.0.0: Multiplexing and channel coding,” Dec. 2017.
- [15] J. Park and D. Park, “A new power allocation method for parallel AWGN channels in the finite block length regime,” IEEE Commun. Lett., vol. 16, no. 9, pp. 1392–1395, Sept. 2012.
- [16] M. Seidl, A. Schenk, C. Stierstorfer, and J. B. Huber, “Polar-coded modulation,” IEEE Trans. Commun., vol. 61, no. 10, pp. 4108–4119, Oct. 2013.
- [17] G. Böcherer, T. Prinz, P. Yuan, and F. Steiner, “Efficient polar code construction for higher-order modulation,” in 2017 IEEE Wireless Commun. and Netw. Conf. Workshops (WCNCW), Mar. 2017, pp. 1–6.