Design of Polar Codes for Parallel Channels with an Average Power Constraint

Polar codes are designed for parallel binary-input additive white Gaussian noise (BiAWGN) channels with an average power constraint. The two main design choices are the mapping between codeword bits and channels of different quality, and the power allocation under the average power constraint. Information theory suggests allocating power such that the sum of mutual information (MI) terms is maximized. However, a power allocation specific to polar codes shows significant gains.




I Introduction

Polar codes were introduced in [1, 2]. They are the first class of codes that achieve the capacity of binary-input discrete memoryless channels with a deterministic construction [2]. In [3] it was shown that polarization also takes place for non-stationary channels. In this paper, we consider parallel BiAWGN channels with an average power constraint [4, Section 9.4]. Parallel channels arise naturally for orthogonal frequency-division multiplexing (OFDM) transceivers, where a time-frequency resource block has multiple channels of different quality. The model also describes block-fading channels. Polar codes are a natural choice for parallel channels because the different channels can be interpreted as being pre-polarized.

To develop a basic understanding, we consider the special case of two parallel BiAWGN channels. We address the following two questions: 1) How should the codeword bits be mapped to channels of different quality, or equivalently, how should one design an interleaver between the codeword bits and the channel? This has been partially addressed in the literature, e.g., [5, 6]. Both papers propose a sorted mapping that combines two different channels such that each kernel gets one instance of both channels. We also use this mapping, but we show that it does not necessarily minimize the frame error rate (FER). 2) How should power be allocated for good finite-length performance? To the best of our knowledge, this has not yet been considered in the literature. We show that the information-theoretic approach of maximizing the achievable rate, also known as mercury/waterfilling [7], is suboptimal in terms of FER for finite-length polar codes.

This work is structured as follows: in Sec. II we state the system model and preliminaries. In Sec. III we discuss the problem of designing polar codes for parallel channels. We provide numerical examples in Sec. IV and conclude in Sec. V.

II Preliminaries

II-A Notation

We denote random variables by capital letters (e.g., $X$) and deterministic variables or realizations by small letters (e.g., $x$). Deterministic vectors are denoted by a bold italic font with small letters (e.g., $\boldsymbol{x}$), while we use a bold italic font with capital letters (e.g., $\boldsymbol{X}$) for deterministic matrices and random vectors. We write .

II-B System Model

Consider $L$ parallel BiAWGN channels

$$Y_i = h_i X_i + Z_i, \qquad i = 1, \dots, L, \tag{1}$$

where $Y_i$, $h_i$, $X_i$, and $Z_i$ denote the receive signal, the channel coefficient, the transmit signal, and additive white Gaussian noise with $Z_i \sim \mathcal{N}(0, \sigma^2)$, respectively. For simplicity, we consider only $L = 2$ parallel channels and . We assume that the channel coefficients are known to the encoder and decoder. The input signals are scaled BPSK symbols, i.e., we have

$$X_i \in \left\{ -\sqrt{P_i}, +\sqrt{P_i} \right\}. \tag{2}$$

The value $P_i$ is the power of the transmit signal $X_i$. We consider a common power constraint (see [4, Section 9.4])

$$\frac{1}{L} \sum_{i=1}^{L} P_i \le P. \tag{3}$$

We combine $N/2$ uses of each of the two channels to a block of $N$ channel uses.

II-C Polar Codes

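Polar codes are linear block codes described by three parameters $(N, K, \mathcal{A})$: the block length $N$, dimension $K$, and a set of information bit indices $\mathcal{A} \subseteq \{1, \dots, N\}$ with $|\mathcal{A}| = K$. The code rate is $R = K/N$. The input $\boldsymbol{u}$ has an information bit at position $i$ if $i \in \mathcal{A}$, and zeros at the remaining positions, i.e., $u_i = 0$ for $i \notin \mathcal{A}$. These bits are called frozen. The codeword $\boldsymbol{c}$ is generated from $\boldsymbol{u}$ by

$$\boldsymbol{c} = \boldsymbol{u}\, \boldsymbol{G}_2^{\otimes n}, \qquad \boldsymbol{G}_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}, \tag{4}$$

where $n = \log_2 N$ and $\boldsymbol{G}_2^{\otimes n}$ denotes the $n$-th Kronecker power of $\boldsymbol{G}_2$. The codeword $\boldsymbol{c}$ is mapped to BPSK transmit symbols $\boldsymbol{x}$, which are transmitted over the channel and received as the vector $\boldsymbol{y}$.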
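The encoding step can be illustrated with a short sketch. It assumes the natural-order (non bit-reversed) generator, i.e., the Kronecker power of the kernel [[1, 0], [1, 1]], zero-based bit positions, and the BPSK convention 0 → +√P, 1 → −√P. The function names are illustrative and not taken from the paper.

```python
def polar_encode(u):
    """Return c = u * G2^{(x)n} over GF(2) for G2 = [[1, 0], [1, 1]].

    Uses the recursion (u_a, u_b) -> (enc(u_a) XOR enc(u_b), enc(u_b)),
    which follows from G2^{(x)n} = [[G, 0], [G, G]] with G = G2^{(x)(n-1)}.
    """
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    upper = polar_encode(u[:half])
    lower = polar_encode(u[half:])
    return [a ^ b for a, b in zip(upper, lower)] + lower


def build_input(info_bits, info_set, block_length):
    """Place information bits at the zero-based positions in info_set; freeze the rest to 0."""
    u = [0] * block_length
    for pos, bit in zip(sorted(info_set), info_bits):
        u[pos] = bit
    return u


def bpsk(codeword, power=1.0):
    """Map bits to scaled BPSK symbols (assumed convention: 0 -> +sqrt(P), 1 -> -sqrt(P))."""
    amplitude = power ** 0.5
    return [amplitude * (1 - 2 * bit) for bit in codeword]


# Toy example: N = 4, K = 2, information bits at the last two positions.
u = build_input([1, 1], info_set={2, 3}, block_length=4)
c = polar_encode(u)
x = bpsk(c, power=4.0)
```

Because the Kronecker power of this kernel is its own inverse over GF(2), applying `polar_encode` twice returns the original input, which is a convenient sanity check.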

With successive cancellation (SC) decoding, the information bits $u_i$, $i \in \mathcal{A}$, are estimated using $\boldsymbol{y}$ and the estimates of the previous bits $\hat{u}_1, \dots, \hat{u}_{i-1}$. The frozen bits are decoded to zero, i.e., $\hat{u}_i = 0$ for $i \notin \mathcal{A}$. The MI terms $I(U_i; \boldsymbol{Y}, U_1, \dots, U_{i-1})$ specify the maximum transmission rate over virtual channels with input $U_i$, output $\boldsymbol{Y}$, and known $U_1, \dots, U_{i-1}$. These MI terms polarize to being either close to one or close to zero for large $N$ [2]. Thus, polar codes are often seen as a transformation of $N$ channel uses into $N$ virtual channels with MI either close to one or close to zero. The fraction of virtual channels with MI close to one approaches the capacity of the original channel for large $N$, and thus polar codes are capacity achieving.

The $N - K$ positions in $\boldsymbol{u}$ with smallest MI are frozen. Polar code design consists of finding these positions. We use density evolution [8, 9] with a Gaussian approximation [10] to estimate the bit reliabilities.

Figure 1: MI of the polar transform.

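The MI terms can be calculated recursively using the transform depicted in Fig. 1. For kernel inputs with MI values $I_1$ and $I_2$, the values $I^-$ and $I^+$ are given by:

$$I^- = 1 - J\!\left( \sqrt{ \left[ J^{-1}(1 - I_1) \right]^2 + \left[ J^{-1}(1 - I_2) \right]^2 } \right), \tag{5}$$

$$I^+ = J\!\left( \sqrt{ \left[ J^{-1}(I_1) \right]^2 + \left[ J^{-1}(I_2) \right]^2 } \right), \tag{6}$$

where the $J$-function [10] is approximated numerically [11]. The FER with SC decoding is

$$P_\mathrm{e} = 1 - \prod_{i \in \mathcal{A}} (1 - p_i), \tag{7}$$

$$p_i = Q\!\left( \tfrac{1}{2} J^{-1}(I_i) \right), \tag{8}$$

where $I_i$ is the MI of the $i$-th virtual channel, $p_i$ denotes the probability that the first bit error of a block occurs at bit $u_i$ (i.e., the probability that the SC decoder makes the wrong decision for bit $u_i$ given that all previous decisions were correct), and $Q(\cdot)$ denotes the tail distribution function of the standard normal distribution. The functions in (8) can be approximated numerically.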
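The design procedure described above (propagate MI through the transform, freeze the least reliable positions, estimate the FER) can be sketched as follows. This is a minimal sketch under stated assumptions: the J-function uses the curve fit commonly attributed to [11], the kernel update uses the standard Gaussian-approximation forms, the code is in natural (non bit-reversed) order, and the FER is computed as one minus the product of per-bit success probabilities with per-bit error probability Q(J⁻¹(I)/2). All function names are illustrative.

```python
import math

# Curve-fit constants for the J-function approximation reported in [11].
H1, H2, H3 = 0.3073, 0.8935, 1.1064


def J(sigma):
    """Approximate MI of a BiAWGN channel whose LLRs have standard deviation sigma."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    """Inverse of J; the input is clipped to avoid log(0) at MI = 0 or MI = 1."""
    mi = min(max(mi, 1e-12), 1.0 - 1e-12)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def kernel(i1, i2):
    """One polarization step: returns (I_minus, I_plus) for kernel inputs i1 and i2."""
    minus = 1.0 - J(math.hypot(J_inv(1.0 - i1), J_inv(1.0 - i2)))
    plus = J(math.hypot(J_inv(i1), J_inv(i2)))
    return minus, plus


def polar_mi(channel_mi):
    """MI of all virtual channels; positions i and i + N/2 share a kernel at the first level."""
    n = len(channel_mi)
    if n == 1:
        return list(channel_mi)
    half = n // 2
    pairs = [kernel(a, b) for a, b in zip(channel_mi[:half], channel_mi[half:])]
    return polar_mi([p[0] for p in pairs]) + polar_mi([p[1] for p in pairs])


def q_func(x):
    """Tail distribution function of the standard normal distribution."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def design_and_fer(channel_mi, k):
    """Pick the k most reliable positions (zero-based) and estimate the SC FER."""
    mi = polar_mi(channel_mi)
    info_set = sorted(sorted(range(len(mi)), key=lambda i: mi[i])[-k:])
    # Sum of log-success probabilities is numerically safer than a long product.
    log_ok = sum(math.log1p(-q_func(0.5 * J_inv(mi[i]))) for i in info_set)
    return info_set, -math.expm1(log_ok)


info_set, fer = design_and_fer([0.8] * 16, k=8)
```

For two parallel channels, `channel_mi` simply contains two different values, arranged according to the chosen mapping of codeword bits to channels.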

II-D Mercury/Waterfilling

Information theory suggests allocating power such that the achievable rate is maximized, i.e.,

$$\max_{P_1, P_2} \; \sum_{i=1}^{2} I(X_i; Y_i) \quad \text{subject to} \quad \tfrac{1}{2}(P_1 + P_2) \le P. \tag{9}$$

This optimization problem was solved in [7] for discrete channel input symbols in a (semi-)closed form, and is known as mercury/waterfilling. The naming is in analogy to the waterfilling solution for Gaussian inputs [4, Section 9.4].

Figure 2: Power allocation for mercury/waterfilling and two parallel BiAWGN channels with and . For comparison, the waterfilling solution for Gaussian inputs is shown by dashed curves.

Fig. 2 shows the mercury/waterfilling solution for two parallel BiAWGN channels with channel coefficients and . In the low-power regime, the power is allocated only to the better channel. When this channel’s MI starts to saturate, power is also assigned to the worse channel. For comparison, the waterfilling solution for Gaussian channel inputs is depicted by dashed curves.
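The qualitative behaviour described above can be reproduced with a brute-force search. The sketch below is a numerical stand-in for the (semi-)closed-form solution of [7], not that solution itself. It assumes unit noise variance, a budget written as P1 + P2 = `p_total` (the paper's normalization may differ), and the J-function curve fit from [11] as an approximation of the BiAWGN MI. The channel coefficients in the usage lines are arbitrary illustration values.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # J-function fit constants from [11]


def J(sigma):
    """Approximate MI of a BiAWGN channel whose LLRs have standard deviation sigma."""
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def biawgn_mi(h, power, noise_std=1.0):
    """Approximate MI of Y = h*X + Z with X = +/- sqrt(power); LLR std is 2|h|sqrt(P)/noise_std."""
    return J(2.0 * abs(h) * math.sqrt(power) / noise_std)


def max_mi_allocation(h1, h2, p_total, steps=2000):
    """Grid search for the split (p1, p2) of p_total that maximizes the sum MI."""
    best_rate, best_p1 = -1.0, 0.0
    for k in range(steps + 1):
        p1 = p_total * k / steps
        p2 = max(p_total - p1, 0.0)  # guard against tiny negative rounding errors
        rate = biawgn_mi(h1, p1) + biawgn_mi(h2, p2)
        if rate > best_rate:
            best_rate, best_p1 = rate, p1
    return best_p1, max(p_total - best_p1, 0.0), best_rate


# Hypothetical coefficients h1 = 1.0 (better channel) and h2 = 0.5 (worse channel).
low = max_mi_allocation(1.0, 0.5, p_total=0.1)
high = max_mi_allocation(1.0, 0.5, p_total=20.0)
```

With these illustration values, the low-power result places almost all power on the better channel, while the high-power result assigns more power to the worse channel once the better one saturates.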

II-E Normal Approximation

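To take finite-length effects into account, we resort to the normal approximation (NA) (e.g., [12, Sec. II-F]), which is an approximation of the maximum achievable rate for a finite block length $N$ and target FER $\epsilon$, and reads as

$$R^*(N, \epsilon) \approx C - \sqrt{\frac{V}{N}}\, Q^{-1}(\epsilon), \tag{10}$$

where $C$ is the capacity of the respective channel and $V$ is the dispersion. The dispersion is defined as $V = \mathrm{Var}\left[ i(X; Y) \right]$ with $i(X; Y)$ being the information density. For the considered example of two parallel BiAWGN channels with capacities $C_1, C_2$ and dispersions $V_1, V_2$, each used $N/2$ times, we have

$$C = \tfrac{1}{2}(C_1 + C_2), \qquad V = \tfrac{1}{2}(V_1 + V_2). \tag{11}$$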
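As a rough illustration, the normal approximation can be evaluated numerically. The sketch below assumes the two-term form C − sqrt(V/N)·Q⁻¹(ε) without higher-order correction terms, computes capacity and dispersion of a BiAWGN channel by integrating the information density over the LLR distribution, and averages both quantities over the two channels (each used for half of the block). The SNR values are arbitrary illustration values, with SNR defined here as h²P/σ².

```python
import math
from statistics import NormalDist


def biawgn_capacity_dispersion(snr, steps=4000):
    """Mean (bits) and variance (bits^2) of the information density of a BiAWGN channel."""
    m = 2.0 * snr           # LLR mean given the transmitted bit
    s = math.sqrt(2.0 * m)  # LLR standard deviation
    lo, hi = -10.0, 10.0
    dz = (hi - lo) / steps
    e1 = e2 = w_sum = 0.0
    for k in range(steps + 1):
        z = lo + k * dz
        w = math.exp(-0.5 * z * z)  # unnormalized standard normal weight
        llr = m + s * z
        # Information density 1 - log2(1 + exp(-llr)), evaluated in a stable way.
        softplus = max(-llr, 0.0) + math.log1p(math.exp(-abs(llr)))
        dens = 1.0 - softplus / math.log(2.0)
        e1 += w * dens
        e2 += w * dens * dens
        w_sum += w
    c = e1 / w_sum
    return c, max(e2 / w_sum - c * c, 0.0)


def normal_approx_rate(c, v, n, eps):
    """Two-term normal approximation (assumed form, no log(n)/n correction)."""
    return c - math.sqrt(v / n) * NormalDist().inv_cdf(1.0 - eps)


def parallel_rate(snr1, snr2, n, eps):
    """Average capacity and dispersion over two channels, each used n/2 times."""
    c1, v1 = biawgn_capacity_dispersion(snr1)
    c2, v2 = biawgn_capacity_dispersion(snr2)
    return normal_approx_rate(0.5 * (c1 + c2), 0.5 * (v1 + v2), n, eps)


rate = parallel_rate(snr1=2.0, snr2=0.5, n=1024, eps=1e-3)
```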


III Polar Code Design for Parallel Channels

III-A Problem Statement

We design polar codes for two parallel BiAWGN channels. Each channel is used $N/2$ times, and a polar code of block length $N$ (which we assume to be a power of $2$) is applied jointly over all $N$ channel uses. The objective is to minimize the FER of a polar code under SC decoding. We optimize the mapping of codeword bits to the different channels, the set of frozen bits, and the power allocation for $P_1$ and $P_2$ given the average power constraint $P$. The FER under SC decoding can be estimated using (7) and (8), such that no Monte-Carlo simulations are necessary.

III-B Channel Mappings

(a) Sorted mapping
(b) Alternating mapping
Figure 3: Polar codes of length over two parallel channels for the sorted mapping and the alternating mapping.
Figure 4: Polar kernel for two parallel channels.

The mapping of codeword bits to channels has been discussed in [5] and [6]. In [5], the authors propose to combine two different channels so that each kernel of the polar code gets one instance of the first channel and one instance of the second channel (see Fig. 4 for the kernel and Fig. 3(a) for an example of a polar code of length ). We denote this mapping as a sorted mapping. The other extreme is a mapping we call an alternating mapping. (Our nomenclature refers to a non-bit-reversal representation of the polar code. In a bit-reversal representation, the two mappings change their roles.) This mapping combines identical channels as long as possible, i.e., during the first $\log_2 N - 1$ polarization levels (from the channel perspective) for two different channels. An example of this mapping for a polar code of length is depicted in Fig. 3(b).

The authors of [6] give reasons for using the sorted mapping. They minimize a bound on the FER (similar to (7)) with respect to the mapping $\pi$:

$$\min_{\pi} \; \sum_{i \in \mathcal{A}} Z_n^{(i)}(\pi), \tag{12}$$

where $Z_n^{(i)}$ denotes the Bhattacharyya parameter of the $i$-th virtual channel after $n$ levels of polarization. As solving (12) is not feasible, they resort to solving

$$\min_{\pi} \; \sum_{i \text{ even}} Z_1^{(i)}(\pi), \tag{13}$$

i.e., they minimize the sum of even-indexed Bhattacharyya parameters after the first polarization level. The authors of [6] argue by numerical simulations that this heuristic leads to good results. The solution to this relaxed optimization problem is the sorted mapping. However, it turns out that in some scenarios (e.g., for ) the alternating mapping achieves a lower FER than the sorted mapping. Thus, the sorted mapping is not globally optimal. Nevertheless, we use the sorted mapping for the following reasons:

Figure 5: Achievable code rate with SC decoding at a FER of for a polar code with block length over two parallel BiAWGN channels with average MI .
  • After the first level of polarization (from the channel perspective), one obtains two different virtual channels $W^-$ and $W^+$, see Fig. 3(a). Thus, after the first level, the code behaves like a “regular” polar code that also creates two different virtual channels after the first level. This is in contrast to the alternating mapping, where after the first level of polarization there are four different virtual channels, see Fig. 3(b). This insight gives an intuition on how to extend the system to more than two parallel channels, namely by aiming for a “regular” polar code after as few levels as possible.

  • Compared to a polar code over identical channels with the same average MI, the code over two parallel channels always leads to stronger polarization in the sense that, after the first level of polarization, the virtual channel $W^-$ has worse quality than the channel that would arise from identical channels, and the virtual channel $W^+$ has better quality than the channel that would arise from identical channels. This is shown in Fig. 5, where the two mappings are compared in terms of achievable code rate at a fixed FER for different channels of constant average MI. When the MI of one channel increases (and thus the MI of the other channel decreases by the same amount), the achievable rate with the sorted mapping increases (for sufficiently large ), whereas the achievable rate with the alternating mapping decreases at first.
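The first point in the list above, two versus four distinct virtual channels after the first polarization level, can be checked with a small sketch. It assumes a natural-order (non bit-reversed) code in which codeword positions i and i + N/2 share a kernel at the first level from the channel perspective, the Gaussian-approximation kernel update with the J-function fit from [11], and arbitrary illustration values for the MI of the two channels.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # J-function fit constants from [11]


def J(sigma):
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    mi = min(max(mi, 1e-12), 1.0 - 1e-12)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def kernel(i1, i2):
    """Gaussian-approximation update: returns (I_minus, I_plus)."""
    minus = 1.0 - J(math.hypot(J_inv(1.0 - i1), J_inv(1.0 - i2)))
    plus = J(math.hypot(J_inv(i1), J_inv(i2)))
    return minus, plus


def sorted_mapping(n):
    """First half of the codeword on channel 0, second half on channel 1."""
    return [0] * (n // 2) + [1] * (n // 2)


def alternating_mapping(n):
    """Codeword bits alternate between channel 0 and channel 1."""
    return [i % 2 for i in range(n)]


def first_level(mapping, mi_per_channel):
    """MI values after the first level; positions i and i + N/2 share a kernel."""
    channel_mi = [mi_per_channel[ch] for ch in mapping]
    half = len(channel_mi) // 2
    pairs = [kernel(a, b) for a, b in zip(channel_mi[:half], channel_mi[half:])]
    return [p[0] for p in pairs] + [p[1] for p in pairs]


def count_distinct(values):
    return len({round(v, 12) for v in values})


mi_per_channel = [0.8, 0.4]  # hypothetical MI of the two parallel channels
n_sorted = count_distinct(first_level(sorted_mapping(8), mi_per_channel))
n_alternating = count_distinct(first_level(alternating_mapping(8), mi_per_channel))
```

Under these assumptions, the sorted mapping yields two distinct MI values after the first level and the alternating mapping yields four, in line with the list above.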

III-C Frozen Bit Selection

Suppose the power allocation is fixed, i.e., $P_1$ and $P_2$ are known. We use density evolution with Gaussian approximation to select the frozen bits as described in Sec. II-C. We propagate the MI of the channels through the graphs depicted in Fig. 3.

III-D Power Allocation

Next we consider the allocation of the powers $P_1$ and $P_2$. From an information-theoretic perspective, the powers should be allocated such that the achievable rate (i.e., MI) is maximized. This is described in Sec. II-D and the solution is called mercury/waterfilling.

However, it turns out that mercury/waterfilling is not the best choice for finite-blocklength polar codes over parallel channels. In particular, we are interested in the power allocation that minimizes the FER of a polar code with fixed parameters (length, dimension, and average power constraint):

$$\min_{P_1, P_2} \; P_\mathrm{e}(P_1, P_2) \quad \text{subject to} \quad \tfrac{1}{2}(P_1 + P_2) \le P, \tag{14}$$

where $P_\mathrm{e}(P_1, P_2)$ denotes the FER (calculated using (7) and (8)) of the polar code with frozen bit indices optimized for the power allocations $P_1$ and $P_2$. We assume that the power constraint is fulfilled with equality. Thus, the optimization problem can be rewritten as a one-dimensional optimization problem in $P_1$, i.e., we have

$$\min_{0 \le P_1 \le 2P} \; P_\mathrm{e}(P_1, 2P - P_1). \tag{15}$$
Fig. 6 shows an example of the objective for two parallel channels with channel coefficients and . The FER is plotted versus the power allocation (normalized by ). Different curves correspond to different power constraints.
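The one-dimensional search over the power of the first channel can be sketched as a grid search in which the frozen set is re-selected for every candidate allocation. This is a minimal sketch, not the authors' implementation. It assumes unit noise variance, a budget written as P1 + P2 = `p_total`, the sorted mapping in natural order, the J-function fit from [11], and the Gaussian-approximation FER estimate. The code parameters and channel coefficients in the usage line are arbitrary illustration values.

```python
import math

H1, H2, H3 = 0.3073, 0.8935, 1.1064  # J-function fit constants from [11]


def J(sigma):
    if sigma <= 0.0:
        return 0.0
    return (1.0 - 2.0 ** (-H1 * sigma ** (2.0 * H2))) ** H3


def J_inv(mi):
    mi = min(max(mi, 1e-12), 1.0 - 1e-12)
    return (-math.log2(1.0 - mi ** (1.0 / H3)) / H1) ** (1.0 / (2.0 * H2))


def polar_mi(channel_mi):
    """Gaussian-approximation density evolution in natural order."""
    n = len(channel_mi)
    if n == 1:
        return list(channel_mi)
    half = n // 2
    minus, plus = [], []
    for a, b in zip(channel_mi[:half], channel_mi[half:]):
        minus.append(1.0 - J(math.hypot(J_inv(1.0 - a), J_inv(1.0 - b))))
        plus.append(J(math.hypot(J_inv(a), J_inv(b))))
    return polar_mi(minus) + polar_mi(plus)


def sc_fer(h1, h2, p1, p2, n, k):
    """Estimated SC FER for a given allocation, with the frozen set re-optimized."""
    mi1 = J(2.0 * abs(h1) * math.sqrt(p1))  # unit noise variance assumed
    mi2 = J(2.0 * abs(h2) * math.sqrt(p2))
    channel_mi = [mi1] * (n // 2) + [mi2] * (n // 2)  # sorted mapping
    unfrozen = sorted(polar_mi(channel_mi))[-k:]      # k most reliable positions
    log_ok = 0.0
    for v in unfrozen:
        p_err = 0.5 * math.erfc(0.5 * J_inv(v) / math.sqrt(2.0))
        log_ok += math.log1p(-p_err)
    return -math.expm1(log_ok)


def min_fer_allocation(h1, h2, p_total, n, k, steps=40):
    """Grid search over p1 in [0, p_total] with p2 = p_total - p1."""
    best = None
    for s in range(steps + 1):
        p1 = p_total * s / steps
        p2 = max(p_total - p1, 0.0)
        fer = sc_fer(h1, h2, p1, p2, n, k)
        if best is None or fer < best[2]:
            best = (p1, p2, fer)
    return best


p1_opt, p2_opt, fer_opt = min_fer_allocation(1.0, 0.5, p_total=8.0, n=64, k=32)
```

Comparing `fer_opt` with `sc_fer` evaluated at a rate-maximizing split gives a quick impression of how far the two allocations are apart for a given scenario.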

Figure 6: Solid lines depict the FER (estimated using (7)) versus power allocation for a polar code (, ) over two parallel BiAWGN channels with and . Dashed lines depict the FER of a 5G LDPC code (simulated with a grid size of ).

The power allocations given by mercury/waterfilling are depicted by asterisks. The dashed vertical line corresponds to the power allocation given by mercury/waterfilling in the Shannon limit, i.e., the point where (in the depicted scenario, the Shannon limit is at ). As one can see, the FER-optimal power allocation is far from the power allocation given by mercury/waterfilling. The difference is several orders of magnitude in FER, or more than . The polar-optimal power allocation pushes the good channel further into saturation, i.e., we obtain channels with a stronger pre-polarization. These effects also occur at very long block lengths. Combining polar codes with CRC-aided successive cancellation list (SCL) decoding [13] leads to similar effects. However, as the FER for SCL decoding has to be obtained using Monte-Carlo simulations, the optimization is much more complex, and we thus focus on optimizing the power allocation for SC decoding.

These results raise the question of whether the effects are specific to polar codes or whether they originate from the finite number of channel uses. To answer this question, we first compare with an LDPC code from the 5G enhanced mobile broadband (eMBB) standard [14]. The code is derived from base graph one of the respective standard and has a blocklength of and rate . As shown in Fig. 6 by dashed lines, the optimal power allocation for this code closely follows the assignment given by mercury/waterfilling.

Secondly, we follow the approach of [15] and use a finite length bound for power allocation. Fig. 7 shows the achievable rate according to the normal approximation [12] for the scenario from Fig. 6.

Figure 7: Achievable rate according to the normal approximation at a frame error rate of for the scenario from Fig. 6 with . The power allocation with mercury/waterfilling is denoted by the black asterisk and the polar-optimal power allocation by the red circle.
Figure 8: Sum of MI terms of frozen bits (choice of frozen bits optimized with Gaussian approximation) for the scenario from Fig. 6 and .

The polar-optimal power allocation (red circle) reduces the achievable rate according to the normal approximation as compared to the mercury/waterfilling solution (black asterisk). Furthermore, the mercury/waterfilling solution is close to the maximum.

From these observations, we conclude that the effects are inherently linked to polar codes. The behaviour may be partly explained by the following: if bits are frozen whose MI is not zero, then their MI is “lost” with SC decoding, as these bits cannot be used for information transmission. On the other hand, bits with an MI not close to one need to be frozen to reach a feasible FER. Fig. 8 depicts this rate loss for the scenario from Fig. 6 with . The rate loss with the polar-optimal power allocation (red circle) is less than half of the rate loss with mercury/waterfilling (black asterisk). Thus, the polar-optimal power allocation is a tradeoff between rate loss (in terms of achievable rate) by sub-optimal power allocation and rate loss by imperfect polarization. Instead of minimizing the frame error rate, one could also maximize the achievable rate of the unfrozen bits, i.e., the rate

$$R_{\mathcal{A}} = \frac{1}{N} \sum_{i \in \mathcal{A}} I(U_i; \boldsymbol{Y}, U_1, \dots, U_{i-1}). \tag{16}$$
This leads to almost the same results as optimizing the FER (14), and brings the power allocation for polar codes back into an information theoretic framework.

IV Numerical Results

Figure 9: Performance comparison of polar-optimal power allocation (solid curves) versus mercury/waterfilling (dashed curves) for a scenario with , , , with SC, SCL, and CRC-aided SCL decoding. For comparison, the 5G LDPC code described in Sec. III-D and the normal approximation [12] are shown.

We investigate an extreme case of two parallel channels with , and binary phase-shift keying (BPSK). The simulation results are depicted in Fig. 9. A polar code of block length is used. The figure shows the FER versus the average power. With SC decoding, the polar code with optimized power allocation outperforms the polar code with mercury/waterfilling by at a FER of . For SCL decoding [13] with list size , the qualitative behaviour stays the same, but the gap between the two power allocations shrinks to approximately . The SC-decoded polar code with optimized power allocation outperforms the SCL-decoded polar code with mercury/waterfilling. When combining SCL decoding with an outer CRC with , the polar code with power allocation optimized for SC decoding still outperforms the polar code with mercury/waterfilling by . It also outperforms the 5G LDPC code by about . It operates approximately away from the normal approximation [12].

V Conclusion

We proposed a novel approach to allocate power for polar codes over parallel channels with an average power constraint. We showed significant gains in terms of FER as compared to power allocation by mercury/waterfilling. We elaborated on the design of polar codes for parallel channels and the mapping between codeword bits and channels of different quality. Future work involves a study of more than two parallel channels, including the design of the mapping between codeword bits and channels. A further research topic is the power allocation for polar codes with higher order modulation, e.g., using multi-level coding [16, 17].


  • [1] N. Stolte, “Rekursive codes mit der Plotkin-konstruktion und ihre decodierung,” Ph.D. dissertation, Technische Universität, Darmstadt, Januar 2002. [Online]. Available:
  • [2] E. Arıkan, “Channel polarization: a method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, July 2009.
  • [3] M. Alsan and E. Telatar, “A simple proof of polarization and polarization for non-stationary memoryless channels,” IEEE Trans. Inf. Theory, vol. 62, no. 9, pp. 4873–4878, Sept 2016.
  • [4] T. M. Cover and J. A. Thomas, Elements of Information Theory, 2nd ed.   John Wiley & Sons, Inc., 2006.
  • [5] H. Mahdavifar, M. El-Khamy, J. Lee, and I. Kang, “Compound polar codes,” in Inf. Theory and Appl. Workshop, Feb 2013, pp. 1–6.
  • [6] S. Liu, Y. Hong, and E. Viterbo, “Polar codes for block fading channels,” in IEEE Wireless Commun. and Netw. Conf. Workshops, March 2017, pp. 1–6.
  • [7] A. Lozano, A. M. Tulino, and S. Verdú, “Optimum power allocation for parallel Gaussian channels with arbitrary input distributions,” IEEE Trans. Inf. Theory, vol. 52, no. 7, pp. 3033–3051, July 2006.
  • [8] R. Mori and T. Tanaka, “Performance and construction of polar codes on symmetric binary-input memoryless channels,” in IEEE Int. Symp. Inf. Theory (ISIT), June 2009, pp. 1496–1500.
  • [9] ——, “Performance of polar codes with the construction using density evolution,” IEEE Commun. Lett., vol. 13, no. 7, pp. 519–521, July 2009.
  • [10] S. ten Brink, G. Kramer, and A. Ashikhmin, “Design of low-density parity-check codes for modulation and detection,” IEEE Trans. Commun., vol. 52, no. 4, pp. 670–678, April 2004.
  • [11] F. Brannström, L. K. Rasmussen, and A. J. Grant, “Convergence analysis and optimal scheduling for multiple concatenated codes,” IEEE Trans. Inf. Theory, vol. 51, no. 9, pp. 3354–3364, Sept 2005.
  • [12] Y. Polyanskiy, H. V. Poor, and S. Verdú, “Channel coding rate in the finite blocklength regime,” IEEE Trans. Inf. Theory, vol. 56, no. 5, pp. 2307–2359, May 2010.
  • [13] I. Tal and A. Vardy, “List decoding of polar codes,” IEEE Trans. Inf. Theory, vol. 61, no. 5, pp. 2213–2226, May 2015.
  • [14] “3GPP TS 38.212 V15.0.0: Multiplexing and channel coding,” Dec. 2017.
  • [15] J. Park and D. Park, “A new power allocation method for parallel AWGN channels in the finite block length regime,” IEEE Commun. Lett., vol. 16, no. 9, pp. 1392–1395, September 2012.
  • [16] M. Seidl, A. Schenk, C. Stierstorfer, and J. B. Huber, “Polar-coded modulation,” IEEE Trans. Commun., vol. 61, no. 10, pp. 4108–4119, October 2013.
  • [17] G. Böcherer, T. Prinz, P. Yuan, and F. Steiner, “Efficient polar code construction for higher-order modulation,” in 2017 IEEE Wireless Commun. and Netw. Conf. Workshops (WCNCW), March 2017, pp. 1–6.