Polar codes are constructed from the generator matrix $\mathbf{G}_N = \mathbf{F}^{\otimes n}$ with $\mathbf{F} = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and $N = 2^n$, where $(\cdot)^{\otimes n}$ denotes the $n$th Kronecker power. It has been shown that the synthesized channels seen by individual bits approach two extremes, either a noiseless channel or a pure-noise channel, as the block length grows large. The fraction of noiseless channels is close to the channel capacity. Therefore, the noiseless channels, termed unfrozen bit channels, are selected for transmitting message bits, while the other channels, termed frozen bit channels, are set to fixed values known by both encoder and decoder. As a result, polar codes are the first family of codes that achieve the capacity of symmetric binary-input discrete memoryless channels under a low-complexity successive cancellation (SC) decoding algorithm as the block length approaches infinity.
However, the performance of polar codes at short to moderate block lengths is disappointing under the SC decoding algorithm. Later, a successive cancellation list (SCL) decoding algorithm for polar codes was proposed, which approaches the performance of the maximum-likelihood (ML) decoder when the list size is large. However, the performance of polar codes is still inferior to that of low-density parity-check (LDPC) codes even under the ML decoder. To strengthen polar codes, a serial concatenation of a cyclic redundancy check (CRC) code and a polar code, termed the CRC-aided polar code, was found to be effective in improving the performance under the SCL decoding algorithm. The performance of CRC-aided polar codes under the SCL decoding algorithm is better than that of LDPC and turbo codes [22, 17].
As the SCL decoder is capable of achieving the ML performance, it is important to study the block error rate (BLER) of polar codes under the ML decoder. However, there are no analytical results in the literature regarding the ML performance of polar codes, and BLER evaluation relies on time-consuming simulations. A possible way to analyze the BLER performance of a coding scheme is to use a BLER upper bound that is a function of the weight enumerating function (WEF), as is done in the analysis of turbo codes. However, if the code size is large, obtaining the exact WEF of a polar code with the heuristic method is prohibitively complex. Approximations of the WEFs of polar codes are proposed in [25, 27] based on the probabilistic weight distribution (PWD).
In this paper, we propose to randomize the polar code using interleavers between the intermediate stages of the polar code encoder. Codes constructed on the basis of this idea are called interleaved polar (i-polar) codes. The ensemble of i-polar codes is formed by considering all possible interleavers. The regular polar code is just one realization of the ensemble of i-polar codes. Based on the concept of the uniform interleaver, i.e., all interleavers are selected uniformly at random from all possible permutations, the average WEF of a code selected at random from the ensemble of i-polar codes can be evaluated. The concept of the uniform interleaver has also been used in the analysis of turbo codes. Note that the WEF analysis in this paper is not an approximation to the WEF of a polar code, but is an exact WEF averaged over the ensemble of i-polar codes. Based on the average WEF, a BLER upper bound, termed the simple bound, can be used to evaluate the BLER performance averaged over the ensemble of codes. Simulation results show that the BLER upper bounds predict the ML performance of i-polar codes well at high SNRs. We also show by simulations that a specific realization of i-polar codes outperforms a regular polar code under the SCL decoder with the same list size.
We also propose a concatenated coding scheme that employs identical high rate codes as the outer code and identical i-polar codes as the inner code with an interleaver in between. CRC codes are the most popular outer codes employed in the concatenation of polar codes. We propose as an alternative to use systematic regular repeat-accumulate (RRA) codes or irregular repeat-accumulate (IRA) codes  as the outer component code. The average WEF of the concatenated code is derived based on the uniform interleaver assumption. Simulation results show that the BLER upper bounds can well predict the BLER performance levels of the concatenated codes. One advantage of the proposed concatenated code is that, for , the code can be decoded using SCL decoders working in parallel which can significantly reduce the decoding latency when is large. Analytical and simulation results both show that the performance of the proposed concatenated code with is better than that of the CRC-aided i-polar code with of the same length and code rate at high SNRs. Therefore, the proposed coding scheme is suitable for ultra-reliable low-latency communications (URLLC) .
The rest of the paper is organized as follows. We begin with a brief introduction of polar codes in Section II. The construction of i-polar codes is presented in Section III. Section IV presents the WEF and IOWEF analysis of i-polar codes. In Section V, a concatenated coding scheme with the i-polar code as the inner component code is proposed and the WEF of the concatenated code is presented. Analytical and simulation results are given in Section VI. Finally, conclusions are given in Section VII.
Notations: Throughout this paper, matrices and vectors are set in boldface, with upper case letters for matrices and lower case letters for vectors. An-tuple vector is denoted as with the indices starting from 0 (instead of 1 for normal vector representations). The notation means the sub-vector if and null vector otherwise. Set quantities such as are denoted using the calligraphic font, and the cardinality of the set is denoted as .
A codeword of the polar code of length $N$ without the bit-reversal matrix can be represented by

$$\mathbf{x} = \mathbf{u}\,\mathbf{G}_N, \tag{1}$$

where $\mathbf{u} = (u_0, \ldots, u_{N-1})$ is the vector of message bits, $\mathbf{x} = (x_0, \ldots, x_{N-1})$ is the vector of codeword bits, and $\mathbf{G}_N = \mathbf{F}^{\otimes n}$ with $N = 2^n$. A polar code of block length $N = 2^n$ can be represented by a graph with $n$ layers of trellis connections, which is called the standard graph of the polar code. It has been shown that for a polar code of block length $N = 2^n$, there exist $n!$ different graphs obtained by different permutations of the $n$ layers of trellis connections. We represent a polar code with the reverse ordering of its standard graph. Figure 1 shows an example graph with the reverse ordering of the standard graph, where $\oplus$ represents a modulo-2 adder.
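As an illustration of the encoding rule in (1), the following minimal Python sketch builds the Kronecker power of the $2 \times 2$ kernel and multiplies a message vector by it over GF(2). The function names `kron_power` and `polar_encode` are hypothetical, and the kernel $\mathbf{F} = [[1, 0], [1, 1]]$ without bit reversal is assumed; the sketch ignores frozen bits, which the caller would set to zero in the message vector.

```python
def kron_power(n):
    """Return the n-th Kronecker power of F = [[1, 0], [1, 1]] as a list of rows."""
    G = [[1]]
    for _ in range(n):
        # F (x) G has the block structure [[G, 0], [G, G]].
        top = [row + [0] * len(row) for row in G]
        bottom = [row + row for row in G]
        G = top + bottom
    return G


def polar_encode(u):
    """Compute x = u * G_N over GF(2); len(u) must be a power of two."""
    N = len(u)
    n = N.bit_length() - 1
    if N < 1 or (1 << n) != N:
        raise ValueError("block length must be a power of two")
    G = kron_power(n)
    # Each codeword bit is the mod-2 inner product of u with a column of G.
    return [sum(u[i] & G[i][j] for i in range(N)) % 2 for j in range(N)]


if __name__ == "__main__":
    print(polar_encode([1, 1, 0, 1]))
```

Building the full matrix costs $O(N^2)$ and is only meant to mirror the matrix form; a practical encoder would use the butterfly structure of the graph instead.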
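The codeword $\mathbf{x}$ obtained from (1) is then transmitted via $N$ independent uses of the binary-input discrete memoryless channel (B-DMC) $W: \mathcal{X} \to \mathcal{Y}$, where $\mathcal{X} = \{0, 1\}$ denotes the input alphabet, $\mathcal{Y}$ denotes the output alphabet, and $W(y|x)$ denotes the channel transition probabilities. The conditional distribution of the output $\mathbf{y}$ given the input $\mathbf{x}$, denoted as $W^N(\mathbf{y}|\mathbf{x})$, is given by

$$W^N(\mathbf{y}|\mathbf{x}) = \prod_{i=0}^{N-1} W(y_i|x_i).$$

The distribution of $\mathbf{y}$ conditioned on $\mathbf{u}$, denoted as $W_N(\mathbf{y}|\mathbf{u})$, is given by

$$W_N(\mathbf{y}|\mathbf{u}) = W^N(\mathbf{y}|\mathbf{u}\mathbf{G}_N).$$

The polar code of length $N$ transforms the $N$ original identical channels into $N$ synthesized channels, denoted as $W_N^{(i)}$ for $0 \le i \le N-1$, with the transition probability given by

$$W_N^{(i)}(\mathbf{y}, u_0^{i-1} | u_i) = \sum_{u_{i+1}^{N-1} \in \{0,1\}^{N-1-i}} \frac{1}{2^{N-1}} W_N(\mathbf{y}|\mathbf{u}).$$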
It has been shown in  that as grows large, the synthesized channels start polarizing. They approach either a noiseless channel or a pure-noise channel. The fraction of noiseless channels is close to the channel capacity. Therefore, the noiseless channels are selected for transmitting message bits while the other channels are set to fixed values known by both encoder and decoder. In the code design, a polar code of dimension is generated by selecting the least noisy channels among and the indices of the least noisy channels are denoted as a set . Define as a sub-vector of formed by the elements of with indices in . Only the sub-vector , termed unfrozen bits, is employed to transmit message bits. The other bits , termed frozen bits, are set to fixed values known by both encoder and decoder. In this paper, we set the frozen bits to all zeros.
Polar codes can be decoded by the SC decoder, which has a decoding complexity of $O(N \log N)$ and can achieve the capacity as $N$ approaches infinity. However, the SC decoder does not perform well at short to moderate block lengths. It has the drawback that if a bit is not correctly detected, the error cannot be corrected in later decoding steps. To improve the performance, a more sophisticated SCL decoder was proposed, which performs very close to the ML decoder for a large list size $L$. The SCL decoder of list size $L$ is based on a tree search over the message bits under the complexity constraint that the number of candidates in the list is at most $L$. At the $i$th step, if $i \in \mathcal{A}$, i.e., $u_i$ is an unfrozen bit, the decoder extends every candidate path in the list along two paths of the binary tree by appending a bit 0 or a bit 1 to each of the candidate paths. Therefore, for every $i \in \mathcal{A}$, the decoder doubles the number of paths. When the number of paths exceeds $L$, only the $L$ most reliable paths are retained. This procedure is repeated until the last bit has been processed. At the last step, the most reliable path is selected as the output of the decoder. The SCL decoder degenerates to the SC decoder when $L = 1$. Details of the SCL decoder in the probability domain and in the log-likelihood ratio (LLR) domain can be found in the literature.
III Interleaved Polar (I-Polar) Codes
A polar code is constructed recursively by the well-known structure . Without ambiguity, we use ’+’ to denote the binary addition as well as the ordinary real number addition. Let and be two linear codes of the same length . We define as
As shown by the graph representation of the polar code, a polar code can be described by the following recursive equation
for and . The initial conditions are if and if . The polar code of length is represented by the code .
We propose to construct the i-polar code by inserting an interleaver at the output of every upper encoder for . An interleaver can be represented as a permutation matrix . We define as
which represents a code obtained by permuting the code bits of all codewords of using the interleaver . Therefore, the i-polar code can be described by the following recursive equation
for and . The initial conditions are if and if . Note that the interleavers , for , are trivial. At the th layer, there are interleavers of size . Figure 2 shows the graph of an i-polar code of length , for which three interleavers are required, i.e., , , and with sizes 2, 2, and 4, respectively. Note that the interleavers , for , are omitted because they are trivial.
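The recursive construction can be sketched in Python as follows. This is only an illustration under stated assumptions, not the authors' implementation: the message is split so that its first half drives the upper encoder, the combining rule is taken to be $[\mathbf{a}\Pi + \mathbf{b} \mid \mathbf{b}]$, a permutation `p` acts as `a_perm[t] = a[p[t]]`, and the names `ipolar_encode`, `make_random_interleavers`, and `identity_interleavers` are hypothetical. With identity interleavers the sketch reduces to the regular polar encoder without bit reversal.

```python
import random


def make_random_interleavers(seed):
    """Return a lookup that fixes one random interleaver per (level, index) node."""
    rng = random.Random(seed)
    cache = {}

    def perm_for(level, index, size):
        key = (level, index)
        if key not in cache:
            p = list(range(size))
            rng.shuffle(p)
            cache[key] = p  # the same interleaver is reused for every codeword
        return cache[key]

    return perm_for


def identity_interleavers(level, index, size):
    """Trivial interleavers; with these the sketch yields the regular polar code."""
    return list(range(size))


def ipolar_encode(u, perm_for, level=0, index=0):
    """Recursively encode u, interleaving the output of every upper encoder."""
    n = len(u)
    if n == 1:
        return list(u)
    half = n // 2
    a = ipolar_encode(u[:half], perm_for, level + 1, 2 * index)      # upper encoder
    b = ipolar_encode(u[half:], perm_for, level + 1, 2 * index + 1)  # lower encoder
    p = perm_for(level, index, half)
    a_perm = [a[p[t]] for t in range(half)]  # interleave the upper output only
    return [x ^ y for x, y in zip(a_perm, b)] + b


if __name__ == "__main__":
    perm_for = make_random_interleavers(seed=1)
    print(ipolar_encode([1, 0, 1, 1, 0, 0, 1, 0], perm_for))
```

Interleavers of size 1 at the last level are trivial, consistent with the remark above that they can be omitted. Because the interleavers are fixed once drawn, the resulting code remains linear.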
The following theorem shows that the interleavers do not change the polarization effect.
Theorem 1: I-polar codes have the same polarization effect as polar codes.
Reading from right to left, Figure 3 illustrates the channel transformation process of the i-polar code for with synthesized channels de-interleaved by , , and . The channel transformation process of the polar code is formed by replacing the de-interleavers with direct links represented by dashed lines in Figure 3. For channel transformation process of the polar code, the figure starts with copies of the transformation . The transformation continues in butterfly patterns with copies of the transformation for and with . Finally, the synthesized channels for can be obtained. The synthesized channels at the intermediate stages can be represented as a binary tree similar to that shown in Figure 6 of . The root node represents the channel . The root gives birth to an upper channel and a lower channel , which are represented by the two nodes at level 1. The channel in turn gives birth to channels and , and the channel gives birth to channels and , and so on. Then based on the concept of a random tree process, the polarization effect is proved in . We want to prove that inserting de-interleavers at the upper channels as that shown in Figure 3 does not change the polarization effect. Note that there are copies of the transformation . For the i-polar code, after channel transformation, the copies of the upper channels are de-interleaved by . Since the de-interleaver acts only on the channels of the same type , the outputs of the de-interleaver are just re-ordered channels of the same type which is the same as that of the original polar code. Therefore, by induction, for the i-polar code, further transformation of the re-ordered channels of the same type gets the same synthesized channels as those of the polar code. ∎
The SC or SCL decoder for polar codes can be easily modified to decode i-polar codes. As proved in Theorem 1, the i-polar code and the polar code produce the same synthesized channels $W_N^{(i)}$ for $0 \le i \le N-1$. Therefore, the same bit channel selection algorithms as those designed for polar codes can be employed for i-polar codes. It has been shown that bit channel selection algorithms based on the Gaussian approximation (GA) for density evolution, such as those proposed in [24, 6, 7], are effective for binary-input additive white Gaussian noise (BI-AWGN) channels. The Bhattacharyya parameter can also be employed for bit channel selection. In this paper, we employ the same bit channel selection algorithm for both i-polar and polar codes. For convenience, we give a brief review of this algorithm. Assume that the all-zero codeword was transmitted. The bit LLR under SC decoding for the channel $W_N^{(i)}$ is defined as

$$L_N^{(i)} = \ln \frac{W_N^{(i)}(\mathbf{y}, u_0^{i-1} | 0)}{W_N^{(i)}(\mathbf{y}, u_0^{i-1} | 1)}.$$

The idea of GA is to approximate the LLR $L_N^{(i)}$ as a Gaussian random variable with mean $m$ and variance $\sigma^2$ satisfying $\sigma^2 = 2m$. Therefore, the p.d.f. of the LLR random variable can be described by a single parameter $\sigma$. The mutual information of the channel was shown to be
where is the variance of the LLR random variable . We want to find the transformation of mutual information that corresponds to the mutual information for the channel transformation . The initial condition is , where is the noise variance of the AWGN channel. Now under SC decoding, assuming that the upper branch is correctly decoded, is the sum of two i.i.d. Gaussian random variables with variance . Therefore is a Gaussian random variable with variance , and hence the mutual information is given by
for . Also, according to Proposition 4 of , , and hence
Given the same set , the i-polar code has the same performance level as that of the polar code under the SC decoder, since the synthesized channels, for , are the same for both codes as shown in the proof of Theorem 1. However, they have different performance levels when a more sophisticated decoder, such as the SCL decoder  or stack decoder [19, 20], is employed. We will show that the WEF of the i-polar code is different from that of the polar code. This implies that these two codes have different performance levels under the ML decoder. Actually, simulation results show that i-polar codes perform better than polar codes under the SCL decoder.
IV WEF and IOWEF of I-Polar Codes
IV-A Ensemble of I-Polar Codes
As described in Section III, at the th layer of the i-polar graph, there are interleavers of size . For an interleaver of size , there are possible interleavers. The ensemble of i-polar codes is formed by all possible interleavers given the unfrozen bit set , of which the code length and dimension . In theory, it is impossible to exhaustively enumerate the WEF of i-polar codes over all possible interleavers when the code size is large. To overcome this difficulty, we assume that all interleavers are selected independently at random and each interleaver follows the uniform assumption as that used in the analysis for turbo codes .
Definition: A uniform interleaver of length $N$ is a probabilistic device, selected at random over all possible interleavers, which maps a given input binary vector of weight $w$ to all $\binom{N}{w}$ distinct permutations of it with equal probability $1/\binom{N}{w}$.
IV-B WEF and IOWEF
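Given an $(N, K)$ linear block code $\mathcal{C}$, its WEF is defined as

$$A^{\mathcal{C}}(X) = \sum_{\mathbf{c} \in \mathcal{C}} X^{w(\mathbf{c})} = \sum_{d=0}^{N} A_d X^d,$$

where $w(\mathbf{c})$ is the Hamming weight of $\mathbf{c}$, $A_d$ is the number of codewords of $\mathcal{C}$ with Hamming weight $d$, and $X$ is a dummy variable. The WEF can be used to compute the exact probability of undetected errors and an upper bound on the BLER.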
We define the input-output weight enumerating function (IOWEF) of the code $\mathcal{C}$ as

$$A^{\mathcal{C}}(W, X) = \sum_{w=0}^{K} \sum_{d=0}^{N} A_{w,d} W^w X^d,$$

where $A_{w,d}$ denotes the number of codewords of $\mathcal{C}$ generated by an input message word of Hamming weight $w$ whose output codeword has Hamming weight $d$. It should be noted that the WEF is a polynomial in one variable and the IOWEF is a polynomial in two variables. The relation between the WEF and IOWEF is given by

$$A^{\mathcal{C}}(X) = A^{\mathcal{C}}(W, X)\big|_{W=1}.$$
The WEF depends only on the code , as only weights of codewords are enumerated. However, the IOWEF depends on the encoder, as it depends on the pairs of Hamming weights of input message word and output codeword. Since there are many different encoders that generate the same code , we will assume that a specific encoder is employed when the IOWEF is considered. The IOWEF can be used to compute the upper bound on the bit error rate (BER) . Also it is important for the study of concatenated coding schemes.
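For small codes, both enumerators can be obtained by brute force. The sketch below is illustrative only: `enumerate_weights` is a hypothetical helper that takes a binary generator matrix (a specific encoder, as the IOWEF requires), lists all $2^K$ messages, and tallies the coefficients of the WEF and the IOWEF as dictionaries.

```python
from collections import Counter
from itertools import product


def enumerate_weights(G):
    """Return (wef, iowef) for the encoder defined by the binary generator matrix G.

    wef maps output weight d to the number of codewords of that weight;
    iowef maps (input weight w, output weight d) to the number of such pairs.
    """
    K, N = len(G), len(G[0])
    wef, iowef = Counter(), Counter()
    for u in product((0, 1), repeat=K):
        # Codeword c = u * G over GF(2).
        c = [sum(u[i] & G[i][j] for i in range(K)) % 2 for j in range(N)]
        w_in, w_out = sum(u), sum(c)
        wef[w_out] += 1
        iowef[(w_in, w_out)] += 1
    return dict(wef), dict(iowef)


if __name__ == "__main__":
    wef, iowef = enumerate_weights([[1, 0], [1, 1]])
    print(wef)
    print(iowef)
```

Summing the IOWEF coefficients over the input weight recovers the WEF coefficients, which is the relation between the two enumerators stated above. The enumeration is exponential in $K$, so it is only practical for short codes.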
Due to the recursive equation (2), the code can be represented as the graph shown in Figure 4. The code forms an ensemble of i-polar codes, where and . The input of the encoder of is and the output is , where is the output vector at the -th stage of the i-polar encoder. Define the WEF of the i-polar code averaged over the ensemble as . Assume that the WEFs and are known. Then is a function of and . The following lemma is important for calculating .
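Lemma 2: Let $\mathbf{a}$ and $\mathbf{b}$ be length-$N$ binary vectors with Hamming weights $w_1$ and $w_2$, respectively. Assume that $\Pi$ is a uniform interleaver. Then the weight distribution for $[\mathbf{a}\Pi + \mathbf{b} \mid \mathbf{b}]$ averaged over all possible interleavers is given by

$$\Pr\left\{ w\big([\mathbf{a}\Pi + \mathbf{b} \mid \mathbf{b}]\big) = w_1 + 2w_2 - 2k \right\} = \frac{\binom{w_2}{k}\binom{N - w_2}{w_1 - k}}{\binom{N}{w_1}}, \qquad \max(0, w_1 + w_2 - N) \le k \le \min(w_1, w_2).$$

Proof: We first derive the weight distribution for $\mathbf{a}\Pi + \mathbf{b}$. Since the weight of $\mathbf{a}$ is $w_1$, there are $\binom{N}{w_1}$ distinct permutations of $\mathbf{a}$, each with equal probability $1/\binom{N}{w_1}$. Among these permutations, let $k$ be the number of positions at which the elements in $\mathbf{a}\Pi$ and $\mathbf{b}$ are both equal to 1. The minimum value of $k$ can be easily shown to be $\max(0, w_1 + w_2 - N)$ and the maximum value of $k$ to be $\min(w_1, w_2)$. Given the value $k$, the Hamming weight of $\mathbf{a}\Pi + \mathbf{b}$ is $w_1 + w_2 - 2k$, and there are a total of $\binom{w_2}{k}\binom{N - w_2}{w_1 - k}$ such permutations. Therefore, the probability that $\mathbf{a}\Pi + \mathbf{b}$ has weight $w_1 + w_2 - 2k$ is given by $\binom{w_2}{k}\binom{N - w_2}{w_1 - k} / \binom{N}{w_1}$. Finally, the additional concatenation of $\mathbf{b}$ gives the Hamming weight of $[\mathbf{a}\Pi + \mathbf{b} \mid \mathbf{b}]$ as $w_1 + 2w_2 - 2k$. The proof is completed. ∎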
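The counting argument in this proof can be checked numerically for small lengths. The sketch below is an illustration under assumed notation ($N$ for the length, $w_1$ and $w_2$ for the two input weights, $k$ for the overlap count); the function names are hypothetical. It evaluates the hypergeometric-type distribution implied by the proof and compares it with an exhaustive average over all $N!$ permutations.

```python
from fractions import Fraction
from itertools import permutations
from math import comb


def lemma_weight_distribution(N, w1, w2):
    """Return {weight: probability} of [a*Pi + b | b] under a uniform interleaver."""
    dist = {}
    for k in range(max(0, w1 + w2 - N), min(w1, w2) + 1):
        # k = number of positions where the permuted a and b are both 1.
        prob = Fraction(comb(w2, k) * comb(N - w2, w1 - k), comb(N, w1))
        dist[w1 + 2 * w2 - 2 * k] = prob
    return dist


def brute_force_distribution(a, b):
    """Average the weight of [a*Pi + b | b] over every permutation Pi of a."""
    counts, total = {}, 0
    for perm in permutations(range(len(a))):
        a_perm = [a[i] for i in perm]
        weight = sum(x ^ y for x, y in zip(a_perm, b)) + sum(b)
        counts[weight] = counts.get(weight, 0) + 1
        total += 1
    return {w: Fraction(c, total) for w, c in counts.items()}


if __name__ == "__main__":
    a, b = [1, 1, 0, 0, 0], [1, 0, 1, 1, 0]
    print(lemma_weight_distribution(5, 2, 3))
    print(brute_force_distribution(a, b))
```

Exact rational arithmetic via `Fraction` avoids rounding when comparing the two distributions. The brute-force check grows as $N!$ and is only meant for very short vectors.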
We are ready to calculate the average WEF based on the recursive equation (2).
Given the WEFs and , the WEF of the code averaged over all possible interleavers is
The averaged WEF of can be written as
By Lemma 2, we have
The average number of codeword combinations of and for and is . The proof is completed. ∎
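One step of this averaging can be sketched as follows. The code is an illustration under assumptions rather than the paper's exact recursion: `combine_average_wef` is a hypothetical name, WEFs are stored as dictionaries from weight to (possibly fractional) multiplicity, the first argument is the WEF of the interleaved upper code, and the probability term follows the counting argument in the proof of Lemma 2.

```python
from fractions import Fraction
from math import comb


def combine_average_wef(A_upper, A_lower, N):
    """Average WEF of {[a*Pi + b | b]} for length-N component codes.

    A_upper and A_lower map Hamming weight to the (average) number of codewords.
    """
    avg = {}
    for w1, n1 in A_upper.items():
        for w2, n2 in A_lower.items():
            for k in range(max(0, w1 + w2 - N), min(w1, w2) + 1):
                prob = Fraction(comb(w2, k) * comb(N - w2, w1 - k), comb(N, w1))
                d = w1 + 2 * w2 - 2 * k  # weight of the combined codeword
                avg[d] = avg.get(d, 0) + n1 * n2 * prob
    return avg


if __name__ == "__main__":
    unfrozen = {0: 1, 1: 1}  # length-1 code {0, 1}
    frozen = {0: 1}          # length-1 code {0}
    upper = combine_average_wef(frozen, unfrozen, 1)    # length-2 repetition code
    lower = combine_average_wef(unfrozen, unfrozen, 1)  # length-2 full code
    print(combine_average_wef(upper, lower, 2))
```

Starting from the length-1 initial conditions ($\{0, 1\}$ for an unfrozen bit and $\{0\}$ for a frozen bit) and applying this step layer by layer gives an ensemble-average WEF; how leaf positions map to bit indices depends on the graph ordering and is not fixed by this sketch.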
Similarly, for the average IOWEF, we have the following theorem.
Given the IOWEFs and , the IOWEF of the code averaged over all possible interleavers is
for and , where the initial conditions are given by
The interleaver can also be applied for every output of the lower encoder. In this case, the recursive equation becomes
for and . The initial conditions are if and if . De-interleaving and by does not change the WEF of the code. Therefore, the following code has the same WEF as
where . Since the mapping is bijective, the interleaver is uniform if is uniform. Comparing to (2), as averaged over all interleavers, has the same average WEF as . Therefore, it is sufficient to consider only the former option.
IV-C WEFs of (32, 16) I-Polar Code and Polar Code
[Table I. Columns: Weight | WEF(A) | WEF(B) | WEF(C) | Type 1 | Type 2]
The WEFs of the (32, 16) i-polar code and polar code are compared. The set which is used for the i-polar code and polar code is given by
This set is obtained by using the bit channel selection algorithm described previously. Table I gives the coefficients of the WEFs of all test cases; the Hamming weights are listed in the first column, and the remaining columns give the corresponding coefficients for each test case. The WEF averaged over the ensemble of i-polar codes is denoted as WEF(A), which is computed based on the recursive equation (5). Since the code size is small, the WEF of a realization of i-polar codes can be enumerated exhaustively. We take 1000 independent realizations of i-polar codes and compute the WEF of each realization. In Table I, WEF(B) denotes the sample average of the WEFs over the 1000 realizations. The WEF of the polar code is also enumerated exhaustively and is denoted as WEF(C). It can be observed that the sample average WEF(B) is very close to the (analytical) ensemble average WEF(A). Also, among the 1000 realizations, only two types of WEFs are observed, denoted as WEF type 1 and WEF type 2 in Table I, which appear 991 times and 9 times, respectively. Note that WEF type 1 and WEF type 2 are just the two WEFs that occur with higher probability than the others. There are other WEFs, e.g., the WEF of the polar code, that occur with small probabilities and do not appear among the 1000 realizations. WEF type 1 and WEF type 2 are both close to WEF(A), which means that, with high probability, a randomly selected realization is as good as the ensemble average WEF(A). The WEF(C) of the polar code is concentrated on a smaller number of Hamming weights, i.e., there are no codewords of Hamming weights 10, 14, 18, and 22, as shown in Table I. The reason is that the interleavers in the i-polar code have the effect of spreading the Hamming weights of codewords more widely.
For linear codes, the minimum Hamming weight, denoted as $d_{\min}$, and its multiplicity, denoted as $A_{d_{\min}}$, dominate the performance at high SNRs. Table I shows that both the polar code and the i-polar code have 8 codewords of minimum Hamming weight 4. This means that both codes have essentially the same error probability at high SNRs under ML decoding. This phenomenon can be observed from the upper bounds and simulated BLER curves of both codes shown in Figure 6.
The parameters and for the i-polar and polar codes with and varying from 32 to 480 are shown in Figure 5. The parameters and for i-polar codes are obtained through the recursive equation (5). Since there are no analytical WEFs for polar codes, we use the SCL decoder with to search for the minimum weight codewords as that proposed in . The results show that both codes have the same minimum Hamming weight. However, the parameters of i-polar codes are smaller than or equal to those of polar codes, which means that, for some cases, i-polar codes perform better than polar codes with ML decoding at high SNRs.
IV-D BLER Upper Bound
In this paper, we focus on the performance of codes over BI-AWGN channels. For BI-AWGN channels, the th received signal can be represented as
where is the -th bit of the codeword , is the zero-mean additive white Gaussian noise with , and is the symbol energy. Given the WEF of a code , the union bound on the BLER over BI-AWGN channels is a function of the WEF given by
where is the signal-to-noise ratio defined as . However, the union bound may be too loose at low SNRs. A tighter upper bound, called simple bound, was proposed in . For convenience, the bound is given here as
where , and with ,
The functions and are given by
It should be noted that a similar bound can be employed to obtain the upper bound on the bit error rate (BER) if the IOWEF of the code is known. However, in this paper, we only focus on the BLER upper bound.
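The union bound discussed in this subsection is straightforward to evaluate from a WEF. The sketch below assumes the standard form $\sum_{d \ge 1} A_d\, Q\big(\sqrt{2 d E_s/N_0}\big)$ for BPSK over the BI-AWGN channel with noise variance $N_0/2$ and SNR defined as $E_s/N_0$; these conventions and the function names are assumptions, not taken from the text. The tighter simple bound is not reproduced here.

```python
from math import erfc, sqrt


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * erfc(x / sqrt(2.0))


def union_bound_bler(wef, es_over_n0):
    """Union bound on the ML block error rate from a WEF {weight: multiplicity}.

    es_over_n0 is the linear (not dB) ratio Es/N0; the all-zero codeword
    (weight 0) is excluded from the sum.
    """
    return sum(
        a_d * q_function(sqrt(2.0 * d * es_over_n0))
        for d, a_d in wef.items()
        if d > 0
    )


if __name__ == "__main__":
    # Illustrative WEF with only the minimum-weight term: 8 codewords of weight 4.
    for snr_db in (0.0, 2.0, 4.0):
        gamma = 10.0 ** (snr_db / 10.0)
        print(snr_db, union_bound_bler({0: 1, 4: 8}, gamma))
```

Because the multiplicities may be fractional ensemble averages, the function accepts any numeric coefficients. At low SNRs the sum can exceed 1, which reflects the looseness of the union bound noted above.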
Figure 6 shows the BLER upper bounds for the polar code and i-polar code based on the WEFs given in Table I. Also BLER simulations are conducted based on the SCL decoder with . In this case, we have verified that the performance of the SCL decoder with is very close to the ML performance. The BLER upper bounds show that the i-polar code is slightly better than the polar code at low SNRs. Simulation results also show such a slight difference.
V WEF of Concatenated Coding Schemes
Polar codes are weak at short to moderate block lengths. Therefore, concatenated coding schemes are often considered in the design of polar codes. A well-known scheme concatenates a CRC code as the outer code with a polar code as the inner code. Reed-Solomon codes, BCH codes, convolutional codes, and LDPC codes have also been considered as the outer code with a polar code as the inner code [18, 26, 10]. So far, for concatenated coding schemes, most research works focus on asymptotic analysis, i.e., the performance analysis as the block length approaches infinity. The performance of codes of finite block lengths is evaluated by time-consuming simulations. We will develop the WEF analysis for concatenated codes, which can then be used to evaluate the BLER performance using the bound given in (8).
We consider a concatenated coding scheme as shown in Figure 7. The encoder consists of parallel outer encoders of the same code, denoted as . The code is called the outer component code. For the th outer encoder, the input message word is denoted as a -bit vector and the output codeword is denoted as an -bit vector . The output codewords from the outer encoders form an -bit super-codeword, denoted as . The -bit super-codeword is then interleaved by an interleaver which outputs a vector represented as , where is the permutation matrix formed by the interleaver. The vector is then partitioned into blocks with each block of size , where . The partition is given by with being a -bit vector for . Then parallel i-polar encoders of the inner component code are employed to encode the input vectors and finally the output codewords are obtained. The final super-codeword is .
As indicated in Figure 7 by dashed blocks, we may represent the super-code corresponding to the parallel outer encoders as and the super-code corresponding to the parallel inner encoders as . The entire system becomes a simple concatenation of and with an interleaver in between. Given the WEF of the outer component code, and the IOWEF of the inner component code, we can calculate the WEF of the outer super-code and the IOWEF of the inner super-code