I Introduction
Polar codes [1] are the first class of constructive channel codes proven to achieve the symmetric (Shannon) capacity of a binary-input discrete memoryless channel (BI-DMC) using a low-complexity successive cancellation (SC) decoder. Decoding polar codes and their variants requires passing the channel log-likelihood ratios (LLRs) through the factor graph shown in Fig. 1. The evolved information at the output of the factor graph is used to make a hard decision or to calculate a metric in SC-based decoders. The evolved LLR, also known as the decision LLR, is obtained for each bit-channel successively. To calculate each decision LLR, we need access to the intermediate information on the factor graph. There are two ways to provide it: 1) We can store all intermediate values, including the LLRs and partial sums on the factor graph. This approach is acceptable for short codes under SC or Fano decoding. However, as the code gets longer, in particular under list or stack decoding, it becomes expensive in terms of memory. 2) We can store only a portion of the intermediate values. It was observed in [2] that calculating every decision LLR requires at most $N-1$ intermediate LLRs (excluding channel LLRs) and $N-1$ partial sums at any decoding step, where $N$ is the code length.
Some decoding schemes rely on additional decoding attempts when the first attempt fails. These schemes are as follows: 1) SC-flip decoding: In this scheme, when SC decoding fails, the decoding is repeated from scratch, and in the additional attempts the values of one or more bits are flipped during the SC decoding process to correct the error caused by the channel noise and avoid propagation of this error [3]. 2) Shifted-pruning based list decoding: In this scheme [4, 5, 6], when SC list decoding fails, additional decoding attempts may correct the error provided that we shift the path pruning window at the position where the correct path was pruned from the list in the first decoding attempt. Note that a special case of this scheme is also called the SCL-flip scheme, bit-flipping for SCL, or by other names. 3) Fano decoding: In this scheme [7, 8], the decoder may backtrack, i.e., move backward, to explore other paths on the decoding tree. Unlike the first two schemes, where the decoding of a codeword is completed and the additional decoding is then repeated from the first bit, in the Fano algorithm the backward movement occurs frequently somewhere between the first bit and the last bit. In Fano decoding of very short codes, it might be better to keep memory for all the intermediate information rather than a reduced number of memory elements. This way, the complexity reduces significantly at the cost of a larger memory requirement.
In [7] and [8], we proposed a sophisticated algorithm for partial rewinding in Fano decoding. In this work, we propose a simple, analytically proven approach to efficiently rewind the SC algorithm to the position where we need to flip the value of a bit in SC-flip decoding or shift the pruning window in the shifted-pruning scheme. This approach is designed based on the scheduling properties of the SC process and an operator that we introduce in this paper. The approach relies on a special grouping of the bit indices in $[0, N-1]$ based on the introduced operator. We also prove that resuming the SC process from the suggested bit index reuses the untouched intermediate information left in the memory from the previous decoding stages. The grouping of bit indices eases the job of finding the closest bit index to the target bit index for rewinding. This closeness contributes to a significant reduction of the time and computational complexity of the underlying decoding scheme while retaining the error correction performance. We also adapt the proposed approach to multiple rewinds, which differ slightly from a single rewind, and we apply it to SC-flip decoding and the shifted-pruning scheme for list decoding.
Paper Outline: The rest of the paper is organized as follows. Section II introduces the notation for polar codes and describes the intermediate information of the SC process. Section III reviews the details of the updating schedule for the intermediate information based on the binary representation of the bit indices. Section IV explores the properties of the SC process by introducing an operator and a special grouping scheme. Section V proposes a simple approach for single and multiple rewinds of the SC process. In Section VI, we evaluate the complexity reduction of the proposed approach by applying it to the SC-flip and shifted-pruning schemes. Finally, Section VII makes concluding remarks.
II Preliminaries
A polar code of length $N = 2^n$ with $K$ information bits is constructed by choosing the $K$ good bit-channels in the polarized vector channel for transmitting the information bits and optional auxiliary CRC or parity bits. The indices of these bit-channels are collected in the set $\mathcal{A}$. The rest of the bit-channels are used for transmitting known values (frozen bits) as redundancy. Polar codes are encoded by $x_0^{N-1} = u_0^{N-1} G_N$, where $u_0^{N-1} = (u_0, \dots, u_{N-1})$ is the input vector and $G_N = B_N G_2^{\otimes n}$, where $G_2 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, $B_N$ is an $N \times N$ bit-reversal permutation matrix, and $(\cdot)^{\otimes n}$ denotes the $n$-th Kronecker power [1]. Let $y_0^{N-1}$ denote the output vector of a noisy channel, and let the vector $\lambda_0^{N-1}$ denote the log-likelihood ratios (LLRs). The channel LLRs are computed based on the received signals from the physical channel, $y_0^{N-1}$. We also have intermediate LLRs as shown in Fig. 1. The intermediate LLRs are computed based on the type of node in the factor graph. The $f$ and $g$ nodes are shown by circles and rectangles, respectively, in the factor graph. The output of these nodes can be computed from right to left by
$$f(\lambda_a, \lambda_b) = 2\tanh^{-1}\!\left(\tanh\frac{\lambda_a}{2}\,\tanh\frac{\lambda_b}{2}\right) \approx \operatorname{sgn}(\lambda_a)\operatorname{sgn}(\lambda_b)\min(|\lambda_a|, |\lambda_b|), \tag{1}$$
$$g(\lambda_a, \lambda_b, \beta) = (-1)^{\beta}\lambda_a + \lambda_b, \tag{2}$$
where $\lambda_a$ and $\lambda_b$ are the input LLRs to a node and $\beta$ is the partial sum of previously decided bits, which feeds the estimated bits $\hat{u}$ backward into the factor graph.
In SC decoding, the non-frozen bits are estimated successively based on the evolved LLRs via a one-time pass through the factor graph. When decoding the $i$-th bit, if $i \notin \mathcal{A}$, then $\hat{u}_i = 0$ since $u_i$ is a frozen bit. Otherwise, bit $u_i$ is decided by a maximum likelihood (ML) rule: $\hat{u}_i = 0$ if the decision LLR of bit $i$ is non-negative, and $\hat{u}_i = 1$ otherwise. Unlike SC decoding, which makes a final decision for the $i$-th bit, SC list decoding considers both possible values $u_i = 0$ and $u_i = 1$. In SC list decoding, the $L$ most reliable paths are preserved at each decoding step to limit the growth of the number of paths. The solution for decoding is chosen at the last bit based on the likelihood or the cyclic redundancy check (CRC) approach. The CRC is also used in re-decoding schemes such as SC-flip and shifted-pruning based list decoding.
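The node updates and the hard decision described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: it uses the common min-sum approximation for the $f$ node, an LLR convention in which non-negative values favour bit 0, and frozen bits fixed to 0. The function names `f_node`, `g_node`, and `hard_decision` are hypothetical and are not taken from the paper.

```python
import math


def f_node(llr_a: float, llr_b: float) -> float:
    """Min-sum approximation of the f node (assumed form of the update)."""
    sign = math.copysign(1.0, llr_a) * math.copysign(1.0, llr_b)
    return sign * min(abs(llr_a), abs(llr_b))


def g_node(llr_a: float, llr_b: float, partial_sum: int) -> float:
    """g node: the partial sum (0 or 1) selects the sign applied to llr_a."""
    return (1 - 2 * partial_sum) * llr_a + llr_b


def hard_decision(decision_llr: float, is_frozen: bool) -> int:
    """Frozen bits are assumed to be 0; otherwise decide by the sign of the LLR."""
    if is_frozen:
        return 0
    return 0 if decision_llr >= 0 else 1


if __name__ == "__main__":
    print(f_node(2.0, -3.0))
    print(g_node(2.0, -3.0, 1))
    print(hard_decision(-0.5, is_frozen=False))
```

The exact $f$ update based on the hyperbolic tangent could be substituted for the min-sum form without changing the update schedule discussed in the rest of the paper.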
III Updating the Intermediate Information
III-A Intermediate LLRs
The factor graph shown in Fig. 1 has $n = \log_2 N$ stages of $N$ LLRs each; however, as shown in [2], it is sufficient to update/access at most $N-1$ intermediate LLRs out of these $N \log_2 N$ LLRs for decoding any bit $u_i$. Fig. 1 illustrates, in a tree form on the factor graph, the LLRs associated with decoding one bit.
As can be seen, there are $2^s$ LLRs in stage $s$ for $s \in [0, n-1]$. Hence, according to the geometric series, we need a total memory space of
$$\sum_{s=0}^{n-1} 2^s = 2^n - 1 = N - 1. \tag{3}$$
Suppose $u_i$ is the bit being decoded and $i_{n-1} \dots i_1 i_0$ is the binary representation of index $i$, where the least significant bit is indexed 0 and the most significant bit is indexed $n-1$. The stages are updated from right to left. The first stage to be updated is obtained by finding the first one, ffo, i.e., the position of the least significant bit set to one, as
$$\mathrm{ffo}(i) = \begin{cases} \min\{t : i_t = 1\}, & i > 0, \\ n-1, & i = 0. \end{cases} \tag{4}$$
Note that in the semi-parallel hardware architecture [2], since the LLRs are stored in blocks, the memory usage is inefficient and some memory space remains unused. In fact, the reduction in the number of processing elements is traded for slightly more clock cycles and a larger memory space.
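The schedule above can be illustrated with a short Python sketch. It assumes that ffo returns the index of the least significant 1 of the bit index and, as a convention assumed here, $n-1$ for bit 0, where every stage has to be computed. The names `ffo` and `llr_memory` are hypothetical helpers introduced only for illustration.

```python
def ffo(i: int, n: int) -> int:
    """Index of the least significant 1 in the n-bit index i.

    Assumed convention: for i == 0 every stage is updated, so n - 1 is returned.
    """
    if not 0 <= i < (1 << n):
        raise ValueError("bit index out of range")
    if i == 0:
        return n - 1
    # i & -i isolates the lowest set bit; bit_length() - 1 gives its position.
    return (i & -i).bit_length() - 1


def llr_memory(n: int) -> int:
    """Intermediate LLRs kept in memory: 2**s values for each stage s in [0, n-1]."""
    return sum(2 ** s for s in range(n))


if __name__ == "__main__":
    n = 3
    # First stage to be updated when decoding each of the bits 0..7.
    print([ffo(i, n) for i in range(2 ** n)])
    # Total number of stored intermediate LLRs, which equals 2**n - 1.
    print(llr_memory(n))
```

Under this convention, odd bit indices only touch stage 0, which is why most decoding steps update very little of the stored information.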
III-B Partial Sums
The partial sums are the other set of intermediate information needed for the SC process. It turns out that we need the same memory space for the partial sums as well, i.e., at most $N-1$ memory elements. It was observed in [2] that we need to store $2^s$ bits corresponding to the nodes of type $g$ at stage $s$, which are waiting to be summed with the next decoded bit. Here, let us define an operator that indicates the last stage to be updated. The last stage whose partial sums are to be updated is obtained by finding the first zero, ffz, i.e., the position of the least significant bit set to zero, as
$$\mathrm{ffz}(i) = \min\{t : i_t = 0\}, \quad 0 \le i < N-1. \tag{5}$$
It turns out that this is the only stage that consists of $g$ nodes in the process of updating the LLRs for the next bit. Clearly, after decoding the last bit, where there is no zero in the binary representation of the index, $i = N-1$, there is no need to update partial sums.
Fig. 2 shows partial sums ( to ) associated to . The values in orange are updated after decoding bit as , , , and .
There are methods proposed in [9, 10] for hardware implementation that require slightly less memory space for updating the partial sums.
Note that for $i \in [0, N-2]$, we have
$$\mathrm{ffo}(i+1) = \mathrm{ffz}(i). \tag{6}$$
That is the reason why, at any bit $i \in [1, N-1]$, the first stage whose LLRs need to be updated consists of only $g$ nodes. Therefore, after decoding bit $i$, the partial sums of stage $\mathrm{ffz}(i)$ are updated to be used for the $g$ nodes at stage $\mathrm{ffo}(i+1)$.
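The relationship between the two operators can be checked exhaustively for small code lengths. The sketch below assumes that ffo and ffz return the positions of the least significant 1 and the least significant 0 of the bit index, respectively; the helper names `ffo`, `ffz`, and `identity_holds` are hypothetical.

```python
def ffo(i: int, n: int) -> int:
    """Position of the least significant 1 (assumed to be n - 1 for i == 0)."""
    if i == 0:
        return n - 1
    return (i & -i).bit_length() - 1


def ffz(i: int, n: int) -> int:
    """Position of the least significant 0 of the n-bit index i."""
    if not 0 <= i < (1 << n) - 1:
        raise ValueError("ffz is only needed for i in [0, N-2]")
    inverted = ~i & ((1 << n) - 1)  # n-bit complement: zeros of i become ones
    return (inverted & -inverted).bit_length() - 1


def identity_holds(n: int) -> bool:
    """Check ffo(i + 1) == ffz(i) for every i in [0, N-2]."""
    return all(ffo(i + 1, n) == ffz(i, n) for i in range(2 ** n - 1))


if __name__ == "__main__":
    for n in range(1, 11):
        print(n, identity_holds(n))
```

The identity follows from binary addition: incrementing an index turns its lowest 0 into a 1 and clears all the bits below it.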
IV Properties of the SC Process
We now explore some properties of the SC algorithm that help us rewind the process efficiently. The goal is to avoid both storing all the values of the LLRs and partial sums and restarting the SC process from bit 0 in SC-based decoding when a re-decoding attempt is required. First, let us define an operator that helps us in the upcoming analysis.
Definition 1.
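The operator finds the last zero, flz, i.e., the position of the most significant bit set to zero in the binary representation of $i$, indexed in reverse order, as
$$\mathrm{flz}(i) = \begin{cases} (n-1) - \max\{t : i_t = 0\}, & i < N-1, \\ n-1, & i = N-1, \end{cases} \tag{7}$$
for every $i \in [0, N-1]$. We denote the output of the operator by the parameter $p$.
Note that since the indexing is in the opposite direction, when the most significant bit is set to zero, i.e., $i_{n-1} = 0$, we get $p = 0$, and when the only zero bit is $i_0$ or there is no 0-valued bit, we get $p = n-1$.
Definition 2 (Set $\mathcal{Z}_p$).
We group the bit indices with identical $p$ into sets denoted by $\mathcal{Z}_p$, the set of order $p$, or
$$\mathcal{Z}_p = \{\, i \in [0, N-1] : \mathrm{flz}(i) = p \,\}. \tag{8}$$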
Example 1.
For $n = 3$, we can group the indices 0 to 7 into the following sets: $\mathcal{Z}_0 = \{0, 1, 2, 3\}$, $\mathcal{Z}_1 = \{4, 5\}$, and $\mathcal{Z}_2 = \{6, 7\}$.
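The grouping in this example can be reproduced with a short Python sketch. It assumes that the operator returns $n-1$ minus the position of the most significant zero bit, and $n-1$ when the index contains no zero bit, as described in the note after Definition 1. The names `flz` and `group_indices` are hypothetical.

```python
def flz(i: int, n: int) -> int:
    """Order p of the n-bit index i: position of its most significant zero,
    counted from the most significant bit (assumed to be n - 1 if no zero exists)."""
    for t in range(n - 1, -1, -1):  # scan from the most significant bit down
        if not (i >> t) & 1:
            return (n - 1) - t
    return n - 1


def group_indices(n: int) -> dict[int, list[int]]:
    """Collect the bit indices 0..2**n - 1 into sets keyed by their order p."""
    groups: dict[int, list[int]] = {}
    for i in range(2 ** n):
        groups.setdefault(flz(i, n), []).append(i)
    return groups


if __name__ == "__main__":
    for p, members in sorted(group_indices(3).items()):
        print(p, members)
```

Each set is a run of consecutive indices, and the sets halve in size as the order grows, except for the last two sets, which both contain two indices.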
Remark 1.
The distribution of the non-frozen indices in set $\mathcal{A}$ among the sets $\mathcal{Z}_p$ depends on the code rate. As the code rate reduces, fewer non-frozen indices will exist in the low-order sets $\mathcal{Z}_p$, i.e., those with small $p$.
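Lemma 1 (Properties of $\mathcal{Z}_p$).
For any $n$ and $p \in [0, n-1]$, set $\mathcal{Z}_p$ has the following properties:
a) The boundaries of set $\mathcal{Z}_p$ are
$$\mathcal{Z}_p = \begin{cases} [\,2^n - 2^{n-p},\ 2^n - 2^{n-p-1} - 1\,], & p < n-1, \\ [\,2^n - 2,\ 2^n - 1\,], & p = n-1. \end{cases} \tag{9}$$
b) The size of set $\mathcal{Z}_p$ is
$$|\mathcal{Z}_p| = \begin{cases} 2^{n-p-1}, & p < n-1, \\ 2, & p = n-1. \end{cases} \tag{10}$$
c) The smallest element in set $\mathcal{Z}_p$ is
$$\min \mathcal{Z}_p = 2^n - 2^{n-p}. \tag{11}$$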
Proof.
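Let us first introduce a notation for the binary representation of a non-negative integer with length $n$. Let $\mathbf{x}_l$ denote a mixed string of 0s and 1s, and let $\mathbf{0}_l$ or $\mathbf{1}_l$ denote a uniform string of either 0s or 1s, both with length $l$. In set $\mathcal{Z}_p$ with $p < n-1$, observe that the elements are of the form $\mathbf{1}_p + 0 + \mathbf{x}_{n-p-1}$, where the operator '+' is used for concatenation and $\mathbf{1}_p$ is/are the most significant bit(s).
a) The smallest element of set $\mathcal{Z}_p$ in binary is $\mathbf{1}_p + \mathbf{0}_{n-p}$, which is equivalent to $2^n - 2^{n-p}$ in decimal. Similarly, one can see that the largest element in set $\mathcal{Z}_p$ is $\mathbf{1}_p + 0 + \mathbf{1}_{n-p-1}$, which is equivalent to $2^n - 2^{n-p-1} - 1$ in decimal. Note that the largest element in set $\mathcal{Z}_{n-1}$ is $2^n - 1$, while the smallest element follows the relationship discussed above.
b) Given the interval in part a of this lemma, for $p < n-1$ we can find the size of the set by $(2^n - 2^{n-p-1} - 1) - (2^n - 2^{n-p}) + 1 = 2^{n-p-1}$.
c) It follows from part a of this lemma that the lower bound of the values in set $\mathcal{Z}_p$ in binary is $\mathbf{1}_p + \mathbf{0}_{n-p}$, which is equivalent to $2^n - 2^{n-p}$ in decimal.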
∎
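The properties in Lemma 1 can also be verified numerically for small $n$. The following sketch compares closed-form boundaries against a brute-force grouping. The closed forms used here, a lower boundary of $2^n - 2^{n-p}$ and an upper boundary of $2^n - 2^{n-p-1} - 1$ (or $2^n - 1$ for the highest order), are derived from the description of the operator and should be read as assumptions rather than as a restatement of the paper's equations. The helper names are hypothetical.

```python
def flz(i: int, n: int) -> int:
    """Order p of the n-bit index i (assumed to be n - 1 if i has no zero bit)."""
    for t in range(n - 1, -1, -1):
        if not (i >> t) & 1:
            return (n - 1) - t
    return n - 1


def closed_form_bounds(p: int, n: int) -> tuple[int, int]:
    """Assumed closed-form smallest and largest element of the set of order p."""
    low = 2 ** n - 2 ** (n - p)
    high = 2 ** n - 1 if p == n - 1 else 2 ** n - 2 ** (n - p - 1) - 1
    return low, high


def lemma_holds(n: int) -> bool:
    """Compare the closed forms with a brute-force grouping of all indices."""
    for p in range(n):
        members = [i for i in range(2 ** n) if flz(i, n) == p]
        low, high = closed_form_bounds(p, n)
        # Every set must be the full run of consecutive integers low..high.
        if members != list(range(low, high + 1)):
            return False
    return True


if __name__ == "__main__":
    print(all(lemma_holds(n) for n in range(1, 11)))
```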
Example 2.
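Let us find the deepest updated stage while decoding any bit within set $\mathcal{Z}_p$ in the following lemma.
Lemma 2.
For any $i \in \mathcal{Z}_p$ and $p \in [1, n-1]$, we have
$$\mathrm{ffo}(i) \le n - p. \tag{12}$$
Proof.
Let us recall the notation $\mathbf{1}_p + 0 + \mathbf{x}_{n-p-1}$ for $i \in \mathcal{Z}_p$, where the operator '+' is used for concatenation and $\mathbf{1}_p$ is/are the most significant bit(s). According to (4), the maximum value of $\mathrm{ffo}(i)$, i.e., the largest index of the least significant bit set to one for $i \in \mathcal{Z}_p$, is obtained when we have $\mathbf{1}_p + \mathbf{0}_{n-p}$, which is the smallest element in set $\mathcal{Z}_p$, i.e., $2^n - 2^{n-p}$.
For $p = n-1$, although the notation is of the form $\mathbf{1}_{n-1} + \mathbf{x}_1$, the largest index of the least significant bit set to one is similarly obtained from $\mathbf{1}_{n-1} + 0$, which is the smallest element in set $\mathcal{Z}_{n-1}$. ∎
Clearly, when $p = 0$, we have $\mathrm{ffo}(i) \le n-1$ for any $i \in \mathcal{Z}_0$.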
Remark 2.
Now we consider updating intermediate information for in different sets of .
Lemma 3.
For any , , we have
(13) 
Proof.
Corollary 1.
For any , , we have
(14) 
Remark 3.
Remark 4.
As per Remark 3 and the fact that updating the intermediate information is performed from stage to stage 0, rewinding the SC algorithm from bit to bit , does not require any additional update of the intermediate LLRs or partial sums.
V Efficient Partial Rewind
We learned in Section III that we could save memory significantly by knowing the intermediate LLRs and partial sums needed for decoding each bit. However, there is a drawback to this efficiency. Since we use limited space for the intermediate information instead of storing all of it, we have to overwrite the values we no longer need in order to proceed with decoding. In the normal decoding process, the overwriting operation does not cause any data corruption. However, if we need to move backward, as in SC-flip, shifted-pruning, or Fano decoding, we may no longer have access to the intermediate information as it may have been lost due to overwriting.
In this section, based on the properties of the SC process studied in Section IV, a scheme is proposed such that rewinding the SC algorithm is performed efficiently, with significantly fewer computations compared with restarting the algorithm.
Suppose the SC algorithm is decoding some bit and needs to rewind the SC process to an earlier target bit. In the SC-flip and shifted-pruning schemes, the rewind starts from the last bit index, $N-1$; however, in Fano decoding, it can start from any bit index. Since the intermediate information required for decoding the target bit may have been partially overwritten, we may need to rewind further, to an earlier resume position. From the resume position, the SC algorithm proceeds with normal decoding up to the target position. We shift the pruning window at this position, or we flip the bit, and then continue the normal SC-based decoding.
Now, the question is what the position is? Let us assume and . Then,
(15) 
Example 3.
Suppose and we need to rewind the SC algorithm from position to . We know that and . Therefore, according to (15), .
Recursion for Case : For the case in (15), we may choose a position , for rewinding, which is more efficient. To this end, let us and , then while :

first, truncate the binary representation of by removing the bits from position to the most significant bit (inclusive), i.e. position . Note that after truncation, we have a binary number with length .

secondly, find the new set such that for .

then, .
We can continue the above procedure recursively to minimize . Note that in this recursion, and are being replaced with new values at each iteration.
Example 4.
Suppose and we need to rewind the SC algorithm from position to . We know that and therefore . We truncate as mentioned above. We get , , and . Hence, the new is which is the same as before.
Example 5.
Suppose and we need to rewind the SC algorithm from position to . We know that and therefore . However, if we truncate as mentioned above, we get , , and . Hence, the new is .
One can observe that the recursion is not used in the schemes in which the rewind is performed from the last bit index. The reason is that the last bit index, $N-1$, belongs to the set of highest order, and this set has only one other element, which is $N-2$. On the other hand, if we need to rewind the SC process to a bit index smaller than $N-2$, the target bit index will fall into another set with a different order. Hence, the recursion may be used for Fano decoding, where this case is possible. Note that we do not numerically evaluate this approach for Fano decoding as we do not have any other approach to compare it with. We can either use this approach or simply store all intermediate LLRs and partial sums, trading memory efficiency for a significant complexity reduction.
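For rewinds that start from the last bit index, where the recursion is not needed, the choice of the resume position can be sketched as follows. This is a minimal sketch under a loudly stated assumption: the decoder is taken to resume from the smallest index of the set that contains the target bit, which is consistent with the full rewind to bit 0 required for targets in the lowest-order set, but it is not a restatement of (15). The names `flz` and `resume_position` are hypothetical.

```python
def flz(i: int, n: int) -> int:
    """Order p of the n-bit index i (assumed to be n - 1 if i has no zero bit)."""
    for t in range(n - 1, -1, -1):
        if not (i >> t) & 1:
            return (n - 1) - t
    return n - 1


def resume_position(target: int, n: int) -> int:
    """Assumed resume position: the smallest index in the set containing `target`."""
    if not 0 <= target < 2 ** n:
        raise ValueError("target bit index out of range")
    p = flz(target, n)
    return 2 ** n - 2 ** (n - p)


if __name__ == "__main__":
    n = 9  # N = 512
    for target in (100, 300, 500):
        start = resume_position(target, n)
        # Bits start..target are re-decoded instead of bits 0..target.
        print(target, start, target - start + 1)
```

Under this assumption, a target in the upper half of the indices never requires re-decoding more than half of the codeword before the flip or the shift of the pruning window is applied.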
Now, let us adapt the proposed approach for rewinding more than once. In the shifted-pruning scheme (and in the SC-flip scheme), we may need to repeat the rewind of the SC process up to times. Therefore, we need to take this into consideration. Assuming indicates the current iteration, and and denotes the and of iteration , then of the current iteration is obtained by considering as follows:
(16) 
As (16) shows, if the destination position of the current iteration is larger than the destination position of the previous iteration, the intermediate information is not valid. The reason is that some modification (bit-flipping or shifted-pruning) occurred at position that affects not only the intermediate information but also the decoded data. In other words, we need to go to position and undo the modification and proceed with the decoding up to the position and then perform the modification of the current iteration. Note that if both and are in the same , then , hence there will be no difference.
Fig. 3 compares and for an example where 5 iterations are occurring.
Furthermore, when rewinding the list decoder from the last bit position, , to position , some of the paths that existed at position in the previous iteration might have been eliminated in between and replaced with other paths. This potential replacement should be addressed when we have a list of paths/candidates, as in the shifted-pruning scheme, but not in the SC-flip scheme. To simplify the problem, we can limit the positions to . This is because all the computations of the intermediate LLRs from this position, , up to the last position, , are performed solely based on the channel LLRs and partial sums of stage . Hence, we need to store the decoded data, , and the path metric of all the paths at position . Partial sums can be stored as well or can be computed simply by .
VI Numerical Results
We show that in the additional decoding attempts in SC list decoding and SC decoding, the average complexity (in terms of required time-steps and node visits) can be significantly reduced by partial rewinding instead of full rewinding of the SC-based decoder. Note that taking the average over all the decoding attempts, including the attempts that succeed in the first run, does not give good insight or a fair comparison, in particular at medium and high SNR regimes. The reason is that only a small portion of the total attempts fail and require additional attempts, e.g., less than 10 failures in decoding attempts in the FER range of . Hence, the impact of this small portion on the average number of total attempts per codeword becomes negligible at high SNR regimes.
Figures 4 and 5 compare the average computational complexity of the shifted-pruning scheme with and without partial rewinding for two different codes. Fig. 4 shows the FER and time complexity of the polar code P(512,256+12), constructed with DEGA (2 dB) [11] and concatenated with CRC-12 with polynomial 0xC06, under SC list decoding with list size with shifted-pruning (SP). The FER before and after using the efficient partial rewind (PR) scheme clearly shows that the proposed scheme does not degrade the decoder's performance, as expected. However, it reduces the average time-steps over additional iterations (when the decoding fails) by over 30% (from $2N-2 = 1022$ time-steps (or clock cycles) [2] down to about 700 time-steps). The average time-steps over all the iterations also reduce, but at high SNRs, the average approaches 1022. The reason is that at high SNR regimes, the number of errors, and hence the FER, is low. Compared with the total number of codewords decoded successfully, just a small number of codewords fail to be decoded in the first attempt and need additional attempts/iterations.
As Fig. 5 shows, the reduction in the average time complexity achieved by the efficient partial rewind scheme improves for the polar code P(512,128+12) constructed with DEGA (1 dB). The average time-steps over additional iterations are reduced by about 45% (from $2N-2 = 1022$ time-steps down to about 570 time-steps). The reason is that at a low code rate of $R = 1/4$, the positions for shifting the pruning window are mostly located in the interval $[N/2, N-1]$, where partial rewinding can be effective in reducing the complexity. Recall that if the target position lies in $[0, N/2-1]$, then the resume position is bit 0. That means a full rewind is needed. One can expect that the reduction in complexity would be smaller at high code rates, where the positions for shifting are predominantly located in $[0, N/2-1]$, as the reliability of these bit positions is lower than that of the ones in $[N/2, N-1]$.
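The percentages quoted for Figs. 4 and 5 can be checked with simple arithmetic. The sketch below assumes that a full SC pass takes $2N-2$ time-steps, which gives 1022 for $N = 512$, and uses the approximate averages of 700 and 570 time-steps quoted in the text. The function names are hypothetical.

```python
def full_rewind_steps(code_length: int) -> int:
    """Time-steps of one full SC pass, assumed to be 2N - 2."""
    return 2 * code_length - 2


def reduction_percent(full_steps: float, partial_steps: float) -> float:
    """Relative saving of partial rewinding over a full rewind, in percent."""
    return 100.0 * (full_steps - partial_steps) / full_steps


if __name__ == "__main__":
    full = full_rewind_steps(512)
    print(full)
    # Approximate averages over the additional iterations quoted in the text.
    print(round(reduction_percent(full, 700), 1))  # P(512,256+12)
    print(round(reduction_percent(full, 570), 1))  # P(512,128+12)
```

These values are consistent with the stated reductions of over 30% and about 45%, respectively.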
Similarly, we can show a significant reduction in the complexity of the additional attempts in the SC-flip decoding algorithm. Figs. 6, 7, and 8 illustrate the reduction in the average node visits for CRC-polar codes of length at rates . The metric used in the SC-flip implementation is similar to the one in [3], as our purpose in this work is not to improve the performance of SC-flip but to show the reduction in complexity. Hence, a similar result can be obtained by applying the partial rewind to any variant of the SC-flip decoder. As can be seen, the FER remains unchanged by partial rewind, while the additional decoding attempts are performed with significantly fewer node visits on average. This reduction increases at high SNR regimes as the targeted positions for bit-flipping become more accurate and their number decreases. The main contribution to this decrease is related to (16) where in the fewer additional attempts, mostly one attempt.
Fig. 9 compares the time complexity at at rates . By recalling Remark 1, one can observe that at low code rates, on average decreases significantly compared with high rates; therefore, we expect to visit fewer nodes in the additional decoding attempts, and consequently the time complexity reduces more than at high code rates. Similar to the node visits, this reduction increases at high SNR regimes as the targeted positions for bit-flipping become more accurate and their number decreases. Note that the average time complexity over additional iterations does not depend on the code rate if we do not use partial rewinding, as we start re-decoding from bit 0 for any code rate.
VII Conclusion
When decoding fails in the first decoding attempt, a partial rewind of the SC process is needed for the additional attempts in memory-efficient SC-based decoders. In this paper, an efficient partial rewinding approach is proposed that relies on the properties of the SC process and its updating schedule. The scheme is then adapted to multiple rewinds and to SC list decoding, where, unlike in SC decoding, more than one path exists. The numerical results show a significant reduction in the average time and computational complexity of the additional decoding attempts in SC-flip decoding and in SC list decoding under the shifted-pruning scheme, while the performance remains the same.
References
 [1] E. Arikan, “Channel polarization: A method for constructing capacity-achieving codes for symmetric binary-input memoryless channels,” IEEE Trans. Inf. Theory, vol. 55, no. 7, pp. 3051–3073, Jul. 2009.
 [2] C. Leroux, A. J. Raymond, G. Sarkis, and W. J. Gross, “A semi-parallel successive-cancellation decoder for polar codes,” IEEE Trans. Signal Process., vol. 61, no. 2, pp. 289–299, Jan. 2013.
 [3] O. Afisiadis, A. Balatsoukas-Stimming, and A. Burg, “A low-complexity improved successive cancellation decoder for polar codes,” IEEE 48th Asilomar Conference on Signals, Systems and Computers, 2014, pp. 2116–2120.
 [4] M. Rowshan and E. Viterbo, “Improved List Decoding of Polar Codes by Shifted-pruning,” 2019 IEEE Information Theory Workshop (ITW), Visby, Sweden, 2019, pp. 1–5.
 [5] M. Rowshan and E. Viterbo, “Shifted Pruning for Path Recovery in List Decoding of Polar Codes,” 2021 IEEE 11th Annual Computing and Communication Workshop and Conference (CCWC), 2021, pp. 1179–1184.
 [6] Y. Lv, H. Yin, and Y. Wang, “An Adaptive Ordered Shifted-Pruning List Decoder for Polar Codes,” IEEE Access, vol. 8, pp. 225181–225190, 2020.
 [7] M. Rowshan, A. Burg, and E. Viterbo, “Polarization-Adjusted Convolutional (PAC) Codes: Sequential Decoding vs List Decoding,” IEEE Trans. on Vehicular Technology, vol. 70, no. 2, pp. 1434–1447, Feb. 2021.
 [8] M. Rowshan, A. Burg, and E. Viterbo, “Complexity-efficient Fano Decoding of Polarization-adjusted Convolutional (PAC) Codes,” 2020 Intl Symp. on Inf. Theory and Its Applications (ISITA), 2020, pp. 200–204.
 [9] G. Berhault, C. Leroux, C. Jego, and D. Dallet, “Partial sums computation in polar codes decoding,” 2015 IEEE International Symposium on Circuits and Systems (ISCAS), 2015, pp. 826–829.
 [10] Y. Fan and C. Tsui, “An Efficient Partial-Sum Network Architecture for Semi-Parallel Polar Codes Decoder Implementation,” IEEE Transactions on Signal Processing, vol. 62, no. 12, pp. 3165–3179, June 15, 2014.
 [11] P. Trifonov, “Efficient design and decoding of polar codes,” IEEE Trans. Commun., vol. 60, no. 11, pp. 3221–3227, Nov. 2012.