Enhanced Machine Learning Techniques for Early HARQ Feedback Prediction in 5G

07/27/2018 ∙ by Nils Strodthoff, et al. ∙ 0

We investigate Early Hybrid Automatic Repeat reQuest (E-HARQ) feedback schemes enhanced by Machine Learning techniques as a possible path towards ultra-reliable and low-latency communication (URLLC). To this end, we propose Machine Learning methods to predict the outcome of the decoding process ahead of the end of the transmission. We discuss different input features and classification algorithms, ranging from traditional methods to newly developed supervised autoencoders, and their prospects of reaching the effective block error rates of 10^-5 that are required for URLLC with only small latency overhead. We provide realistic performance estimates in a system model incorporating scheduling effects to demonstrate the feasibility of E-HARQ across different signal-to-noise ratios, subcode lengths, channel conditions and system loads.




I Introduction

The next generation of 5G wireless mobile networks is driven by new emerging use cases, such as URLLC [1]. To mention a few URLLC applications, tactile internet, industrial automation and smart grids contribute to increasing demands on the underlying communication system which have not existed as such before [2]. Depending on the actual application, either very low latency or high reliability or a combination of both is required. In contrast to LTE, where services were provided in a best-effort manner, 5G networks have to guarantee these requirements. In particular for URLLC, the ITU proposed an end-to-end latency of 1 ms and a packet error rate of 10^-5 [3]. These demanding requirements have sparked discussions in the 3GPP Rel. 16 standardization process on how to fulfill them. Self-contained subframes and grant-free access have been proposed to address these requirements on the air interface side [4]. However, the impact on well-known mechanisms in wireless mobile networks is still unclear. In particular, the HARQ procedure poses a bottleneck for achieving the aforementioned latencies. HARQ is a physical layer mechanism that employs feedback to transmit at a higher target block error rate (BLER), while achieving robustness of the transmission by providing retransmissions based on the feedback (ACK - acknowledgment / NACK - non-acknowledgment). However, it imposes an additional delay on the transmission, designated as the HARQ round-trip time (RTT). This led to the abandonment of HARQ for the 1 ms end-to-end latency use case of URLLC, at least for the initial URLLC specification in Rel. 15 [5]. This decision implied that the code rate is lowered such that a single-shot transmission, i.e. no retransmissions and no feedback, is possible. On the one hand, this simplifies the system design; on the other hand, it sacrifices the overall spectral efficiency of URLLC transmissions. Hence, reducing the RTT to enable HARQ for URLLC becomes a critical issue.

One possibility to achieve this is to use Early HARQ (E-HARQ) schemes [6, 7] where the feedback on the decodability of the received signal is provided ahead of the end of the actual transmission process. The crucial component in this setting is the classification algorithm that provides the feedback, which we aim to optimize using Machine Learning techniques.

Earlier approaches addressing the feedback prediction problem, with the sole exception of [8], focused exclusively on one-dimensional input features such as BER estimates in combination with hard thresholding as the classification algorithm [6, 7]. In [9], the authors introduced the so-called VNR to exploit the substructures of LDPC codes for prediction. However, only a single feature, i.e. a single decoder iteration, in combination with hard thresholding has been used. We expect improvements in prediction accuracy from extensions in several directions in combination with more complex classification algorithms: (a) the evolution of input features through several decoder iterations, considered for the first time in [8], (b) higher-dimensional intra-message features that in the ideal case leverage knowledge about the underlying block code and (c) history features that leverage information about the channel state from past transmissions that is available at the receiver.

Here we significantly expand the approach put forward in [8], where we discussed first E-HARQ results empowered by Machine Learning techniques. We present an extended theoretical discussion, in particular including the extension to multiple retransmissions, and a system model that incorporates scheduling effects, thereby allowing a much more precise evaluation of the performance of E-HARQ systems in realistic environments. On the classification side, this is supplemented by extended experiments including different input features and classification algorithms, such as a newly developed supervised autoencoder, for a larger range of SNR conditions, subcode lengths and different channel models.

The paper is organized as follows: In Sec. II we review the E-HARQ feedback process and investigate the role of the classification algorithm in a simple probabilistic model and in a more realistic setting of limited system resources. In Sec. III we discuss Machine Learning approaches for the classification problem introducing different input features and algorithms. The classification performance as well as the system performance is evaluated in Sec. IV for different signal-to-noise ratios, subcode lengths and channel conditions. We summarize and conclude in Sec. V.

II Early HARQ Feedback

As discussed in the Introduction, E-HARQ approaches aim to reduce the HARQ RTT by providing the feedback on the decodability of the received signal at an earlier stage. This enables the original transmitter to react faster to the current channel situation and to provide additional redundancy at an earlier point. In regular HARQ, the feedback generation is strongly coupled to the decoding process. In particular, the receiver applies the decoder to the whole signal representing the total codeword. An embedded CRC makes it possible to check the integrity of the decoded bit stream. The result of this check is transmitted back as HARQ feedback, either acknowledging correct reception (ACK) or asking for further redundancy (NACK). Providing early feedback (E-HARQ) implies decoupling the feedback generation from the decoding process, which introduces a misprediction probability since the actual outcome is not known beforehand. By taking that step, it is possible to use only a portion of the transmission and thus to reduce the time from initial reception to transmitting the feedback. In total, the retransmission is scheduled earlier, hence also reducing the HARQ RTT, see Fig. 1. The time for transmitting the feedback and receiving the retransmission is not affected by this. For LDPC codes, E-HARQ can be realized by exploiting the underlying code structure: the feedback prediction problem is investigated on the basis of so-called subcodes [9, 10] constructed from the parity-check matrix. We characterize a subcode by the ratio of the subcode length to the full code length, with typical values ranging from 1/2 to 5/6, designated as subTTI in Fig. 1. Shorter subcode lengths reduce the RTT but at the same time render the prediction problem more complicated.

Fig. 1: Timeline of regular HARQ compared to early HARQ.

In this section, we first introduce a simple probabilistic system model in Sec. II-A as an easy tool for evaluating the performance of the E-HARQ schemes presented here. However, this model only provides a measure in terms of the final BLER and additionally assumes infinite resources. Hence, in Sec. II-B, we provide a more realistic system model, together with an analysis of the implications of finite-size systems in Sec. II-C. This model provides a more suitable tool to evaluate the performance in practical systems, such as 5G and LTE. The finite-size system argument establishes an optimal point of operation for the E-HARQ schemes that is specific to the available system resources and does not exist in a system with unlimited resources.

II-A Probabilistic model for single-retransmission E-HARQ

We analyze E-HARQ in a simple probabilistic model. For notational simplicity, we focus on the case of a single retransmission; the corresponding expressions for multiple retransmissions can be found in App. A.

Fig. 2: Probabilistic model for single-retransmission E-HARQ (terminal nodes marked in bold face lead to an effective block error).

The structure of the probabilistic model for E-HARQ is shown in Fig. 2. After the initial transmission, the codeword is either undecodable (block error) or decodable, where we follow the common convention in imbalanced classification problems of encoding the minority class, i.e. the block error class, as positive. In the decodable case, the codeword gets decoded correctly irrespective of the feedback sent, and a false positive feedback only implies an unnecessary retransmission, which has no effect on the performance under the infinite-resources assumption. In the block error case, we send either an ACK (a false negative), which leads to an effective block error, or a NACK (a true positive). In the latter case the message gets retransmitted, which leads to an effective block error with the block error probability of the retransmission. The value of this probability crucially depends on the design of the feedback system, most notably on the code rate used for the retransmission. However, one has to keep in mind that a decreased block error rate for the retransmission due to a decreased code rate might lead to latency losses due to the necessity of accommodating longer retransmissions. For identical retransmissions using an independent channel realization, the block error probability of the retransmission would equal that of the initial transmission, or would be even smaller if the decoder makes use of information from both transmissions, for example using chase combining. For later reference we also define the joint probability . This simple argument leads to an effective block error probability


where we introduced an effective conditional probability to incorporate effects of an imperfect feedback channel. For simplicity we model the latter as a binary symmetric channel with bit flip probability . Using Fig. 3, we then obtain

Fig. 3: Incorporating the impact of an imperfect feedback channel via an effective false negative rate.

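Empirically, we can replace the block error probabilities of the initial transmission and of the retransmission by estimated block error rates, and the conditional probability of sending an ACK given a block error by the classifier’s false negative rate (FNR) as obtained from the confusion matrix. The lowest possible effective BLER is achieved for perfect feedback, i.e. for a vanishing effective FNR, for which the effective BLER reduces to the product of the two baseline BLERs. Eq. 1 only depends on the two baseline BLERs and the classifier’s (effective) false negative rate, with the leading-order contribution given by the baseline BLER of the initial transmission multiplied by the sum of the effective FNR and the BLER of the retransmission. In the limit where the effective FNR is much smaller than the BLER of the retransmission, the leading behavior is just the product of the two baseline BLERs and hence independent of the classification performance.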
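The tree in Fig. 2 and the feedback channel of Fig. 3 can be illustrated with a short numerical sketch. The symbols of Eq. 1 and Eq. 2 are not reproduced in this version of the text, so the function and argument names below (`p_e`, `p_e_re`, `fnr`, `p_flip`) are illustrative labels, and the closed form is reconstructed from the verbal description of the two figures rather than copied from the equations.

```python
def effective_rate(rate, p_flip):
    """Rate seen by the transmitter behind a binary symmetric feedback channel.

    A feedback bit is flipped with probability p_flip, so a classifier error
    that occurs with probability `rate` either survives the channel or is
    created by a flip of a correct decision.
    """
    return rate * (1.0 - p_flip) + (1.0 - rate) * p_flip


def effective_bler(p_e, p_e_re, fnr, p_flip=0.0):
    """Effective block error probability for single-retransmission E-HARQ.

    p_e    -- block error probability of the initial transmission
    p_e_re -- block error probability of the retransmission
    fnr    -- classifier false negative rate, P(ACK sent | block error)
    p_flip -- bit flip probability of the feedback channel (assumed name)
    """
    fnr_eff = effective_rate(fnr, p_flip)
    # After a block error: either the NACK is missed (effective FNR), or the
    # NACK arrives and the retransmission fails as well.
    return p_e * (fnr_eff + (1.0 - fnr_eff) * p_e_re)


if __name__ == "__main__":
    # Hypothetical baseline BLERs of 1e-3 for both transmissions.
    for fnr in (0.0, 1e-3, 1e-2):
        print(fnr, effective_bler(p_e=1e-3, p_e_re=1e-3, fnr=fnr))
```

With a vanishing FNR the result is the product of the two baseline BLERs, and with an FNR of one it falls back to the BLER of the initial transmission, which matches the two limiting cases of the tree.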

Considering the question of latency, the simplest approach is to consider the expected number of retransmissions . Therefore we evaluate the probability for a single retransmission. Again using Fig. 2, we obtain


where we defined in analogy to Eq. 2 an effective false positive rate (FPR)


for the conditional probability of sending a NACK given a decodable block, which can be identified empirically with the classifier’s FPR. The leading-order contribution to Eq. 5 is given by the effective FPR, and the expected number of retransmissions therefore profits from a decreased FPR. For the case of a single retransmission, the expected number of retransmissions coincides with the single-retransmission probability,


These results already hint at the crucial importance of adjusting the classifier’s working point by balancing FNR versus FPR: A reduction of the FNR leads to a smaller effective block error probability, see Eq. 1, but comes along with an increased FPR as the two kinds of classification errors counterbalance each other. This in turn leads to an increase in latency, see Eq. 5. From the present discussion it might seem a reasonable strategy to target an arbitrarily small FNR such that the effective block error probability approaches the theoretical limit. However, this argument only holds for a system with unlimited resources, as will be discussed below.
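The latency side of this trade-off can be sketched in the same way. The expression below is reconstructed from Fig. 2 (a retransmission is triggered by a true positive or by a false positive), not copied from Eq. 5, and the names `p_e`, `fnr`, `fpr` and `p_flip` are illustrative.

```python
def effective_rate(rate, p_flip):
    """Classifier error rate as seen behind a binary symmetric feedback channel."""
    return rate * (1.0 - p_flip) + (1.0 - rate) * p_flip


def retransmission_probability(p_e, fnr, fpr, p_flip=0.0):
    """Probability that a retransmission is requested after the first transmission.

    p_e -- block error probability of the initial transmission
    fnr -- classifier false negative rate, P(ACK | block error)
    fpr -- classifier false positive rate, P(NACK | decodable)
    """
    fnr_eff = effective_rate(fnr, p_flip)
    fpr_eff = effective_rate(fpr, p_flip)
    # NACK after a block error (true positive) or after a decodable block
    # (false positive).
    return p_e * (1.0 - fnr_eff) + (1.0 - p_e) * fpr_eff


if __name__ == "__main__":
    # Hypothetical working points along one classifier's FNR-FPR curve:
    # pushing the FNR down is paid for with a larger FPR.
    for fnr, fpr in ((1e-1, 1e-3), (1e-2, 1e-2), (1e-3, 1e-1)):
        print(fnr, fpr, retransmission_probability(1e-3, fnr, fpr))
```

For small baseline BLERs the result is dominated by the FPR term, which is why a working point with a very small FNR costs additional retransmissions.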

II-B System model

Fig. 4: Simple schematic illustrating the system setup used for evaluations.

In order to derive a tool for evaluating the performance of the discussed predictors, in this section we introduce a more sophisticated system model that leans on the structure of today’s mobile network technologies. In cellular networks, such as LTE and 5G, OFDMA has been established due to its scheduling flexibility. In particular, opportunistic scheduling allows using the best possible channel for a transmission. Here, we assume a simplified OFDMA system with equally sized resources, i.e. frequency resources and a defined duration in time, the so-called TTI, as illustrated in Fig. 4. The HARQ mechanism, regular HARQ as well as E-HARQ, requests a retransmission based on the received parts of the transmission, and this retransmission is scheduled at the earliest one HARQ RTT later, counted in time slots.

The main advantage of E-HARQ over regular HARQ is the reduced HARQ RTT. Hence, depending on the latency constraint, more HARQ layers might be used to improve the system performance. In this work, we evaluated two different system approaches, long and short TTI lengths. The HARQ timeline mainly comprises the processing time, which in general scales with the TTI length [11], and the transmission time for the feedback, which does not depend on the TTI length of the transmission. Thus, for long TTI lengths the feedback transmission time can be considered insignificant. However, for short TTI lengths this constant component has to be considered for E-HARQ as well as regular HARQ systems. Hence, for long TTI, we assumed an RTT of 1 time slot for rate-1/2 E-HARQ, which means that the retransmission is received in the next TTI, and an RTT of 2 time slots for regular HARQ, so that for regular HARQ one TTI has to be skipped. Analogously, for short TTI, we assumed an RTT of 5 time slots for rate-5/6 E-HARQ and of 6 time slots for regular HARQ, see Tab. I. For long and short TTI this allows, depending on the system load, up to two retransmissions in the E-HARQ scheme compared to only one in the regular HARQ scheme. Due to the scalability of the TTI length, the absolute value of might be set to an arbitrary value, e.g. 1 ms. Thanks to the aforementioned opportunistic scheduling possibilities of OFDMA, we assume that the retransmission is independent of the previous transmission, i.e. each retransmission fails with the same probability as the initial transmission, so that the total BLER is the single-transmission BLER raised to the power of the number of transmissions (one plus the number of retransmissions). Furthermore, an i.i.d. arrival rate for each UE is assumed. Thus, a single UE can only have one new transmission per time slot. For simplicity, the following argument is carried out for a perfect feedback channel, i.e. for a vanishing bit flip probability, which is a reasonable assumption considering the results of the previous section, which imply that the feedback error probability is at most of subleading importance. The system parameters are summarized in Tab. I.

UE packet arrival rate - 0.3 (medium load), 0.36 (high load)
Number of UEs - 20
Number of resources per time slot - 10
Delay constraint - 3 (long symbols), 11 (short symbols)
Long TTI HARQ RTT - 1 (E-HARQ 1/2), 2 (regular HARQ)
Short TTI HARQ RTT - 5 (E-HARQ 5/6), 6 (regular HARQ)
BLER of (re-)transmissions - as given in Tab. III
TABLE I: System Evaluation Parameters

II-C Implications of finite system size

In practical systems, there is a trade-off between the FNR and the FPR due to the limited amount of available resources. Whereas a lower FNR decreases the effective BLER, as shown in Sec. II-A, it comes with a higher FPR and therefore increases the transmission overhead. Depending on the available resources, this leads to resource shortage, which also causes additional delays since transmissions cannot be scheduled in the designated time slots. This brings us to the notion of the packet failure rate, which represents the probability that a packet is not delivered successfully within a given time constraint. Interestingly, there is an optimal operation point which captures the trade-off such that the packet failure rate is minimized.

For the assumptions on the system model described in the previous section, the packet failure probability is given as


where is the maximum number of allowed retransmissions and for . The times correspond to the time required for scheduling transmissions. Thus, is the probability to schedule the initial transmission within the time constraint, is the probability to schedule the initial transmission and the first retransmission within the time constraint etc. For simplicity we only condition the scheduling probabilities on the previous transmission, thus . If we set all scheduling probabilities to one, Eq. 6 reduces to Eq. 14 and can therefore be seen as a generalized version of the effective BLER. However, the effective BLER does not consider the finite resources and thus cannot capture the actual performance of the evaluated HARQ schemes in a practical implementation. We will refer to this case as the infinite resource baseline, compared to the finite resource baselines discussed below.

At first glance, Eq. 6 suggests minimizing the FNR. However, a closer examination reveals that the scheduling probabilities carry a dependence on both FNR and FPR via the underlying resource distribution function. FNR and FPR counteract each other in the sense that a decreased FNR will lead to an increase in the FPR. Considering the dependency on the resource distribution function, an increase of the FPR increases the load on the system and thus lowers the probability that a transmission and its retransmission are scheduled within the time constraint. This fact is already apparent from the expected number of retransmissions as obtained in Eq. 5, which scales with the FPR at leading order. This suggests that the packet failure probability, seen as a function of the FNR, will show a minimum characterizing an optimal trade-off between FNR and FPR for the given system resources.

In Eq. 6, highly depends on the load of the system, since it is mainly a scheduling problem. Based on the resource distribution which is discussed in App. C, we can formulate the probability of scheduling the initial transmission arriving at time slot and retransmissions within a time constraint as follows:


where is the probability that a packet that has arrived at is scheduled in time slot . Under the assumption that the resource distribution function is not diverging, the initial argument of in Eq. 8 is set to . As mentioned before, is the scheduling probability for an additional transmission assuming that this single transmission does not affect the system probabilities. So, this means that from the slots till the slot the system is fully loaded and the observed transmission is not scheduled (random scheduling). We allow only in slot a lower load or the random scheduler picks the observed transmission. Hence, this is expressed by:


where is the resource distribution function, which is discussed in more detail in App. C. The scheduling probability is discussed in further detail in App. D.

The derived packet failure probability provides a good tool to evaluate the performance of the predictors in a practical system. Additionally, apart from comparing the different E-HARQ schemes among each other, it enables a performance comparison with regular HARQ, which is crucial if E-HARQ is considered for URLLC. Here, aside from the system setup presented in the previous section, the FNR and FPR are assumed to be zero for regular HARQ. This is a valid assumption since the included CRC keeps false prediction events so rare that they can be neglected.

III Machine Learning for early HARQ

The Machine Learning task of predicting the decodability of a message based on information from at most the first few decoder iterations is an inherently imbalanced classification problem. This imbalance is a direct consequence of the base BLERs of the order of 10^-3 that are required in order to be able to reach effective BLERs of the order of 10^-5, see Eq. 1. Different ways of dealing with this imbalance have been explored (see [12] for a review); they can be categorized as cost-sensitive learning, rebalancing techniques and threshold moving. The discussion in this section focuses on the latter, see also [13] and references therein, in the sense of readjusting the decision boundary of any trained model that outputs probabilities for the predicted classes.

By moving the decision boundary one is able to investigate the discriminative power of a given classifier over a whole range of different working points. This is typically analyzed in terms of Receiver Operating Characteristic (ROC) curves or Precision-Recall (PR) curves. In order to summarize the classifier’s performance with a single number, one conventionally resorts to reporting area-under-curve (AUC) metrics. Here we focus on the PR curve and the corresponding area under the PR curve, AUC-PR, rather than the ROC curve, as the former has been shown to better reflect the classifier’s performance for highly skewed datasets [14, 15]. However, when summarizing the discriminative power of a classifier using a single figure, one loses fine-grained information about classification performance at different working points. This is particularly true since the full AUC naturally covers the whole range of values for the decision boundary, many of which are irrelevant for practical applications, where the classification performance in the small-FNR regime is most relevant. In addition, the actual implementation of the classifier requires a definite choice for the decision threshold. Therefore we supplement the global AUC-PR information with an analysis based on FNR-FPR curves. It is worth noting that the FNR-FPR curves directly relate to ROC curves, since the true positive rate TPR that is plotted on the ordinate of the ROC curve relates to the FNR via TPR = 1 - FNR. FNR and FPR represent the natural choice in our case since they are the key output figures from the system point of view, see Sec. II-A.
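Threshold moving and the resulting FNR-FPR curve can be sketched in plain Python as follows. The helper names `fnr_fpr_curve` and `pick_threshold` are illustrative, and the rule of taking the smallest FPR subject to an FNR budget is one possible way of fixing a working point, not necessarily the one used in the evaluation.

```python
def fnr_fpr_curve(scores, labels):
    """Return (threshold, FNR, FPR) triples, one per distinct score value.

    scores -- predicted probability of the positive (block error) class
    labels -- 1 for a block error, 0 for a successfully decoded block
    A sample is predicted positive (NACK) if its score is >= the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    curve = []
    for thr in sorted(set(scores)):
        # Block errors that fall below the threshold are missed (ACK sent).
        fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
        # Decodable blocks at or above the threshold trigger a needless NACK.
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
        curve.append((thr, fn / n_pos, fp / n_neg))
    return curve


def pick_threshold(curve, max_fnr):
    """Working point with the smallest FPR among those with FNR <= max_fnr."""
    feasible = [point for point in curve if point[1] <= max_fnr]
    return min(feasible, key=lambda point: point[2])


if __name__ == "__main__":
    scores = [0.1, 0.2, 0.3, 0.8, 0.9]   # toy classifier outputs
    labels = [0, 0, 1, 0, 1]
    curve = fnr_fpr_curve(scores, labels)
    print(pick_threshold(curve, max_fnr=0.0))
```

The ROC curve is obtained from the same triples by plotting 1 - FNR against the FPR. The quadratic loop is kept for readability; for test sets with millions of samples a sorted cumulative count would be the natural replacement.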

III-A Input features

We distinguish single-transmission-features derived from a single transmission and history information from past transmissions. In principle all these features can be combined at will to form the set of input features for the classification algorithm.

The raw data for a single transmission provided by the simulation is given by (a posteriori) LLR values after different decoder iterations. E-HARQ approaches to reduce the HARQ RTT were first discussed in [6] and [7]. These approaches estimate the BER based on the LLRs and utilize a hard threshold to predict the decodability of the received signal. The LLR gives information on the likelihood of a bit being either 0 or 1. Given the observed sequence at the receiver, the LLR of each bit is defined as:


Having the LLRs of a subcode or of the whole codeword allows calculating an estimated BER for the received signal vector, as stated here:


where the normalization is given by the length of the LLR vector. Based on this metric the decoding outcome is predicted, where a higher estimated BER means a lower probability of successful decoding.
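As an illustration of this metric, the sketch below maps a vector of LLRs to an estimated BER and applies a hard threshold. The displayed equation is not reproduced in this version of the text, so the mapping used here, the standard relation that a hard decision on a bit with LLR L is wrong with probability 1 / (1 + exp(|L|)), averaged over the vector, is an assumption about its form; the function names are illustrative.

```python
import math


def estimated_ber(llrs):
    """Mean bit error probability implied by a vector of LLRs.

    Assumed mapping: a hard decision on a bit with LLR L is wrong with
    probability 1 / (1 + exp(|L|)). The equivalent form with exp(-|L|)
    is used so that very large LLR magnitudes do not overflow.
    """
    if not llrs:
        raise ValueError("empty LLR vector")
    total = 0.0
    for llr in llrs:
        decay = math.exp(-abs(llr))
        total += decay / (1.0 + decay)
    return total / len(llrs)


def predict_nack(llrs, threshold):
    """Hard-threshold predictor: request a retransmission if the estimate is high."""
    return estimated_ber(llrs) > threshold


if __name__ == "__main__":
    confident = [8.0, -9.5, 7.2, -6.8]   # toy LLRs with large magnitude
    uncertain = [0.3, -0.1, 0.5, -0.2]   # toy LLRs close to zero
    print(estimated_ber(confident), estimated_ber(uncertain))
```

LLRs close to zero push the estimate towards 0.5, while large magnitudes of either sign push it towards zero, so only the reliability of the bits enters and not their values.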

A further improved approach has been presented in [9] and [10]. The authors propose to exploit the code structure to improve the prediction performance. In the case of LDPC codes, this is realized by constructing so-called subcodes from the parity-check matrix. Using a belief-propagation-based decoder on the LLRs of the subcodeword results in the a posteriori LLRs:


where is the set of check nodes which are associated to the variable node of and is the check-to-variable node message from check node to variable . Here we use the superscript in to denote the decoder iteration after which the posteriori LLR were extracted with the obvious identification . Again, the a posteriori LLRs are mapped to the same metric for each belief-propagation iteration, designated as VNR:


where is the length of the subcodeword and denotes the belief-propagation iteration. Hence, corresponds to . In [9], the authors used a hard threshold applied to predict decodability.

In the following we use the abbreviations and to denote the VNRs/LLRs extracted from the th decoder iteration. If is omitted we refer to the set of all values from zeroth to fifth decoder iteration.
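The construction of the per-iteration features can be sketched as follows. The a posteriori LLR of a variable node is taken as its channel LLR plus all incoming check-to-variable messages, which is the standard belief-propagation rule matching the description above; the data layout (`neighbors`, `check_to_var`) and all function names are illustrative, and the decoder that produces the messages is not shown.

```python
import math


def error_probability(llr):
    """Assumed mapping from an LLR to the error probability of its hard decision."""
    decay = math.exp(-abs(llr))
    return decay / (1.0 + decay)


def a_posteriori_llrs(channel_llrs, neighbors, check_to_var):
    """Channel LLR of each variable node plus its incoming check node messages.

    neighbors[v]         -- indices of the check nodes connected to variable v
    check_to_var[(c, v)] -- message from check node c to variable node v
    """
    return [
        llr + sum(check_to_var[(c, v)] for c in neighbors[v])
        for v, llr in enumerate(channel_llrs)
    ]


def vnr(llrs):
    """Average error probability over all variable nodes of the subcodeword."""
    return sum(error_probability(llr) for llr in llrs) / len(llrs)


def vnr_features(llrs_per_iteration):
    """One VNR value per decoder iteration; entry 0 uses the channel LLRs."""
    return [vnr(llrs) for llrs in llrs_per_iteration]


if __name__ == "__main__":
    channel = [1.0, -0.5]
    neighbors = [[0], [0, 1]]
    messages = {(0, 0): 0.5, (0, 1): -1.0, (1, 1): -0.5}
    first_iteration = a_posteriori_llrs(channel, neighbors, messages)
    print(vnr_features([channel, first_iteration]))
```

A hard-threshold classifier then compares a single entry of the feature list with a threshold, whereas the more complex classifiers receive the whole list as input.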

Assuming the receiver is operating on the same channel across different transmissions, it might be possible to increase the prediction performance by incorporating information from previous transmissions. This includes all features used as single-transmission features and, in addition, features that are only available after the end of the decoding process. As two representative examples of history features we investigate VNRs from past transmissions (VNR_HIST) and information about the Euclidean distance between the correct codeword and the final decoder result before the hard decision (EUCD_HIST). Here one has to keep in mind that the latter information is only available if the correct codeword is known to the receiver, for example from a previous pilot transmission. Strictly speaking, it cannot be reliably obtained from an ordinary previous transmission, as even a correct CRC does not imply a correctly decoded transmission. For a given set of history features, we consider means of these features computed over different numbers of past transmissions (1, 2, 5, 9) in order to allow the classifier to extract information from past channel realizations at different time scales.
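The multi-scale averaging of history features can be sketched as below. The window sizes (1, 2, 5, 9) are taken from the text; the function name and the convention that the list is ordered from oldest to newest and excludes the current transmission are assumptions made for this sketch.

```python
def history_means(past_values, windows=(1, 2, 5, 9)):
    """Mean of a scalar feature over the last w past transmissions, per window.

    past_values -- feature values of previous transmissions, oldest first,
                   not including the transmission that is being classified
    """
    if len(past_values) < max(windows):
        raise ValueError("not enough past transmissions for the largest window")
    # Slicing from the end picks the w most recent past transmissions.
    return [sum(past_values[-w:]) / w for w in windows]


if __name__ == "__main__":
    # Toy VNR values of the nine preceding transmissions (hypothetical numbers).
    past_vnr = [0.11, 0.12, 0.10, 0.13, 0.15, 0.14, 0.16, 0.18, 0.17]
    print(history_means(past_vnr))
```

Short windows follow fast changes of the channel, while the longer ones smooth over several realizations, which gives the classifier access to both time scales at once.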

III-B Classification algorithms

As discussed in the introduction, we can view the problem either as a heavily imbalanced classification problem or as an anomaly detection problem. Here we briefly discuss suitable algorithms for both approaches. As examples of binary classification algorithms we consider hard threshold (HT) classifiers, logistic regression (LR) (with regularization and balanced class weights) and Random Forests (RF). HT applied to the VNRs from the zeroth or the fifth decoder iteration (referred to as HT0 and HT5 in the following) yields the classifiers used in the literature so far [6, 9]. For anomaly detection [16] one distinguishes unsupervised, semi-supervised and supervised approaches depending on whether only unlabeled examples, only majority-class examples or labeled examples from both classes are available for training. As anomaly detection algorithms we consider Isolation Forests (IF) [17] as a classical tree-based semi-supervised anomaly detection algorithm and supervised autoencoders (SAE) as a novel neural-network-based approach for supervised anomaly detection, see App. B for details. We leverage the implementations from scikit-learn [18], apart from the SAE, which was implemented in PyTorch.

IV Results

IV-A Simulation setup

Transport block size - 360 bits
Channel code - Rate-1/5 LDPC BG2 with Z = 36, see [20]
Modulation order and algorithm - QPSK, approximated LLR
Waveform - 3GPP OFDM, 1.4 MHz, normal cyclic prefix
Channel type - 1 Tx 1 Rx, TDL-C 100 ns, 2.9 GHz, 3.0 km/h (pedestrian) or 100.0 km/h (vehicular)
Equalizer - Frequency-domain MMSE
Decoder type - Min-Sum
Decoding iterations - 50
VNR iterations - 5
TABLE II: Link-level simulation assumptions for training and test set generation.

We compare the classification performance of different classifiers based on AUC-PR and FNR-FPR curves. As external parameters we vary the SNR between 3.0 and 4.0 dB and the subcode length between 1/2 and 5/6. The simulation setup used to produce training and test data follows the one reported in [9]. We use the raw simulation output as well as a number of derived features. Here we consider both single-transmission features and history features that incorporate information from a number of past transmissions, see Sec. III-A for a detailed discussion. We then investigate the performance of a number of classification algorithms operating on these input features, see Sec. III-B for a detailed breakdown. In all cases we use 1M transmissions with independent channel realizations for training and evaluate on a test set comprising at least 1M transmissions. The size of the test set for each SNR/subcode combination is given in the #train/#test column of Tab. III. Hyperparameter tuning is performed once for the pedestrian channel (at SNR 4.0 dB and subcode length 5/6) on an additional validation set also comprising 1M samples. We standard-scale all different sets of input features independently using training set statistics. In this way we obtain a reasonable input normalization, which is required for certain classification algorithms, while keeping relative differences within each input feature group intact.
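One way to read the normalization described above is that a single mean and standard deviation are computed per feature group from the training set and shared by all columns of that group, so that differences between columns of the same group survive the scaling. The sketch below implements this reading; it is an interpretation rather than the exact preprocessing code, and the group names and column layout are hypothetical.

```python
import math


def fit_group_scaler(train_rows, groups):
    """Compute one (mean, std) pair per feature group from the training set.

    train_rows -- list of feature vectors (lists of floats)
    groups     -- dict mapping a group name to its list of column indices
    """
    stats = {}
    for name, cols in groups.items():
        values = [row[c] for row in train_rows for c in cols]
        mean = sum(values) / len(values)
        var = sum((v - mean) ** 2 for v in values) / len(values)
        # Fall back to 1.0 for a constant group to avoid dividing by zero.
        stats[name] = (mean, math.sqrt(var) or 1.0)
    return stats


def apply_group_scaler(rows, groups, stats):
    """Scale every column with the training statistics of its group."""
    scaled = []
    for row in rows:
        new_row = list(row)
        for name, cols in groups.items():
            mean, std = stats[name]
            for c in cols:
                new_row[c] = (row[c] - mean) / std
        scaled.append(new_row)
    return scaled


if __name__ == "__main__":
    groups = {"vnr": [0, 1], "history": [2]}   # hypothetical column layout
    train = [[1.0, 3.0, 10.0], [5.0, 7.0, 30.0]]
    stats = fit_group_scaler(train, groups)
    print(apply_group_scaler(train, groups, stats))
```

The same `stats` are reused for validation and test data, so no information from those sets leaks into the normalization.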

IV-B Classification Performance

SNR SC ch #train/#test BLER HT0 HT5 LR RF IF SAE
4.0dB 5/6 ped 1M/3M 0.001604 0.811 0.902 0.905 0.907 0.890 0.908
4.0dB 1/2 ped 1M/4M 0.001626 0.801 0.799 0.834 0.832 0.827 0.834
3.5dB 5/6 ped 1M/1M 0.002841 0.844 0.920 0.921 0.924 0.912 0.926
3.5dB 1/2 ped 1M/4M 0.002777 0.821 0.814 0.847 0.846 0.839 0.847
3.0dB 5/6 ped 1M/1.5M 0.004742 0.863 0.927 0.934 0.934 0.923 0.934
3.0dB 1/2 ped 1M/1.5M 0.004742 0.851 0.840 0.872 0.871 0.865 0.874
3.5dB 1/2 veh 1M/3M 0.002866 0.824 0.818 0.851 0.850 0.846 0.851
TABLE III: Comparing classification performance based on AUC-PR (classifiers as specified in Sec. III-B).

We start by discussing the classification performance for different classification algorithms based on VNR-features, extending the analysis from [8]. The classification results are compiled in Tab. III. We compare the AUC-PR, which characterizes the overall discriminative power of the algorithm and tends to 1 for a perfectly discriminative classifier. The largest improvements over the simplest thresholding method HT0 are seen for longer subcode lengths such as 5/6. In these cases more complex classification methods applied to the full VNR range show only small improvements over the HT5 threshold baseline. A different picture emerges at smaller subcode lengths. Here, using VNRs from higher decoder iterations (HT5) does not improve or even worsens the classification performance compared to HT0. In this regime more complex classification algorithms show their true strengths and yield larger improvements compared to HT0/HT5. This is a plausible result, since decreasing the subcode length renders the classification problem more complicated and more complex classifiers can profit more from this complication. If we assess the difficulty of the classification problem based on the scores achieved by the classifiers, a clear picture emerges: as discussed before, decreasing the subcode length for fixed SNR renders the classification problem more difficult, whereas decreasing the SNR for fixed subcode length has the opposite effect, most notably because of an increasing BLER. On the other hand, the BLER sets the baseline for the HARQ performance, see Eq. 1, which overcompensates the positive effects of the improved classification performance. The overall best discriminative power across different SNR values, subcode lengths and channel conditions is shown by the supervised autoencoder, closely followed by regularized logistic regression. The fact that the AUC-PR results for LR, RF and SAE are so close just reflects a similar overall discriminative power of these algorithms despite fundamentally different underlying principles.

(a) 4.0 dB subcode 5/6
(b) 4.0 dB subcode 1/2
(c) 3.5 dB subcode 5/6
(d) 3.5 dB subcode 1/2
Fig. 5: Selected examples for classification performance based on VNR-features in the pedestrian channel.

This does, however, not imply coinciding FNR-FPR curves, where the classifiers show rather different behavior in certain FNR regions, see Fig. 5 for selected results. Random Forests, for example, show in general a very good overall performance but are considerably weaker than other classifiers in the small FNR-regime. When looking at FNR-FPR curves as the ones presented in Fig. 5, one has to keep in mind that it is very difficult in the extremely imbalanced regime to obtain reliable estimates of the FNR as both the numerator (false negatives) and the denominator (sum of false negatives and true positives) are small numbers requiring large sample sizes for a stable evaluation. This applies in particular to the region of small FNRs below 0.001.
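The sample-size issue can be made concrete with a short back-of-the-envelope calculation. The test set size and BLER below are taken from the 3.5 dB, subcode 5/6 row of Tab. III; treating the false negatives as binomially distributed is a simplifying assumption of this sketch, and the function name is illustrative.

```python
import math


def fnr_estimate_uncertainty(n_test, bler, fnr):
    """Expected counts behind an FNR estimate and its relative standard error.

    n_test -- number of transmissions in the test set
    bler   -- fraction of block errors (positives) in the test set
    fnr    -- true false negative rate at the working point, 0 < fnr < 1
    """
    if not 0.0 < fnr < 1.0:
        raise ValueError("fnr must lie strictly between 0 and 1")
    n_pos = n_test * bler                       # expected number of block errors
    expected_fn = n_pos * fnr                   # expected number of false negatives
    std_err = math.sqrt(fnr * (1.0 - fnr) / n_pos)
    return n_pos, expected_fn, std_err / fnr    # last entry: relative error


if __name__ == "__main__":
    n_pos, expected_fn, rel_err = fnr_estimate_uncertainty(
        n_test=1_000_000, bler=0.002841, fnr=1e-3
    )
    print(n_pos, expected_fn, rel_err)
```

For these numbers the 1M test transmissions contain about 2841 block errors, of which fewer than three are expected to be false negatives at an FNR of 0.001, so the estimate carries a relative standard error of roughly 60 percent.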

To summarize, we clearly demonstrated that incorporating the evolution of the VNR across the first five decoder iterations into more complex classification algorithms such as logistic regression or supervised autoencoders leads to gains in the overall classification performance in particular in comparison to hard threshold baselines. This conclusion holds for various SNR-values, subcode lengths and channel conditions. Implications of these findings for the system performance will be discussed in Sec. IV-C.

We restrict the investigation of history features to the SAE classifier as the best-performing classifier from the previous section. However, we checked that the qualitative conclusions about the importance of history features hold irrespective of the classification algorithm under consideration. In Tab. IV we discuss the impact of history features on the classification performance in addition to the VNR-features discussed above.

features         4.0 dB, 1/2, ped   3.5 dB, 1/2, ped   3.5 dB, 1/2, veh
VNR              0.834              0.847              0.851
VNR+VNR_HIST     0.860              0.872              0.852
VNR+EUCD_HIST    0.883              0.892              0.861
TABLE IV: Comparing classification performance based on AUC-PR upon including history features (for SAE).

Irrespective of SNR, subcode length and underlying pedestrian or vehicular channel model, we see an improvement in classification performance upon including history features, with the best results achieved by incorporating Euclidean distance features. History information seems to lead to larger improvements in the pedestrian channel compared to the vehicular channel. This is in line with the channel conditions remaining unchanged for a longer time in the pedestrian case than in the vehicular case.

There are several caveats to this result. First of all, as discussed in Sec. III-A, the Euclidean distance is only known to the receiver if the underlying codeword is known, as would be the case for a previous pilot transmission, which would however lead to latency overheads. Therefore the result including Euclidean history features most likely overestimates the improvements in classification performance that can be obtained from using history features. Secondly, the use of history features is in tension with the assumption of an independent channel realization for the retransmission in the sense of as used in our system model. It is very unlikely that the improvements in classification performance can compensate for the loss of approximately one order of magnitude in the error rate for a retransmission using the same channel compared to the baseline BLER of the order of for an independent retransmission. Therefore the system level analysis is carried out using VNR-features only. Nevertheless, the results put forward here stress the prospects of further investigations of features that explicitly characterize the channel state, such as explicit channel state information that could have been obtained by a pilot transmission preceding the transmission.

IV-C System Performance

Fig. 6: Selected examples for system performance in the pedestrian channel for two-retransmission E-HARQ with unlimited system resources: (a) 4.0 dB, subcode 5/6; (b) 3.5 dB, subcode 1/2.

We start by discussing system performance based on the simple probabilistic model for E-HARQ with unlimited system resources as introduced in Sec. II-A. The results are obtained straightforwardly from the FNR-FPR-curves presented in Sec. IV-B using Eqs. 1 and 5 or the corresponding generalizations for multiple retransmissions, Eqs. 14 and 21. Here we adopt as in Sec. II-B. We present results for two retransmissions, which are possible for E-HARQ in both TTI scenarios discussed in Sec. II-B. In fact, increasing the number of retransmissions beyond two does not lead to further noticeable improvements in the given FNR range. In all cases effective BLERs of the order are attainable. Decreasing the subcode length from 5/6 to 1/2 while keeping the same effective BLER of as a definite example requires an increase of 40% and 45% in retransmissions at SNR 4 dB and 3 dB respectively. Correspondingly, decreasing the SNR for fixed subcode length from 4 dB to 3 dB while again keeping the effective BLER fixed leads to an overhead of 70% and 77% in retransmissions for subcode 5/6 and 1/2 respectively. However, as discussed in Sec. II-C, the presented effective BLERs only represent theoretical lower bounds for the packet failure rates that are achievable in actual systems, as they do not incorporate scheduling effects. In this infinite system setting there is no distinguished working point for the classifier, and the only way of discriminating between different classifiers is to rank them by the number of expected transmissions for fixed effective error probability.
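The ranking criterion described above can be illustrated with a simplified sketch of such a probabilistic model. It is not a transcription of Eqs. 1, 5, 14 and 21; it rests on the following assumptions, all introduced here: every (re)transmission fails independently with the same block error rate `p_e`, a retransmission is sent whenever the classifier predicts a failure, a missed failure (false negative) results in a lost packet, and a false alarm only costs an unnecessary retransmission. The function names are hypothetical.

```python
def effective_bler(p_e, fnr, n_retx):
    """Effective error probability after at most n_retx retransmissions.

    Assumed recursion: a packet is lost if the current transmission fails and
    either the failure is missed (probability fnr) or it is detected and the
    remaining retransmissions fail as well.
    """
    p_eff = p_e  # no retransmission available
    for _ in range(n_retx):
        p_eff = p_e * (fnr + (1.0 - fnr) * p_eff)
    return p_eff


def expected_transmissions(p_e, fnr, fpr, n_retx):
    """Expected number of transmissions including the initial one.

    A further transmission is triggered by a detected failure or a false alarm,
    i.e. with probability q per step under the independence assumption.
    """
    q = p_e * (1.0 - fnr) + (1.0 - p_e) * fpr
    return sum(q ** k for k in range(n_retx + 1))


# Two operating points of a hypothetical classifier at baseline BLER 0.1.
bler_a = effective_bler(p_e=0.1, fnr=0.0, n_retx=2)
cost_a = expected_transmissions(p_e=0.1, fnr=0.0, fpr=0.0, n_retx=2)
bler_b = effective_bler(p_e=0.1, fnr=1.0, n_retx=2)
```

With such a model, each point on an FNR-FPR curve maps to a pair of effective error probability and expected transmissions, so classifiers can be compared at equal effective error probability.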

(a) 3.5 dB subcode 1/2 (high load, )
(b) 3.5 dB subcode 1/2 (medium load, )
(c) 4.0 dB subcode 5/6 (high load, )
(d) 4.0 dB subcode 5/6 (medium load, )
Fig. 7: Exemplary system performance comparison for rate 1/2 and 5/6 prediction schemes in high load and medium load scenarios (blue dashed line indicates ).

Fig. 7 shows exemplary results of the packet failure rate over the FNR of the E-HARQ schemes under medium () and high system load () together with the regular HARQ-baseline and the infinite system results from Eq. 14. The upper figures Figs. 7(a) and 7(b) show the long TTI design, as described in Sec. II-B, at 3.5 dB. For the high load (Fig. 7(a)) as well as the medium load (Fig. 7(b)) scenarios, the E-HARQ schemes achieve a superior performance compared to the regular HARQ thanks to the additional retransmission which is possible within the same latency constraint. However, a packet failure rate less than is only achieved in the medium load scenario. Here, we note that the actual performance of the E-HARQ schemes is approximated well by the approach with infinite resources, at least for high packet failure rates above . Only in the lower region an attenuation of the decrease is visible, whereas all prediction schemes achieve a comparable performance. In the high load scenario in Fig.7(a), we see the trade-off behavior, discussed in Sec. II-C. The packet failure rate decreases only up to a certain minimum at the optimal FNR-FPR trade-off and starts increasing after passing that point. So, lowering the FNR further after passing that point increases the packet failure due to the resource shortage. In this region, the actual performance of the prediction schemes becomes critical. Hence, SAE and LR have the lowest optimum. HT0 and HT5 perform worse at their optimal operation points, whereas HT0 is still performing better than HT5.

(a) 3.0 dB subcode 1/2 (high load, and )
(b) 3.0 dB subcode 1/2 (high load, and )
Fig. 8: Effects of the scheduling gain in the high load regime for the strict and relaxed latency constraint.

The resource shortage effect is clearly visible in Fig. 8, where the same load is applied in both scenarios but the latency constraint is relaxed in Fig. 8(b). As obvious in Fig. 8(a), the packet failure rate for all schemes is far away from the targeted packet failure rate of . With a relaxed latency constraint, as shown in Fig. 8(b), the performance is closer to the target packet failure rate. This improvement is explainable by two effects. First, the E-HARQ schemes benefit from the additional retransmission, which is possible in the relaxed latency constraint and thus in total achieve still a better performance than the regular HARQ. However, the gap is smaller compared to the normal latency constraint. Especially in the high load scenario, the regular HARQ profits from the increased scheduling flexibility although it can only perform the same number of HARQ retransmissions. The resource shortage effect is also observable for the regular HARQ performance comparing the medium load and the high load scenarios. It is notable that the regular HARQ could at least achieve a packet failure rate less than in the medium load scenario, whereas it is performing even worse in the high load scenario. We can see that even more clearly in the short TTI design in Figs. 7(c) and 7(d). In the medium load scenario in Fig. 7(d), the regular HARQ achieves a packet failure rate of almost , which corresponds approximately to the ideal performance of HARQ. In this system setup the regular HARQ makes use of the whole scheduling flexibility and thus, at least for the medium load scenario, the influence of scheduling probabilities can be neglected for the regular HARQ. Despite the limited scheduling flexibilities of the E-HARQ schemes, they achieve a better performance than the regular HARQ. However, this changes in the high load scenario in Fig. 7(c). Here, we observe that the regular HARQ benefits from its scheduling gain and thus, achieves the lower packet failure rate. 
In the high load scenario, all prediction schemes achieve a similar performance, except for HT0, which performs markedly worse than the others.

As already visible in the previous results, there is no clear winning scheme for all the scenarios. However, to compare the overall performance of the schemes, we introduce the total score , where is the enumerator over all SNRs and prediction rates and is the enumerator over all HARQ schemes. In Tab. V we present the results for all scenarios, where the ”” sign indicates that an FNR larger than the optimal FNR has been used for evaluations. As already notable in Fig. 7, the available data does not allow arbitrarily small FNRs and thus the optimal operation point cannot be reached for the medium load case. Hence, we used for the medium load evaluations since it provides a sufficiently reliable estimation. The evaluation at fixed FNR underestimates the overall performance compared to regular HARQ but allows a reliable ranking between different classifiers. Reaching the optimal point of operation in the medium load case requires more data.

Nevertheless, in the medium load regime, LR achieves by far the best overall performance. The other E-HARQ schemes achieve a similar performance, where HT0 is able to achieve a slightly better performance than the other two. Interestingly, SAE performs worse than LR here although it was the best-performing classifier in the previous section. A closer inspection reveals that for very low FNR SAE cannot keep up with the other classifiers. That region has little influence on the performance metrics of the previous section, which explains the seemingly contradictory results. However, the expected performance for SAE is observed when going to the high load regime. Here, SAE and LR are the best-performing E-HARQ schemes, far ahead of HT0, HT5 and regular HARQ. As already noted in Fig. 7, in the high load regime the performance at higher FNR is key. Hence, SAE is again in a well-operating region. In this region, we also note that HT0 performs worst among the classifiers despite having the second-best performance in the medium load regime.

In summary, E-HARQ is able to achieve large gains in terms of packet failure rate compared to regular HARQ under latency constraints. In particular, LR is a promising approach, which achieves a good overall performance in the high load as well as the medium load regime. The SAE, as the best-performing algorithm in the high-load case and the more extendable approach compared to LR, might provide a viable alternative if the performance at very low FNR is improved.

scenario regular HARQ HT0 HT5 LR SAE

medium load

3.0dB 1/2 ped
3.5dB 1/2 ped
4.0dB 1/2 ped
3.0dB 5/6 ped
3.5dB 5/6 ped
4.0dB 5/6 ped
3.5dB 1/2 veh
total score 6.2577 0.0685 0.0936 0.0075 0.0866

high load

3.0dB 1/2 ped
3.5dB 1/2 ped
4.0dB 1/2 ped
3.0dB 5/6 ped
3.5dB 5/6 ped
4.0dB 5/6 ped
3.5dB 1/2 veh
total score 3.2918 0.4599 0.3306 0.1713 0.1703
TABLE V: Comparing system performance at their optimal FNR-FPR trade-off, as described in Sec. II-C.

V Summary and Conclusions

In this work we investigated Machine Learning techniques for E-HARQ by means of more elaborate classification methods to predict the decoding result ahead of the final decoder iteration. We demonstrated that more complex estimators such as logistic regression or supervised autoencoders that exploit the evolution of the subcodeword during the first few decoder iterations lead to quantitative improvements in the prediction performance over baseline results across different SNR and channel conditions. We put forward a simple probabilistic model and a more elaborate system model incorporating scheduling effects to evaluate system performance in a realistic environment. In this way we were able to demonstrate the practical feasibility of reaching effective packet error rates of the order 10^-5 as required for URLLC across a range of different SNRs, subcode lengths and system loads. More importantly, we showed that enabling more HARQ layers by introducing E-HARQ improves the overall reliability over regular HARQ under strict maximum latency constraints.

Further improvements of the classification performance are conceivable by extending the approach presented in this work. Our results suggest that history features incorporating channel information from previous transmissions positively influence the classification performance but remain to be investigated in more detail. Similarly, it seems very likely that classification algorithms could profit from intra-message features that go beyond the simple averaging features such as VNRs considered in this work, and which ideally directly incorporate the code structure of the underlying channel code. However, such features suffer from high dimensionality and large correlations. Here a challenge remains to identify the most discriminative set of input features and appropriate classification algorithms to further improve the classification performance.

Ultimately, more advanced classification algorithms, which are within reach using techniques presented in this work, might allow more fine-grained feedback instead of a binary NACK/ACK response. Incorporating this information on the level of the feedback protocol would allow the design of custom feedback schemes with potentially large latency gains.


  • [1] R. El-Hattachi and J. Erfanian, “NGMN 5G White Paper,” tech. rep., Next Generation Mobile Networks (NGMN), 02 2015.
  • [2] T. Fehrenbach, R. Datta, B. Göktepe, T. Wirth, and C. Hellge, “URLLC Services in 5G Low Latency Enhancements for LTE,” in 2018 IEEE 88th Vehicular Technology Conference (VTC Spring), August 2018.
  • [3] ITU, “IMT Vision – Framework and overall objectives of the future development of IMT for 2020 and beyond ,” tech. rep., ITU, 2015.
  • [4] K. Takeda, L. H. Wang, and S. Nagata, “Latency reduction toward 5G,” in 2017 IEEE Wireless Communication, June 2017.
  • [5] MCC Support, “Final Report of 3GPP TSG RAN WG1 #92b v1.0.0,” tech. rep., 3GPP, 04 2018.
  • [6] G. Berardinelli, S. R. Khosravirad, K. I. Pedersen, F. Frederiksen, and P. Mogensen, “Enabling Early HARQ Feedback in 5G Networks,” in 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring), pp. 1–5, May 2016.
  • [7] G. Berardinelli, S. R. Khosravirad, K. I. Pedersen, F. Frederiksen, and P. Mogensen, “On the benefits of early HARQ feedback with non-ideal prediction in 5G networks,” in 2016 International Symposium on Wireless Communication Systems (ISWCS), pp. 11–15, Sept 2016.
  • [8] N. Strodthoff, B. Göktepe, T. Schierl, W. Samek, and C. Hellge, “Machine Learning for early HARQ Feedback Prediction in 5G,” in 2018 IEEE Global Communications Conference Workshops (GLOBECOM), submitted 2018.
  • [9] B. Göktepe, S. Fähse, L. Thiele, T. Schierl, and C. Hellge, “Subcode-based early HARQ for 5G,” in 2018 IEEE International Conference on Communications Workshops (ICC), May 2018.
  • [10] Fraunhofer HHI, “Agressive Early Hybrid ARQ for NR,” TDoc R1-1700647, 3GPP RAN1-NR#1 Spokane (US), Jan 2017.
  • [11] S. Nagata, L. H. Wang, and K. Takeda, “Industry perspectives,” IEEE Wireless Communications, vol. 24, pp. 2–4, June 2017.
  • [12] P. Branco, L. Torgo, and R. P. Ribeiro, “A Survey of Predictive Modelling under Imbalanced Distributions,” CoRR, vol. abs/1505.01658, 2015.
  • [13] G. Collell, D. Prelec, and K. R. Patil, “Reviving Threshold-Moving: a Simple Plug-in Bagging Ensemble for Binary and Multiclass Imbalanced Data,” CoRR, vol. abs/1606.08698, 2016.
  • [14] B. R. Kiran, D. M. Thomas, and R. Parakkal, “An overview of deep learning based methods for unsupervised and semi-supervised anomaly detection in videos,” CoRR, vol. abs/1801.03149, 2018.
  • [15] J. Davis and M. Goadrich, “The Relationship Between Precision-Recall and ROC Curves,” in Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, (New York, NY, USA), pp. 233–240, ACM, 2006.
  • [16] V. Chandola, A. Banerjee, and V. Kumar, “Anomaly Detection: A Survey,” ACM Comput. Surv., vol. 41, pp. 15:1–15:58, July 2009.
  • [17] F. T. Liu, K. M. Ting, and Z.-H. Zhou, “Isolation forest,” in Data Mining, 2008. ICDM’08. Eighth IEEE International Conference on, pp. 413–422, IEEE, 2008.
  • [18] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay, “Scikit-learn: Machine Learning in Python,” Journal of Machine Learning Research, vol. 12, pp. 2825–2830, 2011.
  • [19] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in PyTorch,” in NIPS-W, 2017.
  • [20] MCC Support, “3GPP TS 38.212 v1.0.1,” tech. rep., 3GPP, 09 2017.
  • [21] B. Zong, Q. Song, M. R. Min, W. Cheng, C. Lumezanu, D. Cho, and H. Chen, “Deep Autoencoding Gaussian Mixture Model for Unsupervised Anomaly Detection,” in International Conference on Learning Representations, 2018.
  • [22] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, “Extracting and composing robust features with denoising autoencoders,” in Proceedings of the 25th international conference on Machine learning, pp. 1096–1103, ACM, 2008.
  • [23] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [24] S. Ioffe and C. Szegedy, “Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift,” in ICML, 2015.
  • [25] D. P. Kingma and J. Ba, “Adam: A Method for Stochastic Optimization,” CoRR, vol. abs/1412.6980, 2014.

Appendix A Probabilistic model for multiple-retransmission E-HARQ

In this section, we present the generalization of the results from Sec. II-A. These are obtained straightforwardly using the same formalism as above. The generalization of the effective error probability from Eq. 1 to the case of retransmissions is given by the iterative relation


where for :


and otherwise , which reduces to Eq. 1 for . For simplicity we can work with independent retransmissions, i.e. , where we used the shorthand notation . Explicit expressions for up to three retransmissions are in this case given by


If we denote the set of binary sequences of length by , the probability for having retransmissions is given by


which again reduces to Eq. 3 for . Again we may set for independent transmissions. In this case Eq. A simplifies to


The total number of expected transmissions is then simply given by


Appendix B Supervised autoencoder for supervised anomaly detection

The supervised autoencoder is a neural-network-based supervised anomaly detection algorithm. It enjoys a number of advantages compared to for example shallow neural network classifiers applied directly to the input data that arise from the fact that the classifier is not applied to the data directly but rather to the bottleneck features of an autoencoder. Therefore it is able to work in heavily imbalanced scenarios as the one considered in this work and does not suffer from highly correlated input.

For the construction of the SAE we leverage the approach put forward in [21] albeit in a supervised anomaly detection setting. Similar to their work we use a regular multi-layer fully-connected autoencoder with loss as a backbone. In addition, we jointly train a fully-connected classifier operating on the bottleneck features that is trained using cross entropy loss, see Fig. 9. The idea behind the joint training is to allow the autoencoder not only to build a reduced representation but also to build bottleneck features that contain the most discriminative information for the classification task. We also experimented with using features derived from the reconstruction error (measured using cosine distance and reduced Euclidean distance) as additional input to the classifier as proposed in [21] but found no improvement.

Fig. 9: Architecture for supervised anomaly detection using a jointly trained supervised autoencoder (: input, : reconstructed input, : bottleneck features, : predicted label).

There are multiple ways of preventing overfitting in this setting: early stopping, reducing the bottleneck dimension, implementing the SAE as a denoising autoencoder [22] or regularization using dropout [23]. In our case dropout regularization both in the classifier as well as in the autoencoder itself proved most effective.

The network configuration reads for the autoencoder [FC(,25), FC(25,10), FC(10,3), FC(3,10), FC(10,25), Lin(25,)] and for the classifier [FC(3,10), FC(10,5), Lin(5,2), SM] with FC(x,y) denoting the block [Lin(x,y), BN, ReLU, DO] and input dimension . Here Lin(x,y) denotes a linear transformation layer, BN a Batch Normalization-layer [24], ReLU a ReLU activation layer, DO a dropout layer at a dropout rate fixed via hyperparameter tuning (both 0.2) and SM a softmax activation layer. Optimization is performed using the Adam optimizer [25] at learning rate 0.001. To stabilize training, oversampling the minority class samples by a factor of 100 turned out to be beneficial.
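The layer configuration above can be written down as a minimal PyTorch sketch. It is an illustration under stated assumptions rather than the original implementation: the input dimension is left as the parameter `input_dim` because its value is not given here, mean squared error is assumed as the reconstruction loss, the two loss terms are combined with a hypothetical weight `ce_weight`, and the softmax layer is folded into the cross entropy loss, which operates on logits. The names `fc`, `SupervisedAutoencoder` and `joint_loss` are chosen for this sketch.

```python
import torch
from torch import nn
import torch.nn.functional as F


def fc(n_in, n_out, p_drop=0.2):
    """FC(x,y) block: linear layer, batch normalization, ReLU, dropout."""
    return nn.Sequential(
        nn.Linear(n_in, n_out), nn.BatchNorm1d(n_out), nn.ReLU(), nn.Dropout(p_drop)
    )


class SupervisedAutoencoder(nn.Module):
    def __init__(self, input_dim, p_drop=0.2):
        super().__init__()
        # Encoder compresses the input to three bottleneck features.
        self.encoder = nn.Sequential(
            fc(input_dim, 25, p_drop), fc(25, 10, p_drop), fc(10, 3, p_drop)
        )
        # Decoder reconstructs the input from the bottleneck features.
        self.decoder = nn.Sequential(
            fc(3, 10, p_drop), fc(10, 25, p_drop), nn.Linear(25, input_dim)
        )
        # Classifier operates on the bottleneck features and returns two logits.
        self.classifier = nn.Sequential(
            fc(3, 10, p_drop), fc(10, 5, p_drop), nn.Linear(5, 2)
        )

    def forward(self, x):
        z = self.encoder(x)
        return self.decoder(z), self.classifier(z)


def joint_loss(model, x, y, ce_weight=1.0):
    """Reconstruction loss (MSE, an assumption) plus weighted cross entropy."""
    x_hat, logits = model(x)
    return F.mse_loss(x_hat, x) + ce_weight * F.cross_entropy(logits, y)


# One optimization step on random placeholder data (input_dim=6 is arbitrary).
model = SupervisedAutoencoder(input_dim=6)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
x = torch.randn(8, 6)
y = torch.randint(0, 2, (8,))
optimizer.zero_grad()
loss = joint_loss(model, x, y)
loss.backward()
optimizer.step()
```

Class probabilities can be obtained at inference time by applying a softmax to the logits after calling `model.eval()`, which also switches batch normalization and dropout to inference behavior.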

Appendix C Resource distribution function of a system with finite resources

The resource distribution function describes the probability of having a specific number of resources to be scheduled at a time slot . With the aforementioned system setup, mainly three components contribute to resource allocations. The first are the packet arrival processes of the individual UEs. These constitute the main component. Additionally, there are the HARQ retransmissions, which depend on the error probability of the underlying channel code for a specific channel. However, to simplify the analysis a uniform BLER has been assumed for each of the transmissions. The last component is the overload of the previous time slot due to resource shortage, which is then transferred to the next time slot. Hence, the resource distribution is described as follows:


with , and and being the probability of having arrival processes, being the probability of having HARQ retransmissions in time slot and being the probability of having resources overload in the time slot to be transferred to the next time slot.

The probability of arrival processes for

UE is described straightforwardly as a binomial distribution for



and otherwise , where is the probability of packet arrival of one UE at one time slot. This modeling implicitly assumes that one UE can only have at most one new transmission per time slot.

Formulating is a bit more intricate since for a limited allowed number of HARQ retransmissions initial packet transmissions have to be distinguished probability-wise from HARQ retransmissions. This would require distinguishing initial transmissions and first, second up to retransmissions as separate dependencies in and specifying scheduling rules, which would considerably complicate the whole analysis. However, this limitation is overcome by allowing unlimited HARQ retransmissions. This implies that this approach cannot be used to analyze, for example, single-retransmission HARQ, since the HARQ retransmission term assuming an infinite number of retransmissions as implemented below would drastically overestimate the system load from HARQ retransmissions and hence punish the FPR too much. Hence, is given for and as:


and otherwise except for , where is the number of system resources per time slot, is the HARQ RTT and the single-retransmission probability as in Eq. 5. For notational reasons, we chose to use an infinite sum, which can be easily replaced by splitting the sum at and replacing the part from to by . Nevertheless, this way of evaluating the HARQ contributions in the system overestimates the load from retransmissions and therefore underestimates the system performance.

The last component is simply defined by a back reference to the resource distribution function in the previous slot:


For the sake of simplicity, we may assume . This assumption makes the resource distribution function at time slot only dependent on the previous time slot and is a valid assumption for the evaluated early HARQ schemes.

Fig. 10: Non-converging and converging resource distribution functions over time of an overloaded system (left) and a balanced system (right).

Here, the interesting question is whether the resource distribution converges for . By simulating the propagation of over , we gain insight into that question, as presented in Fig. 10. As is evident in Fig. 10(a), choosing the parameters such that the system is massively overloaded results in divergence of the resource distribution function. However, in the case of a balanced system the resource distribution function shows a strong convergence behavior, as can be seen in Fig. 10(b). From Eq. 22, the conditioned resource distribution function for and follows as
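The qualitative difference between the overloaded and the balanced system can be reproduced with a strongly simplified sketch. It keeps only two of the three components described in this appendix, namely binomial packet arrivals and the overload carried over from the previous slot, and omits the HARQ retransmission term entirely. The recursion (overload in the next slot equals the demand exceeding the available resources) and all names and parameter values are assumptions made for this illustration.

```python
from math import comb


def binomial_pmf(n_ue, p_arrival):
    """Probability of k new packet arrivals among n_ue UEs, k = 0..n_ue."""
    return [
        comb(n_ue, k) * p_arrival ** k * (1.0 - p_arrival) ** (n_ue - k)
        for k in range(n_ue + 1)
    ]


def convolve(a, b):
    """Distribution of the sum of two independent non-negative integer variables."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def propagate_overload(n_ue, p_arrival, n_res, n_slots):
    """Distribution of the overload carried into the slot after n_slots slots."""
    arrivals = binomial_pmf(n_ue, p_arrival)
    overload = [1.0]  # the system starts empty
    for _ in range(n_slots):
        demand = convolve(arrivals, overload)  # new arrivals plus carried-over load
        served = sum(demand[: n_res + 1])      # demand fits into the slot: no overload
        overload = [served] + demand[n_res + 1:]
    return overload


def mean(pmf):
    return sum(k * pk for k, pk in enumerate(pmf))


# Balanced: 20 UEs, arrival probability 0.2 (mean demand 4), 6 resources per slot.
balanced_60 = mean(propagate_overload(20, 0.2, 6, 60))
balanced_100 = mean(propagate_overload(20, 0.2, 6, 100))
# Overloaded: arrival probability 0.5 (mean demand 10) with the same 6 resources.
overloaded_25 = mean(propagate_overload(20, 0.5, 6, 25))
overloaded_50 = mean(propagate_overload(20, 0.5, 6, 50))
```

In the balanced setting the mean overload settles to a fixed value after a few slots, whereas in the overloaded setting it keeps growing with the number of slots, which mirrors the converging and non-converging behavior described above.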


where and for :


otherwise .

Appendix D Scheduling probability in a moderately loaded finite system

The scheduling probability as the probability that a transmission arriving at is scheduled after TTI, is given as


Evidently, crucially depends on the resource distribution function , which is the probability that resources arrive at time slot , and its probability distribution conditioned on the previous number of resource arrivals . The properties and formulation of this distribution are evaluated in more detail in App. C.

However, the exact formulation of poses computational problems due to the infinite sums and the exponential growth of computation for increasing . Hence, we introduce Lemma 1 to simplify the computation of the scheduling probability.

Lemma 1.

For a moderately loaded system with and , the resource distribution function is approximated for sufficiently large time slots by


Assuming a converging behavior of the resource distribution function, there exists a time slot and a lower bound and an upper bound , such that for all . Additionally, for a non-heavily loaded system, which is required for URLLC traffic, we assume . Also, the lower bound is assumed to be sufficiently large, .

The resource distribution function at time slot is formulated as


The sum can be divided into two regions, below and above. Since for any and is close to the number of resources of the system, we approximate the conditional function by assuming resources in the previous time slot. For a moderately loaded system, this is a valid assumption, since the resource probability distribution function is decreasing fast for . Only for small arguments close to the deviation increases. However, the constraint regarding , which prevents underutilization, ensures that is getting very small in that region anyway. Hence, we approximate the conditional resource distribution probability for by


Using Lemma 1 for , the scheduling probability is approximated by