DENS-ECG: A Deep Learning Approach for ECG Signal Delineation

by   Abdolrahman Peimankar, et al.

Objectives: With the technological advancements in the field of tele-health monitoring, it is now possible to gather huge amounts of electro-physiological signals such as the electrocardiogram (ECG). It is therefore necessary to develop models/algorithms that are capable of analysing these massive amounts of data in real-time. This paper proposes a deep learning model for real-time segmentation of heartbeats. Methods: The proposed algorithm, named the DENS-ECG algorithm, combines a convolutional neural network (CNN) and a long short-term memory (LSTM) model to detect the onset, peak, and offset of different heartbeat waveforms such as the P-wave, QRS complex, T-wave, and No wave (NW). Using ECG as the input, the model learns to extract high level features through the training process, which, unlike other classical machine learning based methods, eliminates the feature engineering step. Results: The proposed DENS-ECG model was trained and validated on a dataset with 105 ECGs of length 15 minutes each and achieved an average sensitivity and precision of 97.95% and 95.68%, respectively. The model was also evaluated on an unseen dataset to examine its robustness in QRS detection, which resulted in a sensitivity of 99.61%. The empirical results show the flexibility and accuracy of the combined CNN-LSTM model for ECG signal delineation. Significance: This paper proposes an efficient and easy to use approach using deep learning for heartbeat segmentation, which could potentially be used in real-time tele-health monitoring systems.



1 Introduction

Analysis of electrocardiogram (ECG) signals is one of the most important steps in the diagnosis of cardiac disorders. In order to achieve high diagnostic accuracies, almost all ECG analysis tools/software require knowledge about the location and morphology of the different segment waveforms (P-QRS-T) in ECG records. For example, atrial fibrillation (AFIB) is one of the most common cardiac arrhythmias in the elderly population [1, 2] and P-wave absence is one of the important and clinically useful features for the detection of AFIB [3, 4]. This makes P-wave delineation of great importance in clinical practice. In addition, most of the developed state-of-the-art algorithms for analysing ECG records are also dependent on the detection of QRS complexes (R-peaks) [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]. These algorithms use extracted RR intervals to define various features, which can be used for classification of different arrhythmias. It should be noted that these methods usually utilise annotated databases such as the PhysioNet datasets to validate their performance [15]. For example, [16, 17, 18, 19, 20, 21, 22, 23] proposed algorithms using RR-interval based features from the annotated databases to detect cardiac arrhythmias. However, these methods become extremely cumbersome to test on newly collected raw ECG signals in real-world applications.

The latest advancements in tele-health monitoring systems provide the opportunity to collect huge amounts of ECG data. One of the most common ways for physicians/cardiologists to analyse ECG waveforms is through visual examination of the recordings. However, visual examination is rarely easy and is extremely time consuming for such huge amounts of data. Subjectivity is another major concern in such approaches. Therefore, it is of great interest to develop reliable software for ECG signal delineation that specifies the accurate location of the different segment waves, such as the P-wave, QRS complex, and T-wave, for subsequent use in diagnostics.

Various state-of-the-art algorithms for ECG delineation have been introduced in the literature. Most of these algorithms are based on classical machine learning and digital signal processing techniques. [24] proposed a wavelet based ECG delineation algorithm. This algorithm was validated on different datasets achieving very high performance. [25] used generalised orthogonal forward regression with Gaussian mesa function models for automatic ECG wave extraction. [26] proposed a Bayesian model for P- and T- waves detection, which showed higher accuracy compared to previously published algorithms but at a higher computational cost.

Classical machine learning methods require specific features to be defined from the data and used as inputs in order to train the models. Feature engineering is an essential step to transform the raw data into a suitable internal representation from which the model is able to distinguish between the different classes [27]. On the other hand, deep learning models overcome this limitation by automatically extracting the relevant features directly from the data, and they outperform many state-of-the-art models within different fields and applications of machine learning. This is because deep learning models are capable of extracting highly abstract features from signals without the need for prior domain knowledge and expertise [27, 28].

The application of deep learning in tele-health and biomedical engineering has been growing exponentially in recent years. These methods have been used to solve many challenges in biomedical signal processing applications. For example, these methods were applied successfully to ECG arrhythmia classification [29, 30, 31, 32] and electroencephalogram (EEG) signal classification [33, 34, 35]. It has been shown in the literature that a combination of convolutional neural networks (CNN) and long short-term memory (LSTM) can enhance the classification/prediction performance as the CNN and LSTM learn different (complex) functions from the input signals in the training phase [36, 37, 38, 39, 40]. In this paper, a deep combined CNN-LSTM model, named the DENS-ECG algorithm, is proposed to automatically extract features from ECG records. These features are subsequently used to distinguish between four distinct component waveforms (i.e. P-wave, QRS complex, T-wave, and No wave (NW)) in each heartbeat.

The remainder of this paper consists of 4 sections. In Section 2, the background of the deep learning model and the proposed DENS-ECG algorithm for ECG signal delineation are briefly described. The experimental results are presented in Section 3. Section 4 presents the discussion and comparison with other state-of-the-art methods, followed by the conclusions in Section 5.

2 Materials and Methods

2.1 Dataset

In this study, the PhysioNet QT database (QTDB) was used to train and validate the performance of the proposed algorithm [41]. There are 105 records in the QTDB and each has a length of 15 minutes with a sampling frequency of 250 Hz. In addition, the MIT-BIH Arrhythmia Database (MITDB) was used to test the model. It contains 48 half-hour ECG recordings, which were sampled at 360 Hz [42]. PhysioNet’s WFDB Python package [15] was used to read the signals and their corresponding annotations.

2.2 Pre-processing

First, each record was filtered and segmented accordingly. The signals were zero-phase filtered using a 3rd order Butterworth band-pass (Hz) filter to remove baseline wanders and high frequency noises [43]. The records were then segmented into smaller chunks of 1000 samples. It should be emphasized that 84 out of 105 records of QTDB were used for training the model and the remaining 26 records were used for evaluating the model.
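The two pre-processing steps can be sketched as follows. This is only an illustration under stated assumptions: the pass band (`LOW_HZ`, `HIGH_HZ`) is a placeholder because the cut-off frequencies are not preserved in the text above, dropping an incomplete final chunk is an assumed policy, and the input is a synthetic signal rather than a QTDB record. Zero-phase filtering is obtained here by running a Butterworth filter forwards and backwards with SciPy's `sosfiltfilt`.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_QTDB = 250                  # sampling frequency of QTDB in Hz (from the text)
SEGMENT_LEN = 1000             # samples per chunk (from the text)
LOW_HZ, HIGH_HZ = 0.5, 40.0    # ASSUMED pass band; the cut-offs are not given above


def bandpass_zero_phase(signal, fs=FS_QTDB, low=LOW_HZ, high=HIGH_HZ, order=3):
    """3rd order Butterworth band-pass applied forwards and backwards (zero phase)."""
    sos = butter(order, [low, high], btype="bandpass", output="sos", fs=fs)
    return sosfiltfilt(sos, signal)


def segment(signal, length=SEGMENT_LEN):
    """Split a 1D signal into non-overlapping chunks; an incomplete tail is dropped (assumed)."""
    n_chunks = len(signal) // length
    return np.asarray(signal[: n_chunks * length]).reshape(n_chunks, length)


# Synthetic stand-in for one ECG lead: a 10 Hz tone, slow baseline drift and a DC offset.
t = np.arange(2800) / FS_QTDB
raw = np.sin(2 * np.pi * 10 * t) + 0.8 * np.sin(2 * np.pi * 0.05 * t) + 2.0
clean = bandpass_zero_phase(raw)
chunks = segment(clean)        # 2800 samples give two chunks of 1000 samples
```

Each row of `chunks` keeps its samples in their original order, so the temporal continuity within a segment is preserved.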

2.3 Deep Learning Model Structure

As discussed, the model used in this study for ECG signal delineation is a combination of two well-known deep network structures. The first part of the model consists of three 1D convolutional layers, which extract highly abstract features from ECG segments. The second part of the model includes two deep LSTM layers to process the features extracted by the convolutional layers. Lastly, the output of the second LSTM layer is passed through a dense layer with four neurons, which provide the posterior probabilities corresponding to each of the four classes. It is worth noting that the dense layer is a time distributed layer to keep the continuity of the ECG records.

2.3.1 CNN layer

Unlike traditional neural networks, in which each neuron is connected to every neuron in the adjacent layer, CNNs are able to exploit any existing spatial and temporal patterns in the data [27]. For this purpose, CNNs take advantage of four key attributes, which are: 1) establishing local connections; 2) shared weights; 3) very large number of layers/filters; and 4) reducing the complexity of the network [27]. For example, in 1D CNNs, different filters are defined by sliding a fixed window over the signal. The length of the window used in CNNs for the convolution process is known as the kernel size, denoted as $M$. The outputs of these convolutions (between the filters and specific regions of the input signal) are the neurons in the resulting feature maps as illustrated in Figure 1. The weights of these connections and an overall bias are learned during the training process. It should be noted that there is only one set of weights corresponding to each feature map as shown in Figure 1. This convolution process can be expressed as follows [44]:

$$a_{ij}^{l} = f\left(b_j + \sum_{m=1}^{M} w_m^{j}\, x_{i+m-1}\right) \tag{1}$$

where $a_{ij}^{l}$ is the activation or the output of the $i$th neuron of the $j$th filter for the $l$th convolutional layer, $M$ is the kernel size, $f$ is the neural activation function, $b_j$ is the shared bias of the $j$th filter, $w_m^{j}$ ($m = 1, \dots, M$) are the shared weights of the $j$th filter, and $x_{i+m-1}$ are the corresponding inputs. The outputs of the neurons, the $a_{ij}^{l}$'s, are the filtered version of the input time series, in which the same features are learned at different locations as the filter (kernel) slides over the input signal. Applying various filters to the input time series leads to different feature maps in the output of the activation functions. As shown in Figure 1, the number of feature maps is equal to the number of applied filters, each of which is defined by a set of shared weights and a single bias (the biases $b_j$ are not shown in the figure).

Figure 1: Schematic diagram of the convolution process. A kernel of size $M$ moves across the input signal. The corresponding weights ($w_m^{j}$) are fixed for all the convolution operations. The signals in each layer are appropriately zero-padded to keep the dimensions the same as in the input layer. For example, if the kernel size ($M$) is odd, $(M-1)/2$ zeros are padded to each end of the signal, otherwise the zero-padding size is $M-1$ in total, split unevenly between the two ends.
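The sliding-window convolution with shared weights and zero padding can be illustrated with a short NumPy sketch of a single feature map. The kernel values, the identity activation used in the example, and the way the padding is split for an even kernel size are assumptions made for illustration only.

```python
import numpy as np


def conv1d_same(x, w, b, activation=np.tanh):
    """One feature map of a 1D convolutional layer with zero padding ('same' length).

    x: input signal, shape (N,); w: shared kernel weights, shape (M,); b: shared bias.
    The default activation is a placeholder; the text does not specify one.
    """
    M = len(w)
    left = (M - 1) // 2       # (M - 1) / 2 zeros on each side when M is odd
    right = M - 1 - left      # one extra zero on the right when M is even (assumed split)
    padded = np.concatenate([np.zeros(left), x, np.zeros(right)])
    # Neuron i sees the window padded[i : i + M] and reuses the same weights and bias.
    z = np.array([b + np.dot(w, padded[i:i + M]) for i in range(len(x))])
    return activation(z)


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
w = np.array([1.0, 0.0, -1.0])    # hypothetical kernel (a simple difference filter)
feature_map = conv1d_same(x, w, b=0.0, activation=lambda z: z)
# feature_map -> [-1., -2., -2., -2., 3.], the same length as x
```

Applying several kernels to the same input in this way yields one feature map per filter, each with its own weights and bias.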

2.3.2 LSTM layer

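Recurrent Neural Networks (RNNs) are designed to work with sequential time-series and are capable of learning dependencies in sequential information. However, it has been shown that learning long-term dependencies is very challenging [45]. LSTM networks, a special type of RNN, are capable of addressing the problem of unstable gradients and can handle long-term dependencies [46]. As depicted in Figure 2, there are three main parts in an LSTM block: (i) forget gate ($f_t$), (ii) input gate ($i_t$), and (iii) output gate ($o_t$). The forget and input gates are mainly responsible for removing information from or adding information to the memory block in the following way:

$$f_t = \sigma\left(W_f x_t + U_f h_{t-1} + b_f\right) \tag{2}$$

$$i_t = \sigma\left(W_i x_t + U_i h_{t-1} + b_i\right) \tag{3}$$

where $x_t$ is the input sequence to the LSTM at time step $t$, which is actually the output of the last CNN layer here, $h_{t-1}$ is the output sequence at time step $t-1$, and $\sigma$ is the logistic sigmoid function. The $W_f$, $U_f$, $W_i$, and $U_i$ represent the weight vectors and $b_f$ and $b_i$ are bias terms. These should be learned in the training phase of the LSTM. In addition, since $f_t, i_t \in (0, 1)$, the gates control the contribution of each unit in the memory block. Therefore, the memory is updated as:

$$\tilde{c}_t = \tanh\left(W_c x_t + U_c h_{t-1} + b_c\right) \tag{4}$$

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \tag{5}$$

where $\tilde{c}_t$ is the candidate memory with its own weights $W_c$, $U_c$ and bias $b_c$, and $\odot$ denotes element-wise multiplication. Finally, the output vector is computed as:

$$o_t = \sigma\left(W_o x_t + U_o h_{t-1} + b_o\right) \tag{6}$$

$$h_t = o_t \odot \tanh\left(c_t\right) \tag{7}$$

Here, $W_o$ and $U_o$ are the weight vectors of the output gate, and $b_o$ is the output bias. From (6) and (7), in addition to the input and the previous output, the current memory plays an important role in the output. This provides the LSTM with the ability to keep or forget the existing memory efficiently [47].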
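The gate mechanics described in this subsection can be illustrated by a single LSTM time step. The following is a minimal NumPy sketch of the standard LSTM formulation (forget, input and output gates plus a candidate memory); the parameter dictionary `params`, its key names and the toy dimensions are hypothetical, and it is not taken from the authors' implementation.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_f' to weights and 'b_f' to biases."""
    f_t = sigmoid(p["W_f"] @ x_t + p["U_f"] @ h_prev + p["b_f"])      # forget gate
    i_t = sigmoid(p["W_i"] @ x_t + p["U_i"] @ h_prev + p["b_i"])      # input gate
    c_tilde = np.tanh(p["W_c"] @ x_t + p["U_c"] @ h_prev + p["b_c"])  # candidate memory
    c_t = f_t * c_prev + i_t * c_tilde                                # memory update
    o_t = sigmoid(p["W_o"] @ x_t + p["U_o"] @ h_prev + p["b_o"])      # output gate
    h_t = o_t * np.tanh(c_t)                                          # block output
    return h_t, c_t


n_in, n_hidden = 3, 2   # toy sizes chosen for readability
params = {}
for gate in "fico":
    params[f"W_{gate}"] = np.zeros((n_hidden, n_in))
    params[f"U_{gate}"] = np.zeros((n_hidden, n_hidden))
    params[f"b_{gate}"] = np.zeros(n_hidden)

# With all-zero parameters every gate equals 0.5, so half of the old memory is kept.
h, c = lstm_step(np.ones(n_in), np.zeros(n_hidden), np.array([1.0, -2.0]), params)
```

A bidirectional layer runs one such recurrence forwards and a second one backwards over the sequence and combines the two outputs at every time step.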

Bidirectional LSTM (BiLSTM) is a variant of the LSTM, which can process a sequence of data in both directions. Unlike LSTM, BiLSTM can also exploit the future context [48]. It consists of two hidden layers, which are fed forward to the output layer [48]. The outputs of BiLSTM are a function of forward and backward pass along with their corresponding weights and biases.

Figure 2: Schematic diagram of LSTM memory block.

2.3.3 Model training

In order to find the optimum network parameters (e.g. weights and biases) and consequently achieve the optimal performance, the network needs to be trained appropriately. It is a non-convex optimisation problem, which can be solved by minimising a cost function in an iterative process [44, 49, 50]. One of the most commonly used cost functions for multi-class classification problems is the categorical cross-entropy loss (CCEL), which consists of a softmax and a cross-entropy loss (CEL). The CCEL is utilised to output a probability over the different classes. For this purpose, the class labels are one-hot encoded, which converts all the elements of the label vector into zero except for the true class. The CEL function can be formulated as follows:

$$\mathrm{CEL} = -\sum_{i=1}^{C} t_i \log\left(f(s)_i\right) \tag{8}$$

where $s_i$ is the computed score from the network for class $i$ (with $s_p$ denoting the score corresponding to the true class), $f(s)_i$ is the activation applied to the scores, $C$ is the number of classes, and $t_i$ is non-zero only for the true class. Thus, with the softmax activation, the CCEL can be calculated as:

$$f(s)_i = \frac{e^{s_i}}{\sum_{j=1}^{C} e^{s_j}} \tag{9}$$

$$\mathrm{CCEL} = -\log\left(\frac{e^{s_p}}{\sum_{j=1}^{C} e^{s_j}}\right) \tag{10}$$
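As a small numerical illustration of the categorical cross-entropy loss, the sketch below applies a softmax to raw class scores and evaluates the loss against a one-hot label. The scores are made up for the example and the function names are hypothetical.

```python
import math


def softmax(scores):
    """Turn raw class scores into probabilities that sum to 1."""
    shift = max(scores)                           # subtracting the max avoids overflow
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(scores, one_hot):
    """Loss for one sample: minus the log of the softmax probability of the true class."""
    probs = softmax(scores)
    return -sum(t * math.log(p) for t, p in zip(one_hot, probs) if t > 0)


# Four classes, the second one being the true class.
uniform_loss = categorical_cross_entropy([0.0, 0.0, 0.0, 0.0], [0, 1, 0, 0])
confident_loss = categorical_cross_entropy([0.0, 5.0, 0.0, 0.0], [0, 1, 0, 0])
```

With four equally scored classes the loss equals log(4), and it shrinks towards zero as the score of the true class grows relative to the others.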




2.4 Classification

The output of the LSTM layer, i.e. the features extracted from the ECG signals, is fed into a TimeDistributed dense layer with four neurons with softmax activation functions. The latter ensures a classification per time stamp such that the sum of the neuron outputs is equal to 1, i.e., they can be interpreted as posterior probabilities. The output of the dense layer for the $i$th sample is classified to one of the four classes (P, QRS, T, or NW) as follows:

$$\hat{y}_i = \arg\max_{c \in \{\mathrm{P}, \mathrm{QRS}, \mathrm{T}, \mathrm{NW}\}} p\left(c \mid x_i\right) \tag{11}$$

where $\hat{y}_i$ is the predicted class for the $i$th sample and $p(c \mid x_i)$ represents the posterior probability.
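The per-sample decision rule can be sketched as picking the class with the largest posterior probability at every time stamp. The second helper, which collapses the resulting label sequence into runs with an onset and an offset index, is an assumption about how per-sample labels could be turned into wave boundaries; it is not described in the text above. The class order and the probabilities are hypothetical.

```python
from itertools import groupby

CLASSES = ["P", "QRS", "T", "NW"]   # class order is an assumption for this sketch


def classify(posteriors):
    """Pick the most probable class for every time stamp (row of four probabilities)."""
    return [CLASSES[max(range(len(row)), key=row.__getitem__)] for row in posteriors]


def to_segments(labels):
    """Collapse per-sample labels into (class, onset_index, offset_index) runs, skipping NW."""
    segments, start = [], 0
    for label, run in groupby(labels):
        length = len(list(run))
        if label != "NW":
            segments.append((label, start, start + length - 1))
        start += length
    return segments


posteriors = [
    [0.1, 0.1, 0.1, 0.7],
    [0.8, 0.1, 0.0, 0.1],
    [0.9, 0.0, 0.0, 0.1],
    [0.2, 0.1, 0.1, 0.6],
    [0.0, 0.9, 0.1, 0.0],
    [0.1, 0.7, 0.1, 0.1],
    [0.1, 0.1, 0.2, 0.6],
]
labels = classify(posteriors)      # ['NW', 'P', 'P', 'NW', 'QRS', 'QRS', 'NW']
segments = to_segments(labels)     # [('P', 1, 2), ('QRS', 4, 5)]
```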

2.5 Deep ECG Delineation Framework

The proposed deep learning model utilizes the advantages of ensemble learning technique to delineate ECG signals. The flowchart for the proposed DENS-ECG algorithm is illustrated in Figure 3, which is described step by step as follows:

Figure 3: Flowchart of the proposed DENS-ECG algorithm.
  1. Noise reduction: The ECG signals are filtered to remove noise and baseline wanders.

  2. Segmentation: In this step, the ECG signals are segmented into chunks of 1000 samples. These segments are then fed into the model as inputs. It should be noted that the continuity of the time series (ECG signal) within each segment is preserved to make it possible for the network to learn the pattern of different waveforms from each input.

  3. Separate the testing set from a non-testing set: The segmented ECG signals are divided into two sets. The non-testing set is used to train, validate and optimise the P-QRS-T waveforms delineation algorithm and the testing set is considered to evaluate the proposed model. The records in testing and non-testing sets are unique, i.e. no excerpt of the test records is included in the training process of the model.

  4. Cross validation: The model is trained using the 5-fold cross validation technique [51]. A stratified 5-fold cross validation (5-fold CV, Figure 4) is used, where the distribution of samples in each fold is proportional to the size of the corresponding classes in the whole dataset. According to [52], this method of cross validation leads to a more reliable performance in terms of bias and variance compared to traditional cross validation techniques.

    Figure 4: Schematic diagram of a stratified 5-fold cross validation technique. The training and validation sets are split according to the size of the four classes.
  5. Creating the model: In total, the model has eight layers, which include the input layer, three 1D convolutional layers followed by two BiLSTM layers, a dropout layer, and a dense layer. A time distributed wrapper is used for the dense layer, which configures the BiLSTM layer for the sequence prediction. The input segments are fed directly into three successive convolutional layers, which extract the temporal patterns (features) from the ECG signals. The same kernel size $M$ is applied in the three convolutional layers and the corresponding numbers of filters are 32, 64, and 128 for the three successive layers, respectively. In order to keep the same dimension in the input and convolution layers, zero padding is employed. For example, the output of the first convolutional layer is a sequence of 32 features with the same dimension as the input signal (time series). This process is repeated for the other two layers. The output of the last convolutional layer is 128 highly abstract feature maps, which are used as inputs for the first BiLSTM layer with 250 hidden units. The second BiLSTM layer has 125 hidden units. The dropout probability in the dropout layer is set to 0.2. The dropout layer helps to avoid over-fitting during the training of the network. The dense layer has 4 hidden units and a softmax function is used as the activation function, which assigns a value between 0 and 1 to each sample of the input ECG signals for each class.

  6. Model training and optimisation: The model is trained using the Adam optimisation algorithm [49], an alternative to the stochastic gradient descent (SGD) optimisation algorithm. Adam is used for solving non-convex optimisation problems and is well-suited for large scale networks. It has four hyper-parameters, which need to be fine-tuned: 1) the learning rate, $\alpha$, 2) the exponential decay rate for the first moment estimates, $\beta_1$, 3) the exponential decay rate for the second-moment estimates, $\beta_2$, and 4) the numerical stability parameter, $\epsilon$. A random search technique is used to find the optimum values of these parameters [53]. This approach is shown to be more efficient compared to the grid search method for hyperparameter optimisation. The proposed DENS-ECG model contains 1,416,044 trainable parameters (weights and biases), which need to be optimised through the learning process. The model is implemented in Python 3.6.4 using the keras API [54]. Figures 5(a) and 5(b) show the loss and accuracy of the model during the training phase, respectively. Moreover, an early stopping technique is used to prevent over-fitting of the model. The model stops training if there is no decrease in the value of the validation loss after three epochs. Additionally, the dropout process is only applied during the training phase of the model. Therefore, the training becomes more challenging for the network, which in turn alleviates the over-fitting problem. As illustrated in Figures 5(a) and 5(b), the model achieves higher performance on the validation set than on the training data, which shows that the model is trained properly without over-fitting.

    Figure 5: Training and validation curves of the model for: (a) loss and (b) accuracy.
  7. Evaluate the trained model: The trained model is then evaluated on the 26 unseen test records from QTDB dataset to examine the performance of the classifier. In addition, the model is tested on the unseen MITDB dataset for QRS detection.
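As a sanity check on the architecture described in step 5, the number of trainable parameters can be counted by hand. The sketch below uses the usual parameter formulas for 1D convolutional, bidirectional LSTM and dense layers, with 32, 64 and 128 filters, the 250 and 125 BiLSTM hidden units stated in Section 4.1, and a four-unit dense layer. Two values are assumptions that do not appear in the text above: a kernel size of 3 and a single input channel (one ECG lead). Under these assumptions the count matches the 1,416,044 trainable parameters reported in step 6.

```python
def conv1d_params(kernel_size, in_channels, filters):
    """Kernel weights plus one bias per filter."""
    return kernel_size * in_channels * filters + filters


def bilstm_params(input_dim, units):
    """Two directions, four gate blocks each: input weights, recurrent weights, biases."""
    return 2 * 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    return input_dim * units + units


KERNEL_SIZE = 3    # ASSUMPTION: not stated in the text above
IN_CHANNELS = 1    # ASSUMPTION: one ECG lead per segment

total = (
    conv1d_params(KERNEL_SIZE, IN_CHANNELS, 32)
    + conv1d_params(KERNEL_SIZE, 32, 64)
    + conv1d_params(KERNEL_SIZE, 64, 128)
    + bilstm_params(128, 250)    # outputs 2 * 250 = 500 features per time step
    + bilstm_params(500, 125)    # outputs 2 * 125 = 250 features per time step
    + dense_params(250, 4)       # the dropout layer adds no parameters
)
print(total)   # 1416044
```

Almost all of the parameters sit in the two BiLSTM layers (758,000 and 626,000), while the three convolutional layers together contribute about 31,000.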

3 Results

In this study, we used the QTDB from PhysioNet to develop the model [41]. In total, there are 105 ECG records of which 84 records (80%) are used for training/validation and the remaining 26 records (20%) for testing the model. In addition, the robustness of the proposed model is examined using a different dataset, the MITDB. Since there is only QRS peak annotation available for MITDB, the performance of the proposed DENS-ECG model is reported on detecting the QRS complexes for this dataset.

3.1 Classification Performance Metrics

The key factor in evaluating the performance of any classification system is the capability of the developed model in correctly classifying the new examples. Traditionally, the classification performance of binary problems can be interpreted in a confusion matrix as illustrated in Table 1, which can easily be extended to multi-class problems, as in our case. One of the most commonly used measures to report the performance of classification algorithms is the average accuracy, which can be calculated as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{12}$$

However, in order to report the performance of classifiers on imbalanced datasets, other well-known metrics are used, which can be derived from Table 1 and are formulated as:

$$\mathrm{Precision}\ (P^{+}) = \frac{TP}{TP + FP} \tag{13}$$

$$\mathrm{Sensitivity}\ (Se) = \frac{TP}{TP + FN} \tag{14}$$

$$F_{\beta} = \frac{(1 + \beta^{2}) \cdot P^{+} \cdot Se}{\beta^{2} \cdot P^{+} + Se} \tag{15}$$

When $\beta = 1$, the measure in Eq. (15) is called the balanced F-score (F1-score), which takes both Precision and Sensitivity into account equally.

                   Predicted positive     Predicted negative
Actual positive    True positive (TP)     False negative (FN)
Actual negative    False positive (FP)    True negative (TN)
Table 1: Confusion matrix.
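The metrics derived from the confusion matrix can be checked with a few lines of Python. The sketch below uses the QRS detection counts reported for DENS-ECG on MITDB in Table 6 (TP, FP and FN); the function names are hypothetical.

```python
def sensitivity(tp, fn):
    """Se = TP / (TP + FN)."""
    return tp / (tp + fn)


def precision(tp, fp):
    """P+ = TP / (TP + FP)."""
    return tp / (tp + fp)


def f_beta(tp, fp, fn, beta=1.0):
    """Weighted harmonic mean of precision and sensitivity; beta = 1 gives the F1-score."""
    p, se = precision(tp, fp), sensitivity(tp, fn)
    return (1 + beta ** 2) * p * se / (beta ** 2 * p + se)


# QRS detection counts of DENS-ECG on MITDB, taken from Table 6.
TP, FP, FN = 109066, 525, 428
se = round(100 * sensitivity(TP, FN), 2)     # 99.61
p_plus = round(100 * precision(TP, FP), 2)   # 99.52
f1 = round(100 * f_beta(TP, FP, FN), 2)
```

The sensitivity and precision reproduce the reported 99.61% and 99.52%, and the F1-score agrees with the reported 99.56% up to rounding.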

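Another commonly used qualitative and quantitative metric is the receiver operating characteristic (ROC) curve, which plots the TP rate (Eq. (14)) against the FP rate ($FP/(FP + TN)$) [55]. This method graphically visualises the trade-off between TP rate and FP rate. In the case of multi-class classification, each curve evaluates the target class versus all the other classes. Besides the curves corresponding to each class, macro- and micro-average curves can also be plotted. The macro- and micro-average of precision and sensitivity are computed as follows [56]:

$$P^{+}_{\mathrm{macro}} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FP_i} \tag{16}$$

$$Se_{\mathrm{macro}} = \frac{1}{C} \sum_{i=1}^{C} \frac{TP_i}{TP_i + FN_i} \tag{17}$$

$$P^{+}_{\mathrm{micro}} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} \left(TP_i + FP_i\right)} \tag{18}$$

$$Se_{\mathrm{micro}} = \frac{\sum_{i=1}^{C} TP_i}{\sum_{i=1}^{C} \left(TP_i + FN_i\right)} \tag{19}$$

where $C$ is the number of classes as defined earlier.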
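The difference between macro- and micro-averaging is easiest to see on an imbalanced example. The sketch below computes both from a multi-class confusion matrix; the two-class counts are hypothetical and the function name is an assumption.

```python
def macro_micro(confusion):
    """confusion[i][j] = number of samples of true class i predicted as class j."""
    n = len(confusion)
    tp = [confusion[i][i] for i in range(n)]
    fp = [sum(confusion[r][i] for r in range(n)) - tp[i] for i in range(n)]
    fn = [sum(confusion[i]) - tp[i] for i in range(n)]
    return {
        # Macro: average the per-class ratios, so every class weighs the same.
        "macro_precision": sum(tp[i] / (tp[i] + fp[i]) for i in range(n)) / n,
        "macro_sensitivity": sum(tp[i] / (tp[i] + fn[i]) for i in range(n)) / n,
        # Micro: pool the counts first, so large classes dominate.
        "micro_precision": sum(tp) / (sum(tp) + sum(fp)),
        "micro_sensitivity": sum(tp) / (sum(tp) + sum(fn)),
    }


# Hypothetical counts: a rare class (10 samples) and a frequent class (90 samples).
toy = [[8, 2],
       [1, 89]]
scores = macro_micro(toy)
```

Here the micro-averaged sensitivity is 0.97, while the macro-averaged value is about 0.894 because the weaker result on the rare class counts as much as the result on the frequent class.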

Furthermore, the area under the curve (AUC) of the ROC curves can be calculated as a scalar metric to evaluate the classification performance. The higher the AUC value, the better the classifier is.
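For a single class evaluated against all others, the AUC can be computed without drawing the curve. The sketch below uses the pairwise-ranking (Mann-Whitney) formulation, which gives the same value as the area under the ROC curve: the fraction of positive/negative pairs in which the positive sample receives the higher score. The scores and labels are hypothetical.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical posterior probabilities for one class versus the rest.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]
auc = roc_auc(scores, labels)   # 8 of the 9 pairs are ordered correctly
```

A value of 1.0 means every positive sample is scored above every negative one, and 0.5 corresponds to random ranking.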

In the following section, the performance of the proposed DENS-ECG model is evaluated using the defined classification metrics. It should be noted that F1-score, Precision (P+), and Sensitivity (Se) are appropriate metrics for reporting the performance of classification models on imbalanced datasets. The comparison of the proposed method with two other deep learning scenarios is also reported in Section 4.1.

3.2 QRS detection results

The QRS detection performance of the proposed DENS-ECG model on the MITDB and QTDB databases is reported in Table 2. It should be noted that the results in this table correspond to the unseen 26 records of QTDB and the whole MITDB, which were not used during the training of the model. The proposed model achieves 99.61%, 99.52%, and 99.56% in Se, P+, and F1-score, respectively, on the well-known MITDB database for QRS detection. The model also performs well on QTDB for QRS detection. As shown in Table 2, filtering the input signals improves the classification performance substantially. For example, the F1-score increases by roughly 4 and 7.5 percentage points on the MITDB and QTDB databases, respectively. In addition, the high classification performance on MITDB shows that the proposed model generalises well and can be used on different datasets in practice with the same filtering technique and model parameters. The performance of the DENS-ECG model is comparable with other published algorithms (Table 5) and a detailed discussion is given in Section 4.2.

                   MITDB                        QTDB
Metrics            Se      P+      F1-score     Se      P+      F1-score
Raw signal         96.81   92.01   95.75        85.54   99.24   91.89
Filtered signal    99.61   99.52   99.56        99.7    99.19   99.45
Table 2: QRS detection performance (%) of the proposed DENS-ECG model for the MITDB and QTDB databases on the test set with and without filtering of the input signals.

3.3 ECG delineation results

The waveform delineation performance of the DENS-ECG model on the QTDB database is given in Table 3. The table reports the average performance of the model in detecting the start, peak, and end of the waveforms for each of the four classes. Overall, the model performs best on QRS detection, followed by T-wave and P-wave detection, respectively. For example, the precision for the T-wave is more than 5% higher than for the P-wave. However, the model sensitivities for P- and T-wave detection are comparable to each other, both above 96.5% and slightly more favourable for T-wave detection. The F1-score, which combines Se and P+, for the P, QRS, T, and NW classes equals 93.01%, 99.45%, 96.12%, and 98.55%, respectively.

As shown in Table 3, filtering the input signals improves DENS-ECG performance substantially. For example, the sensitivity of the model increases by around 20%, 14%, 14%, and 14% for P, QRS, T, and NW detection, respectively. In contrast, the improvement in model precision from filtering the input signals is smaller than the improvement in sensitivity.

                   Se                              P+                              F1-score
Metrics            P       QRS     T       NW      P       QRS     T       NW      P       QRS     T       NW
Raw signal         76.80   85.54   82.40   84.39   87.83   96.24   91.43   95.11   81.95   90.58   86.68   89.43
Filtered signal    96.53   99.7    96.81   98.75   89.74   99.19   95.44   98.36   93.01   99.45   96.12   98.55
Table 3: Average performance (%) of the proposed DENS-ECG model on the test set for waveform delineation of four classes with and without filtering of the input signals.

The confusion matrices of the DENS-ECG model on the 5-fold CV and test set are shown in Figures 6(a) and 6(b). These also confirm that the proposed DENS-ECG model performs better on QRS detection compared to the other three classes. Most of the incorrect cases in all three classes (P-wave, QRS, and T-wave) are classified into the NW class. In other words, the model rarely confuses the three main classes (P-wave, QRS, and T-wave) with one another. As an example, 6.2%, 3.9% and 8.5% of the P-wave, QRS, and T-wave classes, respectively, are incorrectly classified into the NW class on the test set. In addition, the small difference between the 5-fold CV and test results shows that the model has been trained properly and does not suffer from over-fitting.

Figure 6: Classification performance using confusion matrices for: (a) 5-fold CV and (b) test set of the proposed DENS-ECG algorithm. The numbers are in percentage.

The ROC curves of the DENS-ECG model on the 5-fold CV and test set are plotted in Figures 7(a) and 7(b). As shown in the zoomed areas, the model has the highest AUC on the QRS class compared to the other classes. The AUCs for the micro- and macro-average are 0.992 and 0.99 on the test set, respectively, showing the promising classification performance of the DENS-ECG algorithm.

Figure 7: ROC curves (solid lines) for four classes using the proposed deep model (DENS-ECG) together with the curves for the micro- and macro-average ROC (dashed lines). (a) 5-fold CV and (b) test set.

As an example, Figures 8(a) and 8(b) show excerpts of DENS-ECG model predictions and their corresponding true annotations (labels). The performance of the developed algorithm in detecting P, QRS, T, and NW segments confirms the capability of the deep model in delineating these waveforms. Figure 8(a) shows an example of a nearly perfect classification in which all four waveforms are classified correctly according to the true labels. In contrast, as depicted in Figure 8(b), there are some FN predictions in the first T-wave segment (blue strip). For example, the prediction of the first T-wave segment shows a narrow FN detection, which is followed by a correct classification. There is also a similar pattern and some FP predictions for the second P-wave segment (red strip). Furthermore, there are some FN predictions for the third P-wave segment. In addition, at the beginning of the prediction, there is an FP prediction for a T-wave segment.

Figure 8: Two excerpts of DENS-ECG model predictions with the corresponding detected P, QRS, T, and NW segments. The true labels (annotations) of the signals are also plotted to compare the results of the classification. (a) A prediction example with high classification rate, (b) A prediction example with some FPs and FNs.

4 Discussion

The proposed DENS-ECG model achieves a good performance on the MITDB and QTDB databases. In this section, the performance of DENS-ECG model is compared with other deep learning models as well as state-of-the-art algorithms for ECG signal delineation.

4.1 Comparison of DENS-ECG with other deep learning approaches

In this section, the performance of the proposed DENS-ECG method is compared with other deep models with different numbers of layers and architectures. This includes models with various numbers of CNN and BiLSTM layers as well as end-to-end CNN and BiLSTM models. In all the models, the same parameters and optimisation method as utilised in DENS-ECG are used. As given in Table 4, different combinations of convolutional and BiLSTM layers are examined and their corresponding performance is reported for comparison with the DENS-ECG model, which consists of three convolutional and two BiLSTM layers. It should be noted that the combination of CNN and BiLSTM layers for the DENS-ECG model was inspired by our previous research work on the detection of AFIB [8]. Unlike the DENS-ECG model, which uses both convolutional and BiLSTM layers, the end-to-end CNN and BiLSTM models use only convolutional and only BiLSTM layers, respectively, to extract the features and classify the waveforms.

The end-to-end BiLSTM model consists of four layers, which are two BiLSTM layers followed by a dropout layer and a time distributed dense layer. The numbers of hidden units of the BiLSTM layers are the same as in the DENS-ECG model, which are 250 and 125, respectively. In addition, the dropout value is equal to 0.2 and there are four hidden units in the dense layer. The end-to-end CNN model consists of three convolutional layers followed by a time distributed dense layer. The numbers of filters in the layers are the same as in the DENS-ECG model, which are 32, 64, and 128, respectively.

As reported in Table 4, the average F1-score (last column) over all four classes shows the better performance of the DENS-ECG model with three convolutional and two BiLSTM layers. Although the performance of the model with two convolutional and two BiLSTM layers is comparable with that of the DENS-ECG model, adding one more convolutional layer leads to the highest performance among the tested models without noticeably increasing the complexity of the model. The end-to-end CNN model has the lowest performance, with an average F1-score of 71.83%. The end-to-end BiLSTM model performs better than the end-to-end CNN model but is still around 4% lower than the other models in average F1-score. Table 4 shows the effectiveness of the proposed DENS-ECG model in extracting highly abstract temporal features from the ECG signals in order to classify different waveforms.

                    Se                              F1-score
Metrics             P       QRS     T       NW      P       QRS     T       NW      Avg. F1-score
1 CNN + 1 BiLSTM    95.79   99.69   97.37   98.48   92.68   99.44   94.37   98.21   96.18
2 CNN + 1 BiLSTM    94.34   99.68   99.51   97.83   92.05   99.45   95.43   98.12   96.26
3 CNN + 1 BiLSTM    95.08   99.65   96.42   98.04   92.35   99.37   95.02   97.97   96.18
2 CNN + 2 BiLSTM    95.48   99.69   96.58   98.12   92.64   99.48   96.44   98.31   96.72
3 CNN               46.54   94.13   85.51   90.26   46.77   88.71   67.74   84.09   71.83
2 BiLSTM            88.12   96.37   91.39   95.04   87.30   95.24   92.62   94.62   92.45
DENS-ECG            96.53   99.70   96.81   98.75   93.01   99.45   96.12   98.55   96.78
Table 4: Comparison of the classification performance (%) on the test set among DENS-ECG and other deep model architectures.
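
| Methods | Parameters | Pon | Ppeak | Pend | QRSon | QRSend | Tpeak | Tend |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| [24] | # beats | 3194 | 3194 | 3194 | 3623 | 3623 | 3542 | 3542 |
| | Se (%) | 98.87 | 98.87 | 98.75 | 99.97 | 99.97 | 99.77 | 99.77 |
| | P+ (%) | 91.03 | 91.03 | 91.03 | N/A | N/A | 97.79 | 97.79 |
| [57] | # beats | 3194 | 3194 | 3194 | 3623 | 3623 | 3542 | 3542 |
| | Se (%) | N/A | N/A | N/A | N/A | N/A | 92.6 | 92.6 |
| | P+ (%) | N/A | N/A | N/A | N/A | N/A | N/R | N/R |
| [25] | # beats | 3194 | 3194 | 3194 | 3623 | 3623 | 3542 | 3542 |
| | Se (%) | 91.2 | 91.2 | 91.2 | N/R | N/R | 93.6 | 93.6 |
| | P+ (%) | N/R | N/R | N/R | N/A | N/A | N/R | N/R |
| DENS-ECG | # beats | 761 | 761 | 761 | 851 | 851 | 834 | 834 |
| | Se (%) | 95.49 | 97.69 | 96.41 | 99.75 | 99.36 | 97.71 | 95.87 |
| | P+ (%) | 88.77 | 90.84 | 89.06 | N/A | N/A | 96.51 | 94.43 |

Table 5: Comparison of P, QRS, and T wave detection on the QTDB dataset between DENS-ECG and other state-of-the-art methods (N/R: not reported, N/A: not applicable).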

| Methods | # beats | TP | FP | FN | Err. (%) | Se (%) | P+ (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [24] | 109428 | 109208 | 153 | 220 | 0.34 | 99.80 | 99.86 |
| [58] | 109481 | 109146 | 137 | 135 | 0.43 | 99.69 | 99.88 |
| [59] | 109809 | 109532 | 507 | 277 | 0.71 | 99.75 | 99.54 |
| [60] | 109963 | 109522 | 545 | 441 | 0.90 | 99.60 | 99.50 |
| DENS-ECG | 109494 | 109066 | 525 | 428 | 0.87 | 99.61 | 99.52 |

Table 6: Comparison of QRS detection on the MITDB dataset between DENS-ECG and other state-of-the-art methods.

4.2 Comparison of DENS-ECG with other state-of-the-art methods

The proposed DENS-ECG model is evaluated on the test set (26 records) of the QTDB for ECG waveform delineation. The model is also compared with other ECG delineation algorithms, as given in Table 5.

Note that the number of heartbeats (annotations) used to evaluate DENS-ECG in Table 5 is smaller than for the other methods, since only 26 of the 105 records are used as the test set. Although the DENS-ECG model does not perform as well as the algorithm reported in [24], it outperforms the algorithms published in [57] and [25] in P- and T-wave delineation. As can be seen from Table 5, the capability of the DENS-ECG model in QRSon and QRSend detection is comparable with that of the other models, with sensitivities of 99.75% and 99.36% for QRSon and QRSend, respectively. The first three models in Table 5, however, outperform DENS-ECG, especially in Pon, Pend, and Tend detection.

However, the proposed DENS-ECG model can be considered a fully automated, data-driven method that needs minimal parameter tuning to be used in practice. Furthermore, the model generalizes well enough to be applied to different datasets. As reported in Section 3.2 and Table 6, the model's performance on the MITDB dataset shows its capability in analyzing unseen datasets. Regarding the preprocessing required by the DENS-ECG model, it should also be noted that the filtering applied to the MITDB dataset is identical to that used for training the model on the QTDB dataset.

As shown in Table 6, the results of the DENS-ECG model in QRS detection are comparable with those of other algorithms (Se=99.61% and P+=99.52%). The wavelet-based model introduced in [24] has the highest overall performance in Table 6 (Se=99.80% and P+=99.86%), followed by the model proposed in [58]. The performance of the proposed DENS-ECG model is comparable with that of the well-known algorithm of Pan and Tompkins [59] for QRS detection. Furthermore, the model performs slightly better than the QRS detection algorithm proposed in [60].
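The QRS detection figures quoted above follow from the TP, FP, and FN counts in Table 6. The sketch below recomputes them for the DENS-ECG row. The error-rate definition used here, (FP + FN) divided by the number of annotated beats, is an assumption that happens to reproduce the reported value, and `qrs_metrics` is a hypothetical helper name.

```python
def qrs_metrics(tp: int, fp: int, fn: int) -> dict:
    """Sensitivity, positive predictivity, and detection error rate, in percent."""
    annotated_beats = tp + fn  # every annotated beat is either detected or missed
    return {
        "Se": 100.0 * tp / (tp + fn),
        "P+": 100.0 * tp / (tp + fp),
        # assumption: error rate counts false positives plus false negatives per beat
        "Err": 100.0 * (fp + fn) / annotated_beats,
    }


# Counts for DENS-ECG on MITDB, copied from Table 6.
dens_ecg = qrs_metrics(tp=109066, fp=525, fn=428)

for metric, value in dens_ecg.items():
    print(f"{metric}: {value:.2f}%")
```

The same function can be applied to the other rows of Table 6 to compare methods on an equal footing.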

5 Conclusion

One of the most challenging tasks in ECG waveform delineation has been the detection of P, QRS, and T waves. In this paper, a deep learning approach named DENS-ECG, which combines CNN and LSTM networks, was proposed to predict the ECG waveforms. The laborious feature extraction step was omitted, and the filtered ECG segments were used directly as inputs for training the model. The deep CNN-LSTM network was utilised to extract highly abstract temporal features from 1D ECG signals. These features were then used to classify four different classes (P, QRS, T, and NW). The model was trained using a stratified 5-fold cross-validation technique. Finally, the trained model was tested on completely unseen test sets to evaluate its classification performance. The outputs of the model at each time stamp were the posterior probabilities assigned to the four classes. The proposed model shows high performance on the test sets, with average F1-scores of 99.56% and 96.78% on the MITDB and QTDB datasets, respectively. The efficacy of the proposed DENS-ECG model in detecting ECG waveforms makes it possible for cardiologists to use the algorithm in-house to analyze ECG recordings and diagnose cardiac arrhythmias such as AFIB.
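Because the model outputs posterior probabilities at each time stamp, a delineation still has to be read off from them. The following is a minimal sketch of one possible way to do so: take the most probable class per sample, then report each contiguous run of a waveform class with its first and last sample index. It is an illustration, not the post-processing used by the authors; the class order in `CLASSES` and the function name `delineate` are assumptions.

```python
from itertools import groupby

CLASSES = ("P", "QRS", "T", "NW")  # assumption: column order of the model output


def delineate(posteriors):
    """Convert per-sample class probabilities into (label, onset, offset) tuples.

    `posteriors` is a sequence of length-4 probability vectors, one per sample.
    Onset and offset are inclusive sample indices; NW (no wave) runs are skipped.
    """
    # Most probable class for every sample.
    labels = [CLASSES[max(range(len(p)), key=p.__getitem__)] for p in posteriors]

    segments = []
    start = 0
    for label, run in groupby(labels):
        length = sum(1 for _ in run)
        if label != "NW":
            segments.append((label, start, start + length - 1))
        start += length
    return segments


# Toy input: six samples containing a short P run and a single QRS sample.
toy_posteriors = [
    [0.10, 0.10, 0.10, 0.70],
    [0.80, 0.10, 0.05, 0.05],
    [0.70, 0.10, 0.10, 0.10],
    [0.10, 0.10, 0.10, 0.70],
    [0.10, 0.80, 0.05, 0.05],
    [0.10, 0.10, 0.10, 0.70],
]
toy_segments = delineate(toy_posteriors)
print(toy_segments)
```

In practice, very short runs would likely need to be filtered or merged before onsets and offsets are reported, which this sketch does not attempt.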


The authors gratefully acknowledge the support of the Innovation Fund Denmark (IFD) under Project No. 6153-00009B.


  • [1] Yu-ki Iwasaki, Kunihiro Nishida, Takeshi Kato, and Stanley Nattel. Atrial fibrillation pathophysiology: implications for management. Circulation, 124(20):2264–2274, 2011.
  • [2] Vias Markides and Richard J Schilling. Atrial fibrillation: classification, pathophysiology, mechanisms and drug treatment. Heart, 89(8):939–943, 2003.
  • [3] Ricardo Couceiro, Paulo Carvalho, Jorge Henriques, Manuel Antunes, Matthew Harris, and Jörg Habetha. Detection of atrial fibrillation using model-based ECG analysis. In 2008 19th International Conference on Pattern Recognition, pages 1–5. IEEE, 2008.
  • [4] Masatake Fukunami, Takahisa Yamada, Masaharu Ohmori, Kazuaki Kumagai, Kiyoshi Umemoto, Akihiko Sakai, Nobuhiko Kondoh, Tetsuo Minamino, and Noritake Hoki. Detection of patients at risk for paroxysmal atrial fibrillation during sinus rhythm by P wave-triggered signal-averaged electrocardiogram. Circulation, 83(1):162–169, 1991.
  • [5] K Tateno and L Glass. Automatic detection of atrial fibrillation using the coefficient of variation and density histograms of RR and ΔRR intervals. Medical and Biological Engineering and Computing, 39(6):664–671, 2001.
  • [6] Chao Huang, Shuming Ye, Hang Chen, Dingli Li, Fangtian He, and Yuewen Tu. A novel method for detection of the transition between atrial fibrillation and sinus rhythm. IEEE Transactions on Biomedical Engineering, 58(4):1113–1119, 2010.
  • [7] Abdolrahman Peimankar and Sadasivan Puthusserypady. Ensemble learning for detection of short episodes of atrial fibrillation. In 2018 26th European Signal Processing Conference (EUSIPCO), pages 66–70. IEEE, 2018.
  • [8] Rasmus S Andersen, Abdolrahman Peimankar, and Sadasivan Puthusserypady. A deep learning approach for real-time detection of atrial fibrillation. Expert Systems with Applications, 115:465–473, 2019.
  • [9] Felipe Alonso-Atienza, Jose Luis Rojo-Alvarez, Alfredo Rosado-Munoz, Juan J Vinagre, Arcadi Garcia-Alberola, and Gustavo Camps-Valls. Feature selection using support vector machines and bootstrap methods for ventricular fibrillation detection. Expert Systems with Applications, 39(2):1956–1967, 2012.
  • [10] Yuki Hagiwara, Hamido Fujita, Shu Lih Oh, Jen Hong Tan, Ru San Tan, Edward J Ciaccio, and U Rajendra Acharya. Computer-aided diagnosis of atrial fibrillation based on ECG signals: a review. Information Sciences, 467:99–114, 2018.
  • [11] Aya F. Khalaf, Mohamed I. Owis, and Inas A. Yassine. A novel technique for cardiac arrhythmia classification using spectral correlation and support vector machines. Expert Systems with Applications, 42(21):8361–8368, 2015.
  • [12] Rahime Ceylan, Yüksel Özbay, and Bekir Karlik. A novel approach for classification of ECG arrhythmias: Type-2 fuzzy clustering neural network. Expert Systems with Applications, 36(3, Part 2):6721–6726, 2009.
  • [13] Eduardo José Da S Luz, Thiago M Nunes, Victor Hugo C De Albuquerque, Joao P Papa, and David Menotti. ECG arrhythmia classification based on optimum-path forest. Expert Systems with Applications, 40(9):3561–3573, 2013.
  • [14] M.R. Homaeinezhad, S.A. Atyabi, E. Tavakkoli, H.N. Toosi, A. Ghaffari, and R. Ebrahimpour. ECG arrhythmia recognition via a neuro-SVM–KNN hybrid classifier with virtual QRS image-based geometrical features. Expert Systems with Applications, 39(2):2047–2058, 2012.
  • [15] Ary L Goldberger, Luis AN Amaral, Leon Glass, Jeffrey M Hausdorff, Plamen Ch Ivanov, Roger G Mark, Joseph E Mietus, George B Moody, Chung-Kang Peng, and H Eugene Stanley. PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation, 101(23):e215–e220, 2000.
  • [16] Philip De Chazal, Maria O’Dwyer, and Richard B Reilly. Automatic classification of heartbeats using ECG morphology and heartbeat interval features. IEEE Transactions on Biomedical Engineering, 51(7):1196–1206, 2004.
  • [17] Tanis Mar, Sebastian Zaunseder, Juan Pablo Martínez, Mariano Llamedo, and Rüdiger Poll. Optimization of ECG classification by means of feature selection. IEEE Transactions on Biomedical Engineering, 58(8):2168–2177, 2011.
  • [18] Can Ye, BVK Vijaya Kumar, and Miguel Tavares Coimbra. An automatic subject-adaptable heartbeat classifier based on multiview learning. IEEE Journal of Biomedical and Health Informatics, 20(6):1485–1492, 2015.
  • [19] Or Perlman, Amos Katz, Guy Amit, and Yaniv Zigel. Supraventricular tachycardia classification in the 12-lead ECG using atrial waves detection and a clinically based tree scheme. IEEE Journal of Biomedical and Health Informatics, 20(6):1513–1520, 2015.
  • [20] Heba Khamis, Jiayu Chen, J Stephen Redmond, and Nigel H Lovell. Detection of atrial fibrillation from RR intervals and PQRST morphology using a neural network ensemble. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pages 5998–6001. IEEE, 2018.
  • [21] Paweł Pławiak. Novel methodology of cardiac health recognition based on ECG signals and evolutionary-neural system. Expert Systems with Applications, 92:334–349, 2018.
  • [22] Roshan Joy Martis, U. Rajendra Acharya, K.M. Mandana, A.K. Ray, and Chandan Chakraborty. Application of principal component analysis to ECG signals for automated diagnosis of cardiac health. Expert Systems with Applications, 39(14):11792–11800, 2012.
  • [23] Hamid Khorrami and Majid Moavenian. A comparative study of DWT, CWT and DCT transformations in ECG arrhythmias classification. Expert Systems with Applications, 37(8):5751–5757, 2010.
  • [24] Juan Pablo Martínez, Rute Almeida, Salvador Olmos, Ana Paula Rocha, and Pablo Laguna. A wavelet-based ECG delineator: evaluation on standard databases. IEEE Transactions on Biomedical Engineering, 51(4):570–581, 2004.
  • [25] Rémi Dubois, Pierre Maison-Blanche, Brigitte Quenet, and Gérard Dreyfus. Automatic ECG wave extraction in long-term recordings using Gaussian mesa function models and nonlinear probability estimators. Computer Methods and Programs in Biomedicine, 88(3):217–233, 2007.
  • [26] Chao Lin, Corinne Mailhes, and Jean-Yves Tourneret. P-and T-wave delineation in ECG signals using a Bayesian approach and a partially collapsed Gibbs sampler. IEEE Transactions on Biomedical Engineering, 57(12):2840–2849, 2010.
  • [27] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. Deep learning. Nature, 521(7553):436–444, 2015.
  • [28] Michael A Nielsen. Neural networks and deep learning, volume 2018. Determination Press, San Francisco, CA, USA, 2015.
  • [29] Bahareh Pourbabaee, Mehrsan Javan Roshtkhari, and Khashayar Khorasani. Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients. IEEE Transactions on Systems, Man, and Cybernetics: Systems, 48(12):2095–2104, 2018.
  • [30] Mihaela Porumb, Saverio Stranges, Antonio Pescapè, and Leandro Pecchia. Precision medicine and artificial intelligence: A pilot study on deep learning for hypoglycemic events detection based on ECG. Scientific Reports, 10(1):1–16, 2020.
  • [31] U Rajendra Acharya, Hamido Fujita, Shu Lih Oh, Yuki Hagiwara, Jen Hong Tan, and Muhammad Adam. Application of deep convolutional neural network for automated detection of myocardial infarction using ECG signals. Information Sciences, 415:190–198, 2017.
  • [32] Abdolrahman Peimankar and Sadasivan Puthusserypady. An ensemble of deep recurrent neural networks for P-wave detection in electrocardiogram. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1284–1288. IEEE, 2019.
  • [33] Huy Phan, Fernando Andreotti, Navin Cooray, Oliver Y Chén, and Maarten De Vos. Joint classification and prediction CNN framework for automatic sleep stage classification. IEEE Transactions on Biomedical Engineering, 66(5):1285–1296, 2018.
  • [34] GB Kshirsagar and ND Londhe. Improving performance of devanagari script input-based P300 speller using deep learning. IEEE Transactions on Biomedical Engineering, 66(11):2992–3005, 2018.
  • [35] Hauke Dose, Jakob S Møller, Helle K Iversen, and Sadasivan Puthusserypady. An end-to-end deep learning approach to MI-EEG signal classification for BCIs. Expert Systems with Applications, 114:532–542, 2018.
  • [36] Krzysztof J Geras, Abdel-rahman Mohamed, Rich Caruana, Gregor Urban, Shengjie Wang, Ozlem Aslan, Matthai Philipose, Matthew Richardson, and Charles Sutton. Blending LSTMs into CNNs. arXiv preprint arXiv:1511.06433, 2015.
  • [37] Xingjian Shi, Zhourong Chen, Hao Wang, Dit-Yan Yeung, Wai-Kin Wong, and Wang-chun Woo. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Advances in Neural Information Processing Systems, pages 802–810, 2015.
  • [38] Tara N Sainath, Oriol Vinyals, Andrew Senior, and Haşim Sak. Convolutional, long short-term memory, fully connected deep neural networks. In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 4580–4584. IEEE, 2015.
  • [39] Niklas Christoffer Petersen, Filipe Rodrigues, and Francisco Camara Pereira. Multi-output bus travel time prediction with convolutional LSTM neural network. Expert Systems with Applications, 120:426–435, 2019.
  • [40] Baoguang Shi, Xiang Bai, and Cong Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298–2304, 2016.
  • [41] Pablo Laguna, Roger G Mark, A Goldberg, and George B Moody. A database for evaluation of algorithms for measurement of QT and other waveform intervals in the ECG. In Computers in Cardiology 1997, pages 673–676. IEEE, 1997.
  • [42] George B Moody and Roger G Mark. The impact of the MIT-BIH arrhythmia database. IEEE Engineering in Medicine and Biology Magazine, 20(3):45–50, 2001.
  • [43] Lawrence J Christiano and Terry J Fitzgerald. The band pass filter. International Economic Review, 44(2):435–465, 2003.
  • [44] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. MIT Press, 2016.
  • [45] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. Learning long-term dependencies with gradient descent is difficult. IEEE Transactions on Neural Networks, 5(2):157–166, 1994.
  • [46] Sepp Hochreiter and Jürgen Schmidhuber. Long short-term memory. Neural Computation, 9(8):1735–1780, 1997.
  • [47] Junyoung Chung, Caglar Gulcehre, KyungHyun Cho, and Yoshua Bengio. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555, 2014.
  • [48] Alex Graves and Navdeep Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In International Conference on Machine Learning, pages 1764–1772, 2014.
  • [49] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.
  • [50] Yann N Dauphin, Razvan Pascanu, Caglar Gulcehre, Kyunghyun Cho, Surya Ganguli, and Yoshua Bengio. Identifying and attacking the saddle point problem in high-dimensional non-convex optimization. In Advances in Neural Information Processing Systems, pages 2933–2941, 2014.
  • [51] Jerome Friedman, Trevor Hastie, and Robert Tibshirani. The elements of statistical learning, volume 1. Springer Series in Statistics, New York, 2001.
  • [52] Ron Kohavi. A study of cross-validation and bootstrap for accuracy estimation and model selection. In 14th International Joint Conference on Artificial Intelligence (IJCAI), volume 2, pages 1137–1143, 1995.
  • [53] James Bergstra and Yoshua Bengio. Random search for hyper-parameter optimization. Journal of Machine Learning Research, 13(Feb):281–305, 2012.
  • [54] François Chollet et al. Keras, 2015.
  • [55] Tom Fawcett. An introduction to ROC analysis. Pattern Recognition Letters, 27(8):861–874, 2006.
  • [56] Ricardo Baeza-Yates, Berthier Ribeiro-Neto, et al. Modern information retrieval, volume 463. ACM Press, New York, 1999.
  • [57] José Antonio Vila, Yi Gang, Jesús María Rodríguez Presedo, Manuel Fernández-Delgado, Senén Barro, and Marek Malik. A new approach for TU complex characterization. IEEE Transactions on Biomedical Engineering, 47(6):764–772, 2000.
  • [58] Jinkwon Kim and Hangsik Shin. Simple and robust realtime QRS detection algorithm based on spatiotemporal characteristic of the QRS complex. PLoS ONE, 11(3), 2016.
  • [59] Jiapu Pan and Willis J Tompkins. A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering, (3):230–236, 1985.
  • [60] Riccardo Poli, Stefano Cagnoni, and Guido Valli. Genetic design of optimum linear and nonlinear QRS detectors. IEEE Transactions on Biomedical Engineering, 42(11):1137–1141, 1995.