A Graph-constrained Changepoint Detection Approach for ECG Segmentation

04/24/2020 ∙ by Atiyeh Fotoohinasab, et al. ∙ 0

Electrocardiogram (ECG) signal is the most commonly used non-invasive tool in the assessment of cardiovascular diseases. Segmentation of the ECG signal to locate its constitutive waves, in particular the R-peaks, is a key step in ECG processing and analysis. Over the years, several segmentation and QRS complex detection algorithms have been proposed with different features; however, their performance highly depends on applying preprocessing steps which makes them unreliable in real-time data analysis of ambulatory care settings and remote monitoring systems, where the collected data is highly noisy. Moreover, some issues still remain with the current algorithms in regard to the diverse morphological categories for the ECG signal and their high computation cost. In this paper, we introduce a novel graph-based optimal changepoint detection (GCCD) method for reliable detection of R-peak positions without employing any preprocessing step. The proposed model guarantees to compute the globally optimal changepoint detection solution. It is also generic in nature and can be applied to other time-series biomedical signals. Based on the MIT-BIH arrhythmia (MIT-BIH-AR) database, the proposed method achieves overall sensitivity Sen = 99.76, positive predictivity PPR = 99.68, and detection error rate DER = 0.55 which are comparable to other state-of-the-art approaches.



There are no comments yet.


page 1

This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

I Introduction

The electrocardiogram (ECG) is a quasi-periodical, rhythmically repeating biomedical signal that includes information about the electrical activities of cardiac muscle. One cycle of ECG, called a heartbeat, is delineated by arrangements of P, QRS-complex and T waveforms as well as PQ and ST segments. Out of all the waveforms, the QRS-complex is the most striking waveform as it represents the ventricular depolarization and reflects the major part of the electrical activity of the heart. Precisely detecting the location of R-peak as the highest and the only positive peak within the QRS complex plays a critical role in obtaining the morphology of the QRS-complex. Besides, the localization of R-peak serves as the basis for the automated determination of the heart rate and thereby, it is a significant criterion for discerning any heart abnormality. Not only heart diseases but also many other diseases can be diagnosed in a non-invasive way using R-peak detection due to the relationship between heart rate variability (HRV) and several physiological systems (vasomotor, respiratory, central nervous, thermoregulatory, etc.). Revealing R-peaks is a simple task in noise-free ECG signals; however, the challenge emerges when the ECG signal is corrupted by noise and artifacts.

Several methods have been proposed to detect different waves in the ECG signal. However, the majority of these methods still suffer from several drawbacks such as misdetection of ECG waveforms with determinant morphological patterns in some life-threatening heart arrhythmias, or the high complexity of the algorithm that is a barrier for online processing or long-recording analysis. Typically, current methods consist of two main steps; pre-processing, and detection. In the pre-processing step, the algorithm attempts to eliminate the noise and artifacts, and highlight the relevant sections of the ECG [1, 19]

. In the second step, most of the methods locate R-peaks using the result from the pre-processing step in the first place, then other waves are detected by defining a set of heuristic rules

[9]. However, in real-time data processing of ambulatory care settings and remote monitoring systems, where the collected data is highly noisy, preprocessing-based algorithms are less effective. Besides, the performance of most detection algorithms depends highly on the used dataset, and also, incorrect detection of R-peak leads to the incorrect identification of other waves.

Most of the state-of-the-art methods in the field of ECG waves detection are either based on the wavelet transform, Hidden Markov models (HMM), and simple mathematical operations such as differentiation, integration, and squaring. Wavelet transform is a common approach to capture non-stationary behaviors of the ECG signal

[17]. However, the caveat with the Wavelet transform is the difficulty in determining a proper mother wavelet derived from the various shape of the QRS-complex and finding the required threshold in the detection step. More importantly, discrete wavelet transforms (DWT) often fail to derive reliable results from short-recordings of the signal. Algorithms stem from simple mathematical operations are computationally efficient, and thereby ideal for real-time analysis of the large data sets. Pan and Tompkins [16] algorithm and its modifications [5]

are the most popular techniques in this category. However, their high performance is guaranteed in case of the ECG signal is weakly disturbed by the noise. Hidden Markov models (HMMs) are also graphical models that have highly drawn attention in the task of ECG segmentation in order to consider the temporal dependency among the waveforms. In addition to the aforementioned methods, some studies addressed the problem of ECG delineation using deep learning. Although deep learning-based algorithms demonstrated high performance in the classification problems, they rely on large scale datasets for training the algorithm and suffer from the imbalanced class problem

[15, 12, 14, 13, 11, 18].

In this work, we leverage a graph-constrained changepoint detection (GCCD) model introduced by Hocking et al. in [7] in order to detect R-peak positions in a fast and effective way. Hocking et al. [7] proposed an algorithm that can solve changepoint detection problems with graph constraints on segment’s means with log-linear complexity in the amount of data, thereby has high efficiency on huge data sets. Their model can represent a time series signal as a set of constant value segments of varying lengths, as well as their precise locations over space or time. They have shown that their model has high accuracy in detecting abrupt changes in genomic data such as DNA copy number profiles [8] and ChIP-sequencing [6]. In this paper, we indicate that this model not only is applicable for a periodic signal but also can outperform other state-of-the-art methods concerned with ECG delineation.

There are a few studies in the literature that have exploited changepoint detection models to find abrupt changes in heart rhythm. However, up to our best knowledge, this is the first study where a changepoint detection model is used to segment ECG waveforms. One of the key contributions of this work is using graphs to some extent employ biological prior knowledge of the signal into the model and lessen the complexity of the model. The proposed method does not need a pre-processing step since it leverages the sparsity of changepoints to denoising the signal as well as detecting abrupt changes. As a result, it can be more effective in real-time processing of noisy data. Besides, GCCD takes the whole cardiac beat into account to detect and study each wave separately, as opposed to most of the current methods which use the location of R-wave to detect other waves.

The rest of the paper is organized as follows. In the next section, we describe the proposed Graph-Constrained Changepoint Detection model and its application in localization of R-peak. Section III provides a description of the dataset used in this study and a discussion of the results as well as a comparison between the performance of the proposed algorithm and other state-of-the-art algorithms. Finally, IV summarizes this research work and its contribution.

Ii Methodology

The proposed method treats the problem of ECG waveforms delineation as a changepoint detection problem over a non-stationary ECG signal. It represents the periodic ECG signal as piece-wise locally stationary pieces so as each piece is the mean of one segment of data points. The model takes a raw ECG signal and a predefined graph, which encodes expected changepoints from the signal, as inputs and yields the onset/offset and the mean of desired segments. Figure 1 illustrates an overview of the proposed algorithm in detecting waveforms of a normal ECG. In the next sections, we will briefly describe the GCCD model.

Fig. 1: An overview of the Graph-Constrained Changepoint Detection (GCCD) model. The GCCD model takes a constraint graph and a raw ECG signal as inputs, and then, detects segments corresponding to the nodes of the constraint graph at the output. The up/down arrows above each edge represents the expected up/down change in the segment means, respectively. In this model, the proposed constraint graph has a vertex for each main waveform (i.e. ) as well as Baselines (-), which are intermediate states.

Ii-a Graph-Constrained Changepoint Detection Model

Let us assume a constrained changepoint model represented with a graph , where the vertices represent the hidden states or segments (not necessarily a waveform), and the directed edges represent the possible changes between the states or segments. Each edge encodes the following associated data based on the prior knowledge of expected sequences of changes:

  • The source and target vertices/states for a changepoint from to .

  • A non-negative penalty constant which is the cost of changepoint .

  • A constraint function which defines the possible mean values before and after each changepoint . If is the mean before the changepoint and is the mean after the changepoint, then the constraint is . These functions can be used to constrain the direction (up or down) and/or the magnitude of the change (greater/less than a certain amount).

Mathematically, given the input signal and the graph , finding changepoints  , segment means , and hidden states can be executed by solving the following optimization problem:

s. t (2)

The above equation consists of a data-fitting term involving the negative log-likelihood of each data point, and a model complexity term involving the cost of each changepoint . More specifically, the positive penalty parameter regularizes the number of predicted changepoints/segments by the model. A larger penalty

yields a changepoint vector

which is more sparse, or more specifically, reduces the number of changepoints. The constraint function also encodes the expected up/down change and the least amplitude gap between the mean of two states. For all possible changes such that , the constraint (2) enforces no change in mean and state variables. For all possible changes such that , the constraint (3) enforces a change in mean specified by the constraint function , and a change in state . Besides, the above optimization problem exploits the dynamic programming algorithm to recursively compute cost functions over the data points and hidden states.

Ii-B ECG Waveforms Detection

At first, the topology of the constraint graph and its corresponding parameters are defined based on the different possible morphologies for each waveform (P-, QRS-, T-waveforms, etc.) over one heart cycle. This constraint graph specifies biological prior knowledge about the expected morphological changes within one cycle of ECG. More specifically, each node in the constraint graph serves an expected hidden state/segment in the signal, and each edge encodes the required conditions for the transition to a new state. These conditions are determined based on the expected minimum amplitude difference of two successive segments and the polarity of a waveform. Then, the GCCD model segments the raw ECG signal using the prior knowledge provided by the constraint graph. It is worth re-emphasizing that any preprocessing step is no longer required with the GCCD model.

Iii Experimental Studies

Iii-a Dataset

We evaluated the GCCD model using the MIT-BIH arrhythmia (MIT-BIH-AR) database consists of 48 ECG recordings taken from 47 subjects, and each recording is sampled at 360 Hz and 30 min in length with 200 mV amplitude resolution [10, 4]. Each recording is composed of two ambulatory ECG channels from the modified lead II (MLII) and one of the modified leads V1, V2, V4, or V5. In this study, only the MLII was used, and thereby the records 102 and 107 were excluded from consideration. The database contains annotations of both heartbeat class information and R-wave positions information verified by two or more expert cardiologists independently. We employed the provided annotations for R-wave positions to evaluate the performance of the proposed algorithm.

Iii-B Experimental Results

Fig. 2 demonstrates an example of the performance of this proposed model in the localization of R-peak for a window of two records (# 100 and #230) from the MIT-BIH-AR dataset. The performance of R-peak detection algorithms is usually evaluated in terms of sensitivity (Sen), positive predictivity rate (PPR) and detection error rate (DER), which are calculated by:


In order to compare the performance of the proposed method against other methods in the literature, we adopted the above statistical metrics for the evaluation of the GCCD algorithm. Table I represents the R-peak detection success for GCCD algorithm considering all 48 records in the MIT-BIH dataset. As the table shows, the proposed method achieves notable results, Sen = 99.76, PPR = 99.68, and DER = 0.55, in R-peak detection. The constraint graph for each record is defined manually by matching the overall morphological properties of the signal to pre-specified categories for each waveform [20]. All the Sen and PPR values are higher than 99% except 6 cases. For 26 cases of detection in the dataset, Sen and PPR values are 100% separately. Records where the Sen and PPR values are less than 99% are mainly due to two reasons: in some cycles of the record, the morphology of the QRS-complex possesses no R-peak, or the amplitude of the R-peak is too low compared to the amplitude of its next negative peak; S wave. It worth to mention that the results for records 105 and 108, as the noisiest records in the data set, demonstrate the capability of the model in the detection of R-peak locations in the presence of high noise level without using a preprocessing step for removing the noise. Table II compares the performance of the GCCD algorithm against other state-of-the-art methods. Results in this table present that the performance of the proposed model without using any preprocessing step is comparable to the other R-wave detection methods in the literature using a preprocessing step.

Fig. 2: The demonstration of R-peak detection using the proposed model for records 100 and 230 from the MIT-BIH-AR dataset. a-b: The proposed model represents records of 100 and 230, respectively, as piece-wise locally stationary segments (the blue lines). Extracted R-peak positions are marked as red labels. c-d: The graph structure for above two proposed models. Each graph has a vertex for each state including state “R” for the R-wave. Below each edge we show the penalty , which is either a constant or zero; above we show the constants in the constraint function ( for a non-decreasing change which is shown with , for a non-increasing change which is shown with , is the minimum magnitude of change).
Record Total beats TP FP FN Sen (%) PPR (%) DER (%)
100 2273 2273 0 0 100 100 0
101 1865 1865 0 0 100 100 0
102 2187 2187 0 0 100 100 0
103 2084 2084 0 0 100 100 0
104 2229 2572 24 5 99.77 98.93 1.3
105 2572 2572 0 0 100 100 0
106 2027 2022 0 5 99.75 100 0.24
107 2137 2137 0 0 100 100 0
108 1765 1765 2 0 100 99.88 0.11
109 2532 2532 0 0 100 100 0
111 2124 2120 5 4 99.76 99.81 0.42
112 2539 2539 0 0 100 100 0
113 1795 1795 0 0 100 100 0
114 1879 1864 34 15 99.20 98.20 2.6
115 1953 1535 0 0 100 100 0
116 2412 2410 1 2 99.95 99.91 0.12
117 1535 1535 0 0 100 100 0
118 2278 2276 4 2 99.91 99.82 0.26
119 1987 1987 0 0 100 100 0
121 1863 1863 0 0 100 100 0
122 2476 2476 0 0 100 100 0
123 1518 1518 0 0 100 100 0
124 1619 1617 10 2 99.87 99.38 0.74
200 2601 2521 50 80 96.92 98.05 4.99
201 1963 1962 9 1 99.94 99.54 0.50
202 2136 2134 7 2 99.90 99.67 0.42
203 2980 2934 43 46 98.45 98.55 2.98
205 2656 2656 0 0 100 100 0
207 1860 1812 81 48 97.41 95.72 6.99
208 2955 2946 2 9 99.69 99.93 0.37
209 3005 3005 0 0 100 100 0
210 2650 2648 32 2 99.92 98.80 1.28
212 2748 2748 0 0 100 100 0
213 3251 3249 3 2 99.93 99.90 0.15
214 2262 2262 10 0 100 99.55 0.44
215 3363 3361 6 2 99.94 99.82 0.23
217 2208 2208 0 0 100 100 0
219 2154 2154 0 0 100 100 0
220 2048 2048 0 0 100 100 0
221 2427 2425 3 2 99.91 99.87 0.20
222 2483 2472 0 11 99.55 100 0.44
223 2605 2604 3 1 99.96 99.88 0.15
228 2053 2050 5 3 99.85 99.75 0.38
230 2256 2256 0 0 100 100 0
231 1571 1571 0 0 100 100 0
232 1780 1779 1 1 99.94 99.94 0.11
233 3079 3063 11 16 99.48 99.64 0.87
234 2753 2753 0 0 100 100 0
Total 109,494 109,148 346 261 99.76 99.68 0.55
TABLE I: Performance evaluation of the proposed method using the MIT-BIH-AR database
Method TP FP FN Sen (%) PPR (%) DER (%)
Pan and Tompkins [16] 115,860 507 277 99.76 99.56 0.68
Park et al. [17] 109,415 99 79 99.93 99.91 0.163
Chen et al. [2] 109,250 203 193 99.82 99.81 0.36
Farashi [3] 109,692 163 273 99.75 99.85 0.40
Sharma and Sunkaria [19] 109,381 136 113 99.50 99.56 0.93
Castells-Rufas and Carrabina [1] 108,880 353 614 99.43 99.67 0.88
Yazdani and Vesin [21] 109,357 108 137 99.87 99.90 0.22
GCCD Model 109,148 346 261 99.76 99.68 0.55
TABLE II: Comparison of performance of several R-peak detection methods using the MIT-BIH-AR database

The results provided in this research work justifies that changepoint detection models are promising approaches for the detection of ECG waveforms. The proposed model has the ability to detect R-peak positions for various morphologies of the ECG. Additionally, GCCD model can potentially detect other waveforms than R wave (i.e, P,Q,S, and T waves) by including their corresponding knowledge to the constraint graph. In this research work, we only detect R-peak positions since the MIT-BIH-AR database has just provided with the annotations for this peak. As mentioned before, for some cases in the database, the performance of the GCCD model is notably low due to the absence of the R-peak or the low amplitude of this peak in some cycles. The efficiency of the GCCD model for these specific cases can be enhanced by using a multi-path constraint graph. A multi-path constraint graph is useful in identifying all ECG waveforms as well as QS-complex possessing no R-peak. Additionally, the constraint graph can be learned in a batch mode based on each record to surmise the model from the manual definition of the constraint graph, which is considered as future work.

Iv Conclusion

The detection of ECG waveforms plays a paramount role in most of the automated ECG analysis tools. The most common features found in the literature for the detection of several heart diseases are computed from the morphological characteristics of R-peak, as the most important fiducial point within one cycle of the ECG signal. In this paper, we applied for the first time a changepoint detection model, named graph-constrained changepoint detection model, in order to extract the positions of R-peak. The constraint graph encodes prior knowledge of expected changepoints per heart cycle based on the possible morphologies for the ECG signal. The proposed model can detect R-peak locations in a log-linear complexity and requires no preprocessing steps. The evaluation results demonstrated the proposed method has a high performance for the detection of R-peak locations, especially in noisy records, compared to the existing algorithms in the literature while their performance highly relies on removing noise in the preprocessing step.


  • [1] D. Castells-Rufas and J. Carrabina (2015) Simple real-time qrs detector with the mamemi filter. Biomedical Signal Processing and Control 21, pp. 137–145. Cited by: §I, TABLE II.
  • [2] C. Chen and C. Chuang (2017) A qrs detection and r point recognition method for wearable single-lead ecg devices. Sensors 17 (9), pp. 1969. Cited by: TABLE II.
  • [3] S. Farashi (2016) A multiresolution time-dependent entropy method for qrs complex detection. Biomedical Signal Processing and Control 24, pp. 63–71. Cited by: TABLE II.
  • [4] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C. Peng, and H. E. Stanley (2000) PhysioBank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals. circulation 101 (23), pp. e215–e220. Cited by: §III-A.
  • [5] R. Gutiérrez-Rivas, J. J. Garcia, W. P. Marnane, and A. Hernández (2015) Novel real-time low-complexity qrs complex detector based on adaptive thresholding. IEEE Sensors Journal 15 (10), pp. 6036–6043. Cited by: §I.
  • [6] T. D. Hocking, G. Rigaill, and G. Bourque (2015) PeakSeg: constrained optimal segmentation and supervised penalty learning for peak detection in count data. In Proc. 32nd ICML, pp. 324–332. Cited by: §I.
  • [7] T. D. Hocking, G. Rigaill, P. Fearnhead, and G. Bourque (2017) A log-linear time algorithm for constrained changepoint detection. arXiv preprint arXiv:1703.03352. Cited by: §I.
  • [8] T. D. Hocking, G. Schleiermacher, I. Janoueix-Lerosey, V. Boeva, J. Cappo, O. Delattre, F. Bach, and J. Vert (2013-05) Learning smoothing models of copy number profiles using breakpoint annotations. BMC Bioinformatics 14 (164). Cited by: §I.
  • [9] Z. Hou, Y. Dong, J. Xiang, X. Li, and B. Yang (2018) A real-time qrs detection method based on phase portraits and box-scoring calculation. IEEE Sensors Journal 18 (9), pp. 3694–3702. Cited by: §I.
  • [10] G. B. Moody and R. G. Mark (2001) The impact of the mit-bih arrhythmia database. IEEE Engineering in Medicine and Biology Magazine 20 (3), pp. 45–50. Cited by: §III-A.
  • [11] S. Mousavi, F. Afghah, and U. R. Acharya (2019) SleepEEGNet: automated sleep stage scoring with sequence to sequence deep learning approach. PloS one 14 (5). Cited by: §I.
  • [12] S. Mousavi, F. Afghah, and U. R. Acharya (2020) HAN-ecg: an interpretable atrial fibrillation detection model using hierarchical attention networks. arXiv preprint arXiv:2002.05262. Cited by: §I.
  • [13] S. Mousavi, F. Afghah, A. Razi, and U. R. Acharya (2019) ECGNET: learning where to attend for detection of atrial fibrillation with deep visual attention. In 2019 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI), pp. 1–4. Cited by: §I.
  • [14] S. Mousavi and F. Afghah (2019) Inter-and intra-patient ecg heartbeat classification for arrhythmia detection: a sequence to sequence deep learning approach. In ICASSP 2019-2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1308–1312. Cited by: §I.
  • [15] S. Mousavi, A. Fotoohinasab, and F. Afghah (2020)

    Single-modal and multi-modal false arrhythmia alarm reduction using attention-based convolutional and recurrent neural networks

    PloS one 15 (1), pp. e0226990. Cited by: §I.
  • [16] J. Pan and W. J. Tompkins (1985-03) A real-time qrs detection algorithm. IEEE Transactions on Biomedical Engineering BME-32 (3), pp. 230–236. External Links: Document, ISSN 0018-9294 Cited by: §I, TABLE II.
  • [17] J. Park, S. Lee, and U. Park (2017) R peak detection method using wavelet transform and modified shannon energy envelope. Journal of healthcare engineering 2017. Cited by: §I, TABLE II.
  • [18] I. Parviziomran, S. Cao, H. Yang, S. Park, and D. Won (2019) Data-driven prediction model of components shift during reflow process in surface mount technology. Procedia Manufacturing 38, pp. 100–107. Cited by: §I.
  • [19] T. Sharma and K. K. Sharma (2017) QRS complex detection in ecg signals using locally adaptive weighted total variation denoising. Computers in biology and medicine 87, pp. 187–199. Cited by: §I, TABLE II.
  • [20] R. Tafreshi, A. Jaleel, J. Lim, and L. Tafreshi (2014) Automated analysis of ecg waveforms with atypical qrs complex morphologies. Biomedical Signal Processing and Control 10, pp. 41–49. Cited by: §III-B.
  • [21] S. Yazdani and J. Vesin (2016) Extraction of qrs fiducial points from the ecg using adaptive mathematical morphology. Digital Signal Processing 56, pp. 100–109. Cited by: TABLE II.