Fine-grained ECG Classification Based on Deep CNN and Online Decision Fusion

01/19/2019 ∙ by Jing Zhang, et al.

Early recognition of abnormal rhythms in ECG signals is crucial for monitoring and diagnosing patients' cardiac conditions and for increasing the success rate of treatment. Classifying abnormal rhythms into fine-grained categories is very challenging due to the broad taxonomy of rhythms, the presence of noise, and the lack of real-world data and annotations from a large number of patients. This paper presents a new ECG classification method based on Deep Convolutional Neural Networks (DCNN) and online decision fusion. Different from previous methods that utilize hand-crafted features or learn features from the original signal domain, the proposed DCNN-based method learns features and classifiers from the time-frequency domain in an end-to-end manner. First, the ECG wave signal is transformed to the time-frequency domain by the Short-Time Fourier Transform. Next, specific DCNN models are trained on ECG samples of specific lengths. Finally, an online decision fusion method is proposed to fuse past and current decisions from different models into a more accurate one. Experimental results on both synthetic and real-world ECG datasets demonstrate the effectiveness and efficiency of the proposed method.


I Introduction

Electrocardiogram (ECG), which records the electrical depolarization-repolarization patterns of the heart's activity over the cardiac cycle, is widely used for monitoring and diagnosing patients' cardiac conditions [1, 2, 3, 4, 5, 6] as well as for identification [7]. The diagnosis is usually made by well-trained and experienced cardiologists, which is expensive and sometimes inconvenient, e.g., patients must travel to hospitals. Therefore, automatic monitoring and diagnosis systems are in great demand in clinics, community medical centers and home health care programs. Although great advances have been made in ECG filtering, detection and classification in the past decades [8, 9, 10, 11, 4, 12], efficient and accurate ECG classification remains challenging due to disturbing noise, the variety of symptom types and differences between patients.

Before classification, a pre-processing filtering step is usually needed to remove a variety of noises from the ECG signal, including power-line interference, baseline wander, muscle contraction noise, etc. Traditional approaches like low-pass filters and filter banks can reduce noise but may also introduce artifacts [13]. Combining signal modeling and filtering may alleviate this problem, but such approaches are limited to a single noise type [14, 15]. Recently, different noise removal methods based on the wavelet transform have been proposed, leveraging its superiority in multi-resolution signal analysis [16, 17, 18, 19]. For instance, S. Poungponsri and X.H. Yu proposed an adaptive filtering approach based on the wavelet transform and artificial neural networks which can efficiently remove different types of noise [18].

For ECG classification, classical methods usually consist of two sequential modules: feature extraction and classifier training. Hand-crafted features are extracted in the time or frequency domain, including amplitudes, intervals and higher-order statistics, etc. Various methods have been proposed, such as filter banks [20], Kalman filtering [21], Principal Component Analysis (PCA) [22, 23, 24, 25, 26], frequency analysis, the wavelet transform (WT) [27, 28, 29, 8, 30, 31] and statistical methods. Classifier models including Hidden Markov Models (HMM) [32, 33], Support Vector Machines (SVM) [34], Artificial Neural Networks (ANN) [8, 9, 27, 35, 11, 36], and mixture-of-experts methods [37] have also been studied. Among them, a large number of methods are based on artificial neural networks due to their better modeling capacity. For example, L.Y. Shyu et al. proposed a novel method for detecting Ventricular Premature Contraction (VPC) using the wavelet transform and a fuzzy neural network [8]. By using the same wavelet for QRS detection and VPC classification, their method has lower computational complexity. I. Guler and E.D. Ubeyli proposed a combined neural network model for ECG beat classification [9]. Statistical features based on the discrete wavelet transform are extracted and used as the input of the first-level networks; sequential networks are then trained using the outputs of the previous-level networks as input. Unlike previous methods, T. Ince proposed a method which uses a robust and generic ANN architecture and trains a patient-specific model with morphological wavelet transform features and temporal features for each patient [27]. Besides, some approaches combine several hand-crafted features to provide enhanced performance [38, 39].

Though the above methods have achieved good performance, they exhibit some common drawbacks: 1) Hand-crafted features rely on the domain knowledge of experts and must be designed and tested carefully, and the classifier must have appropriate capacity to model such features. 2) The types of ECG signals considered are usually limited or coarse-grained, e.g., 25 types. On the one hand, for a new type of ECG pattern, the discriminative power of existing features must be examined first, and new features may have to be designed elaborately again. On the other hand, their performance for fine-grained classification is still unclear, since it requires features with better discrimination and classifiers with larger modeling capacity.

In the past few years, deep learning based methods, including deep belief networks, deep convolutional neural networks and recurrent neural networks, have been widely used in many research fields and achieve remarkable performance, e.g., in speech recognition, image classification and object detection. S. Kiranyaz et al. proposed a 1-D convolutional neural network for patient-specific ECG classification [4]. They design a simple but effective network architecture and utilize 1-D convolutions to process the ECG wave signal directly. B. Pourbabaee et al. utilize deep convolutional neural networks to learn ECG features for screening paroxysmal atrial fibrillation (PAF) patients [5]. Their experimental results demonstrate the representation capability of deep CNNs. Recently, G. Clifford et al. organized the PhysioNet/Computing in Cardiology Challenge 2017 for AF rhythm classification from a short single-lead ECG recording. A large number of real-world ECG samples from patients were collected and labelled, facilitating research on the challenging AF classification problem. Both hand-crafted feature based and deep learning based methods have been proposed and reached the top entries [40, 41, 42]. For example, S. Hong et al. propose an ensemble-classifier method that combines expert features and deep features [40]. T. Teijeiro et al. propose a method combining two classifiers: the first evaluates the record globally using aggregated values of a set of high-level, clinically meaningful features, and the second is a Recurrent Neural Network fed with the individual features of each detected heartbeat [42]. M. Zabihi et al. proposed a hand-crafted feature extraction and selection method based on a random forest classifier [41].

In this paper, we propose a new deep CNN based method for ECG classification. Different from previous methods: 1) we first transform the original ECG signal into the time-frequency domain by the Short-Time Fourier Transform (STFT); 2) the time-frequency characteristics of each pattern are then learned by a CNN with 2-D convolutions; 3) we propose an online decision fusion method to fuse past and current decisions from different models into a more accurate one; 4) we examine the proposed method for fine-grained ECG classification on a synthetic ECG dataset consisting of 20 types of ECG signals, and we also evaluate its performance on a real-world ECG dataset and compare it with state-of-the-art methods. The experimental results demonstrate the effectiveness and efficiency of the proposed method.

The rest of the paper is organized as follows. In Sect. II, we briefly formulate the ECG classification problem. Then, we present the proposed method in Sect. III, including the Short-Time Fourier Transform, the architecture of the proposed CNN and the online decision fusion method. In Sect. IV, we present experimental results on both a synthetic ECG dataset and a real-world ECG dataset to verify the effectiveness of the proposed method and compare its performance with state-of-the-art methods. Finally, we conclude the paper in Sect. V and point out some potential directions for future work.

II Problem formulation

Given a set of ECG signals and their corresponding labels, the target of a classification method is to predict the labels correctly. As depicted in Sect. I, it usually consists of two sequential modules: feature extraction and classifier training. Once the classifier is obtained, it can be used to predict unseen samples, i.e., in the testing phase. Mathematically, we denote the set of ECG wave signals as:

$$\mathcal{X} = \left\{ \left( x_i, y_i \right) \mid i \in \Omega \right\}, \quad (1)$$

where $x_i = \left[ x_{i1}, x_{i2}, \ldots, x_{iL_i} \right]$ is the $i$th sample, $L_i$ is the sample length, $y_i \in \{1, 2, \ldots, C\}$ is the category of $x_i$, $C$ is the number of total categories, and $\Omega$ is the index set of all samples. The feature extraction can be described as follows:

$$f_i = \phi\left( x_i; \theta_{\phi} \right), \quad (2)$$

where $f_i$ is the corresponding feature representation of signal $x_i$. Usually, the feature vector $f_i \in \mathbb{R}^{d}$ is more compact than the original signal $x_i$, i.e., $d \ll L_i$. $\phi$ is a mapping function from the original signal space to the feature space, and $\theta_{\phi}$ denotes the parameters associated with the mapping $\phi$; it is usually determined according to the domain knowledge of experts or by cross-validation. Given the feature representation, a classifier $\psi$ predicts its category as follows:

$$\hat{y}_i = \psi\left( f_i; \theta_{\psi} \right), \quad (3)$$

where $\theta_{\psi}$ denotes the parameters associated with the classifier $\psi$ and $\hat{y}_i$ is the prediction. Frequently-used classifiers include SVM [34], ANN [35, 11, 36], Random Forest [43], HMM [32, 33], DCNN [4], etc. Given the training samples, the training of a classifier can be formulated as an optimization problem over its parameters as follows:

$$\theta_{\psi}^{*} = \arg\min_{\theta_{\psi}} \sum_{i \in \Omega_{t}} \ell\left( \psi\left( f_i; \theta_{\psi} \right), y_i \right), \quad (4)$$

where $\Omega_{t}$ is the index set of training samples, and $\ell$ is a loss function which depicts the loss of assigning a prediction $\hat{y}_i$ to a sample with label $y_i$, e.g., the margin loss in SVM models and the cross-entropy loss in ANN or Random Forest models.

Fig. 1: The pipeline of the proposed method for fine-grained ECG classification.
Fig. 2: (a) Exemplar original ECG wave signals from each category. (b) Corresponding spectrograms of (a) by using Short-Time Fourier Transform.

For deep neural network models, feature extraction (learning) and classifier training are integrated in the network architecture as an end-to-end model. The parameters are optimized on the training samples by the error back-propagation algorithm. Mathematically, it can be formulated as:

$$\theta^{*} = \arg\min_{\theta} \sum_{i \in \Omega_{t}} \ell\left( F\left( x_i; \theta \right), y_i \right), \quad (5)$$

where $F$ is the deep neural network model with parameters $\theta$. A modern deep neural network architecture, e.g., a DCNN, usually consists of many sequential layers such as convolutional layers, pooling layers, nonlinear activation layers and fully connected layers. Therefore, $F$ is a nonlinear mapping function with powerful modeling capacity which can map the original high-dimensional input data to a low-dimensional feature space, where the representations are more discriminative, representative and compact.

III The proposed approach

In this paper, we propose a new deep CNN method for fine-grained ECG classification. First, the ECG wave signal is transformed to the time-frequency domain by the Short-Time Fourier Transform. Next, specific DCNN models are trained on training samples of specific lengths. Finally, an online decision fusion method is proposed to fuse past and current decisions from different models into a more accurate one. Figure 1 shows the pipeline of the proposed method. We present the details in the following parts.

III-A Short-Time Fourier Transform

Though wave signals in the original time domain can be used as the input of a DNN to learn features, a time-frequency representation may be a better choice. Inspired by work in the speech recognition area [44, 45], which shows that spectrogram features of speech are superior to MFCC features with DNNs, we first transform the original ECG wave signal into the time-frequency domain using the Short-Time Fourier Transform to obtain the ECG spectrogram representation. Mathematically, it can be described as follows:

$$X_i\left( \tau, \omega \right) = \sum_{t} x_i(t)\, w(t - \tau)\, e^{-j \omega t}, \quad (6)$$

where $w(\cdot)$ is the window function, e.g., the Hamming window, and $X_i$ is the spectrogram of $x_i$, which has a two-dimensional structure. Figure 2 shows spectrogram examples.
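As a minimal sketch (not the authors' code), the transform can be computed with SciPy using the Hamming window of length 256 and 128-point overlap reported in Sect. IV-A; the file name and input length are illustrative placeholders.

```python
# Sketch: ECG spectrogram via STFT; window/overlap follow Sect. IV-A.
import numpy as np
from scipy.signal import stft

fs = 512                       # sampling rate of the synthetic dataset (Hz)
x = np.load("ecg_sample.npy")  # hypothetical 1-D ECG sub-sample, e.g. 512 points

# Z has shape (n_freq_bins, n_frames); its magnitude is the 2-D spectrogram
# that is fed to the CNN described below.
f, t, Z = stft(x, fs=fs, window="hamming", nperseg=256, noverlap=128)
spectrogram = np.abs(Z)
```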

III-B Architecture of the proposed CNN

Network           Type   Input Size  Number  Filter  Pad    Stride
Proposed Network  Conv1  1x32x4      32      3x3     1      1
                  Pool1  32x32x4     -       4x1     0      (4,1)
                  Conv2  32x8x4      32      3x3     1      1
                  Pool2  32x8x4      -       4x2     0      (4,2)
                  Fc3    32x2x2      64      -       -      -
                  Fc4    64x1x1      20      -       -      -
                  Params: 18,976
                  Complexity¹: 3.5x10^5
Network in [4]    Conv1  1x1x512     32      1x15    (0,7)  (1,6)
                  Conv2  32x1x86     16      1x15    (0,7)  (1,6)
                  Conv3  16x1x15     16      1x15    (0,7)  (1,6)
                  Pool3  16x1x3      -       1x3     0      1
                  Fc4    16x1x1      10      -       -      -
                  Fc5    10x1x1      20      -       -      -
                  Params: 12,360
                  Complexity: 1.7x10^5
¹Evaluated with FLOPs, i.e., the number of floating-point multiply-adds.
TABLE I: Network architectures of the proposed method and of the 1-D CNN in [4].

Since we use the two-dimensional spectrogram as input, we design a deep CNN architecture with 2-D convolutions. Specifically, the proposed architecture consists of two convolutional layers and two fully-connected layers, with a max pooling layer after each convolutional layer and ReLU activations in between; details are shown in Table I. As can be seen, it is rather light-weight, with 18,976 parameters and about 3.5x10^5 FLOPs. We also present the network architecture of [4] for comparison; its feature channels are kept the same, and the filter sizes and strides are adapted to the data length used in this paper. The proposed network has a comparable number of parameters and computational cost to the one in [4]. As we will show in Sect. IV, the proposed method is computationally efficient and achieves real-time performance even on an embedded device.
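For concreteness, the following is a minimal PyTorch sketch of the network in Table I. The paper's implementation is in Caffe; the exact placement of ReLU layers and the reading of the input layout as channels x frequency x time (1x32x4) are our assumptions from the table and text.

```python
# Sketch of the Table I architecture (assumptions noted in the lead-in).
import torch
import torch.nn as nn

class SpectrogramCNN(nn.Module):
    def __init__(self, num_classes: int = 20):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1),   # Conv1: 1x32x4 -> 32x32x4
            nn.MaxPool2d(kernel_size=(4, 1)),             # Pool1: -> 32x8x4
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, padding=1),  # Conv2: -> 32x8x4
            nn.MaxPool2d(kernel_size=(4, 2)),             # Pool2: -> 32x2x2
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 2 * 2, 64),                    # Fc3
            nn.ReLU(inplace=True),
            nn.Linear(64, num_classes),                   # Fc4
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.features(x))

# Usage: a batch of 1x32x4 spectrogram patches -> 20-way class logits.
logits = SpectrogramCNN()(torch.randn(8, 1, 32, 4))
```

Note that counting only the convolution and fully-connected weights (no biases) gives exactly the 18,976 parameters reported in Table I.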

With the spectrogram $X_i$ as input, the DCNN model predicts a probability vector $p_i = \left[ p_{i1}, \ldots, p_{iC} \right]^{T}$ for the $C$-way classification problem, subject to $p_{ic} \geq 0$ and $\sum_{c=1}^{C} p_{ic} = 1$. The model parameters $\theta$ can then be learned by minimizing the cross-entropy loss as follows:

$$\theta^{*} = \arg\min_{\theta} -\sum_{i \in \Omega_{t}} \sum_{c=1}^{C} t_{ic} \log p_{ic}, \quad (7)$$

where $t_i$ is the one-hot vector representation of label $y_i$, i.e., $t_{ic} = 1$ if $c = y_i$ and $t_{ic} = 0$ otherwise.

It should be noted that the width of the spectrogram is related to the length of the ECG wave signal given the window function. Given the sampling rate, longer signals contain more beats. Usually, a single beat is detected and classified [9]. Since more beats carry more information, using them leads to a more accurate result. In this paper, the length of each sample in the synthetic ECG dataset is 16384 points at a sampling rate of 512 Hz, i.e., 32 s. We split each sample into sub-samples with the same length of 512 points; each sub-sample therefore lasts 1 s and typically contains a single beat. It is noteworthy that we do not explicitly extract beats from the raw signal, but use the sub-sample directly as input after the Short-Time Fourier Transform. Then, we train our CNN model on this dataset. Besides, to compare the performance of models on longer samples, we also split each sample into sub-samples of other lengths, e.g., 2 s, 4 s, 8 s and 16 s, train CNN models on them correspondingly, and refer to these length-specific models by their level (level 1 for 1 s inputs up to level 6 for the full 32 s sequence), as sketched below. Though training samples of different lengths correspond to spectrograms of different widths, we use the same architecture for all the above models and only change the pooling strides along the columns correspondingly, while keeping the fully-connected layers fixed.
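A minimal sketch of this splitting step follows; the segmentation into non-overlapping segments of $512 \cdot 2^{s-1}$ points per level $s$ is our reading of the text.

```python
# Sketch (our assumption of the splitting step): cut each 16384-point record
# into non-overlapping sub-samples of 512 * 2^(s-1) points for level s = 1..6.
import numpy as np

def split_by_level(signal: np.ndarray, level: int, fs: int = 512) -> np.ndarray:
    """Return the level-s segments of `signal` as rows (1 s, 2 s, ..., 32 s)."""
    seg_len = fs * 2 ** (level - 1)          # 512, 1024, ..., 16384 points
    n = len(signal) // seg_len               # number of whole segments
    return signal[: n * seg_len].reshape(n, seg_len)

record = np.random.randn(16384)              # placeholder 32 s record
print([split_by_level(record, s).shape for s in range(1, 7)])
# [(32, 512), (16, 1024), (8, 2048), (4, 4096), (2, 8192), (1, 16384)]
```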

III-C Online decision fusion method

For online testing, as the length of the signal grows, we can test it at different times using the above models in a sequential manner. As illustrated in Fig. 1, lower-level models make decisions based on local signals of shorter length, while higher-level models make decisions based on global signals of longer length. These models can be seen as different experts which focus on different volumes of information. Their decisions may be complementary and can probably be fused into a more accurate one. To this end, we propose an online decision fusion method. Mathematically, it can be described as follows:

$$\hat{p} = \sum_{s=1}^{S} w_s \cdot \frac{1}{n_s} \sum_{j=1}^{n_s} p_s^{(j)}, \quad (8)$$

where $\hat{p}$ is the fusion result, $S$ is the maximum level for a signal of a specific length, $p_s^{(j)}$ is the decision of the level-$s$ model on the $j$th segment of the signal, and $n_s$ is the number of segments at level $s$. For example, when the total length of the signal at the present moment is 2048 points, $S$ will be 3, and $n_1$, $n_2$, $n_3$ will be 4, 2, 1, respectively. $w_s$ is the fusion weight for the level-$s$ model, subject to $\sum_{s=1}^{S} w_s = 1$. As can be seen from Eq. (8), decisions on different segments at the same level are treated equally and averaged. This is reasonable since there is no prior knowledge about the segments and each decision is made by the same model. For decisions from models at different scales, we assign a weight to each of them; in the experimental part, we compare its influence on the final fusion result.
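To make Eq. (8) concrete, here is a sketch of the fusion rule (our own, not the authors' code). It reuses the `split_by_level` helper from the earlier sketch and assumes `models` is a dict mapping each level to a callable returning softmax outputs of shape (n_s, C), and `weights` a dict of per-level weights summing to one.

```python
# Sketch of Eq. (8): average per-segment probabilities within each level,
# then take a weighted sum across levels. Assumes the signal length is a
# power-of-two multiple of one second (fs points), as in the paper's setup.
import numpy as np

def fuse(signal, models, weights, num_classes=20, fs=512):
    S = int(np.log2(len(signal) // fs)) + 1      # e.g. 2048 points -> S = 3
    fused = np.zeros(num_classes)
    for s in range(1, S + 1):
        segs = split_by_level(signal, s, fs)     # n_s segments at level s
        fused += weights[s] * models[s](segs).mean(axis=0)
    return fused                                 # fused probability vector
```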

IV Experiments

In this section, we present experimental results on both synthetic and real-world datasets and compare the proposed method with previous ones. First, we conduct extensive experiments on a synthetic dataset constructed using an ECG simulator to inspect the proposed method for fine-grained ECG classification. Then, we evaluate the performance of the proposed method on a real-world dataset for distinguishing Atrial Fibrillation signals from three other types. For clarity, we present them in the following two sub-sections, respectively.

First, we present the definitions of the measures used in this paper. The notation for each element of the confusion matrix is defined in Table II. The Accuracy, Sensitivity, Specificity and F1 score can then be calculated as follows:

$$\mathrm{Accuracy} = \frac{\sum_{i=1}^{K} N_{ii}}{\sum_{i=1}^{K} \sum_{j=1}^{K} N_{ij}}, \quad (9)$$

where $N_{ij}$ is the number of samples which belong to the $i$th category and are predicted as the $j$th category, and $K$ is the number of categories.

$$\mathrm{Sensitivity}_i = \frac{N_{ii}}{\sum_{j=1}^{K} N_{ij}}, \quad (10)$$

where class $i$ represents one of the symptomatic classes, e.g., RAF, FAF, etc.

$$\mathrm{Specificity} = \frac{N_{kk}}{\sum_{j=1}^{K} N_{kj}}, \quad (11)$$

where class $k$ represents the normal class.

$$\mathrm{Precision}_i = \frac{N_{ii}}{\sum_{j=1}^{K} N_{ji}}, \quad (12)$$

$$\mathrm{F1}_i = \frac{2 \cdot \mathrm{Precision}_i \cdot \mathrm{Sensitivity}_i}{\mathrm{Precision}_i + \mathrm{Sensitivity}_i}. \quad (13)$$
                     Predicted Labels
                     Class 1   ...   Class j   ...   Class K
Ground Truth Labels
Class 1              N_11      ...   N_1j      ...   N_1K
...                  ...       ...   ...       ...   ...
Class K              N_K1      ...   N_Kj      ...   N_KK
TABLE II: Definition of the confusion matrix. N_ij is the number of samples which belong to the ith category and are predicted as the jth category. K is the number of categories.
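The following sketch computes these measures from the K x K confusion matrix of Table II; reading Eqs. (12)-(13) as precision and the standard F1 combination is our interpretation of this copy.

```python
# Sketch: Accuracy, per-class Sensitivity, Specificity (recall of the normal
# class, per our reading of Eq. (11)), and F1 from the confusion matrix N,
# where N[i, j] counts class-i samples predicted as class j.
import numpy as np

def metrics(N: np.ndarray, normal: int):
    tp = np.diag(N).astype(float)
    accuracy = tp.sum() / N.sum()
    sensitivity = tp / np.maximum(N.sum(axis=1), 1)   # row-normalized diagonal
    specificity = sensitivity[normal]                 # normal-class recall
    precision = tp / np.maximum(N.sum(axis=0), 1)
    f1 = 2 * precision * sensitivity / np.maximum(precision + sensitivity, 1e-12)
    return accuracy, sensitivity, specificity, f1
```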
Sensitivity
Methods          RAF        FAF        AF         SA         AT         ST         PAC        VB         VTr        PVCCI
SVM+FFT          0.95±0.02  0.73±0.01  0.77±0.08  0.77±0.05  0.85±0.02  0.78±0.10  0.76±0.08  0.81±0.04  0.71±0.10  0.82±0.03
1D CNN [4]       0.96±0.01  0.92±0.04  0.84±0.06  0.39±0.03  0.90±0.01  0.96±0.02  0.20±0.09  0.92±0.03  0.19±0.11  0.87±0.02
Proposed         0.97±0.01  0.99±0.01  0.94±0.01  0.71±0.02  0.94±0.01  0.99±0.01  0.59±0.20  0.93±0.01  0.18±0.19  0.89±0.03
SVM+CNN Feature  0.98±0.01  0.99±0.01  0.97±0.01  0.75±0.01  0.97±0.01  0.99±0.01  0.82±0.04  0.96±0.01  0.34±0.12  0.95±0.01
Sensitivity (Specificity for N)
Methods          VTa        RVF        FVF        AVB-I      AVB-II     AVB-III    RBBB       LBBB       PVC        N
SVM+FFT          0.84±0.02  0.86±0.03  0.78±0.06  0.82±0.02  0.84±0.05  0.76±0.07  0.72±0.09  0.81±0.05  0.76±0.10  0.86±0.02
1D CNN [4]       0.97±0.01  0.97±0.02  0.96±0.02  0.70±0.17  0.94±0.03  0.68±0.05  0.91±0.04  0.98±0.01  0.68±0.05  0.95±0.01
Proposed         0.99±0.01  0.99±0.01  0.98±0.01  0.77±0.15  0.92±0.03  0.86±0.04  0.95±0.01  0.98±0.01  0.72±0.05  0.96±0.01
SVM+CNN Feature  1.00±0.00  1.00±0.00  0.99±0.01  0.95±0.01  0.98±0.01  0.93±0.03  0.97±0.01  0.98±0.01  0.75±0.04  0.98±0.01
TABLE III: Sensitivity and specificity of different methods on the training set. Mean scores and standard deviations (mean±std) are reported.
Sensitivity
Methods           RAF        FAF        AF         SA         AT         ST         PAC        VB         VTr        PVCCI
SVM+FFT           0.88±0.11  0.44±0.14  0.50±0.21  0.69±0.13  0.53±0.27  0.56±0.21  0.57±0.13  0.42±0.29  0.38±0.14  0.54±0.24
1D CNN [4]        0.96±0.01  0.93±0.04  0.83±0.07  0.38±0.04  0.90±0.02  0.95±0.03  0.20±0.05  0.93±0.02  0.15±0.15  0.85±0.03
Proposed          0.98±0.01  0.99±0.01  0.95±0.03  0.72±0.02  0.94±0.02  0.96±0.03  0.62±0.22  0.94±0.01  0.32±0.23  0.91±0.02
SVM+CNN Feature   0.97±0.01  0.98±0.01  0.96±0.01  0.72±0.02  0.95±0.01  0.96±0.04  0.85±0.03  0.96±0.01  0.02±0.02  0.95±0.03
Proposed (Fusion) 0.99±0.01  1.00±0.00  0.99±0.01  1.00±0.00  1.00±0.00  1.00±0.00  1.00±0.00  1.00±0.00  0.99±0.01  0.99±0.01
Sensitivity (Specificity for N)
Methods           VTa        RVF        FVF        AVB-I      AVB-II     AVB-III    RBBB       LBBB       PVC        N
SVM+FFT           0.73±0.09  0.60±0.18  0.45±0.32  0.50±0.30  0.66±0.14  0.56±0.21  0.46±0.07  0.48±0.33  0.40±0.24  0.60±0.18
1D CNN [4]        0.96±0.01  0.96±0.02  0.94±0.04  0.50±0.35  0.93±0.08  0.64±0.07  0.89±0.10  0.98±0.01  0.69±0.05  0.95±0.02
Proposed          0.99±0.01  0.98±0.02  0.99±0.01  0.56±0.30  0.93±0.06  0.87±0.05  0.95±0.02  0.98±0.01  0.57±0.25  0.96±0.02
SVM+CNN Feature   0.98±0.01  0.97±0.02  0.98±0.02  0.48±0.15  0.95±0.05  0.89±0.05  0.95±0.02  0.98±0.01  0.57±0.22  0.97±0.02
Proposed (Fusion) 1.00±0.00  1.00±0.00  0.99±0.01  0.86±0.20  1.00±0.00  0.95±0.05  0.98±0.02  1.00±0.00  0.99±0.01  0.99±0.01
TABLE IV: Sensitivity and specificity of different methods on the test set. Mean scores and standard deviations (mean±std) are reported.

IV-A Experiments on a synthetic ECG dataset

IV-A1 Dataset and parameter settings

To verify the effectiveness of the proposed method, we construct a synthetic dataset using an ECG simulator. The simulator can generate different types of ECG signals under different parameter settings. In this paper, we choose 20 categories of ECG signals: Normal (N), Rough Atrial Fibrillation (RAF), Fine Atrial Fibrillation (FAF), Atrial Flutter (AF), Sinus Arrhythmia (SA), Atrial Tachycardia (AT), Supraventricular Tachycardia (ST), Premature Atrial Contraction (PAC), Ventricular Bigeminy (VB), Ventricular Trigeminy (VTr), Premature Ventricular Contraction Coupling Interval (PVCCI), Ventricular Tachycardia (VTa), Rough Ventricular Fibrillation (RVF), Fine Ventricular Fibrillation (FVF), Atrio-Ventricular Block I (AVB-I), Atrio-Ventricular Block II (AVB-II), Atrio-Ventricular Block III (AVB-III), Right Bundle Branch Block (RBBB), Left Bundle Branch Block (LBBB), and Premature Ventricular Contractions (PVC). There are 2,426 samples in total, about 120 samples per category. Each sample has a maximum length of 16384 points at a sampling frequency of 512 Hz. In the following experiments, we use 3-fold cross-validation to evaluate the proposed method.

Parameters are set as follows. We use a Hamming window of length 256 in the Short-Time Fourier Transform, with an overlap of 128 points. The CNN model is trained for a total of 20,000 iterations with a batch size of 128. The learning rate decreases by half every 5,000 iterations, from 0.01 to 6.25x10^-4. The momentum and the weight decay parameter are set to 0.9 and 5x10^-4, respectively. We implemented the proposed method in CAFFE [46]. All experiments are conducted on a workstation with Nvidia GTX Titan X GPUs unless otherwise specified.
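The paper's implementation is in Caffe; as a hedged PyTorch equivalent of one training iteration under the reported schedule, reusing the hypothetical `SpectrogramCNN` from the earlier sketch:

```python
# Sketch: SGD with momentum 0.9, weight decay 5e-4, and a learning rate
# halved every 5,000 iterations (0.01 -> 6.25e-4 over 20,000 iterations).
import torch

model = SpectrogramCNN()                          # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=0.01,
                            momentum=0.9, weight_decay=5e-4)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=5000, gamma=0.5)
loss_fn = torch.nn.CrossEntropyLoss()

# One iteration on a dummy batch of 128 spectrograms (batch size per the paper):
x, y = torch.randn(128, 1, 32, 4), torch.randint(0, 20, (128,))
optimizer.zero_grad()
loss_fn(model(x), y).backward()
optimizer.step()
scheduler.step()   # stepped once per iteration, so lr halves every 5,000 iters
```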

IV-A2 Comparisons with previous methods

We compare the proposed method with previous methods, including an SVM based on the Fourier transform, the pilot deep CNN method of [4] which uses 1-D convolutions, and an SVM based on the features learned by the proposed method. We report the sensitivity and specificity scores of the different methods on the training and test sets for all categories, as well as the average classification accuracies. The standard deviations of each index over the 3-fold cross-validation are also reported. Results are summarized in Table III, Table IV and Table V.

Methods            Training Set  Test Set
SVM+FFT            0.81 (0.04)   0.56 (0.12)
1D CNN [4]         0.83 (0.01)   0.81 (0.04)
Proposed           0.88 (0.03)   0.87 (0.03)
SVM+CNN Feature    0.93 (0.01)   0.87 (0.02)
Proposed (Fusion)  -             0.99 (0.01)
TABLE V: Average classification accuracies of different methods on the training set and test set. Standard deviations (Std.) are listed in brackets.

It can be seen that the method of [4] outperforms traditional methods, e.g., FFT-coefficient features with an SVM classifier. However, it is inferior to the proposed method, which uses the 2-dimensional spectrogram as input. By learning time-frequency features from the spectrogram, the proposed method achieves better classification accuracy. In addition, we use the features learned by the proposed method to train an SVM classifier; the corresponding results are denoted as "SVM+CNN Feature". Compared with the SVM on FFT features, the performance of this classifier is significantly boosted, which confirms that the proposed method learns a more discriminative feature representation of the ECG signal. Interestingly, it is marginally better than the proposed CNN model, which in effect uses a linear classifier in its last layer. This is reasonable since a more sophisticated nonlinear radial-basis kernel is used in the SVM classifier. However, it also shows a tendency toward overfitting, since a larger gain is achieved on the training set.

Moreover, from Table III and Table IV, we can see that the categories SA, PAC, VTr, AVB-I and PVC are hard to distinguish. We shed light on the reason by analyzing the learned features with a visualization technique [47, 48, 49] as well as the confusion matrix between categories.

IV-A3 Analysis on learned features and the confusion matrix between categories

To show the effectiveness of the proposed method in learning feature representations, we plot the learned features for visual inspection. Specifically, we obtain the learned features of all training data by computing the responses of the penultimate layer. Then, we employ the t-Distributed Stochastic Neighbor Embedding (t-SNE) method [47, 48, 49] to visualize them. The visualization results for all categories are shown in the top-left sub-figure of Fig. 3. As can be seen, some categories, such as Normal (N), RVF, FVF, ST, RBBB, LBBB, PVCCI, VTa and RAF, are well clustered and separated from the other categories. However, some categories, such as SA, PAC, PVC, VTr, AVB-I and AVB-III, are mixed with other categories, as indicated by the red circles. To show this more clearly, we select the categories of concern and plot only their feature visualization results in the subsequent sub-figures of Fig. 3. For example, SA tends to be mixed with AT and PVC, and PVC tends to be mixed with PAC and VB. Nevertheless, they are well separated from the Normal category, which implies that the proposed method can predict the Normal category correctly. This is consistent with the high specificity scores in Table III and Table IV.
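A minimal sketch of this inspection step (the 64-dimensional feature size follows the Fc3 layer of Table I; the feature array here is a placeholder):

```python
# Sketch: embedding penultimate-layer (Fc3) responses with t-SNE [47]
# for the visual inspection in Fig. 3.
import numpy as np
from sklearn.manifold import TSNE

features = np.random.randn(2426, 64)   # placeholder for the real Fc3 responses
embedding = TSNE(n_components=2, perplexity=30).fit_transform(features)
# `embedding` is (n_samples, 2); scatter-plot it colored by category label.
```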

Fig. 3: Visualization of the learned features of the proposed network by using t-SNE [47]. (a): Feature clusters of all categories on the test set. (b)-(e): Feature clusters of the most confused categories (denoted by the red circles). Best viewed on screen.
Fig. 4: (a) Confusion matrix of the proposed method at level 1 on the test set. (b) Confusion matrix of the proposed online decision fusion method on the test set. From left to right and from top to bottom, the categories are RAF, FAF, AF, SA, AT, ST, PAC, VB, VTr, PVCCI, VTa, RVF, FVF, AVB-I, AVB-II, AVB-III, RBBB, LBBB, PVC and N, respectively. Each row gives the number of samples of a specific category assigned to each category.
Fig. 5: Results of the proposed online decision fusion method. (a) Means of the classification accuracies. (b) Standard deviations (Std) of the classification accuracies.

In addition, we compute the confusion matrix between categories; the results are shown in Fig. 4(a). Each row gives the number of samples of a given category assigned to each category, which makes it clear which categories tend to be confused with others, e.g., SA, VTr, AVB-I and PVC. The results are consistent with the visual inspection in Fig. 3. Both the visual inspection and the confusion matrix are convenient tools for understanding the confusion between fine-grained categories, which is very helpful for real applications.

IV-A4 Online decision fusion performance

We test the fusion method of Sect. III-C at different maximum levels $S$: 2, 3, 4, 5, and 6. Two kinds of fusion weights are used: a uniform one and one preferring high-level models. The fusion weight in the latter case is calculated according to the following equation:

(14)
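Eq. (14) is not legible in this copy of the paper; as a loudly-flagged assumption, one weighting that "prefers high level models" is exponential in the level, sketched below alongside the uniform alternative.

```python
# Sketch under an assumption: Eq. (14)'s exact form is unknown here, so the
# non-uniform branch uses an illustrative exponential preference for higher
# levels; the uniform branch matches the paper's uniform-weight setting.
def make_weights(S: int, uniform: bool = True) -> dict:
    raw = [1.0] * S if uniform else [2.0 ** s for s in range(1, S + 1)]
    total = sum(raw)
    return {s: raw[s - 1] / total for s in range(1, S + 1)}  # weights sum to 1

print(make_weights(3))                 # {1: 0.333.., 2: 0.333.., 3: 0.333..}
print(make_weights(3, uniform=False))  # {1: 0.142.., 2: 0.285.., 3: 0.571..}
```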

Fig. 5 shows the means and standard deviations of the classification accuracies of the proposed fusion method and of the single-scale models. First, among all the single models at different levels, an intermediate-level model achieves the best performance. The reason may be that it strikes a trade-off between data length and the number of model decisions: compared with the level-1 model, its input covers a much longer span of data, while compared with the level-6 model, which makes only a single decision on the whole sequence, it can make four decisions on four different sub-sequences, and these decisions can be fused into a more accurate one.

Second, the fusion results are consistently better than those of the single models, and the performance improves consistently as the data length grows. This supports the idea that fusing decisions from different models can lead to a more robust and accurate decision, since these models are trained on samples covering different scopes of the original data. Besides, using nonuniform weights does not provide any advantage over uniform ones. We hypothesize that the nonuniform weighting strategy favors the model with the largest amount of data and neglects the decisions of the models with smaller amounts of data; though it is better than the single model, the gains are very marginal, and at higher levels the performance is largely dominated by the highest-level model.

In conclusion, the proposed online decision fusion method with uniform weights at level 6 achieves the best result, i.e., the highest average classification accuracy and the lowest standard deviation. For example, the accuracy is boosted from 87% (single model at level 1) to 99%, and the standard deviation is reduced from 0.03 (single model at level 1) to 0.011. Its sensitivity and specificity indexes, shown in Table IV, exhibit a significant boost over the other methods, and the confusion matrix in Fig. 4(b) shows similar results. These experimental results clearly demonstrate the effectiveness of the proposed online decision fusion method.

Level  Titan X GPU  Titan X CPU  TX2 GPU  TX2 CPU  TX2 GPU (batch x10)  TX2 CPU (batch x10)
1      0.01         0.17         0.17     0.46     0.12                 0.44
2      0.03         0.21         0.27     0.59     0.13                 0.55
3      0.05         0.26         0.37     0.80     0.14                 0.72
4      0.10         0.42         1.22     1.57     0.18                 1.39
5      0.21         0.82         2.01     2.91     0.27                 2.79
6      0.33         1.33         2.73     5.38     0.56                 5.66
TABLE VI: Running times (ms) of the proposed method under different settings.

IV-A5 Computational complexity and running time analysis

As listed in Table I, the total computational cost of the proposed architecture is only about 3.5x10^5 FLOPs. We record the running times of the proposed method in GPU and CPU modes; the results are shown in Table VI. The running time is only 0.33 ms even when testing on the whole sequence (level 6). To further examine the computational efficiency of the proposed method, we test it on an NVIDIA Jetson TX2 embedded board in both GPU and CPU modes. Again, the proposed method achieves real-time speed. Interestingly, the running times in GPU mode and CPU mode are comparable; we hypothesize that enlarging the batch size would take full advantage of GPU acceleration, and indeed, after enlarging the batch size 10 times, the superiority of the GPU mode is significant. Overall, these results imply that the proposed method is very efficient and promising for integration into portable ECG monitoring instruments with limited computational resources.

IV-B Experiments on a real-world ECG dataset

In addition, we conduct extensive experiments on a real-world ECG dataset, the benchmark of the PhysioNet/Computing in Cardiology Challenge 2017 [50]. The dataset consists of three parts: a training set, a validation set and a test set. The training set contains 8,528 single-lead ECG recordings lasting from 9 s to just over 60 s; the validation set and test set contain 300 and 3,658 ECG recordings of similar lengths, respectively. The recordings were sampled at 300 Hz. Each sample is labelled with one of four categories: Normal rhythm, AF rhythm, Other rhythm and Noisy recording. Some examples of the ECG waveforms in the PhysioNet dataset are shown in Fig. 6. Only the labels of the training and validation sets are publicly available; the labels of the test set are kept private, and the corresponding results had to be submitted to the test server during the challenge. More details can be found in [50, 51].

We train our model on the training set and report scores on both the training set and the validation set, comparing them with the top entries of the challenge. It is noteworthy that we add two more convolutional layers, one after each of the first and second convolutional layers of the network in Table I. This yields a network architecture with stronger modeling capacity, which can better handle real-world ECG signals. The numbers of convolutional filters and the kernel sizes of the added layers are the same as their preceding counterparts. The first fully-connected layer is kept the same, and the output size of the last fully-connected layer is changed to four to match the number of categories. Each sample in the dataset is cropped or duplicated to a length of 16,384 points. All other hyper-parameters are kept the same as in the experiments of the preceding sub-sections unless otherwise specified. We train the model at each level three times with random seeds and report the average scores and the standard deviations.
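A minimal sketch of the crop-or-duplicate preprocessing (the tiling detail for short recordings is our assumption; the paper only states "cropped or duplicated"):

```python
# Sketch: bring every recording to a fixed length of 16,384 points by
# cropping long signals and tiling (duplicating) short ones.
import numpy as np

def fix_length(x: np.ndarray, target: int = 16384) -> np.ndarray:
    if len(x) >= target:
        return x[:target]                     # crop
    reps = int(np.ceil(target / len(x)))      # duplicate (tile), then crop
    return np.tile(x, reps)[:target]
```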

Fig. 6: Examples of the ECG waveforms in the PhysioNet dataset [50].

IV-B1 Performance of the proposed online fusion model and comparisons with the top entries in the challenge

Fig. 7: Results of the proposed online decision fusion method on the PhysioNet dataset [50]. (a) Means of the classification accuracies of the three trained models. (b) Standard deviations (Std) of the classification accuracies of the three trained models.
Fig. 8: Results of the proposed online decision fusion method on the PhysioNet dataset [50]. (a) Means of the F1 scores of the three trained models. (b) Standard deviations (Std) of the F1 scores of the three trained models.
Fig. 9: Visualization of the learned features of the proposed network on the PhysioNet dataset [50] by using t-SNE [47].

We report the experimental results on the aforementioned dataset in terms of mean accuracies and F1 scores. To keep consistent with the evaluation protocol in [50, 51], we report the average F1 score over the first three categories, and we additionally include the average F1 score over all categories. The results are plotted in Fig. 7 and Fig. 8. It can be seen that the best classification accuracies and F1 scores are achieved at levels 4 and 5 by the proposed online fusion method. Meanwhile, the results of the single models are also competitive, with the best achieved at level 2. This is consistent with the experimental results in Sect. IV-A4, where neither the lowest-level nor the highest-level model achieved the best results: an intermediate-level model strikes a trade-off between data length and the number of model decisions. The comparison between the proposed method and the top entries of the challenge is shown in Table VII; the proposed methods achieve comparable or better results than the top entries on both the training and validation sets.

Rank  Entry                      Accuracy               F1 score               F1 score (all categories)
                                 Validation  Train      Validation  Train      Validation  Train
1     Teijeiro et al. [42]       -           -          0.912       0.893      -           -
1     Datta et al.               -           -          0.990       0.970      -           -
1     Zabihi et al. [41]         -           -          0.968       0.951      -           -
1     Hong et al. [40]           -           -          0.990       0.970      -           -
5     Baydoun et al.             -           -          0.859       0.965      -           -
5     Bin et al.                 -           -          0.870       0.875      -           -
5     Zihlmann et al.            -           -          0.913       0.889      -           -
5     Xiong et al.               -           -          0.905       0.877      -           -
-     Proposed (level 4)         0.992±0.002 0.998±0.001 0.989±0.002 0.996±0.002 0.992±0.002 0.995±0.003
-     Proposed (fusion, level 4) 1.0±0.0     0.999±0.001 1.0±0.0     0.994±0.006 1.0±0.0     0.991±0.009
TABLE VII: Accuracies and F1 scores on the PhysioNet dataset for the proposed methods and the top entries of the challenge. "F1 score" denotes the average F1 score over the first three categories, i.e., Normal rhythm, AF rhythm and Other rhythm. "F1 score (all categories)" denotes the average F1 score over all categories.

IV-B2 Analysis on learned features

As in Sect. IV-A3, we also visualize the learned features. We obtain the learned features of all samples in the validation set by computing the responses of the penultimate layer, and then employ the t-Distributed Stochastic Neighbor Embedding (t-SNE) method [47, 48, 49] to inspect them visually. The visualization results for all categories are shown in Fig. 9. As can be seen, the samples of each category are mostly clustered together and separated from the other clusters. Several samples of the "Other rhythm" category lie near the "Normal" and "Noisy" clusters, implying that those samples are either mislabelled, and should be carefully checked, or easily confused with other categories, and should be carefully handled.

Fig. 10: Spectrograms and corresponding response maps. (a)-(d) show the results of two example samples each for the categories AF rhythm, Normal rhythm, Other rhythm and Noisy rhythm, respectively. In each sub-figure, the spectrograms and the corresponding response maps of successive convolutional and pooling layers are shown from the top row to the bottom row. Hot colors represent strong responses.

In addition, we plot the spectrograms and their corresponding response maps at successive convolutional and pooling layers for each category in Fig. 10. As can be seen, the first convolutional layer acts as a basic feature extractor which strengthens the useful parts of the spectrogram. Then, features corresponding to low and medium frequencies are pooled and contribute more to the final classification. From the response maps of the deeper layers, we can see that the proposed network generates strong responses in specific frequency zones and accumulates useful features along the temporal axis. In this way, together with the online fusion process, the proposed network learns effective and discriminative features and makes accurate classifications.

V Conclusion and future work

In this paper, we introduce a novel deep CNN based method for fine-grained ECG signal classification. It learns discriminative feature representations from the time-frequency domain obtained by computing the Short-Time Fourier Transform of the original wave signal. In addition, the proposed online decision fusion method fuses past and current decisions from different models into a more accurate one. Experimental results on a synthetic 20-category ECG dataset and a real-world AF classification dataset demonstrate the effectiveness of the proposed method. Moreover, the proposed method is computationally efficient and promising for integration into portable ECG monitoring instruments with limited computational resources.

Future research may include the following two directions: 1) constructing more compact and efficient network architectures to handle complex real-world ECG data; 2) improving the online decision method in a recursive manner that uses not only the past decisions but also the past learned features.

VI Acknowledgement

This work is supported by the Natural Science Foundation of China (61751304), NSFC-Zhejiang Joint Fund for the Integration of Industrialization and Informatization (U1509203, U1709215) and Zhejiang Natural Science Foundation of China (LY17F010020).

References

  • [1] I. D. Castro, C. Varon, T. Torfs, S. Van Huffel, R. Puers, and C. Van Hoof, “Evaluation of a multichannel non-contact ecg system and signal quality algorithms for sleep apnea detection and monitoring,” Sensors, vol. 18, no. 2, p. 577, 2018.
  • [2] M. Nappi, V. Piuri, T. Tan, and D. Zhang, “Introduction to the special section on biometric systems and applications,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 11, pp. 1457–1460, 2014.
  • [3] K. A. Sidek, I. Khalil, and H. F. Jelinek, “Ecg biometric with abnormal cardiac conditions in remote monitoring system,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 44, no. 11, pp. 1498–1509, 2014.
  • [4] S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ecg classification by 1-d convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2016.
  • [5] B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolution neural networks and learning ecg features for screening paroxysmal atrial fibrillatio patients,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, 2017.
  • [6] A. Szczepański and K. Saeed, “A mobile device system for early warning of ecg anomalies,” Sensors, vol. 14, no. 6, pp. 11 031–11 044, 2014.
  • [7] Z. Zhao, L. Yang, D. Chen, and Y. Luo, “A human ecg identification system based on ensemble empirical mode decomposition,” Sensors, vol. 13, no. 5, pp. 6832–6864, 2013.
  • [8] L.-Y. Shyu, Y.-H. Wu, and W. Hu, “Using wavelet transform and fuzzy neural network for vpc detection from the holter ecg,” IEEE Transactions on Biomedical Engineering, vol. 51, no. 7, pp. 1269–1273, 2004.
  • [9] I. Guler and E. D. Ubeyli, “Ecg beat classifier designed by combined neural network model,” Pattern recognition, vol. 38, no. 2, pp. 199–208, 2005.
  • [10] S. Mitra, M. Mitra, and B. B. Chaudhuri, “A rough-set-based inference engine for ecg classification,” IEEE Transactions on instrumentation and measurement, vol. 55, no. 6, pp. 2198–2206, 2006.
  • [11] T. Mar, S. Zaunseder, J. P. Martínez, M. Llamedo, and R. Poll, “Optimization of ecg classification by means of feature selection,” IEEE Transactions on Biomedical Engineering, vol. 58, no. 8, pp. 2168–2177, 2011.
  • [12] W. Li, J. Li, and Q. Qin, “Set-based discriminative measure for electrocardiogram beat classification,” Sensors, vol. 17, no. 2, p. 234, 2017.
  • [13] Y. Wu, R. M. Rangayyan, Y. Zhou, and S.-C. Ng, “Filtering electrocardiographic signals using an unbiased and normalized adaptive noise reduction system,” Medical Engineering & Physics, vol. 31, no. 1, pp. 17–26, 2009.
  • [14] J. Yan, Y. Lu, J. Liu, X. Wu, and Y. Xu, “Self-adaptive model-based ecg denoising using features extracted by mean shift algorithm,” Biomedical Signal Processing and Control, vol. 5, no. 2, pp. 103–113, 2010.
  • [15] M. Blanco-Velasco, B. Weng, and K. E. Barner, “Ecg signal denoising and baseline wander correction based on the empirical mode decomposition,” Computers in biology and medicine, vol. 38, no. 1, pp. 1–13, 2008.
  • [16] V. Bhateja, S. Urooj, R. Mehrotra, R. Verma, A. Lay-Ekuakille, and V. D. Verma, “A composite wavelets and morphology approach for ecg noise filtering,” in International Conference on Pattern Recognition and Machine Intelligence.   Springer, 2013, pp. 361–366.
  • [17] W. Jenkal, R. Latif, A. Toumanari, A. Dliou, O. El Bcharri, and F. M. Maoulainine, “An efficient algorithm of ecg signal denoising using the adaptive dual threshold filter and the discrete wavelet transform,” Biocybernetics and Biomedical Engineering, vol. 36, no. 3, pp. 499–508, 2016.
  • [18] S. Poungponsri and X.-H. Yu, “An adaptive filtering approach for electrocardiogram (ecg) signal noise reduction using neural networks,” Neurocomputing, vol. 117, pp. 206–213, 2013.
  • [19] Y. Xu, M. Luo, T. Li, and G. Song, “Ecg signal de-noising and baseline wander correction based on ceemdan and wavelet threshold,” Sensors, vol. 17, no. 12, p. 2754, 2017.
  • [20] V. X. Afonso, W. J. Tompkins, T. Q. Nguyen, and S. Luo, “Ecg beat detection using filter banks,” IEEE transactions on biomedical engineering, vol. 46, no. 2, pp. 192–202, 1999.
  • [21] N. Zeng, Z. Wang, and H. Zhang, “Inferring nonlinear lateral flow immunoassay state-space models via an unscented kalman filter,” Science China Information Sciences, vol. 59, no. 11, p. 112204, 2016.
  • [22] F. Castells, P. Laguna, L. Sörnmo, A. Bollmann, and J. M. Roig, “Principal component analysis in ecg signal processing,” EURASIP Journal on Applied Signal Processing, vol. 2007, no. 1, pp. 98–98, 2007.
  • [23] V. Monasterio, P. Laguna, and J. P. Martinez, “Multilead analysis of t-wave alternans in the ecg using principal component analysis,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 7, pp. 1880–1890, 2009.
  • [24] R. J. Martis, U. R. Acharya, K. Mandana, A. K. Ray, and C. Chakraborty, “Application of principal component analysis to ecg signals for automated diagnosis of cardiac health,” Expert Systems with Applications, vol. 39, no. 14, pp. 11 792–11 800, 2012.
  • [25] Y. Ozbay, R. Ceylan, and B. Karlik, “A fuzzy clustering neural network architecture for classification of ecg arrhythmias,” Computers in Biology and Medicine, vol. 36, no. 4, pp. 376–388, 2006.
  • [26] M. Kallas, C. Francis, L. Kanaan, D. Merheb, P. Honeine, and H. Amoud, “Multi-class svm classification combined with kernel pca feature extraction of ecg signals,” in Telecommunications (ICT), 2012 19th International Conference on.   IEEE, 2012, pp. 1–5.
  • [27] T. Ince, S. Kiranyaz, and M. Gabbouj, “A generic and robust system for automated patient-specific classification of ecg signals,” IEEE Transactions on Biomedical Engineering, vol. 56, no. 5, pp. 1415–1426, 2009.
  • [28] E. Jayachandran et al., “Analysis of myocardial infarction using discrete wavelet transform,” Journal of medical systems, vol. 34, no. 6, pp. 985–992, 2010.
  • [29] A. Daamouche, L. Hamami, N. Alajlan, and F. Melgani, “A wavelet optimization approach for ecg signal classification,” Biomedical Signal Processing and Control, vol. 7, no. 4, pp. 342–349, 2012.
  • [30] J. Ródenas, M. García, R. Alcaraz, and J. J. Rieta, “Wavelet entropy automatically detects episodes of atrial fibrillation from single-lead electrocardiograms,” Entropy, vol. 17, no. 9, pp. 6179–6199, 2015.
  • [31] M. García, J. Ródenas, R. Alcaraz, and J. J. Rieta, “Application of the relative wavelet energy to heart rate independent detection of atrial fibrillation,” Computer methods and programs in biomedicine, vol. 131, pp. 157–168, 2016.
  • [32] M. Javadi, R. Ebrahimpour, A. Sajedin, S. Faridi, and S. Zakernejad, “Improving ecg classification accuracy using an ensemble of neural network modules,” PLoS one, vol. 6, no. 10, p. e24386, 2011.
  • [33] W. Liang, Y. Zhang, J. Tan, and Y. Li, “A novel approach to ecg classification based upon two-layered hmms in body sensor networks,” Sensors, vol. 14, no. 4, pp. 5994–6011, 2014.
  • [34] S. Osowski, L. T. Hoai, and T. Markiewicz, “Support vector machine-based expert system for reliable heartbeat recognition,” IEEE transactions on biomedical engineering, vol. 51, no. 4, pp. 582–589, 2004.
  • [35] M. Barni, P. Failla, R. Lazzeretti, A.-R. Sadeghi, and T. Schneider, “Privacy-preserving ecg classification with branching programs and neural networks,” IEEE Transactions on Information Forensics and Security, vol. 6, no. 2, pp. 452–468, 2011.
  • [36] J.-S. Wang, W.-C. Chiang, Y.-L. Hsu, and Y.-T. C. Yang, “Ecg arrhythmia classification using a probabilistic neural network with a feature reduction method,” Neurocomputing, vol. 116, pp. 38–45, 2013.
  • [37] A. R. Hassan and M. A. Haque, “An expert system for automated identification of obstructive sleep apnea from single-lead ecg using random under sampling boosting,” Neurocomputing, vol. 235, pp. 122–130, 2017.
  • [38] J. Oster and G. D. Clifford, “Impact of the presence of noise on rr interval-based atrial fibrillation detection,” Journal of electrocardiology, vol. 48, no. 6, pp. 947–951, 2015.
  • [39] Q. Li, C. Liu, J. Oster, and G. D. Clifford, “Signal processing and feature selection preprocessing for classification in noisy healthcare data,” Machine Learning for Healthcare Technologies, vol. 2, p. 33, 2016.
  • [40] S. Hong, M. Wu, Y. Zhou, Q. Wang, J. Shang, H. Li, and J. Xie, “Encase: An ensemble classifier for ecg classification using expert features and deep neural networks,” in Computing in Cardiology (CinC), 2017.   IEEE, 2017, pp. 1–4.
  • [41] M. Zabihi, A. B. Rad, A. K. Katsaggelos, S. Kiranyaz, S. Narkilahti, and M. Gabbouj, “Detection of atrial fibrillation in ecg hand-held devices using a random forest classifier,” 2017.
  • [42] T. Teijeiro, C. A. García, D. Castro, and P. Félix, “Arrhythmia classification from the abductive interpretation of short single-lead ecg records,” arXiv preprint arXiv:1711.03892, 2017.
  • [43] N. Emanet, “Ecg beat classification by using discrete wavelet transform and random forest algorithm,” in Soft Computing, Computing with Words and Perceptions in System Analysis, Decision and Control, 2009. ICSCCW 2009. Fifth International Conference on.   IEEE, 2009, pp. 1–4.
  • [44] L. Deng, M. L. Seltzer, D. Yu, A. Acero, A.-r. Mohamed, and G. Hinton, “Binary coding of speech spectrograms using a deep auto-encoder,” in Eleventh Annual Conference of the International Speech Communication Association, 2010.
  • [45] L. Deng, J. Li, J.-T. Huang, K. Yao, D. Yu, F. Seide, M. Seltzer, G. Zweig, X. He, J. Williams et al., “Recent advances in deep learning for speech research at microsoft,” in Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on.   IEEE, 2013, pp. 8604–8608.
  • [46] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell, “Caffe: Convolutional architecture for fast feature embedding,” in Proceedings of the 22nd ACM international conference on Multimedia.   ACM, 2014, pp. 675–678.
  • [47] L. v. d. Maaten and G. Hinton, “Visualizing data using t-sne,” Journal of machine learning research, vol. 9, no. Nov, pp. 2579–2605, 2008.
  • [48] L. Van Der Maaten, “Accelerating t-sne using tree-based algorithms.” Journal of machine learning research, vol. 15, no. 1, pp. 3221–3245, 2014.
  • [49] L. Van der Maaten and G. Hinton, “Visualizing non-metric similarities in multiple maps,” Machine learning, vol. 87, no. 1, pp. 33–55, 2012.
  • [50] “The physionet/computing in cardiology challenge 2017,” https://physionet.org/challenge/2017/.
  • [51] G. Clifford, C. Liu, B. Moody, L. Lehman, I. Silva, Q. Li, A. Johnson, and R. Mark, “Af classification from a short single lead ecg recording: The physionet computing in cardiology challenge 2017,” Computing in Cardiology, vol. 44, 2017.