Robust Active Learning for Electrocardiographic Signal Classification

11/21/2018
by   Xu Chen, et al.
SAS
0

The classification of electrocardiographic (ECG) signals is a challenging problem for healthcare industry. Traditional supervised learning methods require a large number of labeled data which is usually expensive and difficult to obtain for ECG signals. Active learning is well-suited for ECG signal classification as it aims at selecting the best set of labeled data in order to maximize the classification performance. Motivated by the fact that ECG data are usually heavily unbalanced among different classes and the class labels are noisy as they are manually labeled, this paper proposes a novel solution based on robust active learning for addressing these challenges. The key idea is to first apply the clustering of the data in a low dimensional embedded space and then select the most information instances within local clusters. By selecting the most informative instances relying on local average minimal distances, the algorithm tends to select the data for labelling in a more diversified way. Finally, the robustness of the model is further enhanced by adding a novel noisy label reduction scheme after the selection of the labeled data. Experiments on the ECG signal classification from the MIT-BIH arrhythmia database demonstrate the effectiveness of the proposed algorithm.

READ FULL TEXT VIEW PDF
POST COMMENT

Comments

There are no comments yet.

Authors

page 2

11/18/2019

The Effectiveness of Variational Autoencoders for Active Learning

The high cost of acquiring labels is one of the main challenges in deplo...
06/07/2017

Active Learning for Structured Prediction from Partially Labeled Data

We propose a general purpose active learning algorithm for structured pr...
04/12/2021

Active learning for medical code assignment

Machine Learning (ML) is widely used to automatically extract meaningful...
02/25/2022

Learning ECG Representations based on Manipulated Temporal-Spatial Reverse Detection

Learning representations from electrocardiogram (ECG) serves as a fundam...
08/15/2021

Deep Active Learning for Text Classification with Diverse Interpretations

Recently, Deep Neural Networks (DNNs) have made remarkable progress for ...
10/27/2020

Graph-based Reinforcement Learning for Active Learning in Real Time: An Application in Modeling River Networks

Effective training of advanced ML models requires large amounts of label...
11/21/2016

Unsupervised Learning for Lexicon-Based Classification

In lexicon-based classification, documents are assigned labels by compar...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.

1 Introduction

Many advanced machine learning algorithms require massive amount of labeled data to demonstrate the full potential of the techniques. As annotated data is usually expensive to obtain and requires the expertise of people who are experts to label them, the goal of active learning (AL)

ICDM2016AL NIPS2008AL multiclassus querybycommittee expectedmodelchange cvpr2013 icml2016 icml2017 is to facilitate the data collection process by automatically selecting the best data sample to label. Frequently, ECG signal classification often faces the problem of unbalanced classes because naturally normal examples which constitute the majority class in classification problems are massive amount ECG . For active learning, it is usually important to predict and query for the minority classes, namely, abormal heart beats in ECG signals. Another cause for class imbalance problem in ECG signal classification is the limitations due to cost, difficulty or privacy on collecting instances of abnormal heart beats. The work in CIKM07

proposed an efficient SVM based active learning selection strategy to address the active learning for unbalanced classes utilizing the information of support vectors for querying the data in classification boundaries for binary classification. However,

CIKM07 did not utilize the important information of the clustering structures of the unlabeled data for better selection of the data instances. More recently, the work in LAL2017 reports that the more unbalanced the classes are, the furtherfrom the optimum made by uncertainty sampling is. In addition to that, noisy labels obtained from the selection process and manual labelling pose another challenge for active learning as the noisy labels could propagate through the learning process if they are not well treated.

In this paper, we propose an effective solution to address these challenges in a unified framework called robust active label spreading (RALS) thanks to two features. First, by selecting the most informative instances based on an information theoretical measure and manifold embedding within the local clusters, the proposed RALS algorithm is capable of selecting the samples from different classes in a more diversified manner, which results in better prediction performance. Second, a novel noise reduction scheme is applied to further boost the prediction performance.

Figure 1: The detailed block diagram of the proposed RALS algorithm.

2 The Proposed Algorithm

Dimensionality Reduction, Clustering and Query Selection

The first stage of RALS algorithm relies on label spreading NIPS2004

. The label spreading algorithm is a well known graph-based semi-supervised learning algorithm. It calculates the similarity measure and propagates the labels by the measure for prediction. It also generates the label distribution matrix

which consists of the predicted probability for every class for each sample, where

refers to the number of classes and refers to the number of samples. In order to select the data from different classes, here t-Distributed Stochastic Neighbor Embedding(t-SNE) tsne

is applied to the label distribution matrix due to its good performance for high dimensional datasets. As shown in Fig.1, once the dimensionality reduction is conducted, K-means clustering is applied to the label distribution matrix with reduced dimensionality (

by ). The clustering and the query selection benefit from each other through the iterations, which eventually boosts the classification performance. Given the dimensionality reduction and the clustering results, the top

most informative points with their classified labels are selected based on the information theoretical measure as distance measure. Here we propose to utilize the local average minimum information theoretical distance within the clusters for selection of the most informative data instances in a diversified manner. Typically, pair-wise symmetric kullback leibler (SKL) divergences

KL1951 within the local cluster are calculated. The distances are then summed up for each sample and the top minimum distances are selected:

where

represents the discrete probability distribution for the

th sample and the th class in the classification function. represents the the total number of instances in the th cluster. The above optimization guarantees to select the the instances with the minimum distance within the local clusters. Compared to the selection of data instances with minimum distance globally, the proposed algorithm tends to diversify the top selections over different clusters representing different classes. Furthermore, we leverage the bucketing technique bucketcv

which has been successfully applied to computer vision problems to ensure that the samples from minority classes are selected. Specifically, the selected number of data instances Q needs to satisfy

. Namely, if more than samples within the same cluster are selected, the current cluster is skipped and the query goes to the next data in the ranking list, which results in better classification performance.

Noisy Label Reduction

Moreover, a novel noisy label reduction relying on an effective confidence score measure is proposed based on the criteria of best vs second best (BSVB) to enhance the active learning performance. Typically, for each selected data sample after ranking, the ratio of the largest estimated class probability to the second largest estimated class probability is calculated where this information can be retrieved from the label distribution matrix. Subsequently, the ratio is compared to the user set threshold. The selected data are added into the labeled set if the ratio is larger than the threshold as:

where represent the estimated label, and represent the class with the highest and the second highest label distribution. is the threshold which represents the confidence level of the users. Therefore, by adding the estimated labels passed from the noise reduction step into the labeled dataset, the noisy labels in the selection are significantly reduced. The new augmented labeled dataset after adding the selected data samples are applied to label spreading algorithm again to learn the next enhanced model. The full algorithm is illustrated as follows.

1:procedure Algorithm I: Robust Active Label Spreading(, )
2:     while  do is the total number of labeled data, is the total number of initial labeled data and is the number of selected labeled data in each iteration
3:         Run label spreading algorithm NIPS2004 , output label distribution matrix .
4:         Apply t-SNE on for dimensionality reduction. Conduct K-means clustering on where the number of classes is set to be the number of classes in the labeled dataset.
5:         Calculate the local minimum distance within each clusters relying on symmetric KL divergence
6:         Utilize the bucketing technique to ensure the selected samples from the minority classes.
7:         Apply BVSB criteria for nosiy label reduction. Add the selected samples and estimated labels to labeled dataset and repeat steps (3)-(7).
8:         return are the selected label indices.
9:     end while
10:end procedure
Figure 2: The comparision of the precision recall curves with the uncertainty sampling (US)USKDD2009 , USDM CIKM07 and the propose method (RALS) on ECG dataset for the minority classes including Class A (Fig.2(a)) and LB (Fig.2(b)).The comparision of the total accuracy (Fig.2(c)) and the average accuracy (Fig.2(d)) vs the labeled samples for all the six classes of ECG data.
Labeled SVMALCIKM07 USDM multiclassus Ours Labeled SVMALCIKM07 USDM multiclassus Ours
N (25) 63% 67% 71% RB (25) 66% 70% 73%
N (75) 79% 83% 85% RB (75) 71% 74% 78%
N (125) 82% 87% 89% RB (125) 74% 78% 82%
N (175) 85% 87% 90% RB(175) 78% 82% 87%
A (25) 68% 71% 73% LB (25) 68% 72% 76%
A (75) 72% 73% 76% LB (75) 72% 76% 78%
A (125) 76% 80% 82% LB (125) 76% 83% 86%
A (175) 82% 83% 85% LB (175) 78% 85% 89%
Table 1: Comparison of the accuracy among CIKM07 , USDM multiclassus and the proposed RALS algorithms for ECG dataset when using N, A, RB, LB as the positive class for binary classification respectively.

3 Experiments on Real ECG Data

The proposed algorithm is evaluated on the real ECG data from the MIT-BIH arrhythmia database MITdatabase . The considered beats include the following six classes: normal sinus rhythm (N), atrial prematuring beat (A), ventricular premature beat (V), right bundle branch block (RB), paced beat (P), and left bundle branch block (LB). The total 21344 beats were selected and preprocessed from the recordings of 20 patients as ECG including 12338 N, 344 A, 2194 V, 1982 RB, 3498 P and 988 LB beats. Subsequently, a discrimative subset of features are extracted as described in ECG

: 1) ECG morphology features and 2) three ECG temporal features. The segmented ECG cycles are normalized to the same periodic legnth. The total feature dimension is 303 and is reduced to 6 dimensions using t-SNE. In label spreading algorithm, RBF kernel is chosen to calculate the affinity matrix. The gamma parameter in the RBF kernel is 0.25. The weight parameter

is 0.2 . 100 data samples of the selected labeled beats is added in each iteration. The threshold is 100.

The RALS algorithm is competed with two state-of-the-art active learning methods: SVM based active learning (SVMAL) described in CIKM07 and USDM described in multiclassus . s CIKM07 can only be applied to binary classification, the performance for binary classification is firstly compared. Typically, the four beat classes N, A, RB and LB are selected as the positive class respectively and the rest five classes are considered as the negative class. We initialize all methods with 25 beats and in each iteration 50 beats are added in the learning stage. Table 1 shows the proposed RALS algorithm consistently outperforms CIKM07 and multiclassus under unbalanced classes with a large margin. Specifically, with a small number of labeled data (25 labeled beats), RALS achieves at least 5 % and 3 % better precision performance compared to uncertainty sampling with maximum entropy CIKM07 and USDM multiclassus respecitvley. The performance gain is mainly attributed to the fact that RALS employs a better selection approach within clusters and noisy label reduction. Therefore, it enables the selection of informative data instances in a diversified manner. Subsequenlty, we evaluate the multi-class classification performance by comparing our algorithm with uncertainty sampling (US) based on maximum entropy USKDD2009 and USDM multiclassus

. In Fig.2, we demonstrate the precision and recall curve comparison with the uncertainty sampling (US), USDM

CIKM07 and the propose method (RALS) for ECG dataset for the minority classes including Class A and Class LB. The area under the curve (AUC) for RALS is 0.736 for Class A and 0.753 for Class LB as the best method out of the three algorithms. Fig.2(c) and Fig2(d) plot the comparison of the total accuracy and the average total accuracy vs the labeled samples using the uncertainty sampling (US) USKDD2009 , USDM multiclassus

and the proposed RALS method. RALS is the top performer for both evaluation metrics again, which confirms the advantages of RALS in selecting diversified samples and reducing mislabeling risks.

4 Conclusions

The proposed RALS algorithm optimizes the selection process for labeled data by selecting the data instances with the minimum average distances in the local clusters and choosing the candidates relying on the bucketing technique. The proposed noise reduction scheme further reduces the noise labels and enhances the classification performance. We applied RALS to the real ECG signal classification and achieved superior performances over the state-of-the-art approaches.

References

  • [1] S. Hao, P. Zhao, J. Lu, S. C. H. Hoi, C. Miao, and C. Zhang. SOAL: Second-Order Online Active Learning. International Conference on Data Mining, 2016.
  • [2] Y. Guo and D. Schuurmans. Discriminative Batch Mode Active Learning. Advances in Neural Information Processing, pages 367–443, 2008.
  • [3] Y. Yang, Z. Ma, F. Nie, X. Chang, and A. G. Hauptmann. Multi-class active learning by uncertainty sampling with diversity maximization. International Journal of Computer Vision, 113(2):113–127, 2015.
  • [4] R. Gilad-bachrach, A. Navot, and N. Tishby. Query by committee made real. Advances in Neural Information Processing Systems, 2005.
  • [5] R. Sznitman and B. Jedynak.

    Active testing for face detection and localization.

    IEEE Transac- tions on Pattern Analysis and Machine Intelligence, 2010.
  • [6] X. Li and Y. Guo. Adaptive Active Learning in Image Classification.

    IEEE Conference on Computer Vision and Pattern Recognition

    , 2013.
  • [7] A. Singla, S. Tschiatschek, and A. Krause. Actively learning hemimetrics with applications to eliciting user preferences. International Conference on Machine Learning, 2016.
  • [8] L. Maystre and M. Grossglause. Just sort it ! A simple and effective approach to active preference learning . International Conference on Machine Learning, 2017.
  • [9] E. Pasolli and F. Melgani. Active Learning Methods for Electrocardiographic Signal Classification. IEEE Transactions on Information Technology on Biomedicine, 14, 2010.
  • [10] S. Ertekin, J. Huang, L. Bottou, and C.L.Giles. Learning on the Border: Active Learning in Imbalanced Data Classification. The Conference on Information and Knowledge Management, 2007.
  • [11] K. Konyushkova and R. Sznitman. Learning Active Learning from Data. Advances in Neural Information Processing, 2017.
  • [12] D. Zhou, O. Noble, T. Lal, J. Weston, and B. Scholkopf. Learning with Local and Global Consistency. Advances in Neural Information Processing, 2003.
  • [13] L. van der Maaten and G. Hinton. Visualizing data using t-SNE. Journal of Machine Learning Research, pages 2579–2605, 2008.
  • [14] S. Kullback and R.A. Leibler. On information and sufficiency. Annals of Mathematical Statistics, pages 79–86, 1951.
  • [15] A. Choukroun and V. Charvillat. Dimensionality reduction for active learning with nearest neighbour classifier in text categorisation problems. Scandinavian Conference on Image Analysis (SCIA), 2003.
  • [16] P. Donmez, J. G.Carbonell, and J. Schneider. Efficiently Learning the Accuracy of Labeling Sources for Selective Sampling. ACM SIGKDD Conference on Knowledge Discovery and Data Mining, 2009.
  • [17] R. Mark and G. Moody. The impact of MIT-BIH Arrhythmia Database. IEEE Eng in Medcine and Biology, 20:45–50, 2001.