
MINA: Multilevel Knowledge-Guided Attention for Modeling Electrocardiography Signals

Electrocardiography (ECG) signals are commonly used to diagnose various cardiac abnormalities. Recently, deep learning models showed initial success in modeling ECG data; however, they are mostly black boxes and thus lack the interpretability needed for clinical use. In this work, we propose MultIlevel kNowledge-guided Attention networks (MINA) that predict heart diseases from ECG signals with intuitive explanations aligned with medical knowledge. By extracting multilevel (beat-, rhythm- and frequency-level) domain knowledge features separately, MINA combines medical knowledge and ECG data via a multilevel attention model, making the learned models highly interpretable. Our experiments showed that MINA achieved a PR-AUC of 0.9436 (outperforming the best baseline by 5.51%) on a real-world ECG dataset, and that it showed robust performance and strong interpretability against signal distortion and noise contamination.



1 Introduction

Heart diseases are among the leading causes of death in the world [benjamin2018heart]. Routine monitoring of physiological signals is considered important for heart disease prevention. Among existing monitoring technologies, electrocardiography (ECG) is a commonly used, non-invasive and convenient diagnostic tool that records the physiological activity of the heart over a period of time. Deciphering ECG signals can help detect many heart diseases, such as atrial fibrillation (AF), myocardial infarction (MI) and heart failure (HF) [jama_af, yanowitz2012introduction].

Figure 1: Normal and abnormal ECG signals show different patterns at different levels.

An example of a real-world ECG signal is shown in Fig. 1. ECG signals from cases and controls of heart diseases show different patterns at 1) the beat level, 2) the rhythm level, and 3) the frequency level, each representing different anomalous activities of the heart. For example, beat level morphology such as the P wave (atrial depolarization) and the QRS complex (ventricular depolarization) can reflect conditions related to the heart's electrical conduction. Rhythm level patterns capture rhythm features across beats and reflect cardiac arrhythmia conditions (abnormal heart rhythms). Frequency level patterns capture variations in frequency content and shed light on the diagnosis of ventricular flutter and ventricular fibrillation. Learning these patterns to support diagnosis has been an important research area in ECG analysis [roopa2017survey, expert_1, expert_3, tateno2001automatic].

In real clinical settings, in addition to the demand for accurate classification, the interpretability of the results is equally important [tsai2003computer]. Cardiologists need to provide both a diagnosis and a detailed explanation supporting it [std]. Also, many heart diseases do not show abnormal ECG patterns constantly [benjamin2018heart, yanowitz2012introduction], especially in the early stage of the disease. Therefore, the interpretability of the results, particularly highlighting the diagnosis-related parts of the data, is crucial for early diagnosis and better clinical decisions.

Traditional machine learning methods either learn time domain patterns, at the beat level [ladavich2015rate, purerfellner2014p] and the rhythm level [huang2011novel], or extract frequency patterns using signal processing techniques such as the discrete wavelet transform [garcia2016application]. However, time domain approaches are easily affected by noise or signal distortion [RODRIGUEZ2015261], while frequency domain methods cannot model rare events or some temporal dynamics that occur in the time domain. Besides, both require laborious feature engineering, and their performance relies on the quality of the constructed features.

Recently, deep learning models showed initial success in modeling ECG data. Convolutional neural networks (CNNs) were used to learn beat level patterns [tbe, 2017arXiv170701836R, hannun2019cardiologist]. Recurrent neural networks (RNNs) are suitable for capturing rhythm features [schwab2017beat, hong2017encase, zihlmann2017convolutional]. Moreover, an attention mechanism has been employed to extract interpretable rhythm features [schwab2017beat]. Despite this progress, these models are either black boxes or highlight only one aspect of the patterns (such as the rhythm features in [schwab2017beat]), and thus lack the comprehensive interpretability needed for real clinical use.

In this work, we propose the MultIlevel kNowledge-guided Attention model (MINA) to learn and integrate different levels of features from ECG that are aligned with clinical knowledge. For each level, MINA extracts level-specific domain knowledge features and uses them to guide the attention, including beat morphology knowledge that guides an attentive CNN and rhythm knowledge that guides an attentive RNN. MINA also performs attention fusion across the time and frequency domains. We also propose new evaluation approaches that interfere with ECG signals by adding noise and signal distortion, and we evaluate the interpretability and robustness of the model by tracking intermediate reactions across layers, from the multilevel attentions to the final predictions.

Experimental results show that MINA can correctly identify critical beat locations, significant rhythm variations and important frequency components, and that it remains robust in prediction under signal distortion or noise contamination. Tested on atrial fibrillation prediction, MINA achieved a PR-AUC of 0.9436, outperforming the best baseline by 5.51%. Finally, MINA also showed strong interpretability of its results and more robust performance than the baselines.

2 Related Work

Traditional methods include time domain methods, such as beat level methods [ladavich2015rate, purerfellner2014p] and rhythm level methods [tateno2001automatic, huang2011novel, oster2015impact], both of which depend on segmentation by QRS complex detection. Because they rely on the accuracy of QRS detection, time domain methods are easily affected by noise or signal distortion. Frequency domain approaches, on the other hand, cannot model rare events and other time domain patterns and thus lack interpretability. Moreover, both types of features are hand-crafted and therefore subjective.

Recently, deep neural networks (DNNs) have been used in ECG diagnosis [tbe, 2017arXiv170701836R, hannun2019cardiologist, zihlmann2017convolutional, hong2017encase, schwab2017beat]. Many of them have demonstrated state-of-the-art performance due to their ability to extract effective features [2017arXiv170701836R, hong2017encase]. Some of them build an end-to-end classifier [tbe, 2017arXiv170701836R, zihlmann2017convolutional], while others build a mixture model that combines traditional feature engineering methods with deep models [hong2017encase, schwab2017beat]. However, existing deep models are insufficient in three aspects. First, they neglect the characteristics of ECG signals, namely beat morphology and rhythm variation, when designing the model architecture. Second, they only analyze ECG signals in the time domain. Last, they are “black-box” models and thus not interpretable. In real-world medical applications, interpretability is critical for clinicians to accept machine recommendations and implement interventions.

3 Method

In this section, we introduce the design of MINA. Section 3.1 provides an overview and introduces the notation. Section 3.2 describes the basic framework, including each layer of MINA. Section 3.3 presents our new knowledge-guided attention mechanism, which is integrated into MINA. Section 3.4 describes how we evaluate interpretability and robustness. Fig. 2 depicts the architecture of MINA.

3.1 Overview of MINA

Here we briefly describe the framework and introduce the notation used throughout this paper. Assume we are given a single lead ECG signal $x \in \mathbb{R}^n$ and use it to predict the class probability $\hat{y}$. We first transform it into multi-channel signals $X \in \mathbb{R}^{M \times n}$ with $M$ channels across different frequency bands, where the $i$-th signal is denoted as $x^{(i)}$. We then split each $x^{(i)}$ into $K$ segments $S^{(i)}$. Next, we apply a CNN and an RNN consecutively on $S^{(i)}$ to obtain the beat level attention output $B$ and the rhythm level attention output $r^{(i)}$. This is followed by a fully connected layer that transforms $R = [r^{(1)}, \dots, r^{(M)}]$ into $E$. We then take a weighted average to integrate the columns $e^{(i)}$ of $E$ across all channels and output the frequency level attention $f$, which is used for prediction. To improve model accuracy and interpretability, we propose a knowledge-guided attention to learn attention vectors at the beat, rhythm and frequency levels, denoted as $\alpha$, $\beta$ and $\gamma$, respectively. More details are given in Sections 3.2 and 3.3. The notation is summarized in Table 1. Detailed configurations of MINA are given in the Implementation Details section.

Figure 2: MINA takes raw ECG signals as input and outputs probabilities of disease onset. MINA uses knowledge-guided attention to learn informative beat-, rhythm-, and frequency level patterns, and then performs attentive signal fusion for improved prediction.
Notation: Definition
$C, M, K, T$: # of classes, # of frequency channels, # of segments, segment length
$x, X, x^{(i)}$: original ECG signal ($x \in \mathbb{R}^n$), signals after transformation, $i$-th signal
$s, S^{(i)}, s_k$: segment of ECG with length $T$, segments ($K \times T$), $k$-th segment
$O, o_j, O_k$: CNN layer output, $j$-th column in $O$, $k$-th segment output
$b$: output of beat level attention
$B, b_k$: output of beat level attention of $K$ segments, $k$-th segment output
$H, h_k$: Bi-LSTM layer output, $k$-th column in $H$
$r$: output of rhythm level attention
$R, r^{(i)}$: output of rhythm level attention of $M$ channels, $i$-th channel output
$W_e, b_e$: weight matrix and bias vector in fully connected layer
$E, e^{(i)}$: fully connected layer output, $i$-th column in $E$
$f$: output of freq. level attention
$W_p, b_p$: weight matrix and bias vector in prediction layer
$\hat{y}, w, y$; $\hat{y}_c, w_c, y_c$: predicted probability, class weight, one-hot label; $c$-th value in each vector
$\alpha, \alpha_j, A$: beat level attention weights, $j$-th value in $\alpha$, segment attention $A = [\alpha^{(1)}, \dots, \alpha^{(K)}]$
$\beta, \beta_k$: rhythm level attention weights, $k$-th value in $\beta$
$\gamma, \gamma_i$: frequency level attention weights, $i$-th value in $\gamma$
$K_\alpha, K_\beta, K_\gamma$: beat-, rhythm-, and frequency level knowledge feature
$W_\alpha, W_\beta$: beat-, rhythm- level 1st layer attention weights
$W_\gamma$: frequency level 1st layer attention weights
$b_\alpha, b_\beta, b_\gamma$: beat-, rhythm-, and frequency level 1st layer attention biases
$v_\alpha, v_\beta, v_\gamma$: beat-, rhythm-, and frequency level 2nd layer attention weights
$\mathrm{std}(\cdot)$; $\mathrm{PSD}(\cdot)$: function of standard deviation; function of power spectral density
$\tilde{x}$; $\tilde{A}, \tilde{\beta}, \tilde{\gamma}$; $\tilde{y}$: interfered signals, attention weights and predictions
Table 1: Notations for MINA

3.2 Description of MINA

Signal Transformation and Segmentation To utilize frequency domain information, we decompose the original ECG signal into different frequency bands, each regarded as a channel, so that the signals of all channels can be modeled concurrently.

Specifically, we propose a new time-frequency transformation layer that transforms a single lead ECG signal into multi-channel signals. Here we use Finite Impulse Response (FIR) bandpass filters [Oppenheim:1996:SAS:248702] to transform the single lead ECG signal $x$ into multi-channel ECG signals $X = [x^{(1)}, \dots, x^{(M)}]$.
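As a rough illustration of this transformation layer, the sketch below builds a small FIR filter bank with NumPy. It assumes a windowed-sinc design with a Hamming window, 301 taps (an odd count, so the filters are centered), and the three bands discussed in Section 5 (below 0.5 Hz, 0.5-50 Hz, above 50 Hz) at the 300 Hz sampling rate of the dataset. The filter design, tap count and the names `lowpass_taps`, `band_taps`, `fir_filter_bank` and `BANDS` are hypothetical choices, not the authors' configuration.

```python
import numpy as np

def lowpass_taps(cutoff_hz, fs, numtaps=301):
    """Windowed-sinc low-pass FIR taps (Hamming window), normalized to unit DC gain."""
    n = np.arange(numtaps) - (numtaps - 1) / 2          # symmetric, zero-centered index
    h = np.sinc(2.0 * cutoff_hz / fs * n) * np.hamming(numtaps)
    return h / h.sum()

def band_taps(low_hz, high_hz, fs, numtaps=301):
    """Band-pass taps as the difference of two low-pass filters.
    low_hz=None yields a low-pass filter; high_hz=None yields a high-pass filter."""
    identity = np.zeros(numtaps)
    identity[(numtaps - 1) // 2] = 1.0                  # all-pass (centered unit impulse)
    upper = identity if high_hz is None else lowpass_taps(high_hz, fs, numtaps)
    lower = np.zeros(numtaps) if low_hz is None else lowpass_taps(low_hz, fs, numtaps)
    return upper - lower

def fir_filter_bank(x, fs, bands, numtaps=301):
    """Map a single-lead signal x of shape [n] to X of shape [M, n], one row per band."""
    return np.stack([np.convolve(x, band_taps(lo, hi, fs, numtaps), mode="same")
                     for lo, hi in bands])

# Hypothetical three-band split matching the channels discussed in Section 5.
BANDS = [(None, 0.5), (0.5, 50.0), (50.0, None)]
```

Because the three filters in `BANDS` sum to a centered unit impulse, the channels add back up to the original signal, so this particular decomposition discards no information; a 10 Hz sine, for instance, ends up almost entirely in the middle channel.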

Then, for each channel, we split $x^{(i)}$ into a sequence of equal-length segments. Unlike previous deep models [schwab2017beat, tbe] that segment signals using QRS complex detection, which is easily affected by signal quality, we simply use sliding-window segmentation. By cutting each $x^{(i)}$ into $K$ segments, where the $k$-th segment is indexed from $(k-1)T+1$ to $kT$, we obtain $K$ equal-length segments $S^{(i)} = [s_1, \dots, s_K]$ (without loss of generality, we assume that $n = KT$; otherwise we cut off the last remaining part, which is shorter than $T$). In general, the segment length $T$ needs to be shorter than one heartbeat so that we can extract beat level patterns. Detailed configurations can be found in the Implementation Details section.
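Non-overlapping sliding-window segmentation reduces to a reshape. A minimal NumPy sketch (the function name is hypothetical), which drops the trailing part shorter than $T$ as described above:

```python
import numpy as np

def sliding_window_segments(X, T):
    """Split every channel of X (shape [M, n]) into K = n // T non-overlapping
    segments of length T, returning S of shape [M, K, T].
    Trailing samples that do not fill a whole segment are cut off."""
    M, n = X.shape
    K = n // T
    return X[:, :K * T].reshape(M, K, T)
```

For example, with a hypothetical $T = 50$ at 300 Hz, each segment spans about 0.17 s, well below the duration of a typical heartbeat (roughly 0.6 s to 1 s at 60-100 beats per minute).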

Beat Level Attentive Convolutional Layer For beat level patterns, we mainly consider abnormal wave shapes or edges. To locate them in the signals, we design an attentive convolutional layer. Formally, given segments $S^{(i)}$, we perform 1-D convolution on each of them and output convolved features $O_k = \mathrm{CNN}(s_k)$, $O_k \in \mathbb{R}^{N \times L}$, where $N$ is the number of filters and $L$ is the output length of a segment after convolution, which is determined by hyperparameters such as the stride of the CNN. The $\mathrm{CNN}(\cdot)$ operations share weights across the $K$ segments. Then, instead of traditional global average pooling, which treats all features homogeneously, we propose a knowledge-guided attention to aggregate these features and obtain the beat level attention output $b = \sum_{j=1}^{L} \alpha_j o_j$, where $\alpha \in \mathbb{R}^L$ contains the weights of the convolved features, $o_j$ is the $j$-th column of $O$ (dropping the segment index $k$), and $b \in \mathbb{R}^N$. Thus the model can focus more on significant signal locations and provide better beat level interpretation. Details of the knowledge-guided attention are introduced in Section 3.3.

Rhythm Level Attentive Recurrent Layer For rhythm level patterns, we mainly consider abnormal rhythm variation. To capture it from beat sequences, RNNs are a natural choice due to their ability to learn from data with temporal dependencies. Again, to improve interpretability and accuracy, we use knowledge-guided attention with rhythm knowledge.

Specifically, we use a bidirectional Long Short-Term Memory network (Bi-LSTM) [schuster1997bidirectional] to obtain rhythm level annotations of the segments. The bidirectional LSTM is denoted here as $\mathrm{BiLSTM}(\cdot)$. We concatenate the forward and backward outputs of the Bi-LSTM and obtain the rhythm level feature $H = \mathrm{BiLSTM}([b_1, \dots, b_K]) = [h_1, \dots, h_K]$. Here we use knowledge-guided attention with rhythm knowledge to output the rhythm level attention $r = \sum_{k=1}^{K} \beta_k h_k$, where $\beta_k$ represents the weight of the $k$-th rhythm level hidden state $h_k$.

Fusion and Prediction At the beginning, we decomposed the ECG signal into multiple channels (i.e., frequency bands) and learned rhythm level features $r^{(i)}$ from each channel $x^{(i)}$. We now perform attention fusion across all channels to obtain a more comprehensive view of the signal.

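We first perform a fully connected transformation $E = W_e R \oplus b_e$, where $R = [r^{(1)}, \dots, r^{(M)}]$, $W_e$ and $b_e$ are the weight matrix and bias vector of the fully connected layer, and $\oplus$ means broadcasting $b_e$ to all column vectors of $W_e R$ and adding. Then, since these channels may not be equally important, we take a weighted average of the columns $e^{(i)}$ of $E$ to calculate the frequency level attention $f = \sum_{i=1}^{M} \gamma_i e^{(i)}$, where $\gamma_i$ is the weight of $e^{(i)}$. We use frequency knowledge, namely that signals with greater energy are more informative, to determine the weights $\gamma_i$, and we measure energy by power spectral density.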

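Last, given the integrated features $f$, we make a prediction using $\hat{y} = \mathrm{softmax}(W_p f + b_p)$, where $W_p$ and $b_p$ are the weight matrix and bias vector of the prediction layer, and we optimize the weighted cross-entropy loss $\mathcal{L} = -\sum_{c=1}^{C} w_c \, \mathbb{1}(y_c = 1) \log \hat{y}_c$, where $C$ is the number of classes, $y$ is the one-hot ground-truth label, $w$ is the class weight vector with the same shape as $y$, and $\mathbb{1}(\cdot)$ is the indicator function. The weights $w$ are adjusted to handle class imbalance, which is common in medical data.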
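The fusion and prediction steps can be sketched in a few lines of NumPy. The inverse-frequency class weighting at the end is one common way to set $w$ and is a hypothetical choice here, since the text does not specify how $w$ is chosen; function names are also illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                      # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fuse_channels(R, W_e, b_e, gamma):
    """E = W_e R (+) b_e with b_e broadcast to each column; f = sum_i gamma_i e^(i)."""
    E = W_e @ R + b_e[:, None]
    return E @ gamma

def predict(f, W_p, b_p):
    """Class probabilities y_hat = softmax(W_p f + b_p)."""
    return softmax(W_p @ f + b_p)

def weighted_cross_entropy(y_hat, y_onehot, w):
    """L = -sum_c w_c * 1(y_c = 1) * log(y_hat_c): only the true class contributes."""
    c = int(np.argmax(y_onehot))         # index of the true class
    return -w[c] * np.log(y_hat[c])

def inverse_frequency_weights(counts):
    """Hypothetical choice of w: w_c = n_total / (C * n_c)."""
    counts = np.asarray(counts, dtype=float)
    return counts.sum() / (len(counts) * counts)
```

For the class counts in this dataset (7,790 non-AF and 738 AF recordings), inverse-frequency weighting gives the AF class a weight about 10.6 times that of the non-AF class, so misclassified AF recordings dominate the loss as intended.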

3.3 Knowledge-Guided Attention of MINA

We now describe how to compute the multilevel attention weights $\alpha$, $\beta$ and $\gamma$. The attention mechanism can be regarded as a two-layer neural network: the 1st fully connected layer calculates the scores used for computing the weights, and the 2nd fully connected layer computes the weights $\alpha$, $\beta$, $\gamma$ via a softmax activation.

In the first layer, the scores are computed from the following features: (1) the multilevel outputs $O$, $H$ and $E$ extracted by MINA; (2) the domain knowledge features, including the beat level $K_\alpha$, rhythm level $K_\beta$ and frequency level $K_\gamma$. Concretely, the three levels of domain knowledge features are defined as follows.


  • Beat Level $K_\alpha$: For beat level knowledge, we mainly consider abnormal wave shapes or sharply changing points such as the QRS complex [kashani2005significance]. To represent them, we compute the first-order difference $\Delta(\cdot)$ and a convolutional operation $F_\alpha(\cdot)$ on each segment to extract the beat level knowledge feature $K_\alpha = F_\alpha(\Delta(s_k))$, where $\Delta(s_k)_t = s_{k,t+1} - s_{k,t}$ and $s_{k,t}$ is the $t$-th value in $s_k$. Detailed configurations of $F_\alpha$ are given in the Implementation Details section.

  • Rhythm Level $K_\beta$: Attention weights focus on rhythm level variation, such as the severe fluctuation in ventricular fibrillation [yanowitz2012introduction]. To characterize it, we compute the standard deviation of each segment in $S^{(i)}$ to extract the rhythm level knowledge feature vector $K_\beta = [\mathrm{std}(s_1), \dots, \mathrm{std}(s_K)]$, where $\mathrm{std}(\cdot)$ calculates the standard deviation of each $s_k$ in $S^{(i)}$.

  • Frequency Level $K_\gamma$: At the frequency level, signals with greater energy contain more information and thus need more attention [yanowitz2012introduction]. So we use power spectral density (PSD), a popular measure of energy, to extract the frequency level knowledge feature vector $K_\gamma = [\mathrm{PSD}(x^{(1)}), \dots, \mathrm{PSD}(x^{(M)})]$, where $\mathrm{PSD}(\cdot)$ calculates the PSD [Oppenheim:1996:SAS:248702] of each $x^{(i)}$ in $X$ using a periodogram.
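The three knowledge features can be sketched with NumPy as follows. Two details are assumptions of this sketch: the first-order difference is zero-padded at the front so that $K_\alpha$ has the same length $L$ as the CNN output on the raw segment, and each channel's periodogram is summed over frequency to obtain a single energy value per channel. Function names are hypothetical.

```python
import numpy as np

def beat_knowledge(s, kernel, stride=2):
    """K_alpha for one segment s: first-order difference, then a single-filter
    1-D convolution (cross-correlation) with the given stride.
    The difference is zero-padded at the front (assumption) to keep length T."""
    d = np.diff(s, prepend=s[0])            # d_t = s_t - s_{t-1}, with d_0 = 0
    F = len(kernel)
    L = (len(d) - F) // stride + 1
    return np.array([d[j * stride: j * stride + F] @ kernel for j in range(L)])

def rhythm_knowledge(S):
    """K_beta: standard deviation of each segment (S: [K, T]) -> [K]."""
    return S.std(axis=1)

def frequency_knowledge(X, fs):
    """K_gamma: one energy value per channel (X: [M, n]) -> [M].
    Each periodogram |FFT|^2 / (fs * n) is summed over frequency (assumption)."""
    n = X.shape[1]
    periodogram = np.abs(np.fft.rfft(X, axis=1)) ** 2 / (fs * n)
    return periodogram.sum(axis=1)
```

All three are cheap, parameter-light summaries: $K_\beta$ and $K_\gamma$ have no learned parameters at all, and $K_\alpha$ only learns one filter, which keeps the knowledge signal easy to interpret.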

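Then, we concatenate the model outputs and knowledge features to compute the scores and attention weights:

$$\alpha = \mathrm{softmax}\big(v_\alpha^\top \tanh(W_\alpha [O; K_\alpha] \oplus b_\alpha)\big)$$
$$\beta = \mathrm{softmax}\big(v_\beta^\top \tanh(W_\beta [H; K_\beta] \oplus b_\beta)\big)$$
$$\gamma = \mathrm{softmax}\big(v_\gamma^\top \tanh(W_\gamma [E; K_\gamma] \oplus b_\gamma)\big)$$

where $W_\alpha, W_\beta, W_\gamma$ and $b_\alpha, b_\beta, b_\gamma$ represent the weights and biases of the first layer, $v_\alpha, v_\beta, v_\gamma$ represent the weights of the second layer, and $\oplus$ is addition with broadcasting.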

3.4 Method for Evaluating Interpretability and Robustness

To evaluate the interpretability and robustness of MINA, we perturb the signals and observe attention weights and prediction results. The evaluation method is illustrated in Fig. 3.

Concretely, we add signal distortion (a low frequency interferer) or noise (a high frequency interferer) to the original ECG signal $x$ and obtain $\tilde{x}$; here we choose baseline signal distortion and white noise. For the perturbed signals $\tilde{x}$, we apply MINA to generate predictions $\tilde{y}$ and output the multilevel attention weights $\tilde{A}$, $\tilde{\beta}$ and $\tilde{\gamma}$. We compare them with the original results $\hat{y}$ and $A$, $\beta$, $\gamma$ from the unperturbed data.

To evaluate the interpretability of MINA, we visually check whether the attention weights are in line with medical evidence. For the beat level attention weights of the $K$ segments, $A = [\alpha^{(1)}, \dots, \alpha^{(K)}]$ and $\tilde{A} = [\tilde{\alpha}^{(1)}, \dots, \tilde{\alpha}^{(K)}]$, we align them to the input ECG signal $x$, where the $j$-th attention weight of the $k$-th segment approximately corresponds to the part of $s_k$ covered by the $j$-th convolution window. Then we visualize the values and verify whether high $\alpha_j$ relates to beat level medical evidence. For the rhythm level attention weights $\beta$ and $\tilde{\beta}$, we align them to the segments $S^{(i)}$, where $\beta_k$ corresponds to $s_k$, and verify whether high $\beta_k$ relates to rhythm level medical evidence. For the frequency level attention weights $\gamma$ and $\tilde{\gamma}$, we align them to the channels $X$, where $\gamma_i$ corresponds to $x^{(i)}$. Likewise, we check whether high $\gamma_i$ relates to frequency level medical evidence.
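Aligning attention weights with the signal is index arithmetic. Assuming no padding in the convolution, the $j$-th convolution output of segment $k$ covers exactly one filter-width window of that segment. The helper names are hypothetical, indices are 0-based, and the defaults (filter size 32, stride 2) come from the Implementation Details section.

```python
def beat_weight_span(k, j, T, filter_size=32, stride=2):
    """Samples [start, end) of the channel signal covered by the j-th beat level
    attention weight of segment k, i.e. the receptive field of conv output j."""
    start = k * T + j * stride
    return start, start + filter_size

def segment_span(k, T):
    """Samples [start, end) covered by the rhythm level weight beta_k."""
    return k * T, (k + 1) * T
```

Dividing by the 300 Hz sampling rate converts these spans to seconds: for a hypothetical $T = 50$, weight $j = 3$ of segment $k = 1$ covers samples 56 to 87, i.e., about 0.19 s to 0.29 s into the recording.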

We evaluate the robustness of MINA with two tasks: (1) we visually check, in the same way as above, whether the new attention weights after perturbation are still in line with medical evidence; (2) we gradually increase the interference amplitude and evaluate the change in overall performance. A more robust model is less affected. These results can also be used to evaluate interpretability, since an interpretable model should highlight meaningful information while suppressing unrelated parts.

Figure 3: Analysis of multi-level attention change (Orange) and final prediction change (Blue).

4 Experiments

In this section, we first describe the dataset used for the experiments, followed by the description of the baseline models. Then we discuss the model performance.

4.1 Source of Data

We conducted all experiments using real-world ECG data from the PhysioNet Challenge 2017 database [clifford2017af]. The dataset contains 8,528 de-identified ECG recordings lasting from 9 s to just over 60 s, sampled at 300 Hz by the AliveCor device: 738 from AF patients and 7,790 from controls, as predefined by the challenge. We first divided the data into a training set (75%), a validation set (10%) and a test set (15%), used to train and evaluate all tasks. We then preprocessed the recordings to obtain equal-length data of $n$ samples each. The summary statistics of the data are given in Table 2. In this study, the objective is to discriminate records of AF patients from those of controls.

Type # recordings # of points (Mean StDev Max Median Min)
AF 738 9631 3703 18062 9000 2996
non-AF 7790 9760 3222 18286 9000 2714
Table 2: Data profile of PhysioNet Challenge 2017 dataset

4.2 Baseline Models

We compare MINA with the following models:

1. Expert: a combination of extracted features used in AF diagnosis, including rhythm features such as sample entropy on QRS intervals [expert_1], cumulative distribution functions [tateno2001automatic], thresholding on the median absolute deviation (MAD) of RR intervals [expert_3] and heart rate variability in the Poincare plot [park2009atrial]; morphological features such as the location, area, duration, interval, amplitude and slope of the related P wave, QRS complex, ST segment and T wave; and frequency features such as frequency band power. We used the QRS segmentation method in [pan1985real] to extract these features, and then built both a logistic regression classifier (ExpertLR) and a random forest classifier (ExpertRF) on them.

2. CNN: convolutional layers with shared weights are applied to ECG segments. We use global average pooling to combine features, followed by a fully connected (FC) layer and a softmax layer for prediction. The architecture is modified from [tbe] to handle ECG segments. The hyper-parameters of the CNN, FC and softmax layers are the same as in MINA to match the model complexity.

3. CRNN: we use shared-weight convolutional layers on ECG segments and replace global average pooling with a bidirectional LSTM. FC and softmax layers are then applied to the top hidden layer. The architecture is modified from [zihlmann2017convolutional], but keeps only one convolutional layer. Other hyper-parameters of the CNN, RNN, FC and softmax layers are the same as in MINA.

4. ACRNN: based on CRNN, with additional beat level and rhythm level attentions. Other hyper-parameters are the same as in MINA.

4.3 Implementation Details

In the convolutional layers of CNN, CRNN, ACRNN and MINA, we use one layer for each model. The number of filters is set to 64, the filter size to 32 and the stride to 2. Pooling is replaced by the attention mechanism. The convolution $F_\alpha$ used for $K_\alpha$ has one filter of size 32, also with stride 2. In the recurrent layers of CRNN, ACRNN and MINA, we also use a single layer for each model, and the number of hidden units in each LSTM is set to 32. The dropout rate in the fully connected prediction layer is set to 0.5. In sliding-window segmentation, we use non-overlapping windows, i.e., a stride equal to the segment length $T$. Deep models are trained with mini-batches of 128 samples for 50 iterations, which was sufficient to achieve the best performance on the classification task. The final model was selected using early stopping on the validation set. We then tested each model 5 times using different random seeds, and report the mean values with standard deviations. All models were implemented in PyTorch version 0.3.1 and trained on a system equipped with 64GB RAM, 12 Intel Core i7-6850K 3.60GHz CPUs and an Nvidia GeForce GTX 1080. All models were optimized using Adam [adam], with the learning rate set to 0.003. Our code is publicly available at

4.4 Performance Comparison

Performance was measured by the area under the receiver operating characteristic curve (ROC-AUC), the area under the precision-recall curve (PR-AUC) and the F1 score. PR-AUC is considered a better measure for imbalanced data like ours [davis2006relationship]. Table 3 shows that MINA outperforms all baselines, with a 5.51% higher PR-AUC than the second-best model (CRNN).

Model ROC-AUC (mean ± std) PR-AUC (mean ± std) F1 (mean ± std)
ExpertLR 0.9350 ± 0.0000 0.8730 ± 0.0000 0.8023 ± 0.0000
ExpertRF 0.9394 ± 0.0000 0.8816 ± 0.0000 0.8180 ± 0.0000
CNN 0.8711 ± 0.0036 0.8669 ± 0.0068 0.7914 ± 0.0090
CRNN 0.9040 ± 0.0115 0.8943 ± 0.0111 0.8262 ± 0.0215
ACRNN 0.9072 ± 0.0047 0.8935 ± 0.0087 0.8248 ± 0.0229
MINA 0.9488 ± 0.0081 0.9436 ± 0.0082 0.8342 ± 0.0352
Table 3: Performance Comparison on AF Prediction
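The text does not specify which PR-AUC estimator was used. A common choice is average precision, the step-wise sum of precision weighted by recall increments; the NumPy sketch below computes it (tied scores are not grouped) together with the relative gain used to compare models. Names are hypothetical.

```python
import numpy as np

def pr_auc(y_true, scores):
    """Area under the precision-recall curve as average precision:
    AP = sum_t (R_t - R_{t-1}) * P_t over the ranked predictions."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    y = np.asarray(y_true, dtype=float)[order]
    tp = np.cumsum(y)                                # true positives at each cut-off
    precision = tp / np.arange(1, len(y) + 1)
    recall = tp / y.sum()
    recall_prev = np.concatenate([[0.0], recall[:-1]])
    return float(np.sum((recall - recall_prev) * precision))

def relative_gain(new, old):
    """Relative improvement of `new` over `old`."""
    return (new - old) / old
```

For example, `relative_gain(0.9436, 0.8943)` is about 0.0551, i.e., the 5.51% PR-AUC gain of MINA over CRNN in Table 3.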

5 Interpretability and Robustness Analysis

5.1 MINA Automatically Extracts Clinically Meaningful Patterns

Figure 4: From the ECG signal of an AF patient (top left), MINA learns beat level attention that points to the positions of significant QRS complexes and abnormal P waves. Rhythm level attention shows the abnormal RR interval. The frequency channel that receives the highest attention corresponds to the frequency band where the QRS complex is dominant.

When reading an ECG record (upper left in Fig. 4), cardiologists make an AF diagnosis based on the following clinical evidence: 1) the absence of the P wave, a small upward wave before the QRS complex; 2) irregular RR intervals, such as the much wider interval between two consecutive QRS complexes in this record.

MINA learns these patterns automatically via beat-, rhythm-, and frequency level attention weights. In Fig. 4, the beat level attentions point to where QRS complexes occur or P waves are absent. The rhythm level attentions indicate the location of the abnormal RR interval, which precisely matches the clinical evidence. In addition, the frequency level attentions show that the 10Hz-50Hz channel receives the highest attention weight, so MINA pays the most attention to it. In fact, the QRS complex, the most significant clinical evidence in ECG diagnosis, is known to be dominant in 10Hz-50Hz [tateno2001automatic, expert_3, expert_1].

5.2 MINA Remains Interpretable and Robust Against Baseline Signal Distortion

Baseline wander distortion is low frequency noise with slow but large changes in the signal offset. It is a common issue that degrades ECG analysis performance. In this experiment, we mimic the real-world setting by distorting the data and observe whether MINA can still make robust and interpretable predictions.

Figure 5: (a) Signal in Fig. 4 interfered by baseline wander distortion. (b) Channel 1 (low attention weights) shows no significant patterns. (c) Channel 2 (higher attention weights) retains meaningful patterns similar to the original data. (d) MINA has a much lower PR-AUC drop than the baselines.

For this experiment, we perturbed the signal in Fig. 4 with baseline wander distortion; the perturbed signal is plotted in Fig. 5(a). The original frequency attention in Fig. 4 shows that Channel 1 (< 0.5Hz) has the lowest weight, while Channel 2 (0.5Hz-50Hz) has a much higher weight. Thus Channel 1 can be interpreted as the baseline component and Channel 2 as the clean signal component, and MINA pays more attention to Channel 2 than to Channel 1. After signal distortion, the relative importance of the two channels remains the same, which is also reflected in their beat level and rhythm level attentions: Channel 1 shows no significant patterns, while the more informative Channel 2 has beat- and rhythm level patterns similar to those of the unperturbed data. This indicates that the interpretability of MINA is little affected by data distortion.

To evaluate model robustness, we compare the performance change on the entire test set as the distortion amplitude increases. As shown in Fig. 5(d), MINA has a much lower performance drop even under large-amplitude distortion, while all baselines show a large performance drop even with little distortion. This is mainly due to the frequency attention fusion: during training, the model has already identified Channel 1 as a baseline signal, so baseline distortion has less impact on the important signals in the clean signal channel. Since baseline signal distortion occurs in real clinical settings, MINA can provide more accurate predictions in these scenarios.

5.3 MINA Remains Interpretable and Robust in the Presence of Noise

High frequency noise contamination is another common issue. For this experiment, we perturbed the signal in Fig. 4 with white noise; the perturbed signal is shown in Fig. 6(a). As in the previous experiment, the original frequency attentions show that Channel 3 (> 50Hz), a channel known to be dominated by noise, has a lower weight, while Channel 2 (0.5Hz-50Hz), known as a clean signal channel, has a much higher weight.

Figure 6: (a) Signal in Fig. 4 perturbed by noise. (b) Channel 3 (lower attention weights) shows no significant patterns. (c) Channel 2 (higher attention weights) retains meaningful patterns similar to the original data. (d) MINA has a much lower PR-AUC drop than the baselines.

After noise contamination, the noise mainly affects the noise channel (Channel 3), which is less important in MINA's prediction, while the more informative Channel 2 has beat- and rhythm level patterns similar to those of the unperturbed data. This indicates that the interpretability of MINA is little affected by noise contamination. In Fig. 6(d), we compare the PR-AUC change on the entire test set as the noise amplitude increases. MINA is less affected by noise than the other methods, demonstrating more robust performance in the presence of noise thanks to the frequency attention fusion.

6 Conclusion and Future Work

In this paper, we propose MINA, a deep multilevel knowledge-guided attention network that predicts heart disease from ECG signals in an interpretable way. MINA outperformed the baselines in the heart disease prediction task. Experimental results also showed its robustness and strong interpretability against signal distortion and noise contamination. In the future, MINA can be extended to a broad range of diseases where ECG signals serve as additional information for diagnosis, on top of other health data such as electronic health records. This will require investigating interpretable prediction based on multimodal data, which is a potentially rewarding avenue for future research.


This work was supported by the National Science Foundation, awards IIS-1418511, CCF-1533768 and IIS-1838042, and the National Institutes of Health, awards 1R01MD011682-01 and R56HL138415. We also thank Li Jiang from BOE for valuable discussions.


Appendix A Background of Electrocardiography (ECG)

Electrocardiography (ECG) is a test that measures the electrical activity of the heartbeat. With each beat, an electrical impulse travels through the heart and causes the muscle to squeeze and pump blood from the heart. The ECG records the timing of this electrical activity in the upper and lower chambers.

A normal heartbeat in an ECG is shown in Fig. 7. Usually a “P wave”, produced by the depolarization of the right and left atria (upper chambers), arrives first, followed by a flat line indicating when the electrical impulse travels to the bottom chambers. The next wave, the QRS complex, corresponds to ventricular depolarization. It is followed by ventricular repolarization (ST segment, T wave), which represents electrical recovery, or return to a resting state, of the ventricles. A “U wave”, which represents papillary muscle repolarization, may also be present.

Figure 7: A normal heart beat.

ECG signals offer two types of information: 1) the time intervals measure how long the electrical wave takes to pass through the heart, revealing whether the rhythm is normal, slow, fast or irregular; 2) the amount of electrical activity passing through the heart shows whether parts of the heart are abnormally sized.

The time domain features for heart disease diagnosis include beat level and rhythm level.


  • At the beat level, an unusual P wave may indicate diseases such as atrial fibrillation (AF), ectopic atrial pacemaker or atrial enlargement. An unusual QRS complex may indicate diseases such as left/right bundle branch block and ventricular tachycardia. An unusual ST segment and T wave may indicate myocardial infarction, ischemia and left ventricular hypertrophy.

  • At the rhythm level, the analysis is usually based on the intervals between QRS complexes, called RR intervals. Long RR intervals may indicate sinus bradycardia, short RR intervals may indicate sinus tachycardia or ventricular tachycardia, and irregular RR intervals may indicate AF. However, many diseases such as AF show patterns at both the beat level and the rhythm level, so it is beneficial to analyze them together.

Appendix B Frequency Band for ECG Signals

The ECG signal is a mixture of the heart muscle's electrophysiologic activities, including atrial, ventricular, papillary muscle and myocardial activity. It may also contain other electrical components from muscles, skin, respiration, body movement, etc. The frequency bands listed below are commonly considered the dominant components of the ECG signal:


  • < 0.5 Hz: very low frequency component, mainly representing baseline wandering unrelated to the heart.

  • 0.12 Hz - 0.5 Hz: respiration.

  • 0.5 Hz - 50 Hz: P wave, QRS complex and T wave.

  • 0.67 Hz - 5 Hz: P wave.

  • 1 Hz - 7 Hz: T wave.

  • 5 Hz - 50 Hz: muscle.

  • 10 Hz - 50 Hz: QRS complex is the most dominant component.

  • > 50 Hz: high frequency noise.

  • All: raw signal.

Note that these frequency bands are approximate, since they cannot be fully separated, and their significance may also vary among people. Nevertheless, it is beneficial to combine frequency domain and time domain features for disease diagnosis, since the frequency band transformation divides time domain ECG signals into subspaces, which helps classification tasks.

The illustration of frequency transformation is shown in Fig.8.

Figure 8: Finite impulse response bandpass filter in frequency transformation layer.

Appendix C Interferer Simulation Details

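We simulate the baseline wander distortion signal using a sine function and the noise contamination signal using a random normal distribution. Concretely, for a signal $x$ of length $n$, the interfered signals for baseline wander and noise, respectively, are

$$\tilde{x} = x \oplus A \sin(\omega \mathbf{t}), \quad \mathbf{t} = [1, 2, \dots, n]$$
$$\tilde{x} = x \oplus A \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, I_n)$$

where $A$ is the amplitude of the interferer, $\omega$ is the low angular frequency of the wander, and $\oplus$ represents elementwise addition.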
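The two interferers can be sketched with NumPy as below. The 0.3 Hz wander frequency and the use of the amplitude as the noise standard deviation are hypothetical choices, as are the function names; the 300 Hz default matches the dataset's sampling rate.

```python
import numpy as np

def baseline_wander(n, amplitude, fs=300.0, freq_hz=0.3):
    """Sine-shaped low-frequency offset drift; freq_hz = 0.3 is a hypothetical choice."""
    t = np.arange(n) / fs
    return amplitude * np.sin(2 * np.pi * freq_hz * t)

def white_noise(n, amplitude, rng):
    """Zero-mean Gaussian noise with standard deviation `amplitude`."""
    return amplitude * rng.standard_normal(n)

def interfere(x, kind, amplitude, rng=None, fs=300.0):
    """Return the perturbed signal x_tilde = x (+) interferer (elementwise addition)."""
    n = len(x)
    if kind == "baseline":
        return x + baseline_wander(n, amplitude, fs)
    if kind == "noise":
        rng = rng if rng is not None else np.random.default_rng()
        return x + white_noise(n, amplitude, rng)
    raise ValueError(f"unknown interferer: {kind}")
```

A robustness curve in the spirit of Figs. 5(d) and 6(d) would sweep `amplitude` over increasing values, perturb every test recording, and recompute PR-AUC at each amplitude.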

Appendix D More Interpretability Evaluation Examples

Figure 9: More examples of interpretability evaluations