I Introduction
Automatic arrhythmia annotation with deep learning networks has been an emerging topic along with the proliferation of diverse frameworks. As early as 2015, 1D convolutional neural networks (CNNs) were proposed by Kiranyaz to classify ECG beats [1]. In the same year, stacked denoising autoencoders (DAE) combined with supervised classification were proposed by Al Rahhal for active learning of beat annotation [2]. In 2017, Ng and his group trained a CNN classifier as deep as 34 layers to predict 14 outputs [3]. All these studies claimed better performance than handcrafted features, and even better than cardiologists' annotations. This study targets the same problem, and a brand-new deep network is proposed. The data introduced in Section II-A are partitioned into separate training and test sets, preprocessed by typical filtering, transformed into a 2D spectrum, and fed directly into this network. The network is trained in two steps: first, the feature vector at each time spot is summarized into a latent representation by unsupervised learning, which acts as a warm initialization for the subsequent classifier training; second, a classifier is obtained by supervised learning on these latent representations. In the last part of this study, the performance on the test set is reported, and a comparative experiment is introduced to demonstrate the advantages of the latent representation.
II Method
II-A Data
The ECG records are mainly from the MIT PhysioNet database [4], including the malignant ventricular arrhythmia database (VFDB) [5] and the built-in normal sinus rhythm database.
TABLE I
Type         | Definition
Normal sinus | normal heart rhythm
Asys         | no heartbeat for at least 4 s
Tachy        | rate ≥ 140 bpm for 17 beats
VF/VFL       | fibrillatory wave (F-wave)
VT           | rate within 100–250 bpm, QRS span larger than 0.1 s
The rhythms of interest include asystole (Asys), supraventricular tachycardia (Tachy), ventricular flutter or fibrillation (VF/VFL), and ventricular tachycardia (VT); the corresponding definitions can be found in Table I.
II-B Preprocessing
A proper chunk of limb-lead signal is first selected (e.g., 13 s) and resampled (e.g., to 200 Hz); the chunk span should be set based on the definition of the arrhythmia in Section II-A, and should enclose the segment around the annotation label indicating a rhythm change. Then a high-pass FIR filter is employed to remove baseline wander (e.g., 24 dB attenuation at 0.05 Hz). Finally, the 2D spectrum is computed by Welch's method (e.g., a 1024-point fast Fourier transform on a 91%-overlapped moving window spanning 60 samples).
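Under the example parameters above, the preprocessing chain can be sketched with SciPy. The 0.5 Hz cutoff and 801-tap filter length below are illustrative assumptions; the paper only specifies 24 dB attenuation at 0.05 Hz, which an actual design would have to meet.

```python
import numpy as np
from scipy import signal

FS = 200        # target sampling rate (Hz), per Section II-B
NPERSEG = 60    # moving-window span in samples
NOVERLAP = 55   # roughly 91% overlap of a 60-sample window
NFFT = 1024     # FFT length

def preprocess(ecg, fs_in):
    """Resample, high-pass filter, and compute the 2D spectrum of one chunk."""
    # Resample to the target rate.
    x = signal.resample_poly(ecg, FS, fs_in)
    # High-pass FIR to remove baseline wander (cutoff/taps are illustrative).
    taps = signal.firwin(801, 0.5, fs=FS, pass_zero=False)
    x = signal.filtfilt(taps, [1.0], x)
    # Welch-style 2D spectrum: 1024-point FFT on 91%-overlapped
    # 60-sample windows.
    f, t, Sxx = signal.spectrogram(x, fs=FS, nperseg=NPERSEG,
                                   noverlap=NOVERLAP, nfft=NFFT)
    return f, t, Sxx
```

The spectrogram frequency axis has `NFFT // 2 + 1` bins; selecting the 60 frequency bins mentioned in Section II-C would be a further cropping step.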
II-C Network to learn the category
To annotate the preprocessed signal, the network in Fig. 1 is proposed. To the best of our knowledge, this network structure has not appeared in the related literature before; therefore it requires end-to-end training instead of the transfer learning common in most image AI tasks. The whole network includes two parts, which need to be trained separately. The first is a deep representation net (1425 parameters), trained with the cost function in Eq. 1.
\mathcal{L}(\theta,\phi;x) = -\mathbb{E}_{q_\phi(z|x)}\left[\log p_\theta(x|z)\right] + D_{KL}\left(q_\phi(z|x)\,\|\,p(z)\right)   (1)
This part extracts a feature vector (20 floats) from the frequency vector (60 frequency bins) via a residual unit (ResUnit) [6] and pooling, then seeks a concise and robust representation of the feature, drawn from a Gaussian distribution in a dense space (dimension = 8). The projection to the dense space is learned by a variational autoencoder (VAE) [7], namely the encoder-decoder pair highlighted in the gray box in Fig. 1. Returning to the mathematics of Eq. 1, the encoder is denoted by $q_\phi(z|x)$ and the decoder by $p_\theta(x|z)$, following the definitions in Kingma et al.'s work [7]. The first term in Eq. 1 is normally referred to as the reconstruction loss, since a latent representation $z$ is drawn and used to reconstruct the original input $x$. The second term in Eq. 1 is the Kullback-Leibler divergence that arises when approximating the posterior $p(z|x)$ with a family of distributions $q_\phi(z|x)$; it acts as a regularizer penalty to keep the latent distribution in sufficiently diverse clusters [7], and is therefore normally referred to as the latent loss.
The second portion, denoted by the pink box in Fig. 1, is a deep classification net (11790 parameters), trained with the cross-entropy loss function in Eq. 2.
\mathcal{L}_{CE} = -\sum_{c=1}^{M} \mathbb{1}(y=c)\,\log p(\hat{y}=c \mid x)   (2)
where $\hat{y}$ and $y$ are the predicted and true labels respectively, $M$ is the total number of label categories, $\mathbb{1}(\cdot)$ is the indicator function, and $p$ is the predicted distribution. This part is composed of a multilayer bidirectional RNN structure along with multiple dense layers. The attention window, introduced by Bahdanau et al. [8], is polynomially lumped on the RNN cell outputs of every layer except the last; that is, the span of the attention window increases from the bottom layer to the top layer, so the model learns from details up to the overall picture. This portion reuses the output of the first portion and gives a prediction for each sample.
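The cross-entropy loss of Eq. 2 reduces to the negative log-probability assigned to the true category, since the indicator selects exactly one term of the sum; a minimal NumPy sketch:

```python
import numpy as np

def cross_entropy(probs, label, eps=1e-12):
    """Eq. 2: -sum_c 1(y=c) * log p(y_hat=c). `probs` is the predicted
    distribution over the M categories, `label` the true class index;
    `eps` guards against log(0)."""
    return -np.log(probs[label] + eps)

# Predicted distribution over M = 4 categories (Asys, Tachy, VF/VFL, VT).
probs = np.array([0.1, 0.1, 0.7, 0.1])
loss = cross_entropy(probs, label=2)   # true class: VF/VFL
```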
In most cases, a VAE serves as a traditional generative model. In our case, it is used to generate a robust latent representation and to suppress noise in each sample: as noted in Kingma et al.'s study [7], the VAE constrains the distribution of each latent cluster and learns a distribution, rather than a deterministic representation, over the training set. In a later section, a comparative test is presented to demonstrate the effectiveness of the VAE representation in this study.
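As a concrete sketch of Eq. 1 (assuming, as is common, a Gaussian encoder with a standard normal prior and a squared-error reconstruction term; the paper does not specify its exact likelihood), the reparameterized sampling and the two loss terms can be written as:

```python
import numpy as np

rng = np.random.default_rng(0)

def vae_loss(x, mu, log_var, decode):
    """Eq. 1: reconstruction loss plus KL latent loss for a Gaussian
    encoder N(mu, sigma^2) against a standard normal prior."""
    # Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I).
    sigma = np.exp(0.5 * log_var)
    z = mu + sigma * rng.standard_normal(mu.shape)
    # Reconstruction loss (squared error stands in for -log p(x|z)).
    recon = np.sum((x - decode(z)) ** 2)
    # Latent loss: KL(N(mu, sigma^2) || N(0, I)) in closed form.
    kl = -0.5 * np.sum(1.0 + log_var - mu**2 - np.exp(log_var))
    return recon + kl, recon, kl

# Toy check: identity decoder on an 8-dimensional latent space.
x = rng.standard_normal(8)
total, recon, kl = vae_loss(x, x.copy(), np.zeros(8), decode=lambda z: z)
```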
Fig. 1 shows the 'base' network configuration, including: the kernel and stride sizes of filtering, marked on both sides of each operator arrow; the number of residual units (= 1); the dimension of the latent representation (= 8); the hidden unit size in each RNN cell (= 10); the number of layers in the bidirectional RNN network (= 4); and the smallest attention window size lumped on the RNN output (= 3). Later on, this 'base' setup is modified in the comparative test.
II-D Method to train the network
After preprocessing, the original dataset, namely the gray-scale images representing the 2D spectra of the signals, is randomly divided into several subsets (e.g., 5 folds) within each arrhythmia type. The network is tested on one subset after aggregating the samples from the different arrhythmia types; the remaining data are organized into mini-batches and fed directly into the model for training, after oversampling the underrepresented arrhythmia categories. This data-preparation design addresses the imbalance of the dataset in this study, namely the large differences in the number of samples falling into each arrhythmia category. The mini-batch size is 140 in this study.
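The per-class fold split and the oversampling step can be sketched as follows; topping each minority class up to the size of the largest one is one simple oversampling choice, not necessarily the paper's exact scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_folds(labels, n_folds=5):
    """Assign a fold index to every sample, splitting within each
    arrhythmia type so each fold preserves the class mix."""
    folds = np.empty(len(labels), dtype=int)
    for cls in np.unique(labels):
        idx = rng.permutation(np.flatnonzero(labels == cls))
        folds[idx] = np.arange(len(idx)) % n_folds
    return folds

def oversample(indices, labels):
    """Randomly oversample (with replacement) each class among the
    training indices up to the size of the largest class."""
    per_class = {c: np.flatnonzero(labels[indices] == c)
                 for c in np.unique(labels[indices])}
    target = max(len(pos) for pos in per_class.values())
    out = [indices[rng.choice(pos, size=target, replace=True)]
           for pos in per_class.values()]
    return np.concatenate(out)
```

Training batches of 140 samples would then be drawn from the oversampled index list.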
As mentioned in Section II-C, the whole net is trained in two separate stages: the first part is trained until a certain cost threshold is met, then it is frozen and training of the second part begins. In the VAE, the cost function combines the two terms of Eq. 1; from them, a fraction can be computed representing the portion of the latent loss in the total loss at each iteration. In this study, this fraction is coupled into the training procedure to govern the sampling from the latent Gaussian distribution. At the very beginning, the reconstruction loss takes a large portion of the total loss, so the fraction is small and the latent variable $z$ is sampled nearly deterministically around the encoder mean rather than from the full latent Gaussian; in this manner, the model converges faster to a potential cluster center within a few epochs, and the total loss decreases rapidly. Later on, the latent loss gradually takes the dominant portion, and the regularizer effect starts working.
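One plausible reading of this fraction-driven sampling (an assumption on our part; the text does not spell out the exact scheme) is to scale the sampling noise by the latent-loss fraction, so early iterations are nearly deterministic and later ones sample the full Gaussian:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_latent(mu, sigma, recon_loss, latent_loss):
    """Draw z from the latent Gaussian, with the noise scaled by the
    fraction of latent loss in the total loss (assumed scheme)."""
    frac = latent_loss / (recon_loss + latent_loss)
    # frac ~ 0 early in training -> z ~ mu (fast convergence to a
    # cluster center); frac grows later -> full N(mu, sigma^2) sampling.
    return mu + frac * sigma * rng.standard_normal(mu.shape)
```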
II-E Method to deploy this model
The trained model is encapsulated in a Docker image and deployed to either an edge compute server or a virtual private cloud (VPC). During initialization of the Docker container on an edge server, a local working directory can be mounted, and data can be fed to the model through local file I/O. For the VPC case, a RESTful web API can be provided by Flask or a similar Python component as the main data I/O.
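A minimal Dockerfile sketch for such an encapsulation (the base image, file names, and serving script are illustrative assumptions, not the paper's actual artifacts):

```dockerfile
# Illustrative only: base image, file names and entry point are assumptions.
FROM python:3.10-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt   # e.g. flask, numpy
COPY model/ model/
COPY serve.py .
# serve.py would expose a Flask RESTful endpoint for the VPC case, or read
# from a mounted directory (docker run -v host_dir:/data) on an edge server.
EXPOSE 5000
CMD ["python", "serve.py"]
```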
III Experiment and result
Fig. 2. A sample waveform, its 2D spectrum, and its latent representation. The dimension of the latent space equals 8 in this setup. In the latent representation, each line represents one coordinate of the latent space: its amplitude gives the mean, and its width the standard deviation, of a Gaussian distribution. The whole picture thus shows the propagation of the latent representation over time.
TABLE II
Type        | Asys  | Tachy | VF/VFL  | VT
# PP/Tot    | 48/52 | 19/20 | 398/444 | 335/372
Sensitivity | 0.92  | 0.95  | 0.89    | 0.90
Precision   | 0.74  | 0.70  | 0.96    | 0.90
TABLE III
Type        | Asys  | Tachy | VF/VFL  | VT
# PP/Tot    | 46/52 | 16/20 | 399/444 | 345/372
Sensitivity | 0.88  | 0.80  | 0.89    | 0.93
Precision   | 0.82  | 0.72  | 0.94    | 0.89
The model is trained and tested with 5-fold cross-validation. Fig. 2 shows intermediate plots for a VT sample and a Tachy sample, including the waveform before and after pre-filtering, the corresponding 2D spectrum, and the corresponding latent representation. Tables II and III give the performance rates on the test sets using the so-called 'base' setup of Section II-C. These tables show that the proposed network has promising sensitivity in detecting these arrhythmias, and good precision for the two ventricular arrhythmias.
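Assuming '# PP' counts correctly detected events out of 'Tot' true events per class, the sensitivity rows of Tables II and III follow the standard definitions (precision additionally needs the false-positive count, which the tables do not list directly):

```python
def sensitivity(tp, fn):
    """Fraction of true events detected: TP / (TP + FN)."""
    return tp / (tp + fn)

def precision(tp, fp):
    """Fraction of detections that are correct: TP / (TP + FP)."""
    return tp / (tp + fp)

# Asys column of Table II: 48 of 52 true events detected.
sens_asys = sensitivity(48, 52 - 48)   # 0.923..., reported as 0.92
```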
Moreover, a comparative experiment was conducted, recording the model's accuracy on the test set for comparative plotting. In this experiment, different network-parameter setups were tested, e.g., the number of hidden units per RNN cell and the span of the attention window; in addition to these hyperparameters, two methods of obtaining the latent representation were compared: learning a statistical distribution with the VAE versus retrieving a deterministic latent variable from a simple dense projection. The results are shown in Fig. 3. The figure shows that learning a latent distribution with the VAE significantly boosts both the speed of convergence and the accuracy, whereas the other configuration choices make no significant difference to model performance.
IV Conclusion
In this study, a brand-new deep network structure is proposed to annotate arrhythmia in ECG signals. The network includes two parts: one extracts a latent representation from the feature vector at each time spot, and the other predicts the arrhythmia categories of interest. The network achieves around 90% sensitivity for asystole, ventricular flutter/fibrillation, and ventricular tachycardia, and over 80% sensitivity for all four arrhythmias including supraventricular tachycardia. The test results also show good precision rates for the two ventricular arrhythmias. The network uses a VAE to extract the latent representation, which serves as a warm initialization for the classifier; this method is shown to significantly improve both the speed of convergence and the accuracy.
References
[1] S. Kiranyaz, T. Ince, and M. Gabbouj, "Real-time patient-specific ECG classification by 1-D convolutional neural networks," IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2016.
[2] M. M. Al Rahhal, Y. Bazi, H. Al-Hichri, N. Alajlan, F. Melgani, and R. R. Yager, "Deep learning approach for active classification of electrocardiogram signals," Information Sciences, vol. 345, pp. 340–354, 2016.
[3] P. Rajpurkar, A. Y. Hannun, M. Haghpanahi, C. Bourn, and A. Y. Ng, "Cardiologist-level arrhythmia detection with convolutional neural networks," arXiv preprint arXiv:1707.01836, 2017.
[4] A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, "PhysioBank, PhysioToolkit, and PhysioNet," Circulation, vol. 101, no. 23, pp. e215–e220, 2000.
 [5] S. D. Greenwald, “The development and analysis of a ventricular fibrillation detector,” Ph.D. dissertation, Massachusetts Institute of Technology, 1986.

[6] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 770–778.
[7] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," arXiv preprint arXiv:1312.6114, 2013.
 [8] D. Bahdanau, K. Cho, and Y. Bengio, “Neural machine translation by jointly learning to align and translate,” arXiv preprint arXiv:1409.0473, 2014.