1 Introduction
Atrial fibrillation (AF) is the most common cardiac arrhythmia, with a worldwide prevalence of around 1–2% developed2010guidelines . It is also estimated that by 2030 some 14–17 million patients in the European Union alone will suffer from AF zoni2014epidemiology . AF is associated with an increased risk of stroke (5-fold), blood clots, heart failure, coronary artery disease, and death (2-fold) developed2010guidelines . Therefore, developing automatic algorithms for the early detection of AF is crucial.
During AF, the atrial muscle fibers exhibit chaotic electrical activity and may emit impulses at rates of up to 500 bpm toward the atrioventricular (AV) node, which passes them on at random. This results in an irregular ventricular response, which is one of the main characteristics of AF thaler2017only . In addition, AF has the following characteristics on the electrocardiogram (ECG): 1) “absolutely” irregular RR intervals; 2) the absence of P waves; and 3) variable atrial cycle length (when visible).
The analysis of the ECG is the most common approach to AF detection, and over the past ten years various algorithms have been developed for automatic AF detection bruser2013automatic ; yaghouby2010towards ; asgari2015automatic ; mohebbi2008detection ; zabihi2017detection ; ANNAVARAPU2016151 ; BABAEIZADEH2009522 ; GARCIA2016157 ; HAGIWARA201899 . Most of the existing algorithms follow a traditional pipeline of preprocessing, feature extraction, and classification. Recent deep learning (DL) techniques lecun2015deep also provide a promising framework for end-to-end classification. In contrast to traditional approaches, one of the most significant advantages of using deep learning for classification is that hand-crafted features are no longer needed, because deep neural networks are able to learn the inherent features when provided with sufficient training data goodfellow2016deep . Somewhat surprisingly, applications of deep learning to AF detection have emerged only in the past few years (see, e.g., rajpurkar2017cardiologist ; shashikumar2017deep ; xia2018detecting ; pourbabaee2017deep ; 8331569 ). For ECG signals, one can directly adopt 1D convolutional or recurrent network models for the classification task. However, transforming the signals into the spectral domain (spectro-temporal features) is a promising alternative, given that the current state-of-the-art deep convolutional neural network (CNN) architectures are typically designed for 2D images. Deep CNNs such as AlexNet NIPS2012_4824 , Inception-v4 AAAI1714806 , and DenseNet huang2017densely have proved their superiority in image classification.
Among the previous studies, only a few have resorted to the time-varying spectrum for AF detection. The reasons might be the following. First, it is not easy to select hand-crafted features from 2D data with traditional classifiers. Second, the temporal features of a spectrogram are usually hard to capture, even in a DL setting. Several studies xia2018detecting ; zihlmann2017convolutional have applied DL to AF detection in the spectral domain, but traditional spectral estimation methods such as the short-time Fourier transform (STFT) and the continuous wavelet transform (CWT) may discard important information during the transformation and thus produce less informative input data. To address these problems, it is beneficial to consider new spectro-temporal estimation methods that better retain the temporal features.
The contributions of this paper are: 1) We propose two extended models for spectro-temporal estimation using the Kalman filter and smoother, and combine them with deep convolutional networks for AF detection. 2) We test and compare the performance of the proposed spectro-temporal estimation approaches on simulated data and on AF detection against other popular estimation methods and different classifiers. 3) For AF detection, we evaluate the proposals on the PhysioNet/CinC 2017 dataset clifford2017af , which is considered a challenging dataset that resembles practical applications, and our results are in line with the state-of-the-art.
This paper is an extended version of our previous conference paper “Spectro-temporal ECG Analysis for Atrial Fibrillation Detection” zhao2018spectro presented at the 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing. In addition to the original contributions of the conference article, we here use a new stochastic oscillator model and show that the spectro-temporal estimation can also be implemented with a steady-state (stationary) Kalman filter and smoother, which leads to a significant reduction in computation time without losing estimation accuracy. We demonstrate this on both simulated data and AF data classification. Moreover, while the conference paper only showed a few comparisons among estimation methods and classifiers, we expand these to a wide range of both standard and modern classifiers (e.g., Random Forests, CNNs, and DenseNet) for a more solid illustration of the classification performance.
The paper is structured as follows: In Section 2, we propose spectro-temporal methods for ECG signal analysis. In Section 3, we apply the proposed estimation method to AF detection using an averaging procedure. In Section 4, we compare and discuss experimental results on both simulated data and the ECG dataset, followed by conclusions in Section 5.
2 Spectro-Temporal Estimation Methods
Spectro-temporal signal analysis is an effective and powerful approach that is used in many fields, ranging from biosignal analysis rad2017ecg and audio processing rad2012phase to weather forecasting ehrendorfer2012spectral and stock market prediction joseph2017daily . In ECG analysis, the temporal evolution of spectral information can be captured in a spectro-temporal data representation, which can convey important information about the underlying biological processes of the heart.
In this section, we develop new methods for spectro-temporal estimation. We first introduce a Fourier series model based on the Bayesian spectrum estimation method of Qi et al. qi2002bayesian , and put Gaussian process priors on the Fourier coefficients. Then, by adopting the ideas presented in SARKKA20121517 , we convert the Fourier series into a more flexible stochastic oscillator model and use a fast stationary Kalman filter/smoother for its estimation. Finally, we demonstrate the estimation performance on simulated data.
2.1 Kalman-based Fourier Series Model for Spectro-Temporal Estimation
Apart from the traditional STFT and CWT methods, spectro-temporal analysis can also be done by modeling the signal as a stochastic state-space model and resorting to the Bayesian procedure (i.e., the Kalman filter and smoother) for its estimation sarkka2013bayesian ; qi2002bayesian . The key advantages of this kind of approach over other spectro-temporal methods are that it can be applied to both evenly and unevenly sampled signals qi2002bayesian and that it requires neither stationarity assumptions nor windowing. Furthermore, as we show here, it can also be combined with state-space methods for Gaussian processes Hartikainen+Sarkka:2010 ; Sarkka+Solin+Hartikainen:2013 .
Recall that any periodic signal with fundamental frequency can be expanded into a Fourier series
(1) 
where the exact representation is obtained with , but for sampled (and thus band-limited) signals it is sufficient to consider a finite series. This stationary model is the underlying model in the STFT approach. STFT applies a window to each signal segment and finds a least squares fit (via the discrete Fourier transform) to the coefficients .
In our approach, we start by assuming that the coefficients depend on time, and we put Gaussian process priors on them:
(2) 
As shown in Hartikainen+Sarkka:2010 ; Sarkka+Solin+Hartikainen:2013 , provided that the covariance functions are stationary, we can express the Gaussian processes as solutions to linear stochastic differential equations (SDEs). We choose the covariance functions to have the form
(3) 
where are scale parameters and are the inverses of the time constants (length scales) of the processes.
The state-space representations (which are scalar in this case) are then given as
(4) 
where are Brownian motions with suitable diffusion coefficients . We can also solve the equations at discrete time steps (see, e.g., Grewal+Andrews:2001 ) as
(5) 
where
(6) 
Let us now assume that we obtain noisy measurements of the Fourier series (1) at times . What we can now do is define a state vector which stacks all the coefficients and . In this way, we can write , which leads to (7) 
We can also rewrite the dynamic model (5) as
(8) 
where contains the terms and on the diagonal and where contains the terms and on the diagonal.
If we assume that we actually measure (7) with additive Gaussian measurement noise , then we can express the measurement model as
(9) 
Equations (8) and (9) define a linear state-space model in which we can perform exact Bayesian estimation using the Kalman filter and smoother sarkka2013bayesian . In the original paper qi2002bayesian , the state vectors are assumed to perform a random walk, but here the key insight is to use a more general Gaussian process, which introduces a finite time constant into the problem. Although we have chosen a quite simple Gaussian process model for this purpose, it would also be possible to use more general Gaussian process priors for the coefficients, such as state-space representations of the Matérn or squared exponential covariance functions Hartikainen+Sarkka:2010 ; Sarkka+Solin+Hartikainen:2013 .
The Kalman filter for this problem then consists of the following forward recursion (for ):
(10) 
and the RTS smoother consists of the following backward recursion (for ):
(11) 
The final posterior distributions are then given as:
(12) 
The magnitude of the sinusoid with frequency at time step can then be computed by extracting the elements corresponding to and from the mean vector :
(13) 
From now on, the matrix is called the spectro-temporal data matrix.
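To make the recursions above concrete, the following is a minimal NumPy sketch of the FourierKS idea. It is a sketch under simplifying assumptions, not the exact model of this paper: each Fourier coefficient is given a scalar exponentially-decaying (Ornstein–Uhlenbeck-type) prior discretized as in (5), all coefficients share the hand-picked parameters `lam` (inverse length scale), `q` (process noise variance), and `r` (measurement noise variance), and the function name is ours.

```python
import numpy as np

def fourier_ks_spectrogram(y, freqs, dt, lam=0.1, q=0.01, r=1.0):
    """Kalman filter + RTS smoother over time-varying Fourier coefficients.

    State: [a_1, b_1, ..., a_K, b_K] (cosine/sine coefficient per frequency),
    each following an independent scalar process with transition exp(-lam*dt)
    and process noise variance q.
    """
    K = len(freqs)
    n = 2 * K
    A = np.exp(-lam * dt) * np.eye(n)       # discrete dynamic model (5)
    Q = q * np.eye(n)                        # process noise covariance
    t = np.arange(len(y)) * dt

    m, P = np.zeros(n), np.eye(n)
    ms, Ps, mps, Pps = [], [], [], []
    for k, yk in enumerate(y):
        # time-varying measurement row: [cos(w_j t_k), sin(w_j t_k), ...]
        H = np.empty(n)
        H[0::2] = np.cos(2 * np.pi * freqs * t[k])
        H[1::2] = np.sin(2 * np.pi * freqs * t[k])
        mp = A @ m                           # predict
        Pp = A @ P @ A.T + Q
        S = H @ Pp @ H + r                   # update (scalar innovation)
        Kg = Pp @ H / S
        m = mp + Kg * (yk - H @ mp)
        P = Pp - np.outer(Kg, Kg) * S
        ms.append(m); Ps.append(P); mps.append(mp); Pps.append(Pp)

    # RTS smoother backward pass
    msm = ms[-1].copy()
    out = [msm]
    for k in range(len(y) - 2, -1, -1):
        G = Ps[k] @ A.T @ np.linalg.inv(Pps[k + 1])
        msm = ms[k] + G @ (msm - mps[k + 1])
        out.append(msm)
    out.reverse()
    # magnitude of each sinusoid: sqrt(a^2 + b^2), as in (13)
    return np.array([np.hypot(m[0::2], m[1::2]) for m in out])
```

Running this on a sum of sinusoids and plotting the returned matrix (time steps by frequencies) gives a spectrogram-like image analogous to the spectro-temporal data matrix described above.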
2.2 Oscillator Model for Spectro-Temporal Estimation
In practice, the computational cost of the Kalman filter and smoother can be substantial when the signal is very long. However, instead of the Fourier series state-space model of the previous section, one can also derive an alternative representation using stochastic oscillator differential equations. In this way, the dynamic and measurement models become linear time-invariant (LTI), so that we can leverage a stationary Kalman filter to reduce the computation time. This kind of stochastic oscillator model was also considered in SARKKA20121517 , and the link to periodic Gaussian process models was investigated in solin14 .
A single quasi-periodic stochastic oscillator can be described with the following stochastic differential equation model solin14 :
(14) 
where and the Brownian motion has a suitably chosen diffusion matrix solin14 . By solving the SDE in discrete time steps, we have
(15) 
where and are given by:
(16) 
where .
A general quasi-periodic signal can be modeled using a superposition of stochastic oscillators of the above form solin14 . If we construct , then the resulting time-invariant model can be written as:
(17) 
where , and are defined as:
(18)  
(19) 
In this model, the first component of the state is a slowly drifting Brownian motion with diffusion coefficient , modeling the possible non-zero mean of the signal.
The estimation problem can be solved with a Kalman filter and smoother. However, because the model is LTI, the Kalman filter is known to converge to a steady-state Kalman filter Kailath:233814 . The steady-state Kalman filter can be obtained by solving the following discrete algebraic Riccati equation (DARE) for the limit covariance :
(20) 
A positive-semidefinite solution to the equation is known to exist provided that the pair is detectable Kailath:233814 .
Thus we can obtain by solving DARE in (20), and the stationary Kalman filter for the forward mean propagation is:
(21) 
where the stationary gain is
(22) 
The corresponding smoother turns out to converge to its steady state as well, and the backward propagation of the resulting steady-state smoother is:
(23) 
where the gain is computed as
(24) 
In this way, the Kalman gain and covariances need not be computed at every time step, which reduces the computational cost significantly. The disadvantage is that the DARE must be solved in order to construct the stationary filter and smoother, which adds some overhead.
After computing the estimates for each time step, we can extract the estimates of and and use (13) to compute the spectro-temporal data matrix.
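As a hedged illustration of this steady-state machinery, the sketch below builds a bank of damped stochastic oscillators, solves the DARE with SciPy's `solve_discrete_are`, and runs the fixed-gain forward mean recursion. The block structure and the parameter values (`lam`, `q`, `r`) are illustrative choices, not the exact model of this section.

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def oscillator_model(freqs, dt, lam=0.5, q=0.01):
    """Block-diagonal discrete-time model for a bank of damped stochastic
    oscillators: one damped 2x2 rotation block per frequency."""
    n = 2 * len(freqs)
    A = np.zeros((n, n))
    for i, f in enumerate(freqs):
        w = 2.0 * np.pi * f
        rot = np.array([[np.cos(w * dt),  np.sin(w * dt)],
                        [-np.sin(w * dt), np.cos(w * dt)]])
        A[2 * i:2 * i + 2, 2 * i:2 * i + 2] = np.exp(-lam * dt) * rot
    Q = q * np.eye(n)                 # process noise covariance
    H = np.zeros((1, n))
    H[0, 0::2] = 1.0                  # measure the sum of the oscillators
    return A, Q, H

def stationary_gain(A, Q, H, r):
    """Solve the DARE for the limiting predicted covariance and return the
    stationary Kalman gain K = P H^T (H P H^T + r)^{-1}."""
    P = solve_discrete_are(A.T, H.T, Q, np.array([[r]]))
    S = H @ P @ H.T + r
    return (P @ H.T) / S, P

def stationary_filter(y, A, H, K):
    """Forward mean recursion with the precomputed fixed gain: no per-step
    covariance updates are needed."""
    m = np.zeros(A.shape[0])
    out = []
    for yk in y:
        mp = A @ m                                   # predict
        m = mp + K[:, 0] * (yk - (H @ mp).item())    # update, fixed gain
        out.append(m.copy())
    return np.array(out)
```

Because the DARE is solved once while the filtering loop only involves matrix-vector products, this mirrors the computational advantage discussed above.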
2.3 Estimation Trials on Simulated Data
A quantitative evaluation of the proposed spectro-temporal methods for ECG classification is given in Sections 4 and 5.2. In this section, we visually inspect the proposed spectro-temporal representations on simulated data and compare them with other standard time-frequency approaches such as STFT, CWT, and BurgAR.
To avoid confusion in terminology, from now on we refer to the proposals of Sections 2.1 and 2.2 as FourierKS and OscKS, respectively.
We simulated a noise-observed multi-sinusoidal signal as shown in (25) and Fig. 1 with time step and .
(25) 
In Fig. 2, we plot the time-varying spectrum estimates obtained with FourierKS, OscKS, STFT, CWT, and BurgAR. The estimation settings used here are described in the figure captions.
Although all methods approximate the simulated data to a good extent, FourierKS and OscKS have higher frequency resolution with a less noisy representation, which can help to extract more robust features from the spectro-temporal representation. Moreover, the results of the FourierKS and OscKS methods are almost the same, although they are based on different state-space models.
Table 1: Mean CPU times (in seconds) of the estimation methods over 20 runs for two signal lengths.

Method          | FourierKS | OscKS | CWT  | STFT | BurgAR
Shorter signal  | 3.39      | 0.18  | 0.08 | 0.07 | 0.36
Longer signal   | 9.18      | 0.95  | 1.32 | 0.30 | 2.58
To verify the computational efficiency of the stationary proposal of Section 2.2, we run each estimation method 20 times and record the mean CPU time. We test with and to control the length of the signal. The results in Table 1 clearly show that the time reduction from FourierKS (3.39 s, 9.18 s) to OscKS (0.18 s, 0.95 s) is significant. For the OscKS method, solving the DARE takes 0.09 s, which accounts for almost half of the total time (0.18 s). To reduce the time further, one can resort to better DARE solvers or a lower resolution on the frequency axis. For the longer signal (i.e. ), the OscKS method (0.95 s) becomes faster than CWT (1.32 s), which indicates competitive efficiency for long signals.
3 Materials and Methods for ECG Classification
3.1 ECG Dataset
In the AF experiments, we used the ECG dataset provided by the PhysioNet/CinC Challenge 2017 clifford2017af . In total, 8528 short single-lead ECG recordings were collected using AliveCor handheld devices. The recordings were uploaded automatically through an application on the user’s mobile phone. The data were sampled at 300 Hz and band-pass filtered by the AliveCor devices. The durations of the ECG recordings range from 9 s to 61 s with a median of 30 s. The distribution of ECG recordings among the classes is as follows: Normal (5076 recordings), AF (758), Other (2415), and Noisy (279).
3.2 ECG SpectroTemporal Feature Engineering
Our aim is now to find spectro-temporal features of the ECG signals such that they can be classified by deep convolutional neural networks (CNNs). Fig. 3 shows the overall proposed scheme from input (ECG) to output (predicted label).
The first step is QRS detection and ECG segmentation, in which the raw ECG signal is divided into fixed-length segments aligned by their central R peaks. Next, the spectro-temporal data matrix for each segment is calculated using (13). The data matrices are then averaged and normalized to generate a fixed-length spectro-temporal feature matrix. In the final step, the 2D feature matrix (spectro-temporal image) is fed into a deep CNN for classification.
The logic behind the segmentation and averaging steps of the feature engineering procedure (dashed area in Fig. 3) is threefold. First, they handle the problem of ECG recordings of different lengths and generate fixed-length spectro-temporal feature matrices. Second, they capture enough information from the ECG recording for classification by CNNs. For example, since the central R peaks of all segments are aligned, after averaging we expect sharp edges corresponding to the QRS complexes in the feature matrices (spectro-temporal images) of Normal rhythms. For AF rhythms, in contrast, we expect blurred areas in the spectro-temporal images due to the variable RR intervals. For noisy segments we do not expect any clear QRS area, and for the Other class one can expect different patterns in the spectro-temporal images depending on the underlying arrhythmia (see Fig. 4). Third, the segmentation and averaging steps decrease the effect of noise in the ECG recordings. In the following, we discuss the steps of the feature engineering in detail.
In this work, we use a modified version of the Pan-Tompkins algorithm for QRS detection. The original Pan-Tompkins algorithm pan1985real is sensitive to burst noise and easily misinterprets noise as R peaks. To address this limitation, at least partially, we slightly modify the original algorithm such that it iteratively checks the number of detected R peaks: if that number is smaller than a threshold, it discards the detected R peaks, masks out their neighbouring samples in the ECG signal, and applies the Pan-Tompkins algorithm again to the rest of the signal. In this way, the algorithm can handle a few instances of high-amplitude burst noise. An example illustrating this modification is shown in Fig. 5.
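The iterative re-detection idea can be sketched as below; note that `scipy.signal.find_peaks` merely stands in for the Pan-Tompkins detector, and the amplitude threshold, retry count, and masking half-width are illustrative assumptions rather than the actual settings used in this work.

```python
import numpy as np
from scipy.signal import find_peaks

def robust_r_detect(x, fs, min_peaks=5, mask_half=None):
    """If too few peaks are found (likely because a high-amplitude burst
    dominates the detector's threshold), blank out the detected locations
    and run the detector again on the remainder of the signal."""
    if mask_half is None:
        mask_half = int(0.1 * fs)            # samples blanked on each side
    x = np.array(x, dtype=float)
    peaks = np.array([], dtype=int)
    for _ in range(3):                       # bounded number of retries
        peaks, _ = find_peaks(x,
                              height=0.5 * np.max(np.abs(x)),
                              distance=int(0.3 * fs))
        if len(peaks) >= min_peaks:
            return peaks
        for p in peaks:                      # blank the suspect bursts
            x[max(0, p - mask_half):p + mask_half] = 0.0
    return peaks
```

On a clean R-peak train corrupted by a single large spike, the first pass locks onto the spike; after blanking it, the second pass recovers the true peaks.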
The next step is segmentation, in which fixed-length ECG segments are extracted from the original signal such that each segment potentially covers three QRS complexes. The segmentation proceeds as follows: if is the original ECG signal and is the position of the th R peak in , then holds the positions of all R peaks, and is the total number of R peaks in . To extract the ECG segments, we associate each , , with a segment of that potentially covers three adjacent QRS complexes. To do so, we collect samples before and after each . Following this procedure, the ECG segment associated with the th R peak is extracted from as , and using equation (13), the spectro-temporal data matrix corresponding to this segment is , where and are the numbers of frequency and time steps, respectively. It is worth noting that these two parameters (i.e., and ) determine the size of the matrix in (13). The choice of the segment-length parameter is important, as it regulates the length of each segment and thus how much of the signal is taken into the average. Usually, it should cover at least three QRS complexes to provide good evidence of the RR intervals.
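A minimal sketch of this segmentation step (the function and parameter names are ours: `half_len` is the number of samples collected on each side of an R peak, and peaks too close to the signal boundary are simply skipped so that every segment has the same length):

```python
import numpy as np

def segment_around_peaks(x, r_peaks, half_len):
    """Extract fixed-length segments of 2*half_len + 1 samples, each
    centred on a detected R peak."""
    segs = []
    for r in r_peaks:
        if r - half_len >= 0 and r + half_len < len(x):
            segs.append(x[r - half_len:r + half_len + 1])
    return np.array(segs)
```

Each returned row is one segment, ready for the per-segment spectro-temporal estimation of (13).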
The spectro-temporal feature matrix is obtained by averaging over all spectro-temporal data matrices and multiplying with their maximum mask:
(26) 
The reason for including the maximum mask in Equation (26) is that it can, at least to a certain extent, help preserve intricate details of the spectro-temporal data that would otherwise be lost when averaging across the segments, and it also normalizes the data.
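Since the exact form of (26) is not reproduced here, the following is a hedged reading of the averaging step: the element-wise mean of the per-segment spectro-temporal matrices is multiplied by their element-wise maximum mask and then normalized to [0, 1].

```python
import numpy as np

def average_with_max_mask(mats):
    """mats: array of shape (n_segments, n_freqs, n_times), assumed to
    contain non-negative spectro-temporal magnitudes.  The mean across
    segments is weighted by the point-wise maximum and rescaled."""
    mats = np.asarray(mats)
    feat = mats.mean(axis=0) * mats.max(axis=0)
    return feat / feat.max()
```

The max mask boosts time-frequency cells that were strongly active in at least one segment, which is one plausible way averaging-induced blurring of sharp QRS edges could be counteracted.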
3.3 Classification
Over the past ten years, deep learning techniques, especially convolutional neural networks, have achieved great success in detection and classification tasks. Compared with 1D CNN models, CNNs for 2D image applications have progressed much further. The aim here is to leverage such advanced CNNs for AF classification using the time-varying spectrum (which is an image).
However, one flaw of most current network models is that information, principally the gradient, may vanish during training if the network is exceedingly deep (has many layers); this is usually called the “vanishing gradient” problem glorot2010understanding . This problem can be alleviated in several basic ways, for instance with pre-training, residual connections, or properly selected activation functions (e.g., one should not place a ReLU before batch normalization).
Densely connected convolutional networks (DenseNets) huang2017densely , which won the CVPR 2017 best paper award, provide state-of-the-art performance without degradation or overfitting even when stacked to hundreds of layers. DenseNets can be seen as refined versions of deep residual networks (ResNets) he2016deep : the former introduce explicit connections from every layer to all preceding layers within a dense block, rather than only between adjacent layers, as shown in Fig. 6. An additional advantage of DenseNet, as mentioned in huang2017densely , is feature reuse.
Considering a network of layers and an image input , the output of the th layer is:
(27)  
(28) 
where and are the layer operations (e.g., convolution, batch normalization, or activation) of ResNet and DenseNet, respectively, and is the output of the th layer.
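The difference between the residual connection (27) and the dense connection (28) can be illustrated framework-free. Here `toy_layer` is only a stand-in for the composite operation (convolution, batch normalization, activation), so the point is purely the wiring: ResNet adds a layer's input back to its output, while DenseNet feeds every layer the concatenation of all preceding feature maps, so the feature width grows layer by layer.

```python
import numpy as np

def toy_layer(x):
    """Stand-in for the composite operation H_l; a plain ReLU suffices to
    demonstrate the connectivity pattern."""
    return np.maximum(x, 0.0)

def resnet_forward(x, n_layers):
    # Residual connection, eq. (27) style: x_l = H_l(x_{l-1}) + x_{l-1}
    for _ in range(n_layers):
        x = toy_layer(x) + x
    return x

def densenet_forward(x, n_layers):
    # Dense connection, eq. (28) style: each layer receives the
    # concatenation of ALL preceding feature maps along the channel axis
    feats = [x]
    for _ in range(n_layers):
        x = toy_layer(np.concatenate(feats, axis=-1))
        feats.append(x)
    return np.concatenate(feats, axis=-1)
```

With an input of width 3 and two layers, the residual output keeps width 3, whereas the dense output grows to width 12 (3 + 3 + 6), which is exactly the feature-reuse behaviour described above.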
The DenseNet we implement here, which we refer to as Dense18, differs slightly from the original proposal huang2017densely in that we employ both max and average global pooling on the last layer and concatenate them, as shown in Table 2. In our application, because of the size of the input, we remove the initial downsampling max-pooling layer. Each dense block contains four convolutional layers, with a growth rate of 48 and a reduction rate of 0.5.

Table 2: The Dense18 architecture.

Layer Name                               | Structure | Output Size
Input                                    |           | (50, 50, 1)
Convolution                              |           | (50, 50, 64)
Dense Block 1                            |           | (50, 50, 256)
Transition 1                             |           | (25, 25, 128)
Dense Block 2                            |           | (25, 25, 320)
Transition 2                             |           | (12, 12, 160)
Dense Block 3                            |           | (12, 12, 352)
Transition 3                             |           | (6, 6, 176)
Dense Block 4                            |           | (6, 6, 368)
Global Pooling (max + avg, concatenated) |           | (736)
Output                                   | 4 classes | (4)
Table 3: Overall scores (%) for all combinations of time-frequency representation (rows) and classifier (columns).

Representation | Random Forest liaw2002classification | CNN18 | InceptionV3 Szegedy_2016_CVPR | ResNet18 he2016deep | ResNet34 he2016deep | DenseNet18 huang2017densely | Dense18
STFT      | 73.47 | 72.65 | 75.66 | 76.17 | 76.26 | 77.39 | 77.67
CWT       | 74.91 | 73.96 | 76.41 | 78.57 | 78.70 | 78.82 | 79.63
BurgAR    | 73.22 | 71.78 | 76.45 | 76.41 | 76.30 | 77.58 | 77.76
FourierKS | 75.99 | 72.74 | 77.48 | 78.05 | 77.99 | 79.50 | 80.24
OscKS     | 76.12 | 73.07 | 76.91 | 77.85 | 78.19 | 79.67 | 80.18
Table 4: Per-class F1 scores (%) and overall scores of the Dense18 classifier with the different time-frequency representations, compared with two published methods.

Method | F1 Normal | F1 AF | F1 Other | F1 Noisy | Overall | –
(1) STFT + Dense18 | 88.67 | 74.49 | 69.84 | 53.28 | 77.67 | 1.78
(2) CWT + Dense18 | 89.30 | 77.76 | 71.82 | 51.95 | 79.63 | 1.76
(3) BurgAR + Dense18 | 88.35 | 75.17 | 69.74 | 56.49 | 77.76 | 1.62
(4) FourierKS + Dense18 | 89.29 | 79.18 | 72.25 | 52.50 | 80.24 | 1.52
(5) OscKS + Dense18 | 89.09 | 79.78 | 71.68 | 55.86 | 80.18 | 1.55
(6) Martin zihlmann2017convolutional | 88.8 | 76.4 | 72.6 | 64.5 | 79.2 | N/A
(7) Zhaohan xiong2017robust | 87 | 80 | 68 | N/A | 78 | N/A
3.4 Model Assessment and Evaluation Criteria
To evaluate the performance of the proposed methods, we have conducted experiments on the ECG dataset described in Section 3.1. The classification performance of the different methods was assessed using the scoring mechanism recommended by the PhysioNet/Computing in Cardiology (CinC) Challenge 2017 clifford2017af over the whole dataset in a 10-fold cross-validation scheme. The data were partitioned such that the same proportion of each class is available in each fold (stratified cross-validation). Moreover, the F1 score,
(29) 
for each class is calculated to summarize the performance on that specific class: Normal (), AF (), Other (), and Noisy (). Then, as recommended by PhysioNet/CinC 2017, the overall evaluation metric is computed as follows:
(30) 
Finally, the detailed performance is shown by a 4-class confusion matrix whose diagonal entries are the correct classifications and whose off-diagonal entries are the misclassifications. This confusion matrix is the result of stacking the 10 confusion matrices of the test data in the 10-fold cross-validation.
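The evaluation metrics can be sketched as follows, assuming the class order Normal, AF, Other, Noisy and that the overall score (30) is the mean of the Normal, AF, and Other F1 scores, which is consistent with the overall values reported in Table 4.

```python
import numpy as np

def f1_scores(conf):
    """Per-class F1 from a square confusion matrix (rows: truth,
    columns: prediction): F1_c = 2*TP_c / (2*TP_c + FN_c + FP_c)."""
    conf = np.asarray(conf, dtype=float)
    tp = np.diag(conf)
    fn = conf.sum(axis=1) - tp
    fp = conf.sum(axis=0) - tp
    return 2 * tp / (2 * tp + fn + fp)

def cinc2017_score(conf):
    """Overall challenge metric: mean of the Normal, AF, and Other F1
    scores (Noisy excluded), with class order assumed to be N, A, O, P."""
    return f1_scores(conf)[:3].mean()
```

For a perfectly diagonal confusion matrix all per-class F1 scores are 1, so the overall score is 1 as well.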
4 Experiments
In principle, any time-frequency analysis method can be used for ECG classification. Therefore, in order to show the benefit of the proposed spectro-temporal methods of Section 2 over other standard time-frequency analysis methods, we have conducted experiments on the ECG dataset. We have compared the results of the proposed methods with the short-time Fourier transform (STFT), the continuous wavelet transform (CWT), and a classical power spectral density estimation method. To do so, we used the magnitude of the STFT, the magnitude of the CWT, and the square root of the non-logarithmic power spectral density of the ECG signal obtained with the Burg autoregressive model (BurgAR) kay1981spectrum to construct the feature matrices. In addition, several different convolutional architectures are examined, and their results are compared with a standard random forest (RF) classifier. The network structures of InceptionV3, ResNet, and DenseNet are taken from their original papers Szegedy_2016_CVPR ; he2016deep ; huang2017densely , but we removed the initial subsampling layer for a fair comparison with the Dense18 of Table 2. We also constructed a plain CNN (CNN18) that has the same structural settings as Dense18 but without the dense connections. For the random forest, we use 500 decision trees and a random selection of 50 features (out of 2500) at each node; at each node, the random forest minimizes the cross-entropy impurity measure. The settings for the spectro-temporal estimation are the same as described in Section 5.1. All spectro-temporal feature matrices (images) are then uniformly resized (downsampled by local averaging) to for the classifiers. With seven classifiers and five different time-frequency analysis methods, we have in total 35 different combinations, whose performance is reported in Table 3. As can be seen from this table, the best results (overall scores) belong to our proposed spectro-temporal representation methods (i.e., FourierKS and OscKS) with the Dense18 classifier. Moreover, Table 4 shows the per-class performance of the Dense18 classifier with the different time-frequency representations.
The detailed performance of all five methods (i.e., FourierKS, OscKS, CWT, STFT, and BurgAR) with the Dense18 classifier is reported in the five confusion matrices of Fig. 7. Each confusion matrix is row-wise normalized, so the diagonal entries show the recall of each rhythm and the off-diagonal entries show the misclassification rates. For example, the first row of the first confusion matrix shows that 92.1% of the normal rhythms are correctly classified as Normal, while 0.6%, 6.3%, and 1.0% are incorrectly classified as AF, Other, and Noisy, respectively.
5 Discussion
5.1 ECG TimeFrequency Analysis Methods
We first examine how the different spectro-temporal estimation methods perform on an ECG signal through visual inspection. As an example, we take the 3223rd recording (Rec. 3223) of the CinC 2017 dataset, which is labelled as AF and shown in Fig. 8(a). For the FourierKS and OscKS methods, we choose different frequency ranges () and smoothing options, as shown in Figs. 8(b), 8(c), and 8(d). We set the length scale to a constant 10, use 1 for the variance of the measurement noise , and the identity for the covariance of the process noise . In theory, could be different for each frequency, which could be used to improve the performance. Fig. 8(e) presents the results of the original method of qi2002bayesian , which adopts a Brownian motion model for the coefficients. For STFT and BurgAR, we apply 11 overlapping Hann windows of length 10 for the estimation, as shown in Figs. 8(f) and 8(h). For CWT (Fig. 8(g)), we use the default Morse wavelet implemented in Matlab.

First, we observe that the estimation results of FourierKS (Fig. 8(c)) and OscKS (Fig. 8(d)) are nearly the same, except that the base-frequency coefficient estimates of the OscKS method are very sensitive to . If we compare the FourierKS method with STFT, BurgAR, and CWT, which are shown in Figs. 8(c), 8(f), 8(h), and 8(g), respectively, we can draw several initial conclusions: the FourierKS result is smoother and has a higher and more uniform resolution in both time and frequency. For STFT and BurgAR, the resolution is confined by the window selection, length, and overlap. CWT untangles this problem by scaling and translating the wavelet basis function, but due to the uncertainty principle of wavelet signal processing Ricaud2014 , the required resolutions in time and frequency cannot be met simultaneously (see Fig. 8(g)). Our approaches model the time-varying Fourier series coefficients of the signal in state space and are therefore free from windows and wavelets.
Another advantage of the proposed OscKS estimation method is that it can be implemented very efficiently when the estimation needs to be performed many times with a fixed system (i.e., and remain unchanged). For example, with the averaging strategy, the spectrum estimation has to be done for every segment of every recording. With the OscKS method, we merely need to solve in (20) once. As stated in Section 2.2, the computational cost of the OscKS method is substantially reduced by deriving the steady-state covariance.
5.2 ECG Classification for AF Detection
As mentioned before, Table 3 shows that the best results belong to our proposed spectro-temporal representation methods (i.e., FourierKS and OscKS) with the Dense18 classifier. Table 3 also shows that, independent of the spectro-temporal representation method, Dense18 has the highest performance among all classifiers. In contrast, the plain CNN (CNN18) has the lowest scores. In addition, RF is generally worse than the convolutional network classifiers (except CNN18), probably because, in contrast to convolutional networks, RF does not benefit from the spatial structure of the spectro-temporal representation.
Regarding the different spectro-temporal representations, STFT and BurgAR give the worst results, while FourierKS and OscKS perform best. For some classifiers, CWT provides results that are as good as or even better than those of FourierKS and OscKS; however, the best results of FourierKS and OscKS exceed the best result of CWT.
Table 4 shows that the proposed ECG classification methods achieve the best results for the Normal rhythm and the worst for Noisy. The performance for AF and Other lies between these two, with AF typically performing better than Other, probably because Other is an umbrella term that covers many abnormal non-AF rhythms, and we do not have enough samples of each abnormality to properly train our classifiers.
To examine how the different spectro-temporal features act in AF ECG analysis, one elementary way is to investigate the feature maps and activations of the first convolutional layer. However, this voxel-based “probing” only produces limited explanations Szegedy14intriguingproperties and cannot give full insight. The visualization is shown in Fig. 9. We can see that the feature maps of FourierKS and CWT are more diverse and active than those of STFT and BurgAR, with larger activations on the “peaks” and the background details. Comparing FourierKS with CWT, the lower-frequency areas are better preserved and exploited by the FourierKS method.
5.3 Limitations
Typically, at least 30 s of ECG data are needed for AF detection developed2010guidelines . However, many ECG recordings in the dataset are shorter than 30 s (see Section 3.1), which limits the medical significance of the current study. In addition, the averaging step of the feature engineering is robust only when there are enough spectro-temporal segments, which is not the case for very short ECG recordings (see Section 3.2).
6 Conclusion
In this paper, we proposed a spectro-temporal representation of ECG signals, based on state-space models, for use in deep-network-based atrial fibrillation detection. We empirically showed that by putting Gaussian process priors on the Fourier series coefficients and estimating the state of the corresponding linear state-space model with a Kalman filter/smoother, we can outperform other time-frequency analysis methods such as the short-time Fourier transform, the continuous wavelet transform, and autoregressive spectral estimation for ECG classification.
We also accelerated the estimation of the spectrotemporal representation by using a stochastic oscillator differential equation model and a stationary Kalman filter/smoother, which improves the scalability of the proposed representation for long ECG recordings. Finally, through a comparative evaluation of multiple convolutional neural network models, we found an efficient convolutional architecture (Dense18) for AF detection using the spectrotemporal features.
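The stationary-filter idea can be sketched as follows: the Kalman gain is precomputed by iterating the discrete Riccati recursion to a fixed point, so that a long recording can be filtered with a constant gain and no per-sample covariance updates. The oscillator discretization and noise values below are assumed toy parameters, not the paper's exact model.

```python
import numpy as np

def steady_state_gain(F, H, Q, R, iters=500, tol=1e-10):
    """Constant Kalman gain via fixed-point iteration of the Riccati recursion."""
    n = F.shape[0]
    P = np.eye(n)
    for _ in range(iters):
        P_pred = F @ P @ F.T + Q                   # predicted covariance
        S = H @ P_pred @ H.T + R                   # innovation covariance
        K = P_pred @ H.T @ np.linalg.inv(S)        # Kalman gain
        P_new = (np.eye(n) - K @ H) @ P_pred       # filtered covariance
        if np.max(np.abs(P_new - P)) < tol:        # converged to fixed point
            P = P_new
            break
        P = P_new
    return K, P

# Toy discretized stochastic oscillator: damped rotation at 5 Hz,
# scalar observation of the first state component
w, lam, dt = 2 * np.pi * 5.0, 0.5, 0.01
A = np.exp(-lam * dt) * np.array([[np.cos(w * dt), -np.sin(w * dt)],
                                  [np.sin(w * dt),  np.cos(w * dt)]])
H = np.array([[1.0, 0.0]])
Q = 1e-3 * np.eye(2)
R = np.array([[0.1]])
K, P = steady_state_gain(A, H, Q, R)
```

With one such oscillator per analyzed frequency, the stationary filter reduces each step to a fixed matrix-vector update, which is what makes long recordings tractable.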
References
 (1) Annavarapu, A., Kora, P.: ECG-based atrial fibrillation detection using different orderings of conjugate symmetric–complex Hadamard transform. International Journal of the Cardiovascular Academy 2(3), 151–154 (2016)
 (2) Asgari, S., Mehrnia, A., Moussavi, M.: Automatic detection of atrial fibrillation using stationary wavelet transform and support vector machine. Computers in Biology and Medicine 60, 132–142 (2015)
 (3) Babaeizadeh, S., Gregg, R.E., Helfenbein, E.D., Lindauer, J.M., Zhou, S.H.: Improvements in atrial fibrillation detection for real-time monitoring. Journal of Electrocardiology 42(6), 522–526 (2009)
 (4) Breiman, L.: Random forests. Machine Learning 45(1), 5–32 (2001)
 (5) Brüser, C., Diesel, J., Zink, M.D., Winter, S., Schauerte, P., Leonhardt, S.: Automatic detection of atrial fibrillation in cardiac vibration signals. IEEE Journal of Biomedical and Health Informatics 17(1), 162–171 (2013)
 (6) Camm, A.J., Kirchhof, P., Lip, G.Y., Schotten, U., Savelieva, I., Ernst, S., Van Gelder, I.C., Al-Attar, N., Hindricks, G., Prendergast, B., et al.: Guidelines for the management of atrial fibrillation: the task force for the management of atrial fibrillation of the European Society of Cardiology (ESC). European Heart Journal 31(19), 2369–2429 (2010)
 (7) Clifford, G.D., et al.: AF classification from a short single lead ECG recording: the Physionet/Computing in Cardiology Challenge 2017. 2017 Computing in Cardiology (CinC) 44, 1–4 (2017)
 (8) Ehrendorfer, M.: Spectral Numerical Weather Prediction Models. Society for Industrial and Applied Mathematics (2011)
 (9) García, M., Ródenas, J., Alcaraz, R., Rieta, J.J.: Application of the relative wavelet energy to heart rate independent detection of atrial fibrillation. Computer Methods and Programs in Biomedicine 131, 157–168 (2016)

 (10) Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: Proceedings of the 13th International Conference on Artificial Intelligence and Statistics, vol. 9, pp. 249–256 (2010)
 (11) Goodfellow, I., Bengio, Y., Courville, A., Bengio, Y.: Deep Learning. MIT Press (2016)
 (12) Grewal, M.S., Andrews, A.P.: Kalman Filtering, Theory and Practice Using MATLAB. Wiley, New York, NY (2001)
 (13) Hagiwara, Y., Fujita, H., Oh, S.L., Tan, J.H., Tan, R.S., Ciaccio, E.J., Acharya, U.R.: Computer-aided diagnosis of atrial fibrillation based on ECG signals: A review. Information Sciences 467, 99–114 (2018)
 (14) Hartikainen, J., Särkkä, S.: Kalman filtering and smoothing solutions to temporal Gaussian process regression models. In: 2010 IEEE International Workshop on Machine Learning for Signal Processing (MLSP), pp. 379–384 (2010)

 (15) He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 770–778 (2016)
 (16) Huang, G., Liu, Z., van der Maaten, L., Weinberger, K.Q.: Densely connected convolutional networks. In: 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2261–2269 (2017)
 (17) Joseph, A., Larrain, M., Turner, C.: Daily stock returns characteristics and forecastability. Procedia Computer Science 114, 481–490 (2017)
 (18) Kailath, T., Sayed, A.H., Hassibi, B.: Linear Estimation. Prentice Hall, New Jersey (2000)
 (19) Kay, S.M., Marple, S.L.: Spectrum analysis – a modern perspective. Proceedings of the IEEE 69(11), 1380–1419 (1981)

 (20) Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems 25, pp. 1097–1105. Curran Associates, Inc. (2012)
 (21) LeCun, Y., Bengio, Y., Hinton, G.: Deep learning. Nature 521(7553), 436–444 (2015)
 (22) Mohebbi, M., Ghassemian, H.: Detection of atrial fibrillation episodes using SVM. In: 2008 30th Annual International Conference of the IEEE Engineering in Medicine and Biology Society, pp. 177–180. IEEE (2008)
 (23) Pan, J., Tompkins, W.J.: A real-time QRS detection algorithm. IEEE Transactions on Biomedical Engineering BME-32(3), 230–236 (1985)
 (24) Pourbabaee, B., Roshtkhari, M.J., Khorasani, K.: Deep convolutional neural networks and learning ECG features for screening paroxysmal atrial fibrillation patients. IEEE Transactions on Systems, Man, and Cybernetics: Systems 48(12), 2095–2104 (2018)
 (25) Qi, Y., Minka, T.P., Picard, R.W.: Bayesian spectrum estimation of unevenly sampled nonstationary data. In: 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), vol. 2, pp. 1473–1476. IEEE (2002)
 (26) Rad, A.B., Virtanen, T.: Phase spectrum prediction of audio signals. In: 2012 5th International Symposium on Communications, Control and Signal Processing, pp. 1–5. IEEE (2012)
 (27) Rad, A.B., et al.: ECG-based classification of resuscitation cardiac rhythms for retrospective data analysis. IEEE Transactions on Biomedical Engineering 64(10), 2411–2418 (2017)
 (28) Rajpurkar, P., Hannun, A.Y., Haghpanahi, M., Bourn, C., Ng, A.Y.: Cardiologistlevel arrhythmia detection with convolutional neural networks. arXiv preprint arXiv:1707.01836 (2017)
 (29) Ricaud, B., Torrésani, B.: A survey of uncertainty principles and some signal processing applications. Advances in Computational Mathematics 40(3), 629–650 (2014)
 (30) Rubin, J., Parvaneh, S., Rahman, A., Conroy, B., Babaeizadeh, S.: Densely connected convolutional networks and signal quality analysis to detect atrial fibrillation using short single-lead ECG recordings. In: 2017 Computing in Cardiology (CinC), pp. 1–4 (2017)
 (31) Särkkä, S.: Bayesian Filtering and Smoothing. Cambridge University Press (2013)
 (32) Särkkä, S., Solin, A., Hartikainen, J.: Spatiotemporal learning via infinite-dimensional Bayesian filtering and smoothing. IEEE Signal Processing Magazine 30(4), 51–61 (2013)
 (33) Särkkä, S., Solin, A., Nummenmaa, A., Vehtari, A., Auranen, T., Vanni, S., Lin, F.H.: Dynamic retrospective filtering of physiological noise in BOLD fMRI: DRIFTER. NeuroImage 60(2), 1517–1527 (2012)
 (34) Shashikumar, S.P., Shah, A.J., Li, Q., Clifford, G.D., Nemati, S.: A deep learning approach to monitoring and detecting atrial fibrillation using wearable technology. In: 2017 IEEE EMBS International Conference on Biomedical Health Informatics (BHI), pp. 141–144. IEEE (2017)
 (35) Solin, A., Särkkä, S.: Explicit Link Between Periodic Covariance Functions and State Space Models. In: S. Kaski, J. Corander (eds.) Proceedings of the Seventeenth International Conference on Artificial Intelligence and Statistics, Proceedings of Machine Learning Research, vol. 33, pp. 904–912. PMLR, Reykjavik, Iceland (2014)
 (36) Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 4278–4284 (2017)
 (37) Szegedy, C., Vanhoucke, V., Ioffe, S., Shlens, J., Wojna, Z.: Rethinking the inception architecture for computer vision. In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016)
 (38) Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., Fergus, R.: Intriguing properties of neural networks. arXiv preprint arXiv:1312.6199 (2013)
 (39) Thaler, M.: The Only EKG Book You’ll Ever Need. Lippincott Williams & Wilkins (2017)
 (40) Xia, Y., Wulan, N., Wang, K., Zhang, H.: Detecting atrial fibrillation by deep convolutional neural networks. Computers in Biology and Medicine 93, 84–92 (2018)
 (41) Xiong, Z., Stiles, M.K., Zhao, J.: Robust ECG signal classification for detection of atrial fibrillation using a novel neural network. 2017 Computing in Cardiology (CinC) 44, 1–4 (2017)
 (42) Yaghouby, F., Ayatollahi, A., Bahramali, R., Yaghouby, M., Alavi, A.H.: Towards automatic detection of atrial fibrillation: A hybrid computational approach. Computers in Biology and Medicine 40(11), 919–930 (2010)
 (43) Zabihi, M., Rad, A.B., et al.: Detection of atrial fibrillation in ECG handheld devices using a random forest classifier. 2017 Computing in Cardiology (CinC) 44, 1–4 (2017)
 (44) Zhao, Z., Särkkä, S., Rad, A.B.: Spectrotemporal ECG analysis for atrial fibrillation detection. In: 2018 IEEE 28th International Workshop on Machine Learning for Signal Processing (MLSP), pp. 1–6. IEEE (2018)

 (45) Zihlmann, M., Perekrestenko, D., Tschannen, M.: Convolutional recurrent neural networks for electrocardiogram classification. 2017 Computing in Cardiology (CinC) 44, 1–4 (2017)
 (46) Zoni-Berisso, M., Lercari, F., Carazza, T., Domenicucci, S.: Epidemiology of atrial fibrillation: European perspective. Clinical Epidemiology 6, 213–220 (2014)