The electrocardiogram (ECG) is a reliable, effective and non-invasive diagnostic tool and is the best representation of the electrophysiological pattern of depolarization and repolarization of the heart muscle during each heartbeat. Heartbeat classification based on ECG provides cardiologists with conclusive information about chronic cardiovascular diseases. An intelligent system for diagnosing cardiovascular diseases is highly desirable because they are the leading cause of death around the globe.
Arrhythmia is a heart rhythm problem that occurs when the electrical pulses that coordinate heartbeats cause the heart to beat irregularly, i.e., either too slow or too fast. Arrhythmias can be caused by coronary artery disease, high blood pressure, changes in the heart muscle (cardiomyopathy), valve disorders, etc.
Myocardial infarction (MI), also known as heart attack, is caused by blockage of the coronary arteries, which supply blood to the myocardium. This blockage stops the supply of oxygen-rich blood to the heart muscle, which can be life-threatening for the patient.
Beat-by-beat examination of the ECG is vital for early diagnosis of cardiovascular conditions. However, differences in recording environments, variations in disease patterns among subjects, and the complex, non-stationary and noisy nature of the ECG signal make heartbeat classification a challenging and laborious exercise for cardiologists. Thus, computer-based methods are useful for automatic detection of abnormalities in ECG heartbeats.
Conventional methods for heartbeat classification using the ECG signal rely mostly on hand-crafted or manually extracted features obtained with signal processing techniques such as digital filter-based methods, mixture-of-experts methods, threshold-based methods, Principal Component Analysis (PCA) and wavelet transform. Classifiers used with these extracted features include Support Vector Machines (SVM), Hidden Markov Models (HMM) and Neural Networks. The first disadvantage of these conventional methods is the separation of the feature extraction part from the pattern classification part. Furthermore, these methods need expert knowledge about the input data and the selected features. Moreover, extracting features with the help of subject experts is a time-consuming process, and the features may not be invariant to noise, scaling and translation and thus can fail to generalize well to unseen data.
Deep learning has recently attracted the attention of many researchers. Deep learning models are capable of automatically learning invariant and hierarchical features directly from the data and employ an end-to-end learning mechanism that takes data as input and produces a class prediction as output. Recent deep learning models use either the 1D ECG signal or a 2D representation of the ECG obtained by transforming the signal to images or some matrix form. For 1D ECG classification, commonly used deep learning models are deep belief networks, restricted Boltzmann machines, auto encoders, CNNs and recurrent neural networks (RNN). For 2D ECG classification, CNNs are used and the input ECG data is transformed to images or some other 2D representation. It is shown experimentally in  that a 2D representation of ECG provides more accurate heartbeat classification than 1D. In our previous work , a univariate ECG signal is transformed to images by segmenting the ECG signal between successive R-R intervals and then stacking these R-R intervals row-wise to form images. Finally, multidomain multimodal fusion is performed to improve the stress assessment. Experimental results showed that multidomain multimodal fusion achieved the highest performance compared to a single ECG modality.
In this manuscript, we deal with the shortcomings of existing deep learning models for ECG heartbeat classification by proposing two fusion frameworks that have the capacity of extracting and fusing complementary and discriminative features while reducing dimensionality as well.
The proposed work makes the following significant contributions:
Two multimodal fusion frameworks for ECG heartbeat classification called Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF), are proposed. At the input of these frameworks, we convert the heartbeats of raw ECG data into three types of two-dimensional (2D) images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). Proposed fusion frameworks are computationally efficient as they keep the size of the combined features similar to the size of individual input modality features.
We transform heartbeats of ECG signal to images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF) to conserve the spatial domain correlated information among the data samples. These transformations result in an improvement in classification performance in contrast to the existing approaches of transforming ECG to images using spectrograms or methods involving time-frequency analysis (Short time Fourier transform or wavelet transform).
II Related Work
Deep learning models, especially CNNs, have been used over the years for ECG heartbeat classification for the detection of cardiovascular diseases such as arrhythmia and MI. These models include both 1D and 2D CNNs.
II-A One-dimensional CNN Approaches
Various models based on 1D CNNs have been proposed in the literature for ECG classification. In , an active learning model based on 1D CNN is presented for arrhythmia detection using the ECG signal. Model performance is improved by using breaking-ties (BT) and modified BT algorithms. Authors in  proposed a model for adaptive real-time implementation of patient-specific ECG heartbeat classification based on 1D CNN using end-to-end learning. In , a novel algorithm making use of an 11-layer deep CNN is proposed for automatic detection of MI using ECG beats with and without noise. A transfer learning method based on CNN is proposed in , where the information learned from the arrhythmia classification task is employed as a reference for the training of classifiers. A computationally intelligent method for patient screening and arrhythmia detection using CNN is proposed in . The proposed method is capable of diagnosing arrhythmia conditions without expert domain knowledge or a feature selection mechanism. In , a wavelet transform based on Fourier-Bessel series expansion is proposed for the localization of ECG. The Fourier-Bessel spectrum of the ECG beats is separated into adjacent parts using fixed order ranges, and a multiscale CNN is then employed for MI classification of different categories. A Multi-Channel Lightweight Convolutional Neural Network (MCL-CNN), which uses squeeze convolution, depth-wise convolution and point-wise convolution, is proposed in  for MI classification. Two end-to-end deep learning models based on CNN are proposed in . These models are referred to as a two-stage hierarchical model. Furthermore, generative adversarial networks (GANs) are used for data augmentation and to reduce the class imbalance. In , authors proposed a neural network model for precise classification of heartbeats following the AAMI inter-patient standards. This model works in two steps. In the first step, the signals are preprocessed and features are extracted from them. In the second step, classification is performed by a two-layer classifier in which each layer consists of two independent fully-connected neural networks. The experiments show that the proposed model precisely detects arrhythmia conditions. In , authors proposed a complex deep learning model consisting of CNN and LSTM. This model classifies six types of ECG signals by processing ten-second ECG slices of the MIT-BIH arrhythmia dataset. Experimental results showed that the proposed model could be used by cardiologists to detect arrhythmia. In , authors presented a CNN-based model for the diagnosis of congestive heart failure using ECG. Training and testing of the proposed model were carried out on publicly available ECG datasets. The performance of the proposed model demonstrates its validity for congestive heart failure detection.
II-B Two-dimensional CNN Approaches
The outstanding performance of CNNs on 2D data such as images convinced researchers to convert raw ECG data to images for improved results. In , the short-time Fourier transform is used to convert the ECG signal into time-frequency spectrograms that are used as input to a CNN for arrhythmia classification. Experimental results show that the 2D-CNN achieved higher classification accuracy than the 1D-CNN. In , the ECG signal is converted into spectro-temporal images that are sent as input to multiple dense convolutional neural networks to capture both beat-to-beat and single-beat information for analysis. Authors in  transformed heartbeat time intervals of ECG signals to images using the wavelet transform. These images are used to train a six-layer CNN for heartbeat classification. In , a generative neural network is used to convert the raw 1D ECG signal data into a 2D image. These images are input to DenseNet, which produces highly accurate classification, with high sensitivity and specificity, of 4 classes of heartbeats. To distinguish abnormal ECG samples from normal ones, authors in  used pretrained CNNs such as AlexNet, VGG-16 and ResNet-18 on spectrograms obtained from ECG. Using a transfer learning approach, the highest accuracy of 83.82% is achieved by AlexNet. In , multi-lead ECGs are treated as 2D matrices for input to a novel model called multilead-CNN (ML-CNN), which employs sub two-dimensional (2D) convolutional layers and lead asymmetric pooling (LAP) layers. In , authors generated a dual beat coupling matrix from sections of heartbeats. This dual beat coupling matrix was then used as 2D input to a CNN classifier. In , a gray-level co-occurrence matrix (GLCM) obtained from ECG data is employed for feature vector description due to its exceptional statistical feature extraction ability. In , ECG signals were segmented into heartbeats and each heartbeat was transformed to a 2D grayscale image that was input to a CNN.
In , two second segments of ECG signal are transformed to recurrence plot images to classify arrhythmia in two steps using deep learning model. In the first step the noise and ventricular fibrillation (VF) categories were recognized and in the second step, the atrial fibrillation (AF), normal, premature AF, and premature VF labels were classified. Experimental results show the promising performance of the proposed method.
II-C Fusion-based Approaches
Fusing different modalities mitigates the weaknesses of the individual modalities, in both 1D and 2D forms, by integrating complementary information from the modalities so that analysis and classification tasks can be performed accurately. In , a Multi-scale Fusion convolutional neural network (MS-CNN) is proposed for heartbeat classification using the ECG signal. The MS-CNN is a two-stream network consisting of 13 layers. The features obtained from the last convolutional layer are concatenated before classification. Another Deep Multi-scale Fusion CNN (DMSFNet) is proposed in  for arrhythmia detection. The proposed model consists of a backbone network and two different scale-specific networks. Features obtained from the two scale-specific networks are fused using a spatial attention module. A patient-specific heartbeat classification network based on a customized CNN is proposed in . The CNN contains an important module called multi-receptive field spatial feature extraction (MRF-SFE). The MRF-SFE module is designed for extracting multispatial deep features of the heartbeats using five parallel convolution layers with different receptive fields. These features are concatenated before being sent to the third convolutional layer for further processing. A two-stage serial fusion classifier system based on the SVM's rejection option is proposed in . The SVM's distance outputs are related to a confidence measure, and ambiguous samples are rejected by the first-level SVM classifier. The rejected samples are then forwarded to a second-stage Logistic Regression classifier, and late fusion is performed for arrhythmia classification. Authors in  presented a unique feature fusion method called parallel graphical feature fusion, where all the focus is given to geometric features of the data. The original signal is first split into subspaces, then multidimensional features are extracted from these subspaces and mapped to points in a high-dimensional space. A multi-stage feature fusion framework based on CNN and an attention module was proposed in  for multiclass arrhythmia detection. Classification is performed by extracting features from different layers of the CNN. The combination of the CNN and the attention module improves the discrimination power of the proposed model for ECG classification.
The shortcoming of the existing fusion methods is that they depend mostly on concatenation fusion. Concatenation leads to computational complexity and the curse of dimensionality, and hence to degradation in classification accuracy. In this paper, we address these shortcomings of the existing literature and propose two fusion frameworks, Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF), which extract and fuse the features while also reducing dimensionality. The proposed fusion frameworks are described in Section III.
III Materials and Methods
This section explains the proposed fusion frameworks, Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). The common element of both proposed fusion frameworks is the ECG signal to image transformation, as shown in Figures 1 and 2. Therefore, in this section we first explain the ECG signal to image transformation and then MIF, MFF and the two important elements of MFF: the gated fusion network shown in Fig. 3 and the CNN architecture shown in Fig. 4.
III-A ECG Signal to Image Transformation
For each fusion framework, we transform the input heart-beats into three types of images called GAF, RP and MTF images.
III-A1 Formation of Images by Gramian Angular Field (GAF)
Converting heart-beats of ECG into Gramian Angular Field (GAF) images maps the ECG in an angular coordinate system instead of typical rectangular coordinate system.
Consider that $X$ is an ECG signal of $n$ samples such that $X = \{x_1, x_2, \ldots, x_n\}$. We normalize $X$ between 0 and 1 to get $\tilde{X}$. We then map the normalized ECG into an angular coordinate system by transforming the value into the angular cosine and the time stamp into the radius. The following equation expresses this encoding.
$$\phi_i = \arccos(\tilde{x}_i), \quad 0 \le \tilde{x}_i \le 1; \qquad r_i = \frac{t_i}{N} \tag{1}$$
In the above equation, $\tilde{x}_i$ is a normalized sample of the ECG, $t_i$ is the time stamp for $\tilde{x}_i$ and $N$ is a constant to adjust the spread of the angular coordinate system. This encoding provides two benefits. It is bijective and it conserves the spatial domain affiliations through the $r$ coordinates. Since the image location with respect to the ECG heartbeat samples is consistent along the principal diagonal, the original heartbeat samples of the ECG can be restored from the angular coordinates.
The angular viewpoint of the encoded image can be exploited by taking into account the sum/difference between each pair of samples to indicate the correlation among various time stamps. The summation method used in this article is explained by the following set of equations.
$$GAF = \begin{bmatrix} \cos(\phi_1 + \phi_1) & \cdots & \cos(\phi_1 + \phi_n) \\ \vdots & \ddots & \vdots \\ \cos(\phi_n + \phi_1) & \cdots & \cos(\phi_n + \phi_n) \end{bmatrix} \tag{2}$$
$$GAF = \tilde{X}^{T} \tilde{X} - \sqrt{I - \tilde{X}^{2}}^{\,T} \sqrt{I - \tilde{X}^{2}} \tag{3}$$
$I$ is the unit row vector in equation 3.
GAF images of five different categories for the MIT-BIH dataset are shown in Fig. 5.
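The summation encoding described above can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the authors' MATLAB implementation: the function name, the toy input and the handling of a constant beat are assumptions made here.

```python
import numpy as np

def gramian_angular_summation_field(beat):
    """Encode a 1D heartbeat as an image with entries cos(phi_i + phi_j)."""
    x = np.asarray(beat, dtype=float)
    span = x.max() - x.min()
    # Min-max normalise to [0, 1]; a constant beat maps to zeros (assumption).
    x_norm = (x - x.min()) / span if span > 0 else np.zeros_like(x)
    # Angular encoding of the normalised samples.
    phi = np.arccos(np.clip(x_norm, 0.0, 1.0))
    # Pairwise angular sums give the summation field.
    return np.cos(phi[:, None] + phi[None, :])

# Toy stand-in for one heartbeat (hypothetical data).
beat = np.sin(np.linspace(0, np.pi, 8))
gasf = gramian_angular_summation_field(beat)
```

The main diagonal holds cos(2·phi_i) = 2·x_i² − 1 for each normalised sample x_i, which is why the normalised samples can be read back from the image.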
III-A2 Formation of Images by Recurrence Plot (RP)
ECG is a non-stationary signal; therefore, to visualize the recurrent behavior and to observe the recurrence pattern of the ECG signal, we encode ECG heartbeats into RP images. An RP image obtained from an ECG heartbeat represents the spacing between time points.
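For the ECG signal $X = \{x_1, x_2, \ldots, x_n\}$ defined in Section III-A1, the recurrence plot is given by
$$R_{i,j} = \theta\left(\epsilon - \lVert x_i - x_j \rVert\right), \quad i, j = 1, \ldots, n \tag{4}$$
where $\epsilon$ is the threshold and $\theta$ is the Heaviside function.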
RP images of five different categories for the MIT-BIH dataset are shown in Fig. 5.
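A minimal sketch of a recurrence plot for a single heartbeat is given below. It assumes that each state is a single sample (no time-delay embedding) and that the Heaviside function equals 1 at zero. The function name and the `eps` argument are illustrative; passing `eps=None` returns the raw pairwise distances, which is one way to obtain a gray scale image.

```python
import numpy as np

def recurrence_plot(beat, eps=None):
    """Pairwise-distance recurrence plot of a 1D heartbeat."""
    x = np.asarray(beat, dtype=float)
    # Distance between every pair of time points (embedding dimension 1, an assumption).
    dist = np.abs(x[:, None] - x[None, :])
    if eps is None:
        return dist  # unthresholded, gray scale variant
    # Heaviside step of (eps - dist), taking the step as 1 at zero.
    return (dist <= eps).astype(np.uint8)

rp_gray = recurrence_plot([0.0, 0.5, 1.0])
rp_binary = recurrence_plot([0.0, 0.5, 1.0], eps=0.5)
```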
III-A3 ECG to Markov Transition Field (MTF) image conversion
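For the ECG signal $X = \{x_1, x_2, \ldots, x_n\}$, the first step is to define $Q$ bins based on quantiles and assign every $x_i$ to the related bin $q_j$ ($j \in [1, Q]$). The second step is the construction of a $Q \times Q$ weighted adjacency matrix $W$ by counting transitions among quantile bins in the manner of a first-order Markov chain along the time axis. The weighted adjacency matrix in its normalized form is called the Markov transition matrix. It is insensitive to the spatial domain characteristics, which results in information loss. To handle this loss of information, the Markov transition matrix is transformed into the Markov transition field (MTF) matrix by spreading the transition likelihoods over the corresponding spatial domain locations. The MTF matrix is denoted by $M$ and is shown below.
$$M = \begin{bmatrix} w_{ij \mid x_1 \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_1 \in q_i,\, x_n \in q_j} \\ \vdots & \ddots & \vdots \\ w_{ij \mid x_n \in q_i,\, x_1 \in q_j} & \cdots & w_{ij \mid x_n \in q_i,\, x_n \in q_j} \end{bmatrix} \tag{5}$$
where $w_{ij}$ is the frequency of transition of a point between the quantiles $q_i$ and $q_j$. Since the formation of the transformed matrix depends on transition probabilities, the MTF cannot be restored to the original ECG signal.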
Bins are quantile intervals over which the probability mass is the same. Any number of bins can be selected for converting ECG to MTF images. We decided to take 10 bins as the data is normalized between 0 and 1. These bins are defined during the formation of the weighted adjacency matrix, which is the first step in creating the MTF matrix shown in equation 5.
MTF images of five different categories for the MIT-BIH dataset are shown in Fig. 5.
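The two-step construction above (quantile binning, then transition counting) can be sketched as follows. The function name is hypothetical and the default of 10 bins follows the text; leaving rows of bins with no outgoing transition at zero is an assumption of this sketch.

```python
import numpy as np

def markov_transition_field(beat, n_bins=10):
    """Build an MTF image from a 1D heartbeat using quantile bins."""
    x = np.asarray(beat, dtype=float)
    # Step 1: interior quantile edges, then a bin index in [0, n_bins - 1] per sample.
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1)[1:-1])
    bins = np.digitize(x, edges)
    # Step 2: weighted adjacency matrix counting transitions between consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (bins[:-1], bins[1:]), 1)
    # Row-normalise to obtain the Markov transition matrix; empty rows stay zero.
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)
    # Spread the transition probabilities over every pair of time stamps.
    return W[bins[:, None], bins[None, :]]

mtf = markov_transition_field(np.linspace(0.0, 1.0, 20), n_bins=4)
```

Because many different signals share the same bin sequence, the image cannot be inverted back to the samples, in line with the remark above.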
For ECG to image transformation using the GAF, RP and MTF methods, we use the full length of the heartbeats to transform 1D information to 2D. Therefore, an ECG signal of any length can be transformed to images, which can then be resized using interpolation.
We can see from Fig. 5, that for each kind of image (GAF, RP and MTF), the gray scale images are more interpretable. These images show different patterns for each of the five categories of MIT-BIH dataset. The x-y values of the 2D images are just pixel values of the GAF, RP, and MTF images.
III-B Multimodal Image Fusion Framework
The Multimodal Image Fusion (MIF) framework is shown in Fig. 1. At the input, we transform the heartbeats of the raw ECG signal into three types of images as described in Section III-A and shown in Fig. 5. The motivation for choosing GAF, MTF and RP is that they are three different statistical methods of transforming ECG to images. During transformation they preserve the temporal information and hence they are lossless transformations. We combine these three gray scale images to form a triple channel image (GAF-RP-MTF). A triple channel image is a colored image in which the GAF, RP and MTF images are treated as three orthogonal channels, like the three colors in RGB image space. However, this three-channel image is not a conventional conversion of a gray scale image to RGB; rather, in this paper all three gray scale images are formed from raw ECG data with different statistical methods. Thus, a three-channel image in the presented work carries the statistical dynamics of the ECG and is therefore more informative. Furthermore, the three-channel image can easily be used with off-the-shelf CNNs like AlexNet.
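As a rough illustration of the triple channel image, the sketch below stacks three equally sized gray scale images along a new last axis. The function name and the per-channel rescaling to [0, 1] are assumptions made here; the text does not state how the channels are scaled before stacking.

```python
import numpy as np

def fuse_images(gaf, rp, mtf):
    """Stack three H x W gray scale images into one H x W x 3 image."""
    channels = []
    for img in (gaf, rp, mtf):
        img = np.asarray(img, dtype=float)
        span = img.max() - img.min()
        if span > 0:
            # Rescale each modality to [0, 1] so no channel dominates (assumption).
            img = (img - img.min()) / span
        else:
            img = np.zeros_like(img)
        channels.append(img)
    # Channel order GAF, RP, MTF plays the role of R, G, B.
    return np.stack(channels, axis=-1)

# Placeholder arrays standing in for real GAF, RP and MTF images.
demo = np.arange(16.0).reshape(4, 4)
fused_image = fuse_images(demo, demo.T, np.ones((4, 4)))
```

The fused array has the same height and width as each input, so the input size seen by the CNN does not grow with the number of modalities.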
III-C Multimodal Feature Fusion Framework
At the input of MFF, we transform ECG heartbeats into images as shown in Fig. 2. AlexNets are employed to learn features from each input imaging modality. We extract these learned features from (fc-7) of each AlexNet, and they are then fused by an efficient Gated Fusion Network (GFN), the backbone of the proposed MFF, which fuses the features effectively while taking care of their dimensionality as well. These fused features are the input of the SVM classifier, as shown in Fig. 2.
III-C1 Gated Fusion Network
The architecture of our proposed gated fusion network (GFN) is shown in Fig. 3. We have adapted this network from our previous work in . The input to the GFN are the features extracted from the second last fully connected layer (fc-7) of each AlexNet as shown in Fig. 2.
Let $F_1$, $F_2$ and $F_3$ be the features from each imaging modality respectively. These features are then convolved with a high boost kernel as shown in Fig. 3.
We use a high boost filter for convolution with the features since this filter precisely recognizes the important information in a feature and assigns a boosted value to every element of the feature according to its importance. A high boost filter is the difference between a scaled version and a low-pass version of the input image, as shown in equation 6.
$$f_{hb} = f_{s} - f_{lp} \tag{6}$$
where $f_{s}$ and $f_{lp}$ are respectively the scaled version and the low-pass version of the image $f$.
In general, the high boost filter is given by
$$f_{hb} = A f - f_{lp} \tag{7}$$
where $A$ is the amplification factor that assigns the weights to the features during convolution.
The best filter performance is obtained for $A$ = 1. Other values of $A$ produce less amplification.
Thus, following high boost kernel is selected empirically that highlights the important characteristics.
High boost filter highlights the high frequency components while conserving the low frequency components.
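After convolution of the features with the high boost filter, a sigmoid function is used to generate the gated weights $w_1$, $w_2$ and $w_3$ respectively, as shown in Fig. 3. Finally, we take the point-wise product of the weights $w_1$, $w_2$ and $w_3$ and the features $F_1$, $F_2$ and $F_3$ respectively, to perform feature fusion and to generate the fused features. The working of the GFN can be understood from the following equations, where $K$ denotes the high boost kernel and $*$ denotes convolution.
$$w_i = \sigma(F_i * K), \quad i = 1, 2, 3$$
$$F = \sum_{i=1}^{3} w_i \odot F_i$$
$\sigma$: Sigmoid function.
$\odot$: Point-wise multiplication.
$F_{ij}$: $j$th feature of $i$th modality, i.e. the $j$th element of $F_i$.
$F$: Fused feature.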
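The gating steps described above can be sketched as follows for three feature vectors of equal length. Several details are assumptions made for illustration. The kernel is a hypothetical 1D three-tap high boost kernel built as a scaled identity minus a moving-average low-pass filter; it is not the empirically selected kernel of this work. Summing the gated features over the modalities is also an assumption, chosen because it keeps the fused vector the same size as each input.

```python
import numpy as np

def high_boost_kernel(A=1.0, size=3):
    """Hypothetical 1D kernel: A * identity minus a moving-average low-pass filter."""
    identity = np.zeros(size)
    identity[size // 2] = 1.0
    low_pass = np.full(size, 1.0 / size)
    return A * identity - low_pass

def gated_fusion(features, kernel):
    """Fuse equal-length feature vectors with sigmoid gates."""
    fused = np.zeros_like(np.asarray(features[0], dtype=float))
    for f in features:
        f = np.asarray(f, dtype=float)
        response = np.convolve(f, kernel, mode="same")  # high boost filtering
        gate = 1.0 / (1.0 + np.exp(-response))          # gated weights in (0, 1)
        fused += gate * f                               # point-wise product, summed (assumption)
    return fused

rng = np.random.default_rng(0)
fc7 = [rng.random(512) for _ in range(3)]  # stand-ins for three 512-d feature vectors
fused = gated_fusion(fc7, high_boost_kernel(A=1.0))
```

Concatenating the same three vectors would give 1536 values, whereas the gated sum stays at 512.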
III-C2 CNN Architecture
The architecture of the CNN used in the proposed MFF is shown in Fig. 4. It consists of three convolutional layers, two pooling layers, and a fully connected layer. The first convolutional layer has 16 kernels of size 5x5, followed by a pooling layer of size 2x2 and stride 2. The second and third convolutional layers have 32 kernels of size 5x5, followed by a 2x2 pooling layer with stride 2.
III-D Classification Task and Classifier
The classification task of the proposed methods is ECG heartbeat classification for arrhythmia and MI detection.
The metrics used to evaluate classification are accuracy, precision and recall, as reported in Tables V, VI, VII and VIII. They are calculated using the following equations.
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$
$$\text{Precision} = \frac{TP}{TP + FP}$$
$$\text{Recall} = \frac{TP}{TP + FN}$$
$TP$ = True positive
$TN$ = True negative
$FP$ = False positive
$FN$ = False negative
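For a single class, the three metrics follow directly from the four counts. The helper below is an illustrative sketch with made-up counts, not values from this work; it assumes the denominators are non-zero.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall) from confusion counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)  # share of predicted positives that are correct
    recall = tp / (tp + fn)     # share of actual positives that are found
    return accuracy, precision, recall

# Hypothetical counts for one class.
acc, prec, rec = classification_metrics(tp=90, tn=85, fp=10, fn=15)
```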
We used Softmax classifier in proposed MIF and Support Vector Machines (SVM) classifier in proposed MFF for classification task.
The softmax classifier is a multiclass classifier or regressor used in machine learning. The score function of the softmax classifier computes class-specific probabilities whose sum is 1.
The mathematical representation of the score function for the softmax classifier is shown below.
$$S(z)_i = \frac{e^{z_i}}{\sum_{j=1}^{C} e^{z_j}}, \quad i = 1, \ldots, C$$
where $z$ is the input vector, $C$ is the number of classes and the score function maps the exponent domain to the probabilities.
In its simplest form, the score function for the SVM is a mapping of the input vector to the class scores and is a simple matrix operation, as shown in Equation 17.
$$f(x) = Wx + b \tag{17}$$
where $x$ is the input vector, $W$ is the weight matrix whose size is determined by the input vector and the number of classes, and $b$ is the bias vector.
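The two score functions can be compared side by side in a short sketch. The names and the toy numbers are illustrative; subtracting the maximum before exponentiation is a standard numerical-stability step and leaves the softmax output unchanged.

```python
import numpy as np

def softmax(z):
    """Map a score vector to class probabilities that sum to 1."""
    z = np.asarray(z, dtype=float)
    e = np.exp(z - z.max())  # shift by the maximum for numerical stability
    return e / e.sum()

def linear_scores(x, W, b):
    """Linear score function W x + b, one score per class."""
    return np.asarray(W, dtype=float) @ np.asarray(x, dtype=float) + np.asarray(b, dtype=float)

probs = softmax([1.0, 2.0, 3.0])
scores = linear_scores([1.0, 2.0], [[1, 0], [0, 1], [1, 1]], [0, 0, 1])
```

In both cases the predicted class is the index of the largest entry; the two classifiers differ in the loss used during training, cross entropy for softmax and a margin-based loss for the SVM.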
Table I: Training parameters for AlexNet and CNN

| Parameter | Value |
|---|---|
| Initial Learn Rate | 0.005 |
| Learn Rate Drop Factor | 0.5 |
| Learn Rate Drop Period | 10 |
III-E Training and Optimization
We resize the images to 227 x 227 to perform experiments with AlexNet. We also perform experiments with a smaller, computationally efficient CNN, whose architecture is shown in Fig. 4, to show that the proposed frameworks can achieve comparable performance even with the smaller CNN. The comparison of computational cost between the two CNN models is provided in Table XI. We fine-tune AlexNet by reducing the size of the second last fully connected layer 'fc7' from 4096 to 512 and the size of the last fully connected layer 'fc8' from 1000 to the number of classes in our datasets. The original 'fc7' size of 4096 is matched to the original classification layer size of 1000. For the MIT-BIH and PTB datasets, the classification layer needs sizes of 5 and 2 respectively, the number of classes in these datasets. Thus, to make 'fc7' compatible with the classification layer, we reduce its size to 512. The training parameters for AlexNet and the CNN are shown in Table I.
For optimization of the deep networks, we used the Stochastic Gradient Descent with Momentum (SGDM) algorithm. SGDM helps accelerate the gradient vectors in the right directions, thus leading to faster convergence. It is one of the most popular optimization algorithms and many state-of-the-art models are trained using it.
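To make the training setup concrete, the sketch below combines an SGDM update with the learning-rate values from Table I. It assumes a piecewise-constant schedule (the rate is multiplied by the drop factor every drop period, counted in zero-based epochs) and a momentum of 0.9; neither detail is stated in the text, and the function names are illustrative.

```python
import numpy as np

def learn_rate(epoch, initial=0.005, drop_factor=0.5, drop_period=10):
    """Piecewise-constant schedule: multiply by drop_factor every drop_period epochs."""
    return initial * drop_factor ** (epoch // drop_period)

def sgdm_step(params, grads, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update; momentum=0.9 is an assumed value."""
    velocity = momentum * velocity - lr * grads
    return params + velocity, velocity

# Toy problem: minimise p**2, whose gradient is 2 * p.
p, v = np.array([1.0]), np.zeros(1)
for epoch in range(30):
    for _ in range(10):
        p, v = sgdm_step(p, 2 * p, v, lr=learn_rate(epoch))
```

The velocity term accumulates past gradients, so steps along a consistent descent direction grow while oscillating components partly cancel.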
IV Experimental Results
IV-A ECG Databases
Experiments are performed with the PhysioNet MIT-BIH Arrhythmia dataset for heartbeat classification and the PTB Diagnostic ECG dataset for MI classification using both proposed fusion frameworks. For the experiments, ECG lead-II data re-sampled at a sampling frequency of 125Hz is used as the input.
We used the standardized form of both datasets provided in . These datasets are already denoised, and the training and testing parts are provided in the form of standard ECG heartbeats. Furthermore, the five arrhythmia classes and MI localization have already been prepared and provided in terms of standard ECG heartbeats. Our study focuses on ECG to image transformation and on the design of the proposed multimodal fusion frameworks. The main focus is on increasing the overall performance of heartbeat classification. We did not attempt to model or solve for a specific type of noise.
We conduct our experiments in MATLAB R2020a on a desktop computer with an NVIDIA GTX-1070 GPU.
The experimental results are discussed in detail in section V.
IV-A1 PhysioNet MIT-BIH Arrhythmia Dataset
Forty seven subjects were involved during the collection of ECG signals for the dataset. The data was collected at the sampling rate of 360Hz and each beat is annotated by at least two experts. Using these annotations, five different beat categories are created in accordance with Association for the Advancement of Medical Instrumentation (AAMI) EC57 standard  as shown in Table II.
Training a CNN requires a large number of samples. We use the same testing and training segments provided in  to train the CNNs. Since there is a class imbalance in the training part of the dataset, as is apparent from the numbers, we applied SMOTE  to upsample the minority classes (classes other than N) and finally settled on the numbers shown in the right column of Table III.
SMOTE is a data augmentation technique that is used to reduce overfitting during training and helps reduce the bias of the classifier.
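The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbours. The sketch below is a simplified illustration and not the implementation used in this work; the function name, the neighbour count `k` and the random seed are assumptions.

```python
import numpy as np

def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic samples from a minority class (needs at least 2 samples)."""
    X = np.asarray(minority, dtype=float)
    rng = np.random.default_rng(seed)
    # Pairwise Euclidean distances within the minority class.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    k = min(k, len(X) - 1)
    neighbours = np.argsort(dist, axis=1)[:, :k]
    # Pick a base sample and one of its k nearest neighbours for each new sample.
    base = rng.integers(0, len(X), size=n_new)
    chosen = neighbours[base, rng.integers(0, k, size=n_new)]
    gap = rng.random((n_new, 1))  # interpolation factor in [0, 1)
    return X[base] + gap * (X[chosen] - X[base])

# Hypothetical minority class: 6 heartbeats of 4 samples each.
minority_beats = np.random.default_rng(1).random((6, 4))
synthetic = smote(minority_beats, n_new=10, k=3)
```

Each synthetic heartbeat lies on the line segment between two real minority heartbeats, so the upsampled class stays within the region already covered by the data.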
We perform experiments using both proposed fusion frameworks on the MIT-BIH dataset with the training and testing samples shown in Table IV and with the training parameters shown in Table I. The experimental results are shown in Tables V and VI.
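| Method | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| GAF Images only | 97.3 | 85 | 91 |
| RP Images only | 97.2 | 82 | 93 |
| MTF Images only | 91.5 | 86 | 89 |

| Method | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| GAF Images only | 98.4 | 98 | 96 |
| RP Images only | 98 | 98 | 94 |
| MTF Images only | 95.3 | 94 | 89 |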
IV-A2 PTB Diagnostic ECG dataset
Two hundred and ninety (290) subjects took part during collection of ECG records for PTB Diagnostics dataset. 148 of them are diagnosed as MI, 52 healthy control, and the rest are diagnosed with 7 different diseases. Frequency of 100Hz is used for each ECG record from 12 leads. However, for our experiments, we used lead II ECG recordings and worked with healthy control and MI categories.
We perform experiments using both proposed fusion frameworks on the PTB dataset with the training and testing samples shown in Table IV and with the training parameters shown in Table I. Training and testing parts of the dataset are provided in  to train CNN models. The experimental results are shown in Tables VII and VIII.
We present the comparative results of the proposed frameworks and the state-of-the-art methods in Tables IX and X. As we can see, our proposed frameworks considerably outperform the existing methods in terms of accuracy, precision, and recall.
To justify the importance of the proposed fusion frameworks, we assess the performance of different components of the proposed frameworks on both datasets with concatenation and average fusion methods. We performed average fusion by assigning unity to all the weights, i.e. $w_1$ = 1, $w_2$ = 1 and $w_3$ = 1, in the gated fusion network. Since we have three modalities, a simple average gives an equal value of 0.333 for each weight. We also experimented with 0.333 and obtained the same results. Since the weights are equal in average fusion, we assign a unity value to every weight for simplicity. It is possible that better weights can be acquired through trainable weight coefficients. This is something we plan to investigate in the future. Tables V, VI, VII and VIII report the results of assessing different fusion methods along with the proposed fusion frameworks.
| Work | Accuracy (%) | Precision (%) | Recall (%) |
|---|---|---|---|
| Izci et al. | 97.96 | - | - |
| Dang et al. | 95.48 | 96.53 | 87.74 |
| Li et al. | 99.5 | 97.3 | 98.1 |
| Zhao et al. | 98.25 | - | - |
| Oliveria et al. | 95.3 | - | - |
| Huang et al. | 99 | - | - |
| Shaker et al. | 98 | 90 | 97.7 |
| Kachuee et al. | 93.4 | - | - |
| Xu et al. | 95.9 | - | - |
| He et al. | 98.3 | - | - |
| Qiao et al. | 99.3 | - | - |
| Dicker et al. | 83.82 | 82 | 95 |
| Acharya et al. | 95.22 | 95.49 | 94.19 |
| Kojuri et al. | 95.6 | 97.9 | 93.3 |
| Kachuee et al. | 95.9 | 95.2 | 95.1 |
| Liu et al. | 96 | 97.37 | 95.4 |
| Sharma et al. | 96 | 99 | 93 |
| Chen et al. | 96.18 | 97.32 | 93.67 |
| Cao et al. | 96.65 | - | - |
| Ahamed et al. | 97.66 | - | - |
The experimental results show that concatenation fusion performs poorly compared to the other methods. Concatenation fusion creates a high-dimensional feature vector, which leads to additional computational cost and deterioration of information during classification.
We also compare both proposed fusion frameworks in terms of inference speed, as shown in Table XII. Inference speed is the time consumed by the classifier to recognize one test sample. It is expressed in microseconds (µs). We observe that MFF yields higher accuracy, precision and recall than MIF on both datasets; however, MIF is computationally more efficient in terms of inference speed.
Since we experiment with two different CNNs, we provide comparison between both CNNs in terms of computational cost as shown in Table XI. Since there is a trade off between accuracy and computational cost, we observe from Tables V, VI and XI that CNN, shown in Fig. 4, is less accurate than AlexNet but is computationally efficient.
We prefer SVM classifier over softmax classifier since we have experimentally proved in our previous work  that SVM performs better than softmax, which is typically built into any CNN framework. Softmax classifier reduces the cross entropy function while SVM employs a margin based function. The more rigorous nature of classification is the reason of better performance of SVM over softmax.
The comparison provided in Tables IX and X is made on the basis of the datasets and the performance metrics. There are slight differences in the testing conditions in a few of the comparisons; however, it is still appropriate to compare the results.
The limitation of the proposed Multimodal Image Fusion (MIF) Framework is that it requires exactly three different statistical gray scale images for creating a triple channel compound image. Since Multimodal Feature Fusion (MFF) Framework is using three separate AlexNet for training on GAF, RP and MTF images, it requires more time for training and inference.
We proposed two computationally efficient multimodal fusion frameworks for ECG heartbeat classification, Multimodal Image Fusion (MIF) and Multimodal Feature Fusion (MFF). At the input of these frameworks, we convert the ECG signal into three types of images using Gramian Angular Field (GAF), Recurrence Plot (RP) and Markov Transition Field (MTF). In MIF, we first perform image fusion by combining the three input images to create a single three-channel image, which is used as input to the CNN. In MFF, highly informative cues are extracted from the penultimate layer of the CNN, fused, and used as input to the SVM classifier. We demonstrate the superiority of the proposed fusion frameworks by performing experiments on PhysioNet's MIT-BIH dataset for five different arrhythmias and on the PTB Diagnostic dataset for MI classification. Experimental results show that we outperform the previous state-of-the-art in terms of classification accuracy, precision and recall. The important finding of this study is that multimodal fusion increases the performance of the machine learning task compared to using the modalities individually.
-  L. Sun, Y. Lu, K. Yang, and S. Li, “Ecg analysis using multiple instance learning for myocardial infarction detection,” IEEE transactions on biomedical engineering, vol. 59, no. 12, pp. 3348–3356, 2012.
-  Y. Xia, X. Liu, D. Wu, H. Xiong, L. Ren, L. Xu, W. Wu, and H. Zhang, “Influence of beat-to-beat blood pressure variability on vascular elasticity in hypertensive population,” Scientific reports, vol. 7, no. 1, pp. 1–8, 2017.
-  U. R. Acharya, N. Kannathal, L. M. Hua, and L. M. Yi, “Study of heart rate variability signals at sitting and lying postures,” Journal of bodywork and Movement Therapies, vol. 9, no. 2, pp. 134–141, 2005.
-  U. R. Acharya, Y. Hagiwara, J. E. W. Koh, S. L. Oh, J. H. Tan, M. Adam, and R. San Tan, “Entropies for automated detection of coronary artery disease using ecg signals: A review,” Biocybernetics and Biomedical Engineering, vol. 38, no. 2, pp. 373–384, 2018.
-  Z. Zhang, J. Dong, X. Luo, K.-S. Choi, and X. Wu, “Heartbeat classification using disease-specific feature selection,” Computers in biology and medicine, vol. 46, pp. 79–89, 2014.
-  E. Pasolli and F. Melgani, “Active learning methods for electrocardiographic signal classification,” IEEE Transactions on Information Technology in Biomedicine, vol. 14, no. 6, pp. 1405–1416, 2010.
-  Y. H. Hu, S. Palreddy, and W. J. Tompkins, “A patient-adaptable ecg beat classifier using a mixture of experts approach,” IEEE transactions on biomedical engineering, vol. 44, no. 9, pp. 891–900, 1997.
-  V. Chouhan and S. Mehta, “Threshold-based detection of p and t-wave in ecg using new feature signal,” International Journal of Computer Science and Network Security, vol. 8, no. 2, pp. 144–153, 2008.
-  N. A. Bhaskar, “Performance analysis of support vector machine and neural networks in detection of myocardial infarction,” Procedia Computer Science, vol. 46, no. 4, pp. 20–30, 2015.
-  K.-i. Minami, H. Nakajima, and T. Toyoshima, “Real-time discrimination of ventricular tachyarrhythmia with fourier-transform neural network,” IEEE transactions on Biomedical Engineering, vol. 46, no. 2, pp. 179–185, 1999.
-  H. Khorrami and M. Moavenian, “A comparative study of dwt, cwt and dct transformations in ecg arrhythmias classification,” Expert systems with Applications, vol. 37, no. 8, pp. 5751–5757, 2010.
-  L. Sharma, R. Tripathy, and S. Dandapat, “Multiscale energy and eigenspace approach to detection and localization of myocardial infarction,” IEEE transactions on biomedical engineering, vol. 62, no. 7, pp. 1827–1837, 2015.
-  H. Lu, K. Ong, and P. Chia, “An automated ecg classification system based on a neuro-fuzzy system,” in Computers in Cardiology 2000. Vol. 27 (Cat. 00CH37163). IEEE, 2000, pp. 387–390.
-  K. A. Sidek, I. Khalil, and H. F. Jelinek, “Ecg biometric with abnormal cardiac conditions in remote monitoring system,” IEEE Transactions on systems, man, and cybernetics: systems, vol. 44, no. 11, pp. 1498–1509, 2014.
-  V. Krasteva, S. Ménétré, J.-P. Didon, and I. Jekova, “Fully convolutional deep neural networks with optimized hyperparameters for detection of shockable and non-shockable rhythms,” Sensors, vol. 20, no. 10, p. 2875, 2020.
-  I.-C. Tanoh and P. Napoletano, “A novel 1-d ccanet for ecg classification,” Applied Sciences, vol. 11, no. 6, p. 2758, 2021.
-  M. Wasimuddin, K. Elleithy, A. Abuzneid, M. Faezipour, and O. Abuzaghleh, “Multiclass ecg signal analysis using global average-based 2-d convolutional neural network modeling,” Electronics, vol. 10, no. 2, p. 170, 2021.
-  M. Längkvist, L. Karlsson, and A. Loutfi, “A review of unsupervised feature learning and deep learning for time-series modeling,” Pattern Recognition Letters, vol. 42, pp. 11–24, 2014.
-  R. Salloum and C.-C. J. Kuo, “Ecg-based biometrics using recurrent neural networks,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). IEEE, 2017, pp. 2062–2066.
-  J. Huang, B. Chen, B. Yao, and W. He, “Ecg arrhythmia classification using stft-based spectrogram and convolutional neural network,” IEEE Access, vol. 7, pp. 92 871–92 880, 2019.
-  Z. Ahmad and N. Khan, “Multi-level stress assessment using multi-domain fusion of ecg signal,” in 42nd Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE, 2020, pp. 4518–4521.
-  H. Dang, M. Sun, G. Zhang, X. Zhou, Q. Chang, and X. Xu, “A novel deep convolutional neural network for arrhythmia classification,” in 2019 International Conference on Advanced Mechatronic Systems (ICAMechS). IEEE, 2019, pp. 7–11.
-  P. De Chazal, M. O’Dwyer, and R. B. Reilly, “Automatic classification of heartbeats using ecg morphology and heartbeat interval features,” IEEE transactions on biomedical engineering, vol. 51, no. 7, pp. 1196–1206, 2004.
-  Y. Xia and Y. Xie, “A novel wearable electrocardiogram classification system using convolutional neural networks and active learning,” IEEE Access, vol. 7, pp. 7989–8001, 2019.
-  S. Kiranyaz, T. Ince, and M. Gabbouj, “Real-time patient-specific ecg classification by 1-d convolutional neural networks,” IEEE Transactions on Biomedical Engineering, vol. 63, no. 3, pp. 664–675, 2015.
-  U. R. Acharya, H. Fujita, S. L. Oh, Y. Hagiwara, J. H. Tan, and M. Adam, “Application of deep convolutional neural network for automated detection of myocardial infarction using ecg signals,” Information Sciences, vol. 415, pp. 190–198, 2017.
-  M. Kachuee, S. Fazeli, and M. Sarrafzadeh, “Ecg heartbeat classification: A deep transferable representation,” in 2018 IEEE International Conference on Healthcare Informatics (ICHI). IEEE, 2018, pp. 443–444.
-  B. Pourbabaee, M. J. Roshtkhari, and K. Khorasani, “Deep convolutional neural networks and learning ecg features for screening paroxysmal atrial fibrillation patients,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 48, no. 12, pp. 2095–2104, 2018.
-  R. K. Tripathy, A. Bhattacharyya, and R. B. Pachori, “Localization of myocardial infarction from multi-lead ecg signals using multiscale analysis and convolutional neural network,” IEEE Sensors Journal, vol. 19, no. 23, pp. 11 437–11 448, 2019.
-  Y. Chen, H. Chen, Z. He, C. Yang, and Y. Cao, “Multi-channel lightweight convolution neural network for anterior myocardial infarction detection,” in 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI). IEEE, 2018, pp. 572–578.
-  A. M. Shaker, M. Tantawi, H. A. Shedeed, and M. F. Tolba, “Generalization of convolutional neural networks for ecg classification using generative adversarial networks,” IEEE Access, vol. 8, pp. 35 592–35 605, 2020.
-  H. Wang, H. Shi, K. Lin, C. Qin, L. Zhao, Y. Huang, and C. Liu, “A high-precision arrhythmia classification method based on dual fully connected neural network,” Biomedical Signal Processing and Control, vol. 58, p. 101874, 2020.
-  C. Chen, Z. Hua, R. Zhang, G. Liu, and W. Wen, “Automated arrhythmia classification based on a combination network of cnn and lstm,” Biomedical Signal Processing and Control, vol. 57, p. 101819, 2020.
-  M. Porumb, E. Iadanza, S. Massaro, and L. Pecchia, “A convolutional neural network approach to detect congestive heart failure,” Biomedical Signal Processing and Control, vol. 55, p. 101597, 2020.
-  C. Hao, S. Wibowo, M. Majmudar, and K. S. Rajput, “Spectro-temporal feature based multi-channel convolutional neural network for ecg beat classification,” in 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). IEEE, 2019, pp. 5642–5645.
-  A. T. Oliveira, E. G. Nobrega et al., “A novel arrhythmia classification method based on convolutional neural networks interpretation of electrocardiogram images,” in IEEE International conference on industrial technology. Piscataway, NJ, 2019.
-  M. M. Al Rahhal, Y. Bazi, H. Almubarak, N. Alajlan, and M. Al Zuair, “Dense convolutional networks with focal loss and image generation for electrocardiogram classification,” IEEE Access, vol. 7, pp. 182 225–182 237, 2019.
-  A. Diker, Z. Cömert, E. Avcı, M. Toğaçar, and B. Ergen, “A novel application based on spectrogram and convolutional neural network for ecg classification,” in 2019 1st International Informatics and Software Engineering Conference (UBMYK). IEEE, 2019, pp. 1–6.
-  W. Liu, M. Zhang, Y. Zhang, Y. Liao, Q. Huang, S. Chang, H. Wang, and J. He, “Real-time multilead convolutional neural network for myocardial infarction detection,” IEEE journal of biomedical and health informatics, vol. 22, no. 5, pp. 1434–1444, 2017.
-  X. Zhai and C. Tin, “Automated ecg classification using dual heartbeat coupling based on convolutional neural network,” IEEE Access, vol. 6, pp. 27 465–27 472, 2018.
-  W. Sun, N. Zeng, and Y. He, “Morphological arrhythmia automated diagnosis method using gray-level co-occurrence matrix enhanced convolutional neural network,” IEEE Access, vol. 7, pp. 67 123–67 129, 2019.
-  E. Izci, M. A. Ozdemir, M. Degirmenci, and A. Akan, “Cardiac arrhythmia detection from 2d ecg images by using deep learning technique,” in 2019 Medical Technologies Congress (TIPTEKNO). IEEE, 2019, pp. 1–4.
-  B. M. Mathunjwa, Y.-T. Lin, C.-H. Lin, M. F. Abbod, and J.-S. Shieh, “Ecg arrhythmia classification by using a recurrence plot and convolutional neural network,” Biomedical Signal Processing and Control, vol. 64, p. 102262, 2021.
-  X. Fan, Q. Yao, Y. Cai, F. Miao, F. Sun, and Y. Li, “Multiscaled fusion of deep convolutional neural networks for screening atrial fibrillation from single lead short ecg recordings,” IEEE journal of biomedical and health informatics, vol. 22, no. 6, pp. 1744–1753, 2018.
-  R. Wang, J. Fan, and Y. Li, “Deep multi-scale fusion neural network for multi-class arrhythmia detection,” IEEE Journal of Biomedical and Health Informatics, 2020.
-  F. Li, J. Wu, M. Jia, Z. Chen, and Y. Pu, “Automated heartbeat classification exploiting convolutional neural network with channel-wise attention,” IEEE Access, vol. 7, pp. 122 955–122 963, 2019.
-  A. Uyar and F. Gurgen, “Arrhythmia classification using serial fusion of support vector machines and logistic regression,” in 2007 4th IEEE Workshop on Intelligent Data Acquisition and Advanced Computing Systems: Technology and Applications. IEEE, 2007, pp. 560–565.
-  Y. Zhao, X. Yin, and Y. Xu, “Electrocardiograph (ecg) recognition based on graphical fusion with geometric algebra,” in 2017 4th International Conference on Information Science and Control Engineering (ICISCE). IEEE, 2017, pp. 1482–1486.
-  R. Wang, Q. Yao, X. Fan, and Y. Li, “Multi-class arrhythmia detection based on neural network with multi-stage features fusion,” in 2019 IEEE International Conference on Systems, Man and Cybernetics (SMC). IEEE, 2019, pp. 4082–4087.
-  N. Manshor, A. A. Halin, M. Rajeswari, and D. Ramachandram, “Feature selection via dimensionality reduction for object class recognition,” in 2011 2nd International Conference on Instrumentation, Communications, Information Technology, and Biomedical Engineering. IEEE, 2011, pp. 223–227.
-  Z. Wang and T. Oates, “Imaging time-series to improve classification and imputation,” in Twenty-Fourth International Joint Conference on Artificial Intelligence, 2015.
-  C.-L. Yang, Z.-X. Chen, and C.-Y. Yang, “Sensor classification using convolutional neural network by encoding multivariate time series as two-dimensional colored images,” Sensors, vol. 20, no. 1, p. 168, 2020.
-  J. Eckmann, S. O. Kamphorst, D. Ruelle et al., “Recurrence plots of dynamical systems,” World Scientific Series on Nonlinear Science Series A, vol. 16, pp. 441–446, 1995.
-  Recuplots and cnns for time-series classification. [Online]. Available: https://www.kaggle.com/tigurius/recuplots-and-cnns-for-time-series-classification
-  Z. Wang and T. Oates, “Encoding time series as images for visual inspection and classification using tiled convolutional neural networks,” in Workshops at the Twenty-Ninth AAAI Conference on Artificial Intelligence, 2015.
-  A. Krizhevsky, I. Sutskever, and G. E. Hinton, “Imagenet classification with deep convolutional neural networks,” in Advances in neural information processing systems, 2012, pp. 1097–1105.
-  Z. Ahmad and N. Khan, “Cnn based multistage gated average fusion (mgaf) for human action recognition using depth and inertial sensors,” IEEE Sensors Journal, 2020.
-  H. B. Mitchell, Image fusion: theories, techniques and applications. Springer Science & Business Media, 2010.
-  A. L. Goldberger, L. A. Amaral, L. Glass, J. M. Hausdorff, P. C. Ivanov, R. G. Mark, J. E. Mietus, G. B. Moody, C.-K. Peng, and H. E. Stanley, “Physiobank, physiotoolkit, and physionet: components of a new research resource for complex physiologic signals,” circulation, vol. 101, no. 23, pp. e215–e220, 2000.
-  G. B. Moody and R. G. Mark, “The impact of the mit-bih arrhythmia database,” IEEE Engineering in Medicine and Biology Magazine, vol. 20, no. 3, pp. 45–50, 2001.
-  R. Bousseljot, D. Kreiseler, and A. Schnabel, “Nutzung der ekg-signaldatenbank cardiodat der ptb über das internet,” Biomedizinische Technik/Biomedical Engineering, vol. 40, no. s1, pp. 317–318, 1995.
-  Ecg heartbeat categorization dataset. [Online]. Available: https://www.kaggle.com/shayanfazeli/heartbeat
-  Association for the Advancement of Medical Instrumentation et al., “Testing and reporting performance results of cardiac rhythm and st segment measurement algorithms,” ANSI/AAMI EC38, vol. 1998, 1998.
-  N. V. Chawla, K. W. Bowyer, L. O. Hall, and W. P. Kegelmeyer, “Smote: synthetic minority over-sampling technique,” Journal of artificial intelligence research, vol. 16, pp. 321–357, 2002.
-  X. Xu, S. Jeong, and J. Li, “Interpretation of electrocardiogram (ecg) rhythm by combined cnn and bilstm,” IEEE Access, vol. 8, pp. 125 380–125 388, 2020.
-  R. He, Y. Liu, K. Wang, N. Zhao, Y. Yuan, Q. Li, and H. Zhang, “Automatic detection of qrs complexes using dual channels based on u-net and bidirectional long short-term memory,” IEEE Journal of Biomedical and Health Informatics, 2020.
-  F. Qiao, B. Li, Y. Zhang, H. Guo, W. Li, and S. Zhou, “A fast and accurate recognition of ecg signals based on elm-lrf and blstm algorithm,” IEEE Access, vol. 8, pp. 71 189–71 198, 2020.
-  J. Kojuri, R. Boostani, P. Dehghani, F. Nowroozipour, and N. Saki, “Prediction of acute myocardial infarction with artificial neural networks in patients with nondiagnostic electrocardiogram,” Journal of Cardiovascular Disease Research, vol. 6, no. 2, 2015.
-  Y. Cao, T. Wei, N. Lin, D. Zhang, and J. J. Rodrigues, “Multi-channel lightweight convolutional neural network for remote myocardial infarction monitoring,” in 2020 IEEE Wireless Communications and Networking Conference Workshops (WCNCW). IEEE, 2020, pp. 1–6.
-  M. A. Ahamed, K. A. Hasan, K. F. Monowar, N. Mashnoor, and M. A. Hossain, “Ecg heartbeat classification using ensemble of efficient machine learning approaches on imbalanced datasets,” in 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT). IEEE, 2020, pp. 140–145.
-  E. Akbas and F. T. Y. Vural, “Automatic image annotation by ensemble of visual [...] 2007 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 2007, pp. 1–8.
-  Z. Ahmad and N. Khan, “Towards improved human action recognition using convolutional neural networks and multimodal fusion of depth and inertial sensor data,” in 2018 IEEE International Symposium on Multimedia (ISM). IEEE, 2018, pp. 223–230.