Up to 35% of the roughly 60 million people with epilepsy do not receive effective medical treatment because their epilepsy is drug-refractory[ngugi2010estimation], [kwan2000early], [assi2017towards]. Epileptic patients may suffer from severe comorbidities, injuries and anxiety due to sudden seizure onset[racine1972modification]. Hence, an effective method of seizure prediction is important. EEG signals, which are commonly used for seizure prediction, represent the brain activity of an epileptic patient[mirowski2009classification]. The EEG signals typically recorded from an epileptic patient can be divided into four states: interictal (between seizures), preictal (before a seizure), ictal (during a seizure) and post-ictal (after a seizure)[mormann2006seizure]. The preliminary goal of seizure prediction is to distinguish between the interictal and preictal states. Most recently published seizure prediction methods based on EEG or intracranial EEG (IcEEG) signals include two main steps. The first is feature extraction, which extracts features from the raw signals[shahnaz2015seizure], [truong2018convolutional]. The second is classification based on the selected features, using algorithms such as rule-based decision[assi2017towards], threshold crossing[eftekhar2014ngram] and Support Vector Machine (SVM)[sharif2017prediction].
Recently, deep learning algorithms have been used for EEG signal analysis, the most representative being the convolutional neural network (CNN). Truong et al. used the short-time Fourier transform (STFT) and a CNN with 2D convolution to process both EEG and IcEEG signals[truong2018convolutional]. Eberlein et al. processed raw EEG signals in the time domain with a deep CNN to predict seizure onset[eberlein2018convolutional]. Truong et al. used an integer CNN and a binary-weight CNN for seizure detection and achieved state-of-the-art results[truong2018integer]. Hossain et al. used mixed 1D and 2D convolution for seizure detection and obtained good results[hossain2019applying].
Although some good results have been obtained, these algorithms still have drawbacks. Firstly, many deep learning classification algorithms still need extra feature extraction steps[truong2018convolutional], [tsou2019epilepsy]. Secondly, most reported works simply adopt network architectures from computer vision and fail to consider the specific characteristics of EEG signals[korshunova2017towards]. Thirdly, most algorithms for seizure prediction are not hardware-friendly because of their large number of high-precision floating-point parameters and the correspondingly complex computation[marni2018real].
In this paper, we propose a Binary Single-dimensional Convolutional Neural Network (BSDCNN) trained with raw EEG data to predict seizure onset. Firstly, conventional feature extraction is skipped in this work; raw EEG data is used directly as input without any preprocessing steps. Secondly, BSDCNN uses 1D convolution to better match the characteristics of EEG signals, and a theoretical explanation is given. Thirdly, weights and activation values are binarized, which significantly reduces the parameter scale and the computational complexity. The remaining sections of this paper are organized as follows. Section II introduces the proposed design method and the datasets used. Section III describes the evaluation of the results and the comparison with other works. The last section concludes this paper.
II Proposed Method
The purpose of seizure prediction is to distinguish between preictal and interictal brain states. Two popular datasets for seizure prediction were used in this research: the American Epilepsy Society Seizure Prediction Challenge (AES) dataset[brinkmann2016crowdsourcing] and the CHB-MIT dataset[goldberger2000physiobank]. Details of the proposed BSDCNN algorithm are described in this section.
The AES dataset contains EEG data collected from five canines and two human subjects with epilepsy. The EEG data were recorded with 16 (or 15) electrodes at a sampling rate of 400 (or 5000) Hz[karoly2017circadian]. As shown in Fig. 1, there are two important parameters, namely the seizure prediction horizon (SPH) and the preictal interval length (PIL)[assi2015hybrid]. SPH is the interval between the end of the preictal state and seizure onset, while PIL is the length of the preictal state. In this dataset, SPH and PIL are set to 5 minutes and 1 hour, respectively. Samples are extracted from the interictal and preictal intervals with a fixed 20-second time window, and the resulting matrix is used as input data.
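The fixed-window extraction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `segment_windows`, the `(channels, samples)` array layout and the synthetic signal are assumptions made for the example.

```python
import numpy as np

def segment_windows(recording, fs, window_s=20.0, overlap_s=0.0):
    """Cut a (channels, samples) recording into fixed-length windows.

    Returns an array of shape (n_windows, channels, window_samples).
    The layout and the parameter names are assumptions of this sketch.
    """
    win = int(round(window_s * fs))
    step = win - int(round(overlap_s * fs))
    if step <= 0:
        raise ValueError("overlap must be shorter than the window")
    starts = list(range(0, recording.shape[1] - win + 1, step))
    if not starts:
        return np.empty((0, recording.shape[0], win))
    return np.stack([recording[:, s:s + win] for s in starts])

# Synthetic stand-in for one minute of 16-channel EEG sampled at 400 Hz.
rng = np.random.default_rng(0)
fake_eeg = rng.standard_normal((16, 60 * 400))

windows = segment_windows(fake_eeg, fs=400)
print(windows.shape)  # (3, 16, 8000)
```

With a 400 Hz sampling rate, each 20-second window holds 8000 sampling points per channel, so one minute of signal yields three non-overlapping windows.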
The CHB-MIT dataset contains scalp EEG data of 23 measurements from 22 patients. All measurements were recorded at a 256 Hz sampling rate[furbass2015prospective]. A fixed 23-electrode configuration is used in 15 measurements, while the electrode configuration changes in the remaining measurements[alickovic2018performance]. Moreover, we only consider measurements with no fewer than 3 leading seizures and exclude one subject (chb06) because it is absent from the comparisons in recent works. Consequently, measurements from only 6 subjects were selected in this work. We set a 5 min SPH and a 30 min PIL for a fair comparison with other state-of-the-art works. A fixed 20 s time window is also used for this dataset; the shape of the input data is .
The ratio between interictal and preictal data is about 5:1 and 4:1 in the AES and CHB-MIT datasets, respectively. A model trained with such imbalanced data is likely to perform unsatisfactorily. The approach taken in this work to address the imbalance is presented below.
II-B Single-dimensional CNN Model
Two-dimensional convolution kernels deliver excellent performance in image recognition. However, they hardly achieve the best performance in seizure prediction, for the following two reasons. Firstly, the two dimensions of EEG data differ from those of image data. In image data, both dimensions index pixels. In EEG data, however, the two dimensions have different meanings: time and channel. We argue that mixing the channel and time dimensions decreases prediction performance, which is demonstrated by our experiment in Section III. Secondly, a single-dimensional convolutional kernel has the same resolution for each line of input, while a 2D convolutional kernel extracts less information from edge pixels than from interior pixels. The edge sampling points of EEG signals contain as much information as interior sampling points, whereas edge pixels in an image often contain less information than interior pixels. This is why the 2D convolution kernel performs very well in image classification but has drawbacks in EEG signal analysis.
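The first argument can be illustrated with a small NumPy sketch. It is a toy example under assumed shapes (4 channels, 12 sampling points, kernels of width 3), not the network used in this work: a temporal 1D kernel applied row by row keeps channels separate, whereas a 2D kernel with zero padding along the channel axis mixes neighbouring channels.

```python
import numpy as np

def conv_time_1d(x, kernel):
    """Cross-correlate every channel row with one temporal kernel (no padding)."""
    return np.stack([np.correlate(row, kernel, mode="valid") for row in x])

def conv_2d_zero_padded(x, kernel):
    """2D cross-correlation, zero-padded along channels, unpadded along time."""
    kh, kw = kernel.shape
    pad = kh // 2
    padded = np.pad(x, ((pad, pad), (0, 0)))  # zero rows above and below
    out = np.zeros((x.shape[0], x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

rng = np.random.default_rng(0)
eeg = rng.standard_normal((4, 12))   # toy (channels, time) matrix
eeg_perturbed = eeg.copy()
eeg_perturbed[3] += 1.0              # change only the last channel

k1 = np.ones(3) / 3.0                # temporal kernel
k2 = np.ones((3, 3)) / 9.0           # channel-by-time kernel

# Outputs for channels 0..2 before and after perturbing channel 3.
same_1d = np.allclose(conv_time_1d(eeg, k1)[:3],
                      conv_time_1d(eeg_perturbed, k1)[:3])
same_2d = np.allclose(conv_2d_zero_padded(eeg, k2)[:3],
                      conv_2d_zero_padded(eeg_perturbed, k2)[:3])
print(same_1d, same_2d)  # True False
```

In the 2D case the first and last channel rows also see one row of padding zeros, so only two of the three kernel rows receive real data there.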
II-C Binary Single-dimensional Convolutional Neural Network for Seizure Prediction
A binary convolutional neural network (BCNN) uses binary activation values and weights in place of full-precision ones. Generally speaking, the weights and activation values of a BCNN are constrained to +1 or -1[rastegari2016xnor]. In a CNN, most of the computation time and resources are spent on the multiply-accumulate (MAC) operation, so binarizing the activation values and weights can reduce computation time and complexity significantly[lin2017towards]. In addition, the hardware implementation of a BCNN is easier than that of a full-precision CNN, and it has lower hardware resource and power consumption requirements[yu2016binary].
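To see why binarization simplifies the MAC operation, note that the dot product of two vectors with entries in {+1, -1} equals twice the number of positions where they agree minus the vector length, which hardware can compute with an XNOR followed by a bit count. The sketch below checks this identity in NumPy; the function names are chosen for the example and are not taken from any cited implementation.

```python
import numpy as np

def binarize(x):
    """Map real values to +1/-1 (zero is mapped to +1 in this sketch)."""
    return np.where(x >= 0, 1, -1).astype(np.int8)

def dot_xnor_popcount(a, b):
    """Dot product of two +1/-1 vectors using only XNOR and a bit count."""
    a_bits = a > 0                                  # +1 -> True, -1 -> False
    b_bits = b > 0
    agree = np.count_nonzero(~(a_bits ^ b_bits))    # XNOR, then popcount
    return 2 * agree - a.size

rng = np.random.default_rng(0)
a = binarize(rng.standard_normal(64))
b = binarize(rng.standard_normal(64))

reference = int(a.astype(np.int32) @ b.astype(np.int32))  # ordinary MAC
fast = dot_xnor_popcount(a, b)
print(reference == fast)  # True
```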
In image classification tasks, BCNNs have achieved good results compared with full-precision networks, for example on CIFAR10[courbariaux2016binarized] and with YOLOv2[nakahara2018lightweight]. Motivated by the goal of building a seizure prediction model with a smaller parameter scale and less computation, a BCNN is used in this work.
The signum function is used to binarize the weights and activation values. Because the derivative of the signum function is zero almost everywhere, the gradient would vanish during back propagation. So that the gradient can be propagated, we limit it with the following equation[courbariaux2016binarized]:
$$\mathrm{Htanh}(x) = \mathrm{Clip}(x, -1, 1) = \max(-1, \min(1, x)) \tag{1}$$
In back propagation, the signum function is replaced by equation (1). This means that the gradient of the node is constrained to 1 when the input is between -1 and 1 and is 0 for all other input values. Note that the full-precision weights are also retained during training: they are updated in back propagation, while binarization is applied only in forward propagation. Because the distribution of the data differs from batch to batch, binarizing the activation values strongly affects the convergence speed and the training effect. Batch normalization is therefore added before the signum activation function in every convolutional block, which improves the convergence speed and training effect of the model[ioffe2015batch].
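The forward and backward rules described above can be sketched on a single linear node. This is an illustrative NumPy example with made-up values (`w_real`, `x`, `target`, `lr` and the squared-error loss are assumptions of the sketch), not the training code of this work.

```python
import numpy as np

def sign_forward(x):
    """Binarize to +1/-1 in the forward pass (zero is mapped to +1 here)."""
    return np.where(x >= 0, 1.0, -1.0)

def sign_backward(x, grad_out):
    """Backward pass: gradient 1 where -1 <= x <= 1, gradient 0 elsewhere."""
    return grad_out * (np.abs(x) <= 1.0)

# Full-precision weights are kept and updated; only their signs are used forward.
w_real = np.array([-1.5, -0.3, 0.2, 0.9])
x = np.array([1.0, -1.0, 1.0, 1.0])
target, lr = 4.0, 0.1

w_bin = sign_forward(w_real)                     # binary weights for the forward pass
y = w_bin @ x                                    # output of the node
grad_w_bin = (y - target) * x                    # gradient of 0.5 * (y - target)**2
grad_w_real = sign_backward(w_real, grad_w_bin)  # blocked where |w_real| > 1
w_updated = w_real - lr * grad_w_real            # update the full-precision copy
print(w_updated)
```

The first weight lies outside [-1, 1], so it receives no gradient and stays at -1.5, while the other three full-precision weights move.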
Single-dimensional convolution continues to be used in the BCNN. However, we use smaller convolution kernels than before and replace the pooling layers with strided convolution layers to get better performance. This unified form may have some advantages in hardware implementation. Apart from the first convolutional layer, the strided convolution layers and the dense layers, the parameters of the remaining layers are binarized.
Based on the above methods, we propose the binary single-dimensional convolutional neural network (BSDCNN), which combines the ability of single-dimensional convolution to extract features from EEG signals with the advantages of the BCNN in computational complexity and power consumption. The network structure is shown in Fig. 1. To reduce hardware resource consumption, the proposed network takes raw EEG signals as input. There are five convolutional blocks, each consisting of a convolution layer and a strided convolution layer. A batch normalization layer is added after every convolutional layer. Except for the first convolutional block, the parameters of the convolutional blocks are binary. The first three blocks convolve along the time dimension. The kernel sizes are , and , respectively, and the numbers of kernels are 16, 32 and 64, respectively. The last two blocks convolve along the channel dimension. The kernel sizes are and the numbers of kernels are 128 and 256, respectively. These are followed by three fully connected layers with sigmoid activation functions and output sizes of 256, 64 and 2. The results and discussion are presented in Section III.
1D: Single-dimensional convolution; 2D: Two-dimensional convolution; 1D-2D: Single-dimensional convolution of time dimension and Two-dimensional convolution of channel dimension; AUC: Area Under Curve
To overcome the problem of the imbalanced dataset and improve performance, preictal samples for the training set are extracted with 5 s overlap, whereas interictal samples are extracted without overlap. This is equivalent to oversampling the scarce preictal data. In addition, 16 training samples are randomly drawn from each of the interictal and preictal sets for every batch, which ensures that each batch of 32 training samples contains the same number of samples of both types. These methods reduce the influence of imbalanced data to some extent.
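The balanced batch construction can be sketched as follows. The function name `balanced_batch`, the label convention (0 for interictal, 1 for preictal) and the toy array sizes are assumptions of this example.

```python
import numpy as np

def balanced_batch(interictal, preictal, per_class=16, rng=None):
    """Draw per_class windows from each class and shuffle them into one batch."""
    rng = np.random.default_rng() if rng is None else rng
    i_idx = rng.choice(len(interictal), size=per_class, replace=False)
    p_idx = rng.choice(len(preictal), size=per_class, replace=False)
    x = np.concatenate([interictal[i_idx], preictal[p_idx]])
    y = np.concatenate([np.zeros(per_class, dtype=np.int64),   # 0 = interictal
                        np.ones(per_class, dtype=np.int64)])   # 1 = preictal
    order = rng.permutation(2 * per_class)
    return x[order], y[order]

# Toy data: 100 interictal and 40 preictal windows of shape (4 channels, 50 points).
rng = np.random.default_rng(0)
interictal = rng.standard_normal((100, 4, 50))
preictal = rng.standard_normal((40, 4, 50))

batch_x, batch_y = balanced_batch(interictal, preictal, rng=rng)
print(batch_x.shape, int(batch_y.sum()))  # (32, 4, 50) 16
```

Because every batch is drawn this way, the class ratio seen by the optimizer is 1:1 regardless of the ratio in the underlying recordings.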
In order to compare with other state-of-the-art works, we evaluate our model with the following metrics: sensitivity, false prediction rate (FPR) and area under curve (AUC).
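For reference, the three metrics can be written down in a few lines. This is a generic sketch: AUC is computed as the probability that a preictal sample receives a higher score than an interictal one, sensitivity as the fraction of seizures that were predicted, and FPR as false alarms per interictal hour. The exact event-counting protocol used for the reported numbers is not specified here, so the function signatures are assumptions.

```python
import numpy as np

def auc_score(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    pos = scores[labels == 1]
    neg = scores[labels == 0]
    diff = pos[:, None] - neg[None, :]
    return (np.sum(diff > 0) + 0.5 * np.sum(diff == 0)) / (pos.size * neg.size)

def sensitivity(n_predicted_seizures, n_seizures):
    """Fraction of seizures that were preceded by a correct alarm."""
    return n_predicted_seizures / n_seizures

def false_prediction_rate(n_false_alarms, interictal_hours):
    """False alarms per hour of interictal recording (FPR/h)."""
    return n_false_alarms / interictal_hours

print(auc_score([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```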
Table I shows the AUC results obtained with 1D or 2D convolution kernels in the time and channel dimensions. Previous researchers only mentioned that mixing convolution over the time and spatial dimensions affects performance[eberlein2018convolutional], [cecotti2010convolutional]. However, this does not hold when the time-dimension convolution kernels are themselves two-dimensional. The reason is that two-dimensional convolution in the first three convolutional blocks does not extract enough information from the first and last rows, so some useful time-domain information can still be extracted by a two-dimensional convolution kernel in the later blocks. The benefit of a one-dimensional kernel in the channel dimension alone is limited, because the number of channels is much smaller than the number of sampling points. If single-dimensional time convolution kernels are used in the first three convolutional blocks, enough time information is extracted, and single-dimensional channel convolution kernels in the following blocks then perform well.
|Layer||Parameter type||Mem||Layer||Parameter type||Mem|
Conv: Convolutional layer; Pool: Max-pooling layer;
AUC: Area Under Curve; SConv: Strided convolutional layer;
BConv: Binary Convolutional layer with batch normalization layer;
BSConv: Binary Strided Convolutional layer with batch normalization layer
TABLE II compares the parameter scale of the convolutional blocks in BSDCNN with that of SDCNN, the corresponding high-precision CNN with 32-bit floating-point parameters. Compared with SDCNN, BSDCNN reduces parameter memory by a factor of 7.2. It also reduces computation by a factor of about 25.5, measured by the bit widths of the multiplication and accumulation operations. Note that all our calculations include the batch normalization layers. The test results of SDCNN and BSDCNN on the different datasets are shown in Fig. 2, and their average AUC values are shown in TABLE IV. As the results show, using BSDCNN reduces the average AUC by only 2.74% compared with SDCNN.
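The memory reduction is smaller than the 32-fold ratio between a 32-bit and a 1-bit weight because the first block and the batch normalization parameters stay in full precision. The sketch below shows this kind of bookkeeping on three hypothetical layers; the layer sizes are invented for illustration and do not reproduce TABLE II or the 7.2 figure. Four batch normalization parameters per output channel are assumed.

```python
def parameter_bits(n_weights, weight_bits, n_bn_params, bn_bits=32):
    """Memory in bits for one layer: its weights plus batch normalization."""
    return n_weights * weight_bits + n_bn_params * bn_bits

# Hypothetical 1D layers: (name, in_ch * out_ch * kernel_width, binary?, BN params)
layers = [
    ("conv1",  1 * 16 * 5,  False, 4 * 16),   # first layer kept in full precision
    ("bconv2", 16 * 32 * 5, True,  4 * 32),
    ("bconv3", 32 * 64 * 5, True,  4 * 64),
]

full_bits = sum(parameter_bits(n, 32, bn) for _, n, _, bn in layers)
mixed_bits = sum(parameter_bits(n, 1 if is_bin else 32, bn)
                 for _, n, is_bin, bn in layers)

ratio = full_bits / mixed_bits
print(f"full precision: {full_bits} bits, binarized: {mixed_bits} bits, "
      f"reduction: {ratio:.1f}x")
```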
|Method||STFT+CNN[truong2018convolutional]||CNN[eberlein2018convolutional]||BSDCNN (This work)|
STFT: Short-Time Fourier Transform; AUC: Area Under Curve; SEN: Sensitivity; FPR: False Prediction Rate
Average AUC in this work is the mean AUC of Dog1 to Dog4. The average AUC of all subjects is 0.946.
|Method||STFT+CNN[truong2018convolutional]||Wavelet+CNN[khan2017focal]||BSDCNN (This work)|
STFT: Short-Time Fourier Transform; AUC: Area Under Curve; SEN: Sensitivity; FPR: False Prediction Rate
The average FPR, AUC and SEN in this work are the means over the corresponding samples. The average FPR, AUC and SEN over all subjects are 0.10, 0.965 and 95.22, respectively.
Finally, we compare our model with other recent state-of-the-art CNN-based works. Table III and Table IV compare performance on the AES and CHB-MIT datasets, respectively. Truong et al.[truong2018convolutional] and Khan et al.[khan2017focal] use extra feature extraction steps, while Eberlein et al.[eberlein2018convolutional] use raw EEG data directly. The proposed method achieves the best AUC, sensitivity and FPR/h among the compared methods.
In this work, a Binary Single-dimensional Convolutional Neural Network (BSDCNN) for seizure prediction has been proposed. By using a binary neural network with single-dimensional convolutional kernels, the proposed model reduces the parameter size by a factor of 7.2 and the computational complexity by a factor of 25.5 relative to its full-precision counterpart, with an average AUC loss of only 2.74%. Compared with recent state-of-the-art works, the proposed BSDCNN shows better performance. Moreover, a theoretical explanation is given for why single-dimensional convolution performs better than two-dimensional convolution in seizure prediction.
The authors would like to acknowledge start-up funds from Westlake University to the Cutting-Edge Net of Biomedical Research and INnovation (CenBRAIN) to support this project.