I Introduction
Up to 35% of the roughly 60 million people with epilepsy do not respond to medical treatment because their seizures are drug-refractory [ngugi2010estimation], [kwan2000early], [assi2017towards]. Sudden seizure onset can cause patients severe comorbidities, injuries and anxiety [racine1972modification]. Hence, an effective seizure prediction method is important. EEG signals, which are commonly used for seizure prediction, reflect the brain activity of epileptic patients [mirowski2009classification]. The EEG of an epileptic patient can typically be divided into four states: interictal (between seizures), preictal (before a seizure), ictal (during a seizure) and postictal (after a seizure) [mormann2006seizure]. The primary goal of seizure prediction is to distinguish between the interictal and preictal states. Most recently published seizure prediction methods, based on EEG or intracortical EEG (IcEEG) signals, include two main steps. The first is feature extraction, which extracts features from the raw signals [shahnaz2015seizure]. For example, the short-time Fourier transform (STFT) is used to transform time-domain signals into frequency-domain features [truong2018convolutional]. The second is classification based on the selected features, using algorithms such as rule-based decision [assi2017towards], threshold crossing [eftekhar2014ngram] and support vector machines (SVM) [sharif2017prediction].

Recently, deep learning algorithms have been used for EEG signal analysis, the most representative being the convolutional neural network (CNN). Truong et al. used STFT and a CNN with 2D convolution to process both EEG and IcEEG signals [truong2018convolutional]. Eberlein et al. processed raw time-domain EEG signals with a deep CNN to predict seizure onset [eberlein2018convolutional]. Truong et al. used an integer CNN and a binary-weight CNN for seizure detection and achieved state-of-the-art results [truong2018integer]. Hossain et al. combined 1D and 2D convolution for seizure detection and obtained good results [hossain2019applying].
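To make the STFT feature-extraction step concrete, the following is a minimal pure-Python sketch (Hann window, naive DFT, magnitude only). The frame length, hop size and 64 Hz sampling rate are illustrative assumptions, not the settings used in [truong2018convolutional]; BSDCNN itself skips this step.

```python
import cmath
import math


def stft_magnitude(signal, frame_len, hop):
    """Magnitude STFT of a 1-D list using a periodic Hann window and a naive DFT.

    Returns one list per frame holding |X[k]| for k = 0 .. frame_len // 2.
    """
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        seg = [signal[start + n] * window[n] for n in range(frame_len)]
        spectrum = []
        for k in range(frame_len // 2 + 1):
            acc = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                      for n in range(frame_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


fs = 64  # hypothetical sampling rate (Hz); with frame_len = 64 each bin is 1 Hz wide
x = [math.sin(2 * math.pi * 8 * t / fs) for t in range(2 * fs)]  # 2 s of an 8 Hz sine
spec = stft_magnitude(x, frame_len=64, hop=32)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), peak_bin)  # 3 8
```

Each row of `spec` is one time frame, so the time-domain signal becomes a time-frequency matrix that a 2D CNN can consume.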
Although good results have been obtained, these algorithms still have drawbacks. First, many deep learning classification algorithms still require extra feature extraction steps [truong2018convolutional], [tsou2019epilepsy]. Second, most reported works simply adopt network architectures from computer vision and do not account for the specific characteristics of EEG signals [korshunova2017towards]. Third, most seizure prediction algorithms are not hardware-friendly because of their large number of high-precision floating-point parameters and the resulting computational complexity [marni2018real].

In this paper, we propose a Binary Single-dimensional Convolutional Neural Network (BSDCNN) trained on raw EEG data to predict seizure onset. First, conventional feature extraction is skipped: raw EEG data are used directly as input without any preprocessing. Second, BSDCNN uses 1D convolution to better match the characteristics of EEG signals, and a theoretical explanation of this choice is given. Third, weights and activations are binarized, which significantly reduces the parameter size and the computational complexity. The remainder of this paper is organized as follows. Section II introduces the proposed method and the datasets used. Section III evaluates the results and compares them with other works. Section IV concludes the paper.
II Proposed Method
The purpose of seizure prediction is to distinguish between preictal and interictal brain states. Two popular seizure prediction datasets were used in this research: the American Epilepsy Society Seizure Prediction Challenge (AES) dataset [brinkmann2016crowdsourcing] and the CHB-MIT dataset [goldberger2000physiobank]. This section describes these datasets and the proposed BSDCNN algorithm.
Iia Datasets
The AES dataset contains EEG data collected from five dogs and two human subjects with epilepsy. The EEG data were recorded with 16 (or 15) electrodes at a sampling rate of 400 (or 5000) Hz [karoly2017circadian]. As shown in Fig. 1, there are two important parameters: the seizure prediction horizon (SPH) and the preictal interval length (PIL) [assi2015hybrid]. SPH is the interval between the end of the preictal state and seizure onset, while PIL is the length of the preictal state. For this dataset, SPH and PIL are set to 5 minutes and 1 hour, respectively. Samples are extracted from the interictal and preictal intervals with a fixed 20-second time window, and each resulting channel × time matrix is used as input data.
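The SPH and PIL definitions above translate into a simple window-labelling rule. The sketch below assumes times in seconds and uses the AES settings (SPH = 5 min, PIL = 1 h) as defaults; the function name `label_window` and the "excluded"/"other" labels are hypothetical, and deciding which "other" windows count as interictal requires rules (distance to every seizure) that are not modelled here.

```python
def label_window(win_start, win_end, onset, sph=5 * 60, pil=60 * 60):
    """Label the window [win_start, win_end) relative to one seizure onset.

    'preictal' : window lies entirely inside [onset - SPH - PIL, onset - SPH)
    'excluded' : window touches the preictal interval or the SPH but is not
                 fully preictal (e.g. straddles a boundary or falls in the SPH)
    'other'    : neither; interictal status must be decided separately
    """
    pre_start = onset - sph - pil  # preictal interval starts PIL + SPH before onset
    pre_end = onset - sph          # ... and ends SPH before onset
    if pre_start <= win_start and win_end <= pre_end:
        return "preictal"
    if win_start < onset and win_end > pre_start:
        return "excluded"
    return "other"


onset = 2 * 3600  # hypothetical seizure onset 2 h into the recording
print(label_window(3300, 3320, onset))  # preictal
print(label_window(6890, 6910, onset))  # excluded (crosses into the SPH)
print(label_window(0, 20, onset))       # other
```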
The CHB-MIT dataset contains scalp EEG data from 23 measurements of 22 patients, all recorded at a 256 Hz sampling rate [furbass2015prospective]. A fixed 23-electrode configuration is used in 15 measurements, while the electrode configuration changes in the remaining measurements [alickovic2018performance]. We only consider measurements with at least three leading seizures, and we exclude one subject (chb06) because it is absent from the recent works we compare against. Consequently, measurements from only six subjects were selected in this work. We set SPH to 5 min and PIL to 30 min for a fair comparison with other state-of-the-art works. A fixed 20 s time window is also used for this dataset, so each input sample contains 20 s × 256 Hz = 5120 time points per channel.
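A minimal sketch of the fixed-window segmentation, assuming the recording is held as a list of equal-length channel lists. The function name `extract_windows` is hypothetical; the optional overlap mirrors the overlapping extraction of preictal samples described in the training procedure, and consecutive windows start (window length minus overlap) seconds apart.

```python
def extract_windows(recording, fs, win_sec=20, overlap_sec=0):
    """Cut a multichannel recording into fixed-length windows.

    recording: list of channels, each a list of samples of equal length.
    Returns a list of windows; each window is a list of per-channel sample lists.
    """
    win = int(win_sec * fs)
    step = win - int(overlap_sec * fs)  # start-to-start distance between windows
    n = len(recording[0])
    return [[ch[start:start + win] for ch in recording]
            for start in range(0, n - win + 1, step)]


fs = 256  # CHB-MIT sampling rate
recording = [[0.0] * (120 * fs) for _ in range(2)]  # 2 channels, 120 s of dummy data
plain = extract_windows(recording, fs)
overlapped = extract_windows(recording, fs, overlap_sec=5)
print(len(plain), len(overlapped), len(plain[0][0]))  # 6 7 5120
```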
The ratio of interictal to preictal data is about 5:1 in the AES dataset and 4:1 in the CHB-MIT dataset. A model trained on such imbalanced data is likely to perform poorly; our solution to this problem is described in Section II-D.
II-B Single-dimensional CNN Model
Two-dimensional convolution kernels deliver excellent performance in image recognition. However, they do not achieve the best performance in seizure prediction, for two reasons. First, the two dimensions of EEG data differ from those of image data. In an image, both dimensions index pixels, whereas the two dimensions of EEG data have different meanings: time and channel. We argue that mixing the channel and time dimensions decreases prediction performance, which our experiments in Section III confirm. Second, a single-dimensional convolution kernel has the same resolution for every row of the input, while a 2D convolution kernel extracts less information from edge pixels than from interior pixels. The edge sampling points of EEG signals carry as much information as interior sampling points, whereas the edge pixels of an image often carry less information than interior pixels. This is why 2D convolution kernels perform well in image classification but have drawbacks in EEG signal analysis.
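The edge argument can be illustrated by counting how many kernel positions touch each row of a channel × time input. The sketch below assumes unpadded ("valid"), stride-1 convolution; the helper `row_coverage` is hypothetical and only counts positions, it does not implement a convolution.

```python
def row_coverage(n_rows, kernel_rows):
    """Number of 'valid' (unpadded, stride-1) kernel positions covering each row."""
    counts = [0] * n_rows
    for top in range(n_rows - kernel_rows + 1):
        for r in range(top, top + kernel_rows):
            counts[r] += 1
    return counts


# Rows are EEG channels of a channel x time input matrix.
print(row_coverage(6, 1))  # 1 x k kernel along time: [1, 1, 1, 1, 1, 1]
print(row_coverage(6, 3))  # 3 x k 2D kernel:         [1, 2, 3, 3, 2, 1]
```

With a 1 × k kernel every channel is seen equally often, whereas a 3 × k kernel visits the first and last channels only once, which under-represents their information.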
II-C Binary Single-dimensional Convolutional Neural Network for Seizure Prediction
A binary convolutional neural network (BCNN) replaces full-precision weights and activations with binary ones. Generally, the weights and activations of a BCNN are constrained to +1 or -1 [rastegari2016xnor]. In a CNN, most computation time and resources are spent on multiply-accumulate (MAC) operations, so binarizing the activations and weights can significantly reduce computation time and complexity [lin2017towards]. In addition, a BCNN is easier to implement in hardware than a full-precision CNN and requires fewer hardware resources and less power [yu2016binary].
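Why binarization removes multiplications can be shown with the standard XNOR-popcount identity: if +1 is stored as bit 1 and -1 as bit 0, the dot product of two ±1 vectors of length n equals 2 × popcount(XNOR(a, b)) - n. The sketch below is a plain-Python illustration with hypothetical helper names, not the hardware design of any cited work.

```python
def pack(values_pm1):
    """Encode a list of +1/-1 values as an int bitmask (+1 -> bit 1, -1 -> bit 0)."""
    word = 0
    for i, v in enumerate(values_pm1):
        if v == 1:
            word |= 1 << i
    return word


def binary_dot(a_word, b_word, n):
    """Dot product of two packed +/-1 vectors of length n via XNOR and popcount."""
    mask = (1 << n) - 1
    xnor = ~(a_word ^ b_word) & mask   # bit is 1 where the signs agree
    matches = bin(xnor).count("1")     # popcount
    return 2 * matches - n             # agreements count +1, disagreements -1


a = [1, -1, 1, 1, -1]
b = [1, 1, -1, 1, -1]
print(binary_dot(pack(a), pack(b), len(a)))  # 1, same as sum(x * y for x, y in zip(a, b))
```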
In computer vision tasks, BCNNs have achieved results comparable to those of full-precision networks, for example on CIFAR-10 [courbariaux2016binarized] and with YOLOv2 [nakahara2018lightweight]. Motivated by the goal of building a seizure prediction model with fewer parameters and less computation, we adopt a BCNN in this work.
The signum function is used to binarize weights and activations. Because the derivative of the signum function is zero almost everywhere, gradients cannot propagate through it during backpropagation. To allow gradients to propagate, we approximate the signum function in the backward pass by the following function [courbariaux2016binarized]:
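$$\operatorname{Htanh}(x)=\operatorname{Clip}(x,-1,1)=\max\bigl(-1,\min(1,x)\bigr),\qquad \frac{\partial\operatorname{Htanh}(x)}{\partial x}=\begin{cases}1, & |x|\le 1\\ 0, & \text{otherwise}\end{cases}\tag{1}$$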
In backpropagation, the signum function is replaced by Eq. (1): the gradient of a node is 1 when its input lies between -1 and 1, and 0 otherwise. Note that the full-precision weights are retained during training: they are updated in backpropagation, while binarization is applied only in forward propagation. Because the data distribution differs from batch to batch, binarizing the activations strongly affects convergence speed and training quality. Therefore, batch normalization is added before the signum activation function in every convolutional block, which improves the convergence speed and training quality of the model [ioffe2015batch].
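A minimal scalar sketch of this straight-through scheme, assuming plain SGD: the forward pass uses sign(w), and the gradient computed with respect to the binary weight is passed to the retained full-precision weight, masked by the derivative in Eq. (1). The helper names are hypothetical. BinaryConnect-style training [courbariaux2016binarized] also clips the latent weights to [-1, 1] after each update; that step is omitted here so the masking effect stays visible.

```python
def sign(x):
    """Forward binarization: +1 for x >= 0, -1 otherwise."""
    return 1.0 if x >= 0 else -1.0


def htanh_grad(x):
    """Derivative of Htanh(x) = clip(x, -1, 1): 1 inside [-1, 1], 0 outside (Eq. 1)."""
    return 1.0 if -1.0 <= x <= 1.0 else 0.0


def sgd_step(latent_w, grad_wrt_binary, lr):
    """One SGD update of the retained full-precision (latent) weights.

    The gradient w.r.t. the binarized weight sign(w) is passed straight
    through to w and cancelled where |w| > 1.
    """
    return [w - lr * g * htanh_grad(w) for w, g in zip(latent_w, grad_wrt_binary)]


latent = [0.5, -0.5, 1.5]
binary = [sign(w) for w in latent]  # weights actually used in the forward pass
updated = sgd_step(latent, [1.0, -1.0, 1.0], lr=0.25)
print(binary, updated)  # [1.0, -1.0, 1.0] [0.25, -0.25, 1.5]
```

The third weight lies outside [-1, 1], so its gradient is cancelled and it is left unchanged.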
Single-dimensional convolution is also used in the BCNN. However, we use smaller convolution kernels than in the full-precision single-dimensional CNN and replace the pooling layers with strided convolution layers to obtain better performance. This uniform, convolution-only structure may ease hardware implementation. Except for the first convolutional layer, the first strided convolution layer and the dense layers, the parameters of all layers are binarized.
Based on the above, we propose the binary single-dimensional convolutional neural network (BSDCNN), which combines the ability of single-dimensional convolution to extract features from EEG signals with the low computational complexity and power consumption of a BCNN. The network structure is shown in Fig. 1. To reduce hardware resource consumption, the proposed network takes raw EEG signals as input. There are five convolutional blocks, each consisting of a convolution layer and a strided convolution layer, and a batch normalization layer follows every convolutional layer. Except for the first block, the parameters of all convolutional blocks are binary. The first three blocks convolve along the time dimension, with kernel sizes of , and , respectively, and 16, 32 and 64 kernels, respectively. The last two blocks convolve along the channel dimension, with kernel size  and 128 and 256 kernels, respectively. They are followed by three fully connected layers with sigmoid activation functions and output sizes of 256, 64 and 2. The results are discussed in Section III.
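To show how the time-only and channel-only blocks reshape an input, here is a shape-tracing sketch. The axes, filter counts and binary flags follow the text, but every kernel size and stride below is a hypothetical placeholder, not the paper's value, and "valid" (unpadded) convolution is assumed. A strided convolution with kernel 2 and stride 2 downsamples by the same factor as a 2-wide max-pooling layer, but with learned weights.

```python
# (axis, n_filters, conv_kernel, strided_kernel, stride, binary)
# Filter counts and binary flags follow the text; kernel sizes/strides are assumptions.
BLOCKS = [
    ("time", 16, 3, 2, 2, False),  # first block keeps full-precision parameters
    ("time", 32, 3, 2, 2, True),
    ("time", 64, 3, 2, 2, True),
    ("channel", 128, 2, 2, 2, True),
    ("channel", 256, 2, 2, 2, True),
]


def out_len(n, kernel, stride=1):
    """Output length of a 'valid' (unpadded) 1-D convolution."""
    return (n - kernel) // stride + 1


def trace_shapes(channels, samples):
    """Return (channels, samples, filters) after each convolution + strided-convolution block."""
    shapes = []
    for axis, filters, k, sk, stride, _binary in BLOCKS:
        if axis == "time":
            # 1 x k kernels slide along time only; the channel axis is untouched.
            samples = out_len(out_len(samples, k), sk, stride)
        else:
            # k x 1 kernels slide along the channel axis only.
            channels = out_len(out_len(channels, k), sk, stride)
        shapes.append((channels, samples, filters))
    return shapes


# AES dog recording: 16 channels, 20 s x 400 Hz = 8000 samples per channel.
for shape in trace_shapes(16, 8000):
    print(shape)
```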
TABLE I: AUC with 1D or 2D convolution in the time-dimension blocks (first label) and the channel-dimension blocks (second label)
Subject  1D–1D  1D–2D  2D–1D  2D–2D
Dog1  0.94  0.92  0.78  0.82
Dog2  0.99  0.98  0.96  0.97
Dog3  0.97  0.97  0.89  0.91
Average  0.977  0.957  0.877  0.900

1D: Single-dimensional convolution; 2D: Two-dimensional convolution; 1D–2D: Single-dimensional convolution in the time dimension and two-dimensional convolution in the channel dimension; AUC: Area Under Curve
II-D Training
To mitigate the class imbalance and improve performance, preictal samples are extracted for the training set with a 5 s overlap between consecutive windows, whereas interictal samples are extracted without overlap. This is equivalent to oversampling the scarce preictal data. In addition, each training batch of 32 samples is formed by randomly drawing 16 interictal and 16 preictal samples, so every batch contains the same number of samples from each class. These measures reduce the influence of the imbalanced data to some extent.
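The balanced batch composition can be sketched as follows, assuming the interictal and preictal windows have already been extracted into two pools. The generator name and the per-batch sampling without replacement are illustrative assumptions; the original training code is not reproduced here.

```python
import random


def balanced_batches(interictal, preictal, batch_size=32, n_batches=1, seed=0):
    """Yield shuffled batches with batch_size // 2 samples from each class.

    Samples are drawn without replacement within a batch;
    label 0 = interictal, label 1 = preictal.
    """
    rng = random.Random(seed)
    half = batch_size // 2
    for _ in range(n_batches):
        batch = [(x, 0) for x in rng.sample(interictal, half)]
        batch += [(x, 1) for x in rng.sample(preictal, half)]
        rng.shuffle(batch)
        yield batch


# Toy pools with a 5:1 interictal-to-preictal ratio.
interictal = [f"inter_{i}" for i in range(100)]
preictal = [f"pre_{i}" for i in range(20)]
for batch in balanced_batches(interictal, preictal, n_batches=3):
    print(len(batch), sum(label for _, label in batch))  # 32 16
```

Drawing a fixed half-and-half split per batch keeps the gradient from being dominated by interictal windows, regardless of the imbalance in the pools.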
The model was trained for 20 epochs. One epoch takes on average 605 s on the AES dataset and 76 s on the CHB-MIT dataset. The proposed model is implemented in Python 3.6 using Keras 2.2 with a TensorFlow 1.13 backend [abadi2016tensorflow] on a single NVIDIA RTX 2080 Ti GPU.

III Results
To compare with other state-of-the-art works, we evaluate our model with the following metrics: sensitivity, false prediction rate (FPR, false predictions per hour) and area under the ROC curve (AUC).
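A minimal sketch of the three metrics. AUC is computed with the Mann-Whitney rank statistic, which equals the area under the ROC curve. Whether the paper's sensitivity is counted per seizure or per window is not stated here; the sketch assumes per-window counts, and the helper names are hypothetical.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic.

    Equals the probability that a random preictal window (label 1) scores
    higher than a random interictal window (label 0); ties count as 1/2.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def sensitivity(true_pos, false_neg):
    """Fraction of preictal windows classified as preictal (assumed per-window)."""
    return true_pos / (true_pos + false_neg)


def false_prediction_rate(false_alarms, interictal_hours):
    """False predictions per hour of interictal recording."""
    return false_alarms / interictal_hours


print(roc_auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
print(sensitivity(9, 1), false_prediction_rate(2, 20))  # 0.9 0.1
```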
Table I shows the AUC obtained with 1D or 2D convolution kernels in the time and channel dimensions. Previous researchers only noted that mixing the time and spatial dimensions in convolution affects performance [eberlein2018convolutional], [cecotti2010convolutional]. However, this does not hold when the time-dimension blocks use 2D kernels: in that case, 2D–2D (average AUC 0.900) outperforms 2D–1D (0.877). This is because the 2D convolutions in the first three convolutional blocks do not extract enough information from the first and last rows, so 2D kernels in the later blocks can still extract some useful temporal information. The benefit of a 1D channel-dimension kernel is limited in this setting because the number of channels is much smaller than the number of sampling points. When 1D time-dimension kernels are used in the first three blocks, enough temporal information is extracted, and 1D channel-dimension kernels in the following blocks then perform best (1D–1D, average AUC 0.977).
TABLE II: Parameter scale of the convolutional blocks in SDCNN and BSDCNN
SDCNN  BSDCNN

Layer  Parameter type  Mem  Layer  Parameter type  Mem 
Conv1  Int  10K  Conv1  Int  2.6K 
Pool  –  –  SConv1  Int  40K 
Conv2  Int  320K  BConv1  Bin  2.6K 
Pool  –  –  BSConv1  Bin  5.1K 
Conv3  Int  640K  BConv2  Bin  10.3K 
Pool  –  –  BSConv2  Bin  20.3K 
Conv4  Int  768K  BConv3  Bin  40.5K 
Pool  –  –  BSConv3  Bin  80.5K 
Conv5  Int  3.1M  BConv4  Bin  161K 
Pool  –  –  BSConv4  Bin  321K 
Overall  –  4.84M  –  –  672K 
AUC  –  0.982  –  –  0.955 

Conv: Convolutional layer; Pool: Max-pooling layer;
AUC: Area Under Curve; SConv: Strided convolutional layer;
BConv: Binary Convolutional layer with batch normalization layer;
BSConv: Binary Strided Convolutional layer with batch normalization layer
TABLE II compares the parameter scale of the convolutional blocks of BSDCNN with that of a high-precision CNN using 32-bit floating-point parameters, called SDCNN. Compared with SDCNN, BSDCNN reduces parameter memory by a factor of 7.2 (4.84M vs. 672K). Moreover, counting bit-level multiply-accumulate operations, it reduces computation by about 25.5 times. Note that all our calculations include the batch normalization layers. The test results of SDCNN and BSDCNN on the different datasets are shown in Fig. 2, and their average AUCs are listed in TABLE II. As the results show, BSDCNN reduces the average AUC by only 2.74% compared with SDCNN.
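The memory figures can be reasoned about per layer with a simple bit count. The sketch below assumes that a binary layer stores 1 bit per weight, that batch normalization keeps four 32-bit values per output channel (gamma, beta, moving mean and moving variance, as Keras stores them), and uses a hypothetical layer size. Because batch normalization stays full precision, the per-layer saving is below 32 times; the overall 7.2 times in TABLE II is lower still, since the first block stays full precision and the two networks use different layer layouts.

```python
def conv_param_bits(kernel, in_ch, out_ch, weight_bits, batch_norm=True, bn_bits=32):
    """Parameter memory (in bits) of one convolution layer plus optional batch norm.

    Batch norm is counted as four full-precision values per output channel
    (gamma, beta, moving mean, moving variance).
    """
    bits = kernel * in_ch * out_ch * weight_bits
    if batch_norm:
        bits += 4 * out_ch * bn_bits
    return bits


# Hypothetical layer: kernel 3, 64 -> 128 channels, with batch norm in both cases.
fp32 = conv_param_bits(3, 64, 128, weight_bits=32)
binary = conv_param_bits(3, 64, 128, weight_bits=1)
print(fp32, binary, fp32 / binary)  # 802816 40960 19.6

# Overall ratio from TABLE II (4.84M vs. 672K).
print(round(4.84e6 / 672e3, 1))  # 7.2
```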
TABLE III: Comparison with state-of-the-art CNN-based methods on the AES dataset
Method  STFT+CNN [truong2018convolutional]  CNN [eberlein2018convolutional]  BSDCNN (This work)
Subject  FPR (/h)  SEN (%)  AUC  SEN (%)  AUC  SEN (%)  FPR (/h)
Dog1  0.17  50  0.798  –  0.88  77.90  0.23 
Dog2  0.01  100  0.812  –  0.97  95.13  0.14 
Dog3  0.05  58.3  0.844  –  0.95  81.64  0.07 
Dog4  0.41  78.6  0.919  –  0.86  77.32  0.20 
Dog5  0.07  80  –  –  1  98.41  0.03 
Pat1  0.36  100  –  –  0.99  96.86  0.04 
Pat2  0.86  66.7  –  –  0.97  97.58  0.11 
Average  0.276  76.2  0.843  –  0.915  89.26  0.117 

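STFT: Short-Time Fourier Transform; AUC: Area Under Curve; SEN: Sensitivity; FPR: False Prediction Rate. Columns 2–3 belong to STFT+CNN, columns 4–5 to CNN and columns 6–8 to BSDCNN.
For this work, the average AUC is the mean over Dog1 to Dog4 (the subjects reported in [eberlein2018convolutional]), while SEN and FPR are averaged over all subjects. The average AUC over all subjects is 0.946.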
TABLE IV: Comparison with state-of-the-art CNN-based methods on the CHB-MIT dataset
Method  STFT+CNN [truong2018convolutional]  Wavelet+CNN [khan2017focal]  BSDCNN (This work)
Subject  FPR (/h)  SEN (%)  AUC  SEN (%)  AUC  SEN (%)  FPR (/h)
chb01  0.24  85.7  0.943  –  1  100  0.01 
chb05  0.16  80.0  0.988  –  0.97  90.60  0.08 
chb08  –  –  0.921  –  0.99  99.24  0.05 
chb10  0.00  33.3  0.855  –  0.96  94.26  0.12 
chb14  0.40  80.3  –  –  0.94  93.90  0.17 
chb22  –  –  0.877  –  0.93  93.29  0.19 
Average  0.20  69.83  0.917  –  0.970  94.69  0.095 

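STFT: Short-Time Fourier Transform; AUC: Area Under Curve; SEN: Sensitivity; FPR: False Prediction Rate. Columns 2–3 belong to STFT+CNN, columns 4–5 to Wavelet+CNN and columns 6–8 to BSDCNN.
For this work, each average is taken over the same subjects as the compared method: FPR and SEN over the subjects reported in [truong2018convolutional] (chb01, chb05, chb10 and chb14) and AUC over those reported in [khan2017focal] (chb01, chb05, chb08, chb10 and chb22). Over all six subjects, the average FPR, AUC and SEN are 0.10/h, 0.965 and 95.22%, respectively.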
Finally, we compare our model with other recent state-of-the-art CNN-based works. Table III and Table IV compare performance on the AES and CHB-MIT datasets, respectively. Truong et al. [truong2018convolutional] and Khan et al. [khan2017focal] use extra feature extraction steps, while Eberlein et al. [eberlein2018convolutional] use raw EEG data directly. The proposed method achieves the highest AUC and sensitivity and the lowest FPR among the compared methods.
IV Conclusion
In this work, a Binary Single-dimensional Convolutional Neural Network (BSDCNN) for seizure prediction has been proposed. By combining a binary neural network with single-dimensional convolution kernels, the proposed model outperforms the compared state-of-the-art models on both datasets. Compared with its full-precision counterpart, it reduces parameter memory by 7.2 times and computational complexity by 25.5 times, with a loss of only 2.74% in average AUC. Moreover, we gave a theoretical explanation of why single-dimensional convolution outperforms two-dimensional convolution in seizure prediction.
Acknowledgment
The authors would like to acknowledge startup funds from Westlake University to the Cutting-Edge Net of Biomedical Research and INnovation (CenBRAIN) to support this project.