Early Seizure Detection with an Energy-Efficient Convolutional Neural Network on an Implantable Microcontroller

06/12/2018 ∙ by Maria Hügle, et al. ∙ University of Freiburg, Universitätsklinikum Freiburg

Implantable, closed-loop devices for automated early detection and stimulation of epileptic seizures are promising treatment options for patients with severe epilepsy that cannot be treated with traditional means. Most approaches for early seizure detection in the literature are, however, not optimized for implementation on ultra-low power microcontrollers required for long-term implantation. In this paper we present a convolutional neural network for the early detection of seizures from intracranial EEG signals, designed specifically for this purpose. In addition, we investigate approximations to comply with hardware limits while preserving accuracy. We compare our approach to three previously proposed convolutional neural networks and a feature-based SVM classifier with respect to detection accuracy, latency and computational needs. Evaluation is based on a comprehensive database with long-term EEG recordings. The proposed method outperforms the other detectors with a median sensitivity of 0.96, false detection rate of 10.1 per hour and median detection delay of 3.7 seconds, while being the only approach suited to be realized on a low power microcontroller due to its parsimonious use of computational and memory resources.

I Introduction

Epilepsy is one of the most common neurological diseases [1], and a high percentage of patients with epilepsy are refractory to pharmaceutical therapy [2]. A new treatment option for patients with intractable epilepsy is closed-loop brain stimulation [3]. Its additional advantage is that it requires only short, intermittent interventions, whereas traditional pharmaceutical therapy is continuous. To interrupt seizures, a seizure detection algorithm operating on intracranial electroencephalography (EEG) data can be used to trigger the intervention. Research on automatic seizure detection started with the objective of reducing the workload of reviewing long-term recordings in epilepsy monitoring units. Together with the development of seizure prediction algorithms [4], it then moved towards applications in implantable devices [5, 6]. Most of these approaches are based on handcrafted features [7, 8] or rules designed by experts [9]. Driven by the success of deep learning, more recent approaches use deep convolutional or recurrent networks [10, 11, 12]. However, most of these architectures are too demanding to be implemented on an implantable hardware platform. Some approaches use small convolutional neural networks with only a few layers and a small number of weights. The EEGNet of Lawhern et al. [13] is, in principle, transferable to implantable hardware; however, the hardware implementation itself was not addressed in their work. Kiral-Kornek et al. [14] proposed two architectures for seizure prediction. One is a convolutional neural network that was evaluated on a comprehensive dataset. The other can be implemented on IBM's TrueNorth neuromorphic chip.

In this paper, we propose SeizureNet, which uses efficient layer combinations and has state-of-the-art detection performance. SeizureNet bridges the gap to an implant for seizure detection based on deep learning. To the best of our knowledge, we have designed the first convolutional neural network for seizure detection specifically for an implantable ultra-low power microcontroller. The proposed architecture exhibits low runtime and memory usage, but maintains high sensitivity in combination with a low false positive rate and a short detection delay for a successful stimulation in a later closed-loop application.

After describing the hardware and the dataset in Sections II and III, we define the seizure detection problem and explain our detection pipeline, including preprocessing, model architecture, training, and performance evaluation, in Section IV. In Section V, we present the seizure detection performance and compare hardware properties such as runtime, memory, and energy consumption of our model and four baselines. We also discuss limitations of the approach before concluding in Section VI.

II Hardware

For the hardware implementation of the network, the low-power microcontroller MSP430FR5994 from Texas Instruments is used. With a current consumption of 118 µA/MHz in active mode and 0.5 µA in standby mode, it is suitable for an implantable device, where heating of the surrounding tissue must be avoided. A further advantage of the MSP430FR series is its ferroelectric nonvolatile memory (FRAM). Its low power consumption and fast write speed allow the hidden-layer activations of a neural network to be stored quickly. However, the FRAM also limits the maximum clock speed of the controller, as its read speed is limited to  MHz. The controller can run at higher clock speeds, but only with additional CPU wait states, which lowers power efficiency. Another useful feature for implementing convolution layers is the -bit hardware multiplier of the controller, which enables power-efficient multiply-and-accumulate (MAC) operations without CPU intervention.
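To make the quoted current figures concrete, the short Python sketch below estimates the time-averaged supply current of a duty-cycled controller from the 118 µA/MHz active and 0.5 µA standby values given above. The clock frequency and the fraction of time spent in active mode are illustrative assumptions, not measurements from this work, and the model ignores wait states, peripherals, and wake-up costs.

```python
ACTIVE_UA_PER_MHZ = 118.0  # active-mode current per MHz, from the text
STANDBY_UA = 0.5           # standby current, from the text


def average_current_ua(clock_mhz: float, active_fraction: float) -> float:
    """Time-weighted mean supply current in microamperes.

    Assumption: the controller is either fully active at `clock_mhz`
    or in standby; wait states and peripherals are ignored.
    """
    if not 0.0 <= active_fraction <= 1.0:
        raise ValueError("active_fraction must lie in [0, 1]")
    active_ua = ACTIVE_UA_PER_MHZ * clock_mhz
    return active_fraction * active_ua + (1.0 - active_fraction) * STANDBY_UA


# Hypothetical operating point: 4 MHz clock, active 25% of the time.
print(f"{average_current_ua(4.0, 0.25):.3f} uA")  # prints: 118.375 uA
```

Because the active term dominates, shortening the time spent computing per detection step matters far more than the standby current.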

III Dataset

The dataset used is the Epilepsiae database [15], which contains long-term continuous intracranial EEG recordings. We evaluate our approach on 24 patients. Each recording lasts between five and eleven days and contains measurements from approximately 100 intracranial and scalp electrodes, originally sampled at or resampled to Hz. During the recordings, the evaluated patients had between seizures. To limit the amount of data for our experiments, we consider 100-minute segments of the recordings around the seizures.

For every patient, we consider a subset of electrodes that were selected a priori by expert epileptologists to cover the seizure onset zone(s). If fewer than four electrodes display the initial ictal EEG pattern, neighboring channels are included for seizure detection. The total number of electrodes is restricted by the hardware.

IV Methods

IV-A Seizure Detection Problem

Seizure detection can be modeled as time-series classification, where we classify ictal phases (seizures) and interictal phases (non-seizures). To create the inputs for the convolutional network, we process sliding windows over the EEG data $X \in \mathbb{R}^{E \times T}$, where $T$ is the recording duration and $E$ is the number of electrodes. Samples $x_t \in \mathbb{R}^{E \times w f_s}$ covering the $w$ seconds up to time point $t$ are created as input features, where $f_s$ is the sampling frequency and consecutive time points are separated by the stride $s$. The corresponding labels are $y_t \in \{0, 1\}$, where $y_t = 1$ indicates a seizure at the end of sample $x_t$ and $y_t = 0$ an interictal sample. The window lengths are chosen to keep the runtime of a forward pass low. To train our models, an overlap of is used, which equals a stride of seconds. Due to the hardware runtime limitations, we evaluate our models with a stride of 1 s.

IV-B Preprocessing

Compared to conventional scalp EEG, intracranial EEG is less prone to artifacts such as pick-up of the electrocardiogram or electromyogram. However, careful removal of noise and drifts in the data can facilitate the subsequent pattern classification task. Multiple preprocessing steps are performed to remove signal components that do not carry relevant information and to adapt the signal statistics to intra-individual fluctuations. The power line noise at Hz is removed by applying a notch filter to the raw EEG data. Subsequently, a highpass filter ( Hz) removes slow drifts. The data is then rescaled by dividing by the rolling 10-minute standard deviation to account for non-stationarity in the source; preliminary experiments showed this setting to perform well. For a time point $t$ and window size $W$, the rolling mean and standard deviation of an electrode signal $x$ can be computed as:

$$\mu_t = \frac{1}{W}\sum_{i=t-W+1}^{t} x_i, \qquad \sigma_t = \sqrt{\frac{1}{W}\sum_{i=t-W+1}^{t}\left(x_i - \mu_t\right)^2}$$

The input data is then normalized as follows:

$$\tilde{x}_t = \tanh\left(\frac{x_t - \mu_t}{\sigma_t}\right)$$

where $W = 600 f_s$, which equals 10 minutes of data at sampling frequency $f_s$. We squash the scaled and standardized data with the hyperbolic tangent to reduce the influence of outliers and artifacts while maintaining a quasi-linear relation for most of the input distribution.
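The rolling normalization can be sketched in NumPy as follows for one electrode that has already been notch- and highpass-filtered (the filters themselves are omitted). This sketch assumes that the rolling mean is subtracted before dividing by the rolling standard deviation, and that the first W − 1 samples use whatever history is available; both are our assumptions. Running sums give constant work per sample, although for very long windows the sum-of-squares form can lose precision, so a production implementation might prefer a numerically stabler update.

```python
import numpy as np


def rolling_tanh_normalize(x, window, eps=1e-8):
    """Causal rolling standardization followed by tanh squashing.

    x:      1-D array (one electrode, already filtered)
    window: number of past samples W, e.g. 600 * fs for 10 minutes
    Warm-up assumption: for t < W - 1 the statistics use the samples
    seen so far.
    """
    x = np.asarray(x, dtype=float)
    c1 = np.concatenate(([0.0], np.cumsum(x)))        # running sum
    c2 = np.concatenate(([0.0], np.cumsum(x * x)))    # running sum of squares
    t = np.arange(1, x.size + 1)                      # exclusive end index
    lo = np.maximum(0, t - window)                    # inclusive start index
    n = t - lo
    mean = (c1[t] - c1[lo]) / n
    var = (c2[t] - c2[lo]) / n - mean ** 2
    std = np.sqrt(np.maximum(var, 0.0))               # guard tiny negatives
    return np.tanh((x - mean) / (std + eps))
```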

IV-C Model Architecture

In order to find a good model architecture, we evaluated the runtime and memory requirements for various layer types like convolutions, dense layers, pooling layers and activations. The architecture of SeizureNet is shown in Table I. The proposed network is a deep convolutional network with alternating convolutional and pooling layers.

Layer Operation Output
Input
1
2
Dropout (0.2)
3
4
Dropout (0.2)
5
6
Dropout (0.2)
7
Dropout (0.2)
8
9 Sigmoid
Total Number of Parameters: 3,621
TABLE I: Architecture of SeizureNet for input electrodes.

Lawhern et al. [13] proposed convolutions over electrodes in the first layer. They used kernels of size , which is similar to approaches such as common spatial patterns [16]. We extend this by convolving over both electrodes and time, so that spatio-temporal patterns can be learned efficiently in one layer. In the last layer, we use convolutions instead of a fully-connected layer. This was introduced in [17] and is a parameter-efficient way to reduce dimensions [18]. Rectified linear units (ReLUs) are used in all hidden layers, and batch normalization is applied after the convolutions. During training, dropout regularization is applied to reduce overfitting.
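To illustrate what convolving over electrodes and time in a single layer means, the NumPy sketch below implements an unpadded convolution whose kernels span all electrodes and k time steps, together with two parameter counters. The full-electrode kernel height and the helper names are assumptions for illustration; the actual layer configuration is the one in Table I. The counters also show, assuming the last-layer convolutions are 1×1, why such a layer is parameter-efficient compared with a dense layer.

```python
import numpy as np


def spatiotemporal_conv(x, kernels, bias):
    """Unpadded convolution (cross-correlation, as in deep learning
    frameworks) whose kernels span all electrodes and k time steps.

    x:       (n_electrodes, n_times) input window
    kernels: (n_filters, n_electrodes, k)
    bias:    (n_filters,)
    returns  (n_filters, n_times - k + 1)
    """
    n_filters, n_elec, k = kernels.shape
    if x.shape[0] != n_elec:
        raise ValueError("kernel height must equal the number of electrodes")
    n_out = x.shape[1] - k + 1
    out = np.empty((n_filters, n_out))
    for t in range(n_out):
        patch = x[:, t:t + k]  # all electrodes, k consecutive time steps
        out[:, t] = np.tensordot(kernels, patch, axes=([1, 2], [0, 1])) + bias
    return out


def conv_params(kh, kw, c_in, c_out):
    """Weights plus one bias per output channel."""
    return kh * kw * c_in * c_out + c_out


def dense_params(n_in, n_out):
    """Fully-connected layer: weight matrix plus biases."""
    return n_in * n_out + n_out


# Hypothetical 16-channel, 8-step feature map reduced to 1 channel, 8 steps:
print(conv_params(1, 1, 16, 1), dense_params(16 * 8, 8))  # prints: 17 1032
```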

IV-D Training

To deal with the strong imbalance between ictal and interictal samples, we use an oversampling technique. Mini-batches are created by randomly picking ictal samples with probability , and interictal samples otherwise. To learn to detect seizures as early as possible, we weight the ictal samples in the loss function according to their distance to the seizure onset. The weights decrease linearly from for the onset to for the seizure offset.

In all experiments, patient-specific models are trained, and we use 3-fold cross-validation for evaluation. Each model is trained with a batch size of for steps (batches of samples), with a sampling probability for the number of seizures in each batch. For optimization, we use Adam [19] with a learning rate of and the binary cross-entropy loss.
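The oversampling and onset weighting can be sketched as follows. The probability `p_ictal` and the weights `w_onset` and `w_offset` are free parameters here, since their values are not reproduced in this text, and giving interictal samples a weight of 1 is our assumption. The loss is a sample-weighted binary cross-entropy; the function names are hypothetical.

```python
import numpy as np


def sample_batch(ictal_idx, interictal_idx, batch_size, p_ictal, rng):
    """Oversampling: each batch slot is ictal with probability p_ictal."""
    n_ictal = int((rng.random(batch_size) < p_ictal).sum())
    picks = np.concatenate([
        rng.choice(ictal_idx, size=n_ictal, replace=True),
        rng.choice(interictal_idx, size=batch_size - n_ictal, replace=True),
    ])
    return rng.permutation(picks)  # mix ictal and interictal samples


def onset_weights(t, onset, offset, w_onset, w_offset):
    """Loss weight of ictal samples, linear from w_onset to w_offset."""
    frac = np.clip((np.asarray(t, dtype=float) - onset) / (offset - onset), 0.0, 1.0)
    return w_onset + frac * (w_offset - w_onset)


def weighted_bce(y_true, y_prob, weights, eps=1e-7):
    """Sample-weighted binary cross-entropy, averaged over the batch."""
    p = np.clip(y_prob, eps, 1.0 - eps)
    losses = -(y_true * np.log(p) + (1.0 - y_true) * np.log(1.0 - p))
    return float(np.mean(weights * losses))
```

With `w_onset > w_offset`, a missed prediction right after onset costs more than one late in the seizure, which pushes the model towards early detection.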

IV-E Detection Performance Evaluation

Evaluating a seizure detection system is non-trivial, as three objectives have to be optimized simultaneously:

  • The sensitivity is defined as the ratio of actually detected seizures to the total number of seizures.

  • The detection delay is the mean delay over all detected seizures. For each detected seizure, the delay is defined as the time elapsed between the electrographic seizure onset, identified through visual inspection by a domain expert, and the first algorithm-based detection of the seizure.

  • The false positive rate is the number of false detections per hour (fp/h).
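One way to compute the three metrics from thresholded window-level outputs is sketched below. The event logic is an assumption: a seizure counts as detected if any positive prediction falls within its annotated interval, the delay is measured to the first such prediction, and every run of consecutive positives that starts outside all seizures counts as one false detection. Normalizing by the total recording duration in hours is likewise an assumption.

```python
import numpy as np


def detection_metrics(pred, times, seizures, total_hours):
    """Event-level sensitivity, mean delay (s) and false detections per hour.

    pred:     thresholded 0/1 outputs, one per window
    times:    window end times in seconds, sorted ascending
    seizures: list of (onset_s, offset_s) expert annotations
    """
    pred = np.asarray(pred, dtype=int)
    times = np.asarray(times, dtype=float)
    in_seizure = np.zeros(times.shape, dtype=bool)
    delays = []
    for onset, offset in seizures:
        inside = (times >= onset) & (times <= offset)
        in_seizure |= inside
        hits = times[inside & (pred == 1)]
        if hits.size:
            delays.append(hits[0] - onset)  # first detection in the seizure
    # Alarm onsets: positions where the prediction switches from 0 to 1.
    alarm_idx = np.flatnonzero(np.diff(np.concatenate(([0], pred))) == 1)
    n_false = int(np.count_nonzero(~in_seizure[alarm_idx]))
    return {
        "sensitivity": len(delays) / len(seizures) if seizures else float("nan"),
        "mean_delay_s": float(np.mean(delays)) if delays else float("nan"),
        "fp_per_hour": n_false / total_hours,
    }
```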

IV-F Approximations

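For our hardware implementation, we have to approximate the rolling 10-minute standard deviation due to memory limitations. We first approximate the rolling 10-minute mean by a grand mean over 1-second means $\bar{m}_j$. This way, only 600 means have to be stored for the targeted duration, while a negligible error is introduced by ignoring the most recent samples until they form a new 1-second mean. The approximated mean $\hat{\mu}_t$ can then be used to compute the 10-minute standard deviation $\hat{\sigma}_t$:

$$\hat{\sigma}_t = \sqrt{\frac{1}{W}\sum_{i=t-W+1}^{t}\left(x_i - \hat{\mu}_t\right)^2}$$

with $\hat{\mu}_t = \frac{1}{600}\sum_{j=J-599}^{J} \bar{m}_j$, where $J$ indexes the most recent complete 1-second block and $W$ is the number of samples in 10 minutes.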
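A streaming sketch of the grand-mean approximation is given below. It assumes 1-second blocks and assumes that the standard deviation is computed over the raw samples of the window against the approximate mean; the class and method names are hypothetical. Note that `mean()` ignores the samples of the current, incomplete 1-second block, which is the source of the small error mentioned in the text.

```python
from collections import deque
import math


class GrandMeanStd:
    """Streaming 10-minute statistics built from 1-second block means."""

    def __init__(self, fs, n_blocks=600):
        self.fs = fs                                   # samples per block
        self.block_means = deque(maxlen=n_blocks)      # last n_blocks means
        self.samples = deque(maxlen=fs * n_blocks)     # raw rolling window
        self._acc, self._count = 0.0, 0

    def update(self, x):
        self.samples.append(x)
        self._acc += x
        self._count += 1
        if self._count == self.fs:                     # block complete
            self.block_means.append(self._acc / self.fs)
            self._acc, self._count = 0.0, 0

    def mean(self):
        """Grand mean over completed blocks only."""
        if not self.block_means:
            return 0.0
        return sum(self.block_means) / len(self.block_means)

    def std(self):
        """Window standard deviation around the approximate mean."""
        n = len(self.samples)
        if n == 0:
            return 0.0
        mu = self.mean()
        return math.sqrt(sum((v - mu) ** 2 for v in self.samples) / n)
```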

Further, we use a linear approximation of the hyperbolic tangent in the preprocessing to avoid the need for a lookup-table:
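The specific linear approximation is not reproduced in this text. As one common piecewise-linear choice, shown purely for illustration, the hard tanh clips the identity to [−1, 1]. Because x − tanh(x) grows on [0, 1] and 1 − tanh(x) shrinks beyond it, its largest deviation from tanh is 1 − tanh(1) ≈ 0.238, reached at |x| = 1. The sketch checks this numerically on a grid.

```python
import math


def hard_tanh(x: float) -> float:
    """Piecewise-linear stand-in for tanh: identity clipped to [-1, 1]."""
    return max(-1.0, min(1.0, x))


def max_abs_error(lo=-5.0, hi=5.0, n=10001):
    """Largest |tanh(x) - hard_tanh(x)| on an evenly spaced grid."""
    step = (hi - lo) / (n - 1)
    return max(abs(math.tanh(lo + i * step) - hard_tanh(lo + i * step))
               for i in range(n))


print(round(max_abs_error(), 3))  # prints: 0.238
```

On a microcontroller, the clip needs only two comparisons and no table memory, which is the trade-off such a linear approximation aims at.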

IV-G Comparison to Other Approaches

For seizure detection, it is hard to compare approaches without evaluating them in the same framework, due to factors such as different evaluation methods, different datasets, omitted patients, and patient-specific versus across-patient training. Hence, we reimplemented all baselines. We compare the performance of our architecture to three other convolutional neural networks that we consider most similar to our approach. To evaluate fairly, we use the same preprocessing and sampling method for all convolutional networks. The architectures of the network baselines are shown in Table II. Further, we compare to a support vector machine approach from the literature that uses handcrafted features [8].

EEGNet Kiral-Kornek et al. Acharya et al.
Input Input Input
Transpose
Dropout Dropout
Dropout Dropout
Dropout Dropout
Dense
GlobalMaxPool2D Dense
Dense
Sigmoid Output Layer
Total Number of Parameters: 957 (EEGNet), 15,665 (Kiral-Kornek et al.), 96,220 (Acharya et al.)
TABLE II: Architecture of the baseline networks.

IV-G1 EEGNet

The first baseline is EEGNet, a small convolutional network by Lawhern et al. [13]. EEGNet shows robust performance across four different brain-computer interface classification tasks. It consists of three convolutional layers and two max-pooling layers, where the first convolution estimates a set of spatial filters over the electrodes. To adapt it to our framework, we replace the softmax regression output layer with a sigmoid activation.

IV-G2 Kiral-Kornek et al.

In [14], a convolutional network is evaluated for patient-specific seizure prediction based on spectrograms. To compare with their architecture, we apply their approach to seizure detection instead of prediction. They proposed a network consisting of three alternating convolutional and max-pooling layers, using spectrograms as input. As activation function, they use the exponential linear unit (ELU) [20]. To provide a proof of concept that their seizure prediction system can run on a low-power platform, they adapt their network architecture to an 18-layer binary neural network consisting only of convolutional layers and dropout. This architecture can run on the IBM TrueNorth Neurosynaptic System chip [21]. However, because their network uses binary weights, it needs more layers and thus more parameters to reach the same precision. In total, they use over 4.2 million parameters, which would require more than 525 kB even with 1-bit weights. Because this already exceeds our memory limit, we only use their small architecture as a baseline. To adapt this approach to seizure detection, we use 1-second windows instead of 30-second windows and the short-time Fourier transform (STFT) to generate the spectrograms. Further, since we focus on comparing network architectures, we do not include time of day as an input feature, which keeps the input consistent across methods.
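A Hann-windowed STFT of a 1-second window can be sketched with NumPy alone, as below. The segment length and hop size are assumptions chosen for illustration, not the settings used for this baseline, and the function name is hypothetical.

```python
import numpy as np


def stft_spectrogram(x, fs, seg_len, hop):
    """Magnitude spectrogram of one channel via a Hann-windowed STFT.

    x:       1-D signal, e.g. one 1-second window of one electrode
    fs:      sampling frequency in Hz
    seg_len: samples per STFT segment (assumed value)
    hop:     samples between segment starts (assumed value)
    Returns (spectrogram of shape (n_segments, seg_len // 2 + 1),
             bin centre frequencies in Hz).
    """
    x = np.asarray(x, dtype=float)
    if len(x) < seg_len:
        raise ValueError("signal shorter than one segment")
    window = np.hanning(seg_len)
    starts = range(0, len(x) - seg_len + 1, hop)
    frames = np.stack([x[s:s + seg_len] * window for s in starts])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    freqs = np.fft.rfftfreq(seg_len, d=1.0 / fs)
    return spec, freqs
```

For example, at a hypothetical 256 Hz with 64-sample segments and a 32-sample hop, a 1-second window yields 7 frames of 33 frequency bins at 4 Hz resolution. Every frame costs an FFT, which is why this preprocessing weighs on the runtime of a microcontroller.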

IV-G3 Acharya et al.

In [11], a 13-layer convolutional neural network is trained across patients on one electrode to classify interictal, ictal, and preictal phases (the phases directly before a seizure). They trained their model on a subset of the Epilepsiae database consisting of 5 patients with epilepsy and 5 healthy subjects. The architecture takes an input window of length , which corresponds to s at a sampling rate of Hz. As activation function, they use a leaky ReLU. To fit our framework, we adapt their network to use electrodes as input by changing the first convolutional layer from a convolution to . Further, we use a sigmoid output layer. For training, an overlap of is used. As for the other approaches, the performance is evaluated with a 1-second stride.

IV-G4 SVM

Besides deep learning approaches, we compare our method to the support vector machine (SVM) proposed in [8]. For feature extraction, they use the empirical mode decomposition (EMD) algorithm [22], a signal processing method for nonlinear and non-stationary time series. The SVM uses the variance of intrinsic mode functions (IMFs) as input features and a radial basis function (RBF) kernel. A post-processing step is applied that classifies a seizure only if the SVM has classified a number of consecutive epochs as epileptic activity. In the original study, eight consecutive epochs were required to detect a seizure. Since this would lead to a high detection delay for online detection steps of one second, we skip this step. Further, they use only 20 seconds of ictal and 100 seconds of interictal data for training. In contrast, we use all available seizures of the training set and downsample the remaining interictal data to the ratio proposed in the original paper. In our experiments, we use the first three extracted IMFs as features for the SVM.
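The consecutive-epoch post-processing rule can be expressed in a few lines; the sketch below (function name hypothetical) fires an alarm when the k-th consecutive positive epoch is reached. With 1-second epochs, an alarm therefore comes at least k − 1 seconds after the first positive epoch, which is the latency cost that led to skipping this step here.

```python
def consecutive_epoch_alarm(epoch_preds, k):
    """Indices at which the k-th consecutive positive epoch is reached."""
    alarms, run = [], 0
    for i, positive in enumerate(epoch_preds):
        run = run + 1 if positive else 0  # reset on any negative epoch
        if run == k:
            alarms.append(i)
    return alarms


print(consecutive_epoch_alarm([1] * 10, k=8))  # prints: [7]
```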

V Results

In order to evaluate our approach, we first compare the detection performance for all architectures. Further, the ROC AUC score is computed to show the discriminative properties of the models. Since the AUC score is independent of the classifier threshold, it is a suitable additional metric to compare the different architectures. However, since seizures only have to be detected once but as fast as possible, the per-sample sensitivity and thus the AUC score of the classifier is not as important as the other metrics. After that, we show how the performance metrics can be influenced by varying the classification threshold. This offers the possibility of manual (re-)adjustment of the detection device during a study with real patients. Then, we investigate the effect of the preprocessing approximations. Finally, we evaluate all network-based models regarding their memory and speed for our hardware implementation. For this purpose, we analyze the requirements for the preprocessing and the successive layers of the networks.
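Because the AUC equals the probability that a randomly chosen ictal sample receives a higher score than a randomly chosen interictal one, it can be computed without choosing any threshold. The quadratic-time sketch below, which counts ties as one half, makes this explicit; a rank-based (Mann-Whitney) formula gives the same value in O(n log n). The function name is hypothetical.

```python
def roc_auc(y_true, scores):
    """ROC AUC as P(score of positive > score of negative), ties count 1/2."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need both classes")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # prints: 0.75
```

Any strictly increasing rescaling of the scores leaves the result unchanged, which is what makes the AUC threshold-independent.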

Fig. 1: Detection performances over all patients for a classifier threshold of . Median delay and false positive rate are shown in logarithmic scale.

The detection performance of our model and all baselines is shown in Fig. 1 for a classifier threshold of . To show the overall performance, we calculate the metrics separately for each patient and summarize the distribution in the figure. SeizureNet achieves high and robust sensitivities for all patients. With a median sensitivity of , it outperforms all other architectures except the network of Kiral-Kornek et al., which has a median sensitivity of . Their network, however, shows extremely high false positive rates, with outliers of up to false positives per hour. Because of its large window size of s, the network of Acharya et al. achieves the lowest median false positive rate of . However, this window leads to a high median delay of s, which is unacceptable for early seizure detection, and a low sensitivity of . EEGNet achieves the second best false positive rate, but only due to its low sensitivity. The SVM has a good median sensitivity, but a higher delay and a higher false positive rate than SeizureNet. With the best median AUC score of and a good balance between false positives and delay, SeizureNet shows the best and most robust detection performance.

Fig. 2 shows the influence of the classifier threshold on the performance metrics. Higher thresholds correlate with reduced sensitivity and increased delays, but also prove to be less prone to false alarms. The highest median sensitivity of is achieved with a classifier threshold , with a respective median delay of s and false positives per hour. A low median of can be achieved with a high classifier threshold of . However, this threshold decreases the sensitivity to and increases the median delay to s. Of course, the classifier threshold can also be varied for the other architectures (for the SVM, the proposed postprocessing step can be used instead). However, as indicated by the high AUC score, SeizureNet has the best performance independent of the threshold.

Fig. 2: Influence of classifier threshold on the detection performance of SeizureNet. Median delay and false positive rate are shown in logarithmic scale.

V-A Approximations

Fig. 3 illustrates the performance loss caused by the proposed preprocessing approximations for 10 patients. We also compare to another approximation, where we replace the rolling mean with 0. The highpass filter roughly centers the signal around zero, which allows us to dispense with mean estimation, so that the standard deviation can be approximated by

$$\hat{\sigma}_t = \sqrt{\frac{1}{W}\sum_{i=t-W+1}^{t} x_i^2}$$

where $W$ is the number of samples in the 10-minute window.

In our experiments, the exact 10-minute standard deviation combined with the hyperbolic tangent shows the best results. The best approximation uses the grand mean, with a median delay of , a of and an AUC score of . The zero-mean approximation offers a large reduction in runtime and memory, but it considerably increases the median delay to .
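One way a zero-mean approximation can save memory is sketched below. Assuming the highpass-filtered signal is centered on zero, σ² is approximated by the mean of x² over the window. That sum can be accumulated per 1-second block, so only the block sums, rather than all raw samples, need to be kept. This is an illustrative assumption about how such an approximation might be implemented, not a description of the actual firmware; the class name is hypothetical.

```python
from collections import deque
import math


class ZeroMeanStd:
    """Zero-mean std approximation from 1-second sums of squares."""

    def __init__(self, fs, n_blocks=600):
        self.fs = fs
        self.block_sq = deque(maxlen=n_blocks)  # one sum of x^2 per second
        self._acc, self._count = 0.0, 0

    def update(self, x):
        self._acc += x * x
        self._count += 1
        if self._count == self.fs:              # block complete
            self.block_sq.append(self._acc)
            self._acc, self._count = 0.0, 0

    def std(self):
        n = len(self.block_sq) * self.fs        # samples covered by blocks
        return math.sqrt(sum(self.block_sq) / n) if n else 0.0
```

Compared with a raw-sample buffer of W values, this keeps only 600 floats for a 10-minute window, independent of the sampling frequency.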

Fig. 3: Detection performance of the proposed preprocessing and various approximations evaluated on a subset of 10 patients.

V-B Hardware Requirement Analysis

The theoretical runtime and memory requirements of all convolutional networks for our hardware implementation are shown in Fig. 4. Besides SeizureNet, only the network of Kiral-Kornek et al. would actually be implementable on our microcontroller. However, its preprocessing is expensive due to the STFT, resulting in a 38% higher runtime. The high runtimes of EEGNet and of the network of Acharya et al. make an implementation on our real-time detection device impossible. For EEGNet, this is caused by zero-padded convolutions, which do not reduce the dimensionality of the input. The runtime of the network of Acharya et al. is mainly driven by its large input window.
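The effect of zero padding on runtime can be illustrated by counting multiply-accumulate (MAC) operations. The sketch below uses stride-1, 1-D convolutions along time with hypothetical sizes (a 256-sample input and three layers with kernel size 16 and 8 channels); it does not model the actual baseline layers. With valid padding every layer shortens the sequence, whereas with same padding every layer processes the full length, and a naive implementation also multiplies the padded zeros.

```python
def conv1d_macs(n_in, k, c_in, c_out, padding="valid"):
    """MACs and output length of a stride-1, 1-D convolution over time."""
    if padding == "valid":
        n_out = n_in - k + 1
    elif padding == "same":
        n_out = n_in          # zero padding keeps the length
    else:
        raise ValueError(f"unknown padding: {padding}")
    if n_out < 1:
        raise ValueError("input shorter than kernel")
    return n_out * k * c_in * c_out, n_out


def stack_macs(n_in, n_layers, k, channels, padding):
    """Total MACs of `n_layers` identical convolutions applied in sequence."""
    total, n = 0, n_in
    for _ in range(n_layers):
        macs, n = conv1d_macs(n, k, channels, channels, padding)
        total += macs
    return total


print(stack_macs(256, 3, 16, 8, "valid"), stack_macs(256, 3, 16, 8, "same"))
# prints: 694272 786432
```

Without pooling, the gap grows with depth and kernel size, since only the valid-padded stack shrinks from layer to layer.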

The memory required by the hardware implementation is determined by the number of network parameters, a buffer for the rolling window of the preprocessing, lookup tables, and two buffers that store the inputs and outputs of the hidden layers. The buffer sizes are determined by the layer with the largest input and output size. In terms of memory, all networks except that of Acharya et al. are theoretically implementable on our device. The limiting factor of this network is its fully-connected layers, which require 74% of the total memory.
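The memory accounting described above can be written as a simple sum. The sketch assumes 16-bit storage for every value and sizes one buffer by the largest layer input and one by the largest layer output; both are assumptions, and the function name is hypothetical. Counting only the parameter term, the 3,621 parameters of SeizureNet from Table I would take 7,242 bytes at 16 bits each.

```python
def memory_bytes(n_params, layer_io, preproc_buffer=0, lut_entries=0,
                 bytes_per_value=2):
    """Rough memory footprint in bytes.

    n_params:        number of network parameters
    layer_io:        list of (input_size, output_size) per layer, in values
    preproc_buffer:  values kept for the rolling preprocessing window
    lut_entries:     values stored in lookup tables
    bytes_per_value: assumed storage width (2 = 16-bit)
    """
    in_buf = max((i for i, _ in layer_io), default=0)   # largest layer input
    out_buf = max((o for _, o in layer_io), default=0)  # largest layer output
    values = n_params + preproc_buffer + lut_entries + in_buf + out_buf
    return values * bytes_per_value


# Parameters only, then with two hypothetical layers added:
print(memory_bytes(3621, []), memory_bytes(3621, [(1024, 512), (512, 256)]))
# prints: 7242 10314
```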

Besides SeizureNet, none of the networks was actually implemented on a hardware platform, so we cannot directly compare hardware efficiency and power consumption. Only Kiral-Kornek et al. implemented an adapted network on the IBM TrueNorth chip. This chip consumes mW to power 1 million neurons, resulting in a power consumption of nW per neuron. With neurons, a power consumption of mW can be estimated for the adapted network. For the preprocessing and forward pass of SeizureNet, we measured a power consumption of , which is 5 to 8.8 times lower.
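The per-neuron estimate is a linear scaling of the chip's power figure, sketched below with hypothetical function names. The numbers in the example are purely illustrative placeholders, not the TrueNorth figures from the text, and the scaling assumes that power is spread evenly across neurons.

```python
def per_neuron_nw(chip_power_mw, chip_neurons):
    """Average power per neuron in nanowatts (1 mW = 1e6 nW)."""
    return chip_power_mw * 1e6 / chip_neurons


def network_power_mw(chip_power_mw, chip_neurons, used_neurons):
    """Linear scaling of the chip's power figure to the neurons in use."""
    return chip_power_mw * used_neurons / chip_neurons


# Illustrative placeholders only: 100 mW for 1,000,000 neurons, 50,000 used.
print(per_neuron_nw(100.0, 1_000_000), network_power_mw(100.0, 1_000_000, 50_000))
# prints: 100.0 5.0
```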

Fig. 4: Memory (top) and runtime (bottom) requirements for SeizureNet and the baseline architectures. Runtime blocks are ordered according to their execution in the forward pass. Layers with few parameters or cycles are not visible.

The SVM approach was excluded from this comparison as non-parametric approaches behave differently for each patient. While test-time predictions scale linearly in time and memory with the number of support vectors, the required amount grows rapidly with the number of seizures considered in the training data. Due to the varying computational demands for each patient, it is difficult to find a fair setting for a comparison against the network approaches that have a constant load across patients. Additionally, the preprocessing in [8] would require simplification to be efficiently implementable and change the method substantially.

V-C Limitations of the Approach

Tuning the networks towards early and sensitive detection of electroencephalographic seizure patterns (Fig. 5) comes at the cost of higher false positive rates. Several factors may have contributed to this. a) Selecting short EEG segments of s decreases the detection delay but increases the chance of falsely classifying artifacts as ictal patterns. Longer analysis windows can ameliorate this (note the low FP rate in [11], which uses a s window). b) Subclinical electroencephalographic patterns can be detected; by definition, these resemble ictal patterns but are not accompanied by clinical phenomena, and they may occur more frequently than clinically manifest seizures [23]. Such detections should not be called false detections, and there may even be reasons to use them to trigger interventions in a closed-loop device. Integrating more EEG channels may allow the probability of clinical correlates of the ictal event to be estimated.

Expert review of some missed seizures revealed epileptic auras with unclear electrographic correlates, which allowed neither clear visual nor algorithmic detection (Fig. 6, bottom). Additional investigations are needed to identify the patterns relevant for seizure detection during the evolution of the electrographic ictal event. These patterns do not always coincide with the seizure onset pattern, but rather with alterations of the ictal pattern in the course of recruitment and spread (Fig. 6, top and middle). Remarkably, SeizureNet also achieved good detection latencies for seizures with onset patterns that are difficult to detect, such as amplitude depression [24].

Fig. 5: Early detected seizure with the repetitive spiking onset pattern: predictions one minute around the seizures and the normalized electrode signal of one electrode.
Fig. 6: Late detections (top, middle) and missed seizure (bottom): predictions one minute around the seizures and the normalized electrode signal of one electrode.

VI Conclusion

We have introduced SeizureNet, a convolutional neural network for online seizure detection with state-of-the-art performance. Empirical evaluation demonstrates its suitability for practical realization on an implantable low-power microcontroller for clinical applications. The approximations in preprocessing and the architecture choices preserve performance sufficiently while leaving spare computational resources for further improvement. Candidates for future work in this direction are the distillation of network ensembles [25] and improvements in quantized, low-precision neural networks [26].

Acknowledgment

This work was supported by BrainLinks-BrainTools Cluster of Excellence funded by the German Research Foundation (DFG, grant number EXC 1086).

References