The human brain is responsible for performing a wide range of autonomous, semi-autonomous, and manual functions, such as generating thoughts, motor control, storing memories, and regulating hormones [herculano2009human]. It is susceptible to more than 600 diseases, such as tumors, epilepsy, Alzheimer's disease, and strokes [WHO2006]. These diseases can be diagnosed using medical imaging techniques, such as (functional) Magnetic Resonance Imaging (fMRI/MRI), Computed Tomography (CT scan), and Electroencephalography (EEG) [AANS]. These techniques are typically used to study the brain ex-post-facto, i.e., after the event has occurred, to evaluate the amount of damage caused by the event [NINDS2019]. Patients prone to such neurological disorders could end up in potentially fatal situations, where they place themselves and other people in harm's way, for example, when seizures or strokes occur in drivers and heavy-equipment operators.
State-of-the-art techniques typically analyze the EEG signals using compute-intensive algorithms and statistical methods, including machine learning and deep neural networks, to accurately detect/predict each neurological disorder individually [akmandor2017keep, mardi2011eeg, burrello2019laelaps, pascual2019self, hussein2018epileptic, zhang2018integration, hosseini2016cloud, kim2018wave2vec, samie2018highly]. This requires the edge device (typically a wearable or a sensor-head) to continuously transmit the EEG data to the cloud for further processing and feature extraction. Besides the communication time overheads, these fully cloud-based techniques pose serious privacy and security concerns for users who might not wish to continuously transmit all of their bio-signal data over an insecure/untrustworthy network, or to store it on third-party cloud platforms [shirazi2017extended]. However, it may still be feasible to transmit certain parts of the data to the cloud if extensive processing is required to recover from life-threatening situations, considering that a third party cannot retrieve the complete signal information from incomplete data. Such a situation is more realistic and can be considered a trade-off between privacy, security, and the urgency of extensive analysis.
Enabling such an efficient EEG processing system requires addressing the following research challenges, which we target in this work:
(1) How can the continuous monitoring of EEG signals at the edge device be used to predict multiple different neurological anomalies?
(2) How can real-time anomaly prediction be enabled with the help of a cloud-edge hybrid framework, while minimizing the amount of data transmitted to the cloud?
Novel Contributions: To address the above challenges, we propose the novel EMAP framework for predicting anomalies in real-time that employs the following key components:
(1) An efficient edge sensor node to continuously monitor the brain signals by collecting, pre-processing, and transmitting only one second of EEG signal data to the cloud every few seconds;
(2) A Mega-database (MDB) of EEG signals, on the cloud, constructed by combining various state-of-the-art EEG databases containing normal and anomalous EEG signals, e.g., of seizures and epilepsy;
(3) A novel signal cross-correlation search algorithm, which efficiently compares the patient's one-second EEG signal against all the signals of the MDB in the cloud, to quickly identify the top-k analogous signals, which are transmitted to the edge;
(4) A novel real-time signal tracking algorithm at the edge to estimate the similarity of the top-k analogous signals with the input in real-time, to eliminate dissimilar signals, estimate the probability of an anomaly, and predict its occurrence based on the inputs obtained from the subsequent time-steps.
Furthermore, to enable an efficient design of the EMAP framework, we perform a motivational analysis that studies the benefits of continuous monitoring and signal cross-correlation to estimate the probability of anomalies. The prediction accuracy of EMAP is evaluated for three different neurological disorders, namely, seizures, strokes, and encephalopathy, using different input signals for each disorder.
Evaluation & Open-Sourcing:
We have successfully obtained a prediction accuracy of 94%, 73%, and 79%, on average, for the three different anomalies that we have tested. To enable easy reproduction and adaptation of the proposed EMAP framework, we will open-source the complete tool-flow at https://emap.sourceforge.io. Fig. 1 illustrates an overview of the contributions (in dark highlighted blocks) that have been proposed in the cloud-edge hybrid framework.
II Related Work
Electroencephalography (EEG) is a technique that is typically used by medical experts to study the brain ex-post-facto, to ascertain the amount of damage caused by a specific event. This typically requires high-quality EEG electrodes that are neither portable nor easy to use, and can be highly expensive. For the purpose of continuous monitoring, electrodes placed according to the 10-20 system (the standard EEG electrode placement scheme), which cover the surface area of the head, can be mounted on a cap (electrode-caps) to measure the EEG signal samples [OpenBCI]. These devices can also be used for other purposes, such as stress detection and mitigation [akmandor2017keep] and drowsiness detection [mardi2011eeg].
Recently, these devices have been used as wearables in the healthcare industry to accurately detect or predict specific brain anomalies, especially seizures. Most of the current techniques rely heavily on deep learning for accurate seizure detection. Burrello et al. proposed a hyperdimensional computing approach called Laelaps that can be used for accurately classifying seizures using EEG signals [burrello2019laelaps]. Pascual et al. proposed a minimally supervised algorithm that can automatically label seizures, without the help of medical experts, to generate personalized training data [pascual2019self]. Similarly, other deep learning techniques have been proposed for seizure detection by Zhou et al. [zhou2018epileptic] and Hussein et al. [hussein2018epileptic]. On the other hand, various research works have also addressed the problem of seizure prediction using deep learning, e.g., by Zhang et al. [zhang2018integration], Hosseini et al. [hosseini2016cloud], and Kim et al. [kim2018wave2vec]. Typically, these techniques are resource-consuming and might require additional hardware units such as powerful deep neural network accelerators or GPUs, which may not be feasible for low-cost IoT edge devices. Towards this, Samie et al. proposed an algorithm for seizure prediction that can be deployed on low-power resource-constrained IoT devices [samie2018highly]. Previous research works have also proposed the use of signal cross-correlation for diagnosing physical and psychological diseases such as epilepsy and schizophrenia [timashev2012analysis][zhang2015seizure].
In this work, a cloud-edge hybrid framework has been proposed, which can monitor EEG signals and predict the occurrences of various brain-related anomalies, and not just seizures, in real-time.
III Background
In this section, we present the relevant background knowledge at a level of detail that is necessary to understand the novel contributions proposed in this work.
Bandpass Filters: Finite Impulse Response (FIR) bandpass filters are used to attenuate the noise components and motion artifacts outside the desired frequency range. This is especially important for multi-channel EEG electrodes, which are highly susceptible to noise because of their placement location, i.e., on the scalp of the user. Therefore, we define a 100-tap FIR bandpass filter, which attenuates all frequencies outside the desired EEG range, with the transfer function $H(z) = \sum_{n=0}^{99} h[n]\, z^{-n}$, where $h[n]$ are the filter coefficients.
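Such a filtering stage can be sketched in Python with SciPy's windowed-sinc FIR design. The sampling rate and passband edges below are illustrative assumptions, since the exact passband is not specified here:

```python
import numpy as np
from scipy import signal

# Assumed parameters (not the paper's exact values):
FS = 256.0             # assumed EEG sampling rate in Hz
LOW, HIGH = 0.5, 40.0  # assumed passband edges in Hz

# Windowed-sinc design of a 100-tap FIR bandpass filter.
taps = signal.firwin(100, [LOW, HIGH], pass_zero=False, fs=FS)

def bandpass(x):
    """Apply the FIR bandpass filter to one EEG channel (causal, single pass)."""
    return signal.lfilter(taps, 1.0, x)
```

A hard-coded accelerator on the edge would implement the same convolution with the fixed coefficient vector `taps`.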
Signal Cross-Correlation: The similarity of two signals can be evaluated using a metric known as cross-correlation, which is a function of the displacement of one signal with respect to the other, also known as the sliding dot product. The cross-correlation of two signals $x$ and $y$, composed of 256 samples each, is defined as $r_{xy}[m] = \sum_{n} x[n]\, y[n+m]$, where samples outside the valid index range are treated as zero.
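This sliding dot product can be sketched with NumPy; `peak_similarity` is a hypothetical helper (not from the paper) that normalizes the peak correlation by the signal energies so that identical signals score 1.0:

```python
import numpy as np

def cross_correlation(x, y):
    """Sliding dot product of two equal-length signals over all lags."""
    return np.correlate(x, y, mode="full")

def peak_similarity(x, y):
    """Peak cross-correlation, normalized to [-1, 1] by the signal energies."""
    r = cross_correlation(x, y)
    return float(r.max() / (np.linalg.norm(x) * np.linalg.norm(y)))
```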
Area Between Curves: Besides cross-correlation, the similarity of two signals can also be determined by calculating the area between the curves of the two signals, which, for sampled signals, is defined as $A = \sum_{n} |x[n] - y[n]|$.
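A minimal discrete version of this metric, taking the area as the sum of absolute sample differences (a unit sample spacing is assumed):

```python
import numpy as np

def area_between(x, y):
    """Discrete area between two sampled curves: sum of absolute differences."""
    return float(np.sum(np.abs(np.asarray(x, float) - np.asarray(y, float))))
```

Unlike cross-correlation, this costs a single pass over the samples, which is why it suits the resource-constrained edge tracking stage.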
Time Consumption: The initial time overhead ($T_{init}$) for the proposed framework to estimate anomaly probabilities in the first iteration after deployment is modeled as $T_{init} = t_{tx} + t_{search} + t_{dl}$,
where $t_{tx}$ is the time required for transmitting the input EEG signal from the edge to the cloud, $t_{search}$ is the time required for the signal cross-correlation search to determine the set of signals with maximum similarity to the input signal, i.e., the cloud search, and $t_{dl}$ is the time required to download this set of signals from the cloud to the edge device. In each subsequent time-step, the signal tracking algorithm at the edge is used to estimate the anomaly probabilities, which must take less than one second for the proposed EMAP framework to meet its real-time constraints.
Anomaly Probability: We define the probability of occurrence of an anomaly ($P_A$) as the proportion of anomalous signals ($N_A$) with respect to the total number of signals being tracked ($N_T$) at the edge. It can be computed as $P_A = N_A / N_T$.
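As a one-line sketch, assuming each tracked signal-set carries a binary anomaly label (1 for anomalous, 0 for normal):

```python
def anomaly_probability(labels):
    """P_A = N_A / N_T among the currently tracked signal-sets.

    `labels` holds one entry per tracked signal-set: 1 for anomalous,
    0 for normal. Returns 0.0 when nothing is being tracked.
    """
    return sum(labels) / len(labels) if labels else 0.0
```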
IV Analysis of Signal Cross-Correlation for Predicting Brain Anomalies
To illustrate that cross-correlation can be used to predict anomalies, we performed an experiment to determine the top-k signals in the MDB with maximum similarity to an anomalous input signal. As Fig. 2(a) illustrates, the proportion of normal to anomalous signals is initially quite large, and if the anomaly probability were estimated at this point, it would be very low. However, with continuous monitoring, dissimilar signals can be eliminated in real-time to keep track of only those signals that are highly correlated to the input signal; see Figs. 2(b)-(f). Using this approach, we eliminate dissimilar signals after each time-step/iteration and estimate the probability of an anomaly, which rises substantially by the end of Iteration 5. The probability of occurrence of an anomaly increases after each iteration because the proposed EMAP framework eliminates normal signals at a higher rate than anomalous signals, due to their dissimilarity to the input signal.
V Our EMAP Framework
Fig. 3 presents an overview of the proposed EMAP framework, which is composed of three key stages that are explained in detail in the subsequent sub-sections:
(1) Signal Acquisition is responsible for sampling, pre-processing, and transmitting one second of EEG signal data to the cloud;
(2) Cloud Search compares the input signal against all the signals in the MDB using the proposed cross-correlation search algorithm to identify the top-k analogous signals, which are transmitted to the edge for real-time tracking; and
(3) Edge Tracking employs a novel signal-tracking algorithm that evaluates the similarity of the incoming input samples with the set of analogous signals to eliminate dissimilar signals and estimate the probability of occurrence of an anomaly in real-time.
The cloud-edge hybrid framework allows us to effectively offload the compute- and memory-intensive signal cross-correlation algorithm to the cloud while the searched signals are tracked, and their anomaly probability estimated, in real-time by the edge device.
V-A Signal Acquisition
This stage of the framework is responsible for three main tasks, namely, sampling, pre-processing, and transmission. The electrode-caps, discussed in Section II, can be used as sensor nodes to sample the brain signals at the required frequency of 256 Hz (16-bit resolution), after which the signal is filtered using a bandpass filter to eliminate the noise components and to generate a uniform piece-wise linear curve that can be transmitted to the cloud for comparison. We define the input signal obtained from the EEG headset as $X_t$, where $X_t$ denotes the set of samples of the input signal at time-step $t$, and $x_{t,i}$ denotes the $i$-th sample in the $t$-th time-step of the input. This signal is passed through the 100-tap bandpass filter, and the filtered signal is subsequently transmitted to the cloud. Note that it might be suitable to implement the 100-tap bandpass filter as a simple hard-coded accelerator on the edge device, to ensure that the framework works in real-time. Fig. 4(a) presents the time required for uploading various numbers of samples to the cloud across different communication platforms. In the era of 4G communication, transmitting an EEG signal of one time-step should ideally take only a small fraction of the one-second real-time budget. This time constraint is imposed to efficiently offload compute- and memory-intensive tasks to the cloud and receive their outputs in real-time.
V-B Cloud Search
This stage is composed of two parts, namely, (1) the construction of the MDB, and (2) the signal cross-correlation search. The first part, i.e., the construction of the MDB, involves identifying and combining different state-of-the-art open-access EEG datasets presented in [PhysioNet][harati2014tuh][Dua:2019][brunner2014bnci][zwolinski2010open], to include a wide range of normal and anomalous EEG waveforms. This includes collecting the signals, up-/down-sampling them to the base frequency of 256 Hz, and labeling the EEG signals as normal or anomalous. Since the input signal to this stage is bandpass filtered, all the signals in the dataset are also bandpass filtered to ensure consistency, uniformity, and ease of search. Concretely, each signal in each dataset of the super-set of datasets is passed through the bandpass filter, and the filtered output is sliced into signal-sets of 256 samples each, which are subsequently labeled as normal or anomalous to construct the mega-database; this slicing enables the search algorithm to quickly search through the complete database in parallel. The MDB is thus defined as the super-set of all signal-sets, where each signal-set carries a binary attribute/label distinguishing normal from anomalous signals.
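The slicing and labeling step can be sketched as follows, assuming the recordings have already been resampled to 256 samples per second and bandpass filtered; `build_mdb` and its input format are illustrative, not the paper's exact implementation:

```python
import numpy as np

FS = 256  # assumed base rate: samples per one-second signal-set

def build_mdb(datasets):
    """Slice filtered, resampled recordings into labeled one-second signal-sets.

    `datasets` is a hypothetical iterable of (samples, is_anomalous) pairs;
    the real MDB combines several public EEG corpora.
    """
    mdb = []
    for samples, is_anomalous in datasets:
        samples = np.asarray(samples, dtype=float)
        # Non-overlapping one-second windows; any trailing partial window is dropped.
        for i in range(len(samples) // FS):
            window = samples[i * FS:(i + 1) * FS]
            mdb.append((window, 1 if is_anomalous else 0))
    return mdb
```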
The next part of this stage cross-correlates the input signal obtained from the signal acquisition stage with all the signal-sets present in the MDB. Because of the large number of signal-sets, the search space for this algorithm is huge, and hence the time required for this stage is also quite large: a 256-sample input signal has to be cross-correlated at many offsets of each signal-set, shifted by a skip window (SW), i.e., a number of offset samples, in each iteration. This example is depicted in Fig. 5. Therefore, we use a sliding window approach that offsets the signal-set, based on the step-size, before each cross-correlation is performed. We propose to increase/decrease the step-size non-linearly for the following reasons:
(1) when the step-size is very small, two dissimilar signals can increase the number of searches without identifying similar signals, and
(2) when the step-size is very large, the search can skip over similar signals, reducing the number of matches. Therefore, we define an exponential sliding window, which increases or decreases the skip window (SW) based on the correlation of the signals obtained at the current offset and the step-size. The proposed approach exploits the aforementioned properties of EEG signals to reduce the search space, as depicted in Fig. 6. We determine the value of this parameter for the framework by performing a series of experiments that study the exploration time, the number of correlated signal matches, and the average cross-correlation of the top-k signals for varying parameter values. The results of these experiments are illustrated in Fig. 7(a). As the figure shows, the signal cross-correlation value saturates and increases only by very small margins beyond a certain point, which ensures that highly correlated signals are not eliminated during the proposed signal cross-correlation search. Therefore, we preset this parameter for all subsequent experiments in our proposed framework, limiting the initial overhead ($T_{init}$) to a few seconds. We also illustrate the benefits of the proposed approach over the exhaustive search for a varying number of explored signal-sets in Fig. 7(b). On average, the proposed signal cross-correlation algorithm, i.e., Algorithm 1, achieves a substantial reduction in exploration time compared to the exhaustive search. The algorithm searches over the complete signal-set space to determine the set of top-k signals with the maximum correlation to the input signal. Based on our experiments, the cross-correlation threshold was determined empirically in order to avoid scenarios where the input signal is unable to find similar signals in the MDB. This set of cross-correlated signals is transmitted to the edge device for real-time signal tracking and anomaly prediction.
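A minimal sketch of such an exponentially adaptive skip window over a single signal-set; the doubling/halving rule, similarity threshold, and parameter names are illustrative assumptions rather than the paper's exact formulation of Algorithm 1:

```python
import numpy as np

def ncc_at(x, s, offset):
    """Normalized cross-correlation of x against the window of s at `offset`."""
    w = s[offset:offset + len(x)]
    denom = np.linalg.norm(x) * np.linalg.norm(w)
    return float(np.dot(x, w) / denom) if denom else 0.0

def sliding_search(x, s, sim_threshold=0.5, min_skip=1, max_skip=64):
    """Scan s with a skip window that grows exponentially over dissimilar
    regions and shrinks back over similar ones; returns the best score."""
    best, offset, skip = -1.0, 0, min_skip
    while offset + len(x) <= len(s):
        c = ncc_at(x, s, offset)
        best = max(best, c)
        if c < sim_threshold:
            skip = min(max_skip, skip * 2)   # dissimilar: widen the step
        else:
            skip = max(min_skip, skip // 2)  # similar: narrow the step
        offset += skip
    return best
```

Run over every signal-set in the MDB, keeping the k best scores above the cross-correlation threshold, this yields the top-k correlation set.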
V-C Edge Tracking
Fig. 4(b) presents the time required to download the signal correlation set from the cloud for various numbers of transmitted signals. For the framework to work in real-time, the complete signal correlation set needs to be downloaded well within the one-second budget. In the Edge Tracking stage, we propose a simple lightweight algorithm for tracking the signals using the input from the next time-step. Re-evaluating the cross-correlation for each of the signals is both time- and resource-consuming, neither of which is typically affordable on embedded edge nodes, which are resource-constrained yet required to perform computations in real-time. Therefore, we propose to evaluate the area between the curves for the subsequent time-steps to estimate the similarity between the input signal and the signals in the correlation set. The proposed lightweight signal tracking algorithm is illustrated in Algorithm 2. Next, we compare the two signal matching techniques by varying the cross-correlation and area thresholds to determine the number of signal matches that can be obtained. The results of this experiment are presented in Fig. 8(a). Based on this analysis, we determine an area threshold for the edge tracking algorithm that is roughly equivalent to the signal cross-correlation threshold deployed in the cloud. This threshold value can be modified to increase/decrease the number of matches based on the user requirements and the processing capabilities of the edge device. Moreover, we have also performed an experiment to determine the execution time differences between the cross-correlation approach and the proposed technique, the results of which are presented in Fig. 8(b). The proposed method of estimating similarity is considerably faster, and the time required for tracking the signals on the edge device satisfies the real-time requirement of the framework.
Similar to the approach used in the cloud, each signal-set and its parameters are tracked using a lightweight algorithm, which estimates the area between the two signals at each time-step. Signal-sets that do not satisfy the area threshold are removed from the list of tracked signals in each iteration. When the number of signals in this list drops below the signal tracking threshold, the patient's EEG signal for the current time-step is transmitted to the cloud to obtain a new signal correlation set that can be used for tracking the signals once again. This procedure is executed in the background, i.e., the signal tracking at the edge continues to provide anomaly prediction probabilities in real-time while the cloud searches for a new signal correlation set.
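One tracking iteration can be sketched as follows; the threshold values and the tuple format for tracked signal-sets are illustrative assumptions, not Algorithm 2 verbatim:

```python
import numpy as np

AREA_THRESHOLD = 50.0  # illustrative value; the paper tunes this experimentally
TRACK_THRESHOLD = 5    # below this many survivors, re-query the cloud

def track_step(x_t, tracked):
    """One edge-tracking iteration: drop signal-sets whose area to the new
    one-second input exceeds the threshold, then report the anomaly
    probability and whether a new cloud search is needed."""
    survivors = [
        (s, label) for s, label in tracked
        if float(np.sum(np.abs(np.asarray(s, float) - np.asarray(x_t, float)))) <= AREA_THRESHOLD
    ]
    n_anomalous = sum(label for _, label in survivors)
    probability = n_anomalous / len(survivors) if survivors else 0.0
    return survivors, probability, len(survivors) < TRACK_THRESHOLD
```

The refresh flag triggers the background cloud search while tracking continues on the surviving set.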
Fig. 9 presents an overview of the timing analysis of the EMAP framework. After the sampling is completed at instance a, the data is filtered and transmitted to the cloud for the mega-database search, which incurs an initial overhead of a few seconds. After the search is complete, the top-k signals are transmitted to the edge device, at instance c, for real-time tracking and probability estimation. In each iteration, we use the proposed lightweight signal tracking algorithm to remove dissimilar signals and to estimate the probability of an anomaly. If the number of signals being tracked falls below the pre-determined threshold, the previous set of sampled signals is transmitted to the cloud, at instance e, for the MDB Search. The signals are still being tracked in real-time at the edge, i.e., the MDB Search is completed in the cloud while real-time signal tracking proceeds at the edge in parallel. The same tracking procedure is repeated at the edge with a new set of top-k signals at instance h. Based on our experiments, we have determined that the sampled EEG signals need to be transmitted to the cloud every five iterations, i.e., after five seconds of edge-processing.
VI Experimental Results & Discussion
VI-A Experimental Setup
We implemented the proposed EMAP framework considering an Intel Core i7-7700HQ microprocessor with 16GB of DDR4 RAM and a 128GB SSD running the Ubuntu 18.04.3 LTS operating system as the cloud, and a Raspberry Pi B+ with 16GB of extended memory as the edge node. The complete framework was implemented in the Python programming language with the help of the scipy, sklearn, pyedflib, and pymongo libraries. We used the MongoDB framework to implement the MDB, to systematically store and access the signals. The MDB was constructed using the signals obtained from the datasets presented in [PhysioNet][harati2014tuh][Dua:2019][brunner2014bnci][zwolinski2010open].
Table I: Prediction accuracy of EMAP compared with state-of-the-art (SoA) prediction and detection techniques.
VI-B Prediction Accuracy Analysis
We have already illustrated that the framework's parameters and stages are configured so as to achieve real-time anomaly prediction. Therefore, in this section, we study the prediction accuracy and the ability of the framework to predict three different neurological disorders. For the following experiments, we randomly constructed 5 batches of 20 input signals each to estimate the accuracy of predicting each considered anomaly. The prediction results presented are for two sequential cloud calls, i.e., after transmitting the input signal to the cloud twice, each time after the signal tracking threshold is violated.
Seizures are among the most common neurological disorders in the world and have therefore been widely studied as a research challenge. Fig. 10 presents the prediction accuracy results of the framework at various time intervals before the occurrence of the seizure. We have achieved an average prediction accuracy of 94% in real-time. Each time-step of the input signal is compared with the set of correlated signals to estimate the anomaly probability; if this probability is increasing, the signal is classified as an anomaly. The state-of-the-art technique [samie2018highly], however, is highly specific and can only be used to predict the occurrence of seizures, whereas the EMAP framework can be used to predict multiple different brain anomalies.
Next, we evaluate the proposed EMAP framework for other anomalies, namely encephalopathy (Anomaly 2) and stroke (Anomaly 3), the results of which are presented in Table I. Due to the unavailability of similarly well-annotated datasets for these two anomalies, i.e., annotations of the onset and progression of the anomaly, we have annotated the complete signal as anomalous in these two cases. We have achieved an average prediction accuracy of 79% and 73%, respectively, for encephalopathy and strokes. This reduction in prediction accuracy is attributed to the unavailability of substantially-labeled datasets such as the ones available for seizures. Furthermore, since the proposed algorithm focuses on maximizing the sensitivity to anomalies and classifies near-threshold increases in anomaly probability as anomalous, the classification accuracy for normal signals is reduced, i.e., the framework produces a non-negligible average percentage of false positives, which is a limitation of the EMAP framework.
Finally, we evaluate the loss in accuracy incurred by deploying the proposed signal cross-correlation search (Algorithm 1) in the cloud instead of the time-consuming exhaustive cross-correlation search. We evaluate the average signal cross-correlation of the top-k signals with respect to the input for different normal and anomalous input signals. The results of these experiments are illustrated in Fig. 11. As can be observed, the average cross-correlation of the proposed approach is very close to that of the signals obtained using the exhaustive cross-correlation technique. Therefore, the loss in accuracy is almost non-existent, owing to the substantially large and highly redundant dataset that we use. However, due to the sliding window technique deployed in the proposed approach, the top-k signals selected are more diverse, i.e., typically they exhibit high cross-correlation to the input, but can also exhibit very low cross-correlation in certain scenarios, as illustrated in the figure.
VII Conclusion
In this work, we presented EMAP, a cloud-edge hybrid framework for continuously monitoring EEG signals and estimating the probability of occurrence of an anomaly in real-time. The framework is composed of three key stages, namely,
(1) Signal Acquisition, which is responsible for collecting, filtering, and transmitting the EEG data to the cloud;
(2) Cloud Search, where the input signal is cross-correlated with all the signals in the MDB, which is constructed from multiple openly accessible EEG datasets, to determine the top-k signals with maximum similarity to the input signal; and
(3) Edge Tracking, where the subsequent EEG signal samples are used to eliminate the dissimilar signals and predict the occurrence of an anomaly. Using the proposed framework, we have achieved a prediction accuracy of 94%, 79%, and 73% for three different anomalies, namely, seizures, encephalopathy, and strokes, respectively. The EMAP framework has been made open-source at https://emap.sourceforge.io, to ensure ease of adoption and reproducibility.
This work was partially supported by Doctoral College Resilient Embedded Systems which is run jointly by TU Wien’s Faculty of Informatics and FH-Technikum Wien.