Automated Polysomnography Analysis for Detection of Non-Apneic and Non-Hypopneic Arousals using Feature Engineering and a Bidirectional LSTM Network

09/06/2019 ∙ by Ali Bahrami Rad, et al. ∙ aalto

Objective: The aim of this study is to develop an automated classification algorithm for polysomnography (PSG) recordings to detect non-apneic and non-hypopneic arousals. Our particular focus is on detecting the respiratory effort-related arousals (RERAs) which are very subtle respiratory events that do not meet the criteria for apnea or hypopnea, and are more challenging to detect. Methods: The proposed algorithm is based on a bidirectional long short-term memory (BiLSTM) classifier and 465 multi-domain features, extracted from multimodal clinical time series. The features consist of a set of physiology-inspired features (n = 75), obtained by multiple steps of feature selection and expert analysis, and a set of physiology-agnostic features (n = 390), derived from scattering transform. Results: The proposed algorithm is validated on the 2018 PhysioNet challenge dataset. The overall performance in terms of the area under the precision-recall curve (AUPRC) is 0.50 on the hidden test dataset. This result is tied for the second-best score during the follow-up and official phases of the 2018 PhysioNet challenge. Conclusions: The results demonstrate that it is possible to automatically detect subtle non-apneic/non-hypopneic arousal events from PSG recordings. Significance: Automatic detection of subtle respiratory events such as RERAs together with other non-apneic/non-hypopneic arousals will allow detailed annotations of large PSG databases. This contributes to a better retrospective analysis of sleep data, which may also improve the quality of treatment.




I Introduction

Medical studies show a bidirectional relationship between sleep and health; consequently, sleep disorders may have a negative effect on patients’ health, mood, and quality of life [1]. There are about 90 different sleep disorders, classified under the main categories of insomnia, sleep-related breathing disorders, sleep-related movement disorders, hypersomnias of central origin, parasomnias, and circadian rhythm sleep disorders [2]. In this study, we pay special attention to sleep arousals induced by sleep-related breathing disorders. However, sleep arousals, which are identified by transitions from deeper to lighter sleep states, can also occur spontaneously or in association with other sleep disorders and/or environmental stimuli.

Sleep arousals are characterized by sudden shifts in electroencephalography (EEG) frequency [3]. However, depending on the type of sleep disorder, arousals may also be manifested in other biosignals. For example, sleep-related breathing disorders, which are characterized by respiratory or ventilatory disturbances during sleep [4], lead to arousals detectable from biosignals such as airflow, respiratory effort signals (chest and abdominal), and arterial oxygen saturation (SaO2), along with EEG. Furthermore, bruxism, defined as unconscious clenching, grinding, or bracing of the teeth during sleep [5], is a type of sleep-related movement disorder that leads to arousals observable in chin electromyography (EMG) and EEG [6]. Therefore, analyzing the patterns of these clinical time series together with other biosignals recorded during a typical polysomnography (PSG) test, such as electrooculography (EOG) and electrocardiography (ECG), provides important information for sleep arousal detection.

Despite recent attempts to automate PSG-based sleep analysis [7, 8, 9, 10, 11, 12, 13], arousal detection is still done manually by expert sleep technologists. Typical contemporary PSG datasets can consist of hundreds to thousands of cases, and each case contains more than a dozen clinical time series, each about eight hours long. Manual analysis of such datasets is a labor-intensive and time-consuming process that depends heavily on the sleep technologist’s experience and skill [14]. This consequently limits PSG-based sleep studies. Our aim is to develop a machine learning algorithm to automatically detect arousal events in PSG recordings. We use the same objective as set by the PhysioNet/Computing in Cardiology (CinC) Challenge 2018 [15, 16]. According to the PhysioNet challenge rules, the target arousals are those that are neither apneic nor hypopneic. This excludes all apnea types from our analysis, including obstructive, central, and mixed events [17], as well as hypopneas.

Our particular focus is on detecting respiratory effort-related arousals (RERAs), which account for 99.6% of all target arousals in the PhysioNet training dataset [16]. A RERA is a sequence of breaths lasting at least 10 seconds that leads to an arousal from sleep [18, 19]. It is characterized by an extended inspiratory phase, paradoxical movement of the chest and abdomen, and/or flattening of the inspiratory airflow. RERAs are very subtle respiratory events that do not meet the criteria for apnea or hypopnea and are more challenging to detect [20]. Despite their subtle nature and moderate manifestation in biosignals, RERAs can cause fatigue and daytime sleepiness [21]. Moreover, an excessive number of RERAs is associated with hypertension [22] and car accidents [23]. Aside from RERAs, the remaining 0.4% of the target arousals in this study consist of arousals related to other sleep-related breathing disorders, sleep-related movement disorders, environmental stimuli, and spontaneous arousals.

The current study is a continuation of our prior work [24] in the sense that it was developed for the follow-up phase of the 2018 PhysioNet challenge and assessed on the same dataset with the same evaluation criteria. However, it is a thoroughly independent body of research for the following reasons.

First, in our prior work, we proposed an automatic feature learning procedure based on a 2D convolutional neural network (CNN) [25] and the state distance representation [26] of clinical time series. In the current study, by contrast, we extract hand-engineered features from various time series using a combination of expert knowledge and feature selection techniques. Second, our previous study used a limited number of PSG channels (only 3 biosignals), whereas here we use all available PSG data (13 biosignals). Third, the development of the previous algorithm involved little or no physiological knowledge, whereas the currently proposed method builds on prior knowledge of the physiological processes during arousals. Fourth, we also extract an alternative, semi-automatic set of features using the state-of-the-art scattering transform [27] and investigate ways to improve its performance for PSG classification. Fifth, we use a recurrent neural network (RNN) based on long short-term memory (LSTM) units [28] for sequence modeling of sleep microstructures and transient events.

The developed software is available in the PhysioNet system and will be released under an open-source license, according to the PhysioNet timeline.

II Dataset and Evaluation Criteria

We use the same dataset and scoring mechanism as provided by the 2018 PhysioNet/CinC challenge. The dataset comprises 1983 cases of in-laboratory PSG recordings. The data were recorded according to the American Academy of Sleep Medicine (AASM) practice standards [16] by the Massachusetts General Hospital’s (MGH) Sleep Lab in the Sleep Division, together with the Computational Clinical Neurophysiology Laboratory and the Clinical Data Animation Center. The recordings consist of the following 13 biosignals:

  • six EEG channels for recording cortical activity of three brain regions, based on the International 10-20 System:

    • frontal: F3-M2 and F4-M1 (PSG channels 1, 2),

    • central: C3-M2 and C4-M1 (PSG channels 3, 4),

    • occipital: O1-M2 and O2-M1 (PSG channels 5, 6);

  • the left side EOG for recording eye movements (PSG channel 7);

  • chin EMG for measuring chin muscle activity (PSG channel 8);

  • two respiratory effort signals for recording thoracoabdominal movements:

    • abdominal (PSG channel 9),

    • chest (PSG channel 10);

  • respiratory airflow (PSG channel 11);

  • arterial oxygen saturation (SaO2) (PSG channel 12);

  • ECG for measuring heart activity (PSG channel 13).

All biosignals except SaO2 were sampled at 200 Hz. SaO2 was upsampled to 200 Hz for convenience.

Seven clinical experts annotated the recordings according to the AASM standard, with a single expert annotating each recording. The recordings were scored for sleep stages and then annotated into three classes: non-target arousal, target arousal, and non-arousal. The non-target arousals are the regions of the PSG recordings with apneic or hypopneic arousals. The target arousals are the regions that meet either of the following conditions:

  1. 2 seconds before the onset of RERA to 10 seconds after its ending;

  2. 2 seconds before the onset of non-RERA, non-apneic, and non-hypopneic arousal to 2 seconds after its ending.
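The two labeling rules above can be turned into a sample-level mask. The following is a minimal sketch under stated assumptions. The helper `target_mask` and its `(onset_s, offset_s, is_rera)` event format are hypothetical and are not part of the challenge tooling. The challenge already distributes sample-wise labels, so this only illustrates how the 2 s / 10 s margins translate into samples at 200 Hz.

```python
import numpy as np

FS = 200  # sampling frequency in Hz

def target_mask(n_samples, events, fs=FS):
    """Mark target-arousal samples (hypothetical helper, illustrative only).

    events: iterable of (onset_s, offset_s, is_rera) tuples in seconds.
    RERA regions span 2 s before onset to 10 s after offset; other
    non-apneic, non-hypopneic arousals span 2 s before onset to 2 s after offset.
    """
    mask = np.zeros(n_samples, dtype=bool)
    for onset, offset, is_rera in events:
        post = 10.0 if is_rera else 2.0
        start = max(int(round((onset - 2.0) * fs)), 0)          # clip at recording start
        stop = min(int(round((offset + post) * fs)), n_samples)  # clip at recording end
        mask[start:stop] = True
    return mask
```

For a one-minute recording with a RERA from 10 s to 20 s, the mask covers samples 1600 to 5999 (8 s to 30 s).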

As stated earlier, 99.6% of the target arousals in the training dataset are related to RERAs. The remaining 0.4% are arousals related to snoring, partial airway obstruction, Cheyne-Stokes breathing, hypoventilation, bruxism, periodic leg movements, and noise, as well as spontaneous arousals.

The dataset is divided into two disjoint subsets: training (n = 994 subjects) and testing (n = 989 subjects). The labels of the testing dataset are hidden and reserved for the PhysioNet challenge organizers, who use them to evaluate the submitted algorithms. Performance is assessed using the area under the precision-recall curve (AUPRC) for binary classification between target arousal and non-arousal regions. The non-target arousal regions are not considered in the evaluation. More information on the evaluation criteria is available in [15] and [16]. In addition to the AUPRC, which is the primary evaluation criterion, we calculate the area under the receiver operating characteristic curve (AUROC) as a secondary evaluation criterion.
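The following sketch shows how an AUPRC can be computed while ignoring non-target regions. It makes two assumptions:

- labels are encoded as 1 (target arousal), 0 (non-arousal), and -1 (non-target arousal, not scored);
- the area is computed as a step-wise sum of precision over recall increments (average precision).

The official challenge scorer may aggregate over records or interpolate differently. The function `auprc` is a hypothetical name used only for illustration.

```python
import numpy as np

def auprc(labels, scores):
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 = target arousal, 0 = non-arousal, -1 = non-target arousal (ignored).
    scores: predicted probability of target arousal for each sample.
    """
    labels = np.asarray(labels)
    scores = np.asarray(scores, dtype=float)
    keep = labels != -1                       # non-target regions are not scored
    y = labels[keep] == 1
    s = scores[keep]
    order = np.argsort(-s, kind="mergesort")  # sort by descending score
    y, s = y[order], s[order]
    tp = np.cumsum(y)
    fp = np.cumsum(~y)
    last = np.append(np.diff(s) != 0, True)   # one operating point per distinct threshold
    tp, fp = tp[last], fp[last]
    precision = tp / (tp + fp)
    recall = tp / y.sum()
    recall_prev = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - recall_prev) * precision))
```

Samples that share a score are grouped into a single threshold, so ties do not inflate the area.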

III Feature Engineering

After preprocessing the PSG recordings as described in Section III-A, we extract 465 features from each 5-second analysis window. The features fall into two groups: physiology informed and physiology agnostic.

The physiology informed features are extracted based on our physiological knowledge of sleep arousals and their manifestations on biosignals. This set is not solely based on physiology, however. We first extract a large number of hand-engineered features based on our prior knowledge of sleep arousals. We then remove the irrelevant and/or redundant ones during multiple steps of feature selection and expert judgment (see Section III-B).

The physiology agnostic features, on the other hand, are derived entirely from our knowledge of signal processing and machine learning, without any physiological consideration (see Section III-C).

III-A Preprocessing and Data Preparation

The preprocessing proceeds in the following steps:

1. The 60 Hz powerline artifact is removed using a band-stop filter.
2. An inspection of the spectral content of the biosignals reveals an additional 80 Hz artifact in some recordings. This might be the second harmonic of the powerline artifact (120 Hz). Because of aliasing at the 200 Hz sampling rate, that harmonic would appear as a false 80 Hz component (200 − 120 = 80 Hz). The 80 Hz artifact is filtered out as well.
3. High-amplitude muscle-generated artifacts due to body movements are removed by simple thresholding. Wherever the instantaneous magnitude of a biosignal exceeds 8 times the interquartile range of its amplitude, it is replaced by zero.
4. The dynamic range of the signal amplitude is normalized by dividing the instantaneous amplitude by 8 times the interquartile range.

Steps 3 and 4 are applied to all biosignals except SaO2 and ECG. Finally, each PSG recording is segmented into 5-second nonoverlapping triangular windows. From now on, all analyses are performed on these 5-second windows.
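The preprocessing steps above can be sketched with NumPy and SciPy. This is a minimal sketch under several assumptions:

- SciPy's `iirnotch` with a quality factor of 30 stands in for the unspecified band-stop filters;
- the filtering is zero-phase (`filtfilt`);
- the interquartile range is computed over the whole recording;
- each 5-second frame is tapered with `scipy.signal.windows.triang`.

The function names `preprocess` and `segment` are hypothetical.

```python
import numpy as np
from scipy.signal import iirnotch, filtfilt
from scipy.signal.windows import triang

FS = 200        # sampling frequency in Hz
WIN = 5 * FS    # 5-second analysis window in samples

def preprocess(x, fs=FS, clip=True):
    """Notch out 60 Hz and its aliased 80 Hz component, then clip and normalize."""
    x = np.asarray(x, dtype=float)
    for f0 in (60.0, 80.0):
        b, a = iirnotch(f0, Q=30.0, fs=fs)   # assumed notch design (Q = 30)
        x = filtfilt(b, a, x)                # zero-phase filtering
    if clip:                                 # skipped for SaO2 and ECG
        q75, q25 = np.percentile(x, [75, 25])
        scale = 8.0 * (q75 - q25)            # 8 x interquartile range
        x = np.where(np.abs(x) > scale, 0.0, x)  # zero out high-amplitude artifacts
        x = x / scale                        # dynamic range normalization
    return x

def segment(x, win=WIN):
    """Split into nonoverlapping frames, each tapered by a triangular window."""
    n = len(x) // win
    frames = x[: n * win].reshape(n, win)
    return frames * triang(win)
```

After clipping and normalization, every sample of a processed channel lies in [-1, 1], and a one-minute recording yields twelve 1000-sample frames.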

III-B Physiology Informed Features

In the initial phase of this study, we extracted more than 900 features from all biosignals. The features came from various domains, such as time, frequency (or spectral), time-frequency, and phase space. We then reduced their number through multiple steps of feature selection and expert judgment:

1. We removed 250 features after a feature ranking procedure using a random forest classifier, similar to [29] and [30].
2. We removed the features derived by nonlinear analysis of biosignals in the reconstructed phase space [31], to speed up feature extraction. These features improve the classification result by 0.02 in terms of AUPRC. We nevertheless dropped them because of the run-time constraint imposed by the PhysioNet challenge organizers.
3. We removed all time-frequency features obtained from the ordinary discrete wavelet transform (DWT) and replaced them with features derived from the scattering transform [27]. The problem with the DWT is that it is covariant to translation (i.e., not translation invariant), so ad hoc translation-invariant features must be extracted, similar to [32]. Since computing the scattering transform features involves no physiological knowledge, we treat them as physiology agnostic features and discuss them separately in Section III-C.
4. We applied our proposed heuristic feature selection method (see Section IV) to the remaining 500 features to derive the final set of 75 physiology informed features.

In the following, we describe these 75 features, which can be further divided into two subgroups: respiratory-related and non-respiratory-related features. The respiratory-related features, described in Section III-B1, are extracted from biosignals related to the respiratory process, namely the abdominal, chest, airflow, and SaO2 signals. The non-respiratory-related features, described in Section III-B2, are extracted from the EEGs, EOG, chin EMG, and ECG.

III-B1 Respiratory-related features

Monitoring respiratory activity with relevant biosignals reveals abnormalities and/or complications related to breathing [33]. These biosignals include airflow, abdominal and chest effort, and oxygen saturation (SaO2). For example, SaO2 indicates changes in blood oxygen level, which is an important marker for detecting sleep apnea and other respiratory problems. The respiratory-related biosignals also capture information about snoring, respiratory rate, airway obstruction, and the strength of inhalation and exhalation [34]. For instance, important indicators for detecting RERAs [35, 36] include:

- the morphology and movement patterns of the chest and abdomen (e.g., biphasic, paradoxical, and in-phase);
- the shape of the airflow signal (flattened vs. normal).

Furthermore, snoring can be detected from high-frequency periodic oscillations of the airflow [37]. It may even appear as an artifact on the chin EMG, a non-respiratory-related biosignal [18].

In the following, we describe the 41 selected respiratory-related features: 13 cross-channel and 28 isolated-channel features.

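  1. Thirteen cross-channel features are extracted from the abdominal, chest, and airflow signals (PSG channels 9-11) using correlation analysis, hypothesis testing, and multichannel signal decomposition. The first six features are the Pearson correlation coefficients for each pair of signals and the p-values for testing the null hypothesis that the pair is unrelated. The next seven cross-channel features are extracted after factorizing the matrix

    $$\mathbf{X} = \left[\mathbf{x}_{\mathrm{abd}}\;\;\mathbf{x}_{\mathrm{chest}}\;\;\mathbf{x}_{\mathrm{air}}\right] \in \mathbb{R}^{N \times 3}, \tag{1}$$

    formed by these signals, each of length $N$. We consider the singular value decomposition (SVD) of $\mathbf{X}$ [38], that is,

    $$\mathbf{X} = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^{\top}, \tag{2}$$

    where $\mathbf{U}$ and $\mathbf{V}$ are $N \times N$ and $3 \times 3$ orthogonal matrices, respectively. $\boldsymbol{\Sigma} = \begin{bmatrix}\mathbf{D}\\ \mathbf{0}\end{bmatrix}$ is a block matrix in which $\mathbf{D}$ is a diagonal matrix with the singular values $\sigma_1$, $\sigma_2$, and $\sigma_3$ on its diagonal, that is,

    $$\mathbf{D} = \operatorname{diag}(\sigma_1, \sigma_2, \sigma_3), \tag{3}$$

    and $\mathbf{0}$ is an $(N-3) \times 3$ zero matrix. By convention, the singular values are ordered such that $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0$. The next seven cross-channel features are:

    - the singular values $\sigma_1$, $\sigma_2$, and $\sigma_3$;
    - their arithmetic and geometric means;
    - their standard deviation (STD);
    - a ratio derived from them.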

    Fig. 1: Magnitudes of the frequency spectra of the wavelets in the two filter banks. The first filter bank contains 14 wavelets, and the second filter bank contains 8 wavelets.
  2. Six features are extracted from the abdominal signal. The first two features are the STD and the root mean square (RMS) of the signal. The signal is then modeled as an order-10 autoregressive (AR) process, that is,

    $$x[n] = -\sum_{k=1}^{10} a_k\, x[n-k] + e[n], \tag{4}$$

    where $x[n]$ and $e[n]$ are the $n$-th samples of the signal and the input white noise, respectively. The coefficients $a_1, \ldots, a_{10}$ are the parameters of the model, estimated by Burg’s method [39]. The third feature is the ninth parameter ($a_9$) of this AR model.

    There are various approaches to choosing the order of an AR model, such as minimizing the Akaike or the Bayesian information criterion [40, 41]. Our purpose here, however, is not to design an optimal model for signal representation. We are only looking for parameters (i.e., features) that are informative enough to discriminate between the arousal and non-arousal classes. Therefore, instead of searching for the optimal model order, we choose a moderate order (i.e., 10) and keep the discriminative parameters during feature selection.

    Furthermore, the respiratory-related part of the abdominal spectrum (i.e., its low-frequency interval) is divided into five frequency bands: 0.01-0.4 Hz, 0.4-0.75 Hz, 0.75-1.2 Hz, 1.2-1.6 Hz, and 1.6-3 Hz. The signal power in the band between $f_1$ and $f_2$ Hz, $P_{f_1\text{-}f_2}$, is estimated by the area under the power spectral density (PSD) curve $S(f)$, that is,

    $$P_{f_1\text{-}f_2} = \int_{f_1}^{f_2} S(f)\, df, \tag{5}$$

    where $S(f)$ is estimated using Burg’s AR model with an empirically derived order of 30. The last three features are two of these band powers and a band-power ratio.

  3. Five features are extracted from the chest signal, including its RMS, STD, and skewness.

  4. Twelve features are extracted from the airflow signal:

    - its RMS and skewness;
    - its power in five frequency bands;
    - four nonlinear combinations of these features;
    - a final feature computed from $x[n]$, $\Delta x[n]$, and $\Delta^2 x[n]$, which are the airflow signal and its first and second forward differences, respectively, that is,

    $$\Delta x[n] = x[n+1] - x[n], \tag{6}$$

    $$\Delta^2 x[n] = \Delta x[n+1] - \Delta x[n]. \tag{7}$$

  5. Five features are extracted from SaO2: the mean, STD, and RMS of the signal, the mean frequency of its power spectrum, and the STD of its first difference.
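Several of the respiratory features above can be sketched with NumPy and SciPy. The sketch has three parts:

- a hand-written Burg estimator that uses the same sign convention as (4) and returns $[1, a_1, \ldots, a_p]$ together with the prediction-error power;
- an AR-based band power computed by trapezoidal integration of the one-sided PSD, as in (5);
- the cross-channel correlation and SVD features.

Several details are assumptions: the ordering of signal pairs, and the population STD (`ddof=0`) of the singular values. The seventh SVD feature is a ratio whose exact definition is not given above, so it is omitted here. The function names `burg`, `ar_band_power`, and `cross_channel_features` are hypothetical. This is not the authors' MATLAB code.

```python
import numpy as np
from scipy.stats import pearsonr

def burg(x, order):
    """Burg estimate of a = [1, a_1, ..., a_p] and noise power for
    x[n] = -sum_k a_k x[n-k] + e[n]."""
    x = np.asarray(x, dtype=float)
    f = x.copy()                              # forward prediction errors
    b = x.copy()                              # backward prediction errors
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = np.dot(x, x) / x.size
    for m in range(1, order + 1):
        fp, bp = f[m:], b[m - 1:-1]           # f_{m-1}(n) and b_{m-1}(n-1)
        k = -2.0 * np.dot(fp, bp) / (np.dot(fp, fp) + np.dot(bp, bp))
        f_new, b_new = fp + k * bp, bp + k * fp
        f[m:], b[m:] = f_new, b_new
        a[1:m + 1] = a[1:m + 1] + k * a[m - 1::-1]   # Levinson-style coefficient update
        err *= 1.0 - k * k
    return a, err

def ar_band_power(x, fs, f_lo, f_hi, order=30, n_grid=2048):
    """Integrate the one-sided AR (Burg) PSD between f_lo and f_hi Hz."""
    a, err = burg(x, order)
    f = np.linspace(f_lo, f_hi, n_grid)
    w = 2.0 * np.pi * f / fs
    A = np.exp(-1j * np.outer(w, np.arange(order + 1))) @ a   # A(e^{jw})
    psd = 2.0 * err / (fs * np.abs(A) ** 2)
    return float(np.sum((psd[1:] + psd[:-1]) * np.diff(f)) / 2.0)

def cross_channel_features(abd, chest, airflow):
    """Pairwise Pearson r and p-values, then SVD-based summaries (ratio omitted)."""
    X = np.column_stack([abd, chest, airflow])
    feats = []
    for i, j in [(0, 1), (0, 2), (1, 2)]:
        r, p = pearsonr(X[:, i], X[:, j])
        feats += [r, p]
    s = np.linalg.svd(X, compute_uv=False)    # sigma_1 >= sigma_2 >= sigma_3
    feats += list(s) + [s.mean(), np.prod(s) ** (1.0 / 3.0), s.std()]
    return np.array(feats)
```

For example, `ar_band_power(abd, 200.0, 0.01, 0.4)` would give the power of the 0.01-0.4 Hz abdominal band under these assumptions.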

Fig. 2: Magnitudes of the frequency spectra of the wavelets in the first filter bank of Fig. 1, shown in the interval from 0 to 5 Hz. The equally spaced low-frequency filters have the same bandwidth as the lowest-frequency dilated wavelet, whereas the bandwidths of the higher-frequency wavelets increase exponentially.

III-B2 Non-respiratory-related features

AASM guidelines define an arousal as an abrupt shift in EEG frequency, including alpha (8-13 Hz), theta (4-8 Hz), and frequencies above 16 Hz, that lasts at least 3 seconds and is preceded by at least 10 seconds of stable sleep [42]. Moreover, during rapid eye movement (REM) sleep, this EEG frequency shift must be accompanied by a concurrent increase in submental (chin) EMG amplitude to be scored as an arousal. In contrast, non-rapid eye movement (NREM) arousals may occur without such an increase in chin EMG.

We extract various features from different EEG frequency bands including delta (0.1-4 Hz), theta (4-8 Hz), alpha (8-13 Hz), sigma (13-16 Hz), and beta (16-25 Hz). Moreover, EOG- and EMG-based features are extracted to differentiate between EEG arousals during REM and NREM, and ECG-based features are extracted to provide complementary information about sleep-related breathing disorders as well as autonomic arousals [43]. In the following, the selected 34 non-respiratory-related features are described.

  1. Seven features are extracted from each frontal EEG and from the EOG (PSG channels 1, 2, 7):

    - the RMS, STD, skewness, and kurtosis of the biosignal;
    - two parameters of the AR model in (4);
    - one band power as defined in (5).

  2. The RMS and one parameter of the AR model in (4) are calculated for each central and occipital EEG (PSG channels 3-6).

  3. Three features are extracted from the chin EMG, including the RMS and kurtosis of the signal.

  4. Two features are extracted from the ECG signal.

III-C Physiology Agnostic Features

Fig. 3: Time-domain representation of the complex wavelets corresponding to the analytic filters in Fig. 2. Re and Im denote the real and imaginary parts of the complex wavelet functions. $\phi$ is the approximation function, whose corresponding low-pass filter is not shown in Fig. 2.

One of the challenges in classification is handling a substantial amount of intra-class variability which is not helpful for discrimination between different classes. Removing or minimizing this irrelevant information and preserving useful inter-class variabilities may significantly increase the classifier’s performance. The scattering transform proposed by Mallat [27] is a systematic approach to address this problem by building locally invariant, stable, and informative representations while preserving the signal norm and most of the inter-class variabilities.

The scattering transform is a deep representation which mimics a CNN in the sense that it propagates the input signal across a sequence of linear filters followed by pooling and nonlinearities [44]. However, contrary to a CNN in which the filters have adaptive weights obtained through a gradient-based learning strategy and error back-propagation [45], the scattering transform is derived by cascading predefined filters, namely wavelets. To be more precise, the scattering transform is a deep signal representation, derived by cascading wavelet transform moduli followed by an averaging operator (i.e., low-pass filtering) [27]. The logic behind this new transformation is to derive a translation invariant representation of the original signal which is also stable to small deformations like time warping.

In this work, we design a 2-layer scattering network with the filter banks illustrated in Fig. 1. We use Gabor wavelets [46], i.e., approximately analytic wavelets constructed by frequency modulation of Gaussian windows. The central frequencies of the mother wavelets $\psi_1$ and $\psi_2$ of the first and second filter banks depend on two quantities:

- the quality factors $Q_1$ and $Q_2$, i.e., the numbers of wavelets per octave in the first and second filter banks;
- the sampling frequency $f_s = 200$ Hz.

We design the network so that the resulting representation is invariant to 5-second translations. This requirement determines the numbers of wavelet scales $J_1$ and $J_2$ in the first and second filter banks. The other wavelets in each filter bank are derived by dilating the mother wavelet by a factor of

$$2^{j/Q_m}, \tag{9}$$

where $m \in \{1, 2\}$ is the layer index in the scattering network and $j$ is the scale index. In the Fourier domain, these filter banks can be represented by

$$\hat{\psi}_{m,j}(\omega) = \hat{\psi}_m\!\left(2^{j/Q_m}\,\omega\right), \tag{10}$$

whose magnitudes are shown in Fig. 1. If the central frequency of $\hat{\psi}_m$ is $\xi_m$, then the central frequency of $\hat{\psi}_{m,j}$ is $2^{-j/Q_m}\xi_m$. In other words, the frequency axis is divided logarithmically (base two).

To cover the entire frequency spectrum, however, only the first $J_m$ filters (i.e., $0 \le j < J_m$) cover the higher-frequency interval logarithmically. The lower-frequency interval is covered by $P_m$ equally spaced filters with the same bandwidth as $\hat{\psi}_{m,J_m-1}$. The reason is that $\hat{\psi}_{m,J_m-1}$ has the smallest bandwidth in frequency, and hence the largest time support, which must remain smaller than the predefined 5-second translation invariance scale. Although these equally spaced filters are not dilations of $\psi_m$, for simplicity they are still called wavelets [47].

In this work, the first filter bank contains 14 wavelets and the second filter bank contains 8 wavelets (see Fig. 1). Fig. 2 zooms in on the 0 to 5 Hz interval of the first filter bank. It shows that the equally spaced low-frequency filters have the same bandwidth as the lowest-frequency dilated wavelet, while the bandwidths of the other filters increase exponentially. Fig. 3 shows the time-domain representations of the complex wavelets corresponding to the analytic filters in Fig. 2.

The 2-layer scattering network used in this work can be summarized as follows:

$$S_0x(t) = (x \star \phi)(t), \tag{11}$$

$$U_1x(t, j_1) = \left|x \star \psi_{1,j_1}\right|(t), \tag{12}$$

$$S_1x(t, j_1) = \left(U_1x(\cdot, j_1) \star \phi\right)(t), \tag{13}$$

$$U_2x(t, j_1, j_2) = \left|U_1x(\cdot, j_1) \star \psi_{2,j_2}\right|(t), \tag{14}$$

$$S_2x(t, j_1, j_2) = \left(U_2x(\cdot, j_1, j_2) \star \phi\right)(t), \tag{15}$$

where $\star$ is convolution and $|\cdot|$ is the complex modulus operator.

In (11), the zeroth-order scattering coefficient $S_0x$ is calculated by low-pass filtering (or weighted time-averaging) of the original signal $x$, i.e., by convolving $x$ with the approximation function $\phi$. This low-pass filtering discards the high-frequency content of $x$, which the wavelet transform can recover. So, in (12), the variation of the signal at different scales is calculated by convolving $x$ with the wavelets $\psi_{1,j_1}$.

At first glance, the complex modulus in (12) seems to lose information as well. However, it can be shown that, at least for a specific family of wavelets, $x$ can be reconstructed from $\{|x \star \psi_{1,j_1}|\}_{j_1}$ up to a global phase (i.e., up to multiplication by a unit-modulus complex number), and that the reconstruction operator is continuous (but not uniformly continuous) [48]. Hence, the main source of information loss is the low-pass filtering, which is needed to generate shift-invariant features.

In (13), the first-order scattering coefficients $S_1x$ are calculated by low-pass filtering the first-order wavelet modulus $U_1x$. Again, the lost information is recovered in (14), where the second-order wavelet modulus $U_2x$ is calculated by convolving $U_1x$ with the second-layer wavelets $\psi_{2,j_2}$. Finally, (15) gives the second-order scattering coefficients $S_2x$.

This process can be repeated an arbitrary number of times to generate further shift-invariant features. We stop after the second-order coefficients, however, for two reasons: higher-order coefficients have very low energy that can be neglected in the analysis [49], and they do not improve the classification results [50].

This structure mimics a CNN in the sense that the convolutional layers (i.e., the wavelet transforms $x \star \psi$) are followed by nonlinearities (i.e., the modulus $|\cdot|$) and then by average pooling (i.e., low-pass filtering with $\phi$). It differs from a CNN mainly because the filters are predefined rather than data-driven, and there is no weight sharing among different scales.
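To make (11)-(15) concrete, here is a minimal NumPy sketch of a 2-layer scattering transform for one 5-second window. It is a simplified stand-in, not the filter bank used in this work. The following choices are assumptions:

- Gaussian filters are defined directly in the frequency domain.
- The mother-wavelet center $f_s/4\,(1 + 2^{-1/Q})$ and the parameters `q1=4, j1=8, q2=1, j2=4` are hypothetical.
- The equally spaced low-frequency filters are left out.
- The approximation function $\phi$ is replaced by a plain average over the whole window, so every coefficient is fully translation invariant within the window.
- As is common in scattering implementations, only second-order paths whose second-layer center frequency lies below the first-layer one are kept.

The function names `gabor_bank` and `scattering_features` are hypothetical.

```python
import numpy as np

def gabor_bank(n, fs, q, num_scales):
    """Gaussian (Gabor) band-pass filters in the frequency domain, one per scale."""
    f = np.fft.fftfreq(n, d=1.0 / fs)                    # FFT frequency axis in Hz
    xi = fs / 4.0 * (1.0 + 2.0 ** (-1.0 / q))            # hypothetical mother-wavelet center
    centers = xi * 2.0 ** (-np.arange(num_scales) / q)   # xi * 2^(-j/Q): log-spaced centers
    bank = []
    for c in centers:
        sigma = c * (2.0 ** (1.0 / q) - 1.0) / 2.0       # bandwidth proportional to center
        bank.append(np.exp(-((f - c) ** 2) / (2.0 * sigma ** 2)))  # small for f < 0
    return np.array(bank), centers

def scattering_features(x, fs=200.0, q1=4, j1=8, q2=1, j2=4):
    """Two-layer scattering of one window; phi is replaced by a full-window mean."""
    x = np.asarray(x, dtype=float)
    psi1, c1 = gabor_bank(x.size, fs, q1, j1)
    psi2, c2 = gabor_bank(x.size, fs, q2, j2)
    X = np.fft.fft(x)
    feats = [x.mean()]                                   # S0 x, as in (11)
    for a in range(j1):
        u1 = np.abs(np.fft.ifft(X * psi1[a]))            # U1 x = |x * psi_{1,a}|, (12)
        feats.append(u1.mean())                          # S1 x, (13)
        U1 = np.fft.fft(u1)
        for b in range(j2):
            if c2[b] >= c1[a]:                           # skip frequency-increasing paths
                continue
            u2 = np.abs(np.fft.ifft(U1 * psi2[b]))       # U2 x = |U1 x * psi_{2,b}|, (14)
            feats.append(u2.mean())                      # S2 x, (15)
    return np.array(feats)
```

All convolutions in this sketch are circular (FFT-based), and the final step is a global mean. As a result, circularly shifting the input window leaves the output vector unchanged, which is an idealized version of the translation invariance that the low-pass filter $\phi$ provides.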

In this work, we extract wavelet scattering coefficients for 6 biosignals: EOG, abdominal, chest, airflow, SaO2, and ECG (PSG channels 7, 9, 10, 11, 12, and 13). Since non-orthogonal Gabor wavelets have significant overlap in the frequency domain (see Figs. 1 and 2), the resulting scattering coefficients are redundant. In order to speed up the analysis and decrease the memory usage, we downsample the features by a factor of 4. The final number of features for each biosignal is 65, resulting in a total number of 390 features.

Finally, our so-called physiology agnostic feature extraction should not be mistaken for a domain agnostic method. We use the term “physiology agnostic” to highlight that these features are not inspired by physiological knowledge of the biosignals. The scattering transform is not truly domain agnostic, however. Its success depends on discovering the right invariants and the right stability conditions to deformations, and these are domain-dependent. For example, the invariants and stability conditions for image and texture data, such as spatial translation, rotation, scaling, and partial occlusion [51, 52], differ from those for audio and speech signals, such as time shifting, time warping deformation, frequency transposition, and frequency warping [50, 53].

IV Classification




Fig. 4: LSTM memory cell with forget gate as proposed in [54]. In the original LSTM, proposed in [28], there was no forget gate.
Fig. 5: The architecture of the proposed BiLSTM network.

In an intermediate phase of this work, after feature engineering, we relied on the sliding window method [55] to classify each 5-second segment of the PSG data using a random forest classifier [56]. The best average result was an AUPRC of 0.18, with high variance among different chunks of the data. The main reason for this low performance is that this approach loses the temporal information and the dependencies among different segments of the time series.

To address this shortcoming, we use an LSTM network [28], which is a type of RNN with a gating mechanism that controls the flow of information [57]. A “vanilla” RNN suffers from the vanishing and exploding gradient problems [58] and consequently fails to capture long-range dependencies. An LSTM addresses this problem and, thanks to its gating mechanism, captures richer contextual information from the time series.

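In this work, we analyze the PSG recordings retrospectively. Since the past, present, and future information of the time series is available at analysis time, we can use a bidirectional LSTM (BiLSTM) variant. Each BiLSTM layer consists of two LSTM layers: a causal and an anticausal counterpart. Fig. 4 illustrates a single unit of a causal LSTM, which processes the time series forward in time. It consists of four gates that control the flow of information through the following equations:

$$\mathbf{i}_t = \sigma\!\left(\mathbf{W}_i \mathbf{x}_t + \mathbf{R}_i \mathbf{h}_{t-1} + \mathbf{b}_i\right), \tag{16}$$

$$\mathbf{f}_t = \sigma\!\left(\mathbf{W}_f \mathbf{x}_t + \mathbf{R}_f \mathbf{h}_{t-1} + \mathbf{b}_f\right), \tag{17}$$

$$\mathbf{g}_t = \tanh\!\left(\mathbf{W}_g \mathbf{x}_t + \mathbf{R}_g \mathbf{h}_{t-1} + \mathbf{b}_g\right), \tag{18}$$

$$\mathbf{o}_t = \sigma\!\left(\mathbf{W}_o \mathbf{x}_t + \mathbf{R}_o \mathbf{h}_{t-1} + \mathbf{b}_o\right), \tag{19}$$

$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \mathbf{g}_t, \tag{20}$$

$$\mathbf{h}_t = \mathbf{o}_t \odot \tanh\!\left(\mathbf{c}_t\right). \tag{21}$$

The symbols are defined as follows:

- $\mathbf{x}_t$ is the input feature vector at time step $t$.
- $\mathbf{i}_t$, $\mathbf{f}_t$, $\mathbf{g}_t$, and $\mathbf{o}_t$ are the vectors of the input gate, forget gate, candidate cell gate, and output gate, respectively, for the entire layer of units or memory cells.
- $\mathbf{c}_t$ and $\mathbf{h}_t$ are the cell and hidden states, respectively.
- $\mathbf{W}_\ast$, $\mathbf{R}_\ast$, and $\mathbf{b}_\ast$ are the input weight matrix, recurrent weight matrix, and bias vector for the gate denoted by $\ast \in \{i, f, g, o\}$.
- $\sigma(\cdot)$ and $\tanh(\cdot)$ are the sigmoid and hyperbolic tangent activation functions, respectively.
- $\odot$ is the Hadamard product.

The anticausal LSTM processes the time series backward in time. It mirrors the forward LSTM with reversed time order, which leads to the same equations with different weights and biases ($\tilde{\mathbf{W}}_\ast$, $\tilde{\mathbf{R}}_\ast$, $\tilde{\mathbf{b}}_\ast$). In addition, $\mathbf{h}_{t-1}$ and $\mathbf{c}_{t-1}$ are replaced by $\mathbf{h}_{t+1}$ and $\mathbf{c}_{t+1}$, respectively. The outputs of the two LSTMs are then concatenated to capture the contextual information of the whole time series.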
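Equations (16)-(21) and the forward/backward concatenation can be written directly in NumPy. This sketch is for illustration only and is not the MATLAB layer used in this work. The parameter layout (dictionaries keyed by gate name `'i'`, `'f'`, `'g'`, `'o'`), the helper names, and the small sizes are hypothetical. The network described below uses 200 units per direction.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_forward(x_seq, W, R, b):
    """Run one LSTM over x_seq (T x d) following (16)-(21); returns hidden states (T x H)."""
    H = R["i"].shape[0]
    h, c = np.zeros(H), np.zeros(H)
    out = np.zeros((x_seq.shape[0], H))
    for t, x in enumerate(x_seq):
        i = sigmoid(W["i"] @ x + R["i"] @ h + b["i"])   # input gate (16)
        f = sigmoid(W["f"] @ x + R["f"] @ h + b["f"])   # forget gate (17)
        g = np.tanh(W["g"] @ x + R["g"] @ h + b["g"])   # candidate cell (18)
        o = sigmoid(W["o"] @ x + R["o"] @ h + b["o"])   # output gate (19)
        c = f * c + i * g                               # cell state (20)
        h = o * np.tanh(c)                              # hidden state (21)
        out[t] = h
    return out

def bilstm_forward(x_seq, fwd, bwd):
    """Causal pass plus anticausal pass on the reversed sequence, concatenated per step."""
    h_fwd = lstm_forward(x_seq, *fwd)
    h_bwd = lstm_forward(x_seq[::-1], *bwd)[::-1]       # re-align backward outputs in time
    return np.concatenate([h_fwd, h_bwd], axis=1)

def random_params(d, H, rng):
    """Hypothetical random initialization of (W, R, b) for all four gates."""
    W = {k: rng.normal(scale=0.1, size=(H, d)) for k in "ifgo"}
    R = {k: rng.normal(scale=0.1, size=(H, H)) for k in "ifgo"}
    b = {k: np.zeros(H) for k in "ifgo"}
    return W, R, b
```

The forward half of each output row depends only on the current and past inputs, and the backward half only on the current and future inputs. That split is the property that makes the bidirectional variant suitable for retrospective analysis.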

Fig. 5 illustrates the architecture of the proposed BiLSTM network. It has 3 BiLSTM layers with 400 hidden units per layer (200 for each LSTM), followed by a leaky rectified linear unit (Leaky ReLU) layer, a fully connected layer, and a softmax layer.

We scrutinized and evaluated several different combinations to identify the best architecture empirically. For example, we examined:

- different numbers of BiLSTM layers;
- different numbers of memory cells per layer;
- multiple activation functions (e.g., linear, ReLU, Leaky ReLU, and sigmoid);
- different numbers of fully connected layers;
- different parameters for the Leaky ReLU layer.

Leaky ReLU is defined as

$$f(z) = \begin{cases} z, & z \ge 0,\\ \alpha z, & z < 0, \end{cases} \tag{22}$$

where $\alpha$ is typically a small number (e.g., 0.01). However, we obtained the best result with a different, empirically selected value of $\alpha$. The theoretical reason behind this observation is not clear, which is not uncommon in deep learning. Some studies discuss the effect of different nonlinearities [59, 60, 61], but they mainly focus on CNNs and lack mathematical rigor.
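The layers that follow the BiLSTM stack (Leaky ReLU as in (22), a fully connected layer, and a softmax) can be sketched per time step as below. The default `alpha=0.01` is only the typical value quoted above, not the value selected in this work. The weight shapes and the function name `classification_head` are hypothetical.

```python
import numpy as np

def leaky_relu(z, alpha=0.01):
    """Leaky ReLU from (22): identity for z >= 0, slope alpha otherwise."""
    return np.where(z >= 0, z, alpha * z)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def classification_head(h, W, b, alpha=0.01):
    """h: (T, 2H) BiLSTM outputs; W: (2, 2H); b: (2,) -> (T, 2) class probabilities."""
    return softmax(leaky_relu(h, alpha) @ W.T + b)
```

Each row of the output holds the per-step probabilities of the two classes (target arousal vs. non-arousal).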

We also applied the dropout mechanism [62, 63] between different layers of the network, but the classification accuracy declined. We implemented the proposed method in MATLAB R2018b, which only offers a dropout layer acting on layer inputs and outputs. For RNNs, a more effective type of dropout that is applied to the recurrent connections exists [64]. Since the available dropout mechanism was not useful, we did not use it. Instead, we selected a set of discriminative features before feeding them to the BiLSTM network. For the physiology informed features, we propose the following heuristic feature selection method. First, we pre-train the BiLSTM network with 500 features and rank them using the ad hoc score

$$
s_i = \sum_{j=1}^{N} \left( \left| w^{\mathrm{f}}_{ij} \right| + \left| w^{\mathrm{b}}_{ij} \right| \right), \tag{23}
$$

and then select the 75 top-ranked features. In (23), $w^{\mathrm{f}}_{ij}$ and $w^{\mathrm{b}}_{ij}$ are the weights of the connections between the $i$-th feature and the $j$-th memory cell of the forward and backward LSTMs in the first BiLSTM layer, respectively. $|\cdot|$ denotes the absolute value, and $N$ is the number of memory cells of each LSTM. For the physiology agnostic features, we do not apply any feature selection and feed all 390 features directly to the network.
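The ranking step can be sketched as below. The sketch assumes that score (23) sums the absolute first-layer weights of each feature over the memory cells of both directions. It also assumes that one weight per feature-cell connection is available as an (N × D) matrix. The text does not describe how such a matrix is obtained from a trained MATLAB LSTM layer, which stores separate input weights for each gate. The random weights here are therefore placeholders, and the function names are hypothetical.

```python
import numpy as np

def rank_features(w_fwd, w_bwd):
    """Rank input features by their summed absolute first-layer weights.

    w_fwd, w_bwd: (N x D) weights from each of the D input features to each of
    the N memory cells of the forward / backward LSTM in the first BiLSTM layer.
    Returns (scores, feature indices sorted from highest to lowest score).
    """
    scores = np.abs(w_fwd).sum(axis=0) + np.abs(w_bwd).sum(axis=0)
    order = np.argsort(-scores, kind="stable")   # ties keep their original order
    return scores, order

def select_top_k(w_fwd, w_bwd, k=75):
    """Indices of the k top-ranked features (k = 75 in the text)."""
    _, order = rank_features(w_fwd, w_bwd)
    return order[:k]

rng = np.random.default_rng(0)
N, D = 200, 500          # 200 cells per LSTM, 500 pre-trained candidate features
w_f = rng.standard_normal((N, D))
w_b = rng.standard_normal((N, D))
keep = select_top_k(w_f, w_b)
print(len(keep))  # 75
```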

                 All features          Physiology informed     Physiology agnostic
                 (n = 465)             features (n = 75)       features (n = 390)
                 AUPRC     AUROC       AUPRC     AUROC         AUPRC     AUROC
fold 1           0.46      0.90        0.50      0.92          0.42      0.89
fold 2           0.48      0.89        0.47      0.92          0.53      0.90
fold 3           0.50      0.91        0.56      0.93          0.49      0.89
fold 4           0.48      0.90        0.55      0.92          0.46      0.89
fold 5           0.43      0.91        0.51      0.92          0.46      0.88
fold 6           0.55      0.91        0.53      0.91          0.50      0.90
fold 7           0.54      0.90        0.51      0.92          0.43      0.90
fold 8           0.52      0.91        0.44      0.90          0.45      0.88
fold 9           0.52      0.91        0.49      0.91          0.44      0.90
fold 10          0.42      0.89        0.50      0.92          0.40      0.87
Mean             0.49      0.90        0.51      0.92          0.46      0.89
(SD)             (0.04)    (0.01)      (0.04)    (0.01)        (0.04)    (0.01)
Test data        0.50

TABLE I: Classification performance (AUPRC and AUROC) of all features (submitted to the PhysioNet challenge), physiology informed features, and physiology agnostic features, per fold of the 10-fold cross-validation on the training dataset and on the hidden test dataset
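The summary rows of Table I can be reproduced from the per-fold values. The short Python check below assumes that the parenthesized numbers are sample standard deviations across the ten folds. Under that assumption, the computed AUPRC means and SDs match the table.

```python
from statistics import mean, stdev

# Per-fold AUPRC values copied from Table I (10-fold cross-validation).
auprc = {
    "all (n = 465)": [0.46, 0.48, 0.50, 0.48, 0.43, 0.55, 0.54, 0.52, 0.52, 0.42],
    "physiology informed (n = 75)": [0.50, 0.47, 0.56, 0.55, 0.51, 0.53, 0.51, 0.44, 0.49, 0.50],
    "physiology agnostic (n = 390)": [0.42, 0.53, 0.49, 0.46, 0.46, 0.50, 0.43, 0.45, 0.44, 0.40],
}

for name, folds in auprc.items():
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    print(f"{name}: {mean(folds):.2f} ({stdev(folds):.2f})")
```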

Before the network is trained, all PSG segments with non-target arousal labels are removed. The recordings are then sorted by feature sequence length, and the sorted data are divided into mini-batches of 20 subjects. Feature sequences inside each mini-batch are zero-padded to the same length. The network is trained on these mini-batches to obtain the weights and biases that minimize the cross-entropy loss function, using the Adam optimization algorithm [65]. To address the class imbalance problem, we use a weighted cross-entropy loss function with weights of 0.9 and 0.1 for the target arousal and non-arousal classes, respectively. We also use a learning rate of 0.005, which is 5 times larger than the default value of the Adam algorithm; this value yields a better result and a faster training phase. Other important parameters, such as the exponential decay rates of the first and second moment estimates, are set to their default values of 0.9 and 0.999. Training runs for 30 epochs, and after every 10 epochs the learning rate drops to 70% of its previous value. Finally, we employ the gradient norm clipping technique [66], which constrains the gradient norm to be no larger than 1: if $\|\mathbf{g}\| > 1$, the gradient $\mathbf{g}$ is replaced by $\mathbf{g}/\|\mathbf{g}\|$. We use this technique because a very large gradient can make the update step of a gradient descent-based algorithm move the parameters to a point far from their current position. Such a jump increases the loss and wastes most of the effort made so far to reach the current point [57]. Clipping prevents this by moving a smaller distance in the gradient direction.
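The training-time choices above can be sketched in NumPy as follows: weighted cross-entropy, step learning-rate decay, gradient-norm clipping, and length-sorted, zero-padded mini-batches. This is a hedged illustration rather than the MATLAB code used in the study. The helper names are hypothetical, and label 1 is assumed to denote the target arousal class. Averaging the weighted loss over time steps is one of several possible normalizations, and MATLAB's own may differ.

```python
import numpy as np

# Hyperparameters quoted in the text.
CLASS_WEIGHTS = np.array([0.1, 0.9])   # [non-arousal, target arousal]
INITIAL_LR = 0.005                     # 5x the Adam default of 0.001
DROP_FACTOR, DROP_PERIOD = 0.7, 10     # multiply the rate by 0.7 every 10 epochs
MAX_GRAD_NORM = 1.0

def weighted_cross_entropy(probs, labels, weights=CLASS_WEIGHTS, eps=1e-12):
    """Class-weighted cross-entropy averaged over time steps.

    probs: (T x 2) softmax outputs; labels: (T,) integers in {0, 1}.
    """
    p_true = probs[np.arange(len(labels)), labels]
    return float(np.mean(-weights[labels] * np.log(p_true + eps)))

def learning_rate(epoch):
    """Piecewise-constant schedule; epoch is 0-based, 30 epochs in total."""
    return INITIAL_LR * DROP_FACTOR ** (epoch // DROP_PERIOD)

def clip_gradient_norm(g, max_norm=MAX_GRAD_NORM):
    """If ||g|| > max_norm, rescale g so that its norm equals max_norm."""
    norm = np.linalg.norm(g)
    return g * (max_norm / norm) if norm > max_norm else g

def make_minibatches(sequences, batch_size=20):
    """Sort recordings by length, group them, and zero-pad each mini-batch.

    sequences: list of (T_k x D) feature arrays, one per recording.
    """
    order = sorted(range(len(sequences)), key=lambda k: len(sequences[k]))
    batches = []
    for start in range(0, len(order), batch_size):
        group = [sequences[k] for k in order[start:start + batch_size]]
        t_max = max(len(s) for s in group)
        batch = np.zeros((len(group), t_max, group[0].shape[1]))
        for row, seq in enumerate(group):
            batch[row, :len(seq)] = seq          # trailing steps stay zero
        batches.append(batch)
    return batches
```

Sorting by length before batching keeps sequences of similar duration together, which limits the amount of zero padding in each mini-batch.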

V Results

The performance of our proposed method, consisting of 465 features and a BiLSTM network, for sleep arousal detection from PSG data is assessed on the training dataset using 10-fold cross-validation. The proposed method achieves average scores of 0.49 AUPRC and 0.90 AUROC. To evaluate the performance on the test dataset with hidden labels, the ensemble of the ten BiLSTM networks trained in this 10-fold cross-validation is submitted to the PhysioNet system. The ensemble classifier achieves a state-of-the-art AUPRC of 0.50, which tied for the second-best score during the follow-up and official phases of the 2018 PhysioNet challenge. This result is also 0.31 points better than our prior work [24].
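The sketch below averages the arousal probabilities of the ten fold models and computes AUPRC and AUROC. Averaging is an assumed combination rule, since the text only states that an ensemble was submitted. The step-wise AUPRC (average precision) is a common definition that may differ in thresholding details from the official PhysioNet scoring code. The pairwise AUROC is quadratic in memory and is meant for small examples only.

```python
import numpy as np

def ensemble_probability(fold_probs):
    """Average arousal probabilities of the fold models (assumed combination rule).

    fold_probs: (K x T) array, one row per cross-validation model.
    """
    return np.mean(fold_probs, axis=0)

def auprc(y_true, score):
    """Step-wise area under the precision-recall curve (average precision)."""
    order = np.argsort(-score, kind="stable")   # ties broken by original order
    y = np.asarray(y_true)[order]
    precision = np.cumsum(y) / np.arange(1, len(y) + 1)
    # Each positive raises recall by 1/P; weight the precision at that rank.
    return float(np.sum(precision * y) / y.sum())

def auroc(y_true, score):
    """AUROC as the probability that a positive outranks a negative (ties = 1/2)."""
    y = np.asarray(y_true).astype(bool)
    pos, neg = score[y], score[~y]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return float((greater + 0.5 * ties) / (len(pos) * len(neg)))

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, size=100)                       # toy labels
fold_probs = np.clip(labels + 0.5 * rng.standard_normal((10, 100)), 0, 1)
p = ensemble_probability(fold_probs)
print(auprc(labels, p), auroc(labels, p))
```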

Table I shows the classification performance of different sets of features using the same BiLSTM architecture. The set of physiology informed features achieves the best average scores, 0.51 AUPRC and 0.92 AUROC, which are even better than those of our submitted method. This result matches the training-dataset result of the winning algorithm of the 2018 PhysioNet challenge [67]. The performance of the physiology agnostic features (i.e., scattering transform features) is lower than that of the physiology informed features by 0.05 AUPRC and 0.03 AUROC. However, it is still among the top 5 PhysioNet algorithms on the training dataset.

Feature type                      No. of     PSG channel   AUPRC          AUROC
                                  features   no.
1) cross-channel                  13         9-11          0.42 (0.04)    0.88 (0.01)
2) abdominal                      6          9             0.38 (0.04)    0.86 (0.01)
3) chest                          5          10            0.29 (0.03)    0.82 (0.01)
4) airflow                        12         11            0.29 (0.03)    0.79 (0.02)
5) SaO2                           5          12            0.28 (0.05)    0.82 (0.04)
6) EEGs                           22         1-6           0.28 (0.04)    0.82 (0.01)
7) EOG                            7          7             0.22 (0.04)    0.77 (0.02)
8) chin EMG + ECG                 5          8, 13         0.25 (0.03)    0.80 (0.01)
Respiratory-related (1-5)         41         9-12          0.46 (0.04)    0.90 (0.01)
Non-respiratory-related (6-8)     34         1-8, 13       0.33 (0.03)    0.85 (0.01)
Physiology informed (1-8)         75         1-13          0.51 (0.04)    0.92 (0.01)
Physiology agnostic               390        7, 9-13       0.46 (0.04)    0.89 (0.01)
All features                      465        1-13          0.49 (0.04)    0.90 (0.01)

TABLE II: Classification results (mean (SD) over 10-fold cross-validation) of different sets of features on the training dataset
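The aggregate rows of Table II follow from the per-type feature counts. The short Python check below makes the arithmetic explicit; the 390 scattering-transform features are added to obtain the full set.

```python
# Feature counts per type, copied from Table II (dicts keep insertion order).
counts = {
    "cross-channel": 13, "abdominal": 6, "chest": 5, "airflow": 12, "SaO2": 5,
    "EEGs": 22, "EOG": 7, "chin EMG + ECG": 5,
}
values = list(counts.values())
respiratory = sum(values[:5])                      # feature types 1-5
non_respiratory = sum(values[5:])                  # feature types 6-8
physiology_informed = respiratory + non_respiratory
all_features = physiology_informed + 390           # + scattering-transform features
print(respiratory, non_respiratory, physiology_informed, all_features)  # 41 34 75 465
```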

The performance of the physiology informed features may raise a question about our submitted method: why was the method with the inferior result (0.49 vs. 0.51 AUPRC; see Table I) submitted for evaluation on the test dataset? In an intermediate stage of this work, we compared methods using a holdout validation strategy to speed up the experiments, and under that strategy the proposed method with 465 features achieved the best results. Only when we evaluated the models using 10-fold cross-validation did we notice that the 75 physiology informed features outperform our proposed method by 0.02 AUPRC. Since we had only one submission for the proposed algorithm, we could not evaluate the performance of the physiology informed features on the test dataset.

Table II shows the detailed performance of different types of features for sleep arousal detection using 10-fold cross-validation on the training dataset. For simplicity, we use the same BiLSTM network architecture for all experiments, which limits the scope of the comparison. A more reliable comparison would require optimizing the network architecture and parameters for each feature set. The only parameter altered across feature sets is the learning rate of the Adam optimization algorithm.

Finally, two technical details remain. First, the time resolution for analysis of the PSG data is 5 seconds. In other words, our classification algorithm generates only one label (probability of arousal) for each 5-second window. To obtain the sample-wise probability of arousal required by PhysioNet, we repeat this value 1000 (= 5 s × 200 Hz) times. Second, for the results in Tables I and II, whenever the optimization algorithm for a fold gets stuck at a local minimum or, much more probably, at a saddle point [68, 69], we rerun the training phase with a different network initialization.
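The conversion from one probability per 5-s window to sample-wise probabilities can be written with `numpy.repeat`. The text does not describe how a final partial window is handled. Padding with the last window's value is an assumption of this sketch, and the function name is hypothetical.

```python
import numpy as np

FS = 200                              # PhysioNet 2018 sampling rate (Hz)
WINDOW_S = 5                          # analysis window length (s)
SAMPLES_PER_WINDOW = FS * WINDOW_S    # = 1000

def windows_to_samples(window_probs, n_samples):
    """Repeat each window probability 1000 times and fit it to the record length.

    A tail shorter than one window is filled with the last window's value
    (an assumption; the original handling is not described).
    """
    per_sample = np.repeat(window_probs, SAMPLES_PER_WINDOW)
    if len(per_sample) < n_samples:
        pad = np.full(n_samples - len(per_sample), window_probs[-1])
        per_sample = np.concatenate([per_sample, pad])
    return per_sample[:n_samples]

print(windows_to_samples(np.array([0.1, 0.8]), 2500).shape)  # (2500,)
```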

VI Discussion

In this study, we investigate a comprehensive set of hand-engineered features for retrospective analysis of PSG data using a BiLSTM classifier for non-apneic/non-hypopneic arousal detection. We extract multi-domain features from different modalities. Through multiple steps of feature selection and expert judgment, irrelevant and/or redundant features are eliminated to obtain a set of 75 physiology informed features. The final set of 465 features is built from these 75 features and an additional set of 390 features derived using a state-of-the-art scattering transform. The features are then fed into a BiLSTM network to classify the PSG data. Our proposed method ties for the second-best score of 0.50 AUPRC on the hidden test dataset of the 2018 PhysioNet challenge. In this section, we scrutinize the results and discuss ways to improve them further.

VI-A Comparative Evaluations of Selected Features

The best single feature type in Table II is the cross-channel type, which achieves average scores of 0.42 AUPRC and 0.88 AUROC. To the best of our knowledge, this is the first time that p-values and SVD-based features have been proposed for the analysis of respiratory effort signals (chest and abdominal) alongside respiratory airflow. The next best single feature type is extracted from the abdominal signal alone, with an average score of 0.38 AUPRC. The features extracted from the chest, airflow, SaO2, and EEG signals achieve about 0.28-0.29 AUPRC. EOG and chin EMG features each achieve 0.22 AUPRC. Since only a few features are extracted from chin EMG and ECG, we combine them, which yields 0.25 AUPRC.

The respiratory-related features, obtained by combining feature types 1-5, achieve a high AUPRC of 0.46 for arousal detection. This is not surprising: most of the arousals are RERAs, and respiratory-related biosignals such as airflow, chest, and abdominal signals play a pivotal role in detecting them. The interesting observation, however, is that the SaO2 features perform as well as the EEG features, although the degree of oxygen desaturation is not a requirement for RERA detection [70]. Adding the non-respiratory-related features (feature types 6-8, 0.34 AUPRC) to the respiratory-related features yields the physiology informed features, with average scores of 0.51 AUPRC and 0.92 AUROC. This is the best result among all combinations of features.

Regarding the selected EEG-based features, the AASM guidelines define arousals as abrupt shifts of EEG frequency toward rhythms such as alpha, theta, and/or beta above 16 Hz [42]. Nevertheless, our experiments show that only delta (0.1-4 Hz) power is chosen as a discriminative feature for arousal detection (see Section III-B2). This observation is intriguing and was not expected a priori. However, some studies support the hypothesis of a continuum of arousal activities that starts with delta and K-complex bursts and proceeds toward EEG arousals and full awakening [71, 72]. More specifically, an increase in delta power can be a pre-arousal activity that may or may not culminate in an EEG arousal [71]. Furthermore, [73] and [74] confirm the association between arousals and K-complexes or delta bursts preceding the events; this holds for arousals in NREM sleep but not in REM sleep. Besides, in both upper airway resistance syndrome (UARS) and obstructive sleep apnea syndrome (OSAS), airway opening is associated with an increase in delta power, which can be followed by an EEG arousal [75, 76, 77]. Since RERAs are more frequent in both UARS and OSAS, this may explain why delta power is among the selected features. This is all the more plausible because our BiLSTM classifier can analyze sequences of transient events. We stress, however, that at this point we cannot identify the cause of this observation with certainty, and further investigation is required. Moreover, we do not claim that alpha and beta powers are unimportant; their information may be covered by other features. For example, the RMS and STD of the signal amplitude can partially retain alpha and beta information. Recall that alpha and beta are low-amplitude, high-frequency EEG rhythms.
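The sketch below computes delta (0.1-4 Hz) band power, RMS, and STD for non-overlapping 5-s EEG windows, illustrating the features discussed above. The Hann-windowed periodogram, the function names, and the assumption of an even window length are choices of this sketch. The spectral estimator actually used in Section III-B2 may differ.

```python
import numpy as np

FS = 200  # Hz, sampling rate of the PhysioNet 2018 recordings

def band_power(x, fs=FS, band=(0.1, 4.0)):
    """Power of x in a frequency band from a one-sided Hann-windowed periodogram."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    w = np.hanning(len(x))
    psd = np.abs(np.fft.rfft(x * w)) ** 2 / (fs * np.sum(w ** 2))
    psd[1:-1] *= 2            # fold negative frequencies (assumes an even length)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    mask = (freqs >= band[0]) & (freqs <= band[1])
    return float(np.sum(psd[mask]) * (freqs[1] - freqs[0]))

def window_features(eeg, fs=FS, window_s=5):
    """Delta power, RMS, and STD for each non-overlapping 5-s window."""
    n = fs * window_s
    feats = []
    for start in range(0, len(eeg) - n + 1, n):
        seg = eeg[start:start + n]
        feats.append([band_power(seg, fs), np.sqrt(np.mean(seg ** 2)), np.std(seg)])
    return np.array(feats)

t = np.arange(2 * 5 * FS) / FS                 # 10 s of synthetic signal
eeg = 10 * np.sin(2 * np.pi * 2.0 * t)         # 2-Hz (delta-band) oscillation
print(window_features(eeg).round(2))
```

For a pure sinusoid of amplitude 10 inside the delta band, the band power is close to 10²/2 = 50, while the same oscillation at 20 Hz contributes almost nothing to it.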

Another interesting observation concerns channel location. The AASM guidelines recommend central and occipital EEG channels as primary signals for detecting EEG arousals, yet most of the EEG-based features selected in our experiments come from frontal channels. This may be partially explained by the fact that delta activity and K-complexes predominate in the frontal lobe of the brain [78, 79, 80]. In addition, the authors of [81] show that for in-home sleep stage scoring, analyzing either frontal or central EEG channels leads to similar results when the recordings are scored by the automatic Michele Sleep Scoring (MSS) system [82, 83].

Table II also shows that the performance of the physiology agnostic features based on the scattering transform is lower than that of the physiology informed features by 0.05 AUPRC and 0.03 AUROC. There are several possible causes. First, to decrease the computational complexity of the scattering transform, only 6 PSG time series are used; the remaining 7 time series, namely 6 EEGs and 1 chin EMG, are not used. Second, to expedite the analysis and decrease memory usage, the extracted features are downsampled by a factor of 4. Third, we believe that treating the scattering transform as a physiology agnostic method restricts the performance of this feature set, because no prior physiological information is included in designing the filter banks. We use the same filter banks for all biosignals (see Fig. 1), even though, as shown in Section III-B, different clinical time series carry important information in different frequency bands. Fourth, as discussed in Section III-C, the invariants and stability conditions for different biosignals need to be explored to achieve the optimal performance of the scattering transform.

Our initial motivation for using the scattering transform was to derive a set of informative features semi-automatically, with minimal expert intervention. A more efficient approach, however, would include at least minimal physiological information when designing the filter banks for different biosignals. We underline again that, even with this suboptimal use of the scattering transform, the result based solely on it is still among the top 5 algorithms on the training dataset. Future developments may include designing optimal filter banks for each biosignal separately.

VI-B Adaptation for Real-Time Classification

Although the proposed algorithm is designed for retrospective analysis of PSG data, with minor adaptations it can be used for real-time classification, possibly with a delay of only 5 seconds. Notably, all top-ranked algorithms for sleep arousal detection in the 2018 PhysioNet challenge [84, 85, 86, 87] use longer analysis windows. The 5-second analysis window used by our algorithm therefore makes it a potential candidate for real-time classification if needed. For real-time classification, the artifact removal filter must be designed using only information from the current and/or previous time windows. More precisely, all future information used by the present version of our algorithm must be removed; in the current setup, the artifact removal threshold for each biosignal is calculated from the entire time series (see Section III-A). After this change, we only need to replace the BiLSTM layers with LSTM layers in the classifier (see Fig. 5), and the algorithm can then classify PSG data in real time. However, its performance degrades drastically, by 0.13 AUPRC and 0.05 AUROC, to 0.36 and 0.85, respectively. This is expected, since the LSTM network does not use future information of the time series. The degradation may also be due to the network architecture and parameters not being optimized for the new setup.
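The real-time adaptation relies on the fact that a unidirectional LSTM can be updated one window at a time. The hypothetical `StreamingLSTM` class below keeps its hidden and cell states between calls, so the output for each 5-s window depends only on the current and past windows. The gate ordering and parameter names are assumptions, and the causal artifact removal step is not modeled.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class StreamingLSTM:
    """Causal LSTM that consumes one 5-s feature vector per call (hypothetical)."""

    def __init__(self, W, R, b):
        self.W, self.R, self.b = W, R, b      # (4H x D), (4H x H), (4H,)
        H = R.shape[1]
        self.h = np.zeros(H)                  # hidden state carried across windows
        self.c = np.zeros(H)                  # cell state carried across windows

    def step(self, x_t):
        """Update the state with the newest window and return its hidden output."""
        H = self.h.shape[0]
        z = self.W @ x_t + self.R @ self.h + self.b
        i, f = sigmoid(z[:H]), sigmoid(z[H:2 * H])          # input, forget gates
        g, o = np.tanh(z[2 * H:3 * H]), sigmoid(z[3 * H:])  # candidate, output gate
        self.c = f * self.c + i * g
        self.h = o * np.tanh(self.c)
        return self.h

rng = np.random.default_rng(0)
D, H = 465, 200
model = StreamingLSTM(0.1 * rng.standard_normal((4 * H, D)),
                      0.1 * rng.standard_normal((4 * H, H)),
                      np.zeros(4 * H))
for window in rng.standard_normal((3, D)):   # one feature vector every 5 s
    h_t = model.step(window)                 # available at once; no future input needed
print(h_t.shape)  # (200,)
```

Because the state only flows forward, classifying a window never waits for later windows. This is exactly the property that the bidirectional layers give up in exchange for access to future context.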

VI-C Study Limitations

The main limitation of this study is the annotation process of the PhysioNet dataset. Seven sleep technologists annotated the dataset; however, given the burden of manual annotation, each PSG recording was annotated by only one sleep technologist [16]. This calls into question both the consistency and the reliability of the annotations. Although the performance of our proposed method indicates that the algorithm can replicate the experts' annotations accurately, its medical significance is limited by the labeling process. High inter- and intra-rater agreement between sleep technologists would lead to a more reliable annotated dataset and, consequently, to greater clinical significance. Moreover, the limited number of submissions and the run-time constraint imposed by the PhysioNet challenge rules are further limitations of this study.

VII Conclusion

We have designed and implemented an automated PSG-based classification algorithm to detect non-apneic and non-hypopneic arousals. We have demonstrated and validated its performance using the 2018 PhysioNet challenge dataset, which is the newest and largest publicly available PSG dataset. Our proposed algorithm has tied for the second-best score during the follow-up and official phases of the 2018 PhysioNet challenge, by achieving the state-of-the-art performance of 0.50 in terms of AUPRC.

In this study, we have also paid special attention to extracting features based on the physiological processes during RERAs, which is missing from typical end-to-end deep learning algorithms. We have investigated and evaluated the importance of different types of features for automatic arousal detection. Some findings regarding the selected features were not expected a priori; they may contribute to a better understanding of RERAs and help in developing new automated algorithms. Besides, we have developed an alternative semi-automatic PSG-based feature extraction method using the scattering transform and discussed possible directions for improving its performance.

VIII Acknowledgment

The authors would like to thank Leo Kärkkäinen for his insightful and valuable comments on this work.


  • [1] A. Y. Avidan and P. C. Zee, Handbook of sleep medicine.   Lippincott Williams & Wilkins, 2011.
  • [2] M. Thorpy, “Current classification of sleep disorders,” in Synopsis of Sleep Medicine, S. R. Pandi-Perumal, Ed.   Apple Academic Press, 2016, pp. 83–98.
  • [3] M. Bonnet et al., “EEG arousals: scoring rules and examples. A preliminary report from sleep disorders atlas task force of the American Sleep Disorders Association,” Sleep, vol. 15, no. 2, pp. 173–184, 1992.
  • [4] D. Kirsch, Sleep medicine in neurology.   John Wiley & Sons, 2013.
  • [5] I. Trosman and A. Ivanenko, “Epidemiology of sleep disorders,” in Synopsis of Sleep Medicine, S. R. Pandi-Perumal, Ed.   Apple Academic Press, 2016, pp. 65–82.
  • [6] R. E. Salas et al., “Sleep-related movement disorders and their unique motor manifestations,” in Principles and Practice of Sleep Medicine.   Elsevier, 2017, pp. 1020–1029.
  • [7] H. Sun et al., “Large-scale automated sleep staging,” Sleep, vol. 40, no. 10, 2017.
  • [8] S. Biswal et al., “Expert-level sleep scoring with deep neural networks,” Journal of the American Medical Informatics Association, vol. 25, no. 12, pp. 1643–1650, 2018.
  • [9] J. B. Stephansen et al., “Neural network analysis of sleep stages enables efficient diagnosis of narcolepsy,” Nature Communications, vol. 9, no. 1, 2018.
  • [10] H. Phan et al., “SeqSleepNet: End-to-end hierarchical recurrent neural network for sequence-to-sequence automatic sleep staging,” IEEE Transactions on Neural Systems and Rehabilitation Engineering, vol. 27, no. 3, pp. 400–410, March 2019.
  • [11] H. Phan et al., “Joint classification and prediction CNN framework for automatic sleep stage classification,” IEEE Transactions on Biomedical Engineering, vol. 66, no. 5, pp. 1285–1296, May 2019.
  • [12] N. Cooray et al., “Detection of REM sleep behaviour disorder by automated polysomnography analysis,” Clinical Neurophysiology, vol. 130, no. 4, pp. 505–514, 2019.
  • [13] A. Malafeev et al., “Automatic human sleep stage scoring using deep neural networks,” Frontiers in neuroscience, vol. 12, 2018.
  • [14] T. Penzel et al., “Digital analysis and technical specifications,” Journal of clinical sleep medicine, vol. 3, no. 02, pp. 109–120, 2007.
  • [15] M. M. Ghassemi et al. (2018) You snooze, you win: the PhysioNet/Computing in Cardiology Challenge 2018. [Online]. Available:
  • [16] M. M. Ghassemi et al., “You snooze, you win: the physionet/computing in cardiology challenge 2018,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [17] R. S. Rosenberg and S. Van Hout, “The American Academy of Sleep Medicine inter-scorer reliability program: respiratory events,” Journal of clinical sleep medicine, vol. 10, no. 04, pp. 447–454, 2014.
  • [18] M. D. L. Santos and M. Hirshkowitz, “Scoring of sleep stages, breathing, and arousals,” in Oxford Textbook of Sleep Disorders, S. Chokroverty and L. Ferini-Strambi, Eds.   Oxford University Press, 2017.
  • [19] R. B. Berry et al., “The AASM manual for the scoring of sleep and associated events: Rules, terminology and technical specifications,” American Academy of Sleep Medicine, 2012.
  • [20] C. Cracowski et al., “Characterization of obstructive nonapneic respiratory events in moderate sleep apnea syndrome,” American journal of respiratory and critical care medicine, vol. 164, no. 6, pp. 944–948, 2001.
  • [21] C. Guilleminault et al., “A cause of excessive daytime sleepiness: the upper airway resistance syndrome,” Chest, vol. 104, no. 3, pp. 781–787, 1993.
  • [22] C. Guilleminault et al., “Upper airway resistance syndrome, nocturnal blood pressure monitoring, and borderline hypertension,” Chest, vol. 109, no. 4, pp. 901–908, 1996.
  • [23] J. F. Masa et al., “Habitually sleepy drivers have a high frequency of automobile crashes associated with respiratory disorders during sleep,” American Journal of respiratory and critical care medicine, vol. 162, no. 4, pp. 1407–1412, 2000.
  • [24] M. Zabihi et al., “Automatic sleep arousal detection using state distance analysis in phase space,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [25] Y. LeCun et al., “Deep learning,” Nature, vol. 521, no. 7553, p. 436, 2015.
  • [26] N. Hatami et al., “Classification of time-series images using deep convolutional neural networks,” in Tenth International Conference on Machine Vision (ICMV 2017), vol. 10696, 2018.
  • [27] S. Mallat, “Group invariant scattering,” Communications on Pure and Applied Mathematics, vol. 65, no. 10, pp. 1331–1398, 2012.
  • [28] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.
  • [29] M. Zabihi et al., “Detection of atrial fibrillation in ECG hand-held devices using a random forest classifier,” in 2017 Computing in Cardiology (CinC).   IEEE, 2017.
  • [30] I. Isasi et al., “ECG rhythm analysis during manual chest compressions using an artefact removal filter and random forest classifiers,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [31] F. Takens, “Detecting strange attractors in turbulence,” in Dynamical systems and turbulence, Warwick 1980.   Springer, 1981, pp. 366–381.
  • [32] A. B. Rad et al., “ECG-based classification of resuscitation cardiac rhythms for retrospective data analysis,” IEEE Transactions on Biomedical Engineering, vol. 64, no. 10, pp. 2411–2418, 2017.
  • [33] M. J. Thorpy and G. Plazzi, The parasomnias and other sleep-related movement disorders.   Cambridge University Press, 2010.
  • [34] J. F. Pagel and S. R. Pandi-Perumal, Primary Care Sleep Medicine: A Practical Guide.   Springer, 2014.
  • [35] J. Masa et al., “Assessment of thoracoabdominal bands to detect respiratory effort-related arousal,” European Respiratory Journal, vol. 22, no. 4, pp. 661–667, 2003.
  • [36] M. Goldman et al., “Asynchronous thoracoabdominal movements in chronic airflow obstruction (CAO),” in Modeling and Control of Ventilation.   Springer, 1995, pp. 95–100.
  • [37] L. E. Krahn et al., Atlas of Sleep Medicine.   CRC Press, 2010.
  • [38] O. Christensen, An introduction to frames and Riesz bases, 2nd ed.   Springer, 2016.
  • [39] J. P. Burg, “A new analysis technique for time series data,” in NATO Advanced Study Institute of Signal Processing with emphasis on Underwater Acoustics.   New York: IEEE Press, 1968.
  • [40] H. Akaike, “A new look at the statistical model identification,” IEEE Transactions on Automatic Control, vol. 19, no. 6, pp. 716–723, 1974.
  • [41] G. Schwarz, “Estimating the dimension of a model,” The Annals of Statistics, vol. 6, no. 2, pp. 461–464, 1978.
  • [42] R. B. Berry et al., The AASM manual for the scoring of sleep and associated events: rules, terminology and technical specifications.   American Academy of Sleep Medicine, 2018.
  • [43] Z. Zaiwalla and R. Killick, “Polysomnography and other investigations for sleep disorders,” in Oxford Textbook of Clinical Neurophysiology.   Oxford Univ. Press, 2017, vol. 187.
  • [44] S. Mallat, “Understanding deep convolutional networks,” Philosophical Transactions of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 374, no. 2065, p. 20150203, 2016.
  • [45] D. E. Rumelhart et al., “Learning representations by back-propagating errors,” Nature, vol. 323, no. 9, 1986.
  • [46] S. Mallat, A Wavelet Tour of Signal Processing, Third Edition: The Sparse Way, 3rd ed.   Orlando, FL, USA: Academic Press, Inc., 2008.
  • [47] J. Andén and S. Mallat, “Multiscale scattering for audio classification.” in ISMIR.   Miami, FL, 2011, pp. 657–662.
  • [48] S. Mallat and I. Waldspurger, “Phase retrieval for the Cauchy wavelet transform,” Journal of Fourier Analysis and Applications, vol. 21, no. 6, pp. 1251–1309, 2015.
  • [49] I. Waldspurger, “Exponential decay of scattering coefficients,” in 2017 International Conference on Sampling Theory and Applications (SampTA).   IEEE, 2017, pp. 143–146.
  • [50] J. Andén and S. Mallat, “Deep scattering spectrum,” IEEE Transactions on Signal Processing, vol. 62, no. 16, pp. 4114–4128, 2014.
  • [51] J. Bruna and S. Mallat, “Invariant scattering convolution networks,” IEEE transactions on pattern analysis and machine intelligence, vol. 35, no. 8, pp. 1872–1886, 2013.
  • [52] L. Sifre and S. Mallat, “Rotation, scaling and deformation invariant scattering for texture discrimination,” in Proceedings of the IEEE conference on computer vision and pattern recognition, 2013, pp. 1233–1240.
  • [53] J. Andén et al., “Classification with joint time-frequency scattering,” arXiv preprint arXiv:1807.08869, 2018.
  • [54] F. A. Gers et al., “Learning to forget: Continual prediction with LSTM,” Neural Computation, vol. 12, no. 10, pp. 2451–2471, 2000.
  • [55] T. G. Dietterich, “Machine learning for sequential data: A review,” in Joint IAPR international workshops on statistical techniques in pattern recognition (SPR) and structural and syntactic pattern recognition (SSPR).   Springer, 2002, pp. 15–30.
  • [56] L. Breiman, “Random forests,” Machine learning, vol. 45, no. 1, pp. 5–32, 2001.
  • [57] I. Goodfellow et al., Deep Learning.   MIT Press, 2016.
  • [58] Y. Bengio et al., “Learning long-term dependencies with gradient descent is difficult,” IEEE transactions on neural networks, vol. 5, no. 2, pp. 157–166, 1994.
  • [59] D.-A. Clevert et al., “Fast and accurate deep network learning by exponential linear units (ELUs),” in Proceedings of the International Conference on Learning Representations (ICLR), vol. 6, 2016.
  • [60] K. He et al., “Delving deep into rectifiers: Surpassing human-level performance on imagenet classification,” in Proceedings of the IEEE international conference on computer vision, 2015, pp. 1026–1034.
  • [61] B. Xu et al., “Empirical evaluation of rectified activations in convolutional network,” arXiv preprint arXiv:1505.00853, 2015.
  • [62] G. E. Hinton et al., “Improving neural networks by preventing co-adaptation of feature detectors,” arXiv preprint arXiv:1207.0580, 2012.
  • [63] N. Srivastava et al., “Dropout: a simple way to prevent neural networks from overfitting,” The Journal of Machine Learning Research, vol. 15, no. 1, pp. 1929–1958, 2014.
  • [64] Y. Gal and Z. Ghahramani, “A theoretically grounded application of dropout in recurrent neural networks,” in Advances in neural information processing systems, 2016, pp. 1019–1027.
  • [65] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” in Proceedings of the International Conference on Learning Representations, 2015.
  • [66] R. Pascanu et al., “On the difficulty of training recurrent neural networks,” in International Conference on Machine Learning, 2013, pp. 1310–1318.
  • [67] M. Howe-Patterson et al., “Automated detection of sleep arousals from polysomnography data using a dense convolutional neural network,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [68] Y. N. Dauphin et al., “Identifying and attacking the saddle point problem in high-dimensional non-convex optimization,” in Advances in Neural Information Processing Systems, 2014, pp. 2933–2941.
  • [69] A. Choromanska et al., “The loss surfaces of multilayer networks,” in Artificial Intelligence and Statistics, 2015, pp. 192–204.
  • [70] M. A. C. Bornemann et al., “Upper airway resistance syndrome,” in Geriatric Otolaryngology.   CRC Press, 2006, pp. 437–448.
  • [71] P. Halász et al., “The nature of arousal in sleep,” Journal of Sleep Research, vol. 13, no. 1, pp. 1–23, 2004.
  • [72] E. Sforza et al., “Cardiac activation during arousal in humans: further evidence for hierarchy in the arousal response,” Clinical Neurophysiology, vol. 111, no. 9, pp. 1611–1619, 2000.
  • [73] F. De Carli et al., “Quantitative analysis of sleep EEG microstructure in the time–frequency domain,” Brain Research Bulletin, vol. 63, no. 5, pp. 399–405, 2004.
  • [74] M. G. Terzano et al., “CAP and arousals in the structural development of sleep: an integrative perspective,” Sleep medicine, vol. 3, no. 3, pp. 221–229, 2002.
  • [75] J. E. Black et al., “Upper airway resistance syndrome: central electroencephalographic power and changes in breathing effort,” American Journal of Respiratory and Critical Care Medicine, vol. 162, no. 2, pp. 406–411, 2000.
  • [76] D. Poyares et al., “Arousal, EEG spectral power and pulse transit time in UARS and mild OSAS subjects,” Clinical Neurophysiology, vol. 113, no. 10, pp. 1598–1606, 2002.
  • [77] R. B. Berry et al., “Within-night variation in respiratory effort preceding apnea termination and EEG delta power in sleep apnea,” Journal of Applied Physiology, vol. 85, no. 4, pp. 1434–1441, 1998.
  • [78] L. McCormick et al., “Topographical distribution of spindles and K-complexes in normal subjects,” Sleep, vol. 20, no. 11, pp. 939–941, 1997.
  • [79] E. A. Accolla et al., “Clinical correlates of frontal intermittent rhythmic delta activity (FIRDA),” Clinical Neurophysiology, vol. 122, no. 1, pp. 27–31, 2011.
  • [80] K. Maurer and T. Dierks, Atlas of brain mapping: topographic mapping of EEG and evoked potentials.   Springer Science & Business Media, 2012.
  • [81] M. Younes et al., “Accuracy of automatic polysomnography scoring using frontal electrodes,” Journal of Clinical Sleep Medicine, vol. 12, no. 05, pp. 735–746, 2016.
  • [82] A. Malhotra et al., “Performance of an automated polysomnography scoring system versus computer-assisted manual scoring,” Sleep, vol. 36, no. 4, pp. 573–582, 2013.
  • [83] M. Younes et al., “Utility of technologist editing of polysomnography scoring performed by a validated automatic system,” Annals of the American Thoracic Society, vol. 12, no. 8, pp. 1206–1218, 2015.
  • [84] H. M. Þráinsson et al., “Automatic detection of target regions of respiratory effort-related arousals using recurrent neural networks,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [85] R. He et al., “Identification of arousals with deep neural networks (DNNs) using different physiological signals,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [86] B. Varga et al., “Using auxiliary loss to improve sleep arousal detection with neural network,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.
  • [87] A. Patane et al., “Automated recognition of sleep arousal using multimodal and personalized deep ensembles of neural networks,” in 2018 Computing in Cardiology (CinC).   IEEE, 2018.