1 Introduction and Background
When we sleep, our muscles relax. In patients with Obstructive Sleep Apnea (OSA), the muscles in the back of the throat can relax too much and collapse the airway, leading to breathing difficulty. OSA presents with abnormal oxygenation, ventilation, and sleep patterns. The prevalence of OSA in children has been reported to be between 1% and 5% Dehlink and Tan (2016). Children at risk need timely investigation and treatment.
The gold standard for diagnosing sleep disorders is polysomnography (PSG), which generates extensive data about biophysical changes during sleep. PSG studies assist doctors in diagnosing sleep disorders and provide the baseline for an appropriate follow-up. A clinical sleep study based on PSG acquires several biological signals while patients are sleeping. These signals typically include electroencephalography (EEG) for monitoring brain activity, electromyography (EMG) for measuring muscle activity, and electrocardiography (ECG) for recording the electrical activity of the heart over a period of sleep Moridani et al. (2019).
In recent decades, various alternative methods have been proposed to minimize the number of biosignals required to detect and classify OSA. These studies include traditional machine learning methods such as Support Vector Machines and linear discriminant analysis applied to ECG Almuhammadi et al. (2015) and respiratory signals Varon et al. (2015), as well as a combination of extracted features and a shallow neural network applied to heart rate variability and the ECG-derived respiration signal Tripathy (2018). These studies focused on extracting time-domain, frequency-domain, and other nonlinear features from physiological signals and applying feature selection techniques to reduce the number of dimensions comprising the feature space. However, this process can be labour-intensive, requires domain knowledge, and is particularly limited and costly for high-dimensional data. In addition, feature extraction becomes difficult for traditional machine learning techniques as the number of features increases dramatically.
Deep learning frameworks have proved their modeling ability on different PSG channels. McCloskey et al. employed a 2D-CNN model on spectrograms of the nasal airflow signal, and their model achieved an average accuracy of 77.6% on three severity levels McCloskey et al. (2018). Another notable application of deep learning came from the work of Cheng et al., in which the researchers used a four-layered Long Short-Term Memory (LSTM) model on the RR-ECG signal and achieved an average accuracy of 97.80% on the detection of OSA Cheng et al. (2017).
Although recurrent models (e.g., RNN, LSTM) can process time-series data and make sequential predictions, a CNN can be trained to recognize the same patterns (severity levels) across different subfields within fixed time windows. A CNN saves time otherwise spent on manual scoring in the laboratory environment and makes the pre-screening stage easier in contrast to traditional methods. Moreover, to increase the model's generalization ability, we explored 1D-CNN models with different segmentation lengths on the EEG, ECG, EMG, and respiratory channels. We focused on the model structure and utilized the fine-tuned model for pediatric OSA prediction in our study.
The rest of this paper is organized as follows. Section 2 explains the data processing in detail. Section 3 presents the structure of the proposed 1D-CNN model. Evaluation and experimental results are presented in Section 4. Finally, Section 5 provides the discussion and conclusion of the research.
2 Cleveland Children’s Sleep and Health Study Database
The data are retrieved from the National Sleep Research Resource (NSRR), a National Heart, Lung, and Blood Institute resource designed to provide big-data resources to the sleep research community. The PSG data are available from the Cleveland Children’s Sleep and Health Study (CCSHS) database. Each anonymized record includes the summary result of a 12-hour overnight sleep study (awake and sleep stages), together with annotation files containing scored events and PSG signals, formatted in the European Data Format (EDF).
The following channels are selected for the 1D CNN Modeling: 4 EEG channels (C3/C4 and A1/A2), 3 EMG channels (EMG1, EMG2, EMG3), 2 ECG channels (ECG1 and ECG2), and 3 respiratory channels including airflow, thoracic and abdominal breathing.
2.1 Individual Labeling
To define the target variable for this classification problem, each participant needs one label based on the OSA severity level. The Obstructive Apnea Hypopnea Index (oahi3) is used to indicate the severity of sleep apnea. It is defined as the number of apnea and hypopnea events per hour of sleep, and it combines the AHI with oxygen desaturation (low oxygen level in the blood) to give an overall severity score that reflects both the number of sleep disruptions and the degree of oxygen desaturation. The values of oahi3 are used as thresholds for grouping the participants. The number of participants at each severity level is shown in Table 1.
|Obstructive Apnea Hypopnea Index||Level of Severity||Number of Participants|
|0 ≤ oahi3 < 1||NL (Normal)||362|
|1 ≤ oahi3 < 5||MIN (Minor)||139|
|5 ≤ oahi3 < 10||MOD (Moderate)||8|
|oahi3 ≥ 10||SV (Severe)||8|
The dataset has an imbalanced response variable (362 normal / 139 minor / 8 moderate / 8 severe). The minority classes (moderate and severe) are of the most interest, and we want the classifier to learn as much as possible from the moderate- and severe-level data. An under-sampling method was therefore applied during the data pre-processing stage, i.e., we randomly selected an equal number of participants (8) from each of the normal and minor groups. Overall, there are 32 participants in the final study data set. In this project, we conduct data pre-processing and CNN modeling on the data in EDF format, which have a total size of 13 GB.
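As an illustration, the Table 1 thresholds and the under-sampling step can be sketched as follows; the function and variable names are our own and not taken from the study's code:

```python
import random

# Map an oahi3 value to a severity label using the Table 1 thresholds.
def severity(oahi3):
    if oahi3 < 1:
        return "NL"
    elif oahi3 < 5:
        return "MIN"
    elif oahi3 < 10:
        return "MOD"
    return "SV"

# Under-sample the majority classes so that every class keeps as many
# participants as the smallest class (8 in this study).
def undersample(participants, seed=0):
    """participants: dict mapping participant id -> oahi3 value."""
    groups = {}
    for pid, oahi3 in participants.items():
        groups.setdefault(severity(oahi3), []).append(pid)
    n = min(len(ids) for ids in groups.values())
    rng = random.Random(seed)
    return {label: sorted(rng.sample(ids, n)) for label, ids in groups.items()}
```

With the CCSHS counts, `undersample` would keep 8 participants per class, giving the 32-participant study set.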
2.2 Data Preprocessing
This experiment focuses on the sleep data. The awake signals at the beginning and end of each recording can be treated as noise and need to be removed. Secondly, deep learning algorithms tend to be difficult to train when the time series is very long. Figure 1 presents a segmentation strategy, i.e., dividing the time series into smaller chunks.
Each segment was labeled with the same severity level as the participant; in other words, the segments inherit the severity label from the participant they belong to. Starting from a channel of length L time steps, the channel is divided into blocks of sequence length Seq_L, yielding about L / Seq_L new events (or rows, N) of shorter length.
The PSG data were segmented into 1-minute-long events. For the ECG channel (sampling frequency of 256 Hz), a 1-minute event has a length of 15360 (256 × 60) data points. An individual has an 8.24-hour ECG channel, which corresponds to a 1D time series of length 7595520. After segmentation, the long series turns into a tensor of dimension 494 × 15360, i.e., 494 events of length 15360 each. Since we have 32 selected participants and 2 ECG channels per participant, the input tensor has the dimension 15824 (N) × 15360 (Seq_L) × 2 (channels).
With the data segmentation, each time series is shorter, which helps model training, and the number of data points (instances or rows) has increased by a factor of L / Seq_L, providing a larger data set to train on.
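The segmentation above can be sketched with a simple reshape; this is an illustrative implementation, not the study's code, and it assumes trailing samples that do not fill a whole segment are dropped:

```python
import numpy as np

FS = 256          # ECG sampling frequency (Hz)
SEQ_L = FS * 60   # one-minute segments: 15360 samples

def segment(signal, seq_l=SEQ_L):
    """Split a 1D channel into non-overlapping segments of length seq_l.

    Trailing samples that do not fill a whole segment are dropped.
    Returns an array of shape (n_segments, seq_l).
    """
    n = len(signal) // seq_l
    return np.asarray(signal[: n * seq_l]).reshape(n, seq_l)
```

For example, a 10-minute ECG recording (153600 samples) yields a 10 × 15360 array, matching the 494 × 15360 tensor obtained from an 8.24-hour recording.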
Since different channels (e.g., ECG, EMG) were measured on different amplitude scales, the last step of data processing is to normalize the PSG data to zero mean and unit standard deviation.
3 1D-CNN Architecture
The convolutional layer and max-pooling layer play the key roles in the CNN’s feature-extraction mechanism. The output of the convolutional layer at the l-th layer can be calculated as in Formula 1:

y_k = Σ_c w_{k,c} · x_c + b_k    (1)

where k represents the filter number, c denotes the channel number of the input x, w_{k,c} is the convolutional filter applied to the c-th channel, b_k is the bias of the k-th filter, and · is the dot product operation.
The max-pooling layer is a sub-sampling function that selects the maximum value within a fixed-size filter. After the convolution-pooling blocks comes one fully connected layer whose neurons have full connections to all activations in the previous layer, as in regular neural networks. At the end of the convolutional layers, the data are flattened and passed to a dropout layer before the softmax classifier.
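To make Formula 1 and the pooling step concrete, here is a plain NumPy sketch of a single 1D convolution (valid padding, stride 1, as is standard when nothing else is stated) followed by non-overlapping max-pooling:

```python
import numpy as np

def conv1d(x, w, b):
    """Valid 1D convolution as in Formula 1.

    x: input of shape (channels, length)
    w: filters of shape (n_filters, channels, filter_size)
    b: biases of shape (n_filters,)
    Returns outputs of shape (n_filters, length - filter_size + 1).
    """
    n_filters, channels, fsize = w.shape
    out_len = x.shape[1] - fsize + 1
    y = np.empty((n_filters, out_len))
    for k in range(n_filters):
        for t in range(out_len):
            # sum over channels of the dot product with the filter window
            y[k, t] = np.sum(w[k] * x[:, t : t + fsize]) + b[k]
    return y

def maxpool1d(y, size):
    """Non-overlapping max-pooling along the time axis."""
    n, length = y.shape
    length = (length // size) * size  # drop the incomplete tail window
    return y[:, :length].reshape(n, -1, size).max(axis=2)
```

For instance, convolving the one-channel input [1, 2, 3, 4] with the size-2 filter [1, 1] gives [3, 5, 7]; max-pooling that output with size 2 keeps the maximum of each window.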
Figure 2 shows the structure of the 1D-CNN model proposed in this project. It contains 3 convolutional and 3 max-pooling layers. We focused our efforts on building the CNN and began our investigation of the CNN method by performing a grid search over several hyperparameters.
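A Keras-style sketch of this 3-conv / 3-pool architecture is shown below. The filter counts, kernel sizes, strides, dense width, and dropout rate are illustrative placeholders of our own, not the tuned values from Table 2:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_model(seq_l=15360, n_channels=2, n_classes=4):
    """Sketch of the proposed 1D-CNN: 3 conv + 3 max-pool blocks,
    a fully connected layer, dropout, and a softmax classifier.

    All layer sizes below are hypothetical placeholders.
    """
    model = models.Sequential([
        layers.Input(shape=(seq_l, n_channels)),
        layers.Conv1D(16, 32, strides=4, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(32, 16, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Conv1D(64, 8, activation="relu"),
        layers.MaxPooling1D(4),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dropout(0.5),  # assumed rate; the paper's value is in Table 2
        layers.Dense(n_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

The compile step mirrors the training setup described later: cross-entropy loss minimized with the Adam update rule at a learning rate of 0.0001.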
Each participant’s PSG data served as either training or test data, never both. We implemented a two-level stratified random sampling with two splitting steps among the 32 participants: first, 8 were randomly selected as test participants (i.e., 2 participants randomly selected from each severity level); second, the remaining 24 participants were split into a training set of 18 participants and a validation set of 6 participants. The TensorFlow graph was fed with batches of the training data, and the hyperparameters were tuned on the validation set. Finally, the trained model was evaluated on the test set.
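The two-level split can be sketched as follows; this is our own illustration, and it assumes the 6 validation participants are drawn from the pooled remainder, since the paper does not stratify the second split by level:

```python
import random

def stratified_split(by_level, n_test_per_level=2, n_val=6, seed=0):
    """Participant-level split: per-level test draw, then pooled
    validation/training split.

    by_level: dict mapping severity level -> list of participant ids
              (8 per level in this study).
    Returns (train, val, test) lists of participant ids.
    """
    rng = random.Random(seed)
    test, rest = [], []
    for level, ids in sorted(by_level.items()):
        ids = ids[:]
        rng.shuffle(ids)
        test += ids[:n_test_per_level]   # 2 per level -> 8 test
        rest += ids[n_test_per_level:]
    rng.shuffle(rest)
    return rest[n_val:], rest[:n_val], test  # 18 train, 6 val, 8 test
```

Because the split is at the participant level, all segments from one participant end up in exactly one of the three sets, preventing leakage between training and evaluation.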
The CNN model was trained in a fully supervised manner, with the gradients back-propagated from the softmax layer to the convolutional layers. The network parameters were optimized by minimizing the cross-entropy loss via gradient descent with the Adam update rule and a learning rate of 0.0001.
|CNN Layer||# of Filters||Filter Size||Stride||Padding||Activation Function|
Table 2 presents the final parameter values within each layer. The dropout rate was set to the common default for CNN models. Model classification performance is evaluated using the following metrics: classification accuracy, cross-entropy loss, precision, recall, and F1-score. While accuracy and loss evaluate the overall performance, the other metrics measure the performance on specific classes.
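The per-class metrics can be computed directly from the predicted labels; the sketch below is a plain-Python illustration of the standard definitions (equivalent to what scikit-learn's classification_report produces):

```python
def per_class_metrics(y_true, y_pred, classes):
    """Per-class precision, recall, and F1 from predicted labels.

    Returns a dict mapping class -> (precision, recall, f1).
    Classes with no predicted or true instances score 0.0.
    """
    out = {}
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        out[c] = (precision, recall, f1)
    return out
```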
4 Results and Analysis
Figure 3 shows the learning curves for the training and validation phases. Accuracy and loss were recorded over the iterations. The accuracy increases as the number of iterations increases, while the loss decreases, and both reach stable values after iterative learning in both phases.
For ECG, we observe stable accuracy and loss values after 1000 iterations (Training acc: 0.9987, loss: 0.0114; Validation acc: 0.9916, loss: 0.0289). For EEG, the accuracy and loss start to converge after 2500 iterations (Training acc: 0.9718, loss: 0.0945; Validation acc: 0.9447, loss: 0.1985). For EMG, the accuracy and loss become stable after 4000 iterations (Training acc: 0.9999, loss: 0.0013; Validation acc: 0.9707, loss: 0.1131). However, there are large fluctuations before convergence during the learning process. This reflects several sources of randomness: (1) the dropout method keeps only a portion of the neurons (weights) on each iteration, and sometimes those neurons do not fit the current batch well, which can cause large fluctuations; (2) there is randomness in the initialization and in the data sampling for SGD during back-propagation.
For Respiratory, the training and validation accuracies level off with a noticeable gap between them, indicating slight overfitting in the classification (Training acc: 0.9854, loss: 0.0378; Validation acc: 0.9180, loss: 0.2945).
Table 3: The CNN Evaluation Metrics
|ECG Training||ECG Test|
|EEG Training||EEG Test|
|EMG Training||EMG Test|
|Respiratory Training||Respiratory Test|
The evaluation metrics and confusion matrices for all channels on the training and test data are presented in Tables 3 and 4, respectively; Table 3 summarizes the results of Table 4. It can be observed from Table 3 that, on the test data, the CNN model achieves an accuracy of 98.97% for ECG, 94.63% for EEG, 95.81% for EMG, and 91.99% for Respiratory. We can also verify the training curves in Figure 3 against the training accuracy scores in Table 3 and the classification results in Table 4. Furthermore, the precision, recall, and F1-score for each class are collected in Table 3.
For ECG, the model achieves consistently high values of all three metrics for all classes on both the training and test data; for EEG, the scores are slightly lower on both the training and test data.
For EMG, scores of 1.0000 are obtained in the training phase on all classes, which means perfect classification of the training data during the learning process, while slightly lower scores are obtained on the test data.
Similarly, for Respiratory, the CNN achieves high scores for the training data and slightly lower scores for the test data. The gap between training and test scores may arise because the respiratory signal sensors differ from those for ECG, EEG, and EMG; the respiratory signal may not be sensitive enough to detect the small changes that occur when OSA happens. Table 4 displays the classification details on the training and test data.
5 Conclusion and Discussion
Firstly, with the correct hyperparameter setup, our 1D-CNN model can successfully extract temporal features from the PSG data and achieve high performance in OSA detection across different channels. Secondly, our well-trained CNN model can be an efficient tool for clinicians to identify OSA severity without manually going through large volumes of PSG data. Furthermore, our CNN models can replace traditional data processing steps such as signal extraction and transformation, which can be time-consuming and labour-intensive.
There are some limitations to our work. Firstly, only a small sample of 32 subjects was investigated in this study. Secondly, we used the ECG, EEG, EMG, and Respiratory channels to build separate CNN models, so there was no cross-checking between different channels. Lastly, our CNN model is slow to train without a GPU, and well-trained models require a big data set and fine-tuned hyperparameters in the training step.
Future work can aim at feeding the four single-channel CNN models into an ensemble-like model to make a prediction. There are other possible architectures that would be of great interest for this problem; one of the most popular deep learning architectures for modeling sequence and time-series data is the long short-term memory (LSTM) cell within recurrent neural networks (RNNs).
Acknowledgements. We are grateful to the National Sleep Research Resource (NSRR) for allowing us to use the PSG data from the Cleveland Children’s Sleep and Health Study. The project is supported by the Natural Sciences and Engineering Research Council of Canada (NSERC).
- Almuhammadi et al. (2015) Efficient obstructive sleep apnea classification based on EEG signals. In 2015 Long Island Systems, Applications and Technology, pp. 1–6.
- Cheng et al. (2017) Recurrent neural network based classification of ECG signal features for obstruction of sleep apnea detection. pp. 199–202.
- Dehlink and Tan (2016) Update on paediatric obstructive sleep apnoea. Journal of Thoracic Disease 8 (2).
- McCloskey et al. (2018) Detecting hypopnea and obstructive apnea events using convolutional neural networks on wavelet spectrograms of nasal airflow. pp. 361–372.
- Moridani et al. (2019) A reliable algorithm based on combination of EMG, ECG and EEG signals for sleep apnea detection. In 2019 5th Conference on Knowledge Based Engineering and Innovation (KBEI), pp. 256–262.
- Tripathy (2018) Application of intrinsic band function technique for automated detection of sleep apnea using HRV and EDR signals. Biocybernetics and Biomedical Engineering 38 (1), pp. 136–144.
- Varon et al. (2015) A novel algorithm for the automatic detection of sleep apnea from single-lead ECG. IEEE Transactions on Biomedical Engineering 62 (9), pp. 2269–2278.