Automatic scoring of apnea and hypopnea events using blood oxygen saturation signals

03/22/2020 ∙ by R. E. Rolón, et al.

The obstructive sleep apnea-hypopnea (OSAH) syndrome is a very common and frequently undiagnosed sleep disorder. It is characterized by repeated events of partial (hypopnea) or total (apnea) obstruction of the upper airway while sleeping. This study makes use of a previously developed method called DAS-KSVD for multiclass structured dictionary learning to automatically detect individual events of apnea and hypopnea using only blood oxygen saturation signals. The method uses a combined discriminant measure which is capable of efficiently quantifying the degree of discriminability of each one of the atoms in a dictionary. DAS-KSVD was applied to detect and classify apnea and hypopnea events from signals obtained from the Sleep Heart Health Study database. For moderate to severe OSAH screening, a receiver operating characteristic curve analysis of the results shows an area under the curve of 0.957 and diagnostic sensitivity and specificity of 87.56 represent improvements as compared to most state-of-the-art procedures. Hence, the method could be used for screening OSAH syndrome more reliably and conveniently, using only a pulse oximeter.




1 Introduction

Pulse oximetry, being a cheap and non-invasive technique, has become a promising supporting tool for the diagnosis of sleep disorders [1, 2, 3]. Sleep disorders comprise several types of medical conditions. The most common of them is the Obstructive Sleep Apnea-Hypopnea (OSAH) syndrome, which is caused by frequent breathing pauses due to partial (hypopnea) or total (apnea) blockage of the upper airway during sleep. These pauses lead to several physiological changes such as blood oxygen desaturation [4, 5]. To establish the severity of this pathology, the apnea-hypopnea index (AHI) is commonly used. This index is defined as the number of apnea-hypopnea events per hour of sleep or per hour of recording, according to whether it refers to a complete study or a simplified one, respectively (more on this later). Most screening methods do not discriminate between apnea and hypopnea events since it is not strictly required for computing the AHI [2]. However, recognition of single apnea and hypopnea events provides additional information regarding the severity of OSAH syndrome that could be important for clinical and decision-making purposes [6]. Nevertheless, automatically distinguishing and identifying those two respiratory events is a challenging task, especially when the number of available signals is low.

Achieving a good AHI estimation using recordings of just a few signals is a difficult problem that requires precise ad-hoc evaluation tools for the clinical screening of OSAH syndrome [7]. In the past decade, much interest has been observed in the development of portable devices using at most two sensors for OSAH screening (e.g. [8, 9, 10, 11]). In particular, the authors in [9] present a detailed review of existing methods that use only pulse oximetry signals for automatically classifying patients having OSAH syndrome. It is important to highlight, however, that all methods mentioned in that review address only the detection of the pathology and neither recognize nor classify small segments of oximetry signals as normal breathing, apnea or hypopnea events. To the best of our knowledge, the problem of individually classifying abnormal respiratory events using only SpO2 signals in a multi-class scenario has never been explored before. Therefore, properly identifying hypopneas that were not detected by other approaches may add value to the diagnosis and treatment of patients.

There are methods for binary classification (existence or nonexistence of abnormal respiratory events) of SpO2 signals from which the AHI can be estimated [2, 3, 12, 13]. In particular, the articles [12] and [13] make use of the so-called Oxygen Desaturation Index (ODI), defined as the number of times per hour of study that the SpO2 signal falls below a prescribed percentage of saturation relative to a baseline level. It should be pointed out, however, that although the concept of “baseline level” is somewhat intuitive, there is as yet no consensus about its formal definition, and different authors have adopted different ones [12, 13]. In [12], for instance, the baseline level was defined as the desaturation mean of the previous minute, while a completely different approach was followed in [13], where it was computed using a moving time average. In [2], the authors present a method for detecting blood oxygen desaturations using specific waves (or modes) coming from empirical mode decompositions of SpO2 signals. In that work, the desaturations are identified by making use of a few thresholds and a set of simple rules which lead to the detection of the sleep apnea-hypopnea syndrome. Finally, in [3], we introduced a different approach based on sparse representations of SpO2 signals. In that work, the AHI is estimated directly, without computing the ODI, as the average number of abnormal respiratory events per hour of study.
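To make the ODI definition above concrete, the following is a minimal sketch and not the procedure of [12] or [13]. It assumes a 1 Hz SpO2 signal, a baseline equal to the mean of the preceding 60 seconds, and a 3-point drop threshold; the function name and all default values are illustrative assumptions.

```python
import numpy as np

def oxygen_desaturation_index(spo2, fs=1.0, drop=3.0, window_s=60.0):
    """Desaturation onsets per hour (all defaults are illustrative assumptions)."""
    spo2 = np.asarray(spo2, dtype=float)
    win = max(1, int(round(window_s * fs)))
    csum = np.concatenate(([0.0], np.cumsum(spo2)))
    baseline = spo2.copy()  # sample 0 has no history, so it is its own baseline
    for i in range(1, spo2.size):
        lo = max(0, i - win)
        baseline[i] = (csum[i] - csum[lo]) / (i - lo)  # mean of the preceding window
    below = spo2 < baseline - drop
    # An event starts wherever `below` switches from False to True.
    onsets = np.count_nonzero(below[1:] & ~below[:-1])
    hours = spo2.size / fs / 3600.0
    return onsets / hours
```

Changing the baseline rule (for instance to a centered moving average) changes which drops are counted, which is exactly the lack of consensus mentioned above.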

To tackle the problem of individually identifying and distinguishing between apnea and hypopnea events using only SpO2 signals, we make use of a previously developed method [14]. For that, segments of training signals are used to learn a discriminant dictionary. At the dictionary learning stage, a multi-class multi-objective information measure is used for quantifying the discriminability of each atom in the dictionary. Finally, sparse representations of the data in terms of the dictionary are computed and then used as input to a classifier (see Section 4.3).

The organization of this paper is as follows. In Section 2, a brief description of abnormal respiratory events during sleep is presented. Dictionary learning methods for sparse representation are introduced in Section 3. Section 4 contains details on all designed experiments. Results and discussions are presented in Section 5, while conclusions are presented in Section 6.

2 Sleep apnea

It is well known that getting enough sleep is extremely important for maintaining both mental and physical health. However, good sleep is very often disrupted by sleep-related breathing disorders. Poor sleep quality causes excessive daytime sleepiness, affecting people's productivity and efficiency, including their ability to think clearly, react quickly and memorize efficiently, which triggers bad decisions and greatly increases the risk of domestic, work and traffic accidents [15].

Polysomnography (PSG) is the reference study for diagnosing OSAH syndrome. This study requires specially conditioned sleep units as well as the simultaneous recording of several biomedical signals. However, access to PSG is very limited, mainly because PSG units are not commonly available and because the studies are both lengthy and costly, making the process of obtaining good quality signals extremely complicated. In addition, a PSG study requires the attention of specialized technicians to ensure continuous visualization and recording of all the signals being acquired. A complete PSG study consists of the simultaneous measurement of a minimum of seven physiological signals, such as electroencephalography (EEG), electrooculography (EOG), electromyography (EMG), electrocardiography (ECG), airflow and SpO2. It is important to point out, however, that the continuous acquisition of these signals strongly affects the quality of sleep, making it even more difficult to achieve an accurate diagnosis. Because of all those difficulties, new screening approaches are continually being developed. An ideal screening method is one that, on the one hand, leads to precise results and, on the other hand, uses as few signals as possible without degrading the quality of sleep [7].

For the reasons described above, portable systems for assessing OSAH syndrome that can be used outside sleep units have been developed. Other evaluation procedures exist, such as home PSG, home Respiratory Polygraphy (RP) and other simplified procedures, to name a few. Although home PSG has the advantage of not requiring any trained personnel, it still needs the acquisition of at least seven respiratory and sleep signals, just like a standard PSG. On the other hand, home RP studies allow for the evaluation of cardiorespiratory variables without taking into account EEG, EOG and EMG signals, and therefore they are unable to detect wakefulness or to determine sleep stages [16]. Hence, even though home RP is simpler than both standard PSG and home PSG, it still needs the continuous measurement of several physiological signals, whose acquisition affects sleep quality. Finally, simplified procedures make use of only one or two cardiorespiratory variables, such as airflow, respiratory movements, heart rate, tracheal sound and SpO2. In particular, the SpO2 signal has become a reasonable alternative for OSAH syndrome screening and it is the one that will be used in this article [1, 2, 3].

The severity of OSAH syndrome is classified as normal, mild, moderate or severe depending on whether the AHI values fall within the intervals [0, 5), [5, 15), [15, 30) or [30, ∞), respectively. It is known that towards the end of each apnea or hypopnea event, a desaturation of the hemoglobin occurs. It is therefore reasonable to think that these desaturations contain valuable information related to the particular events of apnea and hypopnea, which are very often impossible to recognize and distinguish by eye. The top and middle waveforms in Figure 1 show a six-minute portion of a typical airflow signal and the corresponding filtered SpO2 signal, respectively (see Section 4.1) [3]. The labels N (normal breathing), A (apnea) and H (hypopnea) are shown at the bottom. It is important to mention that these labels were introduced by medical experts, after a detailed analysis of all the signals acquired during the PSG study. By observing both the airflow and the SpO2 signals, it can be seen that the time between the reduction (or stopping) of airflow and the beginning of the oxygen desaturation is highly variable. The SpO2 signal in the middle of Figure 1 shows two gray-highlighted portions on the left, corresponding to the time intervals where desaturations produced by a hypopnea event (left) and an apnea event (right) occur.
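The AHI-based severity grading mentioned at the start of the previous paragraph can be sketched as follows. The cut-off values 5, 15 and 30 events per hour are the commonly used clinical thresholds and are an assumption here, as are the function names.

```python
def apnea_hypopnea_index(n_events, hours):
    """Number of apnea/hypopnea events per hour of sleep (or of recording)."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return n_events / hours

def osah_severity(ahi):
    """Map an AHI value to a severity label (cut-offs 5/15/30 are assumed)."""
    if ahi < 0:
        raise ValueError("AHI cannot be negative")
    if ahi < 5:
        return "normal"
    if ahi < 15:
        return "mild"
    if ahi < 30:
        return "moderate"
    return "severe"
```

Under these assumptions, "moderate to severe" screening corresponds to testing whether the AHI is at least 15.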


Figure 1: A small portion of an airflow signal (top), a wavelet-filtered SpO2 signal (middle) and labels of normal breathing and abnormal respiratory events (apnea and hypopnea) that occur during sleep (bottom). Data obtained from [17].

As can be observed, the minimum saturation values and the general morphology of the SpO2 signal on those two intervals are very similar. Hence, it becomes evident that automatic recognition of single apnea and hypopnea events from SpO2 signals alone is a very challenging classification problem. To further visualize the difficulty of this classification problem, a dimensionality reduction technique called “Sammon Mapping” was applied to low-dimensional samples of SpO2 signals [18]. Figure 2 shows projections onto two-dimensional attributes of SpO2 signals for the classes N, H and A. It can be observed that the distributions of the different classes in the attribute space overlap heavily. Although the distributions of normal breathing and apnea events seem to be fairly well separated, the distribution of hypopnea events presents a very high dispersion, leading to a great degree of overlap with both.


Figure 2: A representation of the class distribution after applying Sammon mapping, in its two most relevant attributes obtained from SpO2 signals (estimated using 200 examples for each class). Data obtained from [17].

3 Dictionary Learning for Sparse Representation

3.1 Basic methods

The representation of signals based on a dictionary consists of finding appropriate linear combinations of atoms in the prescribed dictionary to represent a given set of signals. This representation problem can be divided into two sub-problems: an inference problem and a learning problem. We proceed to describe each one of them. For that, let $\mathbf{x} \in \mathbb{R}^{n}$ be an input signal and let $\Phi \in \mathbb{R}^{n \times m}$ (usually $m \geq n$) be a dictionary whose columns $\phi_j \in \mathbb{R}^{n}$, $j = 1, \dots, m$, are atoms that we want to use for representing $\mathbf{x}$ in the form $\mathbf{x} = \Phi \mathbf{a}$. Here, and in the sequel, we shall refer to the vector $\mathbf{a} \in \mathbb{R}^{m}$ as a “representation” of $\mathbf{x}$.

The inference problem essentially consists of finding the optimal (in a certain sense) representation $\mathbf{a}$ of the given signal $\mathbf{x}$. A sparse solution of this problem is a representation with just a few non-zero components. If in a given representation a certain coefficient $a_j$ is non-zero, then we shall refer to it as an “active” component.

A way of obtaining a sparse representation of the signal $\mathbf{x}$ based on the dictionary $\Phi$ consists of solving the following problem:

$$(P_0): \quad \min_{\mathbf{a}} \|\mathbf{a}\|_0 \quad \text{subject to} \quad \mathbf{x} = \Phi \mathbf{a},$$

where $\|\mathbf{a}\|_0$ denotes the $\ell_0$ pseudo-norm, defined as the number of non-zero elements of $\mathbf{a}$.

Solving $(P_0)$ is generally an NP-hard problem, which makes this approach highly unsuitable for most applications [19, §1.8]. This is so because in $(P_0)$ we are imposing an exact representation which, in most practical cases, is neither strictly necessary nor desired. To overcome the computational burden entailed by solving problem $(P_0)$, several relaxed versions of it have been considered. One of them consists of allowing a small representation error while imposing an upper bound on the $\ell_0$ pseudo-norm, i.e. solving:

$$(P_0^q): \quad \min_{\mathbf{a}} \|\mathbf{x} - \Phi \mathbf{a}\|_2^2 \quad \text{subject to} \quad \|\mathbf{a}\|_0 \leq q,$$

where $q$ is a prescribed integer parameter. Several approaches for solving problem $(P_0^q)$ have been proposed [20, 21, 22]. The most widely used is Orthogonal Matching Pursuit (OMP), which approximates the solution in a greedy way, providing a good trade-off between computational cost and representation error [23]. Additionally, the method ensures convergence to the projection of $\mathbf{x}$ onto the span of the dictionary atoms [22].
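The greedy strategy behind OMP can be sketched in a few lines of NumPy. This is a textbook-style illustration under the assumption of unit-norm atoms, not the implementation used in the experiments. The names `Phi`, `x`, `q` and `tol` are chosen here for illustration.

```python
import numpy as np

def omp(Phi, x, q, tol=1e-10):
    """Greedy sparse coding: at most q active atoms (assumes unit-norm columns)."""
    x = np.asarray(x, dtype=float)
    a = np.zeros(Phi.shape[1])
    support, coeffs = [], np.zeros(0)
    residual = x.copy()
    for _ in range(q):
        if np.linalg.norm(residual) <= tol:
            break  # the signal is already represented well enough
        correlations = np.abs(Phi.T @ residual)
        if support:
            correlations[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(correlations)))
        # Least-squares fit of x on the atoms selected so far.
        coeffs, *_ = np.linalg.lstsq(Phi[:, support], x, rcond=None)
        residual = x - Phi[:, support] @ coeffs
    if support:
        a[support] = coeffs
    return a
```

Refitting all selected coefficients at every step is what distinguishes OMP from plain matching pursuit: the residual stays orthogonal to the atoms chosen so far.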

The dictionary $\Phi$ can be constructed either using a pre-specified group of atoms (such as those obtained through the Wavelet Packet decomposition) or by means of data-driven learning approaches. The dictionary learning problem associated with the data $n$, $m$, $q$, and a collection of $p$ signals in $\mathbb{R}^{n}$, $X = [\mathbf{x}_1 \, \cdots \, \mathbf{x}_p]$, can be formally written as:

$$(DL): \quad \min_{\Phi, A} \|X - \Phi A\|_F^2 \quad \text{subject to} \quad \|\mathbf{a}_i\|_0 \leq q, \quad i = 1, \dots, p,$$

where $A = [\mathbf{a}_1 \, \cdots \, \mathbf{a}_p] \in \mathbb{R}^{m \times p}$. A solution of this problem yields, on the one hand, a dictionary $\Phi$ and, on the other hand, representations $\mathbf{a}_i$ for all the signals $\mathbf{x}_i$ (in terms of such a dictionary) complying with the imposed sparsity constraint. Although several methods for solving (DL) exist, the most widely used is an iterative algorithm called K Singular Value Decomposition (KSVD) [24]. This approach consists of two steps: an inference step and a dictionary learning step. The OMP algorithm (for example) is used for obtaining the representation coefficients; this is followed by a dictionary learning step in which the atoms are updated one at a time and the representation coefficients are adjusted in order to minimize the total representation error.
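The atom-by-atom update of the dictionary learning step can be illustrated with the standard rank-one SVD update. The sketch below shows one such update for a single atom, assuming a dictionary `Phi` of shape (n, m), a coefficient matrix `A` of shape (m, p) and a signal matrix `X` of shape (n, p). The function name is hypothetical, and the code follows the generic KSVD description in [24] only loosely.

```python
import numpy as np

def ksvd_atom_update(Phi, A, X, j):
    """Update atom j and its coefficients; the sparsity pattern of A is preserved."""
    users = np.flatnonzero(A[j, :])  # signals whose representation uses atom j
    if users.size == 0:
        return Phi, A  # unused atom: nothing to update in this sketch
    Phi, A = Phi.copy(), A.copy()
    A[j, users] = 0.0
    # Representation error of those signals when atom j is removed.
    E = X[:, users] - Phi @ A[:, users]
    # The best rank-one approximation of E gives the new atom and coefficients.
    U, s, Vt = np.linalg.svd(E, full_matrices=False)
    Phi[:, j] = U[:, 0]
    A[j, users] = s[0] * Vt[0, :]
    return Phi, A
```

A full KSVD iteration would alternate a sparse coding pass (for example with OMP) with one such update for every atom.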

3.2 Discriminant dictionaries

As mentioned above, a dictionary $\Phi$ can be constructed using data-driven learning methods aimed exclusively at minimizing the total representation error. However, a dictionary learned in this way quite often produces representations of signals which turn out to be unsatisfactory if the final objective is pattern recognition. This is so because, as is well known, a good representation does not necessarily guarantee good classification performance. A way to overcome this flaw consists of incorporating available prior information about class membership of the signals into the objective function in (DL) [25, 26]. In [25], for example, a discriminant version of the standard KSVD method applied to face recognition was presented. In that work, the authors included a discriminant term in the objective function of the standard KSVD algorithm. Results have shown that such a modification constitutes an appropriate way to learn dictionaries complying simultaneously with both desired properties: low reconstruction error and high recognition rates. In [26], a sparsity-constrained optimization problem combining the objective function of the classification and the representation error of both labeled and unlabeled data was formulated.

With the objective of improving classification performance, new approaches based on the design of structured dictionaries were recently proposed [27, 28, 29, 30]. A structured dictionary can be thought of as a collection of class-specific sub-dictionaries which are designed to capture discriminant properties of each class as well as common features among all classes in the data. In this direction, an initial approach consists of learning one dictionary for each class and then classifying by minimizing the representation error among all classes [31]. Recently, a method called “Most Discriminative Columns Selection” (MDCS), which was shown to be capable of efficiently building structured dictionaries in a binary classification scheme, was developed [3]. Figure 3 shows a schematic representation of the MDCS procedure for a three-class classification problem. In this case the classes are identified as N, A and H. The dictionary $\Phi$ is learned in an unsupervised way using all training signals $X$ for solving problem (DL). After that, the representation matrices $A_N$, $A_A$ and $A_H$, whose columns are the corresponding representation vectors, are computed using the three separate sets of labeled signals $X_N$, $X_A$ and $X_H$, respectively. Next, the atoms of $\Phi$ are ranked according to a prescribed measure of discriminability in terms of their role in the sparse representation of the signals for each class (see Section 3.3). Following this ranking procedure, and given a prescribed positive integer $r$ (more on this later), the best $r$ atoms for each class are selected and used for building new class-specific sub-dictionaries $\Phi_N$, $\Phi_A$ and $\Phi_H$ for classes N, A and H, respectively. The structured dictionary, which we denote by $\tilde{\Phi}$, is finally constructed by stacking side-by-side all sub-dictionaries, i.e. $\tilde{\Phi} = [\Phi_N \; \Phi_A \; \Phi_H]$. The parameter $r$ is used to restrict the size of the final dictionary, in the sense that $\tilde{\Phi}$ will end up having exactly $rc$ columns, where $c$ is the number of classes. This restriction is intended to improve generalization by reducing the size of the final feature vectors, which in turn reduces the computing time required for classification.
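The selection-and-stacking step of MDCS described above can be sketched as follows. The sketch assumes that a per-class discriminability score is already available for every atom (one row of `scores` per class); the function name and argument layout are hypothetical.

```python
import numpy as np

def build_structured_dictionary(Phi, scores, r):
    """Stack, class by class, the r best-ranked atoms of Phi.

    Phi:    (n, m) dictionary.
    scores: (c, m) array; scores[k, j] is the assumed discriminability
            of atom j for class k.
    """
    scores = np.asarray(scores, dtype=float)
    blocks = []
    for class_scores in scores:
        best = np.argsort(-class_scores, kind="stable")[:r]  # highest scores first
        blocks.append(Phi[:, best])
    return np.hstack(blocks)  # n rows and r * c columns
```

Note that in this simple sketch nothing prevents the same atom from being selected for more than one class.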

Along with MDCS, a method for discriminant feature selection called “Most Discriminative Atoms Selection” (MDAS) was proposed [3]. The main difference between MDCS and MDAS is that in the latter no new structured dictionary is built. Instead, the original dictionary is preserved and the ranking of the atoms is used only to select the components to be used for classification. It is important to point out that although both MDCS and MDAS were originally proposed for dealing only with binary classification problems, their extension to multi-class problems is straightforward. In what follows, we shall denote by MDCS-BC, MDCS-MC, MDAS-BC and MDAS-MC the binary and multiclass versions of MDCS and MDAS, respectively.


Figure 3: A schematic representation of the learning process of discriminant structured dictionaries using the MDCS method.

Following on the idea behind MDCS, an iterative extension of it naturally emerges. In this sense, a new method called “Discriminant Atom Selection KSVD” (DAS-KSVD) was recently proposed [14]. This method is suitable for multi-class classification problems and it can be thought of as a generalization of MDCS. The main difference with MDCS is that, instead of selecting all class-specific atoms in a single step, DAS-KSVD chooses only one discriminant atom for each one of the classes at each step. Additionally, DAS-KSVD incorporates a re-sampling technique which promotes diversity in the generation of the discriminant atoms. This re-sampling process requires a prescribed parameter for adjusting the sampling probability of all training signals. It is important to mention that all sampled signals are degraded by incorporating additive noise whose magnitude is governed by the number of iterations and by another prescribed parameter. For more details about these re-sampling and signal degradation procedures, we refer the reader to [14]. The steps for constructing the dictionary with DAS-KSVD are summarized in Algorithm 1 below.

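procedure DAS-KSVD(learning signals, class labels, parameters)
     initialize the sampling probability, for all learning signals
     for each iteration do
          sampled signals ← SampleData(learning signals, sampling probabilities)
          dictionary ← Ksvd(sampled signals)
          representations ← OMP(dictionary, learning signals)
          scores ← DiscMeasure(representations, class labels)
          selected atoms ← GetAtoms(dictionary, scores)
          SaveAtoms(selected atoms)
     end for
     return structured dictionary
end procedure
Algorithm 1 DAS-KSVD method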

Figure 4 shows a schematic representation of one iteration of DAS-KSVD for a three-class classification problem. Observe that before using a method for solving (DL), a re-sampling technique is applied. Then, the dictionary is learned in an unsupervised way using all learning signals.


Figure 4: A schematic representation of one iteration of the learning process of discriminant structured dictionaries using the DAS-KSVD method.

After that, the representation matrices $A_N$, $A_A$ and $A_H$, whose columns are the corresponding representation vectors, are computed using the three separate sets of learning signals $X_N$, $X_A$ and $X_H$, respectively. Next, the atoms of $\Phi$ are ranked according to an appropriately defined multi-class measure of discriminability (details about this measure are presented in Section 3.3). After this ranking procedure, only one atom for each class is selected and used for building new class-specific sub-dictionaries $\Phi_N$, $\Phi_A$ and $\Phi_H$ for classes N, A and H, respectively. The structured dictionary, which is denoted by $\tilde{\Phi}$, is finally constructed by stacking side-by-side all sub-dictionaries, i.e. $\tilde{\Phi} = [\Phi_N \; \Phi_A \; \Phi_H]$.
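The overall loop (re-sample, learn, encode, rank, keep one atom per class) can be outlined as below. This is only a structural sketch: the dictionary learner, the sparse coder and the discriminability measure are passed in as hypothetical callables, the re-sampling is a plain uniform bootstrap, and the adaptive sampling probabilities and noise degradation described in [14] are deliberately left out.

```python
import numpy as np

def das_ksvd_sketch(X, labels, n_iter, learn_dictionary, sparse_code,
                    disc_measure, seed=0):
    """Structural outline only; see [14] for the actual procedure.

    learn_dictionary(S) -> Phi (n, m), sparse_code(Phi, X) -> A (m, p),
    disc_measure(A, labels) -> scores (c, m): all three are assumed callables.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    sub_dicts = [[] for _ in classes]
    p = X.shape[1]
    for _ in range(n_iter):
        sample = rng.choice(p, size=p, replace=True)  # uniform bootstrap (assumption)
        Phi = learn_dictionary(X[:, sample])
        A = sparse_code(Phi, X)
        scores = disc_measure(A, labels)
        for k in range(classes.size):
            # Keep the single most discriminant atom for class k.
            sub_dicts[k].append(Phi[:, int(np.argmax(scores[k]))])
    return np.hstack([np.column_stack(atoms) for atoms in sub_dicts])
```

After `n_iter` iterations every sub-dictionary holds `n_iter` atoms, so the returned matrix has `n_iter` times the number of classes columns.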

3.3 Discriminant criteria

As previously mentioned, both MDCS and DAS-KSVD require measures for quantifying the discriminant capabilities of each one of the dictionary atoms. The detection of atoms containing useful discriminant information can be addressed in different ways. Among all existing alternatives, the most commonly used strategy consists of performing comparisons between conditional probability distributions [32, 33]. In what follows, we proceed to describe two different criteria that shall be used in this article.

Based on the idea that the discriminant atoms of a dictionary are those more frequently used for representing signals belonging to a particular class, a measure called “Discriminative Conditional Activation Frequency” (DCAF) was proposed [3, 33]. This measure was shown to be capable of efficiently quantifying the discriminability of the atoms in the context of a binary classification problem. The approach essentially consists of using the conditional activation probability of the atom $\phi_j$ given the class $C_k$. This conditional activation probability, which is defined as $P(\phi_j | C_k) = P(a_j \neq 0 \,|\, \mathbf{x} \in C_k)$, can be approximated efficiently by the quotient $f_j^k / p_k$, where $f_j^k$ and $p_k$ are the conditional activation frequency (number of times that the atom $\phi_j$ becomes active for representing class $C_k$ signals) and the number of class $C_k$ signals, respectively. To quantify the discriminant capability of an atom $\phi_j$, the absolute value of the difference of its conditional activation probabilities for classes $C_1$ and $C_2$ is computed. This value will be close to one if (and only if) the atom becomes much more active for one class only and, in that case, it can be thought of as a quantifier of the capability of $\phi_j$ to provide important information for signal classification. In addition, observe that DCAF is symmetric, its value is always non-negative and it is inexpensive in terms of computing time. Finally, if the classes are balanced, DCAF can be computed just by counting the number of times that each atom becomes active, without dividing by the number of class-specific signals.
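The DCAF computation described above amounts to counting activations per class. A minimal sketch, assuming a coefficient matrix `A` with one row per atom and one column per signal, and a label vector with exactly two distinct values (the function name is illustrative):

```python
import numpy as np

def dcaf(A, labels):
    """Absolute difference of per-class activation frequencies, one value per atom."""
    active = np.asarray(A) != 0  # True where an atom is used by a signal
    labels = np.asarray(labels)
    classes = np.unique(labels)
    if classes.size != 2:
        raise ValueError("DCAF is defined here for binary problems only")
    p_first = active[:, labels == classes[0]].mean(axis=1)
    p_second = active[:, labels == classes[1]].mean(axis=1)
    return np.abs(p_first - p_second)
```

Each mean over a class is the activation count divided by the number of signals in that class, i.e. the empirical conditional activation probability.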

With the objective of extending DCAF to a more general framework, a new strategy to detect discriminant atoms in the context of multi-class classification problems was recently proposed [14]. The approach consists of defining and using a new multi-objective function aimed at quantifying the discriminant properties of each one of the atoms in a given dictionary. This function is defined as a convex combination of three discriminant terms, all based on the affine sparse representations of the data. In what follows, we proceed to describe each one of such terms.

For a given $j$, $1 \leq j \leq m$, we denote by $k^{*}$ the class that maximizes the conditional activation probability $P(\phi_j | C_k)$ over all $k = 1, \dots, c$. If there is more than one value of $k$ maximizing $P(\phi_j | C_k)$, $k^{*}$ is defined by arbitrarily choosing one of them, for instance the smallest one (note that the order of the classes is completely irrelevant). Similarly, for a fixed $j$, $1 \leq j \leq m$, $k^{**}$ is defined as the class leading to the second largest conditional activation probability. Here again, if there is more than one value of $k$ satisfying that condition, then $k^{**}$ is arbitrarily chosen among them.

Next, the function $\delta_{AF}$, known as the “activation frequency” measure, is defined by

$$\delta_{AF}(\phi_j) = P(\phi_j | C_{k^{*}}) - P(\phi_j | C_{k^{**}}). \qquad (1)$$

Note that $0 \leq \delta_{AF}(\phi_j) \leq 1$. The atom $\phi_j$ is said to be discriminant (for class $k^{*}$) if and only if $\delta_{AF}(\phi_j) > 0$. Clearly, within this setting, if an atom is discriminant, it will be so only for the class $k^{*}$. Moreover, the value of $\delta_{AF}(\phi_j)$ can be thought of as a “measure” of the degree of discriminability of the atom $\phi_j$ (for the corresponding class $k^{*}$), based solely on the conditional activation frequency information.
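As an illustration of how such a multi-class activation measure could be computed, the sketch below assumes that the measure is the gap between the largest and the second largest conditional activation probability of each atom. That form is an assumption consistent with the description above, not a formula confirmed here, and the function name is hypothetical.

```python
import numpy as np

def activation_frequency_measure(A, labels):
    """Per-atom gap between the two largest class activation probabilities.

    Returns (delta, k_star); ties for the maximum resolve to the smallest class.
    """
    active = np.asarray(A) != 0
    labels = np.asarray(labels)
    classes = np.unique(labels)  # sorted, so index 0 is the smallest class
    probs = np.stack([active[:, labels == k].mean(axis=1) for k in classes])
    ranked = np.sort(probs, axis=0)  # ascending along the class axis
    delta = ranked[-1] - ranked[-2]
    k_star = classes[np.argmax(probs, axis=0)]  # argmax returns the first maximum
    return delta, k_star
```

With this choice the value is zero whenever two classes share the maximum, so an atom can be discriminant for at most one class.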

The conditional activation probability of the atoms can be graphically illustrated by taking into account sparse representations of signals coming from different classes in terms of a given dictionary. For that, we shall consider a signal matrix , which comprises four different class-labeled signals, i.e. . Moreover, let be the matrix which provides a sparse representation of in terms of the dictionary , through . Figure 5 shows a graphic representation of sparse representations , for , of all signal matrices in terms of a dictionary . In particular, both and its corresponding active components (the active elements that are part of the -row of ) have been highlighted in orange.


Figure 5: An illustration of the atoms activations by taking into account signals coming from 4 different classes.

It can be observed that becomes more frequently active for signals of class than for all the others. Also, since is always used to represent class signals, the conditional activation probability of given the class is maximum, i.e. and therefore . On the other hand, it is easy to determine that is the class leading to the second largest conditional activation probability and, in this case, and therefore . Hence, according to the activation frequency measure, . Additionally, Figure 6 shows a bar plot representing each one of the conditional activation probability values of the atom .


Figure 6: A schematic representation of conditional activation probabilities of a certain atom for each one of the classes.

Besides providing useful information regarding the activation of the atoms, the sparse representation of signals is capable of efficiently highlighting intrinsic properties and relevant class-related features of the data. With this observation in mind, a second criterion that takes into account the magnitude of the representation coefficients is presented. For that, given an atom , let and be as before, and let and be matrices providing sparse representations of and , respectively, in terms of the dictionary , i.e. and . Additionally, let denote the quotient , where represents the -row of the matrix . Then, the “coefficient magnitude” measure is the function defined by


Here again . Based on this measure, an atom is said to be discriminant (for the class ) if and only if and, in that case, the value of quantifies the ability of to discriminate class data, according to this criterion.

We now proceed to describe a third criterion for quantifying the discriminant degree of the atoms. This criterion takes into account the contribution of each atom , , to the total representation error. Let be a matrix providing a sparse representation of . The total representation error for all class signals when is removed can then be written as (for more details we refer the reader to [24]). A large value of indicates that is a highly discriminant atom. Then, the “representation error” measure is defined by


where , for , . It is clear that , and an atom is said to be discriminant (for class ) with respect to this criterion if and only if .

Each one of the three previously mentioned criteria quantifies the discriminant properties of an atom from a different perspective. It is then reasonable to think of a criterion that properly combines all three of them. With that in mind, given two positive parameters $\alpha$ and $\beta$, with $\alpha + \beta \leq 1$, the combined discriminant measure is defined as

$$\delta(\phi_j) = \alpha \, \delta_{AF}(\phi_j) + \beta \, \delta_{CM}(\phi_j) + (1 - \alpha - \beta) \, \delta_{RE}(\phi_j). \qquad (4)$$

Clearly, as $\alpha$ and $\beta$ vary between $0$ and $1$, (4) exhausts all possible convex combinations of the three single measures $\delta_{AF}$, $\delta_{CM}$ and $\delta_{RE}$. A challenging problem that immediately arises is to find the “optimal” pair of parameters $(\alpha^{*}, \beta^{*})$ leading to the best recognition rate. However, to the best of our knowledge, no analytical method exists for finding $(\alpha^{*}, \beta^{*})$. For this reason, in this article a discrete search for such a pair of parameters in the $(\alpha, \beta)$ plane is performed.
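A convex combination of three per-atom measures and a discrete search grid over its two weights can be sketched as follows. Which weight multiplies which measure, the grid step and all names are assumptions made for illustration; the text above only states that the combination is convex and that the search is discrete.

```python
import numpy as np

def combined_measure(d_af, d_cm, d_re, alpha, beta):
    """Convex combination of three per-atom measures (weight assignment assumed)."""
    if alpha < 0 or beta < 0 or alpha + beta > 1 + 1e-12:
        raise ValueError("weights must be non-negative with alpha + beta <= 1")
    gamma = max(0.0, 1.0 - alpha - beta)  # guard against rounding below zero
    return alpha * np.asarray(d_af) + beta * np.asarray(d_cm) + gamma * np.asarray(d_re)

def weight_grid(step=0.1):
    """All (alpha, beta) pairs on a regular grid with alpha + beta <= 1."""
    n = int(round(1.0 / step))
    return [(i / n, j / n) for i in range(n + 1) for j in range(n + 1 - i)]
```

A search would evaluate the classifier once per pair returned by `weight_grid` and keep the pair with the best validation score.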

4 Experimental setup

The main objective of this article is to compare the overall classification performance of MDCS, MDAS (both in their binary and multiclass versions) and DAS-KSVD in the context of OSAH syndrome screening. To achieve that objective, two experiments were carried out. The first one was designed with the final goal of classifying the segments of SpO2 signals into one and only one of the three classes: normal breathing (N), apnea (A) or hypopnea (H). The second experiment was designed to detect the existence or non-existence of the pathology. The whole experimental setup is described below.

4.1 Database and signal pre-processing

The Sleep Heart Health Study (SHHS) database was originally designed to explore possible correlations between sleep-related breathing disorders and cardiovascular diseases [17, 34]. This database consists of several complete PSG studies, each one of them containing a group of physiological signals such as EEG, ECG, nasal airflow and SpO2. In addition, annotations of sleep stages, arousals and events of apnea and hypopnea are provided. The criteria that medical experts adopted for identifying apnea and hypopnea events were the following [5]. An apnea event is a complete (or almost complete) blockage of the upper airway for at least ten seconds, usually associated with a desaturation in the SpO2 signal or an arousal. A hypopnea event is a reduction in airflow to less than 70% of the baseline level, associated with a desaturation in the SpO2 signal or an arousal.

In this article we make use of the first online version of the database called “Sleep Heart Health Study” (SHHS-2). This database contains 995 complete PSG studies, 41 of which were discarded due to labeling flaws [3]. Among the remaining 954 studies, a set of 667 (70%) was randomly selected for training purposes. The remaining 287 (30%) were left out for the final test.

Mainly due to patient movements, baseline wander and undesired disconnections (among many other factors), the original raw SpO2 signals require an appropriate pre-conditioning process. For that, linear interpolation and wavelet filters, such as those used in a previous work [3], were applied. Figure 1 (middle) shows a small portion of a wavelet-filtered SpO2 signal.

Signals are segmented into vectors of length $n$ (corresponding to 128 seconds of the signal recording) with a 75% overlap between two consecutive segments. In this process, segments containing artifacts or disconnections are discarded. Then, a matrix $X$ is constructed by stacking side-by-side $p_N$, $p_A$ and $p_H$ vectors belonging to the classes N, A and H, respectively. Clearly, $X \in \mathbb{R}^{n \times p}$ with $p = p_N + p_A + p_H$. Similarly, another matrix is built using the vectors associated with the testing set.
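The overlapping segmentation described above can be sketched as follows. The default length of 128 samples assumes a 1 Hz sampling rate (so that one segment spans 128 seconds), which is an assumption of this sketch. The function name is illustrative, and the rejection of segments with artifacts or disconnections is omitted.

```python
import numpy as np

def segment_signal(signal, length=128, overlap=0.75):
    """Return a (length, n_segments) matrix of overlapping segments as columns."""
    signal = np.asarray(signal, dtype=float)
    step = max(1, int(round(length * (1.0 - overlap))))  # 32 samples for the defaults
    starts = range(0, signal.size - length + 1, step)
    if len(starts) == 0:
        return np.empty((length, 0))  # signal shorter than one segment
    return np.stack([signal[s:s + length] for s in starts], axis=1)
```

Stacking segments as columns matches the side-by-side construction of the data matrix described in the text.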

4.2 Dictionary learning settings

For DAS-KSVD, all experiments were run for 20 iterations. Thus, the final structured dictionary consists of 60 atoms. The same number of samples was used for each one of the classes employed to learn the full dictionary (by means of KSVD). Several trials were performed in order to obtain adequate values for the remaining DAS-KSVD parameters, including the one that controls the trade-off between signal degradation and the number of iterations. All parameters of the KSVD method, such as the sparsity constraint and the redundancy factor of the dictionary, were set equal to those used in a previous work [14]. Finally, for both MDCS-MC and MDAS-MC, all parameters were set as for DAS-KSVD. It is important to mention, however, that these two methods make use of a different input data matrix, which is composed of a balanced set of randomly selected training segments, with the same number of segments chosen from each class.

4.3 Classification

In order to classify segments of blood oxygen saturation signals into the three different classes, a feed-forward Multilayer Perceptron (MLP) neural network was used. In particular, the experiments were run using three layers (input, hidden and output). Naturally, input and output layer sizes were set to 60 and 3, corresponding to the number of atoms in the dictionary and the number of classes, respectively. The hidden layer consisted of 500 neurons with a tansig activation function. To train this network, conjugate gradient descent was used. For classification purposes, the cost function was chosen as the mean squared error (MSE).
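The 60-500-3 architecture can be illustrated with a forward pass written in NumPy. This is a sketch under stated assumptions: the weight initialization, the linear output layer and the names `init_mlp`, `forward` and `mse` are hypothetical, and the conjugate gradient training itself is not shown. The tansig function is mathematically equivalent to the hyperbolic tangent.

```python
import numpy as np

def tansig(x):
    """Hyperbolic tangent sigmoid: 2 / (1 + exp(-2x)) - 1, i.e. tanh(x)."""
    return np.tanh(x)

def init_mlp(n_in=60, n_hidden=500, n_out=3, seed=0):
    """Small random weights and zero biases (an arbitrary initialization)."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 0.1, (n_hidden, n_in)),
        "b1": np.zeros((n_hidden, 1)),
        "W2": rng.normal(0.0, 0.1, (n_out, n_hidden)),
        "b2": np.zeros((n_out, 1)),
    }

def forward(params, A):
    """Map a (60, n_samples) matrix of sparse codes to (3, n_samples) outputs."""
    hidden = tansig(params["W1"] @ A + params["b1"])
    # Assumption: linear output units; the source does not specify them
    return params["W2"] @ hidden + params["b2"]

def mse(outputs, targets):
    """Mean squared error between network outputs and one-hot targets."""
    return float(np.mean((outputs - targets) ** 2))

params = init_mlp()
outputs = forward(params, np.zeros((60, 5)))
```

The predicted class of each segment would then be the index of the largest output, for instance `outputs.argmax(axis=0)`.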

To carry out the first experiment, two balanced sets of 21000 and 4500 samples were randomly selected from the training matrix and used for training and validation purposes, respectively. Also, an additional balanced set of 4500 samples was randomly chosen from the testing matrix and used for testing purposes. Then, sparse representations of these new datasets in terms of the previously learned dictionary were found and used as input of the classifier.
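As an illustration of how a sparse representation over a learned dictionary may be obtained, the following sketch implements Orthogonal Matching Pursuit (OMP) [22]. Using OMP here is an assumption, since this section does not state which pursuit algorithm was employed. The random dictionary, the sparsity level and the function name `omp` are likewise hypothetical.

```python
import numpy as np

def omp(D, x, sparsity):
    """Greedy sparse coding of x over a dictionary D with unit-norm columns."""
    x = np.asarray(x, dtype=float)
    residual = x.copy()
    support = []
    coeffs = np.zeros(D.shape[1])
    for _ in range(sparsity):
        # Select the atom most correlated with the current residual
        k = int(np.argmax(np.abs(D.T @ residual)))
        if k in support:
            break  # no new atom can be added
        support.append(k)
        # Least-squares fit of x on all atoms selected so far
        sol, *_ = np.linalg.lstsq(D[:, support], x, rcond=None)
        residual = x - D[:, support] @ sol
        coeffs[:] = 0.0
        coeffs[support] = sol
    return coeffs

# Hypothetical dictionary: 60 random unit-norm atoms of length 128
rng = np.random.default_rng(0)
D_demo = rng.normal(size=(128, 60))
D_demo /= np.linalg.norm(D_demo, axis=0)
code = omp(D_demo, D_demo[:, 5], sparsity=3)
```

Each segment would be coded in this way, and the resulting 60-dimensional coefficient vectors would form the columns of the classifier input.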

4.4 Detection of OSAH syndrome

In a typical PSG study, the recorded signals are provided to medical experts who identify and label apnea and hypopnea events, which are later used for computing the AHI index. In a similar way, in our analysis, each testing study was appropriately filtered and segmented in order to classify its segments as N, A and H, by means of the previously described process. Then, an estimated AHI was computed by counting the total number of segments classified as A or H and dividing it by the duration of the study, in hours. This new index was used for OSAH syndrome detection. Finally, each study was considered pathological if its estimated AHI was greater than a certain prescribed detection threshold [33].
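The estimated index described above amounts to a count divided by a duration. A minimal sketch follows, in which the function names and the example labels, duration and threshold are all hypothetical.

```python
def estimate_ahi(segment_labels, study_duration_hours):
    """Number of segments classified as A or H per hour of recording."""
    if study_duration_hours <= 0:
        raise ValueError("study duration must be positive")
    n_events = sum(1 for label in segment_labels if label in ("A", "H"))
    return n_events / study_duration_hours

def is_pathological(ahi_estimate, threshold):
    """A study is flagged when the estimate exceeds the detection threshold."""
    return ahi_estimate > threshold

# Hypothetical study: 8 hours, 120 segments classified as events
labels = ["N"] * 100 + ["A"] * 30 + ["H"] * 90
ahi_hat = estimate_ahi(labels, 8.0)
```

Note that the detection threshold applied to the estimate is a tunable quantity and need not equal the diagnostic AHI value used to label the studies.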

4.5 Performance measures

To analyze and quantify the ability of the MLP to classify segments of blood oxygen saturation signals in a multiclass scenario, a confusion matrix was constructed. The confusion matrix is a very useful tool for reporting results in multiclass classification problems because it gives a full overview of all relations between the classifier predictions and the known (true) labels. Rows and columns of such a matrix refer to known and predicted class labels of the dataset, respectively, while its diagonal and off-diagonal elements correspond to observations that are correctly and incorrectly classified, respectively. This information summarizes the types of errors that occur during training, validation and testing. Based on the confusion matrix, the overall accuracy as well as three other widely used class-specific measures (sensitivity (Se), specificity (Sp) and precision (Pr)) were extracted. In this article, the confusion matrix is normalized by dividing each one of the elements in its rows by the total number of testing samples that belong to each class.

To assess the ability of the proposed system in detecting patients suspected of suffering from moderate to severe OSAH syndrome, i.e. persons having an AHI index greater than 15, a Receiver Operating Characteristics (ROC) analysis was performed [35]. The optimal cut-off point (associated to a prescribed detection threshold) of the ROC curve is the one that simultaneously maximizes sensitivity and specificity. Also, the accuracy (Acc) and the area under the ROC curve (AUC) were computed.
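The ROC analysis can be sketched as follows. The phrase "simultaneously maximizes sensitivity and specificity" is interpreted here as maximizing Youden's index (Se + Sp - 1), which is an assumption. The function name `roc_analysis` and the toy data are also hypothetical.

```python
import numpy as np

def roc_analysis(scores, labels):
    """scores: estimated AHI per study; labels: 1 = pathological, 0 = healthy."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Candidate thresholds; a study is flagged when score > threshold
    thresholds = np.concatenate(([-np.inf], np.unique(scores)))
    pos, neg = labels == 1, labels == 0
    se = np.array([(scores[pos] > t).mean() for t in thresholds])
    sp = np.array([(scores[neg] <= t).mean() for t in thresholds])
    # Area under the (1 - Sp, Se) curve by the trapezoidal rule
    fpr = 1.0 - sp
    order = np.lexsort((se, fpr))  # sort by fpr, ties broken by se
    f, s = fpr[order], se[order]
    auc = float(np.sum(np.diff(f) * (s[1:] + s[:-1]) / 2.0))
    # Assumption: optimal cut-off taken as the maximizer of Youden's index
    best = int(np.argmax(se + sp - 1.0))
    return {"auc": auc, "threshold": float(thresholds[best]),
            "se": float(se[best]), "sp": float(sp[best])}

# Toy example with perfectly separated groups
result = roc_analysis([1, 2, 3, 10, 20, 30], [0, 0, 0, 1, 1, 1])
```

With real data the two groups overlap, and the selected threshold trades missed pathological studies against falsely flagged healthy ones.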

5 Results and discussions

This section presents a qualitative description of the atoms learned by DAS-KSVD as well as the findings achieved through the experiments described above: classification of segments and detection of OSAH syndrome.

5.1 A qualitative analysis

DAS-KSVD was used to learn a structured dictionary for blood oxygen saturation signals using the procedures and parameters described in Section 4. A structured dictionary with 60 atoms was obtained. Figure 7 shows the waveforms of some representative atoms corresponding to each one of the three discriminant sub-dictionaries (top, middle and bottom).


Figure 7: Typical atoms of the three class-specific sub-dictionaries (top, middle and bottom).

A detailed analysis shows that each one of the sub-dictionaries is built with atoms that capture particular types of class-related information. For instance, as can be seen, most atoms in the sub-dictionary of class N present quite regular waveforms associated with inhalation-exhalation related changes in the oxygen saturation. On the other hand, atoms of the sub-dictionary representing apnea events present noticeable low-frequency fluctuations. In pulse oximetry this is a typical behavior associated with the absence of respiratory airflow for a relatively long period of time. Finally, atoms of the sub-dictionary associated with hypopnea events reflect abnormal breathing through irregular patterns in pulse oximetry. It is worth mentioning that all atoms of the dictionary are normalized so that their norm is equal to unity.

5.2 Classification

Features generated by DAS-KSVD were used to assess the ability of the MLP to classify segments of blood oxygen saturation signals. Table 1 shows the confusion matrix constructed using all testing samples (left) and a summary of all class-specific performance measures extracted from such a matrix (right). The elements in the diagonal of Table 1 (left) represent the normalized true positive rates. As can be seen, the algorithm achieved true positive rates of 86.09%, 63.20% and 23.36% for the classes N, A and H, respectively, resulting in an overall accuracy of 57.55%. Note that if we were to limit our analysis only to the classes N and A (i.e. without taking into account the third row and the third column of the confusion matrix), then the inter-class confusions would be relatively small. From the analysis of all these results two remarks can be drawn. First, DAS-KSVD constitutes a reasonable approach for classifying normal (breathing) and apnea events in pulse oximetry. Second, the results fall short of being good for detecting hypopnea events. In fact, more than half of them are misclassified as belonging to class N and more than one fourth are misclassified as belonging to class A. This last remark, however, is consistent with the results obtained using the Sammon mapping (see Section 2 and Figure 2), where we saw that the projections of class A and N segments onto the two most important attributes of the mapping are clearly well separated, while the projections of class H segments overlap the other two classes and present a wide variance.



Known \ Predicted N (%) A (%) H (%)
N 86.09 4.26 9.65
A 21.33 63.20 15.46
H 50.74 25.90 23.36

Class Se (%) Sp (%) Pr (%)
N 86.09 64.17 55.65
A 63.20 85.24 68.15
H 23.36 87.49 47.19
Table 1: Normalized multiclass confusion matrix obtained using DAS-KSVD for segment classification (left) and the corresponding performance measures (right).
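The class-specific measures can be derived from a confusion matrix in a one-versus-rest fashion, as in the sketch below. The function name `class_measures` is hypothetical. Feeding it the row-normalized matrix is equivalent to assuming equally sized classes, which holds for the balanced testing set. Values computed from the rounded percentages need not coincide exactly with the reported Sp and Pr figures.

```python
import numpy as np

def class_measures(cm):
    """cm[i, j]: samples of known class i that were predicted as class j."""
    cm = np.asarray(cm, dtype=float)
    total = cm.sum()
    tp = np.diag(cm)
    fn = cm.sum(axis=1) - tp  # known class i, predicted as another class
    fp = cm.sum(axis=0) - tp  # predicted class i, known as another class
    tn = total - tp - fn - fp
    return {
        "se": tp / (tp + fn),
        "sp": tn / (tn + fp),
        "pr": tp / (tp + fp),
        "accuracy": tp.sum() / total,
    }

# Row-normalized confusion matrix of Table 1 (rows: known N, A, H), in percent
table1 = [[86.09, 4.26, 9.65],
          [21.33, 63.20, 15.46],
          [50.74, 25.90, 23.36]]
measures = class_measures(table1)
```

The sensitivities returned in `measures["se"]` reproduce the diagonal of the normalized matrix, and `measures["accuracy"]` is their average when classes are balanced.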

In order to gain insight into the reasons why DAS-KSVD outperforms all other evaluated approaches for OSAH syndrome detection (see next section), we compared its performance with that of MDCS-BC in classifying segments of blood oxygen saturation signals as containing an event or not (i.e. without taking into account whether it is an apnea or a hypopnea). MDCS-BC was chosen because it achieved the best performance among all previously developed methods. In order to analyze the performance of DAS-KSVD in the binary classification problem, we unified the labels of segments belonging to the classes A and H, which led to a new single class denoted by A+H. Table 2 shows a summary of the performance of DAS-KSVD and MDCS-BC using all testing samples. It is important to point out that, in this case, the target class is A+H. As can be observed, although both methods yielded similar sensitivity percentages, DAS-KSVD reached significantly better specificity and precision percentages than MDCS-BC. In other words, DAS-KSVD is more specific, producing fewer false positives than MDCS-BC. This clearly indicates that, in the classification process, segments that were misclassified as N are now correctly classified as H.

Method Se (%) Sp (%) Pr (%)
DAS-KSVD 64.42 86.09 89.83
MDCS-BC 64.15 78.99 71.23
Table 2: Performance measures for A+H event detection from segments of signals using DAS-KSVD and MDCS-BC.
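Unifying the labels A and H corresponds to collapsing the three-class confusion matrix into a two-class one. The following sketch uses a hypothetical matrix of counts and hypothetical function names to show the operation and the resulting measures for the target class A+H.

```python
import numpy as np

def merge_event_classes(cm):
    """Collapse a 3x3 matrix ordered (N, A, H) into a 2x2 one ordered (N, A+H)."""
    cm = np.asarray(cm, dtype=float)
    return np.array([
        [cm[0, 0], cm[0, 1:].sum()],
        [cm[1:, 0].sum(), cm[1:, 1:].sum()],
    ])

def binary_measures(cm2):
    """Se, Sp and Pr with the second class (A+H) as the target class."""
    tn, fp = cm2[0]
    fn, tp = cm2[1]
    return {"se": tp / (tp + fn), "sp": tn / (tn + fp), "pr": tp / (tp + fp)}

# Hypothetical counts, rows are known classes and columns predicted classes
counts = [[8, 1, 1],
          [2, 6, 2],
          [5, 3, 2]]
cm2 = merge_event_classes(counts)
binary = binary_measures(cm2)
```

Confusions between A and H no longer count as errors after merging, so only the confusions with class N affect the binary measures.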

5.3 Detection of OSAH syndrome

In this article, besides analyzing the ability of DAS-KSVD to classify segments of blood oxygen saturation signals into the classes N, A and H, we make use of these predictions to detect the presence of the pathology (according to a prescribed AHI diagnostic threshold). To that end, DAS-KSVD was compared with several other state-of-the-art methods in the diagnosis of moderate to severe OSAH syndrome (AHI greater than 15). Table 3 shows a comparative summary of the results achieved by DAS-KSVD, MDCS-BC, MDCS-MC, MDAS-BC and MDAS-MC, and by the approaches introduced by Chiner et al. [12], Vázquez et al. [13] and Schlotthauer et al. [2]. It is important to point out that all results presented here were obtained using the same data partitions.

Method AUC Se(%) Sp(%) Acc(%)
DAS-KSVD 0.957 87.56 88.32 87.94
MDCS-MC 0.942 86.13 86.62 86.37
MDAS-MC 0.913 82.32 82.78 82.55
MDCS-BC [3] 0.937 85.65 85.92 85.78
MDAS-BC [3] 0.891 81.02 83.10 82.06
Schlotthauer et al. [2] 0.922 84.11 85.94 85.02
Vázquez et al. [13] 0.909 80.84 87.50 84.17
Chiner et al. [12] 0.795 76.17 78.12 77.15
Table 3: Performance measures for moderate to severe OSAH syndrome detection using different methods.

As can be observed in Table 3, DAS-KSVD outperforms all other methods, in both their binary and multiclass versions. The application of DAS-KSVD resulted in an AUC value of 0.957 and a sensitivity, specificity and accuracy of 87.56%, 88.32% and 87.94%, respectively. The method with the second best performance is the multiclass version of MDCS (MDCS-MC). This method achieved an AUC value of 0.942 and a sensitivity, specificity and accuracy of 86.13%, 86.62% and 86.37%, respectively. In addition, if we compare the results yielded by DAS-KSVD with those obtained by MDCS-MC, then it can be concluded that DAS-KSVD significantly enhances the overall performance achieved by MDCS-MC (assuming a p-value of 0.05).

It is important to note that, in most cases, multiclass-based methods outperform binary-based ones in the detection of the pathology. In fact, the application of MDCS-MC and MDAS-MC resulted in better performances than the ones yielded by their respective binary versions. For instance, MDAS-MC obtained an AUC value of 0.913, an improvement of 0.022 (2.2 percentage points) over MDAS-BC, which achieved an AUC value of 0.891. Similarly, MDCS-MC yielded an AUC improvement of 0.005 (0.5 percentage points) with respect to MDCS-BC. On the other hand, it is appropriate to mention that although MDAS-MC improves on MDAS-BC, its overall performance still remains below that of MDCS-BC and Schlotthauer et al.

A more comprehensive analysis of Table 3 indicates that, although most discriminant methods achieved good results, DAS-KSVD outperforms all the others. The application of this method results in an area under the ROC curve of 0.957 as well as sensitivity and specificity of 87.56% and 88.32%, respectively. According to the original labels and taking into account a detection threshold of 15, the whole testing partition (287 studies) contains 216 and 71 studies diagnosed as pathological and normal (or healthy) patients, respectively. An 87.56% sensitivity indicates that of the 216 pathological cases, 189 were correctly detected (true positives) while 27 were missed (false negatives). On the other hand, an 88.32% specificity indicates that of the 71 healthy cases, 62 were appropriately identified (true negatives) while only 9 were incorrectly flagged as pathological (false positives). It is worth noting that for the 9 cases in which DAS-KSVD yielded an AHI higher than 15, most events identified by the medical expert were hypopneas and most of them were not associated with noticeable desaturations in the blood oxygen saturation signal. This fact indicates that the final scoring process was carried out following the AASM criteria. Hence, this issue may be one of the causes that led to the misclassification of hypopneas, since their distribution highly overlaps with the one corresponding to class N segments. Finally, if we look at the blood oxygen saturation signal, there are many cases where it becomes difficult to distinguish between class H and N segments.

6 Conclusions

In this article, with the objective of OSAH syndrome screening, we applied a previously developed method called DAS-KSVD to classify segments of blood oxygen saturation signals into normal breathing and abnormal respiratory events in a multiclass scenario. It was found that the combined discriminant measure, which is used by DAS-KSVD in the process of building the structured dictionary, is capable of efficiently selecting the most discriminant atoms for each one of the classes. In addition, DAS-KSVD yielded a structured dictionary composed of three sub-dictionaries, each one associated with a particular class. We evaluated DAS-KSVD in two different but related applications, namely, classification of abnormal respiratory events and detection of moderate to severe OSAH syndrome. Although it is a very challenging task, the proposed method has proved to be efficient for automatically discriminating between apnea and hypopnea events in a multiclass scheme. To detect the presence or absence of events, DAS-KSVD proved more specific than the most competitive binary-based approach (MDCS-BC). This improvement is due to the ability of DAS-KSVD to separate (apnea or hypopnea) events from normal breathing. In a similar way, the application of DAS-KSVD led to the best reported performance in OSAH syndrome screening using a well-known and publicly available database. This fact constitutes strong evidence that our approach could be helpful in the development of new intelligent technologies for portable OSAH syndrome screening devices.


The authors would like to acknowledge the financial support of Consejo Nacional de Investigaciones Científicas y Técnicas, CONICET, of the Air Force Office of Scientific Research, AFOSR/SOARD, through Grant FA9550-14-1-0130, of the Universidad Nacional del Litoral through projects CAI+D 50120110100519 and CAI+D 50120110100525 and of the Universidad Tecnológica Nacional through projects TEUTIPA0004711TC and ICUTIPA0007803TC.

7 References


  • [1] Azadeh Yadollahi, Eleni Giannouli, and Zahra Moussavi. Sleep apnea monitoring and diagnosis based on pulse oximetry and tracheal sound signals. Medical & Biological Engineering & Computing, 48(11):1087–1097, 2010.
  • [2] Gastón Schlotthauer, Leandro E. Di Persia, Luis D. Larrateguy, and Diego H. Milone. Screening of obstructive sleep apnea with empirical mode decomposition of pulse oximetry. Medical Engineering & Physics, 36(8):1074–1080, August 2014.
  • [3] R.E. Rolón, L.D. Larrateguy, L.E. Di Persia, R.D. Spies, and H.L. Rufiner. Discriminative methods based on sparse representations of pulse oximetry signals for sleep apnea–hypopnea detection. Biomedical Signal Processing and Control, 33:358–367, 2017.
  • [4] J. Hedner, L. Grote, M. Bonsignore, W. McNicholas, P. Lavie, G. Parati, P. Sliwinski, F. Barbé, W. De Backer, P. Escourrou, I. Fietze, J.A. Kvamme, C. Lombardi, O. Marrone, J.F. Masa, J.M. Montserrat, T. Penzel, M. Pretl, R. Riha, D. Rodenstein, T. Saaresranta, R. Schulz, R. Tkacova, G. Varoneckas, A. Vitols, H. Vrints, and J. Zielinski. The european sleep apnoea database (esada): report from 22 european sleep laboratories. European Respiratory Journal, 38(3):635–642, 2011.
  • [5] Richard B Berry, Rohit Budhiraja, Daniel J Gottlieb, David Gozal, Conrad Iber, Vishesh K Kapur, Carole L Marcus, Reena Mehra, Sairam Parthasarathy, Stuart F Quan, et al. Rules for scoring respiratory events in sleep: update of the 2007 AASM manual for the scoring of sleep and associated events. Journal of clinical sleep medicine, 8(05):597–619, 2012.
  • [6] Sulaiman Khadadah, Philippe Lachapelle, Sushmita Pamidi, Allen E Olha, Andrea Benedetti, and RJ Kimoff. Does Scoring Of Autonomic Hypopneas Improve Clinical Decision Making In Obstructive Sleep Apnea?, pages A2606–A2606. American Thoracic Society, 2017.
  • [7] Charlene Gamaldo, Luis Buenaver, Oleg Chernyshev, Stephen Derose, Reena Mehra, Kimberly Vana, Harneet K Walia, Vanessa Gonzalez, and Indira Gurubhagavatula. Evaluation of clinical tools to screen and assess for obstructive sleep apnea. Journal of Clinical Sleep Medicine, 14(07):1239–1244, 2018.
  • [8] MB Uddin, CM Chow, and SW Su. Classification methods to detect sleep apnea in adults based on respiratory and oximetry signals: a systematic review. Physiological measurement, 39(3):03TR01, 2018.
  • [9] Félix del Campo, Andrea Crespo, Ana Cerezo-Hernández, Gonzalo C Gutiérrez-Tobal, Roberto Hornero, and Daniel Álvarez. Oximetry use in obstructive sleep apnea. Expert review of respiratory medicine, 12(8):665–681, 2018.
  • [10] Fábio Mendonça, Sheikh Shanawaz Mostafa, Antonio G Ravelo-García, Fernando Morgado-Dias, and Thomas Penzel. A review of obstructive sleep apnea detection approaches. IEEE journal of biomedical and health informatics, 23(2):825–837, 2019.
  • [11] Philip I Terrill. A review of approaches for analysing obstructive sleep apnoea-related patterns in pulse oximetry data. Respirology, 2019.
  • [12] E. Chiner, J. Signes-Costa, J. M. Arriero, J. Marco, I. Fuentes, and A. Sergado. Nocturnal oximetry for the diagnosis of the sleep apnoea hypopnoea syndrome: a method to reduce the number of polysomnographies? Thorax, 54(11):968–971, November 1999.
  • [13] Juan-Carlos Vázquez, Willis H. Tsai, W. Ward Flemons, Akira Masuda, Rollin Brant, Eric Hajduk, William A. Whitelaw, and John E. Remmers. Automated analysis of digital oximetry in the diagnosis of obstructive sleep apnoea. Thorax, 55(4):302–307, April 2000.
  • [14] R. E. Rolón, L. E. Di Persia, R. D. Spies, and H. L. Rufiner. A multi-class structured dictionary learning method using discriminant atom selection. CoRR, abs/1812.01389, 2018.
  • [15] Goran Medic, Micheline Wille, and Michiel EH Hemels. Short-and long-term health consequences of sleep disruption. Nature and science of sleep, 9:151–161, 2017.
  • [16] Emilio García-Díaz, Esther Quintana-Gallego, Aránzazu Ruiz, Carmen Carmona-Bernal, Angeles Sánchez-Armengol, Georgina Botebol-Benhamou, and Francisco Capote. Respiratory polygraphy with actigraphy in the diagnosis of sleep apnea-hypopnea syndrome. Chest, 131(3):725–732, March 2007.
  • [17] S. F. Quan, B. V. Howard, C. Iber, J. P. Kiley, F. J. Nieto, G. T. O’Connor, D. M. Rapoport, S. Redline, J. Robbins, J. M. Samet, and P. W. Wahl. The Sleep Heart Health Study: design, rationale, and methods. Sleep, 20(12):1077–1085, 1997.
  • [18] Laurens Van Der Maaten, Eric Postma, and Jaap Van den Herik. Dimensionality reduction: a comparative review. J Mach Learn Res, 10(66-71):13, 2009.
  • [19] Michael Elad. Sparse and Redundant Representations. Springer-Verlag New York, 2010.
  • [20] Scott Shaobing Chen, David L Donoho, and Michael A Saunders. Atomic decomposition by basis pursuit. SIAM review, 43(1):129–159, 2001.
  • [21] S. G. Mallat and Zhifeng Zhang. Matching pursuits with time-frequency dictionaries. IEEE Transactions on Signal Processing, 41(12):3397–3415, 1993.
  • [22] J.A. Tropp and A.C. Gilbert. Signal Recovery From Random Measurements Via Orthogonal Matching Pursuit. IEEE Transactions on Information Theory, 53(12):4655–4666, December 2007.
  • [23] Sujit Kumar Sahoo and Anamitra Makur. Signal recovery from random measurements via extended orthogonal matching pursuit. IEEE Transactions on Signal Processing, 63(10):2572–2581, 2015.
  • [24] M. Aharon, M. Elad, and A. Bruckstein. KSVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation. IEEE Transactions on Signal Processing, 54(11):4311–4322, November 2006.
  • [25] Q. Zhang and B. Li. Discriminative K-SVD for dictionary learning in face recognition. In 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, pages 2691–2698, June 2010.
  • [26] D. S. Pham and S. Venkatesh. Joint learning and dictionary construction for pattern recognition. In 2008 IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, June 2008.
  • [27] Zhuolin Jiang, Zhe Lin, and L.S. Davis. Label Consistent K-SVD: Learning a Discriminative Dictionary for Recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 35(11):2651–2664, November 2013.
  • [28] N. Rao and F. Porikli. A clustering approach to optimize online dictionary learning. In 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 1293–1296, 2012.
  • [29] X. Chen, J. Li, D. Zou, and Q. Zhao. Learn Sparse Dictionaries for Edit Propagation. IEEE Transactions on Image Processing, 25(4):1688–1698, 2016.
  • [30] Zivar Ataee and Hadis Mohseni. Structured dictionary learning using mixed-norms and group-sparsity constraint. The Visual Computer, pages 1–14, 2019.
  • [31] J. Wright, A. Y. Yang, A. Ganesh, S. S. Sastry, and Y. Ma. Robust Face Recognition via Sparse Representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(2):210–227, February 2009.
  • [32] Jianhua Lin. Divergence measures based on the shannon entropy. IEEE Transactions on Information theory, 37(1):145–151, 1991.
  • [33] RE Rolón, Iván E Gareis, Leandro E Di Persia, Ruben D Spies, and Hugo Leonardo Rufiner. Complexity-based discrepancy measures applied to detection of apnea-hypopnea events. Complexity, 2018, 2018.
  • [34] Bonnie K. Lind, James L. Goodwin, Joel G. Hill, Tauqeer Ali, Susan Redline, and Stuart F. Quan. Recruitment of healthy adults into a study of overnight sleep monitoring in the home: experience of the Sleep Heart Health Study. Sleep and Breathing, 7(1):13–24, 2003.
  • [35] J. A. Swets. ROC analysis applied to the evaluation of medical imaging techniques. Investigative Radiology, 14(2):109–121, 1979.