RespNet: A deep learning model for extraction of respiration from photoplethysmogram

Respiratory ailments afflict a wide range of people and manifest themselves through conditions such as asthma and sleep apnea. Continuous monitoring of chronic respiratory ailments is seldom used outside the intensive care ward due to the large size and cost of the monitoring system. While electrocardiogram (ECG) based respiration extraction is a validated approach, its adoption is limited by access to a suitable continuous ECG monitor. Recently, owing to the widespread adoption of wearable smartwatches with in-built photoplethysmogram (PPG) sensors, PPG is being considered a viable candidate for continuous and unobtrusive respiration monitoring. Research in this domain, however, has predominantly focused on estimating respiration rate from PPG. In this work, a novel end-to-end deep learning network called RespNet is proposed to extract the respiration signal from a given input PPG, as opposed to extracting the respiration rate. The proposed network was trained and tested on two different datasets utilizing different modalities of reference respiration signal recordings. The similarity and performance of the proposed network were also compared against two conventional signal processing approaches for extracting the respiration signal. The proposed method was tested on two independent datasets, yielding Mean Squared Errors of 0.262 and 0.145. The cross-correlation coefficients for the respective datasets were found to be 0.933 and 0.931. The reported errors and similarity were found to be better than those of the conventional approaches. The proposed approach would aid clinicians in providing a comprehensive evaluation of sleep-related respiratory conditions and chronic respiratory ailments while being comfortable and inexpensive for the patient.




I Introduction

Measurement and monitoring of an individual's respiration is useful in a plethora of clinical conditions such as pulmonary diseases and sleep-related respiratory ailments. An abnormal respiration pattern is a clinically relevant event for identifying patient deterioration under intensive care [1]. Dysfunctional breathing refers to a group of breathing disorders that have been associated with a wide range of chronic respiratory ailments such as asthma and chronic obstructive pulmonary disease (COPD) [2]. Sleep apneas not only cause significant disruptions to sleep but also increase the risk of various cardiac diseases such as hypertrophy and heart failure [3]. Traditional measurement of respiration is carried out using either a spirometer (airflow), pneumography (chest movement), or abdominal electromyography (diaphragm muscle activity). These measurement modalities are too cumbersome and expensive for widespread adoption in a general ward or home setting. This necessitates the use of unobtrusive sensors for obtaining respiratory information from patients in free-living conditions. Indirect measurement of respiration can be obtained from either the ECG or the PPG; the amplitude, baseline and frequency modulations of the ECG and PPG signals due to respiration are well documented in the literature [4]. Many consumer-grade activity monitors, and smartwatches in particular, allow for inexpensive and ambulatory monitoring of PPG. Research in the domain of respiratory information extraction from PPG has, however, revolved around estimating respiratory rate.

Despite the substantial diagnostic value offered by respiration rate, extracting information about the respiratory pattern from ECG and PPG would allow for a more comprehensive evaluation of sleep conditions and other chronic respiratory ailments. The accessibility and comfort of PPG make it a viable candidate for extraction of respiration information over ECG. Prinable et al. [5] propose a novel approach for estimating tidal volume from PPG using features extracted by applying various bandpass filters. However, this method was limited to 30 features and was evaluated on a dataset comprising a single healthy volunteer, which limits the variability available for the respiration extraction task. The task of extracting a respiration waveform from PPG can be formulated as a deep learning task similar to image segmentation, wherein a set of filters is learned to transform an input image into a mask. U-Net is a well-established deep learning network for image segmentation which uses a fully convolutional architecture with "crop and concatenate" operations between pairs of encoder and decoder sections [6]. U-Net and the plethora of networks adapted from it, such as ResU-Net [7] and V-Net [8], have shown exemplary performance in image and volumetric medical segmentation. Recently, Stoller et al. [9] proposed Wave-U-Net, which performs sound source separation in the 1D audio domain. This method, however, uses the rudimentary version of U-Net as proposed by Ronneberger et al. [6]. A deep learning approach to extract the respiratory signal from a PPG sensor would provide tremendous utility in the fitness and clinical domains. To this end, we have developed an end-to-end deep learning framework to extract the respiratory signal from an input PPG signal. This paper emphasizes a novel approach to separate a desired encoded signal contained within an input signal through training against a target reference signal.

In summary, the contributions in this paper are as follows:

  • We propose a fully convolutional network to perform end-to-end respiratory signal extraction from an input PPG signal.

  • We study the performance of the network on two datasets using different modalities of respiratory measurement.

  • We extensively evaluated the proposed method on the two datasets and compared signal similarity and reconstruction error against two state-of-the-art signal processing methods.

II Methodology

II-A Problem formulation

The task is to extract a respiratory signal from a given input PPG signal. The dataset consists of input PPG signals x ∈ R^N and reference respiration signals y ∈ R^N, where N is the window length in samples. The reference respiration signal y is of the same size as the input PPG signal x.

The proposed RespNet network is designed in the topology of a fully convolutional encoder-decoder. The encoder section takes x as input and, through downsampling, produces feature vectors z. The decoder section uses z as its input and, through upsampling, produces a predicted respiration signal ŷ. These are represented by equations 1 and 2:

    z = f_enc(x; θ_enc)    (1)
    ŷ = f_dec(z; θ_dec)    (2)

where f_enc and f_dec are function representations of the encoder and decoder with parameters θ_enc and θ_dec. The decoder network outputs ŷ, which denotes the predicted respiratory signal.

The parameters of the proposed architecture are optimized by minimizing the smooth L1 loss between ŷ and y. The loss function L is defined as:

    L(ŷ, y) = (1/N) Σ_i l_i,   where
    l_i = 0.5 (ŷ_i − y_i)^2     if |ŷ_i − y_i| < 1
    l_i = |ŷ_i − y_i| − 0.5     otherwise
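The piecewise loss above can be checked with a few lines of plain Python. This is an illustrative sketch of the smooth L1 computation, not the authors' PyTorch implementation:

```python
def smooth_l1(pred, target):
    """Smooth L1 (Huber-like) loss, averaged over all samples."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # quadratic near zero, linear for large errors
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total / len(pred)

print(smooth_l1([0.5, 2.0], [0.0, 0.0]))  # (0.125 + 1.5) / 2 = 0.8125
```

The quadratic region keeps gradients small for near-correct predictions, while the linear region limits the influence of outlier samples.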
II-B Model Architecture

The proposed network is adapted from the IncResU-Net network [10], which was designed for 2D medical image segmentation. The architecture of the proposed fully convolutional network for respiration signal extraction is shown in Fig. 1. The encoder section is divided into eight levels to perform the downsampling operation. A 1D convolution of kernel size 1×4 is used to downsample the input features. Instead of carrying out the downsampling operation using max-pooling, strided convolution is used to improve training efficiency [11]. The downsampling operation decreases the input size while increasing the number of filters at each encoder level by a factor of two until the number of encoder filters reaches 512, after which subsequent encoder levels are maintained at 512 filters. In each encoder level, 1D convolution with stride 4 is applied, followed by Batch Normalization and leaky ReLU (with slope 0.2).
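How strided convolution downsamples a 1D signal can be seen in a minimal single-channel sketch (the actual encoder uses multi-channel learned filters; the kernel here is an arbitrary averaging example):

```python
def strided_conv1d(x, kernel, stride):
    """Valid 1D convolution with a stride.

    Output length = (len(x) - len(kernel)) // stride + 1.
    """
    k = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(x) - k + 1, stride)
    ]

# A size-4 kernel with stride 4 shrinks the signal length by a factor of 4,
# combining downsampling and filtering in one learned operation.
y = strided_conv1d(list(range(16)), [0.25] * 4, stride=4)
print(len(y))  # 4
```

Replacing max-pooling with a strided convolution lets the network learn its own downsampling filter instead of using a fixed one.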

The output of each encoder level is then provided to the dilated residual inception block, as seen in Fig. 2. Use of the dilated residual inception block provides a larger receptive field without a significant increase in parameters. Further, the residual connections within the block greatly reduce the vanishing gradient problem and reduce convergence time during training [12]. The decoder section of the proposed network utilizes feature concatenation between the feature map of its corresponding encoder pair at the respective level, similar to the original U-Net [6].
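The receptive-field benefit of dilation can be illustrated with a minimal single-channel dilated convolution. This is a sketch of the basic operation only; the block in Fig. 2 combines several such branches with residual connections:

```python
def dilated_conv1d(x, kernel, dilation):
    """Valid 1D convolution whose taps are spaced `dilation` samples apart."""
    k = len(kernel)
    span = dilation * (k - 1) + 1  # receptive field of this single layer
    out = [
        sum(x[i + dilation * j] * kernel[j] for j in range(k))
        for i in range(len(x) - span + 1)
    ]
    return out, span

# A 3-tap kernel with dilation 4 covers 9 input samples using only 3 weights,
# widening the receptive field without adding parameters.
_, span = dilated_conv1d([0.0] * 32, [1.0, 1.0, 1.0], dilation=4)
print(span)  # 9
```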

Fig. 1: Proposed Respiration Extraction Network: Encoder-Decoder architecture utilizing Dilated residual inception blocks
Fig. 2: Dilated residual inception block

After performing the convolution and dilated residual convolution operations, upsampling is performed using a deconvolution (transposed convolution) operation at each level of the decoder. In the final level of the decoder, a 1×1 1D convolution is performed to map the feature channels to the desired number of output channels.
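The transposed-convolution upsampling can likewise be sketched for a single channel (an illustrative toy; stride and kernel size are example values, not the network's learned filters):

```python
def transposed_conv1d(x, kernel, stride):
    """Minimal 1D transposed convolution.

    Output length = (len(x) - 1) * stride + len(kernel).
    """
    k = len(kernel)
    out = [0.0] * ((len(x) - 1) * stride + k)
    for i, v in enumerate(x):
        # each input sample spreads the kernel onto the strided output grid
        for j in range(k):
            out[i * stride + j] += v * kernel[j]
    return out

# Upsampling a length-4 signal with stride 4 and a size-4 kernel yields
# length (4 - 1) * 4 + 4 = 16, inverting the encoder's 4x downsampling.
y = transposed_conv1d([1.0, 2.0, 3.0, 4.0], [1.0] * 4, stride=4)
print(len(y))  # 16
```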

III Dataset Description

To perform training and evaluation of the proposed network, we make use of two distinct and publicly available datasets: CapnoBase [13] and Vortal [14]. While both datasets record ECG, PPG and a reference respiratory signal, the reference signals used differ: the reference respiratory signal in the CapnoBase dataset was collected using capnometry, whereas the Vortal dataset used Impedance Pneumography and oral-nasal pressure signals. Details of the datasets, including the different sampling rates, can be found in Table I. The CapnoBase dataset comprises 8-minute recordings of ECG and transmittance PPG along with capnometry from 42 subjects (13 adults, 29 children and neonates). The data collection was performed on patients undergoing elective surgery or routine anesthesia. The Vortal dataset consists of recordings of ECG, reflectance PPG and reference respiratory signals, collected from healthy volunteers of different age groups in a supine posture. The PPG and reference respiration signals from both datasets were resampled to 256 Hz to ensure compatibility with the proposed network, which requires an input and label of size 2048 (256×8). We extracted 8-second windows of PPG and reference respiration signal from both datasets and prepared an 80-20 train-test split for each. Table I summarizes the dataset sizes used for training and testing.
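The windowing and split described above can be sketched as follows. The constants mirror the values stated in the text (256 Hz, 8-second windows, 80-20 split); the function itself is an illustrative sketch, not the authors' preprocessing pipeline:

```python
def make_windows(signal, fs=256, window_s=8, train_frac=0.8):
    """Cut a resampled signal into non-overlapping windows and split train/test."""
    n = fs * window_s  # 256 Hz * 8 s = 2048 samples per window
    windows = [signal[i:i + n] for i in range(0, len(signal) - n + 1, n)]
    split = int(train_frac * len(windows))
    return windows[:split], windows[split:]

# Ten windows' worth of samples -> 8 training windows, 2 test windows.
train, test = make_windows([0.0] * (2048 * 10))
print(len(train), len(test))  # 8 2
```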

Dataset     PPG sampling rate    Respiration reference sampling rate    8-second windows (total / train / test)
CapnoBase   100 Hz               25 Hz                                  2520 / 2016 / 504
Vortal      500 Hz               25 Hz                                  10443 / 8354 / 2089
TABLE I: Dataset Description

IV Experiments and results

Fig. 3: (a) Sample PPG, reference respiration signal and RespNet prediction for the CapnoBase dataset; (b) sample PPG, reference respiration signal and RespNet prediction for the Vortal dataset

IV-A Training Method

During training, the network was initialized with random weights. The smooth L1 error was computed between the network prediction and the ground-truth signal for each minibatch of 256 input windows. The network parameters were optimized using Stochastic Gradient Descent, with the learning rate set to 0.01 and the momentum to 0.7. Training was carried out for 2000 epochs. The model was developed and implemented in PyTorch [15]. Training was carried out on a workstation with an Intel Core i7-8700K CPU and an Nvidia GTX 1080 Ti 11 GB GPU. Training and evaluation of the proposed network were carried out separately for the two datasets due to the difference in the sensing modality used to acquire their respective reference respiration signals.
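The momentum update behind the optimizer setting above can be sketched as a single parameter step. This uses one common formulation of SGD with momentum (PyTorch's built-in optimizer folds the learning rate in slightly differently) with the stated lr = 0.01 and momentum = 0.7:

```python
def sgd_momentum_step(w, grad, velocity, lr=0.01, momentum=0.7):
    """One classical momentum update: v <- momentum*v - lr*grad; w <- w + v."""
    v = [momentum * vi - lr * gi for vi, gi in zip(velocity, grad)]
    w = [wi + vi for wi, vi in zip(w, v)]
    return w, v

# A gradient of 0.5 moves the weight by -lr * 0.5 = -0.005 on the first step;
# the velocity term then accumulates across subsequent steps.
w, v = sgd_momentum_step([1.0], [0.5], [0.0])
print(w, v)
```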

IV-B Evaluation Method

The respiratory signals extracted from the CapnoBase and Vortal PPG datasets using the proposed network were validated against their corresponding ground-truth respiratory signals. Further, comparative metrics were obtained through a similar evaluation of the amplitude modulation (WAM) and frequency modulation (WFM) respiratory signals extracted from the same PPG datasets using the RRest toolbox [16]. Signal similarity was evaluated using cross-correlation and Mean Squared Error (MSE), which are commonly used to measure the similarity of two signals [17] [18]. Lag was also evaluated along with cross-correlation to study whether the choice of respiration signal reference influences the lag between the model prediction and the reference. The WAM and WFM predictions from the RRest library were obtained using the input PPG signal. The respiration predictions provided by the library were sampled at 60 Hz; hence, the predictions obtained from the RespNet network were downsampled accordingly before evaluation. Min-max normalization was applied to all signals to scale the values between 0 and 1 before determining the comparison metrics. MSE and cross-correlation were computed for the WAM, WFM and RespNet predictions against the normalized reference respiration signal for both datasets. Table II reports the performance of the proposed RespNet network against the conventional methods through mean cross-correlation, mean lag and mean MSE.
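The evaluation metrics above can be sketched in plain Python. This is an illustrative implementation of min-max normalization, MSE, and lag-searched cross-correlation, not the exact code used in the paper:

```python
def min_max(x):
    """Scale a signal into the [0, 1] range."""
    lo, hi = min(x), max(x)
    return [(v - lo) / (hi - lo) for v in x]

def mse(a, b):
    """Mean squared error between two equal-length signals."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) / len(a)

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length signals."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / (va * vb) ** 0.5

def best_xcorr(a, b, max_lag):
    """Cross-correlation over a range of lags; returns (best coefficient, best lag)."""
    best_r, best_lag = -2.0, 0
    for lag in range(-max_lag, max_lag + 1):
        aa = a[max(0, lag): len(a) + min(0, lag)]
        bb = b[max(0, -lag): len(b) + min(0, -lag)]
        r = pearson(aa, bb)
        if r > best_r:
            best_r, best_lag = r, lag
    return best_r, best_lag

# A signal compared against itself: perfect correlation at zero lag, zero MSE.
sig = min_max([3.0, 1.0, 4.0, 1.0, 5.0, 9.0, 2.0, 6.0])
r, lag = best_xcorr(sig, sig, max_lag=2)
print(round(r, 3), lag, mse(sig, sig))  # 1.0 0 0.0
```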

Dataset     Method           MSE      Cross-Correlation    Lag
CapnoBase   WAM              0.301    0.925                0.024
            WFM              0.364    0.858                0.014
            RespNet (Ours)   0.262    0.933                0.004
Vortal      WAM              0.247    0.927                1.929
            WFM              0.272    0.853                1.706
            RespNet (Ours)   0.145    0.931                0.052
TABLE II: Comparison between RespNet and other methods

V Discussion

As can be seen in Table II, the proposed network RespNet shows better performance in the task of respiration signal extraction from an input PPG when compared to the conventional approaches. The respiration predictions provided by RespNet have lower MSE while exhibiting high cross-correlation and low lag with the reference respiration signal for both datasets. The lower lag of the proposed network is most apparent in the evaluation on the Vortal dataset, wherein a different reference respiration modality (Impedance Pneumography) was used. This also shows a unique advantage of such a learning method over standard signal processing approaches: it can adapt to a diverse range of respiration sensing modalities during training. Fig. 3(a) shows a sample input PPG signal from the CapnoBase dataset along with the reference respiration signal and the RespNet prediction; Fig. 3(b) shows the same for the Vortal dataset.

VI Conclusion

The present work describes a novel approach to extract the respiration signal from PPG, as opposed to performing respiratory rate estimation. The proposed end-to-end deep learning framework takes a PPG signal as input and allows training with any corresponding reference respiratory signal. We report superior performance compared to traditional methods, with MSEs of 0.262 and 0.145 and cross-correlations of 0.933 and 0.931 for the respective datasets. This indicates the feasibility of extracting the respiratory signal from wearable devices for a variety of applications, including an inexpensive approach to monitoring breathing retraining exercises. However, extensive training on a wide range of breathing anomalies, along with the corresponding performance studies, still needs to be carried out. Future scope of the proposed study includes improving network performance in detecting inspiratory and expiratory loads and exploring the feasibility of performing respiration extraction from a wrist-worn reflectance PPG sensor. Additionally, the performance of the network under mild and major motion conditions requires evaluation.


Acknowledgment

The authors would like to acknowledge Dr. Peter H Charlton from King’s College London for providing access to the Vortal dataset for carrying out this study and for the development of the RRest library in MATLAB.


References

  • [1] P. Theerawit, Y. Sutherasan, L. Ball, and P. Pelosi, “Respiratory monitoring in adult intensive care unit,” Expert Review of Respiratory Medicine, vol. 11, no. 6, pp. 453–468, 2017.
  • [2] R. Boulding, R. Stacey, R. Niven, and S. J. Fowler, “Dysfunctional breathing: A review of the literature and proposal for classification,” European Respiratory Review, vol. 25, no. 141, pp. 287–294, 2016.
  • [3] T. D. Bradley and J. S. Floras, “Obstructive sleep apnoea and its cardiovascular consequences,” The Lancet, vol. 373, no. 9657, pp. 82–93, 2009.
  • [4] D. Clifton, G. J. Douglas, P. S. Addison, and J. N. Watson, “Measurement of respiratory rate from the photoplethysmogram in chest clinic patients,” Journal of Clinical Monitoring and Computing, vol. 21, no. 1, pp. 55–61, 2007.
  • [5] J. B. Prinable, P. W. Jones, C. Thamrin, and A. Mcewan, “Using a recurrent neural network to derive tidal volume from a photoplethysmograph,” 2018.
  • [6] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Lecture Notes in Computer Science, vol. 9351, pp. 234–241, 2015.
  • [7] S. M. Shankaranarayana, K. Ram, K. Mitra, and M. Sivaprakasam, “Joint optic disc and cup segmentation using fully convolutional and adversarial networks,” in Fetal, Infant and Ophthalmic Medical Image Analysis, (Cham), pp. 168–176, Springer International Publishing, 2017.
  • [8] F. Milletari, N. Navab, and S. A. Ahmadi, “V-Net: Fully convolutional neural networks for volumetric medical image segmentation,” in Proceedings of the 2016 4th International Conference on 3D Vision (3DV), pp. 565–571, 2016.
  • [9] D. Stoller, S. Ewert, and S. Dixon, “Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation,” 2018.
  • [10] S. M. Shankaranarayana, K. Ram, K. Mitra, and M. Sivaprakasam, “Fully convolutional networks for monocular retinal depth estimation and optic disc-cup segmentation,” arXiv preprint arXiv:1606.04797, February 2019.
  • [11] J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, “Striving for simplicity: The all convolutional net,” in ICLR Workshop, 2015.
  • [12] K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
  • [13] W. Karlen, M. Turner, E. Cooke, G. Dumont, and J. M. Ansermino, “Capnobase: Signal database and tools to collect, share and annotate respiratory signals,” in 2010 Annual Meeting of the Society for Technology in Anesthesia, pp. 25–25, Society for Technology in Anesthesia, 2010.
  • [14] P. H. Charlton, T. Bonnici, L. Tarassenko, D. A. Clifton, R. Beale, and P. J. Watkinson, “An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram,” Physiological Measurement, vol. 37, no. 4, pp. 610–626, 2016.
  • [15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
  • [16] P. H. Charlton, M. Villarroel, and F. Salguiero, “Waveform analysis to estimate respiratory rate,” in Secondary Analysis of Electronic Health Records, pp. 377–390, Springer, 2016.
  • [17] L. Estrada, A. Torres, L. Sarlabous, and R. Jané, “Emg-derived respiration signal using the fixed sample entropy during an inspiratory load protocol,” in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, pp. 1703–1706, IEEE, 2015.
  • [18] L. V. Batista, E. U. K. Melcher, and L. C. Carvalho, “Compression of ecg signals by optimized quantization of discrete cosine transform coefficients,” Medical engineering & physics, vol. 23, no. 2, pp. 127–134, 2001.