I Introduction
Measurement and monitoring of an individual's respiration is useful in a plethora of clinical conditions such as pulmonary diseases and sleep-related respiratory ailments. An abnormal respiration pattern is a clinically relevant event for identifying patient deterioration under intensive care [1]. Dysfunctional breathing refers to a group of breathing disorders that have been associated with a wide range of chronic respiratory ailments such as asthma and chronic obstructive pulmonary disease (COPD) [2]. Sleep apneas not only cause significant disruptions to sleep but also increase the risk of various cardiac conditions such as hypertrophy and heart failure [3]. Traditionally, respiration is measured using a spirometer (airflow), pneumography (chest movement), or abdominal electromyography (diaphragm muscle activity). These measurement modalities are too cumbersome and expensive for widespread adoption in a general ward or home setting, which necessitates unobtrusive sensors for obtaining respiratory information from patients in free-living conditions. Indirect measurement of respiration can be obtained from either the ECG or the PPG: the amplitude, baseline, and frequency modulation of these signals by respiration is well documented in the literature [4]. Many consumer-grade activity monitors and smartwatches, in particular, allow for inexpensive and ambulatory monitoring of PPG. Research on extracting respiratory information from PPG has largely revolved around estimating respiratory rate.
Despite the substantial diagnostic value offered by respiratory rate, extracting the full respiratory pattern from ECG or PPG would allow a more comprehensive evaluation of sleep conditions and other chronic respiratory ailments. The accessibility and comfort of PPG make it a more viable candidate than ECG for extracting respiratory information. Prinable et al. [5] proposed a novel approach for estimating tidal volume from PPG using features extracted by applying various bandpass filters. However, this method was limited to 30 features and was evaluated on a dataset comprising a single healthy volunteer, which limits the variability available for the respiration extraction task. Extracting a respiration waveform from PPG can be formulated as a deep learning task similar to image segmentation, wherein a set of filters is learned to transform an input image into a mask. U-Net is a well-established fully convolutional network for image segmentation that uses "crop and concatenate" operations between pairs of encoder and decoder sections [6]. U-Net and the plethora of networks adapted from it, such as ResU-Net [7] and V-Net [8], have shown exemplary performance in 2D and volumetric medical image segmentation. Recently, Stoller et al. [9] proposed Wave-U-Net, which performs sound source separation in the 1D audio domain; this method, however, uses the original version of U-Net as proposed by Ronneberger et al. [6]. A deep learning approach to extract the respiratory signal from a PPG sensor would provide tremendous utility in the fitness and clinical domains. To this end, we have developed an end-to-end deep learning framework to extract the respiratory signal from an input PPG signal. This paper presents a novel approach to separate a desired encoded signal contained within an input signal by training against a target reference signal. In summary, the contributions of this paper are as follows:
- We propose a fully convolutional network to perform end-to-end respiratory signal extraction from an input PPG signal.
- We study the performance of the network on two datasets that use different modalities of respiratory measurement.
- We extensively evaluate the proposed method on the two datasets and compare its similarity and signal reconstruction error performance against two state-of-the-art signal processing methods.
II Methodology
II-A Problem Formulation
The task is to extract a respiratory signal from a given input PPG signal. The dataset consists of the input PPG signal $x$ and the reference respiration signal $y$, where $x \in \mathbb{R}^{N}$ and $y \in \mathbb{R}^{N}$. The reference respiration signal is of the same size as the input PPG signal.
The proposed RespNet network is designed in the topology of a fully convolutional encoder-decoder. The encoder section takes $x$ as input and, through downsampling, produces feature vectors $z$. The decoder section uses $z$ as its input and, through upsampling, produces a predicted respiration signal $\hat{y}$. These operations are represented by Equations (1) and (2):

$$z = f_{enc}(x;\, \theta_{enc}) \tag{1}$$

$$\hat{y} = f_{dec}(z;\, \theta_{dec}) \tag{2}$$

where $f_{enc}$ and $f_{dec}$ are function representations of the encoder and decoder with parameters $\theta_{enc}$ and $\theta_{dec}$. The decoder network outputs $\hat{y}$, which denotes the predicted respiratory signal.
The parameters of the proposed architecture are optimized by minimizing the smooth L1 loss between $y$ and $\hat{y}$. The loss function $L$ is defined as:

$$L(y, \hat{y}) = \frac{1}{N} \sum_{i=1}^{N} l_i \tag{3}$$

$$l_i = \begin{cases} 0.5\,(y_i - \hat{y}_i)^2 & \text{if } |y_i - \hat{y}_i| < 1 \\ |y_i - \hat{y}_i| - 0.5 & \text{otherwise} \end{cases} \tag{4}$$
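For concreteness, Equations (3) and (4) correspond to the standard smooth L1 (Huber-style) loss. A minimal PyTorch sketch, assuming the usual threshold of 1, is shown below:

```python
import torch
import torch.nn.functional as F

def smooth_l1(y_hat: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Element-wise loss per Eq. (4), averaged over the window per Eq. (3).
    diff = torch.abs(y_hat - y)
    l = torch.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)
    return l.mean()

# This matches PyTorch's built-in F.smooth_l1_loss(y_hat, y).
```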
II-B Model Architecture
The proposed network is adapted from the IncResU-Net network [10], which was designed for 2D medical image segmentation. The architecture of the proposed fully convolutional network for respiration signal extraction is shown in Fig. 1. The encoder section is divided into eight levels that perform the downsampling operation. A 1D convolution of size 1×4 is used to downsample the input features; strided convolution is used instead of max-pooling to improve training efficiency [11]. The downsampling operation decreases the input size while doubling the number of filters at each encoder level until the filter count reaches 512, after which subsequent encoder levels maintain 512 filters. In each encoder level, a 1D convolution with stride 4 is applied, followed by batch normalization and leaky ReLU (with slope 0.2).
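A minimal PyTorch sketch of one such encoder level is given below; the kernel size of 4 and the absence of padding are assumptions consistent with the stride-4 downsampling described above:

```python
import torch.nn as nn

def encoder_level(in_ch: int, out_ch: int) -> nn.Sequential:
    # Strided convolution downsamples the sequence by 4, replacing max-pooling.
    return nn.Sequential(
        nn.Conv1d(in_ch, out_ch, kernel_size=4, stride=4),
        nn.BatchNorm1d(out_ch),
        nn.LeakyReLU(0.2),
    )
```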
The output of each encoder level is then provided to the dilated residual inception block shown in Fig. 2. The dilated residual inception block provides a larger receptive field without a significant increase in parameters. Further, the residual connections within the block mitigate the vanishing gradient problem and reduce convergence time during training [12]. The decoder section of the proposed network concatenates the feature map of its corresponding encoder pair at each level, similar to the original U-Net [6].
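The following PyTorch sketch illustrates the idea behind such a block; the number of parallel branches and the dilation rates (1, 2, 4, 8) are illustrative assumptions rather than the exact configuration of the proposed network:

```python
import torch
import torch.nn as nn

class DilatedResidualInception(nn.Module):
    def __init__(self, channels: int):  # assumes channels divisible by 4
        super().__init__()
        # Parallel convolutions with growing dilation widen the receptive field
        # without greatly increasing the parameter count.
        self.branches = nn.ModuleList([
            nn.Conv1d(channels, channels // 4, kernel_size=3, padding=d, dilation=d)
            for d in (1, 2, 4, 8)
        ])
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([b(x) for b in self.branches], dim=1)  # inception-style concat
        return self.act(out + x)  # residual connection eases gradient flow
```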

After the convolution and dilated residual convolution operations, upsampling is performed using a deconvolution operation at each level of the decoder. In the final level of the decoder, a 1×1 1D convolution maps the feature channels to the desired number of output channels.
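A decoder level along these lines might look as follows in PyTorch; the kernel sizes and the post-concatenation convolution are assumptions used to make the sketch self-contained:

```python
import torch
import torch.nn as nn

class DecoderLevel(nn.Module):
    def __init__(self, in_ch: int, skip_ch: int, out_ch: int):
        super().__init__()
        # Deconvolution (transposed convolution) upsamples by 4,
        # mirroring the stride-4 downsampling in the encoder.
        self.up = nn.ConvTranspose1d(in_ch, out_ch, kernel_size=4, stride=4)
        self.conv = nn.Conv1d(out_ch + skip_ch, out_ch, kernel_size=3, padding=1)
        self.act = nn.LeakyReLU(0.2)

    def forward(self, x: torch.Tensor, skip: torch.Tensor) -> torch.Tensor:
        x = self.up(x)
        x = torch.cat([x, skip], dim=1)  # concatenate encoder features (U-Net style)
        return self.act(self.conv(x))

# Final level: a 1x1 convolution maps features to one output channel.
final_conv = nn.Conv1d(64, 1, kernel_size=1)  # 64 input channels is an assumption
```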
III Dataset Description
To train and evaluate the proposed network, we use two distinct and publicly available datasets: CapnoBase [13] and Vortal [14]. While both datasets record ECG, PPG, and a reference respiratory signal, the reference signal differs between the two: the reference respiratory signal in the CapnoBase dataset was collected using capnometry, whereas the Vortal dataset used impedance pneumography and oral-nasal pressure signals. Details about the datasets, including their different sampling rates, can be found in Table I. The CapnoBase dataset comprises 8-minute recordings of ECG and transmittance PPG along with capnometry from 42 subjects (13 adults, 29 children and neonates); data collection was performed on patients undergoing elective surgery or routine anesthesia. The Vortal dataset consists of recordings of ECG, reflectance PPG, and reference respiratory signals collected from healthy volunteers of different age groups in a supine posture. The PPG signals and reference respiration signals from both datasets were resampled to 256 Hz to ensure compatibility with the proposed network, which requires an input and label of size 2048 (256×8). We extract 8-second windows of PPG and reference respiration signal from both datasets and prepare an 80-20 train-test split for each. Table I summarizes the dataset sizes used for training and testing.
| Dataset | PPG sampling rate | Respiration sampling rate | Total windows | Training windows | Testing windows |
|---|---|---|---|---|---|
| CapnoBase | 100 Hz | 25 Hz | 2520 | 2016 | 504 |
| Vortal | 500 Hz | 25 Hz | 10443 | 8354 | 2089 |
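A minimal preprocessing sketch along the lines described above is shown below; the use of `scipy.signal.resample` and non-overlapping windows are assumptions, as the paper does not specify the resampling method or the window hop:

```python
import numpy as np
from scipy.signal import resample

FS_TARGET = 256      # Hz; network input rate
WIN = 8 * FS_TARGET  # 8-second windows -> 2048 samples

def make_windows(ppg: np.ndarray, resp: np.ndarray, fs_ppg: float):
    """Resample PPG and reference respiration to 256 Hz and cut 8 s windows."""
    n_sec = int(len(ppg) / fs_ppg)
    ppg_r = resample(ppg, n_sec * FS_TARGET)
    resp_r = resample(resp, n_sec * FS_TARGET)
    n = min(len(ppg_r), len(resp_r)) // WIN  # number of whole windows
    return ppg_r[: n * WIN].reshape(n, WIN), resp_r[: n * WIN].reshape(n, WIN)
```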
IV Experiments and Results
[Figure: sample input PPG signals from the CapnoBase and Vortal datasets with the corresponding reference respiration signals and RespNet predictions; see Section V.]
IV-A Training Method
During training, the network was initialized with random weights. The smooth L1 error is computed between the network prediction and the ground-truth signal for each minibatch comprising 256 input windows. The network parameters were optimized using stochastic gradient descent with a learning rate of 0.01 and a momentum of 0.7. Training was carried out for 2000 epochs. The model was developed and implemented in PyTorch [15].
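The following sketch summarizes the training configuration stated above; `RespNet`, `x_train`, and `y_train` are hypothetical names standing in for the network and the prepared training windows:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

model = RespNet()  # hypothetical; the encoder-decoder network described above
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.7)
criterion = torch.nn.SmoothL1Loss()
loader = DataLoader(TensorDataset(x_train, y_train), batch_size=256, shuffle=True)

for epoch in range(2000):
    for xb, yb in loader:
        optimizer.zero_grad()
        pred = model(xb.unsqueeze(1))           # add channel dimension
        loss = criterion(pred, yb.unsqueeze(1))
        loss.backward()
        optimizer.step()
```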
Training was carried out on a workstation with an Intel i7-8700K CPU and an Nvidia GTX 1080 Ti 11 GB GPU. Training and evaluation of the proposed network were carried out separately for the two datasets, owing to the different sensing modalities used to acquire their reference respiration signals.

IV-B Evaluation Method
The respiratory signals extracted from the CapnoBase and Vortal PPG datasets using the proposed network were validated against their corresponding ground-truth respiratory signals. For comparison, respiratory signals were also extracted from the same PPG data via amplitude modulation (WAM) and frequency modulation (WFM) using the RRest toolbox [16] and evaluated in the same manner. Signal similarity is assessed using cross-correlation and mean square error (MSE), which are commonly used to measure the similarity of two signals [17, 18]. Lag is evaluated alongside cross-correlation to study whether the choice of reference respiration modality influences the lag between the model prediction and the reference. The WAM and WFM predictions from the RRest library are obtained from the input PPG signal. The respiration predictions provided by the library are sampled at 60 Hz, so the predictions from the RespNet network were downsampled accordingly before evaluation. Min-max normalization was applied to all signals to scale their values to the range 0 to 1 before computing the comparison metrics. MSE and cross-correlation were computed for the WAM, WFM, and RespNet predictions against the normalized reference respiration signal for both datasets. Table II reports the performance of the proposed RespNet network against the conventional methods in terms of mean cross-correlation, mean lag, and mean MSE.
| Dataset | Method | MSE | Cross-Correlation | Lag |
|---|---|---|---|---|
| CapnoBase | WAM | 0.301 | 0.925 | 0.024 |
| CapnoBase | WFM | 0.364 | 0.858 | 0.014 |
| CapnoBase | RespNet (Ours) | 0.262 | 0.933 | 0.004 |
| Vortal | WAM | 0.247 | 0.927 | 1.929 |
| Vortal | WFM | 0.272 | 0.853 | 1.706 |
| Vortal | RespNet (Ours) | 0.145 | 0.931 | 0.052 |
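As a concrete illustration of the evaluation described above, the following NumPy sketch computes MSE, peak normalized cross-correlation, and the corresponding lag for one prediction-reference pair; the exact normalization of the cross-correlation and the reporting of lag in seconds at the 60 Hz evaluation rate are assumptions:

```python
import numpy as np

def minmax(x: np.ndarray) -> np.ndarray:
    # Scale values to the range [0, 1] before computing metrics.
    return (x - x.min()) / (x.max() - x.min())

def evaluate(pred: np.ndarray, ref: np.ndarray, fs: float = 60.0):
    p, r = minmax(pred), minmax(ref)
    mse = float(np.mean((p - r) ** 2))
    pc, rc = p - p.mean(), r - r.mean()
    # Full cross-correlation, normalized to lie in [-1, 1].
    xcorr = np.correlate(pc, rc, mode="full") / (np.linalg.norm(pc) * np.linalg.norm(rc))
    k = int(np.argmax(np.abs(xcorr)))
    lag = (k - (len(r) - 1)) / fs  # offset of the correlation peak, in seconds
    return mse, float(xcorr[k]), lag
```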
V Discussion
As can be seen in Table II, the proposed RespNet network outperforms the conventional approaches in the task of respiration signal extraction from an input PPG. The respiration predictions provided by RespNet have lower MSE while maintaining higher cross-correlation and lower lag with the reference respiration signal on both datasets. The lower lag of the proposed network is most apparent on the Vortal dataset, where a different reference respiration modality (impedance pneumography) was used. This highlights a unique advantage of such a learning-based method over standard signal processing approaches: it can adapt to a diverse range of respiration sensing modalities during training. Fig. 2(a) shows a sample input PPG signal from the CapnoBase dataset along with the reference respiration signal and the RespNet prediction; Fig. 2(b) shows the same for the Vortal dataset.
VI Conclusion
The present work describes a novel approach for extracting the respiration signal from PPG, as opposed to estimating only the respiratory rate. The proposed end-to-end deep learning framework takes a PPG signal as input and allows training against any corresponding reference respiratory signal. We report superior performance compared to traditional methods, with an MSE of 0.262 and 0.145 and cross-correlation of 0.933 and 0.931 on the respective datasets. This indicates the feasibility of extracting the respiratory signal from wearable devices for a variety of applications, including an inexpensive approach to monitoring breathing retraining exercises. However, extensive training on a wide range of breathing anomalies, along with a corresponding performance study, remains to be carried out. Future work will aim to improve network performance in detecting inspiratory and expiratory loads, explore the feasibility of performing respiration extraction from a wrist-worn reflectance PPG sensor, and evaluate the performance of the network under mild and major motion conditions.
Acknowledgment
The authors would like to acknowledge Dr. Peter H. Charlton of King's College London for providing access to the Vortal dataset for this study and for developing the RRest library in MATLAB.
References
- [1] P. Theerawit, Y. Sutherasan, L. Ball, and P. Pelosi, “Respiratory monitoring in adult intensive care unit,” Expert Review of Respiratory Medicine, vol. 11, no. 6, pp. 453–468, 2017.
- [2] R. Boulding, R. Stacey, R. Niven, and S. J. Fowler, “Dysfunctional breathing: A review of the literature and proposal for classification,” European Respiratory Review, vol. 25, no. 141, pp. 287–294, 2016.
- [3] T. D. Bradley and J. S. Floras, “Obstructive sleep apnoea and its cardiovascular consequences,” The Lancet, vol. 373, no. 9657, pp. 82–93, 2009.
- [4] D. Clifton, G. J. Douglas, P. S. Addison, and J. N. Watson, “Measurement of respiratory rate from the photoplethysmogram in chest clinic patients,” Journal of Clinical Monitoring and Computing, vol. 21, no. 1, pp. 55–61, 2007.
- [5] J. B. Prinable, P. W. Jones, C. Thamrin, and A. McEwan, "Using a recurrent neural network to derive tidal volume from a photoplethysmograph," 2018.
- [6] O. Ronneberger, P. Fischer, and T. Brox, "U-Net: Convolutional networks for biomedical image segmentation," Lecture Notes in Computer Science, vol. 9351, pp. 234–241, 2015.
- [7] S. M. Shankaranarayana, K. Ram, K. Mitra, and M. Sivaprakasam, "Joint optic disc and cup segmentation using fully convolutional and adversarial networks," in Fetal, Infant and Ophthalmic Medical Image Analysis, pp. 168–176, Cham: Springer International Publishing, 2017.
- [8] F. Milletari, N. Navab, and S. A. Ahmadi, "V-Net: Fully convolutional neural networks for volumetric medical image segmentation," in Proceedings of the 2016 Fourth International Conference on 3D Vision (3DV), pp. 565–571, 2016.
- [9] D. Stoller, S. Ewert, and S. Dixon, "Wave-U-Net: A multi-scale neural network for end-to-end audio source separation," 2018.
- [10] S. M. Shankaranarayana, K. Ram, K. Mitra, and M. Sivaprakasam, "Fully convolutional networks for monocular retinal depth estimation and optic disc-cup segmentation," arXiv preprint arXiv:1902.01040, February 2019.
- [11] J. Springenberg, A. Dosovitskiy, T. Brox, and M. Riedmiller, "Striving for simplicity: The all convolutional net," in ICLR Workshop, 2015.
- [12] K. He, X. Zhang, S. Ren, and J. Sun, "Deep residual learning for image recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778, 2016.
- [13] W. Karlen, M. Turner, E. Cooke, G. Dumont, and J. M. Ansermino, "CapnoBase: Signal database and tools to collect, share and annotate respiratory signals," in 2010 Annual Meeting of the Society for Technology in Anesthesia, p. 25, Society for Technology in Anesthesia, 2010.
- [14] P. H. Charlton, T. Bonnici, L. Tarassenko, D. A. Clifton, R. Beale, and P. J. Watkinson, “An assessment of algorithms to estimate respiratory rate from the electrocardiogram and photoplethysmogram,” Physiological Measurement, vol. 37, no. 4, pp. 610–626, 2016.
- [15] A. Paszke, S. Gross, S. Chintala, G. Chanan, E. Yang, Z. DeVito, Z. Lin, A. Desmaison, L. Antiga, and A. Lerer, “Automatic differentiation in pytorch,” 2017.
- [16] P. H. Charlton, M. Villarroel, and F. Salguiero, “Waveform analysis to estimate respiratory rate,” in Secondary Analysis of Electronic Health Records, pp. 377–390, Springer, 2016.
- [17] L. Estrada, A. Torres, L. Sarlabous, and R. Jané, “Emg-derived respiration signal using the fixed sample entropy during an inspiratory load protocol,” in Engineering in Medicine and Biology Society (EMBC), 2015 37th Annual International Conference of the IEEE, pp. 1703–1706, IEEE, 2015.
- [18] L. V. Batista, E. U. K. Melcher, and L. C. Carvalho, “Compression of ecg signals by optimized quantization of discrete cosine transform coefficients,” Medical engineering & physics, vol. 23, no. 2, pp. 127–134, 2001.