Photoplethysmography (PPG) is an optical technique commonly employed in wearables and other medical devices to measure volume changes of blood in the microvascular tissue during the cardiac cycle. Light is reflected and absorbed at different rates during this cycle, and the reflected light is read by a photo-sensor to detect these changes. The output from this sensor is then processed to obtain a valid heart rate estimate.
Heart rate can be measured at multiple sites on the body using PPG, including, but not limited to, the ear, forehead, fingertip, ankle and wrist. In the context of personalised health and fitness monitoring using wearables, the wrist is the most frequently used location for photoplethysmographic heart rate monitoring. The accuracy of consumer-grade wearables is, for the most part, acceptable, but they are prone to errors during daily activities. The difficulties associated with correctly estimating heart rate arise mostly in obtaining a strong physiological reading from the sensors. The signals read from PPG modules are often heavily corrupted by motion artefacts, and the movement of the limbs is a major contributor to this artefact. A clean PPG signal can be retrieved from a heavily corrupted one by applying filtering techniques, including adaptive methods that use a measure of the artefact obtained from an accelerometer.
We have previously shown that human activity recognition (HAR) can be performed on optical signals by taking advantage of the artefact present in the signal. In this study we take this a step further, exploring to what extent HAR remains sufficiently accurate as the sampling frequency is decreased, and investigating whether a valid heart rate estimate can be obtained without further on-board filtering of the PPG signal, thus reducing computing requirements.
The battery life of smartwatches and fitness trackers varies greatly depending on the features and functionality available on board the wearable. The Apple Watch Series 5, which is more of a lifestyle and fitness tracker, can run for up to 18 hours, whereas the Fitbit Charge 3 fitness tracker can last up to 7 days on a single charge. Continuous activity and heart rate monitoring speed up battery depletion in most wearables, and gathering and processing simultaneous sensor data can further increase power consumption. Without explicitly stating the sampling frequency, Apple state that the Light Emitting Diodes (LEDs) of their heart rate monitor blink “hundreds of times per second”.
Capitalising on recent advancements in machine learning could pave the way for the simplification of wearables, allowing for a reduction in power requirements and, subsequently, smaller and lower-cost devices. The work described in this paper is part of a broader effort to develop easily deployed artificial intelligence that can be used and interpreted by end-users without deep signal processing expertise.
In this paper we demonstrate the contributions of our pipeline, which uses a standalone optical sensor for both activity recognition and heart rate monitoring at significantly reduced sampling frequencies. This novel approach improves power efficiency without significantly sacrificing accuracy, thus advancing the development of simpler, more cost-effective wearables.
Although globally people are using hospitals more efficiently, public healthcare expenditure is rising. For example, in Ireland expenditure rose from €14.9 billion in 2009 to an estimated €16.8 billion in 2018, with the increasing prevalence of chronic illness, which requires long-term patient-provider engagement and management, accounting for roughly 80% of spending [9, 13, 7]. In 2010, Frost & Sullivan predicted, based on the then-current trends, that healthcare spending in Western economies would almost double (as a proportion of GDP) by 2050, reaching 20% to 30% of GDP in some cases. The report also stated that per capita healthcare spending is rising faster than per capita income in most countries.
As a response, there is a global change in how healthcare is managed. For example, in Ireland, the Department of Health has signalled a major shift in the paradigm of treating people with illnesses: a change in health policy from reactive to proactive treatment models, where the focus is increasingly on keeping people healthy. Advancements in digital health technologies, including mHealth and MedTech, have the potential to contribute significantly to a transformation in healthcare delivery, e.g. by enabling proactive care through continuous monitoring devices and advanced data analytics that allow greater personalisation of treatments. Data gathered from wearables is therefore important as part of such a shift in healthcare provision policies.
II Related Work
Convolutional Neural Networks (CNN) have contributed tremendously to the success of machine learning since their introduction in the 1990s. They are an example of neuroscientific principles influencing deep learning, in that they are designed to mimic the processing of images in the visual cortex of the human brain. Fully automatic learning allows a CNN to extract features that are salient in the input data across different layers. Given the correct training, a CNN allows for the implementation of high-accuracy classifiers without the need for signal processing or feature extraction knowledge. This has contributed to their success in practical applications, particularly in image classification.
The current state of the art in HAR is camera-based systems, which allow direct capture of the data but consequently require significant computer processing to determine distinct activities. HAR studies are frequently carried out using data from inertial measurement units (IMU), which measure the proper acceleration of a body or limb. Signal processing, feature extraction and classification in these HAR studies are non-trivial, using techniques including (but not limited to) singular value decomposition (SVD), support vector machines (SVM) and random forests (RF). High accuracy, ranging from 80% to 99%, can be achieved with such techniques, but they often require a combination of sensor modalities and multiple IMUs located on various parts of the body, which in turn gives rise to scalability and functionality issues.
Few studies have used a PPG sensor alone for HAR, as PPG sensors are more commonly used for heart rate estimation [6, 4]. Biagetti et al. conducted an activity recognition study on the same dataset used in this paper. Using only the PPG data for HAR, they achieved 44.7% classification accuracy with their feature extraction algorithm. Later, the authors combined the PPG data with accelerometer data and achieved 78.0% accuracy using their feature extraction technique. Mehrang et al. used a combination of PPG and accelerometer data with feature extraction and classification techniques such as RF and SVM, achieving accuracy of and respectively.
It should be noted that leading, modern feature extraction and classification techniques using multiple IMUs can achieve 80% to 99% classification accuracy, but may require several sensors located throughout the body.
In one related study, a method was proposed to estimate heart rate using a CNN trained on a sequence of facial images. Reiss et al. framed heart rate estimation from PPG and accelerometer data as a regression problem, computing the Fast Fourier Transform (FFT) and applying z-normalisation to the 4 input channels of a CNN. Extending this, we develop a CNN regression architecture that estimates heart rate from a standalone PPG, using a single-channel time series without any preprocessing.
Junker, Lukowicz and Tröster downsampled wearable accelerometer data from 100 Hz in a wearable context recognition system. The authors found that they could achieve sufficient classification accuracy at sampling frequencies as low as 20 Hz. However, a significant drop in accuracy (below 60%) was observed when the sampling rate was reduced to 10 Hz.
Finally, in a study on context-aware wearable computing, the authors used a purpose-built wrist-worn wearable consisting of a two-axis accelerometer, microphone, light and temperature sensors. They found that a sampling frequency of 6 Hz yields accuracy comparable to much higher sampling rates when machine learning is applied to available time-domain features. Following this, Krause et al. demonstrated that decreasing the sampling frequency from 20 Hz to 6 Hz increases the battery life of their wearable by 85%.
III-A Computing Platform
A readily available wrist PPG exercise dataset, collected by Jarchi and Casson and publicly available on PhysioNet, was used for the experiments in this paper. Data was collected during exercise from 8 healthy participants (5 male, 3 female) at a sampling frequency of 256 Hz, using a wrist-worn PPG sensor on board the Shimmer 3 GSR+ unit. Recordings typically lasted 4 to 6 minutes, up to a maximum of 10 minutes. Four exercises were performed, two on a treadmill and two on a stationary exercise bike: walking on a treadmill, running on a treadmill, high-resistance exercise bike and low-resistance exercise bike. No further filtering was applied to the PPG data for the treadmill exercises beyond what the Shimmer unit provides on board. The exercise bike recordings contained high-frequency noise, which was filtered in MATLAB using a second-order IIR Butterworth filter with a 15 Hz cut-off frequency.
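The bike-recording filter can be sketched in Python with SciPy. This is not the MATLAB code used to prepare the dataset: it assumes zero-phase (forward-backward) filtering via `filtfilt`, which the text does not specify and which doubles the effective filter order, and it uses a synthetic 2 Hz signal with added 60 Hz noise in place of real PPG.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS_HZ = 256.0     # original sampling frequency of the dataset
CUTOFF_HZ = 15.0  # cut-off used for the exercise bike recordings

def lowpass_butter(x, fs=FS_HZ, cutoff=CUTOFF_HZ, order=2):
    """Second-order low-pass IIR Butterworth filter.

    Assumption: applied forward and backward (zero phase) with filtfilt;
    the source does not say how the MATLAB filter was applied.
    """
    b, a = butter(order, cutoff, btype="low", fs=fs)
    return filtfilt(b, a, x)

# Synthetic check: a 2 Hz "pulse" plus 60 Hz high-frequency noise.
t = np.arange(0, 10, 1 / FS_HZ)
clean = np.sin(2 * np.pi * 2 * t)
noisy = clean + 0.5 * np.sin(2 * np.pi * 60 * t)
filtered = lowpass_butter(noisy)
```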
To evaluate heart rate estimation from the unfiltered PPG, we compare it with a concurrent ECG recorded by the authors of the data-gathering experiment described above. This provides a ground truth against which to assess our heart rate estimation.
III-C Downsampling and Segmentation
Prior to segmentation and plotting, the PPG signal was downsampled to a number of different sampling frequencies. The classifier was trained in Python at the full 256 Hz sampling frequency, then retrained at each of the downsampled frequencies of 30 Hz, 15 Hz, 10 Hz, 5 Hz and 1 Hz.
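One way to produce the downsampled versions in Python is polyphase rational resampling with `scipy.signal.resample_poly`, which applies its own anti-aliasing FIR filter. The text does not state which resampling method was used, so this is an assumed choice, and the random 60 s signal is only a stand-in for a PPG recording.

```python
from fractions import Fraction

import numpy as np
from scipy.signal import resample_poly

FS_ORIGINAL = 256               # Hz
TARGET_RATES = [30, 15, 10, 5, 1]  # Hz, as listed in the text

def downsample(x, fs_in, fs_out):
    """Rational-rate resampling (assumed method) with built-in anti-aliasing."""
    ratio = Fraction(fs_out, fs_in)  # e.g. 30/256 reduces to 15/128
    return resample_poly(x, ratio.numerator, ratio.denominator)

x = np.random.default_rng(0).standard_normal(FS_ORIGINAL * 60)  # 60 s stand-in
versions = {fs: downsample(x, FS_ORIGINAL, fs) for fs in TARGET_RATES}
```

Each entry of `versions` keeps the 60 s duration, so its length is 60 × fs samples.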
Once downsampled, the signal was segmented into smaller chunks. A simple rectangular windowing function was used to capture 8 seconds of data, stepping through the signal in increments of 1 second.
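The windowing step amounts to a sliding window of 8 s with a 1 s hop, so consecutive windows overlap by 7 s. A minimal sketch, with a hypothetical `segment` helper and a synthetic 60 s signal at 10 Hz:

```python
import numpy as np

def segment(x, fs, window_s=8, step_s=1):
    """Rectangular (untapered) sliding windows: window_s long, step_s hop."""
    x = np.asarray(x)
    win = int(round(window_s * fs))
    step = int(round(step_s * fs))
    if len(x) < win:
        return np.empty((0, win), dtype=x.dtype)
    starts = range(0, len(x) - win + 1, step)
    return np.stack([x[s:s + win] for s in starts])

fs = 10                              # Hz, one of the downsampled rates
x = np.arange(60 * fs, dtype=float)  # 60 s stand-in signal
windows = segment(x, fs)             # shape (n_windows, 8 * fs)
```

A 60 s recording yields 53 windows at any sampling rate; only the number of samples per window (8 × fs) changes.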
III-D Human Activity Recognition
A CNN based on the Inception-V3 architecture and pre-trained on ImageNet was used as the classifier for the HAR experiments. The deep model was retrained using transfer learning: the penultimate layer had its weights updated while all other layers remained the same. This allowed a model with a large learning capacity, which would normally require a lot of data and time to train from scratch, to be trained with a smaller amount of data. The retraining process can be fine-tuned through the optimisation of hyperparameters. The hyperparameters were left at their defaults in this experiment, except for the number of training steps, which was changed from 10,000 to 4,000. This helped minimise overfitting while still allowing the loss function (cross-entropy) to converge sufficiently. See Figure 2 below for a block diagram of the processing pipeline associated with our methodology.
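A minimal NumPy sketch of the underlying idea: only the retrained layer's weights change, while everything before it stays frozen. Assumptions: the frozen Inception-V3 network has already mapped each image to a 2048-length feature vector (random class-dependent stand-in features here), the retrained layer is a softmax layer trained by plain gradient descent on cross-entropy, and the learning rate is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N_CLASSES = 4        # High, Low, Run, Walk
FEATURE_DIM = 2048   # assumed length of the frozen network's feature vector
LR = 1e-3            # hypothetical learning rate

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_head(features, labels, steps=4000, lr=LR):
    """Train only a new softmax layer; the feature extractor is never updated."""
    n, d = features.shape
    w = np.zeros((d, N_CLASSES))
    b = np.zeros(N_CLASSES)
    onehot = np.eye(N_CLASSES)[labels]
    for _ in range(steps):
        # Gradient of the mean cross-entropy with respect to the logits.
        grad = (softmax(features @ w + b) - onehot) / n
        w -= lr * features.T @ grad
        b -= lr * grad.sum(axis=0)
    return w, b

def cross_entropy(features, labels, w, b):
    probs = softmax(features @ w + b)
    return -np.mean(np.log(probs[np.arange(len(labels)), labels] + 1e-12))

# Stand-in for features produced by the frozen, ImageNet-pretrained network.
labels = rng.integers(0, N_CLASSES, size=200)
centres = rng.normal(0.0, 1.0, (N_CLASSES, FEATURE_DIM))
features = centres[labels] + rng.normal(0.0, 1.0, (200, FEATURE_DIM))

w, b = train_head(features, labels, steps=200)  # the text used 4,000 steps
```

Because the expensive convolutional layers are never updated, their outputs can be computed once and reused at every training step, which is what makes retraining on a few thousand images cheap.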
As this classifier takes a machine vision approach, the temporal PPG signals are saved as images, rather than time-series vectors, to be used as input. Matplotlib, a Python plotting library, was used to plot the PPG signal as images, which were saved as 299×299 JPEGs. All axis labels, legends, titles and tick marks were removed. Python’s wfdb library was used to download and load the data from PhysioNet.
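A sketch of the image-generation step with Matplotlib, assuming a 1 × 1 inch figure rendered at 299 dpi to obtain exactly 299 × 299 pixels. The helper name, line colour and width are hypothetical choices, and a synthetic sine wave stands in for a PPG window. Saving to a `.jpg` path relies on Pillow, which Matplotlib uses for JPEG output.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend for batch image generation
import matplotlib.pyplot as plt
import numpy as np

def save_segment_image(ppg_window, path, size_px=299):
    """Plot one PPG window as a bare line image of size_px x size_px pixels."""
    fig = plt.figure(figsize=(1, 1), dpi=size_px)  # 1 inch * size_px dpi
    ax = fig.add_axes([0, 0, 1, 1])                # axes fill the whole canvas
    ax.plot(ppg_window, color="black", linewidth=1)
    ax.axis("off")                                 # no labels, ticks or frame
    fig.savefig(path, dpi=size_px)                 # .jpg extension -> JPEG
    plt.close(fig)
    return path

t = np.linspace(0, 8, 80)                # 8 s window at 10 Hz
ppg_window = np.sin(2 * np.pi * 1.5 * t)  # synthetic stand-in signal
out_path = save_segment_image(ppg_window, os.path.join(tempfile.gettempdir(), "ppg_demo.jpg"))
```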
To train the HAR classifier, a total of 6,653 images were stored in four sub-directories, one for each of the possible predicted classes (High, Low, Run and Walk). A train/validation/test split was used: 80% of the data for training, 10% for validation and 10% for testing. See Figure 1 for examples of the PPG data used to train the classifier.
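The 80/10/10 split can be sketched as a shuffled partition of image indices; `split_indices` is a hypothetical helper and the seed is arbitrary.

```python
import numpy as np

def split_indices(n, train=0.8, val=0.1, seed=0):
    """Shuffle n sample indices and split them into train/validation/test."""
    idx = np.random.default_rng(seed).permutation(n)
    n_train = int(train * n)
    n_val = int(val * n)
    # Whatever remains after train and validation becomes the test set.
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(6653)
```

A random split like this one does not group overlapping windows from the same recording; splitting by recording or by participant is an alternative when leakage between subsets is a concern.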
III-E Estimation of Heart Rate
We designed a CNN with the output layer replaced by a regression layer, which we refer to as CNNR (Convolutional Neural Network with Regression). It is a four-layer 1-D convolutional network with batch normalization and ReLU (Rectified Linear Unit) activations, followed by a fully connected layer and a regression layer. The model architecture can be seen in Figure 3 below. This model is used to estimate heart rate from the noisy PPG data. We used a 90/10 train-test split for the CNNR.
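The CNNR layout can be illustrated with a NumPy forward pass. The kernel size, channel counts, the 80-sample input (8 s at 10 Hz) and the random weights are all hypothetical; batch normalization uses batch statistics without a learned scale and shift; the mean squared error loss is an assumption about the regression layer; and training is omitted.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

rng = np.random.default_rng(0)

def conv1d(x, w, b):
    """'Valid' 1-D convolution. x: (N, C_in, L), w: (C_out, C_in, K)."""
    k = w.shape[-1]
    patches = sliding_window_view(x, k, axis=2)  # (N, C_in, L-K+1, K)
    return np.einsum("nclk,ock->nol", patches, w) + b[None, :, None]

def batch_norm(x, eps=1e-5):
    """Per-channel normalisation with batch statistics (no learned scale/shift)."""
    mean = x.mean(axis=(0, 2), keepdims=True)
    var = x.var(axis=(0, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def relu(x):
    return np.maximum(x, 0.0)

def init_cnnr(in_len, channels=(1, 8, 16, 16, 32), kernel=5):
    """Random weights for four conv blocks plus one fully connected output unit."""
    convs, length = [], in_len
    for c_in, c_out in zip(channels[:-1], channels[1:]):
        convs.append((rng.normal(0, 0.1, (c_out, c_in, kernel)), np.zeros(c_out)))
        length -= kernel - 1  # each 'valid' convolution shortens the sequence
    fc_w = rng.normal(0, 0.1, (channels[-1] * length, 1))
    return convs, (fc_w, np.zeros(1))

def cnnr_forward(x, params):
    """x: (N, 1, L) raw PPG windows -> (N,) heart rate predictions."""
    convs, (fc_w, fc_b) = params
    for w, b in convs:
        x = relu(batch_norm(conv1d(x, w, b)))  # conv -> batch norm -> ReLU
    x = x.reshape(x.shape[0], -1)              # flatten for the dense layer
    return (x @ fc_w + fc_b)[:, 0]             # single linear output (regression)

def mse_loss(pred, target):
    """Mean squared error, assumed here as the regression-layer loss."""
    return np.mean((pred - target) ** 2)

batch = rng.standard_normal((4, 1, 80))  # four 8 s windows at 10 Hz
params = init_cnnr(in_len=80)
pred = cnnr_forward(batch, params)
```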
HeartPy, an open-source toolkit for estimating heart rate from PPG data, was used in our work as a baseline against which to compare the performance of our CNNR approach. The HeartPy toolkit is designed to handle clean and noisy PPG data collected from either PPG or camera sensors. For both our CNNR and HeartPy, the input was the noisy, raw PPG time series. The heart rate estimated for each segment of the signal was then compared with the concurrent ECG time series. The QRS peaks of the ECG were annotated as part of the data collection experiment, and the heart rate derived from the ECG acted as the ground truth for both the CNNR and HeartPy estimates.
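The ECG ground truth for a segment can be derived from the annotated QRS (R-peak) positions by converting RR intervals to beats per minute. A minimal sketch, assuming the annotations are available as sample indices and that the segment heart rate is taken as 60 divided by the mean RR interval (averaging instantaneous rates is an alternative); `hr_from_rpeaks` is a hypothetical helper.

```python
import numpy as np

def hr_from_rpeaks(peak_samples, fs):
    """Mean heart rate (bpm) from R-peak sample indices within one segment."""
    peaks = np.asarray(peak_samples, dtype=float)
    if peaks.size < 2:
        raise ValueError("need at least two R-peaks to form an RR interval")
    rr_s = np.diff(peaks) / fs  # RR intervals in seconds
    return 60.0 / rr_s.mean()

# Synthetic R-peaks every 0.75 s at 256 Hz, i.e. 80 bpm.
fs = 256
peaks = np.arange(0, 8 * fs, int(0.75 * fs))
hr = hr_from_rpeaks(peaks, fs)
```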
IV-A Human Activity Recognition
The results for the HAR experiment are shown in Table I below. As expected, the highest classification accuracy of 90.8% is achieved at the original sampling frequency of 256 Hz. However, very competitive classification performance is still achieved after downsampling to 5 Hz. Perhaps most surprising is that our classifier performs better at 10 Hz than at the higher sampling frequencies tested (15 Hz and 30 Hz). Given the higher accuracy at 10 Hz, we also tested sampling frequencies of 12 Hz, 11 Hz, 9 Hz and 8 Hz, and found that 10 Hz is not an anomaly: the surrounding frequencies yield similar accuracies. To investigate the 10 Hz performance further, we low-pass filtered the PPG with a 4.5 Hz cut-off frequency to remove possible aliasing, but this did not affect the classification accuracy.
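The reasoning behind the 4.5 Hz cut-off: at 10 Hz the Nyquist limit is 5 Hz, so any content above 5 Hz folds back into the 0 to 5 Hz band. The sketch below makes this concrete with a synthetic 1.5 Hz plus 7 Hz signal, where the 7 Hz tone aliases to |7 − 10| = 3 Hz unless it is removed first. The 8th-order Butterworth filter, zero-phase filtering with `sosfiltfilt` and the naive interpolation-based resampler are assumptions for illustration, not the procedure used in the experiments.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

FS_IN = 256.0  # original PPG sampling frequency (Hz)
FS_OUT = 10.0  # target sampling frequency (Hz); Nyquist limit is 5 Hz
CUTOFF = 4.5   # anti-aliasing cut-off quoted in the text (Hz)

def resample_at(x, fs_in, fs_out, n_out):
    """Naively read the signal at the new sample times (no filtering of its own)."""
    t_in = np.arange(len(x)) / fs_in
    return np.interp(np.arange(n_out) / fs_out, t_in, x)

def tone_amplitude(y, fs, freq):
    """Amplitude of a sinusoid that falls exactly on a DFT bin."""
    k = int(round(freq * len(y) / fs))
    return 2 * np.abs(np.fft.rfft(y)[k]) / len(y)

t = np.arange(int(60 * FS_IN)) / FS_IN
x = np.sin(2 * np.pi * 1.5 * t) + np.sin(2 * np.pi * 7.0 * t)  # 7 Hz > 5 Hz

sos = butter(8, CUTOFF, btype="low", fs=FS_IN, output="sos")  # order 8 is assumed
# Longer padding keeps start-up transients out of the analysed signal.
x_filtered = sosfiltfilt(sos, x, padlen=int(2 * FS_IN))

raw_10hz = resample_at(x, FS_IN, FS_OUT, 600)
filtered_10hz = resample_at(x_filtered, FS_IN, FS_OUT, 600)

# At 10 Hz the 7 Hz tone shows up at 3 Hz unless it was filtered out first.
alias_raw = tone_amplitude(raw_10hz, FS_OUT, 3.0)
alias_filtered = tone_amplitude(filtered_10hz, FS_OUT, 3.0)
```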
As a sampling frequency of 10 Hz performed best of the sampling frequencies tested, we show the training results for this sampling frequency over the 4,000 training steps, along with the cross-entropy loss and the confusion matrix for exercise classification, in Figures 4, 5 and 6 respectively. The corresponding precision, recall and F1-scores are given in Table II.
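Per-class precision, recall and F1-score follow directly from a confusion matrix. A minimal NumPy sketch, assuming rows are true classes and columns are predictions; the counts below are hypothetical and are not the values behind Table II.

```python
import numpy as np

CLASSES = ["High", "Low", "Run", "Walk"]

def per_class_scores(cm):
    """Precision, recall and F1 per class (rows = true, columns = predicted)."""
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)
    precision = tp / cm.sum(axis=0)  # correct / all predicted as the class
    recall = tp / cm.sum(axis=1)     # correct / all truly in the class
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for illustration only.
cm = [[45, 5, 0, 0],
      [10, 40, 0, 0],
      [0, 0, 48, 2],
      [0, 0, 2, 48]]
precision, recall, f1 = per_class_scores(cm)
```

A class that is never predicted would give a zero column sum and an undefined precision, so real code may need to guard that case.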
IV-B Estimation of Heart Rate
Results for estimating heart rate from the motion artefact (MA) corrupted PPG signal using HeartPy and our CNNR method are presented below. Figures 7 and 8 present the average heart rate error across the various sampling frequencies for each exercise for the two methods. The Heart Rate Error (HRE) is defined here as the absolute difference between the estimated heart rate for a given PPG sample and the ground-truth heart rate calculated from the concurrent ECG sample.
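A minimal sketch of the HRE computation over a set of segments. The `relative` option, which expresses the error as a percentage of the ECG heart rate, is an assumption added for illustration; the text defines HRE only as an absolute difference.

```python
import numpy as np

def heart_rate_error(estimated_bpm, ecg_bpm, relative=False):
    """Per-segment HRE: |estimated - ECG ground truth|.

    relative=True (an assumption) returns the error as a percentage of
    the ECG heart rate instead of in beats per minute.
    """
    est = np.asarray(estimated_bpm, dtype=float)
    ref = np.asarray(ecg_bpm, dtype=float)
    err = np.abs(est - ref)
    return 100.0 * err / ref if relative else err

est = [88.0, 95.0, 120.0]   # hypothetical PPG-based estimates
ref = [80.0, 100.0, 150.0]  # hypothetical ECG ground truth
abs_err = heart_rate_error(est, ref)                 # bpm
pct_err = heart_rate_error(est, ref, relative=True)  # percent
mean_hre = abs_err.mean()                            # average over segments
```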
For the HeartPy method, the exercise-specific HRE is similar across all sampling rates except for the 10 Hz sampling frequency on the walk exercise: other sampling frequencies return an error of between 46% and 55%, whereas 10 Hz reduces the error to 39%. The numerical results for the heart rate experiments are displayed in Table III, where it can be clearly seen that a 10 Hz sampling frequency performs best for estimating heart rate from the MA-corrupted signal.
Our CNNR results can be found in Table IV below. The HRE is similar across all exercises, and there is no distinguishable loss in accuracy at any of the sampling frequencies. For the walk exercise, heart rate estimation is substantially more accurate than with the HeartPy method. Overall, the average HRE across all exercises and sampling frequencies decreased from 22.59% with HeartPy to 20.15% with the CNNR method, an improvement of 2.44 percentage points.
IV-C Optimisation of CNNR
Following on from our earlier results, we sought to further decrease the heart rate error. A non-exhaustive grid search over a subset of the CNNR hyperparameters, including the number of epochs, the learning rate and the train-test split, returned an average HRE of 13.62%, a decrease of 6.53 percentage points from that of the CNNR without optimisation. The results of the optimisation process are presented graphically in Figure 9 below. To the authors' knowledge, this is the best result using CNNs adapted for regression to estimate heart rate from raw, noisy PPG sensor data.
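A sketch of a non-exhaustive grid search: all combinations of a small grid are enumerated and, optionally, only a random subset is evaluated. The grid values are hypothetical (the text names the hyperparameters but not their ranges), and `fake_hre` is a stand-in for training the CNNR with a given configuration and measuring its average HRE.

```python
import itertools
import random

# Hypothetical search space over the hyperparameters named in the text.
GRID = {
    "epochs": [30, 60, 100],
    "learning_rate": [1e-2, 1e-3, 1e-4],
    "train_fraction": [0.8, 0.9],
}

def grid_search(evaluate, grid, max_trials=None, seed=0):
    """Evaluate (a random subset of) all combinations; return the lowest-error one."""
    names = list(grid)
    combos = [dict(zip(names, values)) for values in itertools.product(*grid.values())]
    if max_trials is not None and max_trials < len(combos):
        combos = random.Random(seed).sample(combos, max_trials)  # non-exhaustive
    results = [(evaluate(cfg), cfg) for cfg in combos]
    return min(results, key=lambda r: r[0])

def fake_hre(cfg):
    """Stand-in for 'train CNNR with cfg, return mean HRE on held-out data'."""
    return (abs(cfg["learning_rate"] - 1e-3) * 1000
            + abs(cfg["epochs"] - 60) / 10
            + (1 - cfg["train_fraction"]))

best_err, best_cfg = grid_search(fake_hre, GRID)
```

In practice `evaluate` would wrap a full training run, so limiting `max_trials` is what keeps such a search affordable.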
The approaches used in this paper yield highly competitive HAR results even though only the optical signal is used. This demonstrates that more cost- and power-efficient wearables are possible by exploiting secondary information available from a simple optical sensor, and suggests that single-sensor wearables can achieve much of the functionality of more complex multi-modal wearables.
Reducing the sampling rate had little adverse effect on the performance of the algorithms. Interestingly, the CNN performed better at a 10 Hz sampling frequency than at 15 Hz and 30 Hz; the reasons behind this have not been fully investigated.
Perhaps the most surprising results presented in this paper were those for heart rate estimation. We demonstrate that a CNN regression approach is capable of robust heart rate estimation even during periods of high artefact. Its performance, particularly in these high-artefact scenarios, was superior to that of conventional signal processing approaches, as shown by comparison with the open-source toolkit HeartPy, which served as a baseline here. Furthermore, this heart rate estimation performance was sustained at reduced sampling frequencies: notably, sampling the sensor at 5 samples per second performed just as well as all other sampling frequencies, including the original 256 Hz.
A pervasive computing approach to wearables is taken here. Using a low-power wearable with a single optical sensor and a sampling frequency of 10 Hz, we demonstrate compelling performance in both heart rate estimation and human activity recognition. This has the potential to reduce costs, improve battery performance, encourage wider adoption of digital technologies across a larger population, and support the transition to personalised, patient-centred, preventative models of healthcare. Increasing the accessibility and affordability of these technologies will in turn lower costs and ease the strain on public healthcare expenditure, as well as helping to improve overall wellness.
We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Titan Xp used for this research.
- (2007) Photoplethysmography and its application in clinical physiological measurement. Physiological Measurement 28 (3).
- (2019) (Website).
- (2018) Human activity recognition using accelerometer and photoplethysmographic signals. In Intelligent Decision Technologies 2017, I. Czarnowski, R. J. Howlett, and L. C. Jain (Eds.), Cham, pp. 53–62.
- (2019) ActiPPG: using deep neural networks for activity recognition from wrist-worn photoplethysmography (PPG) sensors. Smart Health 14, pp. 100082.
- (forthcoming) CNNs for heart rate estimation and human activity recognition in wrist worn sensing applications. In 2020 IEEE International Conference on Pervasive Computing and Communications (PerCom), Austin, Texas, March 23-27, 2020.
- (2018) An Interpretable Machine Vision Approach to Human Activity Recognition using Photoplethysmograph Sensor Data. In 26th AIAI Irish Conference on Artificial Intelligence and Cognitive Science, Dublin.
- (2019) 2019 global health care outlook. https://www2.deloitte.com/content/dam/Deloitte/global/Documents/Life-Sciences-Health-Care/gx-lshc-hc-outlook-2019.pdf
- (2012) Future health – a strategic framework for reform of the health service 2012 – 2015. https://assets.gov.ie/18890/d44343d71ee4484e85e7e2b45f693107.pdf
- (2018) Health in Ireland: key trends 2018. https://www.gov.ie/en/press-release/4f8096-health-in-ireland-key-trends-2018/?referrer=/wp-content/uploads/2018/12/key-health-trends-2018.pdf/
- (2010) Top 20 global mega trends and their impact on business, cultures and society.
- (2000, June 13) PhysioBank, PhysioToolkit, and PhysioNet: components of a new research resource for complex physiologic signals. Circulation 101 (23), pp. e215–e220. Circulation Electronic Pages: http://circ.ahajournals.org/content/101/23/e215.full. PMID: 1085218; doi: 10.1161/01.CIR.101.23.e215.
- (2016) Deep learning. MIT Press. http://www.deeplearningbook.org
- (2015) Digital healthcare - local challenges, global opportunities.
- (2016) Description of a Database Containing Wrist PPG Signals Recorded during Physical Exercise with Both Accelerometer and Gyroscope Measures of Motion. Data 2 (1), pp. 1.
- (2004-10) Sampling frequency, signal resolution and the accuracy of wearable context recognition systems. In Eighth International Symposium on Wearable Computers, Vol. 1, pp. 176–177.
- (2005-10) Trading off prediction accuracy and power consumption for context-aware wearable computing. In Ninth IEEE International Symposium on Wearable Computers (ISWC’05), pp. 20–26.
- (2010) Machine learning methods for classifying human physical activity from on-body accelerometers. Sensors 10 (2), pp. 1154–1175.
- (2018) Human activity recognition using a single optical heart rate monitoring wristband equipped with triaxial accelerometer. In EMBEC & NBC 2017, H. Eskola, O. Väisänen, J. Viik, and J. Hyttinen (Eds.), Singapore, pp. 587–590.
- (2019-03-11) Accuracy of consumer wearable heart rate measurement during an ecologically valid 24-hour period: intraindividual validation study. JMIR Mhealth Uhealth 7 (3), pp. e10828.
- (2010) A Survey on Transfer Learning. IEEE Transactions on Knowledge and Data Engineering 22 (10), pp. 1345–1359.
- (2019-07) EVM-CNN: real-time contactless heart rate estimation from facial video. IEEE Transactions on Multimedia 21 (7), pp. 1778–1787.
- (2017) Python Machine Learning. Packt Publishing.
- (2019-07) Deep PPG: large-scale heart rate estimation with convolutional neural networks. Sensors 19 (14), pp. 3079.
- (2018) Visual heart rate estimation with convolutional neural network. In BMVC.
- (2018) Heart rate analysis for human factors: development and validation of an open source toolkit for noisy naturalistic heart rate data. In Proceedings of The 6th HUMMANIST Conference, pp. 170–175.