The gold standard for monitoring instantaneous heart rate (iHR) is the electrocardiogram (ECG) [1]. Another popular noninvasive technique is the photoplethysmogram (PPG) [2, 3]. Both techniques require direct skin contact with the subject, which might not be suitable in contexts such as driver drowsiness detection or sleep monitoring. PPG relies on measuring the rapid variations in light absorption in an illuminated skin region caused by the difference in absorption curves for oxygenated and non-oxygenated blood. This principle motivated the use of digital cameras to measure plethysmographic signals from face videos under ambient light conditions [4, 5, 6]. Several methodologies for estimating heart rate from face videos have been developed over the years [7, 8, 9, 10, 11, 12]. In particular, [13] provides a comprehensive overview of the history of research in this area and compares the performance of some of these approaches. As a general rule, most of these methods need an illumination source, depend on color band manipulation, and require control over the signal acquisition process (e.g., controlled light sources, or subjects remaining motionless during acquisition).
The recent inclusion of infrared (IR) cameras in many conventional devices, coupled with their resilience to low-light and variable-light conditions, makes them especially attractive for remote monitoring in the context of iHR detection. Heart rate detection from infrared face videos has only recently begun to be explored [14, 15], and so far these approaches are limited to estimating an average heart rate over a considerable time frame (over 30 seconds). This paper shows that, under controlled motion conditions, it is feasible to extract even sub-second approximations to the iHR using basic spatiotemporal analysis and time-frequency analysis.
We describe this approach, and show its performance on face IR videos acquired using a Kinect camera from 7 healthy volunteers. The extracted iHR is compared against ECG and contact PPG ground truth signals that were simultaneously acquired.
2 Non-contact PPG signal from IR video
Here we describe the proposed algorithm to construct the non-contact PPG signal from an IR face video and hence extract the instantaneous heart rate. We divide this process into three main steps. First, we detect the face in the video and segment it into disjoint spatial regions. Second, we take the mean activity of each region, denoise it, and decompose it into a smaller subset of sources. Finally, we introduce a signal quality index to select the signals of interest, and combine them to construct the non-contact PPG signal. Figure 1 summarizes the initial preprocessing stages, while Figure 2 shows an example of the subsequent recovery process of the non-contact PPG.
2.1 Input IR video
For each subject, denote the recorded IR video as , where denotes the recorded frame at time , which is of size (height width). Suppose the video is sampled every seconds; that is, sampled at Hz, and the recording starts at time and lasts for seconds. We have thus frames. In this study, . We additionally assume that the subject’s head is fixed, so major movements between frames are ignored.
2.2 Preprocessing the IR video
We detect the boundaries of the face using the Dlib landmark detector [16] on the average face location; frame-by-frame detection is not performed because the subject is assumed to be immobile. An example is shown in Figure 1(a). We then divide the area inside the detected face into disjoint regions following a predefined mesh grid.
Denote those disjoint regions as , . For each video frame , , we compute the mean IR value on each region
As a result, we obtain the data matrix
In other words, the -th row of matrix contains a time series with the mean IR activity over region ; there are such regions defined across the face. We will refer to these as channels. Note that the constructed data matrix is commonly encountered in spatiotemporal analysis. For the purposes of this study, the face is subdivided into regions using a non-overlapping -pixel grid. See Figure 1(b) for an illustration.
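As an illustration of how such a data matrix could be assembled, the following is a minimal sketch, assuming the video is available as a NumPy array of shape (frames, height, width) and the face has already been located as a rectangular box. The function name `region_means`, its arguments, and the rectangular-box simplification are assumptions made for this example and are not taken from the original implementation.

```python
import numpy as np

def region_means(video, top, left, height, width, cell):
    """Mean IR value of every grid cell in every frame.

    video  : array of shape (n_frames, H, W)
    top, left, height, width : hypothetical rectangular face box, in pixels
    cell   : side length of each square, non-overlapping grid cell
    Returns an array of shape (n_regions, n_frames): one row per channel.
    """
    rows, cols = height // cell, width // cell
    # Crop to a whole number of cells so the box splits evenly.
    face = video[:, top:top + rows * cell, left:left + cols * cell]
    # View the crop as (frame, cell row, pixel row, cell column, pixel column).
    blocks = face.reshape(face.shape[0], rows, cell, cols, cell)
    means = blocks.mean(axis=(2, 4))  # shape (n_frames, rows, cols)
    # One row per region, one column per frame.
    return means.reshape(face.shape[0], rows * cols).T
```

Each row of the returned array is one channel in the sense used above; pixels outside a whole number of grid cells are simply dropped in this sketch.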
To denoise the time series, we apply to each channel in the data matrix an order bandpass Butterworth filter with cutoff frequencies at and bpm, a range that comfortably accommodates most normal heart rates. Denote the filtered signals as the data matrix . This bandpass filter is chosen based on the physiological knowledge that, for a normal subject, the heart rate is between and bpm.
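The filtering step can be sketched with SciPy as follows. This is only an illustration: the filter order and the cut-off values are left as function arguments because their values are not reproduced here, and the use of zero-phase filtering (`sosfiltfilt`, which applies the filter forwards and backwards) is an assumption of this sketch rather than a documented choice of the authors.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def bandpass_channels(X, fs, low_bpm, high_bpm, order):
    """Band-pass every row of X (channels x frames) between two heart rates.

    fs                : video sampling rate in Hz
    low_bpm, high_bpm : cut-offs in beats per minute (converted to Hz below)
    order             : Butterworth order of the single-pass design
    """
    low_hz, high_hz = low_bpm / 60.0, high_bpm / 60.0
    # Second-order sections are numerically safer than (b, a) coefficients.
    sos = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs, output="sos")
    # Forward-backward filtering: zero phase shift, doubled effective order.
    return sosfiltfilt(sos, X, axis=1)
```

Zero-phase filtering keeps the pulse peaks aligned in time with the reference signals, which matters when the result is later compared against ECG on a second-by-second basis.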
2.3 Low rank spatiotemporal model
We assume that the IR video captures different physiological dynamics, such as respiration, body movement, and hemodynamics, among others. Denote these physiological sources as , where and . Note that in general and might not be orthogonal when ; for example, the hemodynamics and respiration might be coupled due to the respiratory sinus arrhythmia.
The data matrix is then modeled as a mixture of these source signals with additive and uncorrelated noise
where contains the physiological source signals, is the source mixture matrix, and is a scalar constant that describes the noise variance. In other words, the recorded signal on each region, , is a mixture of different sources via , contaminated by noise. We further make the low rank assumption that is fixed and small. This assumption means that only a limited number of sources of physiological dynamics are captured by the IR video.
2.4 Determine important sources
Due to the low-rank assumption and the high-dimensional nature of the spatiotemporal model, we apply the singular value decomposition (SVD) to the data matrix :
consists of the left singular vectors, consists of the right singular vectors, and consists of singular values. Denote and to be the -th left and right singular vectors respectively. Note that contains the relevant temporal signals that are mixed in each region, and their weight in each spatial location. An example is illustrated in Figure 2.
Denote and let be the number of singular values such that . Since the noise level is in general not known, we estimate it as proposed in [17]:
where is the median of the Marchenko-Pastur distribution [18] with parameter . Applying this procedure to reduced the number of non-zero singular values by over on average.
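A minimal sketch of singular value hard thresholding with unknown noise level is given below. Instead of evaluating the median of the Marchenko-Pastur distribution exactly, it uses the polynomial approximation of the threshold coefficient reported by Donoho and Gavish, so the cut-off is a multiple of the median singular value. The function name `hard_threshold_svd` is hypothetical, and the approximation is a substitute for the exact estimate described above.

```python
import numpy as np

def hard_threshold_svd(Y):
    """Keep only the singular triplets of Y above a data-driven threshold.

    Returns (U_r, s_r, Vt_r, tau), where tau is the threshold used.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    m, n = Y.shape
    beta = min(m, n) / max(m, n)  # aspect ratio, in (0, 1]
    # Approximate coefficient for unknown noise (Donoho and Gavish);
    # the exact value involves the Marchenko-Pastur median.
    omega = 0.56 * beta**3 - 0.95 * beta**2 + 1.82 * beta + 1.43
    tau = omega * np.median(s)
    r = int(np.sum(s > tau))  # number of components kept
    return U[:, :r], s[:r], Vt[:r], tau
```

The rows of the returned `Vt_r` are the candidate temporal signals, and the columns of `U_r` give their weights over the face regions.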
2.5 Reconstructing the non-contact PPG signal
Due to the non-orthogonal nature of physiological sources, we cannot recover directly from by applying the usual blind source separation technique. We thus propose the following procedure to reconstruct the non-contact PPG signal.
Define a signal quality index (SQI) for a signal of length as
is the Fourier transform of the time series, and is the expected heart rate of a normal subject. Note that quantifies how concentrated the time series is around in the frequency domain.
We rank all temporal signals , where , according to their SQIs . Consider the reordering permutation so that . Our hemodynamic estimator, the non-contact PPG signal denoted as , is defined as
for chosen by the user. Here we determine by greedily accumulating the sources until the maximal quality is achieved; that is,
Figure 2 shows an outline of the recovery process for non-contact PPG over the full face. Figure 3 shows the recovered iHR signal when we applied the proposed method to the channels contained in each of the five major facial areas independently; this is provided merely for illustration purposes. In general, using the entire facial area provided the best results. Figure 4 shows short time segments of non-contact PPG compared against ground truth contact PPG. Additional examples will be provided in the following sections.
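The ranking and greedy accumulation described in this subsection can be sketched as follows. Because the exact SQI formula is not reproduced here, the sketch assumes one plausible form: the fraction of spectral power lying within a fixed band around the expected heart rate. The band half-width, the unweighted sum of sources, and the function names are all assumptions of this example.

```python
import numpy as np

def sqi(x, fs, expected_hz, half_width_hz=0.5):
    """Assumed SQI: share of spectral power within half_width_hz of expected_hz."""
    power = np.abs(np.fft.rfft(x - np.mean(x))) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    band = np.abs(freqs - expected_hz) <= half_width_hz
    total = power.sum()
    return float(power[band].sum() / total) if total > 0 else 0.0

def combine_sources(V, fs, expected_hz):
    """Greedily sum the rows of V (sources x frames) in decreasing SQI order.

    Returns (ppg, k): the partial sum with the highest SQI and the number
    of sources it contains.
    """
    scores = np.array([sqi(v, fs, expected_hz) for v in V])
    order = np.argsort(scores)[::-1]       # best source first
    partial = np.cumsum(V[order], axis=0)  # row j = sum of the best j + 1 sources
    partial_scores = [sqi(p, fs, expected_hz) for p in partial]
    k = int(np.argmax(partial_scores)) + 1
    return partial[k - 1], k
```

Singular vectors are only defined up to sign, so a practical version may need to align the signs of the sources before summing them; this sketch does not attempt that.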
2.6 Estimation of the instantaneous heart rate
Denote the short time Fourier transform (STFT) of the constructed non-contact PPG signal as , where is the STFT coefficient at time and frequency . From the STFT we extract the dominant curve using the curve extractor proposed in [19],
where is a regularization constant. The iHR is thus determined by
Figure 5 shows the obtained and its STFT; ground truth iHR from ECG is also shown for comparison.
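To make the time-frequency step concrete, the sketch below computes an STFT magnitude with NumPy and extracts a dominant curve by dynamic programming. It assumes a commonly used form of the regularized objective: the sum of log-normalized magnitudes along the curve minus a penalty `lam` times the squared jump between consecutive frequency bins. The window type, the penalty form, and the names `stft_magnitude` and `extract_ridge` are assumptions of this example, not the authors' code.

```python
import numpy as np

def stft_magnitude(x, fs, win_len, hop):
    """Magnitude STFT with a Hann window; returns (mag, freqs_hz, times_s)."""
    window = np.hanning(win_len)
    starts = np.arange(0, len(x) - win_len + 1, hop)
    frames = np.stack([x[s:s + win_len] * window for s in starts])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape (n_times, n_freqs)
    freqs = np.fft.rfftfreq(win_len, d=1.0 / fs)
    times = (starts + win_len / 2) / fs        # window centres
    return mag, freqs, times

def extract_ridge(mag, lam):
    """Frequency-bin index of the dominant curve at every time step."""
    n_t, n_f = mag.shape
    reward = np.log(mag / mag.sum() + 1e-12)
    idx = np.arange(n_f)
    # jump[j, i]: penalty for moving from bin i to bin j between frames.
    jump = lam * (idx[:, None] - idx[None, :]) ** 2
    score = reward[0].copy()
    back = np.zeros((n_t, n_f), dtype=int)
    for t in range(1, n_t):
        cand = score[None, :] - jump           # best predecessor for each bin
        back[t] = np.argmax(cand, axis=1)
        score = reward[t] + cand.max(axis=1)
    # Trace the best path backwards from the final frame.
    path = np.empty(n_t, dtype=int)
    path[-1] = int(np.argmax(score))
    for t in range(n_t - 1, 0, -1):
        path[t - 1] = back[t, path[t]]
    return path
```

With `mag, freqs, times = stft_magnitude(ppg, fs, win_len, hop)` and `path = extract_ridge(mag, lam)`, the iHR in beats per minute at each window centre is `60 * freqs[path]`. Larger values of `lam` give smoother curves at the cost of slower tracking of rapid heart rate changes.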
We acquired 9 simultaneous recordings of ECG, PPG, and IR face video using a standard patient monitor (Philips IntelliVue MP70 Patient Monitor) and a Microsoft Kinect camera. The clocks in the Kinect camera and the patient monitor were synchronised with a time accuracy of . The recordings were obtained from 7 healthy subjects. The subjects were asked to look straight into the camera and maintain a steady posture, but otherwise behave, blink, and breathe normally. The instantaneous heart rate (iHR) was estimated from the IR video using the process described in Section 2. Ground truth iHR was extracted from the ECG signal using the R-peak detection algorithm implemented in the Python library BioSPPy [20].
For each of the 9 datasets we measured the differences between the recovered iHR signal and ground truth using root mean square error (RMSE) and relative error
|Every 1s||Every 10s||Every 30s||Every 30 s|
In general, RMSE results averaged for longer time-frames () are satisfactory. Perhaps surprisingly, RMSE results for iHR at intervals are also reasonable. Figure 5 shows good correspondence between the ground truth ECG iHR and the STFT of the recovered non-contact PPG.
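The error measures can be sketched as below, assuming the estimated and ground truth iHR have already been resampled onto a common time grid. The relative error is taken here to be the mean absolute deviation divided by the ground truth value, which is an assumption since the exact definition is not reproduced above; `window_average` is a hypothetical helper for averaging over fixed-length intervals before comparison.

```python
import numpy as np

def window_average(times, values, window_s):
    """Average values over consecutive windows of window_s seconds."""
    times = np.asarray(times, dtype=float)
    values = np.asarray(values, dtype=float)
    bins = np.floor(times / window_s).astype(int)
    return np.array([values[bins == b].mean() for b in np.unique(bins)])

def rmse(est, ref):
    """Root mean square error between two equally sampled iHR series (bpm)."""
    est, ref = np.asarray(est, dtype=float), np.asarray(ref, dtype=float)
    return float(np.sqrt(np.mean((est - ref) ** 2)))

def relative_error(est, ref):
    """Mean absolute error relative to the reference (assumed definition)."""
    est, ref = np.asarray(est, dtype=float), np.asarray(ref, dtype=float)
    return float(np.mean(np.abs(est - ref) / ref))
```

Averaging both series with `window_average` before calling `rmse` gives errors at coarser time scales, in the spirit of the per-interval comparison reported in this section.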
5 Concluding remarks
In this paper, we extracted non-contact PPG from IR facial video. We showed that a simple, principled method based on matrix decomposition was sufficient to recover instantaneous heart rate with small relative errors on a second-by-second basis when subjects remain relatively stationary.
This suggests the viability of IR for non-contact PPG, particularly when we consider the low-light and varying-light performance of IR in general compared to traditional RGB methods. Additional work is required to adequately and robustly correct for motion artifacts. The process by which we combine the singular vectors to obtain our final hemodynamic estimator could also be improved. Finally, more research is needed on the characterization of absorption curves of biological processes of interest in the near infrared spectrum. We leave this physiological research as future collaborative work. We could also consider more sophisticated time-frequency representation tools to further analyze the obtained non-contact PPG signal for instantaneous heart rate estimation. A more general manifold learning algorithm and matrix denoising technique can be applied to capture motion and time latency; for example, due to the high dimensional nature of , the matrix can be denoised by the optimal shrinkage algorithm proposed in [21]: , where is the optimal shrinkage under the Frobenius norm. This approach has the potential to further improve the overall quality of the signal. We will explore these possibilities in future work.
-  JA Dawson, COF Kamlin, C Wong, AB Te Pas, M Vento, TJ Cole, SM Donath, SB Hooper, PG Davis, and CJ Morley, “Changes in heart rate in the first minutes after birth,” Archives of Disease in Childhood-Fetal and Neonatal Edition, vol. 95, no. 3, pp. F177–F181, 2010.
-  Aymen A Alian and Kirk H Shelley, “Photoplethysmography,” Best Practice & Research Clinical Anaesthesiology, vol. 28, no. 4, pp. 395–406, 2014.
-  John Allen, “Photoplethysmography and its application in clinical physiological measurement,” Physiological measurement, vol. 28, no. 3, pp. R1, 2007.
-  Wim Verkruysse, Lars O Svaasand, and J Stuart Nelson, “Remote plethysmographic imaging using ambient light,” Optics express, vol. 16, no. 26, pp. 21434–21445, 2008.
-  Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard, “Non-contact, automated cardiac pulse measurements using video imaging and blind source separation,” Optics express, vol. 18, no. 10, pp. 10762–10774, 2010.
-  Maria I Davila, Gregory F Lewis, and Stephen W Porges, “The physiocam: cardiac pulse, continuously monitored by a color video camera,” Journal of Medical Devices, vol. 10, no. 2, pp. 020951, 2016.
-  Ming-Zher Poh, Daniel J McDuff, and Rosalind W Picard, “Advancements in noncontact, multiparameter physiological measurements using a webcam,” IEEE transactions on biomedical engineering, vol. 58, no. 1, pp. 7–11, 2011.
-  Sungjun Kwon, Hyunseok Kim, and Kwang Suk Park, “Validation of heart rate extraction using video imaging on a built-in camera system of a smartphone,” in Engineering in Medicine and Biology Society (EMBC), 2012 Annual International Conference of the IEEE. IEEE, 2012, pp. 2174–2177.
-  Xiaobai Li, Jie Chen, Guoying Zhao, and Matti Pietikainen, “Remote heart rate measurement from face videos under realistic situations,” in , 2014, pp. 4264–4271.
-  Mayank Kumar, Ashok Veeraraghavan, and Ashutosh Sabharwal, “Distanceppg: Robust non-contact vital signs monitoring using a camera,” Biomedical optics express, vol. 6, no. 5, pp. 1565–1588, 2015.
-  Antony Lam and Yoshinori Kuno, “Robust heart rate measurement from video using select random patches,” in Proceedings of the IEEE International Conference on Computer Vision, 2015, pp. 3640–3648.
-  Sergey Tulyakov, Xavier Alameda-Pineda, Elisa Ricci, Lijun Yin, Jeffrey F Cohn, and Nicu Sebe, “Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 2396–2404.
-  Chen Wang, Thierry Pun, and Guillaume Chanel, “A comparative survey of methods for remote heart rate detection from frontal face videos,” Frontiers in Bioengineering and Biotechnology, vol. 6, 2018.
-  Jie Chen, Zhuoqing Chang, Qiang Qiu, Xiaobai Li, Guillermo Sapiro, Alex Bronstein, and Matti Pietikäinen, “Realsense= real heart rate: Illumination invariant heart rate estimation from videos,” in Image Processing Theory Tools and Applications (IPTA), 2016 6th International Conference on. IEEE, 2016, pp. 1–6.
-  Qi Zhang, Yimin Zhou, Shuang Song, Guoyuan Liang, and Haiyang Ni, “Heart rate extraction based on near-infrared camera: Towards driver state monitoring,” IEEE Access, vol. 6, pp. 33076–33087, 2018.
-  Davis E. King, “Dlib-ml: A machine learning toolkit,” Journal of Machine Learning Research, vol. 10, pp. 1755–1758, 2009.
-  David L Donoho and Matan Gavish, “The optimal hard threshold for singular values is 4/√3,” arXiv preprint, 2013.
-  Vladimir Alexandrovich Marchenko and Leonid Andreevich Pastur, “Distribution of eigenvalues for some sets of random matrices,” Matematicheskii Sbornik, vol. 114, no. 4, pp. 507–536, 1967.
-  Antonio Cicone and Hau-Tieng Wu, “How nonlinear-type time-frequency analysis can help in sensing instantaneous heart rate and instantaneous respiratory rate from photoplethysmography in a reliable way,” Frontiers in Physiology, vol. 8, pp. 701, 2017.
-  Carlos Carreiras, Ana Priscila Alves, André Lourenço, Filipe Canento, Hugo Silva, Ana Fred, et al., “BioSPPy: Biosignal processing in Python,” 2015–, [Online; accessed Jan 29, 2019].
-  Matan Gavish and David L Donoho, “Optimal shrinkage of singular values,” IEEE Transactions on Information Theory, vol. 63, no. 4, pp. 2137–2152, 2017.