Traffic accidents have become one of the main non-natural causes of death in today’s society. The World Health Organization (WHO) published a report in 2018 declaring that millions of people die every year worldwide due to traffic accidents, which have even become the leading cause of death among the young population (those under years old).
Some types of traffic accidents cannot be predicted in any way because they occur due to external factors such as bad weather, roads in poor condition, mechanical issues, etc. However, there is still a large number of accidents caused by human factors that can be avoided. For example, fatigue is one of the most common causes of accidents, and it is also one of the most preventable. Drivers experiencing fatigue suffer a decrease in their visual perception, reflexes, and psychomotor skills, and they may even fall asleep while driving.
In order to reduce the number of accidents, driver monitoring has attracted a lot of research attention in recent years [12, 7, 3]. A driver monitoring system must be able to detect signals related to fatigue, so that preventive actions can be taken to avoid a possible accident. Examples of these actions are recommending that the driver stop in a rest area until fully recovered, and issuing acoustic and luminous warnings inside the car to keep the driver awake until it is possible to stop.
Driver monitoring systems may follow different approaches to achieve their target. Some of them use information about the way the driver is operating the car, i.e. movements of the steering wheel, status of the pedals, etc. Physiological signals such as the heart rate (HR), the blood pressure, the brain activity, etc., can also be used to detect fatigue in the driver.
A monitoring system capable of estimating physiological components such as the heart rate or the blood pressure may present additional benefits. These systems could detect not only signs of fatigue, but also changes in the driver’s general health condition. This kind of monitoring system allows health information to be acquired and processed daily and non-intrusively. The captured data can be used to help doctors make better diagnoses, or even to recommend that the driver visit a practitioner if a potential health issue is detected.
The accurate extraction of physiological signals in a real driving scenario is still a challenge. There are different approaches depending on the acquisition method, i.e. contact-based and image-based, each with its own strengths and weaknesses. In this paper we focus on improving the performance of an image-based method by introducing a quality assessment algorithm. The target of this algorithm is to select the video sequences most favorable to a specific heart rate estimation method, in a kind of quality-based processing.
The rest of this paper is organized as follows: Section 2 introduces driver monitoring techniques, with a focus on remote photoplethysmography and its challenges. Section 3 describes the proposed system. Section 4 summarizes the dataset used. Section 5 describes the evaluation protocol and the results obtained. Finally, the concluding remarks and the future work are presented in Section 6.
2 Driver Monitoring Techniques
|Method||Type of Data||Parameters Extracted||Performance||Target|
|Brandt et al. 2004 ||RGB and NIR Video||Head Motion and Eye Blinking||N/A||Driver Fatigue|
|Shin et al. 2010 ||ECG||Heart Rate||N/A||Driver Fatigue|
|Jo et al. 2011 ||NIR Video||Head Pose and Eye Blinking||Accuracy = 98.55%||Driver Drowsiness and Distraction|
|Poh et al. 2011 ||RGB Video||Heart and Breath Rate, HR Variab.||RMSE = 5.63%||Physiological Measurement|
|Jung et al. 2014 ||ECG||Heart Rate||N/A||Driver Drowsiness|
|Tasli et al. 2014 ||RGB Video||Heart Rate, HR Variab.||MAE = 4.2%||Physiological Measurement|
|McDuff et al. 2014 ||RGB-CO Video||Heart and Breath Rate, HR Variab.||Correlation = 1.0||Physiological Measurement|
|Chen et al. 2016 ||RGB and NIR Video||Heart Rate||RMSE = 1.65%||Physiological Measurement|
|Present Work||NIR Video||Heart Rate||MAE = 8.76%||Driver Monitoring|
Early research in driver monitoring was mostly based on acquiring accurate physiological signals from the drivers using contact sensors (e.g. ECG, EEG, or EMG), but this approach may be uncomfortable and impractical in a realistic driving environment. Some parameters that can be obtained this way are the heart rate, respiration, brain activity, muscle activation, body temperature, etc. Some works related to this approach are  and .
Contactless approaches are more convenient for real driver monitoring because they do not bother the driver with cables and other uncomfortable devices. Within this approach, computer vision techniques are very practical since they use images acquired non-invasively from a camera mounted inside the vehicle. These images can be processed to analyze physiological parameters using remote photoplethysmography (rPPG). With this technique it is possible to estimate the heart rate, the oxygen saturation, and other pulse-related information using only video sequences.
2.1 Remote Photoplethysmography
Photoplethysmography (PPG) [1] is a low-cost technique for measuring the cardiovascular Blood Volume Pulse (BVP) through changes in the amount of light reflected or absorbed by human vessels. PPG is often used in hospitals to measure physiological parameters like the heart rate, the blood pressure, or the oxygen saturation. PPG signals are usually measured with contact sensors, often placed at the fingertips, the chest, or the feet. This type of contact measurement may be suitable for a clinical environment, but it can be uncomfortable and inconvenient for daily driver monitoring.
In recent works like , , , and  remote photoplethysmography techniques have been used to measure physiological signals from face video sequences captured at a distance. These works used signal processing techniques to analyze the images, looking for slight color and illumination changes related to the BVP. However, using these methods in a real moving vehicle is not straightforward due to all the sources of variability present in this type of video sequence. A selection of works related to driver monitoring and photoplethysmography is shown in Table 1.
2.2 Challenges and Proposed Approach
A moving vehicle is not an ideal environment for obtaining high accuracy with rPPG algorithms. Images acquired in this scenario may present external illumination changes, low illumination levels, noise, movement of the driver, occlusions, and vibrations of the camera due to the movement of the vehicle. All these factors can cause the performance of rPPG algorithms to drop significantly.
In this work we propose a system for pulse estimation for driver monitoring that tries to overcome some of these challenges. We use a NIR camera with active infrared illumination mounted on the dashboard of a real moving car. The NIR spectrum band is highly invariant to ambient light, providing robustness against this external source of variability at a low cost. This also allowed us to extend the application of heart rate estimation to very low illumination environments, e.g. night conditions.
Regarding other variability factors such as movement or occlusions, a quality-based approach to rPPG could be adequate. With a short-time analysis, small video segments without enough quality for extracting a robust rPPG signal could be discarded without affecting the global performance of pulse estimation. To this end, we propose a quality metric for short segments of rPPG signals.
In summary, in this work: i) we performed pulse estimation using NIR active illumination to be robust to external illumination variability; ii) we proposed a quality metric for classifying short rPPG segments and deciding which ones can be used and which ones should be discarded in order to obtain a robust heart rate estimation; and iii) we compared the performance of a classic rPPG algorithm and our quality-based approach.
3 Proposed System
In this section, we describe the improvements we have made to a baseline rPPG-based heart rate estimation system to increase its performance in a real driving scenario. Classic rPPG systems degrade drastically when facing the variability sources mentioned in the previous sections. This performance problem is caused by the low quality of the extracted rPPG signals, which may be affected (entirely or only in some fragments) by variability sources that the rPPG method does not know how to deal with.
With this in mind, we hypothesized that computing a quality measure of the amount of variability in each temporal segment of an rPPG signal could be useful for deciding which segments are most suitable for extracting a robust heart rate estimation.
In the next subsection we describe the vanilla rPPG system we used to obtain the baseline results. This method corresponds to the system shown in Figure 1. In the second subsection we describe the addition of a quality metric to the baseline system. That approach is shown in Figure 2. In the third subsection we describe how we have obtained the groundtruth of the heart rate for our experiments.
3.1 Baseline rPPG System
The baseline method is based on the one used in , and consists of the following three main steps:
Face detection and ROI tracking: The first step consists of detecting the face of the driver in the first frame of the NIR video. We used the Matlab implementation of the Viola-Jones algorithm. This algorithm is known to perform reasonably well and in real time when dealing with frontal faces, as in our case. After the detection stage we selected the left cheek as the Region Of Interest (ROI), since it is a zone little affected by objects like hats, glasses, beards, or mustaches. The next step consisted of detecting corners inside the ROI and tracking them over time using the Kanade-Lucas-Tomasi algorithm, also implemented in Matlab. If at some point of the video the ROI is lost, the face is redetected, followed by the ROI and the corners.
rPPG signal extraction: For each frame of the video, we calculated its raw rPPG value as the average intensity of the pixels inside the ROI. The final output for each video sequence is an rPPG temporal signal composed of the concatenation of these average intensities.
rPPG postprocessing: We wanted to estimate an HR value every d seconds. To achieve this, we extracted windows of T seconds from the rPPG signal, with a stride of d seconds between them. The length of the window (T) is configurable in order to perform a time-dependent analysis. For each window we postprocessed the raw rPPG signal and obtained an estimation of the HR. This postprocessing consists of three filters:
Detrending filter: this temporal filter is employed for reducing the stationary part of the rPPG signal, i.e. eliminating the contribution from environmental light and reducing the slow changes in the rPPG level that are not part of the expected pulse signal.
Moving-average filter: this filter is designed to eliminate the random noise in the rPPG signal. That noise may be caused by imperfections in the sensor and inaccuracies in the capturing process. This filter consists of a moving average of the rPPG values (size ).
Band-pass filter: we considered that a regular human heart rate usually lies in the - beats per minute (bpm) range, which corresponds to signals with frequencies between Hz and Hz approximately. All the rPPG frequency components outside that range are unlikely to correspond to the real pulse signal, so they are discarded.
After this processing stage we transformed the signal from the time domain to the frequency domain using the Fast Fourier Transform (FFT). Then, we estimated its Power Spectral Density (PSD) distribution. Finally, we searched for the maximum value in that PSD. The frequency corresponding to that maximum is the estimated HR of that specific video segment.
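The signal extraction and postprocessing steps can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Matlab implementation used in this work: the function names `roi_mean_signal` and `estimate_hr_bpm` are hypothetical, the ROI is taken as fixed rather than tracked, the detrending is assumed to be a simple linear fit, the band-pass stage is approximated by restricting the PSD peak search to the pass band, and the defaults for `band_hz` and `ma_size` are placeholder values chosen for the example.

```python
import numpy as np


def roi_mean_signal(frames, roi):
    """Raw rPPG signal: mean pixel intensity inside the ROI of each frame.

    frames: array of shape (n_frames, height, width) with grayscale NIR images.
    roi: (top, bottom, left, right) pixel bounds, assumed fixed in this sketch.
    """
    top, bottom, left, right = roi
    patch = np.asarray(frames, dtype=float)[:, top:bottom, left:right]
    return patch.mean(axis=(1, 2))


def estimate_hr_bpm(rppg, fs, band_hz=(0.7, 4.0), ma_size=3):
    """Estimate the heart rate (bpm) of one rPPG window sampled at fs Hz.

    band_hz and ma_size are illustrative defaults, not values from the paper.
    """
    x = np.asarray(rppg, dtype=float)
    n = x.size

    # Detrending (assumed linear): subtract the best-fit straight line.
    t = np.arange(n)
    slope, intercept = np.polyfit(t, x, 1)
    x = x - (slope * t + intercept)

    # Moving-average filter to attenuate random noise.
    x = np.convolve(x, np.ones(ma_size) / ma_size, mode="same")

    # FFT-based PSD estimate (periodogram).
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(x)) ** 2 / n

    # Ideal band-pass: only frequencies inside the pass band are considered.
    in_band = (freqs >= band_hz[0]) & (freqs <= band_hz[1])
    if not np.any(in_band):
        raise ValueError("window too short for the requested frequency band")

    # The frequency of the PSD maximum is the HR estimate, converted to bpm.
    peak_freq = freqs[in_band][np.argmax(psd[in_band])]
    return 60.0 * peak_freq
```

Note that the frequency resolution of the periodogram is the inverse of the window duration, so short windows quantize the HR estimate coarsely unless the spectrum is interpolated or zero-padded.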
3.2 Proposed Quality-Based Approach
The baseline method is able to obtain robust HR estimations in controlled scenarios without too much variability or noise in the recordings. However, the raw rPPG signals acquired in a realistic driver monitoring scenario tend to have high variations due to external illumination changes and frequent movements of the driver’s head. There are also other sources of noise, e.g. noise inherent to the acquisition sensor.
All the mentioned factors cause the performance of the baseline rPPG algorithm to fall dramatically. To make it as robust as possible, we decided to develop a new approach, consisting of an evolution of the basic system combined with a quality metric of the raw rPPG signals. A scheme of the proposed quality-based method can be seen in Figure 2.
The target of using the quality metric is to select the temporal subwindow of T’ seconds with the highest quality from all the subwindows available inside each window of T seconds. The criterion for determining the best quality consists of looking for the rPPG segment with the least noise, head motion, and external illumination variability, i.e. the rPPG signal closest to one captured with a contact sensor.
To compute the quality level, we divided each window into several subwindows of T’ seconds, with a stride of d’ seconds between them (both parameters are configurable). Then we processed the rPPG signal in the same way as in the baseline system. From each processed rPPG subwindow we extracted several features, and we combined them to obtain a single numerical quality measure representing how close the rPPG signal of each subwindow is to one acquired in perfect conditions.
Finally, from each window of T seconds, we selected the segment of T’ seconds with the highest quality measure, and we estimated the user’s HR from that rPPG segment. This way we discarded the rPPG fragments that may be more affected by variability. This value of the HR is used as the final HR estimation for the whole window of T seconds.
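The subwindow selection described above can be sketched as follows. The helper names `subwindow_starts` and `best_subwindow_hr` are hypothetical, and the callables `quality_fn` and `hr_fn` are placeholders for the quality measure and the baseline HR estimator, which are assumed here to include their own postprocessing.

```python
import numpy as np


def subwindow_starts(window_len, sub_len, stride):
    """Start offsets of every subwindow of length sub_len that fits in the window."""
    if sub_len > window_len:
        return []
    return list(range(0, window_len - sub_len + 1, stride))


def best_subwindow_hr(window, fs, sub_len_s, stride_s, quality_fn, hr_fn):
    """Return (hr, quality) for the highest-quality subwindow of one window.

    window: 1-D rPPG samples of one window of T seconds.
    sub_len_s, stride_s: subwindow size T' and stride d', in seconds.
    quality_fn(segment, fs) -> float and hr_fn(segment, fs) -> float are
    supplied by the caller (hypothetical interfaces for this sketch).
    """
    window = np.asarray(window, dtype=float)
    sub_len = int(round(sub_len_s * fs))
    stride = max(1, int(round(stride_s * fs)))

    best_segment, best_quality = None, -np.inf
    for start in subwindow_starts(window.size, sub_len, stride):
        segment = window[start:start + sub_len]
        quality = quality_fn(segment, fs)
        if quality > best_quality:
            best_segment, best_quality = segment, quality

    if best_segment is None:
        raise ValueError("subwindow is longer than the window")
    # The HR of the best subwindow stands for the whole window.
    return hr_fn(best_segment, fs), best_quality
```

As an example of the arithmetic, a 7-second window with 5-second subwindows and a 2-second stride gives `subwindow_starts(7, 5, 2) == [0, 2]`, i.e. two candidate subwindows per window.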
4.1 OMRON Database
We tested our method with a self-collected dataset called OMRON Database. The dataset is composed of Near Infrared (NIR) active videos of the drivers’ faces, recorded with a camera mounted on a car dashboard. The images were captured at a sampling rate of fps, and a resolution of pixels. The PPG signals used for the groundtruth were captured using a BVP fingerclip sensor with a sampling rate of Hz, and then downsampled to Hz to synchronize them with the images from the camera.
The dataset is comprised of male users, with different ages and skin tones, some of them wearing glasses. Each participant was in front of the camera during a single session, with a different duration for each one. The sessions went from minutes to minutes long. The full database contains images with an average of images for each subject. The recordings try to represent a real driving scenario inside a moving car. They present different types of variability such as head movement, occlusions, car vibration, or external illumination. These variations result in different levels of quality in the estimated rPPG signals. Examples of images from this database can be seen in Figure 3.
In this section we compare the performance of the heart rate estimations obtained using the quality-based rPPG method with the performance obtained using the baseline rPPG method.
5.1 Setting quality parameters and features
|Feature||Description|
|Signal Noise Ratio (SNR)||
|Ratio Peaks (RP)||

|Parameter||Value|
|Window Size T||7 seconds|
|Window Stride d||1 second|
|Subwindow Size T’||5, 6, and 7 seconds*|
|Subwindow Stride d’||2 seconds|
|Features||SNR, BW, and RP|
Table 2: Left: Features extracted to compute the quality of the postprocessed rPPG signals. Right: Final configuration of the parameters of the quality-based method. *From each window of 7 seconds, we extracted subwindows of 5, 6, and 7 seconds of duration, and we selected the one with the highest quality value.
The quality-based method has several parameters to be configured: the window size T, the subwindow size T’, the window stride d, and the subwindow stride d’. It is also necessary to decide which features to extract from the rPPG signals, as they must contain information about the quality level of each subwindow.
For this work we extracted different features that give us information about how close an rPPG signal is to one captured in perfect conditions. The features and their descriptions can be seen in Table 2 (left). The final quality metric is computed as the arithmetic mean of these features after normalizing them to the [,] range using a normalization .
Based on our own previous rPPG experiments, we decided to test values of T going from seconds to seconds, in increments of second. From our previous work  we know that seconds was the lowest value that gave good HR estimation under favorable conditions, and that using windows longer than seconds did not improve the results.
For setting the subwindow duration T’, we decided to test values going from seconds (limited by the minimum possible size) to the corresponding value of T in each case. We also incremented the values using a step of second. The stride d is set to 1 second in order to give an estimation of the HR for each second of the input video. The stride d’, i.e. the temporal step between consecutive subwindows, took values going from a minimum of second to a maximum of seconds (when possible), in increments of second. After these initial configuration experiments, the best results were obtained for the parameters shown in Table 2 (right).
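The combination of features into a single quality value can be sketched as below. This is a minimal illustration under explicit assumptions: the function name `combine_quality` is hypothetical, min-max scaling to [0, 1] across the candidate subwindows is used as a stand-in for the normalization of this work, and every feature is assumed to be oriented so that higher values mean better quality.

```python
import numpy as np


def combine_quality(features):
    """Combine per-subwindow features into one quality value per subwindow.

    features: array of shape (n_subwindows, n_features), e.g. one column per
    feature such as SNR, BW, and RP.
    Assumption: min-max normalization per feature, then arithmetic mean.
    """
    f = np.asarray(features, dtype=float)
    low = f.min(axis=0)
    span = f.max(axis=0) - low
    # A feature that is constant across subwindows carries no ranking
    # information; avoid division by zero and map it to 0 for all rows.
    span[span == 0] = 1.0
    normalized = (f - low) / span
    return normalized.mean(axis=1)
```

Because the scaling in this sketch is relative to the subwindows being compared, the resulting values rank candidates within one window but are not comparable across windows; a normalization with fixed bounds would be needed for that.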
To compute the performance of heart rate estimations we decided to use the Mean Absolute Error (MAE) between the groundtruth heart rate in beats per minute (bpm), and the one estimated with the rPPG algorithm.
For the final evaluation of both methods (baseline and proposed quality-based), we processed NIR videos of minute duration each from the OMRON Database. We used the configuration of parameters shown in Table 2 (right) for both methods (only T and d in the case of the baseline system). We first computed the Mean Absolute Error (MAE) for each NIR video separately, to get an idea of which videos work better and which work worse. We also computed the mean and standard deviation of the MAE for the whole evaluation dataset.
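The evaluation metrics can be sketched as follows. The function names are hypothetical, the standard deviation is assumed to be the population one (the text does not say which variant is used), and the relative improvement is assumed to follow the usual definition with respect to the baseline value.

```python
import numpy as np


def mean_absolute_error(hr_true, hr_est):
    """MAE in bpm between groundtruth and estimated HR series of one video."""
    hr_true = np.asarray(hr_true, dtype=float)
    hr_est = np.asarray(hr_est, dtype=float)
    if hr_true.shape != hr_est.shape:
        raise ValueError("groundtruth and estimates must have the same shape")
    return float(np.mean(np.abs(hr_true - hr_est)))


def summarize(per_video_mae):
    """Mean and (population) standard deviation of the per-video MAE values."""
    values = np.asarray(per_video_mae, dtype=float)
    return float(values.mean()), float(values.std())


def relative_improvement(baseline, proposed):
    """Relative reduction of an error value with respect to the baseline, in %."""
    return 100.0 * (baseline - proposed) / baseline
```

For instance, groundtruth values of 70 and 80 bpm against estimates of 72 and 77 bpm give an MAE of 2.5 bpm.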
|MAE [bpm]||Video Number||1||2||3||4||5||6||7||8||9||10||11|
As can be seen in Table 3, using the quality-based rPPG approach we obtained a MAE value averaged across videos of beats per minute (bpm), and a standard deviation of bpm for the whole evaluation dataset. In comparison, the baseline system (without the quality approach) obtained a MAE of bpm and a MAE standard deviation of bpm for the minutes. This difference in performance represents a relative improvement of % in the mean value, and of % in the standard deviation of the Mean Absolute Error.
Table 3 also shows the MAE values for each NIR video of the evaluation dataset, both using the baseline and the quality-based methods. It can be seen that for some specific videos the baseline method obtains a more accurate estimation of the HR, but in general, the MAE values obtained using the quality-based approach are lower. The specific cases in which the quality-based method performs worse coincide with those videos containing long sequences with high variability, which makes it difficult to find clean segments.
Figure 4 shows the quality scores we obtained for a selection of the evaluation videos. We decided to show the distribution of scores from those videos with a MAE value (obtained with the quality-based method) lower than bpm, and those with a MAE value higher than bpm. The histograms show two different distributions, with the best performing videos (i.e., MAE < 8 bpm) presenting a higher mean value of the quality score.
The results of this section show that, at least with the data from the OMRON Database, the quality metric is an effective way to discard video segments that may negatively impact the general performance of rPPG, thereby improving the global accuracy of HR estimation.
6 Conclusion and Future Work
In this paper we developed a method for improving heart rate (HR) estimation using remote photoplethysmography (rPPG) in challenging scenarios with multiple sources of high variability and degradation. Our method employs a quality measure to extract an rPPG signal that is as clean as possible from a specific face video segment, in order to obtain a more robust HR estimation.
Our main motivation is to develop robust technology for contactless driver monitoring using computer vision. Therefore, in our experiments we employed Near Infrared (NIR) videos acquired with a camera mounted on a car dashboard. This type of video presents a high number of variability sources such as head movement, external illumination changes, vibration, blur, etc. The target of the quality metric we have proposed is to estimate the degree to which those factors are present. Even though our experimental framework is centered on driver monitoring, our methods may find application in other high-variability face-based human-computer interaction scenarios such as mobile video-chat.
We have compared the performance of two different methods for HR estimation using rPPG. The first one consisted of a classic rPPG algorithm. The second method consisted of the same algorithm, but using the quality measure to select the video segments that present a lower amount of variability. We used those segments for extracting rPPG signals and their associated HR estimations. The quality metric proved to be a reliable estimation of the amount of variability. We achieved better performance in HR estimation using the video segments with the highest possible quality, compared to using all the video frames indiscriminately.
Our solution is based on defining the quality as a combination of hand-crafted features. As future work, other definitions of quality could also be investigated. A different set of features that may correlate more accurately with the presence of noise factors in the rPPG signal can be studied. Training a Deep Neural Network (DNN) to extract the quality from the video sequences is also an interesting possibility. This type of network may be able to estimate the quality level by learning which factors are more relevant for obtaining robust rPPG signals directly from training data. However, the lack of labeled datasets makes it difficult to train DNNs from scratch, so it would also be beneficial to acquire a larger database. This new database may contain a higher number of users, and it may also present more challenging conditions for testing our quality-based rPPG algorithm, e.g. varying ambient illumination, motion, blur, occlusions, etc.
This work was supported in part by projects BIBECA (RTI2018-101248-B-I00 from MICINN/FEDER) and BioGuard (Ayudas Fundacion BBVA). The work was conducted in part during a research stay of J. H.-O. at the Vision Sensing Laboratory, Sensing Technology Research Center, Technology and Intellectual Property H.Q., OMRON Corporation, Kyoto, Japan. He is also supported by a PhD Scholarship from UAM.
- [1] Allen, J.: Photoplethysmography and its application in clinical physiological measurement. Physiological measurement 28(3), R1 (2007)
- [2] Alonso-Fernandez, F., Fierrez, J., Ortega-Garcia, J.: Quality Measures in Biometric Systems. IEEE Security Privacy 10(6), 52–62 (2012)
- [3] Awasekar, P., Ravi, M., Doke, S., Shaikh, Z.: Driver fatigue detection and alert system using non-intrusive eye and yawn detection. Int. Journal of Computer Applications 975, 88–87 (2019)
- [4] Brandt, T., Stemmer, R., Rakotonirainy, A.: Affordable visual driver monitoring system for fatigue and monotony. In: IEEE Int. Conf. on Systems, Man and Cybernetics. pp. 6451–6456 (2004)
- [5] Chen, J., Chang, Z., Qiu, Q., Li, X., Sapiro, G., Bronstein, A., Pietikäinen, M.: RealSense = real heart rate: Illumination invariant heart rate estimation from videos. In: Image Processing Theory Tools and Applications (IPTA) (2016)
- [6] Fierrez, J., Morales, A., Vera-Rodriguez, R., Camacho, D.: Multiple classifiers in biometrics. part 2: Trends and challenges. Information Fusion 44, 103–112 (November 2018)
- [7] Flores, M.J., Armingol, J.M., de la Escalera, A.: Real-time warning system for driver drowsiness detection using visual information. Journal of Intelligent & Robotic Systems 59(2), 103–125 (2010)
- [8] Hernandez-Ortega, J., Fierrez, J., Morales, A., Tome, P.: Time Analysis of Pulse-based Face Anti-Spoofing in Visible and NIR. In: IEEE CVPR Computer Society Workshop on Biometrics (2018)
- [9] Jo, J., et al.: Vision-based method for detecting driver drowsiness and distraction in driver monitoring system. Optical Engineering 50(12), 127202 (2011)
- [10] Jung, S.J., Shin, H.S., Chung, W.Y.: Driver fatigue and drowsiness monitoring system with embedded electrocardiogram sensor on steering wheel. IET Intelligent Transport Systems 8(1), 43–50 (2014)
- [11] Kang, H.B.: Various approaches for driver and driving behavior monitoring: a review. In: IEEE Int. Conf. on Computer Vision Workshops (2013)
- [12] Lal, S.K., et al.: Development of an algorithm for an EEG-based driver fatigue countermeasure. Journal of Safety Research 34(3), 321–328 (2003)
- [13] McDuff, D., Gontarek, S., Picard, R.W.: Improvements in remote cardiopulmonary measurement using a five band digital camera. IEEE Transactions on Biomedical Engineering 61(10), 2593–2601 (2014)
- [14] Pakgohar, A., Tabrizi, R.S., Khalili, M., Esmaeili, A.: The role of human factor in incidence and severity of road crashes based on the CART and LR regression: a data mining approach. Procedia Computer Science 3, 764–769 (2011)
- [15] Poh, M.Z., McDuff, D.J., Picard, R.W.: Advancements in noncontact, multiparameter physiological measurements using a webcam. IEEE Transactions on Biomedical Engineering 58(1), 7–11 (2011)
- [16] Shin, H.S., Jung, S.J., Kim, J.J., Chung, W.Y.: Real time car driver’s condition monitoring system. IEEE Sensors pp. 951–954 (2010)
- [17] Tasli, H.E., Gudi, A., den Uyl, M.: Remote PPG based vital sign measurement using adaptive facial regions. In: IEEE International Conference on Image Processing (ICIP). pp. 1410–1414 (2014)
- [18] WHO: Global status report on road safety. https://www.who.int/violence_injury_prevention/road_safety_status/2018/en/ (2018)