Recent technological achievements have contributed to making vehicles greener, safer and smarter. However, despite all the efforts made regarding safety, the number of people who lose their lives in road accidents is still rising. According to the WHO Global Status Report on Road Safety 2018, an average of 3700 people die on the road every day, which amounts to 1.35 million road traffic deaths per year. This makes road traffic injuries the eighth leading cause of death for people of all ages, and the leading cause of death for children and young adults between 5 and 29 years old. The growing number of vehicles on open roads naturally contributes to the rise in accidents; however, the main reason is distracted driving.
Distracted driving is described as being occupied by any activity that is unnecessary for the task of driving, such as talking or texting on the phone, eating and drinking, talking to people in the vehicle, or interacting with the stereo, entertainment or navigation system, i.e., anything that takes attention away from the task of safe driving. Based on the WHO source, a driver's possible distractions are grouped as follows:
Visual distraction: taking the eyes off the road;
Manual distraction: taking the hands off the wheel;
Cognitive distraction: taking the mind off the driving task.
Passive safety systems that combat visual and manual distraction are already widely used in commercial vehicles. These systems track the driver's eye gaze: once the driver looks anywhere other than the road, they are judged to be distracted. The downside is that a driver who is looking at the road but daydreaming (a phenomenon known as mind wandering) is misjudged as attentive.
Cognitively distracted driving is a dangerous situation that vehicles should be able to detect in order to increase road safety. Driver inattentiveness has been highlighted as one of the issues to resolve in the Euro NCAP 2022 requirements.
This work proposes to detect the cognitive load of the driver with a novel image-based representation of the driver's eye-gaze dispersion (see Figure 1), called a heatmap. Features are extracted from this representation, and an SVM classifier is trained to detect cognitively distracted driving. Additionally, the designed data collection protocol is presented. Section II details the scientific foundations of eye movements and cognitive load, as well as state-of-the-art methods; Section III explains our experimental protocol and the data acquisition process; Section IV presents the obtained results; and finally, Section V presents the conclusion and further discussion.
II The State of the Art
Biological and physiological approaches explain human behavior through nature (aspects of behavior that are inherited) and nurture (aspects of behavior that are acquired). The cognitive approach deals with how people process information and is centered on the concept of memory, i.e., encoding, storing and retrieving information. Schema, perception and working memory concepts have been proposed to reveal cognitive processes through physiological behavior.
The Multi-Store Model proposes that memory consists of a sensory register, short-term memory (STM) and long-term memory (LTM). STM was later developed into the concept of working memory, a system for temporarily storing and managing the information required to carry out complex cognitive tasks such as learning, reasoning and comprehension [2, 29].
Cognitive load refers to the amount of working memory resources in use. It is a variable used to assess and measure the demands on working memory, and it can be of the following types: intrinsic (relative complexity), extraneous (ineffective or unnecessary) and germane (effective). As the demand on working memory increases, through an abundance of novel information or through interactions between the elements present, the cognitive load rises.
Existing cognitive load measurement techniques are divided into three categories: self-reports, performance measures and physiological measures. Self-reports cannot be used as features by a real-time vehicle application. For performance and physiological measures, numerous clues from different sources contain information about the driver's cognitive load. For instance, a combination of vehicle data, environment data and knowledge of the current task has been used to estimate the workload placed on the driver, and extending this method with the driver's eye movement, eye-gaze direction, eye closure and blinking, head movement, head position, head orientation, movable facial features and facial temperature images has been proposed. Driver-facing sensors can also relay bio-physiological features such as the hands, fingers, head, eye gaze, feet, facial expression, voice tone, brain activity, heart rate, skin conductance, steering-wheel grip force, muscle activity and skin/body temperature, all of which could be used for cognitive load estimation. Other methods focus on brain activity measured by electroencephalography (EEG), identifying the frequency bands and brain locations most likely to reflect cognitive load. In contrast, methods based on electrocardiography (ECG) assume that heart rhythms, controlled by the autonomic nervous system, fluctuate with cognitive load, while others rely on electrodermal activity (EDA).
In addition, pupil size increases under high cognitive load, which also affects blinking speed. In a simulator-based experiment, cognitive load was detected from pupil size while drivers were engaged in spoken dialogues. However, blinking speed and pupil size are also influenced by lighting conditions. In a vehicle application, cognitive distraction has also been detected by combining steering angle, vehicle speed, gaze location and head heading angle.
Among all these information sources, our work concentrates on a method that relies only on eye-gaze data. When the driver is distracted and experiences an increasing cognitive load, the rapid, ballistic movements of the eyes, called saccades, are altered, and their speed might reveal cognitive distraction: saccades become quicker and more random under high cognitive load.
Specific eye-related measurements such as blinks, saccades, pupil size and fixations provide a relevant and reliable assessment of cognitive load. An observer's visual scanning behavior tends to narrow during periods of increased cognitive demand, which is consistent with the finding that mental tasks produce spatial gaze concentration and impair visual detection. Based on this knowledge, instead of detecting and analyzing each type of eye movement individually, this work proposes a method that sums all of the gaze activity. The driver's eye-gaze vector is projected onto an imaginary distant surface, and an image-based representation is created by following the temporal variation of this projection. The resulting shapes are expected to reveal the cognitive distraction of the driver. Similarly to our study, Friedman et al. explored another image-based representation of eye-pupil movements (without projecting the gaze onto an imaginary distant surface) and achieved 86.1% accuracy with a 3D CNN.
To the best of our knowledge, our method of projecting the gaze onto a distant surface is original. It spatially represents all of the summed gaze activity, i.e., where the driver looks, and can be extended with additional information, such as the projected positions of other vehicles, pedestrians and road signs on the same imaginary surface (see Section V-A).
III-A Cognitive Distraction and Eye Movements
In this work, the link between short-term memory and distraction while driving is explored. Cognitive load, inattention and distraction are three different concepts: cognitive load refers to the percentage of working memory resources in use; inattention is the state in which the driver's attention drifts from the driving task to other, secondary tasks; and distraction refers to the driver's involvement in other tasks. Distraction leads to inattention to a particular task, and this causes a high cognitive load (in a driving task, this is of the germane type).
This leads to the following assumption: during neutral driving, the driver has sufficient cognitive resources to explore the environment and perform the normal tasks related to driving, such as regularly checking the mirrors, other vehicles, road signs, etc. Vestibulo-ocular eye movements (fixations), saccades (rapid, ballistic movements) and smooth pursuits (slower tracking movements) should therefore all be observed. However, during distracted driving, the driver has fewer cognitive resources available for the driving task; thus, the gaze traces cover a smaller area, and a change in eye movements is expected.
III-B Experimental Protocol
III-B1 Driving Laps
The experimental session consisted of driving two consecutive laps on the same route (see Section III-B2). The first lap (Neutral Driving) constituted the baseline, in which the driver performed the driving task naturally; the driver was told to relax and drive carefully. This lap was important as it allowed us to determine the baseline eye-gaze variation of each participant. The second lap (Distracted Driving) was performed immediately after the first one. In this lap, the driver had to perform secondary tasks (see Section III-B5) designed to cognitively overload them.
III-B2 Path and Driving Conditions
An important aspect of the experimental protocol was to recreate driving conditions (road, weather, traffic) that were as similar as possible across sessions and across both laps completed by a single participant. Therefore, a highway near Bobigny, France, was defined as the experimental path for every participant. The speed limit on this highway was constant (90 km/h), and a single lap took 22 minutes. Driving took place during the daytime, between 10 am and 5 pm, in order to minimize variations in weather and traffic conditions.
III-B3 The Expert
The expert was in charge of the experimental protocol: launching the secondary tasks, annotating events and guiding the driver along the driving path. He was also in charge of momentarily pausing the secondary tasks whenever the road situation became dangerous (e.g., when another vehicle overtook the test vehicle). This expert is called the accompanist in the following sections.
III-B4 User Group
Five drivers participated in the data collection protocol. All of them were volunteers working in the automotive industry; however, they were not aware of the purpose of the driving session. All the participants were male, with an average age of 29.4 years.
III-B5 Secondary Tasks
The aim of the secondary tasks was to increase the mental workload of the driver. Various secondary tasks have been used in the literature, such as foot tapping (secondary task) while learning (primary task) to measure rhythmic precision, or the detection response task (DRT) while driving. In a simulator-based experiment, drivers had to accomplish visual, manual, auditory, verbal and haptic secondary tasks, and the eye-glance analysis showed that the visual DRT was more efficient than the others. A vehicle-oriented study used visuospatial secondary tasks (participants had to visualize the positions of the hour and minute hands for a given time on the face of an imaginary analog clock). However, in our study, in order to keep the eye-gaze patterns as neutral as possible, visual and visuospatial secondary tasks were discarded. Instead, immersive and fun secondary tasks were designed in an attempt to make the experimental procedure more natural. The following four games were designed, all based on the n-back task strategy. N-back tasks are cognitively distracting tasks in which participants have to recall successive instructions, and recalling these successive instructions increases their mental workload. Each game lasted four minutes, with a one-minute pause between games.
Neither Yes nor No: This game was based on avoiding the words "yes", "no" and their alternatives, such as "yeah" or "oui". The accompanist asked successive questions to push the participant into pronouncing these words.
In My Trunk There Is: The game consisted of saying "In my trunk there is" followed by an item's name. Taking turns, the participant and the accompanist had to recall all of the previous objects and add a new one to the list.
Guess Who?: The participant thought about a real or imaginary character, and the accompanist tried to determine the identity of the character by asking questions from a mobile application. The participant had to answer the questions correctly.
The 21: The accompanist started counting, saying 1, 2 or 3 numbers in ascending order (e.g., 2 numbers: 1, 2). The driver continued the sequence, saying a different count of numbers than the accompanist (e.g., 3 numbers: 3, 4, 5). The game continued in this manner; however, it was forbidden to say the number "21". When the counter reached "21", instead of saying "21", the player had to add a new rule to the game (e.g., do not say multiples of 4), and the counter was reset to zero.
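All four games above rely on the n-back idea of holding and continuously updating a sequence in memory. For readers unfamiliar with the paradigm, the classic laboratory version can be sketched as follows: at each step, the participant says whether the current item matches the one presented n steps earlier. This is a minimal, hypothetical illustration of the general n-back scoring logic (the function names are ours). It does not model the oral games used in this study.

```python
from typing import List, Sequence


def nback_targets(stimuli: Sequence[str], n: int) -> List[bool]:
    """For each position, True if the stimulus matches the one shown n steps earlier.

    The first n positions can never be targets: nothing was shown n steps before them.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    return [i >= n and stimuli[i] == stimuli[i - n] for i in range(len(stimuli))]


def nback_accuracy(stimuli: Sequence[str], answers: Sequence[bool], n: int) -> float:
    """Share of positions where the participant's match/no-match answer was right."""
    truth = nback_targets(stimuli, n)
    if not truth or len(answers) != len(truth):
        raise ValueError("exactly one answer per stimulus is required")
    return sum(a == t for a, t in zip(answers, truth)) / len(truth)


sequence = ["A", "B", "A", "C", "A", "C"]
print(nback_targets(sequence, 2))  # [False, False, True, False, True, True]
```

Raising n forces the participant to keep a longer sequence in working memory, which is why the paradigm is a standard way to dial up cognitive load.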
III-C Data Acquisition
The positions of the vehicle's interior parts, such as the mirrors and the instrument cluster, were measured and represented in a 3D world representation (see objects 2, 3, 4 and 5 in Figure 2).
While driving, the driver was monitored with a near-infrared (NIR) camera placed in front of the instrument cluster. This sensor, part of the Valeo driver monitoring system (DMS, https://www.valeo.com/en/driver-monitoring/), extracted the head and eye positions and their directions. These data were also imported into the 3D world representation (see Figure 2), making it possible to detect whether the driver was looking towards one of the objects present in the scene.
In addition, an imaginary plane surface was placed in front of the vehicle as if it were one of the vehicle's interior parts (object 1 in Figure 2). The eye-gaze vector was projected onto this surface, and their intersection point was tracked over a given time window. Image-based representations were generated by following the variations of the intersection point over this surface (see Section III-D). This representation, called a heatmap, was used to detect the cognitive load of the driver.
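Geometrically, the projection step is a ray-plane intersection. The gaze ray starts at the eye position and follows the gaze direction, and the imaginary surface is a plane given by one of its points and its normal. The NumPy sketch below assumes that the eye position, gaze direction and plane are all expressed in one common vehicle frame. The frame convention, the example coordinates and the function name `project_gaze` are hypothetical and do not describe the Valeo DMS output format.

```python
import numpy as np


def project_gaze(eye_pos, gaze_dir, plane_point, plane_normal, eps=1e-9):
    """Intersect the gaze ray eye_pos + t * gaze_dir (t > 0) with a plane.

    Returns the 3-D intersection point, or None when the gaze is parallel to the
    plane or points away from it (the intersection would lie behind the eye).
    """
    eye_pos = np.asarray(eye_pos, dtype=float)
    gaze_dir = np.asarray(gaze_dir, dtype=float)
    plane_point = np.asarray(plane_point, dtype=float)
    plane_normal = np.asarray(plane_normal, dtype=float)

    denom = float(np.dot(plane_normal, gaze_dir))
    if abs(denom) < eps:
        return None  # gaze parallel to the plane: no intersection
    t = float(np.dot(plane_normal, plane_point - eye_pos)) / denom
    if t <= 0:
        return None  # the plane is behind the driver along this gaze ray
    return eye_pos + t * gaze_dir


# Hypothetical frame: x forward, y left, z up, in meters; the plane lies 4 m ahead
# of the origin and faces the driver.
hit = project_gaze(eye_pos=[-0.5, 0.35, 1.2], gaze_dir=[1.0, 0.0, 0.0],
                   plane_point=[4.0, 0.0, 0.0], plane_normal=[1.0, 0.0, 0.0])
print(hit)  # the ray meets the plane at x = 4.0 m, keeping y = 0.35 m and z = 1.2 m
```

A `None` result corresponds to the case where the driver is not looking through the plane. Such samples are simply missing from the heatmap input.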
The vehicle was also equipped with a frontal RGB camera providing images with a resolution of 1280 x 800 pixels. The position and dimensions of the imaginary surface were set to maximize the overlap between this surface, the RGB camera's field of view and the area in which gaze detection was available. In our vehicle's configuration, these conditions were met when the virtual wall was placed 4 meters in front of the vehicle (with the car's navigation screen as the origin). The vehicle was then physically placed in front of a real wall at the computed distance, and the camera's field of view was measured in meters (4.15 m x 2.59 m). In summary, the first step of the data acquisition process was to locate the projection of the eye gaze on the imaginary surface (4.15 m x 2.59 m) and convert it to pixels (1280 x 800). The generated heatmaps were down-sampled to 640 x 400 to increase computational speed.
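The meter-to-pixel conversion can be sketched as a linear scaling of in-plane coordinates. With the values above, the horizontal scale is 1280 / 4.15 ≈ 308.4 px/m and the vertical scale is 800 / 2.59 ≈ 308.9 px/m, so pixels are almost square on the imaginary wall. The sketch assumes in-plane coordinates in meters, with the origin at the top-left corner of the surface, u pointing right and v pointing down. This convention and the helper name are assumptions.

```python
SURFACE_W_M, SURFACE_H_M = 4.15, 2.59  # imaginary surface size in meters (from the text)
IMAGE_W_PX, IMAGE_H_PX = 1280, 800     # RGB camera resolution in pixels (from the text)


def surface_to_pixel(u_m, v_m, out_w=IMAGE_W_PX, out_h=IMAGE_H_PX):
    """Map in-plane coordinates in meters to (column, row) pixel indices.

    Assumed convention: origin at the top-left corner of the surface, u to the
    right, v downward. Returns None for points outside the surface.
    """
    if not (0.0 <= u_m < SURFACE_W_M and 0.0 <= v_m < SURFACE_H_M):
        return None
    col = int(u_m / SURFACE_W_M * out_w)
    row = int(v_m / SURFACE_H_M * out_h)
    return col, row


print(round(IMAGE_W_PX / SURFACE_W_M, 1), round(IMAGE_H_PX / SURFACE_H_M, 1))  # 308.4 308.9
print(surface_to_pixel(SURFACE_W_M / 2, SURFACE_H_M / 2))            # (640, 400)
print(surface_to_pixel(SURFACE_W_M / 2, SURFACE_H_M / 2, 640, 400))  # (320, 200)
```

Passing 640 x 400 as the output size maps points straight onto the down-sampled grid. This simplifies the text's procedure, which down-samples the generated heatmaps instead.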
The RGB camera was located at the center of the vehicle, whereas the driver sat in the front left seat. Consequently, the driver's gaze activity appears concentrated on the left side of the overlays and heatmaps.
III-D Heatmap Generation
Heatmaps are a data visualization technique used in many studies and solutions, often to highlight areas of interest, which makes them well suited to comparing where the driver looks across different situations. The heatmap (visible in Figure 1) was used for both visualization and feature extraction after performing the following steps:
Point acquisition: The timestamped raw x and y coordinates of the intersection points between the eye-gaze vector and the imaginary surface were the input of the heatmap generator. These data were acquired every 50 ms whenever the driver was looking through the imaginary plane (for the case where the driver was not looking through the plane, e.g., when checking his phone, see Section III-E3).
Field of view: To cover the driver's field of view, a circle with a diameter of 15 pixels was placed, centered on each intersection point (see subfigure (b)). The choice of the circle diameter representing the gaze fixation was mainly driven by the pixel dimensions of our heatmaps (640 x 400).
Opacity: After normalization of the field-of-view circles, the resulting mask was used to vary the opacity of the intersections (see subfigure (c)).
Blurring: Finally, a Gaussian filter was applied to reduce the noise due to the gaze activity and to emphasize the most explored areas (see subfigure (d)).
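The four steps above can be sketched with NumPy as follows. This is one plausible reading rather than the authors' implementation. The opacity step is interpreted as normalizing the accumulated discs to [0, 1], and the Gaussian standard deviation `sigma` is a hypothetical value. The blur is written by hand as a separable convolution only to keep the example dependency-free.

```python
import numpy as np


def gaussian_kernel1d(sigma: float) -> np.ndarray:
    """Normalized 1-D Gaussian kernel truncated at three standard deviations."""
    radius = max(1, int(3 * sigma))
    x = np.arange(-radius, radius + 1, dtype=float)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float) -> np.ndarray:
    """Separable Gaussian blur with zero padding; the output keeps the input shape."""
    kernel = gaussian_kernel1d(sigma)
    blur_rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="same"), 1, image)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="same"), 0, blur_rows)


def make_heatmap(points_px, width=640, height=400, diameter=15, sigma=5.0):
    """Build a gaze heatmap from (column, row) intersection points in pixels.

    1. Field of view: stamp a filled disc of `diameter` pixels on every point.
    2. Opacity: normalize the accumulated discs to [0, 1], so frequently
       visited pixels become more opaque.
    3. Blurring: smooth the result with a Gaussian (sigma is hypothetical).
    """
    accumulated = np.zeros((height, width), dtype=float)
    rows, cols = np.mgrid[0:height, 0:width]
    radius_sq = (diameter / 2.0) ** 2
    for col, row in points_px:
        accumulated += ((cols - col) ** 2 + (rows - row) ** 2) <= radius_sq
    if accumulated.max() > 0:
        accumulated /= accumulated.max()
    return gaussian_blur(accumulated, sigma)


heatmap = make_heatmap([(100, 100), (100, 100), (300, 200)])
print(heatmap.shape, round(float(heatmap.max()), 3))
```

Stamping every disc on the full 640 x 400 grid is simple but slow for long windows (a 60-second window holds up to 1200 points). Stamping a small local patch per point would be the natural optimization.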
III-E Feature Extraction
Feature engineering was applied to the generated heatmaps in order to reduce the data dimensionality. From each heatmap, the following feature sets, based on pixel intensities and shape, were extracted:
III-E1 Appearance Features
The pixel intensity variation of a heatmap contains information on the area checked by the driver. The histogram is an efficient tool to visualize the data distributions.
During distracted driving, pixel values are expected to concentrate in the higher intensities, since the gaze stays on a narrow zone. During neutral driving, as the driver covers a wider area, the histogram is expected to shift towards low-intensity bins. Hence, a six-bin histogram of pixel counts per intensity range is generated for each heatmap (see Figure 4).
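A minimal sketch of this appearance feature, assuming heatmaps scaled to [0, 1] and six equal-width bins (the bin edges are not specified in the text). Ignoring zero-valued background pixels is also an assumption; without it, the first bin would mostly measure the area the driver never looked at.

```python
import numpy as np


def intensity_histogram(heatmap: np.ndarray, bins: int = 6,
                        ignore_background: bool = True) -> np.ndarray:
    """Histogram of pixel counts over the [0, 1] intensity range.

    Assumptions (not specified in the text): the heatmap is scaled to [0, 1],
    the bins have equal width, and zero-valued background pixels are dropped
    by default so that the bins describe only the area the driver looked at.
    """
    values = heatmap.ravel()
    if ignore_background:
        values = values[values > 0]
    counts, _ = np.histogram(values, bins=bins, range=(0.0, 1.0))
    return counts


toy = np.array([[0.0, 0.1, 0.4],
                [0.6, 0.9, 1.0]])
print(intensity_histogram(toy))  # [1 0 1 1 0 2]
```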
III-E2 Geometric Features
Beyond the raw pixel intensities, during distracted driving, the dispersion of the gaze activity is expected to vary differently along the abscissa and ordinate axes. Thus, the geometric form of the heatmap also has to be considered. The generated heatmap is divided into regions, called blobs, delimited by contours at different pixel-intensity levels (see Figure 5).
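The exact statistics extracted from the blobs are not reproduced here, so the following is only a hypothetical illustration of the idea. The heatmap is thresholded at several intensity levels, and the connected regions (blobs) at each level are labeled. Each blob's area, bounding-box width and height, and spread along the abscissa and ordinate are then measured. The intensity levels, the chosen measures and all function names are assumptions. In practice `scipy.ndimage.label` would do the labeling; a small breadth-first search keeps the sketch self-contained.

```python
from collections import deque

import numpy as np


def label_components(mask: np.ndarray):
    """4-connected component labeling of a boolean mask via breadth-first search."""
    labels = np.zeros(mask.shape, dtype=int)
    height, width = mask.shape
    count = 0
    for r0, c0 in zip(*np.nonzero(mask)):
        if labels[r0, c0]:
            continue  # already part of a labeled blob
        count += 1
        labels[r0, c0] = count
        queue = deque([(r0, c0)])
        while queue:
            r, c = queue.popleft()
            for nr, nc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                if 0 <= nr < height and 0 <= nc < width and mask[nr, nc] and not labels[nr, nc]:
                    labels[nr, nc] = count
                    queue.append((nr, nc))
    return labels, count


def blob_stats(heatmap: np.ndarray, levels=(0.2, 0.4, 0.6, 0.8)):
    """Geometry of every blob found at each (hypothetical) intensity level."""
    stats = []
    for level in levels:
        labels, count = label_components(heatmap >= level)
        for k in range(1, count + 1):
            rows, cols = np.nonzero(labels == k)
            stats.append({
                "level": level,
                "area": int(rows.size),
                "width": int(cols.max() - cols.min() + 1),
                "height": int(rows.max() - rows.min() + 1),
                "x_std": float(cols.std()),  # spread along the abscissa
                "y_std": float(rows.std()),  # spread along the ordinate
            })
    return stats


def geometric_features(heatmap: np.ndarray) -> np.ndarray:
    """Mean and max of each blob measure across all blobs (10 values)."""
    keys = ("area", "width", "height", "x_std", "y_std")
    stats = blob_stats(heatmap)
    if not stats:
        return np.zeros(2 * len(keys))
    values = np.array([[b[k] for k in keys] for b in stats], dtype=float)
    return np.concatenate([values.mean(axis=0), values.max(axis=0)])


toy = np.zeros((10, 10))
toy[1:3, 1:4] = 1.0  # bright 2 x 3 blob
toy[6:9, 6:8] = 0.5  # dimmer 3 x 2 blob
print(len(blob_stats(toy)), geometric_features(toy).shape)  # 6 (10,)
```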
In order to understand the information about the driver’s gaze dispersion across the imaginary plane, the following features are extracted as statistical measures from all blobs:
III-E3 Looking Ahead Confidence
If the driver does not always look through the imaginary plane during the heatmap generation time window (e.g., when checking their phone), or if the camera is unable to detect the driver's gaze (e.g., when the driver's arm covers the camera while turning the steering wheel), the observation contains less relevant data. Therefore, the amount of time the driver spent looking ahead is used as another feature, which reflects the quality of the heatmap: the looking-ahead confidence (LAC).
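The text does not give a formula for LAC. Assuming it is the share of expected gaze samples in a window that actually produced a valid intersection with the imaginary plane, it can be computed as follows with the 50 ms sampling period mentioned in Section III-D. The function name and signature are hypothetical.

```python
from typing import Iterable


def looking_ahead_confidence(valid_times_s: Iterable[float], window_start_s: float,
                             window_len_s: float, period_s: float = 0.05) -> float:
    """Share of the expected gaze samples in a window that hit the imaginary plane.

    `valid_times_s` holds the timestamps of valid intersection points only, so
    samples lost because the driver looked elsewhere (e.g., at a phone) or
    because the gaze was not detected lower the score.
    """
    expected = round(window_len_s / period_s)
    if expected <= 0:
        raise ValueError("the window must contain at least one sample period")
    window_end_s = window_start_s + window_len_s
    valid = sum(1 for t in valid_times_s if window_start_s <= t < window_end_s)
    return min(1.0, valid / expected)


# A 5-second window (100 expected samples) in which only 80 samples were valid.
times = [i * 50 / 1000 for i in range(80)]
print(looking_ahead_confidence(times, 0.0, 5.0))  # 0.8
```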
Finally, all the extracted features are standardized by removing the mean and scaling to unit variance per heatmap.
III-F Classifier Training
A supervised binary classifier based on a support vector machine (SVM) is trained on the extracted features. Data collected during neutral driving were labeled as neutral, and data collected during the secondary tasks were labeled as distracted.
The classification is validated through stratified k-fold cross-validation with 10 iterations. The leave-one-driver-out technique is used to ensure that the test data always come from a different driver than the training data. Stratification ensures that each fold is representative of all strata of the data, i.e., that each class is represented in the same proportion in every test fold.
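The training and validation scheme can be sketched with scikit-learn on synthetic data, under several assumptions. The classifier is an RBF-kernel `SVC` (the text does not state the kernel), and per-feature standardization is fitted on the training folds via `StandardScaler`. Validation uses `LeaveOneGroupOut` with one group per driver, which yields five folds for five drivers; how the paper combines this with ten stratified iterations is not detailed. The synthetic features merely stand in for the heatmap features. Per-fold confusion matrices are normalized by true class and then averaged, similar in spirit to the averaged confusion matrix reported in Section IV.

```python
import numpy as np
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the heatmap features: 5 drivers, 2 classes, 10 features.
rng = np.random.default_rng(0)
n_drivers, per_class, n_features = 5, 40, 10
X_parts, y_parts, g_parts = [], [], []
for driver in range(n_drivers):
    for label in (0, 1):  # 0 = neutral, 1 = distracted
        X_parts.append(rng.normal(loc=2.0 * label, scale=1.0, size=(per_class, n_features)))
        y_parts.append(np.full(per_class, label))
        g_parts.append(np.full(per_class, driver))
X = np.vstack(X_parts)
y = np.concatenate(y_parts)
groups = np.concatenate(g_parts)

# The scaler is refitted on each training split, so no test statistics leak in.
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

accuracies, matrices = [], []
for train_idx, test_idx in LeaveOneGroupOut().split(X, y, groups):
    clf.fit(X[train_idx], y[train_idx])
    pred = clf.predict(X[test_idx])
    accuracies.append(accuracy_score(y[test_idx], pred))
    # Row-normalized confusion matrix: each row sums to 1 for its true class.
    matrices.append(confusion_matrix(y[test_idx], pred, labels=[0, 1], normalize="true"))

mean_accuracy = float(np.mean(accuracies))
mean_confusion = np.mean(matrices, axis=0)
print(f"held-out drivers: {len(accuracies)}, mean accuracy: {mean_accuracy:.3f}")
print(mean_confusion.round(3))
```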
IV-A Shape Visualization
In accordance with the initial expectations, the obtained shapes differ visibly between neutral and distracted driving (see Figure 6, columns a and b). These shapes occupy a wide area in neutral driving, as the driver checks his environment often. However, in the presence of cognitive distraction, the covered area narrows as the driver fixates more on a single zone.
Heatmaps built from longer observation windows yield more separable visual patterns. With a longer observation time, the driver has more time to explore the environment during neutral driving. During distracted driving, by contrast, he often fixates on a narrow zone, so observing for a longer period does not greatly change the heatmap. As a result, the difference between neutral and distracted driving patterns becomes more obvious as the observation time increases (see the window sizes in Figure 6). Nevertheless, safety-oriented solutions should warn of dangerous situations as quickly as possible.
The relationship between the observation window size and the classification result is presented in Table I. A 5-second window achieved 63% accuracy, whereas a 60-second window achieved 85% accuracy. These results agree with the expectations based on the previous heatmap observations (see Figure 6).
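Each window size also fixes how many gaze samples feed one heatmap. At one sample every 50 ms, a 5-second window holds at most 100 intersection points and a 60-second window at most 1200. The following is a minimal sketch of the segmentation, assuming consecutive non-overlapping windows (the text does not say whether windows overlap). The function name and the point layout are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical layout of one gaze sample: (timestamp in s, column in px, row in px).
GazePoint = Tuple[float, float, float]


def split_into_windows(points: List[GazePoint], window_s: float) -> Dict[int, List[GazePoint]]:
    """Group timestamped gaze points into consecutive, non-overlapping windows.

    Window k covers [k * window_s, (k + 1) * window_s); each window would then be
    turned into one heatmap and one feature vector.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    windows: Dict[int, List[GazePoint]] = {}
    for point in points:
        index = int(point[0] // window_s)
        windows.setdefault(index, []).append(point)
    return windows


# 12 s of samples every 50 ms, split into 5-second windows.
samples = [(i * 50 / 1000, 320.0, 200.0) for i in range(240)]
windows = split_into_windows(samples, 5.0)
print({k: len(v) for k, v in sorted(windows.items())})  # {0: 100, 1: 100, 2: 40}
```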
The presented results are averages of the scores from 10 random training–testing splits (stratified k-fold cross-validation) in which the subjects in the training sets were always distinct from those in the test set. This avoids an overly optimistic evaluation caused by over-fitting to individual drivers. The confusion matrix obtained by averaging these folds, based on 30-second heatmaps, is presented in Table II.
Human-centered artificial intelligence aims to increasingly assist humans in their daily lives. In particular, intelligent systems are now part of vehicles and assist drivers to increase road safety.
In this work, we investigated the detection of high cognitive load in drivers through an image-based representation (heatmaps) created by tracking the projection of the driver's eye gaze onto an imaginary plane surface.
The variation of the obtained shapes revealed the driver’s cognitive distraction. These shapes occupy a wide area in neutral driving, as the driver checks his environment often. However, in the presence of cognitive distraction, the covered area narrows, as the driver fixates more on a single zone (see Figure 6).
The trained SVM-based classifiers achieved up to 85.2% accuracy, suggesting that the proposed method discriminates well between neutral and distracted driving scenarios.
Heatmaps built from longer observation windows yielded more separable visual patterns. Nevertheless, safety-oriented solutions should warn of dangerous situations as quickly as possible, so the window size must be a compromise between algorithmic performance and alerting time. In practice, we selected two classifiers working in parallel with different window sizes. The first one uses a short window to warn of problems as fast as possible, and the second one uses a long window so as not to miss any dangerous situation.
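One simple way to combine the two classifiers is to raise an alert as soon as either of them flags distraction. The short window then provides reactivity, and the long window catches sustained distraction. This OR-fusion rule, the probability inputs and the thresholds are assumptions made for illustration; the text does not specify how the two outputs are merged.

```python
from dataclasses import dataclass


@dataclass
class DualWindowMonitor:
    """Hypothetical OR-fusion of a short-window and a long-window classifier.

    The thresholds apply to each classifier's estimated probability of
    distraction; both default values are illustrative only.
    """

    short_threshold: float = 0.5
    long_threshold: float = 0.5

    def alert(self, p_short: float, p_long: float) -> bool:
        # Either classifier alone is enough to raise the warning.
        return p_short >= self.short_threshold or p_long >= self.long_threshold


monitor = DualWindowMonitor()
print(monitor.alert(0.7, 0.2))  # True: the short window reacts first
print(monitor.alert(0.2, 0.3))  # False: neither classifier flags distraction
```

Raising `short_threshold` above `long_threshold` would be one way to limit false alarms from the noisier short-window classifier.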
V-A Further Discussion
Future work in this context should involve more participants and more data. Nevertheless, the scientific background, the heatmap shapes obtained for neutral and distracted driving, and the validation technique used suggest that these results could generalize to a wider population.
Further studies should also include other road types and conditions since, in this work, the driver's cognitive load estimation was studied only under similar conditions (on highways with a 90 km/h speed limit, during the daytime, with low traffic and good weather). Once more data are collected, further studies should investigate CNN-based classifiers and present ablation tests per feature set.
To meet end-user needs, modern vehicles are typically equipped with only a single central NIR camera, and our method is accordingly based on a single central NIR camera. However, multiple cameras would make it possible to implement a wider, curved imaginary surface, which would increase data availability.
The 3D view (see Figure 2) also provides other gaze-related features, such as the mirror-checking frequency. These data should be added to the feature set as well.
Finally, the real positions of other vehicles, pedestrians and road signs could be taken into account in the heatmap creation process, and the weights of specific zones in the heatmap could be adjusted. Figure 7 shows an uncommon case in which the expected heatmap would differ from the default ones.
The authors would like to thank Kevin Nguyen for his help over the entire course of the project; Omar Islas-Ramirez for reviewing this article; the students from Sorbonne University (Antoine Monod, Rodolphe Sencarric, William Taing and Anes Yahiaoui); and all collaborators (Vincenzo Varano, Joao Rebocho, Philippe Andrianavalona, Emmanuel Doucet and Gabriel De-Roquefeuil).
- (1977) Human memory: a proposed system and its control processes. In Human Memory, G. Bower (Ed.), pp. 7–113. Cited by: §II.
- (1974) Working memory. In Psychology of Learning and Motivation, G. H. Bower (Ed.), Vol. 8, pp. 47–89. Cited by: §II.
- Cited by: §II.
- (2016) Analyzing cognitive workload through eye-related measurements: a meta-analysis. Master's thesis, Wright State University, Department of Biomedical, Industrial, and Human Factors Engineering. Cited by: §II.
- Cited by: §II.
- (2017-09) Euro NCAP 2025 Roadmap: In Pursuit of Vision Zero. Technical report, Euro NCAP. Cited by: §I.
- (2018) Cognitive load estimation in the wild. In Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems, CHI '18, New York, NY, USA. Cited by: §II.
- (2018) What does the n-back task measure as we get older? Relations between working-memory measures and other cognitive functions across the lifespan. Frontiers in Psychology 9, pp. 2208. Cited by: §III-B5.
- (2015-06) Eye glance analysis of the surrogate tests for driver distraction. In Proceedings of the Eighth International Driving Symposium on Human Factors in Driver Assessment, Training and Vehicle Design, pp. 141–147. Cited by: §III-B5.
- (2016) Measurement of cognitive load in HCI systems using EEG power spectrum: an experimental study. Procedia Computer Science 84, pp. 70–78. Proceedings of the Seventh International Conference on Intelligent Human Computer Interaction (IHCI 2015). Cited by: §II.
- (2016) Detection of driver cognitive distraction: an SVM based real-time algorithm and its comparison study in typical driving scenarios. In 2016 IEEE Intelligent Vehicles Symposium (IV), pp. 394–399. Cited by: §II, §III-B5.
- (2014) Cognitive psychology: classic edition. Psychology Press & Routledge Classic Editions, Taylor & Francis. Cited by: §II.
- (2018-06) Towards a multimodal multisensory cognitive assessment framework. In 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS), pp. 24–29. Cited by: §II.
- (2017) Vehicle driver monitoring: sleepiness and cognitive load. Technical report, VTI. VTI rapport 937A. Cited by: §II, §III-B5.
- (2018) Global status report on road safety 2018: summary. Technical document, World Health Organization. Cited by: §I.
- (2019-02) The evolution of cognitive load theory and the measurement of its intrinsic, extraneous and germane loads: a review. pp. 23–48. Cited by: §II.
- (2010) Estimating cognitive load using remote eye tracking in a driving simulator. In Proceedings of the 2010 Symposium on Eye-Tracking Research & Applications, ETRA '10, New York, NY, USA, pp. 141–144. Cited by: §II.
- (2015) The rhythm method: a new method for measuring cognitive load, an experimental dual-task study. Applied Cognitive Psychology 29 (2), pp. 232–243. Cited by: §III-B5.
- (2018) Detecting drivers' cognitive load from saccadic intrusion. Transportation Research Part F: Traffic Psychology and Behaviour 54, pp. 63–78. Cited by: §II.
- (2001) Types of eye movements and their functions. In Neuroscience, 2nd edition, Sunderland. Cited by: §III-A.
- (2003) Mental workload while driving: effects on visual search, discrimination, and decision making. Journal of Experimental Psychology: Applied 9 (2), pp. 119–137. Cited by: §II.
- (2013-11) Wireless ambulatory ECG signal capture for HRV and cognitive load study using the neuromonitor platform. In 2013 6th International IEEE/EMBS Conference on Neural Engineering (NER), pp. 497–500. Cited by: §II.
- (2014) Chapter one - the middle way: finding the balance between mindfulness and mind-wandering. In Psychology of Learning and Motivation, B. H. Ross (Ed.), Vol. 60, pp. 1–33. Cited by: §I.
- Cited by: §II.
- Cited by: §I.
- Policy statement and compiled FAQs on distracted driving. Accessed: 2020-01-03. Cited by: §I.
- (2018) National Highway Traffic Safety Administration: distracted driving. Accessed: 2020-01-03. Cited by: §I.
- (2018-12-01) Pupil dilation as an index of effort in cognitive control tasks: a review. Psychonomic Bulletin & Review 25 (6), pp. 2005–2015. Cited by: §II.
- (2005-06) Cognitive load theory and complex learning: recent developments and future directions. Educational Psychology Review 17, pp. 147–177. Cited by: §II.
- (2014) The sensitivity of different methodologies for characterizing drivers' gaze concentration under increased cognitive demand. Transportation Research Part F: Traffic Psychology and Behaviour 26, pp. 227–237. Cited by: §II.