A Comparative Analysis of Decision-Level Fusion for Multimodal Driver Behaviour Understanding

by Alina Roitberg, et al.

Visual recognition inside the vehicle cabin leads to safer driving and more intuitive human-vehicle interaction, but such systems face substantial obstacles: they need to capture different granularities of driver behaviour while dealing with highly limited body visibility and changing illumination. Multimodal recognition mitigates a number of these issues, since the predictions of different sensors complement each other thanks to modality-specific strengths and weaknesses. While several late fusion methods have been considered in previously published frameworks, these frameworks consistently feature different architecture backbones and building blocks, making it very hard to isolate the role of the chosen late fusion strategy itself. This paper presents an empirical evaluation of different paradigms for decision-level late fusion in video-based driver observation. We compare seven mechanisms for joining the results of single-modal classifiers, covering both those that have been popular (e.g. score averaging) and those not yet considered (e.g. rank-level fusion) in the context of driver observation, and evaluate them based on different criteria and benchmark settings. This is the first systematic study of strategies for fusing the outcomes of multimodal predictors inside the vehicle, conducted with the goal of providing guidance for fusion scheme selection.
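
To make the two fusion families named in the abstract concrete, the following is a minimal sketch of score averaging and of one common rank-level scheme, the Borda count. It is an illustration under assumptions, not the paper's implementation: the abstract does not say which rank-level variant the authors use, and the function names, the three modalities, and the score values are all hypothetical.

```python
def average_fusion(scores):
    """Score averaging: per-class mean of the single-modal confidence scores."""
    n_modalities = len(scores)
    return [sum(per_class) / n_modalities for per_class in zip(*scores)]


def borda_fusion(scores):
    """Rank-level fusion via Borda count (one possible rank-level scheme).

    Each modality awards 0 points to its lowest-scored class and
    (n_classes - 1) points to its highest-scored class; points are summed.
    """
    n_classes = len(scores[0])
    totals = [0] * n_classes
    for modality_scores in scores:
        # Class indices ordered from lowest to highest score for this modality.
        ascending = sorted(range(n_classes), key=lambda c: modality_scores[c])
        for points, class_index in enumerate(ascending):
            totals[class_index] += points
    return totals


def predict(fused):
    """Return the index of the class with the highest fused value."""
    return max(range(len(fused)), key=lambda c: fused[c])


# Hypothetical softmax outputs of three single-modal classifiers over 4 classes.
modality_a = [0.50, 0.30, 0.15, 0.05]
modality_b = [0.10, 0.45, 0.40, 0.05]
modality_c = [0.05, 0.40, 0.45, 0.10]
all_scores = [modality_a, modality_b, modality_c]

avg = average_fusion(all_scores)
borda = borda_fusion(all_scores)
print("average:", avg, "-> class", predict(avg))
print("borda:  ", borda, "-> class", predict(borda))
```

Score averaging uses the confidence magnitudes directly, so one overconfident modality can dominate the result. The Borda count keeps only the ordering of classes within each modality, which discards calibration differences between classifiers at the cost of ignoring how large the score gaps are.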


Robust Multiview Multimodal Driver Monitoring System Using Masked Multi-Head Self-Attention

Driver Monitoring Systems (DMSs) are crucial for safe hand-over actions ...

Multimodal Fusion Using Deep Learning Applied to Driver's Referencing of Outside-Vehicle Objects

There is a growing interest in more intelligent natural user interaction...

Multimodal Driver Referencing: A Comparison of Pointing to Objects Inside and Outside the Vehicle

Advanced in-cabin sensing technologies, especially vision based approach...

TransDARC: Transformer-based Driver Activity Recognition with Latent Space Feature Calibration

Traditional video-based human activity recognition has experienced remar...

You Have a Point There: Object Selection Inside an Automobile Using Gaze, Head Pose and Finger Pointing

Sophisticated user interaction in the automotive industry is a fast emer...

Temporal Multimodal Fusion for Driver Behavior Prediction Tasks using Gated Recurrent Fusion Units

The Tactical Driver Behavior modeling problem requires understanding of ...

ML-PersRef: A Machine Learning-based Personalized Multimodal Fusion Approach for Referencing Outside Objects From a Moving Vehicle

Over the past decades, the addition of hundreds of sensors to modern veh...
