Vision-Language Models can Identify Distracted Driver Behavior from Naturalistic Videos

06/16/2023
by   Md. Zahid Hasan, et al.

Recognizing distracting activities in real-world driving scenarios is critical for ensuring the safety and reliability of both drivers and pedestrians on the roadways. Conventional computer vision techniques are typically data-intensive and require a large volume of annotated training data to detect and classify various distracted driving behaviors, limiting their efficiency and scalability. We aim to develop a generalized framework that performs robustly with limited or no annotated training data. Recent vision-language models offer large-scale visual-textual pretraining that can be adapted to task-specific learning such as distracted driving activity recognition. Vision-language pretraining models such as CLIP have shown significant promise in learning natural language-guided visual representations. This paper proposes a CLIP-based driver activity recognition approach that identifies driver distraction from naturalistic driving images and videos. CLIP's vision embedding supports both zero-shot transfer and task-specific fine-tuning for classifying distracted activities from driving video data. We propose both frame-based and video-based frameworks built on top of CLIP's visual representations for distracted driving detection and classification, and show that they achieve state-of-the-art performance in predicting the driver's state on two public datasets.
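To illustrate the zero-shot transfer idea described above, the sketch below implements CLIP-style zero-shot classification: each candidate driver state is phrased as a text prompt, and a frame is assigned to the class whose text embedding is most similar (by scaled cosine similarity, then softmax) to the image embedding. This is a minimal NumPy sketch, not the paper's implementation; the prompt wordings are hypothetical, and random vectors stand in for the outputs of CLIP's image and text encoders.

```python
import numpy as np

# Hypothetical class prompts for driver states (assumed labels,
# not the label set of the paper's datasets).
PROMPTS = [
    "a photo of a driver driving safely",
    "a photo of a driver texting on a phone",
    "a photo of a driver talking on a phone",
    "a photo of a driver reaching behind",
]

def zero_shot_classify(image_emb: np.ndarray,
                       text_embs: np.ndarray,
                       scale: float = 100.0) -> np.ndarray:
    """Return a probability per class: L2-normalize the image embedding and
    the per-class text embeddings, take cosine similarities, scale them
    (CLIP uses a learned logit scale), and apply a softmax."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    logits = scale * (txt @ img)            # one similarity score per class
    probs = np.exp(logits - logits.max())   # numerically stable softmax
    return probs / probs.sum()

# Stand-ins for encoder outputs (512-d, as in CLIP ViT-B/32).
rng = np.random.default_rng(0)
image_emb = rng.normal(size=512)                  # would come from the image encoder
text_embs = rng.normal(size=(len(PROMPTS), 512))  # would come from the text encoder
probs = zero_shot_classify(image_emb, text_embs)
print(PROMPTS[int(probs.argmax())])
```

For the video-based variant, the same classifier can be applied after pooling per-frame embeddings over time; no per-dataset training is needed in either case, which is what makes the approach attractive when annotated data is scarce.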


