An Attention-guided Multistream Feature Fusion Network for Localization of Risky Objects in Driving Videos

by Muhammad Monjurul Karim, et al.

Detecting dangerous traffic agents in videos captured by vehicle-mounted dashboard cameras (dashcams) is essential to facilitate safe navigation in a complex environment. Accident-related videos make up only a small portion of driving video data, and the transient pre-accident processes are highly dynamic and complex. In addition, risky and non-risky traffic agents can be similar in appearance. These factors make risky object localization in driving videos particularly challenging. To this end, this paper proposes an attention-guided multistream feature fusion network (AM-Net) to localize dangerous traffic agents in dashcam videos. Two Gated Recurrent Unit (GRU) networks use object bounding box and optical flow features extracted from consecutive video frames to capture spatio-temporal cues for distinguishing dangerous traffic agents. An attention module coupled with the GRUs learns to attend to the traffic agents relevant to an accident. Fusing the two streams of features, AM-Net predicts the riskiness scores of traffic agents in the video. To support this study, the paper also introduces a benchmark dataset called Risky Object Localization (ROL). The dataset contains spatial, temporal, and categorical annotations with accident, object, and scene-level attributes. The proposed AM-Net achieves a promising performance of 85.73% AUC on the ROL dataset and outperforms the current state of the art for video anomaly detection by 6.3% AUC on the DoTA dataset. A thorough ablation study further reveals AM-Net's merits by evaluating the contributions of its different components.
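The data flow described in the abstract (two GRU streams over bounding box and optical flow features, an attention step over traffic agents, feature fusion, and a per-agent riskiness score) can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not say how the attention module is coupled with the GRUs or how the streams are fused, so the softmax attention over agents, the fusion by concatenation, the bias-free GRU cell, the random untrained weights, and every name below (`GRUCell`, `risk_scores`, `hid_dim`, and so on) are assumptions made purely for illustration.

```python
import math
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def matvec(W, x):
    return [dot(row, x) for row in W]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class GRUCell:
    """Bias-free GRU cell with random weights (illustrative, untrained)."""

    def __init__(self, in_dim, hid_dim, rng):
        # One input matrix W and one recurrent matrix U per gate:
        # update (z), reset (r), candidate state (n).
        self.W = {g: rand_matrix(hid_dim, in_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hid_dim, hid_dim, rng) for g in "zrn"}

    def _pre(self, gate, x, h):
        wx, uh = matvec(self.W[gate], x), matvec(self.U[gate], h)
        return [a + b for a, b in zip(wx, uh)]

    def step(self, x, h):
        z = [sigmoid(v) for v in self._pre("z", x, h)]
        r = [sigmoid(v) for v in self._pre("r", x, h)]
        reset_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(v) for v in self._pre("n", x, reset_h)]
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]


def risk_scores(box_seqs, flow_seqs, hid_dim=4, seed=0):
    """box_seqs[i][t] and flow_seqs[i][t] hold agent i's features at frame t.

    Returns one list of per-agent scores in (0, 1) for every frame.
    """
    rng = random.Random(seed)
    box_gru = GRUCell(len(box_seqs[0][0]), hid_dim, rng)
    flow_gru = GRUCell(len(flow_seqs[0][0]), hid_dim, rng)
    attn_w = [rng.uniform(-0.5, 0.5) for _ in range(2 * hid_dim)]
    out_w = [rng.uniform(-0.5, 0.5) for _ in range(2 * hid_dim)]

    n_agents, n_frames = len(box_seqs), len(box_seqs[0])
    h_box = [[0.0] * hid_dim for _ in range(n_agents)]
    h_flow = [[0.0] * hid_dim for _ in range(n_agents)]
    per_frame = []
    for t in range(n_frames):
        fused = []
        for i in range(n_agents):
            h_box[i] = box_gru.step(box_seqs[i][t], h_box[i])
            h_flow[i] = flow_gru.step(flow_seqs[i][t], h_flow[i])
            fused.append(h_box[i] + h_flow[i])  # assumed fusion: concatenation
        # Assumed attention: softmax over agents of a linear score.
        alpha = softmax([dot(attn_w, f) for f in fused])
        per_frame.append([
            sigmoid(dot(out_w, [a * v for v in f]))
            for a, f in zip(alpha, fused)
        ])
    return per_frame


# Toy input: 2 agents, 3 frames, 4-d box features and 2-d flow features.
boxes = [[[0.1, 0.2, 0.3, 0.4]] * 3, [[0.5, 0.5, 0.2, 0.2]] * 3]
flows = [[[0.0, 0.1]] * 3, [[0.3, -0.2]] * 3]
scores = risk_scores(boxes, flows)
```

With untrained weights the scores carry no meaning; the sketch only shows the shape of the computation, namely one score per agent per frame, produced from two recurrent streams whose hidden states are fused and reweighted by attention.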




