Audio-visual Representation Learning for Anomaly Events Detection in Crowds

by Junyu Gao, et al.

In recent years, anomaly event detection in crowd scenes has attracted many researchers' attention because of its importance to public safety. Existing methods usually rely on visual information alone to decide whether an abnormal event has occurred, because public places are generally equipped only with visual sensors. However, when an abnormal event occurs in a crowd, sound can be a discriminative cue that helps the crowd analysis system determine whether there is an abnormality. Compared with visual information, which is easily occluded, audio signals have a certain degree of penetration. Thus, this paper attempts to exploit multi-modal learning to model the audio and visual signals simultaneously. Specifically, we design a two-branch network to model the two types of information. The first branch is a typical 3D CNN that extracts temporal appearance features from video clips. The second is an audio CNN that encodes the Log Mel-Spectrogram of the audio signals. Finally, the features from both branches are fused to produce a more accurate prediction. We conduct experiments on the SHADE dataset, a synthetic audio-visual dataset of surveillance scenes, and find that introducing audio signals effectively improves the performance of anomaly event detection and outperforms other state-of-the-art methods. Furthermore, we will release the code and the pre-trained models as soon as possible.
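The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the two feature sets are fused, so the concatenation, the single linear layer, the sigmoid output, and every name and dimension below (`fuse_and_score`, `visual_feat`, `audio_feat`, the 8- and 4-dimensional embeddings) are assumptions made for illustration, not the authors' method. The random vectors stand in for the outputs of the 3D CNN and the audio CNN.

```python
import math
import random


def fuse_and_score(visual_feat, audio_feat, weights, bias):
    """Late fusion by concatenation, then a linear layer and a sigmoid.

    Concatenation is an assumption; the abstract only says the features are fused.
    """
    # Concatenate the two branch embeddings into one feature vector.
    fused = list(visual_feat) + list(audio_feat)
    if len(fused) != len(weights):
        raise ValueError("weights must match the fused feature length")
    # Linear scoring layer over the fused vector.
    logit = sum(w * x for w, x in zip(weights, fused)) + bias
    # Squash to an anomaly probability in (0, 1).
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical embeddings standing in for the 3D CNN and audio CNN outputs.
rng = random.Random(0)
visual_feat = [rng.uniform(-1, 1) for _ in range(8)]
audio_feat = [rng.uniform(-1, 1) for _ in range(4)]
weights = [rng.uniform(-0.5, 0.5) for _ in range(12)]

score = fuse_and_score(visual_feat, audio_feat, weights, bias=0.0)
print(f"anomaly score: {score:.3f}")
```

In a trained system the weights would be learned jointly with both branches; here they are random, so the printed score carries no meaning beyond showing the data flow.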




Audio-visual scene classification via contrastive event-object alignment and semantic-based fusion

Previous works on scene classification are mainly based on audio or visu...

Joint Detection and Recounting of Abnormal Events by Learning Deep Generic Knowledge

This paper addresses the problem of joint detection and recounting of ab...

Watch, Listen and Tell: Multi-modal Weakly Supervised Dense Event Captioning

Multi-modal learning, particularly among imaging and linguistic modaliti...

HumBug Zooniverse: a crowd-sourced acoustic mosquito dataset

Mosquitoes are the only known vector of malaria, which leads to hundreds...

AED-Net: An Abnormal Event Detection Network

It is challenging to detect the anomaly in crowded scenes for quite a lo...

DCAR: A Discriminative and Compact Audio Representation to Improve Event Detection

This paper presents a novel two-phase method for audio representation, D...

Focus or Not: A Baseline for Anomaly Event Detection On the Open Public Places with Satellite Images

In recent years, monitoring the world wide area with satellite images ha...
