Convolutional Transformer based Dual Discriminator Generative Adversarial Networks for Video Anomaly Detection

by Xinyang Feng, et al.

Detecting abnormal activities in real-world surveillance videos is an important yet challenging task because prior knowledge about video anomalies is usually limited or unavailable. Although many approaches have been developed to address this problem, few of them can capture the normal spatio-temporal patterns effectively and efficiently. Moreover, existing works seldom explicitly consider the local consistency at the frame level and the global coherence of temporal dynamics in video sequences. To this end, we propose Convolutional Transformer based Dual Discriminator Generative Adversarial Networks (CT-D2GAN) to perform unsupervised video anomaly detection. Specifically, we first present a convolutional transformer to perform future frame prediction. It contains three key components: a convolutional encoder to capture the spatial information of the input video clips, a temporal self-attention module to encode the temporal dynamics, and a convolutional decoder to integrate spatio-temporal features and predict the future frame. Next, a dual discriminator based adversarial training procedure is employed to enhance the future frame prediction. It jointly considers an image discriminator that maintains the local consistency at the frame level and a video discriminator that enforces the global coherence of temporal dynamics. Finally, the prediction error is used to identify abnormal video frames. Thorough empirical studies on three public video anomaly detection datasets, i.e., UCSD Ped2, CUHK Avenue, and Shanghai Tech Campus, demonstrate the effectiveness of the proposed adversarial spatio-temporal modeling framework.
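The final step, using prediction error to identify abnormal frames, can be illustrated with a minimal sketch. The abstract does not say which error measure, normalization, or threshold is used, so everything here is an assumption made for illustration: mean squared error as the prediction error, min-max normalization across the frames of a video, a fixed threshold of 0.5, and the hypothetical helper names `frame_mse`, `anomaly_scores`, and `flag_abnormal`. Frames are plain nested lists of pixel intensities so the example stays self-contained.

```python
from typing import List, Sequence

# Hypothetical type: a grayscale frame as rows of pixel intensities.
Frame = Sequence[Sequence[float]]


def frame_mse(predicted: Frame, actual: Frame) -> float:
    """Mean squared error between a predicted frame and the observed frame.

    Assumption: MSE stands in for the unspecified "prediction error".
    """
    total = 0.0
    count = 0
    for pred_row, act_row in zip(predicted, actual):
        for p, a in zip(pred_row, act_row):
            total += (p - a) ** 2
            count += 1
    if count == 0:
        raise ValueError("frames must contain at least one pixel")
    return total / count


def anomaly_scores(errors: Sequence[float]) -> List[float]:
    """Min-max normalize per-frame errors to [0, 1]; higher is more anomalous."""
    lo, hi = min(errors), max(errors)
    if hi == lo:
        # All frames were predicted equally well; nothing stands out.
        return [0.0 for _ in errors]
    return [(e - lo) / (hi - lo) for e in errors]


def flag_abnormal(scores: Sequence[float], threshold: float = 0.5) -> List[int]:
    """Return indices of frames whose score exceeds the (assumed) threshold."""
    return [i for i, s in enumerate(scores) if s > threshold]


# Toy data: three 2x2 observed frames and their predicted counterparts.
actual_frames = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(3)]
predicted_frames = [
    [[0.0, 0.0], [0.0, 0.0]],  # perfect prediction
    [[0.1, 0.1], [0.1, 0.1]],  # small error
    [[1.0, 1.0], [1.0, 1.0]],  # large error, likely abnormal
]

errors = [frame_mse(p, a) for p, a in zip(predicted_frames, actual_frames)]
scores = anomaly_scores(errors)
abnormal = flag_abnormal(scores)
```

In this toy example only the third frame is poorly predicted, so it is the only one flagged. In practice the predicted frames would come from the trained frame predictor, and the threshold would be chosen on held-out data.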




Spatio-Temporal-based Context Fusion for Video Anomaly Detection

Video anomaly detection aims to discover abnormal events in videos, and ...

STAN: Spatio-Temporal Adversarial Networks for Abnormal Event Detection

In this paper, we propose a novel abnormal event detection method with s...

Ano-Graph: Learning Normal Scene Contextual Graphs to Detect Video Anomalies

Video anomaly detection has proved to be a challenging task owing to its...

LMVP: Video Predictor with Leaked Motion Information

We propose a Leaked Motion Video Predictor (LMVP) to predict future fram...

Video Anomaly Detection via Prediction Network with Enhanced Spatio-Temporal Memory Exchange

Video anomaly detection is a challenging task because most anomalies are...

Learning Temporal Regularity in Video Sequences

Perceiving meaningful activities in a long video sequence is a challengi...

Multi-Contextual Predictions with Vision Transformer for Video Anomaly Detection

Video Anomaly Detection(VAD) has been traditionally tackled in two main ...
