
ReMotENet: Efficient Relevant Motion Event Detection for Large-scale Home Surveillance Videos

by Ruichi Yu, et al.

This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., people and vehicles) in large-scale home surveillance videos. The traditional method usually consists of two separate steps: detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events (e.g., trees, clouds, shadows, rain/snow, flags) with deep-learning-based object detection and tracking running in the cloud. This method is extremely slow and therefore not cost-effective, and because it relies on a pre-trained off-the-shelf object detector, it does not fully leverage spatial-temporal redundancies. To dramatically speed up relevant motion event detection and improve its performance, we propose a novel network for relevant motion event detection, ReMotENet, which is a unified, end-to-end data-driven method using spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects of interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. Meanwhile, it exploits the properties of home surveillance videos, e.g., relevant motion is sparse both spatially and temporally, and enhances 3D ConvNets with a spatial-temporal attention model and reference-frame subtraction to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method can achieve comparable or even better performance than the object-detection-based method, with a speedup of three to four orders of magnitude (up to 20k times) on GPU devices. Our network is efficient, compact and lightweight. It can detect relevant motion in a 15-second surveillance video clip within 4-8 milliseconds on a GPU and within a fraction of a second (0.17-0.39 s) on a CPU, with a model size of less than 1 MB.
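To make the idea of reference-frame subtraction concrete, here is a minimal NumPy sketch. The abstract does not say which frame serves as the reference or how frames are preprocessed, so the choice of the first frame and the helper names `make_toy_clip` and `subtract_reference_frame` are assumptions for illustration only, not the authors' implementation.

```python
import numpy as np


def make_toy_clip(num_frames=4, size=8):
    """Build a toy grayscale clip (hypothetical data, not from the paper).

    Frame 0 is pure background; later frames contain a 2x2 bright
    square that shifts one pixel to the right per frame.
    """
    clip = np.full((num_frames, size, size, 1), 0.5, dtype=np.float32)
    for t in range(1, num_frames):
        clip[t, 2:4, t:t + 2, 0] = 1.0
    return clip


def subtract_reference_frame(clip, ref_index=0):
    """Subtract one reference frame from every frame of the clip.

    clip has shape (T, H, W, C). Using frame `ref_index` as the
    reference is an assumption of this sketch. Static background
    cancels to zero, so only changed pixels remain non-zero.
    """
    clip = np.asarray(clip, dtype=np.float32)
    reference = clip[ref_index]   # shape (H, W, C)
    return clip - reference       # broadcasts over the time axis


if __name__ == "__main__":
    clip = make_toy_clip()
    diff = subtract_reference_frame(clip)
    for t, frame in enumerate(diff):
        print(f"frame {t}: {np.count_nonzero(frame)} changed pixels")
```

In this toy clip the static background cancels out and only the pixels covered by the moving square remain non-zero, which is the kind of sparse input that could help a network concentrate on moving regions.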




Automatic detection of moving objects in video surveillance

This work is in the field of video surveillance including motion detecti...

Spatial-Temporal Memory Networks for Video Object Detection

We introduce Spatial-Temporal Memory Networks (STMN) for video object de...

TempNet: Temporal Attention Towards the Detection of Animal Behaviour in Videos

Recent advancements in cabled ocean observatories have increased the qua...

Fast Object Detection in Compressed Video

Object detection in videos has drawn increasing attention recently since...

Decoupled Spatial-Temporal Transformer for Video Inpainting

Video inpainting aims to fill the given spatiotemporal holes with realis...

Retrieval in Long Surveillance Videos using User Described Motion and Object Attributes

We present a content-based retrieval method for long surveillance videos...