ReMotENet: Efficient Relevant Motion Event Detection for Large-scale Home Surveillance Videos

by   Ruichi Yu, et al.

This paper addresses the problem of detecting relevant motion caused by objects of interest (e.g., person and vehicles) in large scale home surveillance videos. The traditional method usually consists of two separate steps, i.e., detecting moving objects with background subtraction running on the camera, and filtering out nuisance motion events (e.g., trees, cloud, shadow, rain/snow, flag) with deep learning based object detection and tracking running on cloud. The method is extremely slow and therefore not cost effective, and does not fully leverage the spatial-temporal redundancies with a pre-trained off-the-shelf object detector. To dramatically speedup relevant motion event detection and improve its performance, we propose a novel network for relevant motion event detection, ReMotENet, which is a unified, end-to-end data-driven method using spatial-temporal attention-based 3D ConvNets to jointly model the appearance and motion of objects-of-interest in a video. ReMotENet parses an entire video clip in one forward pass of a neural network to achieve significant speedup. Meanwhile, it exploits the properties of home surveillance videos, e.g., relevant motion is sparse both spatially and temporally, and enhances 3D ConvNets with a spatial-temporal attention model and reference-frame subtraction to encourage the network to focus on the relevant moving objects. Experiments demonstrate that our method can achieve comparable or event better performance than the object detection based method but with three to four orders of magnitude speedup (up to 20k times) on GPU devices. Our network is efficient, compact and light-weight. It can detect relevant motion on a 15s surveillance video clip within 4-8 milliseconds on a GPU and a fraction of second (0.17-0.39) on a CPU with a model size of less than 1MB.



There are no comments yet.


page 6

page 8

page 10


Automatic detection of moving objects in video surveillance

This work is in the field of video surveillance including motion detecti...

Spatial-Temporal Memory Networks for Video Object Detection

We introduce Spatial-Temporal Memory Networks (STMN) for video object de...

Object Referring in Videos with Language and Human Gaze

We investigate the problem of object referring (OR) i.e. to localize a t...

Fast Object Detection in Compressed Video

Object detection in videos has drawn increasing attention recently since...

Great Ape Detection in Challenging Jungle Camera Trap Footage via Attention-Based Spatial and Temporal Feature Blending

We propose the first multi-frame video object detection framework traine...

Detecting Temporally Consistent Objects in Videos through Object Class Label Propagation

Object proposals for detecting moving or static video objects need to ad...

Felzenszwalb-Baum-Welch: Event Detection by Changing Appearance

We propose a method which can detect events in videos by modeling the ch...
This week in AI

Get the week's most popular data science and artificial intelligence research sent straight to your inbox every Saturday.