Object Detection in Video with Spatiotemporal Sampling Networks

03/15/2018
by Gedas Bertasius, et al.

We propose a Spatiotemporal Sampling Network (STSN) that uses deformable convolutions across time for object detection in videos. Our STSN performs object detection in a video frame by learning to spatially sample features from the adjacent frames. This naturally renders the approach robust to occlusion or motion blur in individual frames. Our framework does not require additional supervision, as it optimizes sampling locations directly with respect to object detection performance. Our STSN outperforms the state-of-the-art on the ImageNet VID dataset, and compared to prior video object detection methods it uses a simpler design and does not require optical flow data for training. We also show that after training STSN on videos, we can adapt it for object detection in images by adding and training a single deformable convolutional layer on still-image data. This leads to improvements in accuracy compared to traditional object detection in images.
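To make "learning to spatially sample features from the adjacent frames" concrete, here is a minimal sketch of the deformable sampling step, assuming a single-channel feature map and a 3x3 kernel. For a reference location p, the kernel reads the adjacent (supporting) frame's features at p + p_k + Δp_k, where p_k are the regular grid positions and Δp_k are fractional offsets resolved by bilinear interpolation. In STSN the offsets are learned end to end from the detection objective. Here they are passed in by hand, and the helper names `bilinear_sample` and `deformable_sample` are hypothetical. The sketch does not model the offset-prediction network, multiple channels, or how features sampled from several frames are combined.

```python
import math

def bilinear_sample(feat, y, x):
    """Bilinearly interpolate a 2-D feature map (list of lists) at fractional (y, x).
    Neighbours outside the map contribute zero (zero padding)."""
    h, w = len(feat), len(feat[0])
    y0, x0 = math.floor(y), math.floor(x)
    total = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                # Weight falls off linearly with distance along each axis.
                weight = (1 - abs(y - yi)) * (1 - abs(x - xi))
                total += weight * feat[yi][xi]
    return total

# Regular 3x3 sampling grid p_k of an ordinary convolution, as (dy, dx) pairs.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deformable_sample(support_feat, p, offsets, weights):
    """Compute y(p) = sum_k w_k * x(p + p_k + delta_p_k) on the supporting frame.

    offsets: nine (dy, dx) fractional offsets, one per grid position
             (hand-set here; learned in the actual model).
    weights: nine kernel weights w_k.
    """
    py, px = p
    out = 0.0
    for (gy, gx), (oy, ox), w in zip(GRID, offsets, weights):
        out += w * bilinear_sample(support_feat, py + gy + oy, px + gx + ox)
    return out

# Toy supporting-frame feature map: value = 10 * row + col.
support = [[10 * r + c for c in range(5)] for r in range(5)]

# A kernel that only keeps its centre tap.
centre_only = [0.0] * 9
centre_only[4] = 1.0

zero_offsets = [(0.0, 0.0)] * 9
print(deformable_sample(support, (2, 2), zero_offsets, centre_only))  # -> 22.0

# Shift the centre tap by half a pixel in both directions: it now reads an
# interpolated value between rows/cols 2 and 3.
shifted = list(zero_offsets)
shifted[4] = (0.5, 0.5)
print(deformable_sample(support, (2, 2), shifted, centre_only))  # -> 27.5
```

With all offsets at zero this reduces to an ordinary convolution tap. A nonzero offset lets the kernel look at a displaced location in the adjacent frame, which is how features of an object that moved between frames can still be gathered without computing optical flow.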
