D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

11/15/2021
by Christian Schmidt, et al.

Despite receiving significant attention from the research community, the task of segmenting and tracking objects in monocular videos still has much room for improvement. Existing works have independently demonstrated the efficacy of dilated and deformable convolutions for various image-level segmentation tasks. This gives reason to believe that 3D extensions of such convolutions should also yield performance improvements for video-level segmentation tasks. However, this aspect has not yet been explored thoroughly in the existing literature. In this paper, we propose Dynamic Dilated Convolutions (D^2Conv3D): a novel type of convolution that draws inspiration from dilated and deformable convolutions and extends them to the 3D (spatio-temporal) domain. We show experimentally that using D^2Conv3D as a drop-in replacement for standard convolutions improves the performance of multiple 3D CNN architectures across multiple video segmentation benchmarks. We further show that D^2Conv3D outperforms trivial extensions of existing dilated and deformable convolutions to 3D. Lastly, we set a new state of the art on the DAVIS 2016 Unsupervised Video Object Segmentation benchmark. Code is publicly available at https://github.com/Schmiddo/d2conv3d.
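
To make the idea of a dilated 3D (spatio-temporal) kernel concrete, the following is a minimal sketch of how dilation spreads a kernel's sampling positions along time, height, and width. It is not the authors' implementation: the function name `dilated_offsets_3d` and the per-location dilation table are hypothetical, and the assumption that "dynamic" means the dilation may differ per output location is inferred from the method's name rather than stated in the abstract.

```python
from itertools import product


def dilated_offsets_3d(kernel_size=(3, 3, 3), dilation=(1, 1, 1)):
    """Return the (dt, dy, dx) sampling offsets of a dilated 3D kernel.

    Offsets are relative to the kernel centre. Only odd kernel sizes are
    handled in this sketch.
    """
    axes = []
    for k, d in zip(kernel_size, dilation):
        if k % 2 == 0:
            raise ValueError("this sketch only supports odd kernel sizes")
        half = k // 2
        # Dilation d places neighbouring taps d positions apart.
        axes.append([d * i for i in range(-half, half + 1)])
    return list(product(*axes))


# Fixed dilation: every output location shares one sampling grid.
static = dilated_offsets_3d(dilation=(1, 2, 2))

# Hypothetical dynamic variant (an assumption, see the lead-in): each
# output location (t, y, x) gets its own dilation. Here the values are
# hard-coded; in a network they would be predicted from the input features.
dynamic = {
    (0, 0, 0): dilated_offsets_3d(dilation=(1, 1, 1)),
    (0, 8, 8): dilated_offsets_3d(dilation=(2, 3, 3)),
}
```

With a 3x3x3 kernel the number of taps stays at 27 regardless of dilation; only the extent of the region they cover changes, which is why dilation enlarges the receptive field without adding weights.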


