Making a Case for 3D Convolutions for Object Segmentation in Videos

08/26/2020
by   Sabarinath Mahadevan, et al.
9

The task of object segmentation in videos is usually accomplished by processing appearance and motion information separately using standard 2D convolutional networks, followed by a learned fusion of the two sources of information. On the other hand, 3D convolutional networks have been successfully applied for video classification tasks, but have not been leveraged as effectively to problems involving dense per-pixel interpretation of videos compared to their 2D convolutional counterparts and lag behind the aforementioned networks in terms of performance. In this work, we show that 3D CNNs can be effectively applied to dense video prediction tasks such as salient object segmentation. We propose a simple yet effective encoder-decoder network architecture consisting entirely of 3D convolutions that can be trained end-to-end using a standard cross-entropy loss. To this end, we leverage an efficient 3D encoder, and propose a 3D decoder architecture, that comprises novel 3D Global Convolution layers and 3D Refinement modules. Our approach outperforms existing state-of-the-arts by a large margin on the DAVIS'16 Unsupervised, FBMS and ViSal dataset benchmarks in addition to being faster, thus showing that our architecture can efficiently learn expressive spatio-temporal features and produce high quality video segmentation masks. Our code and models will be made publicly available.

READ FULL TEXT

page 2

page 5

page 6

page 15

research
11/30/2017

An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

In this paper, we propose an end-to-end 3D CNN for action detection and ...
research
11/15/2021

D^2Conv3D: Dynamic Dilated Convolutions for Object Segmentation in Videos

Despite receiving significant attention from the research community, the...
research
07/21/2019

An Efficient 3D CNN for Action/Object Segmentation in Video

Convolutional Neural Network (CNN) based image segmentation has made gre...
research
06/30/2023

Obscured Wildfire Flame Detection By Temporal Analysis of Smoke Patterns Captured by Unmanned Aerial Systems

This research paper addresses the challenge of detecting obscured wildfi...
research
05/23/2021

Coarse to Fine Multi-Resolution Temporal Convolutional Network

Temporal convolutional networks (TCNs) are a commonly used architecture ...
research
02/22/2023

Video-SwinUNet: Spatio-temporal Deep Learning Framework for VFSS Instance Segmentation

This paper presents a deep learning framework for medical video segmenta...
research
05/13/2020

Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Videos convey rich information. Dynamic spatio-temporal relationships be...

Please sign up or login with your details

Forgot password? Click here to reset