An Efficient 3D CNN for Action/Object Segmentation in Video

07/21/2019
by Rui Hou, et al.

Image segmentation based on Convolutional Neural Networks (CNNs) has made great progress in recent years. Video object segmentation, however, remains challenging because of its high computational complexity. Most previous methods employ a two-stream CNN framework that handles spatial and motion features separately. In this paper, we propose an end-to-end, encoder-decoder style 3D CNN that aggregates spatial and temporal information simultaneously for video object segmentation. To process video efficiently, we propose 3D separable convolution for the pyramid pooling module and the decoder, which dramatically reduces the number of operations while maintaining performance. We also extend our framework to video action segmentation by adding an extra classifier that predicts the action label for actors in videos. Extensive experiments on several video datasets demonstrate the superior performance of the proposed approach for action and object segmentation compared with state-of-the-art methods.
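
To give a rough sense of why a separable 3D convolution needs fewer operations, the sketch below counts weights and multiply-accumulates for a standard 3D convolution and for a depthwise-then-pointwise factorization. The abstract does not spell out the exact factorization used in the paper, so the depthwise/pointwise split, the function names, and the layer sizes here are illustrative assumptions, not the authors' implementation. Bias terms are ignored.

```python
def conv3d_cost(c_in, c_out, kernel, out_size):
    """Weights and multiply-accumulates (MACs) of a standard 3D convolution."""
    kt, kh, kw = kernel
    t, h, w = out_size
    weights = c_in * c_out * kt * kh * kw
    macs = weights * t * h * w  # every weight is applied at every output position
    return weights, macs


def separable_conv3d_cost(c_in, c_out, kernel, out_size):
    """Cost of an ASSUMED depthwise (per-channel) + pointwise (1x1x1) factorization."""
    kt, kh, kw = kernel
    t, h, w = out_size
    depthwise = c_in * kt * kh * kw  # one kt x kh x kw filter per input channel
    pointwise = c_in * c_out         # 1x1x1 convolution that mixes channels
    weights = depthwise + pointwise
    macs = weights * t * h * w
    return weights, macs


# Hypothetical layer: 256 -> 256 channels, 3x3x3 kernel, 8x28x28 output volume.
standard = conv3d_cost(256, 256, (3, 3, 3), (8, 28, 28))
separable = separable_conv3d_cost(256, 256, (3, 3, 3), (8, 28, 28))
ratio = standard[1] / separable[1]

print(f"standard:  {standard[0]:,} weights, {standard[1]:,} MACs")
print(f"separable: {separable[0]:,} weights, {separable[1]:,} MACs")
print(f"reduction: {ratio:.1f}x")
```

Under these assumptions the saving grows with both the kernel volume and the number of output channels, which is why such a factorization is attractive for the wide layers typical of a pyramid pooling module or a decoder.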


research
03/30/2017

Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos

Deep learning has been demonstrated to achieve excellent results for ima...
research
11/30/2017

An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

In this paper, we propose an end-to-end 3D CNN for action detection and ...
research
11/24/2017

Deep Extreme Cut: From Extreme Points to Object Segmentation

This paper explores the use of extreme points in an object (left-most, r...
research
08/26/2020

Making a Case for 3D Convolutions for Object Segmentation in Videos

The task of object segmentation in videos is usually accomplished by pro...
research
05/04/2023

ItoV: Efficiently Adapting Deep Learning-based Image Watermarking to Video Watermarking

Robust watermarking tries to conceal information within a cover image/vi...
research
12/08/2019

VM-Net: Mesh Modeling to Assist Segmentation in Volumetric Data

CNN-based volumetric methods that label individual voxels now dominate t...
research
05/22/2017

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Action segmentation as a milestone towards building automatic systems to...
