VideoCapsuleNet: A Simplified Network for Action Detection

05/21/2018
by   Kevin Duarte, et al.
Recent advances in Deep Convolutional Neural Networks (DCNNs) have produced very strong results for human action classification in video; action detection, however, remains a challenging problem. Current action detection approaches follow a complex pipeline that involves multiple tasks such as tube proposals, optical flow, and tube classification. In this work, we present a more elegant solution for action detection based on the recently developed capsule network. We propose a 3D capsule network for videos, called VideoCapsuleNet: a unified network for action detection that jointly performs pixel-wise action segmentation and action classification. The proposed network generalizes the capsule network from 2D to 3D and takes a sequence of video frames as input. The 3D generalization drastically increases the number of capsules in the network, making capsule routing computationally expensive. To address this issue, we introduce capsule-pooling in the convolutional capsule layer, which makes the voting algorithm tractable. The routing-by-agreement in the network inherently models the action representations, and various action characteristics are captured by the predicted capsules. This inspired us to use the capsules for action localization: the class-specific capsules predicted by the network are used to determine a pixel-wise localization of actions. The localization is further improved by parameterized skip connections with the convolutional capsule layers, and the network is trained end-to-end with both a classification loss and a localization loss. The proposed network achieves state-of-the-art performance on multiple action detection datasets, including UCF-Sports, J-HMDB, and UCF-101 (24 classes), with an impressive 20% improvement on J-HMDB in terms of v-mAP scores.
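
The abstract does not spell out how capsule-pooling works, so the following is only a minimal sketch of one plausible reading: before voting, the capsules of the same type inside a receptive field are averaged, so that each capsule type casts one vote instead of one vote per position. The function names, the flattened-pose representation, and the vote-count formula are assumptions made for illustration, not the authors' implementation.

```python
def capsule_pool(poses, activations):
    """Average the capsules of ONE type inside a receptive field.

    Assumption: pooling is a plain mean over positions.
    poses: list of flattened pose vectors, one per position in the field
    activations: list of floats, one per position
    Returns (mean_pose, mean_activation).
    """
    n = len(poses)
    dim = len(poses[0])
    mean_pose = [sum(p[d] for p in poses) / n for d in range(dim)]
    mean_activation = sum(activations) / n
    return mean_pose, mean_activation


def votes_needed(field_positions, in_types, out_types, pooled):
    """Count the votes cast for one output location (hypothetical helper).

    Without pooling, every position of every input type votes for every
    output type; with pooling, each input type votes once.
    """
    voters = in_types if pooled else field_positions * in_types
    return voters * out_types


if __name__ == "__main__":
    pose, act = capsule_pool([[1.0, 2.0], [3.0, 4.0]], [0.2, 0.4])
    print(pose, act)
    # A 3x3x3 receptive field with 8 input and 16 output capsule types.
    print(votes_needed(27, 8, 16, pooled=False))
    print(votes_needed(27, 8, 16, pooled=True))
```

Under these assumptions, the number of votes per output location no longer grows with the size of the 3D receptive field, which is the kind of saving that would make routing tractable for video input.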


