Deformable Tube Network for Action Detection in Videos

07/03/2019
by   Wei Li, et al.
2

We address the problem of spatio-temporal action detection in videos. Existing methods commonly either ignore temporal context in action recognition and localization, or lack the modelling of flexible shapes of action tubes. In this paper, we propose a two-stage action detector called Deformable Tube Network (DTN), which is composed of a Deformation Tube Proposal Network (DTPN) and a Deformable Tube Recognition Network (DTRN) similar to the Faster R-CNN architecture. In DTPN, a fast proposal linking algorithm (FTL) is introduced to connect region proposals across frames to generate multiple deformable action tube proposals. To perform action detection, we design a 3D convolution network with skip connections for tube classification and regression. Modelling action proposals as deformable tubes explicitly considers the shape of action tubes compared to 3D cuboids. Moreover, 3D convolution based recognition network can learn temporal dynamics sufficiently for action detection. Our experimental results show that we significantly outperform the methods with 3D cuboids and obtain the state-of-the-art results on both UCF-Sports and AVA datasets.

READ FULL TEXT

page 4

page 9

page 11

page 12

research
06/29/2018

YH Technologies at ActivityNet Challenge 2018

This notebook paper presents an overview and comparative analysis of our...
research
11/20/2018

A Proposal-Based Solution to Spatio-Temporal Action Detection in Untrimmed Videos

Existing approaches for spatio-temporal action detection in videos are l...
research
03/05/2023

Deformable Proposal-Aware P2PNet: A Universal Network for Cell Recognition under Point Supervision

Point-based cell recognition, which aims to localize and classify cells ...
research
10/21/2014

Compositional Structure Learning for Action Understanding

The focus of the action understanding literature has predominately been ...
research
12/01/2021

Graph Convolutional Module for Temporal Action Localization in Videos

Temporal action localization has long been researched in computer vision...
research
04/16/2021

Spatiotemporal Deformable Models for Long-Term Complex Activity Detection

Long-term complex activity recognition and localisation can be crucial f...
research
09/09/2019

Gaussian Temporal Awareness Networks for Action Localization

Temporally localizing actions in a video is a fundamental challenge in v...

Please sign up or login with your details

Forgot password? Click here to reset