
YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

by Hongyuan Zhu, et al.
Beihang University
Agency for Science, Technology and Research
Peking University
MINES ParisTech

In this paper, we present YoTube, a novel network fusion framework for searching action proposals in untrimmed videos, where each action proposal corresponds to a spatio-temporal video tube that potentially locates one human action. Our method consists of a recurrent YoTube detector and a static YoTube detector: the recurrent YoTube explores the regression capability of an RNN to predict candidate bounding boxes using learnt temporal dynamics, while the static YoTube produces bounding boxes using rich appearance cues in a single frame. Both networks are trained on RGB and optical flow in order to fully exploit the rich appearance, motion and temporal context, and their outputs are fused to produce accurate and robust proposal boxes. Action proposals are finally constructed by linking these boxes using dynamic programming, with a novel trimming method to handle untrimmed videos effectively and efficiently. Extensive experiments on the challenging UCF-101 and UCF-Sports datasets show that our proposed technique obtains superior performance compared with the state-of-the-art.
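
The linking step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the scoring function, so the code below assumes a common formulation: each per-frame box has a confidence score, and consecutive boxes are rewarded for spatial overlap (IoU). The names `iou`, `link_boxes` and `overlap_weight` are hypothetical, and the fusion and trimming steps are not covered. It is a Viterbi-style dynamic program under those assumptions, not the authors' implementation.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), assumed layout


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def link_boxes(
    boxes: List[List[Box]],
    scores: List[List[float]],
    overlap_weight: float = 1.0,  # assumed trade-off parameter
) -> List[int]:
    """Pick one box index per frame maximising the sum of box scores
    plus overlap_weight * IoU between consecutive picks.

    Assumes every frame has at least one candidate box.
    """
    n_frames = len(boxes)
    best = [list(scores[0])]   # best[t][j]: best path score ending at box j
    back: List[List[int]] = []  # back[t-1][j]: predecessor of box j in frame t
    for t in range(1, n_frames):
        cur_best, cur_back = [], []
        for j, box in enumerate(boxes[t]):
            cands = [
                best[t - 1][i] + overlap_weight * iou(prev, box)
                for i, prev in enumerate(boxes[t - 1])
            ]
            i_star = max(range(len(cands)), key=cands.__getitem__)
            cur_best.append(cands[i_star] + scores[t][j])
            cur_back.append(i_star)
        best.append(cur_best)
        back.append(cur_back)

    # Backtrack from the best-scoring box in the last frame.
    j = max(range(len(best[-1])), key=best[-1].__getitem__)
    path = [j]
    for t in range(n_frames - 1, 0, -1):
        j = back[t - 1][j]
        path.append(j)
    path.reverse()
    return path
```

With the overlap term switched off (`overlap_weight=0.0`) the program reduces to picking the highest-scoring box in each frame independently; a positive weight favours spatially consistent tubes even when a single frame's top-scoring box lies elsewhere.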

