Improving Action Localization by Progressive Cross-stream Cooperation

05/28/2019
by   Rui Su, et al.
0

Spatio-temporal action localization consists of three levels of tasks: spatial localization, action classification, and temporal segmentation. In this work, we propose a new Progressive Cross-stream Cooperation (PCSC) framework to use both region proposals and features from one stream (i.e. Flow/RGB) to help another stream (i.e. RGB/Flow) to iteratively improve action localization results and generate better bounding boxes in an iterative fashion. Specifically, we first generate a larger set of region proposals by combining the latest region proposals from both streams, from which we can readily obtain a larger set of labelled training samples to help learn better action detection models. Second, we also propose a new message passing approach to pass information from one stream to another stream in order to learn better representations, which also leads to better action detection models. As a result, our iterative framework progressively improves action localization results at the frame level. To improve action localization results at the video level, we additionally propose a new strategy to train class-specific actionness detectors for better temporal segmentation, which can be readily learnt by focusing on "confusing" samples from the same action class. Comprehensive experiments on two benchmark datasets UCF-101-24 and J-HMDB demonstrate the effectiveness of our newly proposed approaches for spatio-temporal action localization in realistic scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/05/2015

Learning to track for spatio-temporal action localization

We propose an effective approach for spatio-temporal action localization...
research
04/19/2019

STEP: Spatio-Temporal Progressive Learning for Video Action Detection

In this paper, we propose Spatio-TEmporal Progressive (STEP) action dete...
research
11/30/2017

An End-to-end 3D Convolutional Neural Network for Action Detection and Segmentation in Videos

In this paper, we propose an end-to-end 3D CNN for action detection and ...
research
08/19/2020

CFAD: Coarse-to-Fine Action Detector for Spatiotemporal Action Localization

Most current pipelines for spatio-temporal action localization connect f...
research
07/19/2017

Detecting Parts for Action Localization

In this paper, we propose a new framework for action localization that t...
research
10/20/2021

GTM: Gray Temporal Model for Video Recognition

Data input modality plays an important role in video action recognition....
research
08/17/2020

Video Region Annotation with Sparse Bounding Boxes

Video analysis has been moving towards more detailed interpretation (e.g...

Please sign up or login with your details

Forgot password? Click here to reset