Unsupervised Action Proposal Ranking through Proposal Recombination

by Waqas Sultani, et al.
University of Central Florida

Recently, action proposal methods have played an important role in action recognition tasks, as they reduce the search space dramatically. Most unsupervised action proposal methods generate hundreds of action proposals, many of which are noisy, inconsistent, and unranked. Supervised action proposal methods take advantage of predefined object detectors (e.g., a human detector) to refine and score the action proposals, but they require thousands of manual annotations to train. Given the action proposals in a video, the goal of the proposed work is to generate a few better action proposals that are properly ranked. In our approach, we first divide each action proposal into sub-proposals and then use a dynamic-programming-based graph optimization scheme to select the optimal combinations of sub-proposals from different proposals and assign each new proposal a score. We propose a new unsupervised image-based actionness detector that leverages web images, and we employ it as one of the node scores in our graph formulation. Moreover, we capture motion information by estimating the number of motion contours within each action proposal patch. The proposed method is unsupervised: it needs neither bounding box annotations nor video-level labels, which is desirable given the current explosion of large-scale action datasets. Our approach is generic and does not depend on a specific action proposal method. We evaluate our approach on several publicly available trimmed and untrimmed datasets and obtain better performance than several proposal ranking methods. In addition, we demonstrate that properly ranked proposals produce significantly better action detection than state-of-the-art proposal-based methods.
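
To make the recombination step concrete, the following is a minimal sketch of a dynamic-programming (Viterbi-style) selection over sub-proposals. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the graph's edge terms or how scores are combined, so the additive objective, the use of box IoU as a smoothness term between consecutive sub-proposals, and the names `best_recombination`, `node_scores`, and `smooth_weight` are all hypothetical. Here `node_scores[t][k]` stands in for whatever per-sub-proposal score is used (for example, a mix of actionness and motion cues).

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def best_recombination(
    boxes: List[List[Box]],
    node_scores: List[List[float]],
    smooth_weight: float = 1.0,
) -> Tuple[List[int], float]:
    """Pick one sub-proposal per temporal segment, maximizing the sum of
    node scores plus smooth_weight * IoU between consecutive picks.

    boxes[t][k] is a representative box of sub-proposal k in segment t
    (an assumption of this sketch); node_scores[t][k] is its score.
    Returns the chosen index per segment and the total path score.
    """
    best = [list(node_scores[0])]  # best[t][k]: best score of a path ending at (t, k)
    back: List[List[int]] = []     # back[t-1][k]: predecessor index for (t, k)
    for t in range(1, len(boxes)):
        row, ptr = [], []
        for k, box in enumerate(boxes[t]):
            # Score of reaching (t, k) from each candidate in the previous segment.
            cands = [
                best[t - 1][j] + smooth_weight * iou(prev, box)
                for j, prev in enumerate(boxes[t - 1])
            ]
            j_best = max(range(len(cands)), key=cands.__getitem__)
            row.append(cands[j_best] + node_scores[t][k])
            ptr.append(j_best)
        best.append(row)
        back.append(ptr)

    # Backtrack from the best final node to recover the full path.
    k = max(range(len(best[-1])), key=best[-1].__getitem__)
    score = best[-1][k]
    path = [k]
    for ptr in reversed(back):
        k = ptr[k]
        path.append(k)
    path.reverse()
    return path, score
```

In this sketch the returned path score doubles as the ranking score of the recombined proposal. Setting `smooth_weight` to zero reduces the selection to picking the highest-scoring sub-proposal in each segment independently, while larger values favor spatially consistent paths.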




Automatic Action Annotation in Weakly Labeled Videos

Manual spatio-temporal annotation of human action in videos is laborious...

ProposalCLIP: Unsupervised Open-Category Object Proposal Generation via Exploiting CLIP Cues

Object proposal generation is an important and fundamental task in compu...

Tubelets: Unsupervised action proposals from spatiotemporal super-voxels

This paper considers the problem of localizing actions in videos as a se...

Graph Convolutional Networks for Temporal Action Localization

Most state-of-the-art action localization systems process each action pr...

Learning Temporal Action Proposals With Fewer Labels

Temporal action proposals are a common module in action detection pipeli...

YoTube: Searching Action Proposal via Recurrent and Static Regression Networks

In this paper, we present YoTube-a novel network fusion framework for se...

Object Proposals for Text Extraction in the Wild

Object Proposals is a recent computer vision technique receiving increas...
