Learning to Localize Actions from Moments

08/31/2020
by   Fuchen Long, et al.

With knowledge of action moments (i.e., trimmed video clips that each contain an action instance), humans can routinely localize an action temporally in an untrimmed video. Nevertheless, most practical methods still require all training videos to be labeled with temporal annotations (action category and temporal boundary) and develop models in a fully-supervised manner, despite the expensive labeling effort and the inapplicability to new categories. In this paper, we introduce a new transfer learning design that learns action localization for a large set of action categories using only action moments from the categories of interest plus temporal annotations of untrimmed videos from a small set of action classes. Specifically, we present Action Herald Networks (AherNet), which integrate this design into a one-stage action localization framework. Technically, a weight transfer function is uniquely devised to build the transformation between the classification of action moments or foreground video segments and action localization in synthetic contextual moments or untrimmed videos. The context of each moment is learnt through an adversarial mechanism that differentiates the generated features from those of background in untrimmed videos. Extensive experiments are conducted on learning both across the splits of ActivityNet v1.3 and from THUMOS14 to ActivityNet v1.3. Our AherNet demonstrates superiority even compared to most fully-supervised action localization methods. More remarkably, we train AherNet to localize actions from 600 categories by leveraging action moments in Kinetics-600 and temporal annotations from 200 classes in ActivityNet v1.3. Source code and data are available at <https://github.com/FuchenUSTC/AherNet>.
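The abstract does not specify the exact form of the weight transfer function. As a rough illustration of the general idea only, here is a minimal sketch, assuming a learned affine map from a class's classification weights to the weights of its localization head (the function `transfer_weights` and the parameter shapes are hypothetical, not the authors' implementation):

```python
import numpy as np

rng = np.random.default_rng(0)

def transfer_weights(w_cls, T, b):
    """Hypothetical weight transfer: map a class's classification
    weight vector (d_cls,) to localization-head weights (d_loc,)
    via a learned affine transformation."""
    return T @ w_cls + b

d_cls, d_loc = 8, 8
T = rng.standard_normal((d_loc, d_cls)) * 0.1  # learned jointly in practice
b = np.zeros(d_loc)

# Classifier weights for one action class, trained on its action moments.
w_cls = rng.standard_normal(d_cls)

# Transferred weights that would parameterize the localization head
# for that class, without temporal annotations for it.
w_loc = transfer_weights(w_cls, T, b)
print(w_loc.shape)  # (8,)
```

Because `T` and `b` are shared across classes, such a map lets a localizer be instantiated for any category that has only trimmed action moments, which is the core of the transfer setting described above.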

Related research:

- FineAction: A Fined Video Dataset for Temporal Action Localization (05/24/2021)
- Few-Shot Temporal Action Localization with Query Adaptive Transformer (10/20/2021)
- Unsupervised Pre-training for Temporal Action Localization Tasks (03/25/2022)
- When Did It Happen? Duration-informed Temporal Localization of Narrated Actions in Vlogs (02/16/2022)
- MVP: Robust Multi-View Practice for Driving Action Localization (07/05/2022)
- Gaussian Temporal Awareness Networks for Action Localization (09/09/2019)
- DAP3D-Net: Where, What and How Actions Occur in Videos? (02/10/2016)
