LAP-Net: Adaptive Features Sampling via Learning Action Progression for Online Action Detection

by Sanqing Qu, et al.

Online action detection aims to identify ongoing actions in streaming videos without any side information or access to future frames. Recent methods aggregate representations of anticipated (not yet visible) future frames over fixed temporal ranges as supplementary features, and they have achieved promising performance. These methods build on the observation that human beings often detect ongoing actions while simultaneously contemplating what will happen next. However, we observe that the optimal supplementary features come from different temporal ranges at different stages of action progression, rather than from a single fixed future range. To address these variable ranges, we introduce an adaptive feature sampling strategy. Specifically, we propose a novel Learning Action Progression Network, termed LAP-Net, which integrates this strategy. At each time step, the sampling strategy first estimates the current action progression and then decides which temporal ranges should be used to aggregate the optimal supplementary features. We evaluate LAP-Net on three benchmark datasets: TVSeries, THUMOS-14, and HDD. Extensive experiments demonstrate that, with the adaptive feature sampling strategy, LAP-Net significantly outperforms current state-of-the-art methods by a large margin.
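The two-step sampling idea described above (estimate progression, then choose a temporal range to aggregate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the linear rule mapping progression to a window, the `max_len` parameter, and mean pooling as the aggregator are all hypothetical. The abstract does not specify how LAP-Net estimates progression or selects ranges, so this is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def mean_pool(frames: Sequence[Vector]) -> Vector:
    """Average a non-empty sequence of equal-length feature vectors."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def select_range(progression: float, max_len: int) -> Tuple[int, int]:
    """Hypothetical rule: map progression in [0, 1] to (num_past, num_future).

    Assumption: early in an action we lean on anticipated future frames,
    late in an action we lean on observed history. At least one frame is
    always taken from each side.
    """
    p = min(max(progression, 0.0), 1.0)  # clamp to [0, 1]
    num_future = max(1, round((1.0 - p) * max_len))
    num_past = max(1, round(p * max_len))
    return num_past, num_future


def supplementary_features(
    history: Sequence[Vector],
    anticipated: Sequence[Vector],
    progression: float,
    max_len: int = 4,
) -> Vector:
    """Aggregate features from a window whose extent depends on progression."""
    num_past, num_future = select_range(progression, max_len)
    # Most recent observed frames plus the nearest anticipated frames.
    window = list(history[-num_past:]) + list(anticipated[:num_future])
    return mean_pool(window)


# Toy 2-D features: observed frames 0..3, anticipated frames 4..7.
history = [[float(i), float(i)] for i in range(4)]
anticipated = [[float(i), float(i)] for i in range(4, 8)]

early = supplementary_features(history, anticipated, progression=0.0)
late = supplementary_features(history, anticipated, progression=1.0)
```

In this toy setup, an action that has just started (`progression=0.0`) pools one observed frame with four anticipated ones, while an action that is nearly finished (`progression=1.0`) pools four observed frames with one anticipated frame. A fixed-range baseline would use the same window in both cases.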



