TTPP: Temporal Transformer with Progressive Prediction for Efficient Action Anticipation

03/07/2020
by Wen Wang, et al.

Video action anticipation aims to predict future action categories from observed frames. Current state-of-the-art approaches mainly rely on recurrent neural networks to encode history information into hidden states and predict future actions from the hidden representations. It is well known that the recurrent pipeline is inefficient at capturing long-term information, which may limit its performance on the prediction task. To address this problem, this paper proposes a simple yet efficient Temporal Transformer with Progressive Prediction (TTPP) framework, which repurposes a Transformer-style architecture to aggregate observed features and then leverages a lightweight network to progressively predict future features and actions. Specifically, predicted features along with predicted probabilities are accumulated into the inputs of subsequent predictions. We evaluate our approach on three action datasets, namely TVSeries, THUMOS-14, and TV-Human-Interaction. We also conduct a comprehensive study of several popular aggregation and prediction strategies. Extensive results show that TTPP not only outperforms the state-of-the-art methods but is also more efficient.
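
The two stages described above (attention-based aggregation of observed features, then progressive prediction that feeds earlier outputs back in) can be illustrated with a small sketch. This is not the authors' implementation: the single-head attention with the last frame as query, the additive accumulation rule, the tanh predictor, and the names `aggregate`, `progressive_predict`, `w_feat`, and `w_cls` are all assumptions made for illustration, and the weights are random rather than trained.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(vec, mat):
    """Multiply a length-n vector by an n x m matrix (list of rows)."""
    return [sum(v * row[j] for v, row in zip(vec, mat))
            for j in range(len(mat[0]))]


def aggregate(observed):
    """Assumed aggregation: scaled dot-product attention over the observed
    frame features, using the last observed frame as the query."""
    query = observed[-1]
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, frame)) / math.sqrt(d)
              for frame in observed]
    weights = softmax(scores)
    return [sum(w * frame[i] for w, frame in zip(weights, observed))
            for i in range(d)]


def progressive_predict(observed, w_feat, w_cls, steps):
    """Predict `steps` future (feature, probability) pairs one at a time.
    Each step sees the running state plus the previous probabilities."""
    num_classes = len(w_cls[0])
    state = aggregate(observed)
    probs = [1.0 / num_classes] * num_classes  # uniform prior before step 1
    outputs = []
    for _ in range(steps):
        # Input is the concatenation [state, previous probabilities].
        feat = [math.tanh(x) for x in matvec(state + probs, w_feat)]
        probs = softmax(matvec(feat, w_cls))
        outputs.append((feat, probs))
        # Assumed accumulation rule: add the predicted feature to the state.
        state = [s + f for s, f in zip(state, feat)]
    return outputs


# Toy usage with random, untrained weights (hypothetical sizes).
rng = random.Random(0)
d, num_classes, num_frames = 4, 3, 5
observed = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(num_frames)]
w_feat = [[rng.gauss(0, 0.1) for _ in range(d)] for _ in range(d + num_classes)]
w_cls = [[rng.gauss(0, 0.1) for _ in range(num_classes)] for _ in range(d)]
outputs = progressive_predict(observed, w_feat, w_cls, steps=3)
```

Each entry of `outputs` holds one predicted future feature and its class distribution. Because aggregation runs once and each later step is a small feed-forward pass, the sketch also hints at why such a design could be cheaper than unrolling a recurrent model over the full history.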


