Action Forecasting with Feature-wise Self-Attention

07/19/2021
by   Yan Bin Ng, et al.
0

We present a new architecture for human action forecasting from videos. A temporal recurrent encoder captures temporal information of input videos while a self-attention model is used to attend on relevant feature dimensions of the input space. To handle temporal variations in observed video data, a feature masking techniques is employed. We classify observed actions accurately using an auxiliary classifier which helps to understand what has happened so far. Then the decoder generates actions for the future based on the output of the recurrent encoder and the self-attention model. Experimentally, we validate each component of our architecture where we see that the impact of self-attention to identify relevant feature dimensions, temporal masking, and observed auxiliary classifier. We evaluate our method on two standard action forecasting benchmarks and obtain state-of-the-art results.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
12/15/2020

GTA: Global Temporal Attention for Video Action Understanding

Self-attention learns pairwise interactions via dot products to model lo...
research
01/26/2022

Self-Attention Neural Bag-of-Features

In this work, we propose several attention formulations for multivariate...
research
01/04/2023

Infomaxformer: Maximum Entropy Transformer for Long Time-Series Forecasting Problem

The Transformer architecture yields state-of-the-art results in many tas...
research
02/16/2022

ActionFormer: Localizing Moments of Actions with Transformers

Self-attention based Transformer models have demonstrated impressive res...
research
11/02/2021

Relational Self-Attention: What's Missing in Attention for Video Understanding

Convolution has been arguably the most important feature transform for m...
research
05/19/2022

Cross-Enhancement Transformer for Action Segmentation

Temporal convolutions have been the paradigm of choice in action segment...
research
10/19/2020

SAINT+: Integrating Temporal Features for EdNet Correctness Prediction

We propose SAINT+, a successor of SAINT which is a Transformer based kno...

Please sign up or login with your details

Forgot password? Click here to reset