Memory-and-Anticipation Transformer for Online Action Understanding

08/15/2023
by Jiahao Wang, et al.

Most existing forecasting systems are memory-based methods, which attempt to mimic human forecasting ability by employing various memory mechanisms and have made progress in modeling temporal dependencies within memory. Nevertheless, an obvious weakness of this paradigm is that it can only model limited historical dependence and cannot transcend the past. In this paper, we rethink the temporal dependence of event evolution and propose a novel memory-anticipation-based paradigm to model the entire temporal structure, including the past, present, and future. Based on this idea, we present the Memory-and-Anticipation Transformer (MAT), a memory-anticipation-based approach to the online action detection and anticipation tasks. Because MAT models the past, present, and future together, it can handle online action detection and anticipation in a unified manner. MAT is evaluated on four challenging benchmarks (TVSeries, THUMOS'14, HDD, and EPIC-Kitchens-100) for online action detection and anticipation, and it significantly outperforms all existing methods. Code is available at https://github.com/Echo0125/Memory-and-Anticipation-Transformer.


research
11/08/2022

SimOn: A Simple Framework for Online Temporal Action Localization

Online Temporal Action Localization (On-TAL) aims to immediately provide...
research
11/18/2018

Temporal Recurrent Networks for Online Action Detection

Most work on temporal action detection is formulated in an offline manne...
research
01/11/2019

Anticipation and next action forecasting in video: an end-to-end model with memory

Action anticipation and forecasting in videos do not require a hat-trick...
research
08/22/2022

InstanceFormer: An Online Video Instance Segmentation Framework

Recent transformer-based offline video instance segmentation (VIS) appro...
research
01/20/2022

MeMViT: Memory-Augmented Multiscale Vision Transformer for Efficient Long-Term Video Recognition

While today's video recognition systems parse snapshots or short clips a...
research
06/21/2021

OadTR: Online Action Detection with Transformers

Most recent approaches for online action detection tend to apply Recurre...
research
07/23/2023

ComPtr: Towards Diverse Bi-source Dense Prediction Tasks via A Simple yet General Complementary Transformer

Deep learning (DL) has advanced the field of dense prediction, while gra...
