Long Short-Term Transformer for Online Action Detection

07/07/2021
by Mingze Xu, et al.

In this paper, we present Long Short-term TRansformer (LSTR), a new temporal modeling algorithm for online action detection that employs a long- and short-term memory mechanism to model prolonged sequence data. It consists of an LSTR encoder that dynamically exploits coarse-scale historical information from an extended time window (e.g., 2048 long-range frames spanning up to 8 minutes), together with an LSTR decoder that focuses on a short time window (e.g., 32 short-range frames spanning 8 seconds) to model the fine-scale characteristics of the ongoing event. Compared to prior work, LSTR provides an effective and efficient method to model long videos with less heuristic algorithm design. LSTR achieves significantly improved results over existing state-of-the-art approaches on standard online action detection benchmarks: THUMOS'14, TVSeries, and HACS Segment. Extensive empirical analysis validates the setup of the long- and short-term memories and the design choices of LSTR.
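
The split between a long and a short time window can be pictured as a first-in-first-out buffer over per-frame features. The following is a minimal sketch under stated assumptions, not the authors' implementation: the class name `LongShortMemory`, its methods, and the choice of a single `deque` are hypothetical, and only the default window sizes (2048 and 32 frames) come from the abstract. It shows only the bookkeeping of which frames fall into which window; the encoder and decoder themselves are not modeled.

```python
from collections import deque


class LongShortMemory:
    """Illustrative FIFO buffer split into a long and a short window.

    Hypothetical helper: names and structure are assumptions made for
    this sketch. Only the default sizes are taken from the abstract.
    """

    def __init__(self, long_size=2048, short_size=32):
        self.long_size = long_size
        self.short_size = short_size
        # One deque holds both windows; once it is full, appending a new
        # frame silently drops the oldest one.
        self._frames = deque(maxlen=long_size + short_size)

    def push(self, feature):
        """Append the feature of the newest frame."""
        self._frames.append(feature)

    @property
    def short(self):
        """The most recent `short_size` frames, oldest first."""
        return list(self._frames)[-self.short_size:]

    @property
    def long(self):
        """The frames that precede the short window, oldest first."""
        return list(self._frames)[:-self.short_size]


# Small sizes so the behaviour is easy to follow; integers stand in for
# per-frame feature vectors.
memory = LongShortMemory(long_size=8, short_size=4)
for t in range(20):
    memory.push(t)

print(memory.short)  # [16, 17, 18, 19]
print(memory.long)   # [8, 9, 10, 11, 12, 13, 14, 15]
```

In this picture, an encoder would read `memory.long` to summarize coarse history and a decoder would read `memory.short` for the ongoing event; how the two are combined is described in the full paper, not here.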


research
06/30/2021

Long-Short Temporal Modeling for Efficient Action Recognition

Efficient long-short temporal modeling is key for enhancing the performa...
research
08/07/2023

TempFuser: Learning Tactical and Agile Flight Maneuvers in Aerial Dogfights using a Long Short-Term Temporal Fusion Transformer

Aerial dogfights necessitate understanding the tactically changing maneu...
research
03/02/2022

Colar: Effective and Efficient Online Action Detection by Consulting Exemplars

Online action detection has attracted increasing research interests in r...
research
05/26/2023

Detect Any Shadow: Segment Anything for Video Shadow Detection

Segment anything model (SAM) has achieved great success in the field of ...
research
02/21/2023

Memory-augmented Online Video Anomaly Detection

The ability to understand the surrounding scene is of paramount importan...
research
08/30/2022

A Circular Window-based Cascade Transformer for Online Action Detection

Online action detection aims at the accurate action prediction of the cu...
research
03/24/2018

Multi-range Reasoning for Machine Comprehension

We propose MRU (Multi-Range Reasoning Units), a new fast compositional e...
