PAT: Position-Aware Transformer for Dense Multi-Label Action Detection

by Faegheh Sardari, et al.
University of Surrey

We present PAT, a transformer-based network that learns complex temporal co-occurrence action dependencies in a video by exploiting multi-scale temporal features. In existing methods, the self-attention mechanism in transformers loses the temporal positional information, which is essential for robust action detection. To address this issue, we (i) embed relative positional encoding in the self-attention mechanism and (ii) exploit multi-scale temporal relationships by designing a novel non-hierarchical network, in contrast to the recent transformer-based approaches that use a hierarchical structure. We argue that joining the self-attention mechanism with multiple sub-sampling processes in the hierarchical approaches results in increased loss of positional information. We evaluate the performance of our proposed approach on two challenging dense multi-label benchmark datasets, and show that PAT improves the current state-of-the-art result by 1.1% [...] MultiTHUMOS datasets, respectively, thereby achieving the new state-of-the-art mAP at 26.5% [...] studies to examine the impact of the different components of our proposed network.
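
One common way to give self-attention access to relative temporal position is to add a bias, indexed by the clipped offset between two frames, to each attention score. The sketch below illustrates that general idea in plain Python. It is not taken from the paper, and PAT's actual formulation may differ; the function names, `rel_bias`, and `max_dist` are hypothetical and chosen only for illustration.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def relative_attention_weights(queries, keys, rel_bias, max_dist):
    """Attention weights with a relative position bias (illustrative only).

    queries, keys: lists of T vectors (lists of floats), each of length D.
    rel_bias: list of 2 * max_dist + 1 scalars, one per clipped offset j - i.
    """
    dim = len(queries[0])
    weights = []
    for i, q in enumerate(queries):
        scores = []
        for j, k in enumerate(keys):
            # Content term: scaled dot product, blind to temporal order.
            content = sum(a * b for a, b in zip(q, k)) / math.sqrt(dim)
            # Position term: depends only on the clipped offset j - i.
            offset = max(-max_dist, min(max_dist, j - i))
            scores.append(content + rel_bias[offset + max_dist])
        weights.append(softmax(scores))
    return weights


# Four identical frames: the content term cannot tell positions apart.
frames = [[1.0, 0.0]] * 4
no_bias = relative_attention_weights(frames, frames, [0.0] * 5, max_dist=2)
with_bias = relative_attention_weights(
    frames, frames, [-2.0, -1.0, 0.0, -1.0, -2.0], max_dist=2
)
```

With identical frames, `no_bias` spreads attention uniformly over all positions, whereas `with_bias` concentrates each frame's attention on itself and its near neighbours. This is the kind of positional signal that plain self-attention lacks.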




