SAFCAR: Structured Attention Fusion for Compositional Action Recognition

12/03/2020
by Tae Soo Kim, et al.

We present a general framework for compositional action recognition, i.e., action recognition where the labels are composed of simpler components such as subjects, atomic actions, and objects. The main challenge in compositional action recognition is that the set of possible actions that can be composed from basic components is combinatorially large. However, compositionality also provides a structure that can be exploited. To do so, we develop and test a novel Structured Attention Fusion (SAF) self-attention mechanism that combines information from object detections, which capture the time-series structure of an action, with visual cues that capture contextual information. We show that our approach recognizes novel verb-noun compositions more effectively than current state-of-the-art systems, and that it generalizes efficiently to unseen action categories from only a few labeled examples. We validate our approach on the challenging Something-Else tasks from the Something-Something-V2 dataset. We further show that our framework is flexible and can generalize to a new domain by reporting competitive results on the Charades-Fewshot dataset.
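As a rough illustration of the idea of fusing two feature streams with self-attention, the sketch below runs generic single-head scaled dot-product attention over the concatenation of object-derived tokens and visual tokens. It is not the SAF mechanism from the paper, whose details the abstract does not give. The function name `attention_fuse`, the token counts, the feature size, and the random projection weights are all assumptions made for this example.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_fuse(object_tokens, visual_tokens, w_q, w_k, w_v):
    """Generic self-attention over both token sets (hypothetical helper).

    object_tokens: (n_obj, d) features derived from object detections.
    visual_tokens: (n_vis, d) appearance/context features.
    Returns a pooled (d,) vector and the (N, N) attention weights,
    where N = n_obj + n_vis.
    """
    # Put both streams in one sequence so every token can attend to all others.
    tokens = np.concatenate([object_tokens, visual_tokens], axis=0)
    q, k, v = tokens @ w_q, tokens @ w_k, tokens @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])
    weights = softmax(scores, axis=-1)  # each row sums to 1
    attended = weights @ v
    # Mean pooling is an arbitrary choice made for this sketch.
    return attended.mean(axis=0), weights


rng = np.random.default_rng(0)
d = 16  # assumed feature size
object_tokens = rng.normal(size=(8, d))  # e.g. 8 time steps of box features
visual_tokens = rng.normal(size=(4, d))  # e.g. 4 appearance features
w_q, w_k, w_v = (rng.normal(size=(d, d)) / np.sqrt(d) for _ in range(3))

fused, weights = attention_fuse(object_tokens, visual_tokens, w_q, w_k, w_v)
print(fused.shape, weights.shape)
```

In a trained model the projection matrices would be learned and the pooled vector would feed a classifier over action labels; here they are random, so only the shapes and the row-normalized attention weights are meaningful.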


research
05/04/2023

Modelling Spatio-Temporal Interactions for Compositional Action Recognition

Humans have the natural ability to recognize actions even if the objects...
research
12/20/2019

Something-Else: Compositional Action Recognition with Spatial-Temporal Interaction Networks

Human action is naturally compositional: humans can easily recognize and...
research
07/04/2022

Disentangled Action Recognition with Knowledge Bases

Action in video usually involves the interaction of human with objects. ...
research
04/01/2021

Motion Guided Attention Fusion to Recognize Interactions from Videos

We present a dual-pathway approach for recognizing fine-grained interact...
research
07/03/2020

Egocentric Action Recognition by Video Attention and Temporal Context

We present the submission of Samsung AI Centre Cambridge to the CVPR2020...
research
03/22/2018

Towards Universal Representation for Unseen Action Recognition

Unseen Action Recognition (UAR) aims to recognise novel action categorie...
research
10/01/2019

Action Anticipation for Collaborative Environments: The Impact of Contextual Information and Uncertainty-Based Prediction

For effectively interacting with humans in collaborative environments, m...
