COMPOSER: Compositional Learning of Group Activity in Videos

12/11/2021
by Honglu Zhou, et al.

Group Activity Recognition (GAR) detects the activity performed by a group of actors in a short video clip. The task requires compositional understanding of scene entities and relational reasoning between them. We approach GAR by modeling the video as a series of tokens that represent the multi-scale semantic concepts in the video. We propose COMPOSER, a Multiscale Transformer-based architecture that performs attention-based reasoning over tokens at each scale and learns group activity compositionally. In addition, we use only the keypoint modality, which reduces scene biases and improves the generalization ability of the model. We improve the multi-scale representations in COMPOSER by clustering the intermediate-scale representations while maintaining consistent cluster assignments between scales. Finally, we use techniques such as auxiliary prediction and novel data augmentations (e.g., Actor Dropout) to aid model training. We demonstrate the model's strength and interpretability on the challenging Volleyball dataset. COMPOSER achieves a new state-of-the-art accuracy of 94.5% with the keypoint-only modality. It outperforms the latest GAR methods that rely on RGB signals and compares favorably with methods that exploit multiple modalities. Our code will be available.
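
The abstract names Actor Dropout as a data augmentation but does not define it. The sketch below shows one plausible reading, stated as an assumption rather than the authors' implementation: during training, each actor's keypoint track is independently replaced by zeros with some probability. The function name `actor_dropout`, the `drop_prob` parameter, and the nested-list clip layout (actors, then frames, then `(x, y)` keypoints) are all hypothetical.

```python
import random


def actor_dropout(clip, drop_prob=0.1, rng=None):
    """Return a copy of `clip` with some actors' keypoints zeroed out.

    Assumed layout (hypothetical): clip[actor][frame] is a list of (x, y)
    keypoint tuples. Each actor is dropped independently with probability
    `drop_prob`; a dropped actor keeps its shape but all keypoints become
    (0.0, 0.0).
    """
    rng = rng or random.Random()
    augmented = []
    for actor in clip:
        if rng.random() < drop_prob:
            # Zero the whole track so the actor is absent in every frame.
            augmented.append([[(0.0, 0.0) for _ in frame] for frame in actor])
        else:
            # Copy the frames so the caller's clip is never mutated.
            augmented.append([list(frame) for frame in actor])
    return augmented


# Toy clip: 4 actors, 3 frames, 2 keypoints per frame.
clip = [[[(1.0, 2.0), (3.0, 4.0)] for _ in range(3)] for _ in range(4)]
augmented = actor_dropout(clip, drop_prob=0.5, rng=random.Random(0))
```

Zeroing rather than deleting an actor keeps the input shape fixed, which is convenient when a model expects a constant number of actor tokens. Whether COMPOSER does it this way is not stated in the abstract.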

research
08/27/2019

Temporal Reasoning Graph for Activity Recognition

Despite great success has been achieved in activity analysis, it still h...
research
03/11/2023

DECOMPL: Decompositional Learning with Attention Pooling for Group Activity Recognition from a Single Volleyball Image

Group Activity Recognition (GAR) aims to detect the activity performed b...
research
12/16/2018

Towards Robust Human Activity Recognition from RGB Video Stream with Limited Labeled Data

Human activity recognition based on video streams has received numerous ...
research
04/23/2019

Learning Actor Relation Graphs for Group Activity Recognition

Modeling relation between actors is important for recognizing group acti...
research
04/05/2022

Detector-Free Weakly Supervised Group Activity Recognition

Group activity recognition is the task of understanding the activity con...
research
08/18/2020

AssembleNet++: Assembling Modality Representations via Attention Connections

We create a family of powerful video models which are able to: (i) learn...
research
10/03/2022

Extending Compositional Attention Networks for Social Reasoning in Videos

We propose a novel deep architecture for the task of reasoning about soc...