Detector-Free Weakly Supervised Group Activity Recognition

04/05/2022
by Dongkeun Kim, et al.
Group activity recognition is the task of understanding the activity conducted by a group of people as a whole in a multi-person video. Existing models for this task are often impractical in that they demand ground-truth bounding box labels of actors even at test time, or rely on off-the-shelf object detectors. Motivated by this, we propose a novel model for group activity recognition that depends neither on bounding box labels nor on an object detector. Our Transformer-based model localizes and encodes partial contexts of a group activity by leveraging the attention mechanism, and represents a video clip as a set of partial context embeddings. The embedding vectors are then aggregated to form a single group representation that reflects the entire context of an activity while capturing the temporal evolution of each partial context. Our method achieves outstanding performance on two benchmarks, the Volleyball and NBA datasets, surpassing not only the state of the art trained with the same level of supervision, but also some existing models that rely on stronger supervision.
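
The pipeline described in the abstract (attention localizes partial contexts, and the resulting embeddings are aggregated into one group representation) can be pictured with a minimal sketch. This is not the authors' implementation. The query vectors standing in for learned tokens, the single-head dot-product attention, and the plain mean pooling over time and over tokens are all assumptions chosen for brevity, and every function name here is hypothetical.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, locations):
    """Single-head scaled dot-product attention of one query over one frame.

    `locations` is a list of D-dimensional feature vectors, one per spatial
    location. No bounding boxes are used: the weights decide where to look.
    """
    d = len(query)
    scores = [sum(q * x for q, x in zip(query, loc)) / math.sqrt(d)
              for loc in locations]
    weights = softmax(scores)
    context = [sum(w * loc[i] for w, loc in zip(weights, locations))
               for i in range(d)]
    return context, weights


def mean_vectors(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def group_representation(clip, queries):
    """Return (group vector, per-token embeddings) for a clip.

    clip:    T frames x N locations x D features
    queries: K query vectors (assumed stand-ins for learned tokens)
    """
    per_token = []
    for q in queries:
        # One partial-context embedding per frame for this token.
        per_frame = [attend(q, frame)[0] for frame in clip]
        # Assumption: mean over time replaces any learned temporal model.
        per_token.append(mean_vectors(per_frame))
    # Assumption: mean over tokens replaces any learned aggregation.
    return mean_vectors(per_token), per_token


random.seed(0)
T, N, D, K = 4, 6, 8, 3  # frames, locations per frame, feature dim, tokens
clip = [[[random.gauss(0, 1) for _ in range(D)] for _ in range(N)]
        for _ in range(T)]
queries = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
group_vec, token_embeddings = group_representation(clip, queries)
```

In this sketch each query produces its own attention map over the frame features, which is what lets different tokens focus on different parts of the scene without any detector; a classifier would then be applied to `group_vec`.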
