Unified Graph Structured Models for Video Understanding

03/29/2021
by   Anurag Arnab, et al.
9

Accurate video understanding involves reasoning about the relationships between actors, objects and their environment, often over long temporal intervals. In this paper, we propose a message passing graph neural network that explicitly models these spatio-temporal relations and can use explicit representations of objects, when supervision is available, and implicit representations otherwise. Our formulation generalises previous structured models for video understanding, and allows us to study how different design choices in graph structure and representation affect the model's performance. We demonstrate our method on two different tasks requiring relational reasoning in videos – spatio-temporal action detection on AVA and UCF101-24, and video scene graph classification on the recent Action Genome dataset – and achieve state-of-the-art results on all three datasets. Furthermore, we show quantitatively and qualitatively how our method is able to more effectively model relationships between relevant entities in the scene.

READ FULL TEXT

page 2

page 3

page 7

page 9

research
05/17/2019

Neural Message Passing on Hybrid Spatio-Temporal Visual and Symbolic Graphs for Video Understanding

Many problems in video understanding require labeling multiple activitie...
research
03/25/2019

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

Visual relationship reasoning is a crucial yet challenging task for unde...
research
12/04/2018

Classifying Collisions with Spatio-Temporal Action Graph Networks

Events defined by the interaction of objects in a scene often are of cri...
research
08/28/2019

Explainable Video Action Reasoning via Prior Knowledge and State Transitions

Human action analysis and understanding in videos is an important and ch...
research
09/13/2022

Semantic2Graph: Graph-based Multi-modal Feature Fusion for Action Segmentation in Videos

Video action segmentation and recognition tasks have been widely applied...
research
08/09/2023

Constructing Holistic Spatio-Temporal Scene Graph for Video Semantic Role Labeling

Video Semantic Role Labeling (VidSRL) aims to detect the salient events ...
research
09/17/2020

Dynamic Regions Graph Neural Networks for Spatio-Temporal Reasoning

Graph Neural Networks are perfectly suited to capture latent interaction...

Please sign up or login with your details

Forgot password? Click here to reset