Videos as Space-Time Region Graphs

06/05/2018
by Xiaolong Wang, et al.

How do humans recognize the action "opening a book"? We argue that there are two important cues: modeling temporal shape dynamics and modeling functional relationships between humans and objects. In this paper, we propose to represent videos as space-time region graphs that capture these two cues. Our graph nodes are defined by the object region proposals from different frames in a long-range video. These nodes are connected by two types of relations: (i) similarity relations capturing the long-range dependencies between correlated objects and (ii) spatial-temporal relations capturing the interactions between nearby objects. We perform reasoning on this graph representation via Graph Convolutional Networks. We achieve state-of-the-art results on both the Charades and Something-Something datasets. Especially for Charades, we obtain a huge 4.4% gain when our model is applied in complex environments.
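The graph construction summarized above can be sketched in a few lines of Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation. The node features `x` stand in for pooled region-proposal features. Similarity edges are assumed to be a row-wise softmax over projected dot products. Spatial-temporal edges are assumed to link boxes in consecutive frames, weighted by IoU and row-normalized. A single GCN step is assumed to sum the per-graph outputs before a ReLU. All function names, the toy boxes, and the random projection matrices are hypothetical.

```python
import numpy as np


def softmax_rows(m):
    # Numerically stable row-wise softmax, so each node's outgoing weights sum to 1.
    m = m - m.max(axis=1, keepdims=True)
    e = np.exp(m)
    return e / e.sum(axis=1, keepdims=True)


def similarity_graph(x, w_a, w_b):
    # Assumed similarity relation: dot product of two linear projections of the
    # node features, normalized per row with a softmax.
    return softmax_rows((x @ w_a) @ (x @ w_b).T)


def box_area(r):
    # Box given as (x1, y1, x2, y2).
    return (r[2] - r[0]) * (r[3] - r[1])


def iou(a, b):
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def spatial_temporal_graph(boxes, frames):
    # Assumed spatial-temporal relation: connect each node to overlapping nodes
    # in the next frame (IoU weights), then row-normalize; empty rows stay zero.
    n = len(boxes)
    g = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            if frames[j] == frames[i] + 1:
                g[i, j] = iou(boxes[i], boxes[j])
    sums = g.sum(axis=1, keepdims=True)
    return np.divide(g, sums, out=np.zeros_like(g), where=sums > 0)


def gcn_layer(graphs, x, weights):
    # One graph-convolution step: Z = ReLU(sum_k G_k X W_k).
    z = sum(g @ x @ w for g, w in zip(graphs, weights))
    return np.maximum(z, 0.0)


# Toy usage: 4 region proposals with 8-dim features, two per frame.
rng = np.random.default_rng(0)
d, n = 8, 4
x = rng.standard_normal((n, d))
boxes = [(0, 0, 10, 10), (20, 20, 30, 30), (1, 1, 11, 11), (21, 21, 31, 31)]
frames = [0, 0, 1, 1]

g_sim = similarity_graph(x, rng.standard_normal((d, d)), rng.standard_normal((d, d)))
g_st = spatial_temporal_graph(boxes, frames)
z = gcn_layer([g_sim, g_st], x, [rng.standard_normal((d, d)) for _ in range(2)])
print(z.shape)  # (4, 8)
```

In this toy example, each frame-0 box overlaps exactly one frame-1 box, so after row normalization each of those rows carries a single edge of weight 1. The frame-1 rows have no successor frame and stay empty. Keeping the two relation types as separate adjacency matrices, each with its own weight matrix, lets the layer treat appearance similarity and spatial-temporal proximity differently.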

research
08/19/2021

Spatio-Temporal Interaction Graph Parsing Networks for Human-Object Interaction Recognition

For a given video-based Human-Object Interaction scene, modeling the spa...
research
04/11/2019

Recurrent Space-time Graphs for Video Understanding

Visual learning in the space-time domain remains a very challenging prob...
research
12/17/2021

Distillation of Human-Object Interaction Contexts for Action Recognition

Modeling spatial-temporal relations is imperative for recognizing human ...
research
09/07/2021

Improving Phenotype Prediction using Long-Range Spatio-Temporal Dynamics of Functional Connectivity

The study of functional brain connectivity (FC) is important for underst...
research
07/04/2022

GraphVid: It Only Takes a Few Nodes to Understand a Video

We propose a concise representation of videos that encode perceptually m...
research
05/22/2023

GEST: the Graph of Events in Space and Time as a Common Representation between Vision and Language

One of the essential human skills is the ability to seamlessly build an ...
research
12/13/2018

Dynamic Graph Modules for Modeling Higher-Order Interactions in Activity Recognition

Video action recognition, as a critical problem towards video understand...