Visual Relationship Forecasting in Videos

07/02/2021
by   Li Mi, et al.

Real-world scenarios often require anticipating object interactions in an unknown future, which assists the decision-making of both humans and agents. To meet this challenge, we present a new task named Visual Relationship Forecasting (VRF) in videos, which explores the prediction of visual relationships through reasoning. Specifically, given a subject-object pair observed over H existing frames, VRF aims to predict their interactions over the next T frames without visual evidence. To evaluate the VRF task, we introduce two video datasets, VRF-AG and VRF-VidOR, each with a series of spatio-temporally localized visual relation annotations per video. The two datasets densely annotate 13 and 35 visual relationships in 1,923 and 13,447 video clips, respectively. In addition, we present a novel Graph Convolutional Transformer (GCT) framework, which captures both object-level and frame-level dependencies using a spatio-temporal Graph Convolutional Network and a Transformer. Experimental results on both VRF-AG and VRF-VidOR demonstrate that GCT outperforms state-of-the-art sequence modelling methods on visual relationship forecasting.
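
To make the task's input and output concrete, the sketch below frames VRF as mapping H observed per-frame relationship labels for one subject-object pair to T future labels. It is not the GCT model from the paper: the forecaster is a naive "repeat the last observed relationship" baseline, and the function names, the predicate labels, and the frame-level accuracy metric are all assumptions made for illustration only.

```python
from typing import List


def forecast_persistence(history: List[str], t: int) -> List[str]:
    """Naive baseline (an assumption, not the paper's method):
    repeat the last observed relationship for the next t frames."""
    if not history:
        raise ValueError("history must contain at least one observed frame")
    if t < 0:
        raise ValueError("t must be non-negative")
    return [history[-1]] * t


def frame_accuracy(predicted: List[str], actual: List[str]) -> float:
    """Fraction of future frames whose predicted label matches the
    ground truth (a hypothetical metric chosen for this sketch)."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(actual)


# Hypothetical labels for one subject-object pair, e.g. (person, cup).
history = ["next_to", "next_to", "hold", "hold"]  # H = 4 observed frames
future = ["hold", "hold", "release"]              # T = 3 unseen frames

predicted = forecast_persistence(history, t=len(future))
accuracy = frame_accuracy(predicted, future)
```

A baseline like this can only be right while the relationship stays unchanged, which is why forecasting a transition (here, the final frame) calls for a model that reasons over the observed sequence.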

research
03/25/2019

Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph

Visual relationship reasoning is a crucial yet challenging task for unde...
research
02/23/2023

Object-Centric Video Prediction via Decoupling of Object Dynamics and Interactions

We propose a novel framework for the task of object-centric video predic...
research
07/17/2020

Visual Relation Grounding in Videos

In this paper, we explore a novel task named visual Relation Grounding i...
research
11/22/2019

Crowd Density Forecasting by Modeling Patch-based Dynamics

Forecasting human activities observed in videos is a long-standing chall...
research
07/08/2020

Spatio-Temporal Scene Graphs for Video Dialog

The Audio-Visual Scene-aware Dialog (AVSD) task requires an agent to ind...
research
05/09/2020

Understanding Dynamic Scenes using Graph Convolution Networks

We present a novel Multi Relational Graph Convolutional Network (MRGCN) ...
research
10/27/2021

Relationship Oriented Affordance Learning through Manipulation Graph Construction

In this paper, we propose Manipulation Relationship Graph (MRG), a novel...
