Spatial-Temporal Transformer for Dynamic Scene Graph Generation

07/26/2021
by   Yuren Cong, et al.
0

Dynamic scene graph generation aims at generating a scene graph of the given video. Compared to the task of scene graph generation from images, it is more challenging because of the dynamic relationships between objects and the temporal dependencies between frames allowing for a richer semantic interpretation. In this paper, we propose Spatial-temporal Transformer (STTran), a neural network that consists of two core modules: (1) a spatial encoder that takes an input frame to extract spatial context and reason about the visual relationships within a frame, and (2) a temporal decoder which takes the output of the spatial encoder as input in order to capture the temporal dependencies between frames and infer the dynamic relationships. Furthermore, STTran is flexible to take varying lengths of videos as input without clipping, which is especially important for long videos. Our method is validated on the benchmark dataset Action Genome (AG). The experimental results demonstrate the superior performance of our method in terms of dynamic scene graphs. Moreover, a set of ablative studies is conducted and the effect of each proposed module is justified.

READ FULL TEXT

page 1

page 6

page 8

research
12/18/2021

Exploiting Long-Term Dependencies for Generating Dynamic Scene Graphs

Structured video representation in the form of dynamic scene graphs is a...
research
08/10/2023

Local-Global Information Interaction Debiasing for Dynamic Scene Graph Generation

The task of dynamic scene graph generation (DynSGG) aims to generate sce...
research
11/11/2022

SSGVS: Semantic Scene Graph-to-Video Synthesis

As a natural extension of the image synthesis task, video synthesis has ...
research
05/15/2023

Cross-Modality Time-Variant Relation Learning for Generating Dynamic Scene Graphs

Dynamic scene graphs generated from video clips could help enhance the s...
research
05/20/2021

Profiling Visual Dynamic Complexity Using a Bio-Robotic Approach

Visual dynamic complexity is a ubiquitous, hidden attribute of the visua...
research
07/01/2022

Transforming Image Generation from Scene Graphs

Generating images from semantic visual knowledge is a challenging task, ...
research
09/01/2019

Visual Deprojection: Probabilistic Recovery of Collapsed Dimensions

We introduce visual deprojection: the task of recovering an image or vid...

Please sign up or login with your details

Forgot password? Click here to reset