Generating Visual Spatial Description via Holistic 3D Scene Understanding

05/19/2023
by   Yu Zhao, et al.
0

Visual spatial description (VSD) aims to generate texts that describe the spatial relations of the given objects within images. Existing VSD work merely models the 2D geometrical vision features, thus inevitably falling prey to the problem of skewed spatial understanding of target objects. In this work, we investigate the incorporation of 3D scene features for VSD. With an external 3D scene extractor, we obtain the 3D objects and scene features for input images, based on which we construct a target object-centered 3D spatial scene graph (Go3D-S2G), such that we model the spatial semantics of target objects within the holistic 3D scenes. Besides, we propose a scene subgraph selecting mechanism, sampling topologically-diverse subgraphs from Go3D-S2G, where the diverse local structure features are navigated to yield spatially-diversified text generation. Experimental results on two VSD datasets demonstrate that our framework outperforms the baselines significantly, especially improving on the cases with complex visual spatial relations. Meanwhile, our method can produce more spatially-diversified generation. Code is available at https://github.com/zhaoyucs/VSD.

READ FULL TEXT

page 8

page 14

page 15

page 17

research
09/11/2019

Specifying Object Attributes and Relations in Interactive Scene Generation

We introduce a method for the generation of images from an input scene g...
research
10/04/2020

Holistic static and animated 3D scene generation from diverse text descriptions

We propose a framework for holistic static and animated 3D scene generat...
research
06/29/2018

Factorizable Net: An Efficient Subgraph-based Framework for Scene Graph Generation

Generating scene graph to describe all the relations inside an image gai...
research
11/17/2021

Learning to Compose Visual Relations

The visual world around us can be described as a structured set of objec...
research
08/07/2019

SpatialSense: An Adversarially Crowdsourced Benchmark for Spatial Relation Recognition

Understanding the spatial relations between objects in images is a surpr...
research
05/21/2022

Exploring Concept Contribution Spatially: Hidden Layer Interpretation with Spatial Activation Concept Vector

To interpret deep learning models, one mainstream is to explore the lear...
research
09/20/2020

Deriving Visual Semantics from Spatial Context: An Adaptation of LSA and Word2Vec to generate Object and Scene Embeddings from Images

Embeddings are an important tool for the representation of word meaning....

Please sign up or login with your details

Forgot password? Click here to reset