Learning to Compose Visual Relations

11/17/2021
by   Nan Liu, et al.
22

The visual world around us can be described as a structured set of objects and their associated relations. An image of a room may be conjured given only the description of the underlying objects and their associated relations. While there has been significant work on designing deep neural networks which may compose individual objects together, less work has been done on composing the individual relations between objects. A principal difficulty is that while the placement of objects is mutually independent, their relations are entangled and dependent on each other. To circumvent this issue, existing works primarily compose relations by utilizing a holistic encoder, in the form of text or graphs. In this work, we instead propose to represent each relation as an unnormalized density (an energy-based model), enabling us to compose separate relations in a factorized manner. We show that such a factorized decomposition allows the model to both generate and edit scenes that have multiple sets of relations more faithfully. We further show that decomposition enables our model to effectively understand the underlying relational scene structure. Project page at: https://composevisualrelations.github.io/.

READ FULL TEXT

page 6

page 7

page 16

page 17

page 18

research
07/04/2022

ViRel: Unsupervised Visual Relations Discovery with Graph-level Analogy

Visual relations form the basis of understanding our compositional world...
research
05/19/2023

Generating Visual Spatial Description via Holistic 3D Scene Understanding

Visual spatial description (VSD) aims to generate texts that describe th...
research
11/10/2022

DisPositioNet: Disentangled Pose and Identity in Semantic Image Manipulation

Graph representation of objects and their relations in a scene, known as...
research
06/03/2022

Compositional Visual Generation with Composable Diffusion Models

Large text-guided diffusion models, such as DALLE-2, are able to generat...
research
11/04/2021

Unsupervised Learning of Compositional Energy Concepts

Humans are able to rapidly understand scenes by utilizing concepts extra...
research
07/30/2021

Enhancing Social Relation Inference with Concise Interaction Graph and Discriminative Scene Representation

There has been a recent surge of research interest in attacking the prob...
research
11/17/2014

Relations World: A Possibilistic Graphical Model

We explore the idea of using a "possibilistic graphical model" as the ba...

Please sign up or login with your details

Forgot password? Click here to reset