Mapping Images to Scene Graphs with Permutation-Invariant Structured Prediction

02/15/2018
by Roei Herzig, et al.

Structured prediction is concerned with predicting multiple inter-dependent labels simultaneously. Classical methods such as conditional random fields (CRFs) achieve this by maximizing a score function over the set of possible label assignments. Recent extensions use neural networks to implement either the score function or the maximization step. This paper takes an alternative approach, using a neural network to generate the structured output directly, without going through a score function. We take an axiomatic perspective to derive the properties such a network should have, in particular its invariance to certain input permutations, and present a structural characterization that is provably both necessary and sufficient. We then discuss graph-permutation invariant (GPI) architectures that satisfy this characterization and explain how they can be used for deep structured prediction. We evaluate our approach on the challenging problem of inferring a scene graph from an image, namely predicting the entities in the image and the relations between them. We obtain state-of-the-art results on the Visual Genome benchmark, outperforming all recent approaches.
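
To make the permutation property concrete, here is a minimal sketch of a function that treats a graph the same way regardless of how its nodes are numbered: relabelling the input nodes relabels the outputs correspondingly. It assumes a sum-aggregation design with scalar features. The functions `phi`, `alpha`, and `rho` and all their coefficients are hypothetical placeholders chosen for illustration, not the networks or weights used in the paper.

```python
import math
import random


def phi(z_i, z_ij, z_j):
    # Hypothetical pairwise function (assumption); deliberately not symmetric in i, j.
    return math.tanh(0.5 * z_i + 1.3 * z_ij - 0.7 * z_j)


def alpha(z_i, context_i):
    # Hypothetical per-node function combining a node with its summed context.
    return math.tanh(0.9 * z_i + 0.4 * context_i)


def rho(z_k, global_summary):
    # Hypothetical output function: one value per node.
    return 0.6 * z_k + 1.1 * global_summary


def gpi_forward(node_feats, pair_feats):
    """Map node features and pairwise features to one output per node.

    Only sums over nodes are used for aggregation, so the result does not
    depend on the order in which the nodes are listed.
    """
    n = len(node_feats)
    # Context of node i: sum of pairwise terms over all other nodes j.
    contexts = [
        sum(phi(node_feats[i], pair_feats[i][j], node_feats[j])
            for j in range(n) if j != i)
        for i in range(n)
    ]
    # Graph-level summary: sum over all nodes.
    global_summary = sum(alpha(node_feats[i], contexts[i]) for i in range(n))
    return [rho(z_k, global_summary) for z_k in node_feats]


def permute_graph(node_feats, pair_feats, perm):
    # New node k is old node perm[k]; pairwise features are reindexed to match.
    nodes = [node_feats[p] for p in perm]
    pairs = [[pair_feats[p][q] for q in perm] for p in perm]
    return nodes, pairs


def check_permutation_property(n=5, seed=0):
    # Compare the output on a shuffled graph with the shuffled output.
    rng = random.Random(seed)
    nodes = [rng.uniform(-1, 1) for _ in range(n)]
    pairs = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(n)]
    perm = list(range(n))
    rng.shuffle(perm)
    out = gpi_forward(nodes, pairs)
    p_out = gpi_forward(*permute_graph(nodes, pairs, perm))
    return all(math.isclose(p_out[k], out[perm[k]], abs_tol=1e-9)
               for k in range(n))
```

Calling `check_permutation_property()` compares the output on a randomly relabelled graph against the relabelled output of the original graph. The comparison uses a tolerance because floating-point sums can differ slightly when the terms are added in a different order.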

