Scene Graph Generation from Objects, Phrases and Region Captions

07/31/2017
by Yikang Li, et al.

Object detection, scene graph generation, and region captioning are three scene understanding tasks at different semantic levels, and they are tied together: scene graphs are generated on top of objects detected in an image with their pairwise relationships predicted, while region captioning gives a language description of the objects, their attributes, relations, and other context information. In this work, to leverage the mutual connections across semantic levels, we propose a novel neural network model, termed Multi-level Scene Description Network (MSDN), to solve the three vision tasks jointly in an end-to-end manner. Objects, phrases, and caption regions are first aligned with a dynamic graph based on their spatial and semantic connections. Then a feature refining structure is used to pass messages across the three levels of semantic tasks through the graph. We benchmark the learned model on three tasks and show that joint learning across the three tasks with our proposed method can bring mutual improvements over previous models. In particular, on the scene graph generation task, our proposed method outperforms the state-of-the-art method with more than 3
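The two steps named in the abstract, aligning the three levels with a dynamic graph and then refining features by passing messages over it, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 0.7 coverage threshold, the use of the subject-object union box as the phrase region, and the unweighted mean-and-add update are all assumptions made here for illustration, and only the phrase-level update is shown.

```python
from typing import Dict, List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Vec = List[float]


def area(b: Box) -> float:
    """Area of a box; degenerate or inverted boxes have area 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def covered_fraction(inner: Box, outer: Box) -> float:
    """Fraction of `inner` that lies inside `outer`."""
    inter = (max(inner[0], outer[0]), max(inner[1], outer[1]),
             min(inner[2], outer[2]), min(inner[3], outer[3]))
    a = area(inner)
    return area(inter) / a if a > 0 else 0.0


def union_box(a: Box, b: Box) -> Box:
    """Smallest box enclosing both inputs (assumed phrase region)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def build_graph(objects: List[Box], pairs: List[Tuple[int, int]],
                regions: List[Box], min_cover: float = 0.7):
    """Link each phrase to its subject/object and to caption regions covering it.

    `min_cover` is a hypothetical threshold, not a value taken from the text.
    """
    phrase_boxes = [union_box(objects[s], objects[o]) for s, o in pairs]
    p2o: Dict[int, List[int]] = {p: [s, o] for p, (s, o) in enumerate(pairs)}
    p2r: Dict[int, List[int]] = {
        p: [r for r, rb in enumerate(regions)
            if covered_fraction(pb, rb) >= min_cover]
        for p, pb in enumerate(phrase_boxes)
    }
    return p2o, p2r


def mean(vectors: List[Vec]) -> Vec:
    """Element-wise mean of equally sized vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def refine_phrases(phrase_feats: List[Vec], object_feats: List[Vec],
                   region_feats: List[Vec], p2o, p2r) -> List[Vec]:
    """One refinement step: add the mean neighbour feature from each level.

    A learned model would weight or gate these messages; the plain mean
    here is a stand-in for illustration only.
    """
    refined = []
    for p, feat in enumerate(phrase_feats):
        out = list(feat)
        for neighbours, feats in ((p2o[p], object_feats), (p2r[p], region_feats)):
            if neighbours:
                msg = mean([feats[i] for i in neighbours])
                out = [a + b for a, b in zip(out, msg)]
        refined.append(out)
    return refined
```

The graph is "dynamic" in the sense that it is rebuilt per image from the proposed boxes, so the set of neighbours each phrase receives messages from changes with the detections.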


