An Interpretable Model for Scene Graph Generation

11/21/2018
by Ji Zhang, et al.

We propose an efficient and interpretable scene graph generator. We consider three types of features (visual, spatial, and semantic) and use a late fusion strategy so that each feature's contribution can be explicitly investigated. We study which factors of these features have the most impact on performance, visualize the learned visual features for relationships, and investigate the efficacy of our model. We won first place in the OpenImages Visual Relationship Detection Challenge on Kaggle, outperforming the second-place entry by 5% absolute (20% relative). We believe an accurate scene graph generator is a fundamental stepping stone for higher-level vision-language tasks such as image captioning and visual QA, since it provides a semantic, structured comprehension of an image that goes beyond pixels and objects.
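The late fusion idea above can be illustrated with a minimal sketch. It assumes each of the three branches (visual, spatial, semantic) outputs one logit per predicate and that fusion is an elementwise sum followed by a softmax; this is one common form of late fusion, not necessarily the authors' exact formulation. The predicate names, scores, and helper functions (`late_fuse`, `contributions`, `ablate`) are hypothetical and exist only for illustration.

```python
import math
from typing import Dict, List

# Hypothetical predicate vocabulary, for illustration only.
PREDICATES = ["on", "holds", "wears"]


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def late_fuse(branch_logits: Dict[str, List[float]]) -> List[float]:
    """Assumed late fusion: sum each branch's predicate logits elementwise, then softmax."""
    branches = list(branch_logits.values())
    n = len(branches[0])
    if any(len(b) != n for b in branches):
        raise ValueError("every branch must score the same predicate set")
    fused = [sum(b[i] for b in branches) for i in range(n)]
    return softmax(fused)


def predict(branch_logits: Dict[str, List[float]]) -> str:
    """Return the predicate with the highest fused probability."""
    probs = late_fuse(branch_logits)
    return PREDICATES[max(range(len(probs)), key=probs.__getitem__)]


def contributions(branch_logits: Dict[str, List[float]], predicate: str) -> Dict[str, float]:
    """Per-branch logit for one predicate; with a plain sum these add up to the fused logit."""
    i = PREDICATES.index(predicate)
    return {name: logits[i] for name, logits in branch_logits.items()}


def ablate(branch_logits: Dict[str, List[float]], drop: str) -> Dict[str, List[float]]:
    """Remove one branch to see how the prediction changes without it."""
    return {name: v for name, v in branch_logits.items() if name != drop}


# Hypothetical per-branch scores for a single subject-object pair.
scores = {
    "visual": [2.0, 0.5, -1.0],
    "spatial": [1.0, 0.0, 0.0],
    "semantic": [0.5, 2.5, -0.5],
}
print(predict(scores))                    # on
print(contributions(scores, "on"))        # {'visual': 2.0, 'spatial': 1.0, 'semantic': 0.5}
print(predict(ablate(scores, "visual")))  # holds
```

Because the branches are combined only at the output, each one's share of a prediction can be read off directly, and dropping a branch at inference time (here the visual branch, which flips the toy prediction from "on" to "holds") shows its effect in isolation.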

research
09/23/2021

Scene Graph Generation for Better Image Captioning?

We investigate the incorporation of visual relationships into the task o...
research
11/01/2018

Introduction to the 1st Place Winning Model of OpenImages Relationship Detection Challenge

This article describes the model we built that achieved 1st place in the...
research
02/09/2021

SG2Caps: Revisiting Scene Graphs for Image Captioning

The mainstream image captioning models rely on Convolutional Neural Netw...
research
11/19/2014

Affordances Provide a Fundamental Categorization Principle for Visual Scenes

How do we know that a kitchen is a kitchen by looking? Relatively little...
research
09/11/2018

The Visual QA Devil in the Details: The Impact of Early Fusion and Batch Norm on CLEVR

Visual QA is a pivotal challenge for higher-level reasoning, requiring u...
research
07/31/2017

Scene Graph Generation from Objects, Phrases and Region Captions

Object detection, scene graph generation and region captioning, which ar...
research
09/22/2019

Learning Visual Relation Priors for Image-Text Matching and Image Captioning with Neural Scene Graph Generators

Grounding language to visual relations is critical to various language-a...
