Learning Visual Commonsense for Robust Scene Graph Generation

06/17/2020
by Alireza Zareian, et al.

Scene graph generation models understand the scene through object and predicate recognition, but are prone to mistakes due to the challenges of perception in the wild. Perception errors often lead to nonsensical compositions in the output scene graph, which do not follow real-world rules and patterns, and can be corrected using commonsense knowledge. We propose the first method to acquire visual commonsense, such as affordance and intuitive physics, automatically from data, and use it to enhance scene graph generation. To this end, we extend transformers to incorporate the structure of scene graphs, and train our Global-Local Attention Transformer on a scene graph corpus. Once trained, our commonsense model can be applied to any perception model to correct its obvious mistakes, resulting in a more commonsensical scene graph. We show that the proposed model learns commonsense better than any alternative, and improves the accuracy of any scene graph generation model. Nevertheless, strong disproportions in real-world datasets could bias the commonsense model to miscorrect already confident perceptions. We address this problem by devising a fusion module that compares the predictions made by the perception and commonsense models, and the confidence of each, to make a hybrid decision. Our full model learns commonsense and knows when to use it, which is shown to be effective through experiments, resulting in a new state of the art.
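
The idea of fusing two predictions according to their confidence can be illustrated with a small sketch. This is not the fusion module from the paper, whose details are not given in the abstract; it is a minimal illustration under stated assumptions: each model outputs a probability distribution over the same label set, confidence is taken to be one minus the normalized entropy, and the hybrid decision is a confidence-weighted average. The function names `confidence` and `fuse` and the example labels are hypothetical.

```python
import math


def confidence(probs):
    """Confidence of a distribution as 1 - normalized entropy.

    Assumption for illustration only: a peaked distribution scores
    close to 1, a uniform one scores 0.
    """
    n = len(probs)
    if n < 2:
        return 1.0
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    return 1.0 - entropy / math.log(n)


def fuse(perception, commonsense):
    """Confidence-weighted average of two class distributions.

    Hypothetical stand-in for a fusion step: the more confident
    model contributes more to the hybrid prediction.
    """
    if len(perception) != len(commonsense):
        raise ValueError("distributions must cover the same labels")
    w_p = confidence(perception)
    w_c = confidence(commonsense)
    total = w_p + w_c
    if total == 0:
        # Both distributions are uniform: fall back to equal weights.
        w_p = w_c = 0.5
    else:
        w_p, w_c = w_p / total, w_c / total
    fused = [w_p * p + w_c * c for p, c in zip(perception, commonsense)]
    norm = sum(fused)
    return [f / norm for f in fused]


def argmax(values):
    """Index of the largest value."""
    return max(range(len(values)), key=values.__getitem__)


# Hypothetical predicate labels for a single (subject, object) pair.
labels = ["riding", "wearing", "eating"]

# Perception is confident, commonsense is unsure: perception should win.
confident_perception = fuse([0.90, 0.05, 0.05], [0.30, 0.40, 0.30])

# Perception is unsure, commonsense is confident: commonsense should win.
confident_commonsense = fuse([0.40, 0.35, 0.25], [0.02, 0.03, 0.95])

print(labels[argmax(confident_perception)])
print(labels[argmax(confident_commonsense)])
```

A fixed weighting rule like this only conveys the intuition that a confident perception should not be overridden by a biased prior; a learned fusion module could weigh the two sources in ways this sketch does not capture.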

research
12/16/2021

SGEITL: Scene Graph Enhanced Image-Text Learning for Visual Commonsense Reasoning

Answering complex questions about images is an ambitious goal for machin...
research
01/07/2020

Bridging Knowledge Graphs to Generate Scene Graphs

Scene graphs are powerful representations that encode images into their ...
research
08/13/2023

3D Scene Graph Prediction on Point Clouds Using Knowledge Graphs

3D scene graph prediction is a task that aims to concurrently predict ob...
research
10/22/2020

Bilinear Fusion of Commonsense Knowledge with Attention-Based NLI Models

We consider the task of incorporating real-world commonsense knowledge i...
research
02/23/2022

Commonsense Reasoning for Identifying and Understanding the Implicit Need of Help and Synthesizing Assistive Actions

Human-Robot Interaction (HRI) is an emerging subfield of service robotic...
research
10/13/2020

Mathematical Word Problem Generation from Commonsense Knowledge Graph and Equations

There is an increasing interest in the use of automatic mathematical wor...
research
01/30/2023

Pseudo 3D Perception Transformer with Multi-level Confidence Optimization for Visual Commonsense Reasoning

A framework performing Visual Commonsense Reasoning(VCR) needs to choose...