Assisting Scene Graph Generation with Self-Supervision

08/08/2020
by Sandeep Inuganti, et al.

Research in scene graph generation has quickly gained traction in the past few years because of its potential to help in downstream tasks such as visual question answering and image captioning. Many interesting approaches have been proposed to tackle this problem. Most of these works use a pre-trained object detection model as a preliminary feature extractor, so obtaining object bounding box proposals from the detector is relatively cheap. We take advantage of this ready availability of bounding box annotations produced by the pre-trained detector. We propose a set of three novel yet simple self-supervision tasks and train them as auxiliary tasks alongside the main model. For comparison, we train the base model from scratch with these self-supervision tasks and achieve state-of-the-art results on all metrics and recall settings. By training the model with the proposed self-supervision losses, we also resolve some of the confusion between two types of relationships: geometric and possessive. We conduct our experiments on the benchmark dataset Visual Genome and report our results.


research
03/30/2021

Fully Convolutional Scene Graph Generation

This paper presents a fully convolutional scene graph generation (FCSGG)...
research
02/07/2018

Generating Triples with Adversarial Networks for Scene Graph Construction

Driven by successes in deep learning, computer vision research has begun...
research
05/09/2022

Beyond Bounding Box: Multimodal Knowledge Learning for Object Detection

Multimodal supervision has achieved promising results in many visual lan...
research
11/25/2021

Scene Graph Generation with Geometric Context

Scene Graph Generation has gained much attention in computer vision rese...
research
03/20/2023

Location-Free Scene Graph Generation

Scene Graph Generation (SGG) is a challenging visual understanding task....
research
07/23/2019

Cap2Det: Learning to Amplify Weak Caption Supervision for Object Detection

Learning to localize and name object instances is a fundamental problem ...
research
09/01/2023

Towards Addressing the Misalignment of Object Proposal Evaluation for Vision-Language Tasks via Semantic Grounding

Object proposal generation serves as a standard pre-processing step in V...
