Learning to Relate from Captions and Bounding Boxes

12/01/2019
by   Sarthak Garg, et al.

In this work, we propose a novel approach that predicts the relationships between entities in an image in a weakly supervised manner, relying on image captions and object bounding box annotations as the sole source of supervision. Our approach uses a top-down attention mechanism to align entities in captions to objects in the image, and then leverages the syntactic structure of the captions to align the relations. We use these alignments to train a relation classification network, thereby obtaining both grounded captions and dense relationships. We demonstrate the effectiveness of our model on the Visual Genome dataset by achieving a recall@50 of 15% and a recall@100 of 25% on the relationships present in the image. We also show that the model successfully predicts relations that are not present in the corresponding captions.
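For readers unfamiliar with the recall@K metric quoted in the abstract, the following is a minimal sketch of how it is commonly computed for relationship triples. It is not the authors' evaluation code: the function name `recall_at_k`, the triple format, and the toy data are all assumptions made for illustration, and the sketch matches triples by label only, whereas scene-graph evaluations typically also require the predicted boxes to overlap the ground-truth boxes.

```python
from typing import List, Tuple

# Hypothetical triple format: (subject, predicate, object) labels.
Triple = Tuple[str, str, str]


def recall_at_k(scored: List[Tuple[Triple, float]], gold: List[Triple], k: int) -> float:
    """Fraction of distinct gold triples found among the k highest-scoring predictions."""
    gold_set = set(gold)
    if not gold_set:
        return 0.0
    # Rank predictions by score, highest first, and keep the top k triples.
    ranked = sorted(scored, key=lambda pair: pair[1], reverse=True)
    top_k = {triple for triple, _ in ranked[:k]}
    return len(gold_set & top_k) / len(gold_set)


# Toy data, invented for illustration only.
predictions = [
    (("man", "riding", "horse"), 0.9),
    (("man", "wearing", "hat"), 0.7),
    (("horse", "on", "grass"), 0.4),
    (("hat", "on", "horse"), 0.2),
]
gold = [
    ("man", "riding", "horse"),
    ("horse", "on", "grass"),
    ("man", "holding", "rope"),
]

r2 = recall_at_k(predictions, gold, k=2)  # one of three gold triples is in the top 2
r3 = recall_at_k(predictions, gold, k=3)  # two of three gold triples are in the top 3
```

Under this reading, a recall@50 of 15% means that, on average, 15% of an image's annotated relationships appear among the model's 50 most confident predictions.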

research
11/25/2018

Learning to discover and localize visual objects with open vocabulary

To alleviate the cost of obtaining accurate bounding boxes for training ...
research
12/17/2018

Grounded Video Description

Video description is one of the most challenging problems in vision and ...
research
02/01/2021

Inferring spatial relations from textual descriptions of images

Generating an image from its textual description requires both a certain...
research
01/08/2020

Weakly Supervised Visual Semantic Parsing

Scene Graph Generation (SGG) aims to extract entities, predicates and th...
research
05/28/2021

Linguistic Structures as Weak Supervision for Visual Scene Graph Generation

Prior work in scene graph generation requires categorical supervision at...
research
05/19/2022

Training Vision-Language Transformers from Captions Alone

We show that Vision-Language Transformers can be learned without human l...
research
01/26/2023

Paraphrase Acquisition from Image Captions

We propose to use captions from the Web as a previously underutilized re...
