Contrastive Learning for Weakly Supervised Phrase Grounding

06/17/2020
by Tanmay Gupta, et al.

Phrase grounding, the problem of associating image regions with caption words, is a crucial component of vision-language tasks. We show that phrase grounding can be learned by optimizing word-region attention to maximize a lower bound on mutual information between images and caption words. Given pairs of images and captions, we maximize the compatibility of the attention-weighted regions and the words in the corresponding caption, compared to non-corresponding pairs of images and captions. A key idea is to construct effective negative captions for learning through language-model-guided word substitutions. Training with our negatives yields a ∼10% absolute gain in accuracy over randomly sampled negatives from the training data. Our weakly supervised phrase grounding model trained on COCO-Captions shows a healthy gain of 5.7% to achieve 76.7% accuracy on the Flickr30K Entities benchmark.
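The objective summarized above can be illustrated with a small sketch. It assumes word and region features already live in a shared embedding space; the paper's actual encoders, projections, and scoring function are not described in this abstract, so plain dot products stand in for them. Each word attends over the image regions, its compatibility with the image is the dot product with the attention-weighted region feature, and an InfoNCE-style loss contrasts the true word against negative words. The names `attend`, `compatibility`, `word_info_nce_loss`, and `caption_loss` and the toy vectors are hypothetical, and the language-model-guided substitution step is not implemented: negatives are passed in as precomputed vectors.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(word: Vector, regions: List[Vector]) -> List[float]:
    """Word-region attention: the word is the query, regions are keys and values."""
    weights = softmax([dot(word, r) for r in regions])
    dim = len(regions[0])
    return [sum(w * r[k] for w, r in zip(weights, regions)) for k in range(dim)]


def compatibility(word: Vector, regions: List[Vector]) -> float:
    """Score a word against the attention-weighted region feature it selects."""
    return dot(word, attend(word, regions))


def word_info_nce_loss(word: Vector, negatives: List[Vector], regions: List[Vector]) -> float:
    """InfoNCE loss: -log(exp(s_pos) / sum of exp(s) over the positive and all negatives)."""
    pos = compatibility(word, regions)
    scores = [pos] + [compatibility(n, regions) for n in negatives]
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return log_sum_exp - pos


def caption_loss(words: List[Vector], negatives: List[List[Vector]], regions: List[Vector]) -> float:
    """Average the per-word loss over a caption; negatives[i] holds substitutes for words[i]."""
    losses = [word_info_nce_loss(w, n, regions) for w, n in zip(words, negatives)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    regions = [[1.0, 0.0], [0.0, 1.0]]  # two toy region features
    dog = [2.0, 0.0]                    # toy embedding of the true caption word
    cat = [0.0, 0.5]                    # toy embedding of a substituted negative word
    print(word_info_nce_loss(dog, [cat], regions))  # small: the true word fits the image best
    print(word_info_nce_loss(cat, [dog], regions))  # larger: roles swapped
```

With K negatives per word, the standard InfoNCE result says that log(K + 1) minus the expected loss lower-bounds the mutual information, so driving this loss down tightens the bound. Harder negatives, such as context-preserving word substitutions, force the attention to pick out the region that actually distinguishes the true word; the sketch only scores negative words against the same image, while negatives drawn from non-corresponding images could be scored the same way.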

research
06/19/2022

What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs

Given an input image, and nothing else, our method returns the bounding ...
research
04/20/2021

Detector-Free Weakly Supervised Grounding by Separation

Nowadays, there is an abundance of data involving images and surrounding...
research
07/03/2020

Improving Weakly Supervised Visual Grounding by Contrastive Knowledge Distillation

Weakly supervised phrase grounding aims at learning region-phrase corres...
research
01/15/2023

Generating Templated Caption for Video Grounding

Video grounding aims to locate a moment of interest matching the given q...
research
11/05/2020

Utilizing Every Image Object for Semi-supervised Phrase Grounding

Phrase grounding models localize an object in the image given a referrin...
research
08/30/2023

Catalog Phrase Grounding (CPG): Grounding of Product Textual Attributes in Product Images for e-commerce Vision-Language Applications

We present Catalog Phrase Grounding (CPG), a model that can associate pr...
research
09/01/2019

Phrase Grounding by Soft-Label Chain Conditional Random Field

The phrase grounding task aims to ground each entity mention in a given ...
