Utilizing Every Image Object for Semi-supervised Phrase Grounding

11/05/2020
by Haidong Zhu, et al.

Phrase grounding models localize an object in an image given a referring expression. The annotated language queries available during training are limited, which in turn limits the variety of language combinations a model sees during training. In this paper, we study semi-supervised phrase grounding, in which objects that have no labeled queries are also used for training. We propose learned location and subject embedding predictors (LSEP) that generate the corresponding language embeddings for objects lacking annotated queries in the training set. With the assistance of an object detector, we also apply LSEP to train a grounding model on images without any annotation. We evaluate our method, built on MAttNet, on three public datasets: RefCOCO, RefCOCO+, and RefCOCOg. We show that our predictors allow the grounding system to learn from objects without labeled queries, improving accuracy by a relative 34.9% when using detection results.
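The abstract describes LSEP only at a high level. The toy sketch below is not the authors' implementation; it only illustrates the general idea of learning to predict query embeddings from objects that do have annotations and then reusing those predictors to supply pseudo embeddings for objects that do not. It assumes each predictor is a linear map trained with squared error by plain SGD, with a subject branch fed visual features and a location branch fed box-location features. The class `LinearEmbeddingPredictor`, the helper `pseudo_query_embeddings`, the feature sizes, the loss, and all numbers are hypothetical.

```python
import random
from typing import List, Sequence, Tuple

Vector = List[float]


class LinearEmbeddingPredictor:
    """Hypothetical stand-in for one LSEP branch (subject or location).

    Maps an object feature vector x to a predicted language embedding W x + b.
    The linear form and the squared-error objective are illustrative assumptions.
    """

    def __init__(self, in_dim: int, out_dim: int, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)]
                  for _ in range(out_dim)]
        self.b = [0.0] * out_dim

    def predict(self, x: Sequence[float]) -> Vector:
        # One output coordinate per row of W.
        return [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) + b_j
                for row, b_j in zip(self.w, self.b)]

    def mse(self, xs: Sequence[Sequence[float]],
            ys: Sequence[Sequence[float]]) -> float:
        # Mean squared error over all samples and embedding coordinates.
        total = sum((p - t) ** 2
                    for x, y in zip(xs, ys)
                    for p, t in zip(self.predict(x), y))
        return total / (len(xs) * len(ys[0]))

    def fit(self, xs: Sequence[Sequence[float]],
            ys: Sequence[Sequence[float]],
            lr: float = 0.1, epochs: int = 2000) -> float:
        # Plain SGD on 0.5 * ||W x + b - y||^2, where y is the embedding of an
        # annotated query for that object; returns the final training MSE.
        for _ in range(epochs):
            for x, y in zip(xs, ys):
                err = [p - t for p, t in zip(self.predict(x), y)]
                for j, e in enumerate(err):
                    for i, x_i in enumerate(x):
                        self.w[j][i] -= lr * e * x_i
                    self.b[j] -= lr * e
        return self.mse(xs, ys)


def pseudo_query_embeddings(
    subject_pred: LinearEmbeddingPredictor,
    location_pred: LinearEmbeddingPredictor,
    objects: Sequence[Tuple[Sequence[float], Sequence[float]]],
) -> List[Tuple[Vector, Vector]]:
    # Hypothetical helper: for objects with no annotated query, predict
    # (subject, location) embeddings to stand in for the missing language input.
    return [(subject_pred.predict(vis), location_pred.predict(loc))
            for vis, loc in objects]


# Toy data with made-up numbers: 2-d object features and 2-d query embeddings.
labeled_vis = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.2]]
labeled_subj = [[2 * a - b + 0.5, a + 3 * b] for a, b in labeled_vis]
labeled_loc = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
labeled_loc_emb = [[x - y, 0.5 * x + 0.5 * y] for x, y in labeled_loc]

subject_lsep = LinearEmbeddingPredictor(in_dim=2, out_dim=2, seed=0)
location_lsep = LinearEmbeddingPredictor(in_dim=2, out_dim=2, seed=1)
subject_mse = subject_lsep.fit(labeled_vis, labeled_subj)
location_mse = location_lsep.fit(labeled_loc, labeled_loc_emb)

# An object that has features and a box but no referring expression.
unlabeled = [([0.3, 0.6], [0.2, 0.8])]
pseudo = pseudo_query_embeddings(subject_lsep, location_lsep, unlabeled)
```

In this sketch the predicted pair for the unlabeled object would be fed to the grounding model in place of embeddings computed from a real query. Replacing the linear maps with small MLPs, or the squared error with another embedding loss, would be natural variations; the abstract does not say which form the paper uses.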
