Phrase Grounding by Soft-Label Chain Conditional Random Field

09/01/2019
by Jiacheng Liu, et al.

The phrase grounding task aims to ground each entity mention in an image caption to a corresponding region in that image. Although there are clear dependencies between how different mentions in the same caption should be grounded, previous structured prediction methods that try to capture these dependencies must resort to approximate inference or non-differentiable losses. In this paper, we formulate phrase grounding as a sequence labeling task in which candidate regions are treated as potential labels, and we use neural chain Conditional Random Fields (CRFs) to model dependencies among the regions for adjacent mentions. Unlike standard sequence labeling tasks, phrase grounding is defined such that a mention may have multiple correct candidate regions. To address this multiplicity of gold labels, we define so-called Soft-Label Chain CRFs and present an algorithm that enables convenient end-to-end training. Our method establishes a new state of the art for phrase grounding on the Flickr30k Entities dataset. Analysis shows that our model benefits both from the entity dependencies captured by the CRF and from the soft-label training regime. Our code is available at <github.com/liujch1998/SoftLabelCCRF>.
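To make the soft-label idea concrete, here is a minimal sketch in plain Python of one way a chain CRF can be trained against soft targets. It is not taken from the authors' repository: the function names, the score layout (`unary`, `pairwise`), and the assumption that the target distribution factorizes over mentions are all illustrative choices. Under that assumption, the cross-entropy between the target and the CRF distribution reduces to the log-partition function (computed exactly by the forward algorithm) minus the expected sequence score under the target.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_partition(unary, pairwise):
    """Exact log Z of a chain CRF via the forward algorithm.

    unary[t][i]       : score of grounding mention t to candidate region i
    pairwise[t][i][j] : score of regions (i, j) for adjacent mentions (t, t + 1)
    """
    alpha = list(unary[0])
    for t in range(1, len(unary)):
        alpha = [
            unary[t][j]
            + logsumexp([alpha[i] + pairwise[t - 1][i][j] for i in range(len(alpha))])
            for j in range(len(unary[t]))
        ]
    return logsumexp(alpha)


def soft_label_loss(unary, pairwise, soft):
    """Cross-entropy between a soft target and the CRF distribution.

    soft[t][i] is the target probability that region i is correct for
    mention t (each row sums to 1). Assumption for this sketch: the target
    over whole sequences is the product of the per-mention rows.
    """
    expected_score = 0.0
    # Expected unary score under the target.
    for t, row in enumerate(unary):
        expected_score += sum(q * s for q, s in zip(soft[t], row))
    # Expected pairwise score; the factorized target makes this a double sum.
    for t in range(len(unary) - 1):
        for i, q_i in enumerate(soft[t]):
            for j, q_j in enumerate(soft[t + 1]):
                expected_score += q_i * q_j * pairwise[t][i][j]
    return log_partition(unary, pairwise) - expected_score
```

With one-hot rows in `soft`, this loss reduces to the usual negative log-likelihood of a single gold sequence, so the hard-label chain CRF is a special case. In a neural model the scores would come from a differentiable network and the same expression would be written with tensor operations so that gradients flow end to end.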
