Extending Phrase Grounding with Pronouns in Visual Dialogues

10/23/2022
by Panzhong Lu, et al.

Conventional phrase grounding aims to localize the noun phrases mentioned in a given caption to their corresponding image regions, a task on which great progress has been made recently. However, grounding noun phrases alone is not enough for cross-modal vision-language understanding. Here we extend the task by considering pronouns as well. First, we construct a phrase grounding dataset in which both noun phrases and pronouns are aligned to image regions. Based on this dataset, we test phrase grounding performance using a state-of-the-art model from the literature. Then, we enhance this baseline grounding model with coreference information, which could potentially help our task, modeling the coreference structures with graph convolutional networks. Interestingly, experiments on our dataset show that pronouns are easier to ground than noun phrases, possibly because these pronouns are much less ambiguous. Additionally, our final model with coreference information significantly boosts grounding performance on both noun phrases and pronouns.

research
03/18/2019

Neural Sequential Phrase Grounding (SeqGROUND)

We propose an end-to-end approach for phrase grounding in images. Unlike...
research
09/06/2023

A Joint Study of Phrase Grounding and Task Performance in Vision and Language Models

Key to tasks that require reasoning about natural language in visual con...
research
11/17/2018

Open-vocabulary Phrase Detection

Most existing work that grounds natural language phrases in images start...
research
07/05/2022

Toward Explainable and Fine-Grained 3D Grounding through Referring Textual Phrases

Recent progress on 3D scene understanding has explored visual grounding ...
research
04/13/2021

Disentangled Motif-aware Graph Learning for Phrase Grounding

In this paper, we propose a novel graph learning framework for phrase gr...
research
03/14/2023

Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment

Medical phrase grounding (MPG) aims to locate the most relevant region i...
research
09/01/2019

Phrase Grounding by Soft-Label Chain Conditional Random Field

The phrase grounding task aims to ground each entity mention in a given ...
