Who are you referring to? Weakly supervised coreference resolution with multimodal grounding

11/26/2022
by Arushi Goel, et al.

Coreference resolution, a core tool in natural language processing, aims to identify words and phrases that refer to the same entity in a text. In this paper, we propose a novel task: resolving coreferences in multimodal data, specifically long-form textual descriptions of visual scenes. Most existing image-text datasets contain only short sentences without coreferent expressions, or their coreferences are not annotated. To address this gap, we first introduce a new dataset, Flickr30k-Coref, in which coreference chains and the bounding box localization of these chains are annotated. We then propose a new technique that learns to identify coreference chains through weakly supervised grounding from image-text pairs, regularized with prior linguistic knowledge. Our model yields large performance gains over prior work in coreference resolution and weakly supervised grounding of long-form text descriptions.
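To make the notion of a coreference chain concrete, the following is a minimal illustrative sketch in Python. It is not the paper's model and does not reproduce the Flickr30k-Coref annotation format: the `Mention` class, its optional `box` field, the `build_chains` helper, and the example sentence and box coordinates are all hypothetical, introduced only for illustration. The sketch assumes that pairwise "these two mentions corefer" links are already given and merges them into chains with a simple union-find.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Mention:
    """A text span that refers to some entity (hypothetical structure)."""
    text: str
    start: int  # index of the first token of the span
    end: int    # index one past the last token of the span
    # Optional image region (x_min, y_min, x_max, y_max); illustrative only.
    box: Optional[Tuple[int, int, int, int]] = None


def build_chains(mentions: List[Mention],
                 links: List[Tuple[int, int]]) -> List[List[Mention]]:
    """Group mentions into chains given pairwise coreference links.

    `links` holds pairs of indices into `mentions` that are assumed to
    refer to the same entity. Mentions without links form singleton chains.
    """
    parent = list(range(len(mentions)))

    def find(i: int) -> int:
        # Follow parent pointers to the root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the earlier mention as the representative of the chain.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())


# Tokens: A woman is holding a dog . She is smiling at it .
mentions = [
    Mention("A woman", 0, 2, box=(10, 20, 110, 220)),  # made-up box
    Mention("a dog", 4, 6, box=(90, 120, 170, 210)),   # made-up box
    Mention("She", 7, 8),
    Mention("it", 11, 12),
]
links = [(0, 2), (1, 3)]  # "A woman" ~ "She", "a dog" ~ "it"

chains = build_chains(mentions, links)
print([[m.text for m in chain] for chain in chains])
# prints [['A woman', 'She'], ['a dog', 'it']]
```

In this toy layout, each chain carries at most one annotated region, which mirrors the idea of localizing a whole chain rather than each mention separately. How the paper's dataset actually stores chains and boxes is not specified in the abstract.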


