Panoptic Narrative Grounding

09/10/2021
by C. González, et al.

This paper proposes Panoptic Narrative Grounding, a spatially fine and general formulation of the natural language visual grounding problem. We establish an experimental framework for studying this new task, including new ground truth and metrics, and we propose a strong baseline method to serve as a stepping stone for future work. We exploit the intrinsic semantic richness of an image by including panoptic categories, and we approach visual grounding at a fine-grained level by using segmentations. For the ground truth, we propose an algorithm that automatically transfers Localized Narratives annotations to specific regions in the panoptic segmentations of the MS COCO dataset. To guarantee the quality of our annotations, we use the semantic structure contained in WordNet to incorporate only those noun phrases that are grounded to a meaningfully related panoptic segmentation region. The proposed baseline achieves a performance of 55.4 absolute Average Recall points. This result is a suitable foundation for further development of methods for Panoptic Narrative Grounding.
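The WordNet-based filtering step mentioned above can be illustrated with a minimal sketch. This is not the authors' algorithm: the hypernym table below is a small hypothetical stand-in for WordNet, and the names `HYPERNYMS`, `hypernym_chain`, `is_related`, and `filter_groundings` are assumptions introduced only for illustration. The idea shown is simply that a noun phrase is kept when the category of its candidate region appears among the noun's ancestors in the hierarchy.

```python
# Toy hypernym table standing in for WordNet (hypothetical data, not the real lexicon).
HYPERNYMS = {
    "puppy": "dog",
    "dog": "animal",
    "sedan": "car",
    "car": "vehicle",
}


def hypernym_chain(word):
    """Return the word followed by its ancestors in the toy hierarchy."""
    chain = [word]
    while chain[-1] in HYPERNYMS:
        chain.append(HYPERNYMS[chain[-1]])
    return chain


def is_related(noun, category):
    """True if the region category appears on the noun's hypernym chain."""
    return category in hypernym_chain(noun)


def filter_groundings(candidates):
    """Keep only (noun, region_category) pairs that are semantically related."""
    return [(noun, cat) for noun, cat in candidates if is_related(noun, cat)]


# Candidate pairs: a noun from the narrative and the category of the region it points at.
candidates = [("puppy", "dog"), ("sedan", "car"), ("puppy", "car")]
kept = filter_groundings(candidates)
```

In this toy setting, the pair `("puppy", "car")` is discarded because `car` is not an ancestor of `puppy`. A real pipeline would presumably query WordNet itself and handle multi-word noun phrases, which this sketch does not attempt.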


