Neural Sequential Phrase Grounding (SeqGROUND)

03/18/2019
by Pelin Dogan, et al.

We propose an end-to-end approach for phrase grounding in images. Unlike prior methods that typically attempt to ground each phrase independently by building an image-text embedding, our architecture formulates grounding of multiple phrases as a sequential and contextual process. Specifically, we encode region proposals and all phrases into two stacks of LSTM cells, along with so-far grounded phrase-region pairs. These LSTM stacks collectively capture context for grounding of the next phrase. The resulting architecture, which we call SeqGROUND, supports many-to-many matching by allowing an image region to be matched to multiple phrases and vice versa. We show competitive performance on the Flickr30K benchmark dataset and, through ablation studies, validate the efficacy of sequential grounding as well as individual design choices in our model architecture.
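The sequential, many-to-many idea described above can be illustrated with a toy sketch. This is not the SeqGROUND model: the LSTM stacks are replaced by a plain running mean over the regions grounded so far, and the function name `ground_sequentially`, the parameters `threshold` and `history_weight`, and the two-dimensional feature vectors are all hypothetical, chosen only for illustration. It shows phrases being grounded one at a time, each decision conditioned on earlier groundings, with an independent yes/no decision per phrase-region pair so that one region can serve several phrases and vice versa.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ground_sequentially(phrases, regions, threshold=0.5, history_weight=0.1):
    """Toy sequential grounding (assumption: not the paper's architecture).

    phrases, regions: lists of (name, feature_vector) tuples.
    Returns the grounded (phrase_name, region_name) pairs in decision order.
    """
    dim = len(phrases[0][1])
    # Running mean of grounded region features; a crude stand-in for the
    # context that the paper captures with LSTM stacks.
    history = [0.0] * dim
    n_grounded = 0
    pairs = []
    for p_name, p_vec in phrases:
        # The query mixes the current phrase with context from earlier pairs.
        query = [p + history_weight * h for p, h in zip(p_vec, history)]
        for r_name, r_vec in regions:
            # One binary decision per (phrase, region) pair, so matches are
            # not forced to be one-to-one.
            if sigmoid(dot(query, r_vec)) > threshold:
                pairs.append((p_name, r_name))
                n_grounded += 1
                history = [h + (v - h) / n_grounded
                           for h, v in zip(history, r_vec)]
    return pairs


# Hypothetical features: dimension 0 ~ "person-like", dimension 1 ~ "hat-like".
phrases = [("a man", [1.0, 0.0]), ("his hat", [0.0, 1.0]), ("a person", [1.0, 0.0])]
regions = [("box_a", [3.0, -3.0]), ("box_b", [-3.0, 3.0]), ("box_c", [2.0, -2.0])]

pairs = ground_sequentially(phrases, regions)
for phrase_name, region_name in pairs:
    print(phrase_name, "->", region_name)
```

In this toy input, "a man" and "a person" both land on the same two boxes, which is the many-to-many behavior the abstract refers to; a learned model would produce the scores from trained encoders rather than hand-set vectors.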


