Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework

08/08/2019
by   Deepan Das, et al.
5

We introduce a novel deep neural network architecture that links visual regions to corresponding textual segments including phrases and words. To accomplish this task, our architecture makes use of the rich semantic information available in a joint embedding space of multi-modal data. From this joint embedding space, we extract the associative localization maps that develop naturally, without explicitly providing supervision during training for the localization task. The joint space is learned using a bidirectional ranking objective that is optimized using a N-Pair loss formulation. This training mechanism demonstrates the idea that localization information is learned inherently while optimizing a Bidirectional Retrieval objective. The model's retrieval and localization performance is evaluated on MSCOCO and Flickr30K Entities datasets. This architecture outperforms the state of the art results in the semi-supervised phrase localization setting.

READ FULL TEXT

page 1

page 3

page 5

page 7

research
04/05/2018

Finding beans in burgers: Deep semantic-visual embedding with localization

Several works have proposed to learn a two-path neural network that maps...
research
03/27/2019

Align2Ground: Weakly Supervised Phrase Grounding Guided by Image-Caption Alignment

We address the problem of grounding free-form textual phrases by using w...
research
11/28/2018

Semi-supervised learning with Bidirectional GANs

In this work we introduce a novel approach to train Bidirectional Genera...
research
08/20/2019

Phrase Localization Without Paired Training Examples

Localizing phrases in images is an important part of image understanding...
research
04/29/2019

A dual branch deep neural network for classification and detection in mammograms

In this paper, we propose a novel deep learning architecture for joint c...
research
03/10/2020

A CNN-based Patent Image Retrieval Method for Design Ideation

The patent database is often used in searches of inspirational stimuli f...
research
11/19/2015

Learning Deep Structure-Preserving Image-Text Embeddings

This paper proposes a method for learning joint embeddings of images and...

Please sign up or login with your details

Forgot password? Click here to reset