Learning Deep Structure-Preserving Image-Text Embeddings

11/19/2015
by Liwei Wang, et al.

This paper proposes a method for learning joint embeddings of images and text using a two-branch neural network with multiple layers of linear projections followed by nonlinearities. The network is trained using a large margin objective that combines cross-view ranking constraints with within-view neighborhood structure preservation constraints inspired by metric learning literature. Extensive experiments show that our approach gains significant improvements in accuracy for image-to-text and text-to-image retrieval. Our method achieves new state-of-the-art results on the Flickr30K and MSCOCO image-sentence datasets and shows promise on the new task of phrase localization on the Flickr30K Entities dataset.
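The cross-view ranking constraints mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the function names `l2_normalize` and `bidirectional_ranking_loss`, the squared Euclidean distance on L2-normalized embeddings, the margin value, and the use of every in-batch non-matching pair as a negative are all assumptions made for illustration. The within-view structure-preservation term is left out, and the embeddings are assumed to come from the two network branches, which are not shown.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length (eps guards against zero rows)."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def bidirectional_ranking_loss(img_emb, txt_emb, margin=0.1):
    """Hinge ranking loss in both retrieval directions.

    Assumes row i of img_emb and row i of txt_emb form a matching pair,
    and that every other row in the batch is a negative (an illustrative
    choice, not taken from the paper).
    """
    img = l2_normalize(np.asarray(img_emb, dtype=float))
    txt = l2_normalize(np.asarray(txt_emb, dtype=float))

    # dist[i, j] = squared Euclidean distance between image i and sentence j.
    dist = np.sum((img[:, None, :] - txt[None, :, :]) ** 2, axis=2)
    pos = np.diag(dist)  # distances of the matching pairs

    # Image-to-text: the matching sentence should be closer to image i
    # than any other sentence, by at least `margin`.
    i2t = np.maximum(0.0, margin + pos[:, None] - dist)

    # Text-to-image: the matching image should be closer to sentence j
    # than any other image, by at least `margin`.
    t2i = np.maximum(0.0, margin + pos[None, :] - dist)

    # Exclude the matching pairs themselves from the sum.
    off_diag = ~np.eye(dist.shape[0], dtype=bool)
    return float(i2t[off_diag].sum() + t2i[off_diag].sum())
```

With perfectly aligned, mutually orthogonal embeddings every hinge term is inactive and the loss is zero; swapping two sentences so that pairs no longer match makes the loss positive. A within-view term could be added in the same hinge form by comparing distances among embeddings of the same modality.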

research
04/11/2017

Learning Two-Branch Neural Networks for Image-Text Matching Tasks

This paper investigates two-branch neural networks for image-text matchi...
research
06/22/2015

Modality-dependent Cross-media Retrieval

In this paper, we investigate the cross-media retrieval between images a...
research
03/30/2021

Thinking Fast and Slow: Efficient Text-to-Visual Retrieval with Transformers

Our objective is language-based search of large-scale image and video da...
research
08/23/2018

Webly Supervised Joint Embedding for Cross-Modal Image-Text Retrieval

Cross-modal retrieval between visual data and natural language descripti...
research
04/03/2019

Point Cloud Oversegmentation with Graph-Structured Deep Metric Learning

We propose a new supervized learning framework for oversegmenting 3D poi...
research
08/08/2019

Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework

We introduce a novel deep neural network architecture that links visual ...
research
03/30/2023

Adaptive Cross Batch Normalization for Metric Learning

Metric learning is a fundamental problem in computer vision whereby a mo...
