Finding beans in burgers: Deep semantic-visual embedding with localization

04/05/2018
by   Martin Engilberge, et al.
0

Several works have proposed to learn a two-path neural network that maps images and texts, respectively, to a same shared Euclidean space where geometry captures useful semantic relationships. Such a multi-modal embedding can be trained and used for various tasks, notably image captioning. In the present work, we introduce a new architecture of this type, with a visual path that leverages recent space-aware pooling mechanisms. Combined with a textual path which is jointly trained from scratch, our semantic-visual embedding offers a versatile model. Once trained under the supervision of captioned images, it yields new state-of-the-art performance on cross-modal retrieval. It also allows the localization of new concepts from the embedding space into any input image, delivering state-of-the-art result on the visual grounding of phrases.

READ FULL TEXT

page 1

page 7

page 8

research
10/31/2018

Semantic Modeling of Textual Relationships in Cross-Modal Retrieval

Feature modeling of different modalities is a basic problem in current r...
research
08/08/2019

Semi Supervised Phrase Localization in a Bidirectional Caption-Image Retrieval Framework

We introduce a novel deep neural network architecture that links visual ...
research
03/19/2023

Multi-modal reward for visual relationships-based image captioning

Deep neural networks have achieved promising results in automatic image ...
research
01/11/2020

MHSAN: Multi-Head Self-Attention Network for Visual Semantic Embedding

Visual-semantic embedding enables various tasks such as image-text retri...
research
11/09/2020

Learning the Best Pooling Strategy for Visual Semantic Embedding

Visual Semantic Embedding (VSE) is a dominant approach for vision-langua...
research
08/05/2021

Structured Multi-modal Feature Embedding and Alignment for Image-Sentence Retrieval

The current state-of-the-art image-sentence retrieval methods implicitly...
research
10/02/2019

Global visual localization in LiDAR-maps through shared 2D-3D embedding space

Global localization is an important and widely studied problem for many ...

Please sign up or login with your details

Forgot password? Click here to reset