Target-Oriented Deformation of Visual-Semantic Embedding Space

10/15/2019
by   Takashi Matsubara, et al.
0

Multimodal embedding is a crucial research topic for cross-modal understanding, data mining, and translation. Many studies have attempted to extract representations from given entities and align them in a shared embedding space. However, because entities in different modalities exhibit different abstraction levels and modality-specific information, it is insufficient to embed related entities close to each other. In this study, we propose the Target-Oriented Deformation Network (TOD-Net), a novel module that continuously deforms the embedding space into a new space under a given condition, thereby adjusting similarities between entities. Unlike methods based on cross-modal attention, TOD-Net is a post-process applied to the embedding space learned by existing embedding systems and improves their performances of retrieval. In particular, when combined with cutting-edge models, TOD-Net gains the state-of-the-art cross-modal retrieval model associated with the MSCOCO dataset. Qualitative analysis reveals that TOD-Net successfully emphasizes entity-specific concepts and retrieves diverse targets via handling higher levels of diversity than existing models.

READ FULL TEXT

page 1

page 6

research
06/10/2021

Cross-Modal Discrete Representation Learning

Recent advances in representation learning have demonstrated an ability ...
research
09/30/2019

Diachronic Cross-modal Embeddings

Understanding the semantic shifts of multimodal information is only poss...
research
02/04/2017

Simple to Complex Cross-modal Learning to Rank

The heterogeneity-gap between different modalities brings a significant ...
research
11/18/2019

Modality To Modality Translation: An Adversarial Representation Learning and Graph Fusion Network for Multimodal Fusion

Learning joint embedding space for various modalities is of vital import...
research
04/02/2020

MCEN: Bridging Cross-Modal Gap between Cooking Recipes and Dish Images with Latent Variable Model

Nowadays, driven by the increasing concern on diet and health, food comp...
research
01/19/2021

Joint Learning of 3D Shape Retrieval and Deformation

We propose a novel technique for producing high-quality 3D models that m...
research
12/09/2021

Self-Supervised Image-to-Text and Text-to-Image Synthesis

A comprehensive understanding of vision and language and their interrela...

Please sign up or login with your details

Forgot password? Click here to reset