HUSE: Hierarchical Universal Semantic Embeddings

11/14/2019
by Pradyumna Narayana, et al.

There is a recent surge of interest in cross-modal representation learning for images and text. The main challenge lies in mapping images and text to a shared latent space where embeddings corresponding to the same semantic concept lie closer to each other than embeddings corresponding to different semantic concepts, irrespective of modality. Ranking losses are commonly used to create such a shared latent space; however, they impose no constraints on inter-class relationships, so neighboring clusters can be completely unrelated. Work on visual semantic embeddings addresses this problem by first constructing a semantic embedding space from external knowledge and then projecting image embeddings onto this fixed space. However, these methods are confined to the image domain, and constraining the embeddings to a fixed space places an additional burden on learning. This paper proposes a novel method, HUSE, to learn cross-modal representations with semantic information. HUSE learns a shared latent space where the distance between any two universal embeddings is similar to the distance between their corresponding class embeddings in the semantic embedding space. HUSE also uses a classification objective with a shared classification layer to ensure that image and text embeddings lie in the same shared latent space. Experiments on UPMC Food-101 show our method outperforms the previous state of the art on retrieval, hierarchical precision, and classification.
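The core idea above — matching pairwise distances in the shared latent space to pairwise distances between class embeddings, rather than projecting onto a fixed semantic space — can be sketched as a simple loss function. This is an illustrative reconstruction, not the paper's implementation: the function name, the use of cosine distance, and the `margin` parameter are assumptions for the sketch.

```python
import numpy as np

def semantic_graph_loss(univ_emb, labels, class_emb, margin=0.1):
    """Hedged sketch of a HUSE-style semantic similarity objective.

    Encourages the pairwise distance structure of the universal (shared-space)
    embeddings to mirror the pairwise distance structure of their class
    embeddings, instead of pinning embeddings to a fixed semantic space.
    All names and the margin value are illustrative assumptions.
    """
    def cos_dist(a, b):
        # Cosine distance between all row pairs of a and b.
        a = a / np.linalg.norm(a, axis=-1, keepdims=True)
        b = b / np.linalg.norm(b, axis=-1, keepdims=True)
        return 1.0 - a @ b.T

    d_univ = cos_dist(univ_emb, univ_emb)                    # distances in shared space
    d_sem = cos_dist(class_emb[labels], class_emb[labels])   # distances between class embeddings
    # Penalize any mismatch between the two distance structures beyond the margin.
    gap = np.abs(d_univ - d_sem)
    return float(np.mean(np.maximum(gap - margin, 0.0)))
```

When the universal embeddings exactly reproduce the class-embedding geometry, the loss is zero; in the paper this term would be combined with the shared-classifier objective mentioned above.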


research
07/04/2017

Conditional generation of multi-modal data using constrained embedding space mapping

We present a conditional generative model that maps low-dimensional embe...
research
09/14/2019

Joint Wasserstein Autoencoders for Aligning Multimodal Embeddings

One of the key challenges in learning joint embeddings of multiple modal...
research
11/12/2018

Agent Embeddings: A Latent Representation for Pole-Balancing Networks

We show that it is possible to reduce a high-dimensional object like a n...
research
03/27/2020

CurlingNet: Compositional Learning between Images and Text for Fashion IQ Data

We present an approach named CurlingNet that can measure the semantic di...
research
07/01/2022

(Un)likelihood Training for Interpretable Embedding

Cross-modal representation learning has become a new normal for bridging...
research
12/11/2017

Learning Modality-Invariant Representations for Speech and Images

In this paper, we explore the unsupervised learning of a semantic embedd...
research
11/28/2022

CLIP2GAN: Towards Bridging Text with the Latent Space of GANs

In this work, we are dedicated to text-guided image generation and propo...
