Image search using multilingual texts: a cross-modal learning approach between image and text

03/27/2019
by   Maxime Portaz, et al.
0

Multilingual (or cross-lingual) embeddings represent several languages in a unique vector space. Using a common embedding space enables for a shared semantic between words from different languages. In this paper, we propose to embed images and texts into a unique distributional vector space, enabling to search images by using text queries expressing information needs related to the (visual) content of images, as well as using image similarity. Our framework forces the representation of an image to be similar to the representation of the text that describes it. Moreover, by using multilingual embeddings we ensure that words from two different languages have close descriptors and thus are attached to similar images. We provide experimental evidence of the efficiency of our approach by experimenting it on two datasets: Common Objects in COntext (COCO) [19] and Multi30K [7].

READ FULL TEXT

page 3

page 5

page 6

research
03/27/2019

Image search using multilingual texts: a cross-modal learning approach between image and text Maxime Portaz Qwant Research

Multilingual (or cross-lingual) embeddings represent several languages i...
research
05/15/2020

Cross-lingual Transfer of Twitter Sentiment Models Using a Common Vector Space

Word embeddings represent words in a numeric space in such a way that se...
research
10/08/2019

Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

In this paper, we propose a new approach to learn multimodal multilingua...
research
02/18/2023

RetVec: Resilient and Efficient Text Vectorizer

This paper describes RetVec, a resilient multilingual embedding scheme d...
research
03/27/2020

Unsupervised Cross-Modal Audio Representation Learning from Unstructured Multilingual Text

We present an approach to unsupervised audio representation learning. Ba...
research
04/19/2016

Using Apache Lucene to Search Vector of Locally Aggregated Descriptors

Surrogate Text Representation (STR) is a profitable solution to efficien...
research
10/18/2019

Towards Learning Cross-Modal Perception-Trace Models

Representation learning is a key element of state-of-the-art deep learni...

Please sign up or login with your details

Forgot password? Click here to reset