Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

10/08/2019
by Alireza Mohammadshahi, et al.

In this paper, we propose a new approach to learning multimodal multilingual embeddings for matching images and their relevant captions in two languages. We combine two existing objective functions to bring images and captions close together in a joint embedding space while adapting the alignment of word embeddings between the languages in our model. We show that our approach enables better generalization, achieving state-of-the-art performance on the text-to-image and image-to-text retrieval tasks and on the caption-caption similarity task. Two multimodal multilingual datasets are used for evaluation: Multi30k with German and English captions and Microsoft-COCO with English and Japanese captions.
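
The abstract does not name the two objective functions, so the following is only an illustrative sketch of how a joint image-caption objective could be combined with a cross-lingual alignment term. The choice of a bidirectional hinge ranking loss and a squared-distance alignment penalty, along with every function name, the `margin`, and the `weight`, are assumptions made for illustration and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranking_loss(images, captions, margin=0.2):
    """Bidirectional sum-of-hinges loss; images[i] is paired with captions[i]."""
    loss = 0.0
    for i, img in enumerate(images):
        pos = cosine(img, captions[i])
        for j in range(len(captions)):
            if j == i:
                continue
            # Image-to-text: a mismatched caption should score below the match.
            loss += max(0.0, margin - pos + cosine(img, captions[j]))
            # Text-to-image: a mismatched image should score below the match.
            loss += max(0.0, margin - pos + cosine(images[j], captions[i]))
    return loss


def alignment_loss(captions_a, captions_b):
    """Mean squared distance between paired caption embeddings in two languages."""
    total = 0.0
    for u, v in zip(captions_a, captions_b):
        total += sum((a - b) ** 2 for a, b in zip(u, v))
    return total / len(captions_a)


def combined_loss(images, captions_a, captions_b, margin=0.2, weight=1.0):
    """Hypothetical combination: one ranking term per language plus alignment."""
    return (ranking_loss(images, captions_a, margin)
            + ranking_loss(images, captions_b, margin)
            + weight * alignment_loss(captions_a, captions_b))


# Toy embeddings: each image matches its caption exactly in both languages.
images = [[1.0, 0.0], [0.0, 1.0]]
captions_en = [[1.0, 0.0], [0.0, 1.0]]
captions_de = [[1.0, 0.0], [0.0, 1.0]]
perfect = combined_loss(images, captions_en, captions_de)

# Swapping the captions makes every pair mismatched, so the ranking term grows.
swapped = ranking_loss(images, [[0.0, 1.0], [1.0, 0.0]])
```

In this toy setup the combined loss is zero when the paired embeddings coincide and becomes positive once captions are mismatched. In a real system the embeddings would come from trained image and text encoders rather than fixed vectors.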

research
10/07/2022

C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

Multilingual text-video retrieval methods have improved significantly in...
research
10/13/2015

Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning

Recently there has been a lot of interest in learning common representat...
research
06/03/2017

Order embeddings and character-level convolutions for multimodal alignment

With the novel and fast advances in the area of deep neural networks, se...
research
09/04/2023

NLLB-CLIP – train performant multilingual image retrieval model on a budget

Today, the exponential rise of large models developed by academic and in...
research
03/27/2019

Image search using multilingual texts: a cross-modal learning approach between image and text

Multilingual (or cross-lingual) embeddings represent several languages i...
research
09/30/2019

Multi-Head Attention with Diversity for Learning Grounded Multilingual Multimodal Representations

With the aim of promoting and understanding the multilingual version of ...
