Towards Zero-shot Cross-lingual Image Retrieval

11/24/2020
by   Pranav Aggarwal, et al.

There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models focus primarily on English because most multi-modal datasets are monolingual. We try to bridge this gap with a zero-shot approach for learning multi-modal representations using cross-lingual pre-training on the text side. We present a simple yet practical approach for building a cross-lingual image retrieval model that trains on a monolingual dataset but can be used in a zero-shot cross-lingual fashion at inference time. We also introduce a new objective function that tightens the text embedding clusters by pushing dissimilar texts away from each other. Finally, we introduce a new 1K multi-lingual MSCOCO2014 caption test dataset (XTD10) in 7 languages that we collected using a crowdsourcing platform. We use this as the test set for evaluating zero-shot model performance across languages. The XTD10 dataset is publicly available at: https://github.com/adobe-research/Cross-lingual-Test-Dataset-XTD10
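
The abstract does not spell out how retrieval is scored, so the following is only a minimal sketch of how a zero-shot text-to-image evaluation on a paired test set such as XTD10 might look. It assumes that caption and image embeddings have already been produced by some text and image encoders (not shown, and not specified here), that row i of each matrix belongs to the same caption-image pair, and that cosine similarity and Recall@K are the ranking function and metric. The names `l2_normalize` and `recall_at_k` and the toy data are hypothetical and are not taken from the paper.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so a dot product equals cosine similarity."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def recall_at_k(text_emb, image_emb, k):
    """Fraction of captions whose paired image (same row index) ranks in the top k.

    Assumption: text_emb[i] and image_emb[i] form a matching pair.
    """
    sims = l2_normalize(text_emb) @ l2_normalize(image_emb).T
    # Indices of the k most similar images for every caption.
    topk = np.argsort(-sims, axis=1)[:, :k]
    targets = np.arange(text_emb.shape[0])[:, None]
    return float((topk == targets).any(axis=1).mean())


if __name__ == "__main__":
    # Toy stand-in for real embeddings: captions are noisy copies of images.
    rng = np.random.default_rng(0)
    image_emb = rng.normal(size=(1000, 64))
    text_emb = image_emb + 0.1 * rng.normal(size=(1000, 64))
    for k in (1, 5, 10):
        print(f"Recall@{k}: {recall_at_k(text_emb, image_emb, k):.3f}")
```

Under these assumptions, evaluating another language only means swapping in the caption embeddings for that language while keeping the image embeddings fixed, which is what makes the setup zero-shot on the text side.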

