Towards Zero-shot Cross-lingual Image Retrieval and Tagging

09/15/2021
by   Pranav Aggarwal, et al.

There has been a recent spike in interest in multi-modal Language and Vision problems. On the language side, most of these models focus primarily on English, since most multi-modal datasets are monolingual. We try to bridge this gap with a zero-shot approach for learning multi-modal representations using cross-lingual pre-training on the text side. We present a simple yet practical approach for building a cross-lingual image retrieval model that trains on a monolingual dataset but can be used in a zero-shot cross-lingual fashion during inference. We also introduce a new objective function that tightens the text embedding clusters by pushing dissimilar texts away from each other. For evaluation, we introduce a new 1K multi-lingual MSCOCO2014 caption test dataset (XTD10) in 7 languages, which we collected using a crowdsourcing platform. We use this as the test set for evaluating zero-shot model performance across languages. We also demonstrate how a cross-lingual model can be used for downstream tasks such as multi-lingual image tagging in a zero-shot manner. The XTD10 dataset is publicly available at https://github.com/adobe-research/Cross-lingual-Test-Dataset-XTD10.
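
As a rough illustration of what zero-shot cross-lingual retrieval means at inference time, the sketch below ranks images by cosine similarity to a text query in a shared embedding space. It is not the authors' implementation: the encoders are omitted, and the embedding values, file names, and the helpers `cosine` and `rank_images` are hypothetical. The only assumption carried over from the abstract is that captions in different languages land near each other, so a model trained on English captions can serve queries in other languages.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def rank_images(query_embedding, image_embeddings):
    """Return image names sorted from most to least similar to the query."""
    scores = {
        name: cosine(query_embedding, vec)
        for name, vec in image_embeddings.items()
    }
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical image embeddings (in practice, produced by an image encoder).
image_embeddings = {
    "dog.jpg": [0.9, 0.1, 0.0],
    "beach.jpg": [0.1, 0.9, 0.2],
    "city.jpg": [0.0, 0.2, 0.9],
}

# Hypothetical text embeddings for the same caption in two languages
# (in practice, produced by a cross-lingual text encoder).
query_en = [0.8, 0.2, 0.1]  # e.g. "a dog playing"
query_de = [0.7, 0.3, 0.0]  # e.g. "ein spielender Hund"

print(rank_images(query_en, image_embeddings))
print(rank_images(query_de, image_embeddings))
```

If the text encoder maps both queries close together, as assumed here, both rankings place the same image first even though only one language was seen with images during training.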


research
11/24/2020

Towards Zero-shot Cross-lingual Image Retrieval

There has been a recent spike in interest in multi-modal Language and Vi...
research
12/20/2014

Improving zero-shot learning by mitigating the hubness problem

The zero-shot paradigm exploits vector-based word representations extrac...
research
03/14/2022

A Neural Pairwise Ranking Model for Readability Assessment

Automatic Readability Assessment (ARA), the task of assigning a reading ...
research
01/03/2020

Two-Level Transformer and Auxiliary Coherence Modeling for Improved Text Segmentation

Breaking down the structure of long texts into semantically coherent seg...
research
05/09/2023

Boosting Zero-shot Cross-lingual Retrieval by Training on Artificially Code-Switched Data

Transferring information retrieval (IR) models from a high-resource lang...
research
07/25/2023

Combating the Curse of Multilinguality in Cross-Lingual WSD by Aligning Sparse Contextualized Word Representations

In this paper, we advocate for using large pre-trained monolingual langu...
research
08/19/2021

Mr. TyDi: A Multi-lingual Benchmark for Dense Retrieval

We present Mr. TyDi, a multi-lingual benchmark dataset for mono-lingual ...
