C2KD: Cross-Lingual Cross-Modal Knowledge Distillation for Multilingual Text-Video Retrieval

10/07/2022
by   Andrew Rouditchenko, et al.
0

Multilingual text-video retrieval methods have improved significantly in recent years, but the performance for other languages lags behind English. We propose a Cross-Lingual Cross-Modal Knowledge Distillation method to improve multilingual text-video retrieval. Inspired by the fact that English text-video retrieval outperforms other languages, we train a student model using input text in different languages to match the cross-modal predictions from teacher models using input text in English. We propose a cross entropy based objective which forces the distribution over the student's text-video similarity scores to be similar to those of the teacher models. We introduce a new multilingual video dataset, Multi-YouCook2, by translating the English captions in the YouCook2 video dataset to 8 other languages. Our method improves multilingual text-video retrieval performance on Multi-YouCook2 and several other datasets such as Multi-MSRVTT and VATEX. We also conducted an analysis on the effectiveness of different multilingual text models as teachers.

READ FULL TEXT
research
12/15/2021

Learning Cross-Lingual IR from an English Retriever

We present a new cross-lingual information retrieval (CLIR) model traine...
research
10/08/2019

Aligning Multilingual Word Embeddings for Cross-Modal Retrieval Task

In this paper, we propose a new approach to learn multimodal multilingua...
research
05/15/2023

Soft Prompt Decoding for Multilingual Dense Retrieval

In this work, we explore a Multilingual Information Retrieval (MLIR) tas...
research
08/19/2023

AltDiffusion: A Multilingual Text-to-Image Diffusion Model

Large Text-to-Image(T2I) diffusion models have shown a remarkable capabi...
research
05/21/2022

Retrieval-Augmented Multilingual Keyphrase Generation with Retriever-Generator Iterative Training

Keyphrase generation is the task of automatically predicting keyphrases ...
research
09/11/2023

Dual-view Curricular Optimal Transport for Cross-lingual Cross-modal Retrieval

Current research on cross-modal retrieval is mostly English-oriented, as...
research
08/06/2021

Transferring Knowledge Distillation for Multilingual Social Event Detection

Recently published graph neural networks (GNNs) show promising performan...

Please sign up or login with your details

Forgot password? Click here to reset