Improving video retrieval using multilingual knowledge transfer

08/24/2022
by Avinash Madasu, et al.

Video retrieval has seen tremendous progress with the development of vision-language models. However, further improving these models requires additional labelled data, which takes substantial manual effort to collect. In this paper, we propose a framework, MKTVR, that utilizes knowledge transfer from a multilingual model to boost the performance of video retrieval. We first use state-of-the-art machine translation models to construct pseudo ground-truth multilingual video-text pairs. We then use this data to learn a video-text representation in which English and non-English text queries are represented in a common embedding space based on pretrained multilingual models. We evaluate our proposed approach on four English video retrieval datasets: MSRVTT, MSVD, DiDeMo and Charades. Experimental results demonstrate that our approach achieves state-of-the-art results on all four datasets, outperforming previous models. Finally, we also evaluate our model on a multilingual video retrieval dataset encompassing six languages and show that our model outperforms previous multilingual video retrieval models in a zero-shot setting.
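
The abstract does not spell out the training objective, so the following is only a minimal sketch of one common way to place video, English text and machine-translated text in a shared embedding space: a symmetric contrastive loss applied to both text views of each video. It is not the MKTVR implementation. The function names (`symmetric_contrastive_loss`, `multilingual_loss`, `recall_at_k`), the temperature value and the use of plain NumPy arrays in place of real encoder outputs are all assumptions made for illustration.

```python
import numpy as np


def l2_normalize(x, eps=1e-8):
    """Scale each row to unit length so dot products are cosine similarities."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(video_emb, text_emb, temperature=0.05):
    """Contrastive loss where row i of each matrix forms a matched pair.

    The temperature is a hypothetical default, not a value from the paper.
    """
    v = l2_normalize(video_emb)
    t = l2_normalize(text_emb)
    logits = v @ t.T / temperature  # (batch, batch) similarity matrix
    idx = np.arange(len(v))
    video_to_text = -log_softmax(logits, axis=1)[idx, idx].mean()
    text_to_video = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (video_to_text + text_to_video)


def multilingual_loss(video_emb, english_emb, translated_emb, temperature=0.05):
    """Assumed combination: align each video with its English caption and
    with the machine-translated (pseudo ground-truth) caption."""
    return (
        symmetric_contrastive_loss(video_emb, english_emb, temperature)
        + symmetric_contrastive_loss(video_emb, translated_emb, temperature)
    )


def recall_at_k(video_emb, text_emb, k=1):
    """Fraction of text queries whose matched video ranks in the top k."""
    sims = l2_normalize(text_emb) @ l2_normalize(video_emb).T
    top_k = np.argsort(-sims, axis=1)[:, :k]
    hits = (top_k == np.arange(len(sims))[:, None]).any(axis=1)
    return float(hits.mean())
```

In this sketch the translated captions act as a second supervision signal for the same videos, so no extra manual labelling is needed. How the actual framework weights or fuses the two text views is not stated in the abstract.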

