Meta-Learning For Vision-and-Language Cross-lingual Transfer

05/24/2023
by Hanxu Hu, et al.

Current pre-trained vision-language models (PVLMs) achieve excellent performance on a range of multi-modal datasets. Recent work has aimed at building multilingual models, and a range of novel multilingual multi-modal datasets have been proposed. However, current PVLMs typically perform poorly on these datasets when used for multi-modal zero-shot or few-shot cross-lingual transfer, especially for low-resource languages. To alleviate this problem, we propose a novel meta-learning fine-tuning framework. Our framework enables current PVLMs to adapt rapidly to new languages in vision-language scenarios by formulating MAML in a cross-lingual, multi-modal manner. Experiments show that our method boosts the performance of current state-of-the-art PVLMs in both zero-shot and few-shot cross-lingual transfer on a range of vision-language understanding tasks and datasets (XVNLI, xGQA, MaRVL, xFlicker Co
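For readers unfamiliar with MAML, the following is a minimal sketch of the generic algorithm on a toy problem. It is not the paper's cross-lingual multi-modal formulation, which the abstract does not spell out. All names here (`make_task`, `adapt`, `maml_train`) are hypothetical, the model is a single scalar weight rather than a PVLM, and each toy regression task merely stands in for one language. The inner loop takes one gradient step on a task's support set; the outer loop updates the shared initialization using the query loss after that step.

```python
# Toy MAML sketch (assumption: scalar linear model y = w * x, MSE loss).
# Each "task" is a stand-in for one language; this is illustrative only.

def mse(w, data):
    """Mean squared error of the model y = w * x on (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def grad(w, data):
    """Derivative of mse with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def hess(data):
    """Second derivative of mse with respect to w (constant in w)."""
    return sum(2 * x * x for x, _ in data) / len(data)

def adapt(w, support, alpha):
    """Inner loop: one gradient step on the task's support set."""
    return w - alpha * grad(w, support)

def maml_train(tasks, w=0.0, alpha=0.05, beta=0.05, steps=200):
    """Outer loop: update the initialization w using post-adaptation query loss."""
    for _ in range(steps):
        meta_grad = 0.0
        for support, query in tasks:
            w_fast = adapt(w, support, alpha)
            # Chain rule through the inner step: d(w_fast)/dw = 1 - alpha * hess.
            meta_grad += grad(w_fast, query) * (1.0 - alpha * hess(support))
        w -= beta * meta_grad / len(tasks)
    return w

def make_task(slope):
    """Hypothetical task: noiseless data from y = slope * x."""
    support = [(x, slope * x) for x in (1.0, 2.0)]
    query = [(x, slope * x) for x in (0.5, 1.5)]
    return support, query

tasks = [make_task(a) for a in (1.0, 2.0, 3.0)]
w_meta = maml_train(tasks)
```

In this toy setting the learned initialization `w_meta` settles near the middle of the training slopes, so a single call to `adapt` on a handful of examples from an unseen task already reduces its query loss. That is the few-shot behaviour the abstract aims for, shown here at a much smaller scale.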


