Improving the Cross-Lingual Generalisation in Visual Question Answering

09/07/2022
by   Farhad Nooralahzadeh, et al.
0

While several benefits were realized for multilingual vision-language pretrained models, recent benchmarks across various tasks and languages showed poor cross-lingual generalisation when multilingually pre-trained vision-language models are applied to non-English data, with a large gap between (supervised) English performance and (zero-shot) cross-lingual transfer. In this work, we explore the poor performance of these models on a zero-shot cross-lingual visual question answering (VQA) task, where models are fine-tuned on English visual-question data and evaluated on 7 typologically diverse languages. We improve cross-lingual transfer with three strategies: (1) we introduce a linguistic prior objective to augment the cross-entropy loss with a similarity-based loss to guide the model during training, (2) we learn a task-specific subnetwork that improves cross-lingual generalisation and reduces variance without model modification, (3) we augment training examples using synthetic code-mixing to promote alignment of embeddings between source and target languages. Our experiments on xGQA using the pretrained multilingual multimodal transformers UC2 and M3P demonstrate the consistent effectiveness of the proposed fine-tuning strategy for 7 languages, outperforming existing transfer methods with sparse models. Code and data to reproduce our findings are publicly available.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
10/22/2022

Prompt-Tuning Can Be Much Better Than Fine-Tuning on Cross-lingual Understanding With Multilingual Language Models

Pre-trained multilingual language models show significant performance ga...
research
02/15/2022

Delving Deeper into Cross-lingual Visual Question Answering

Visual question answering (VQA) is one of the crucial vision-and-languag...
research
09/13/2021

xGQA: Cross-Lingual Visual Question Answering

Recent advances in multimodal vision and language modeling have predomin...
research
10/13/2020

Model Selection for Cross-Lingual Transfer using a Learned Scoring Function

Transformers that are pre-trained on multilingual text corpora, such as,...
research
06/02/2023

Distilling Efficient Language-Specific Models for Cross-Lingual Transfer

Massively multilingual Transformers (MMTs), such as mBERT and XLM-R, are...
research
09/13/2021

A Massively Multilingual Analysis of Cross-linguality in Shared Embedding Space

In cross-lingual language models, representations for many different lan...
research
05/16/2023

xPQA: Cross-Lingual Product Question Answering across 12 Languages

Product Question Answering (PQA) systems are key in e-commerce applicati...

Please sign up or login with your details

Forgot password? Click here to reset