Curriculum Script Distillation for Multilingual Visual Question Answering

01/17/2023
by Khyathi Raghavi Chandu, et al.

Pre-trained models with dual and cross encoders have shown remarkable success across several vision-and-language tasks, including Visual Question Answering (VQA). However, because these models depend on gold-annotated data, most of their advances do not see the light of day in languages beyond English. We aim to address this problem by introducing a curriculum based on source- and target-language translations to finetune the pre-trained models for the downstream task. Experimental results demonstrate that script plays a vital role in the performance of these models: target languages that share the source language's script perform better than their counterparts.
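The curriculum idea can be illustrated with a small sketch: order training examples so that target languages sharing the source language's script come first, then the rest. The dataset shape, function names, and the simple two-stage ordering below are illustrative assumptions, not the paper's exact recipe.

```python
# Minimal sketch of a script-based curriculum, under the assumption that
# examples are dicts with a "question" field in the target language.
import unicodedata


def dominant_script(text):
    """Guess the dominant Unicode script of a string by majority vote
    over its alphabetic characters (e.g. "LATIN", "DEVANAGARI")."""
    counts = {}
    for ch in text:
        if ch.isalpha():
            # The first word of a character's Unicode name is its script.
            script = unicodedata.name(ch, "").split(" ")[0]
            counts[script] = counts.get(script, 0) + 1
    return max(counts, key=counts.get) if counts else "UNKNOWN"


def curriculum_order(examples, source_script="LATIN"):
    """Order examples easy-to-hard: same-script targets first,
    different-script targets afterwards."""
    same = [ex for ex in examples if dominant_script(ex["question"]) == source_script]
    diff = [ex for ex in examples if dominant_script(ex["question"]) != source_script]
    return same + diff
```

A finetuning loop would then consume the reordered examples stage by stage; the staging rule here is the coarsest possible version (two buckets), chosen only to make the script-sharing intuition concrete.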

