VLSP2022-EVJVQA Challenge: Multilingual Visual Question Answering

02/23/2023
by Ngan Luu-Thuy Nguyen, et al.

Visual Question Answering (VQA) is a challenging task at the intersection of natural language processing (NLP) and computer vision (CV) that has attracted significant attention from researchers. English is a resource-rich language with many developments in datasets and models for visual question answering, but resources and models for VQA in other languages still need to be developed. In addition, no multilingual dataset targets the visual content of a particular country with its own objects and cultural characteristics. To address these gaps, we provide the research community with a benchmark dataset named EVJVQA, comprising 33,000+ question-answer pairs in three languages (Vietnamese, English, and Japanese) over approximately 5,000 images taken in Vietnam, for evaluating multilingual VQA systems and models. EVJVQA serves as the benchmark dataset for the multilingual visual question answering challenge at the 9th Workshop on Vietnamese Language and Speech Processing (VLSP 2022). The task attracted 62 participating teams from various universities and organizations. In this article, we present the organization of the challenge, an overview of the methods employed by shared-task participants, and the results. The highest performances are 0.4392 in F1-score and 0.4009 in BLEU on the private test set. The multilingual QA systems proposed by the top two teams use ViT as the pre-trained vision model and mT5, a powerful multilingual pre-trained language model based on the transformer architecture, as the pre-trained language model. EVJVQA is a challenging dataset that motivates NLP and CV researchers to further explore multilingual models and systems for visual question answering.
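Answer quality in the challenge is reported with F1-score and BLEU. As a minimal sketch of how a token-overlap F1 between a predicted and a gold answer can be computed (SQuAD-style; the exact EVJVQA scoring script, including its tokenization for Vietnamese and Japanese, may differ):

```python
from collections import Counter

def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Whitespace tokenization is an assumption here; the official
    evaluation may use language-specific tokenizers.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either answer is empty, F1 is 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Count tokens shared between prediction and reference.
    common = Counter(pred_tokens) & Counter(ref_tokens)
    overlap = sum(common.values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("two red motorbikes", "two motorbikes"))  # 0.8
```

A corpus-level score would then average this per-question F1 over the test set, with BLEU computed analogously from n-gram overlap.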


