Integrating Image Features with Convolutional Sequence-to-sequence Network for Multilingual Visual Question Answering

03/22/2023
by Triet Minh Thai, et al.

Visual Question Answering (VQA) is a task in which a computer must correctly answer an input question about a given image. Humans solve this task with ease, but it remains a challenge for computers. The VLSP2022-EVJVQA shared task poses VQA in a multilingual setting on a newly released dataset, UIT-EVJVQA, in which the questions and answers are written in three languages: English, Vietnamese, and Japanese. We approached the challenge as a sequence-to-sequence learning task, integrating hints from pre-trained state-of-the-art VQA models and image features into a Convolutional Sequence-to-Sequence network to generate the desired answers. Our approach achieved an F1 score of up to 0.3442 on the public test set and 0.4210 on the private test set, and placed 3rd in the competition.
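
The abstract reports F1 scores without defining how they are computed. For generated answers, a common choice is token-level F1 between the predicted and reference answer strings. The sketch below illustrates that idea only, under the assumption that answers are compared as bags of whitespace-separated, lowercased tokens. The function name `token_f1` is hypothetical, and the tokenization and normalization used by the shared task itself may differ (whitespace splitting, in particular, does not segment Japanese text).

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer.

    Assumption: tokens are lowercased and split on whitespace; this is an
    illustrative choice, not the official evaluation procedure.
    """
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()

    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)

    # Multiset intersection counts each shared token at most as often
    # as it appears on both sides.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0

    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

For example, `token_f1("red car", "a red car")` has precision 1 and recall 2/3, giving an F1 of 0.8. A dataset-level score would typically be the mean of this value over all question-answer pairs.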


