leverage visual features extracted from images and text features extracted from questions, and perform classification over these features to obtain answers. Visual and textual features thus serve as the basic components that directly impact the final performance. In this paper, we propose to improve VQA performance by extracting more powerful visual and text features.
For visual features, most existing works [6, 9, 16] adopt the released bottom-up-attention features, whose feature extractor is a Faster R-CNN object detector built upon a ResNet-101 backbone. We instead adopt more powerful backbones (i.e., ResNeXt-101 and ResNeXt-152) to train stronger detectors. Techniques that improve detector accuracy (e.g., FPN and multi-scale training) also help to boost VQA performance.
For text features, we build upon recent state-of-the-art techniques from the NLP community. Large-scale language models such as ELMo, GPT, and BERT have shown excellent results on various NLP tasks at both the token and sentence level. BERT uses a masked language modeling objective to pre-train deep bidirectional representations, allowing each token's representation to fuse both left and right context. For a VQA model to answer a question, the token-level features must carry the question's contextual information so that they can be fused with the visual tokens for reasoning. We therefore adopt BERT as our language model.
Experiments on the VQA 2.0 dataset show the effectiveness of each component of our solution. Our final model achieves 74.89 accuracy on the test-standard split, which won second place in the VQA Challenge 2019.
2 Feature Representation
| Split | Model | FPN dim | Attr | Language | Yes/No | Num | Other | Overall |
|---|---|---|---|---|---|---|---|---|
| test-dev | FaceBook Pythia (ResNeXt-101) | 512 | ✓ | GloVe | 85.56 | 52.68 | 60.87 | 70.11 |
| test-dev | Ensemble (5 models) | - | - | - | 89.65 | 58.53 | 65.27 | 74.55 |
| test-dev | Ensemble (20 models) | - | - | - | 89.81 | 58.89 | 65.39 | 74.71 |
| test-std | Ensemble (20 models) | - | - | - | 89.81 | 58.36 | 65.69 | 74.89 |
2.1 Visual Feature
Dataset. We adopt Visual Genome 1.2 as the object detection dataset. Following the setting in the bottom-up-attention work, we use its object classes and attribute classes as training categories. The dataset is divided into train, val, and test splits; we train detectors on the train split and use the val and test splits as validation sets for parameter tuning.
Detector. We build our detectors on the ResNeXt backbones described above, initialized with parameters pretrained on ImageNet. We use RoIAlign to warp region features into a fixed size and embed them through two fully connected layers. Similar to the bottom-up-attention detector, we attach a classification branch on top of the region features and use attribute annotations as additional supervision. This attribute branch is used only to enhance the representation ability of the features during training, and is discarded in the feature extraction stage.
Feature. Given an image, we first feed it into the trained detector and apply non-maximum suppression (NMS) within each category to remove duplicate boxes. We then keep the boxes with the highest object confidence and extract their FC features. These boxes, together with their features, serve as the representation of the given image.
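The per-class NMS and top-confidence selection above can be sketched as follows. This is a minimal pure-Python sketch; the box format, IoU threshold, and number of kept regions are illustrative, not the paper's exact settings:

```python
def iou(a, b):
    # boxes are (x1, y1, x2, y2)
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def per_class_nms(dets, iou_thr=0.5):
    """dets: list of (box, score, class_id). Suppress duplicates per class."""
    kept = []
    for c in {d[2] for d in dets}:
        cls = sorted((d for d in dets if d[2] == c), key=lambda d: -d[1])
        while cls:
            best = cls.pop(0)
            kept.append(best)
            cls = [d for d in cls if iou(best[0], d[0]) < iou_thr]
    return kept

def top_k_regions(dets, k):
    """Keep the k highest-confidence boxes overall as the image representation."""
    return sorted(dets, key=lambda d: -d[1])[:k]
```

In the real pipeline, each kept box would carry its detector FC feature rather than just a score and class id.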
2.2 Language Feature
The BERT model, introduced by Devlin et al., can be seen as a multi-layer bidirectional Transformer encoder. The model consists of embedding layers, Transformer blocks, and self-attention heads, and comes in two sizes. The base model has 12 Transformer blocks, a hidden size of 768, and 12 self-attention heads, for about 110M parameters in total. The large model has 24 Transformer blocks, a hidden size of 1024, and 16 self-attention heads, for about 340M parameters. The model can process a single sentence or a pair of sentences (e.g., [Question, Answer]) as one token sequence. To separate a sentence pair, a special token ([SEP]) is inserted between the two sentences, a learned sentence-A embedding is added to every token of the first sentence, and a sentence-B embedding is added to every token of the second. Since the VQA task has only one sentence, we use only the sentence-A embedding.
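The single-sentence vs. sentence-pair packing described above can be sketched as follows. This is a schematic that follows BERT's standard convention of a leading [CLS] token; the token values are illustrative:

```python
def build_bert_input(tokens_a, tokens_b=None):
    """Assemble a BERT-style token sequence with segment (sentence A/B) ids.
    For VQA there is no second sentence, so every token gets segment id 0."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"]
    segment_ids = [0] * len(tokens)
    if tokens_b:
        tokens += tokens_b + ["[SEP]"]
        segment_ids += [1] * (len(tokens_b) + 1)
    return tokens, segment_ids
```

For a question such as "what color", this yields `["[CLS]", "what", "color", "[SEP]"]` with all-zero segment ids.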
Considering the limited parameter budget of our VQA model, we use base BERT as our language model to extract question features. To get each token's representation, we use only the hidden states of the last attention block over the full sequence. Since pre-trained BERT has proven effective for boosting many natural language processing tasks, we adopt the uncased base BERT pre-trained weights as our initial parameters.
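Selecting token-level question features from the last block's hidden states might look like the following sketch. It uses stand-in 3-dimensional hidden states (real base BERT states are 768-dimensional) and drops the special tokens so that only question tokens are fused with the visual regions:

```python
def token_features(hidden_states, tokens):
    """Keep per-token features from the last Transformer layer,
    dropping BERT's special [CLS]/[SEP] tokens."""
    assert len(hidden_states) == len(tokens)
    return [h for h, t in zip(hidden_states, tokens)
            if t not in ("[CLS]", "[SEP]")]

# stand-in example: 4 tokens with toy 3-dim hidden states
tokens = ["[CLS]", "what", "color", "[SEP]"]
hidden = [[0.1] * 3, [0.2] * 3, [0.3] * 3, [0.4] * 3]
feats = token_features(hidden, tokens)  # features for "what" and "color"
```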
3 VQA Model
In recent years, many VQA models have achieved impressive results. We adopt Bilinear Attention Networks (BAN) as our base model; its eight-glimpse single model attains strong accuracy on the VQA 2.0 test-dev split. BAN uses GloVe embeddings and a GRU as its language model, producing a question feature vector. To improve performance, we replace this language model with base BERT and modify the dimension of BAN's language input features. To train BAN with BERT, we keep all of BAN's settings but adjust the maximum number of epochs and use a cosine learning-rate scheduler. To preserve the pre-trained BERT parameters, we set a smaller learning rate for the BERT module than for the rest of the network.
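A minimal sketch of this two-rate schedule idea follows: the pretrained BERT module gets a smaller base learning rate than the rest of BAN, and both decay with a cosine curve. The base learning rates below are illustrative placeholders, not the values used in the paper:

```python
import math

def cosine_lr(step, total_steps, base_lr, min_lr=0.0):
    """Cosine-decayed learning rate from base_lr down to min_lr."""
    t = min(step, total_steps) / total_steps
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * t))

# separate base rates for the pretrained BERT module and the rest of BAN
# (hypothetical values for illustration)
param_groups = [
    {"name": "bert", "base_lr": 1e-5},
    {"name": "ban",  "base_lr": 1e-3},
]

def lrs_at(step, total_steps):
    """Per-group learning rate at a given training step."""
    return {g["name"]: cosine_lr(step, total_steps, g["base_lr"])
            for g in param_groups}
```

In a PyTorch-style trainer, each entry of `param_groups` would map to an optimizer parameter group over the corresponding module's parameters.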
4.1 Ablation Experiments
Table 1 shows our ablation study of each component, including the attribute head, FPN dimension, and language model. Comparing the rows with and without the attribute head, we find that it brings a clear improvement to the final performance, which shows the effectiveness of this module. The rows using BERT show that it stably boosts performance by more than one point. Besides, increasing the FPN dimension and adopting multi-scale training both slightly improve VQA accuracy.
4.2 Comparison with Others
We select BAN trained on the bottom-up-attention features and on the Facebook Pythia features as baselines. Our best single model significantly outperforms all existing state-of-the-art results on the test-dev split. We also ensemble the models we trained by averaging their output probabilities. The 20-model ensemble achieves 74.71 and 74.89 accuracy on the VQA test-dev and test-std splits, respectively. This result won second place in the VQA Challenge 2019.
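Averaging the models' probability outputs can be sketched as follows. This is a toy example with three candidate answers; real models output a distribution over the full answer vocabulary:

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def ensemble_predict(model_logits):
    """Average per-model answer probabilities, then take the argmax answer.
    model_logits: one logit vector per model, all over the same answer set."""
    probs = [softmax(l) for l in model_logits]
    n, k = len(probs), len(probs[0])
    avg = [sum(p[i] for p in probs) / n for i in range(k)]
    best = max(range(k), key=avg.__getitem__)
    return best, avg
```

Averaging probabilities (rather than logits) keeps each model's contribution on the same scale regardless of its logit magnitudes.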
We have shown that, for the VQA task, the representation capacity of both visual and textual features is critical to the final performance.
- Anderson, P. et al. (2018) Bottom-up and top-down attention for image captioning and visual question answering. In CVPR, pp. 6077–6086.
- Deng, J. et al. (2009) ImageNet: a large-scale hierarchical image database. In CVPR, pp. 248–255.
- Devlin, J. et al. (2018) BERT: pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
- He, K. et al. (2017) Mask R-CNN. In ICCV, pp. 2961–2969.
- Kim, J.-H. et al. (2018) Bilinear attention networks. In Advances in Neural Information Processing Systems 31, pp. 1571–1581.
- Krishna, R. et al. (2017) Visual Genome: connecting language and vision using crowdsourced dense image annotations. International Journal of Computer Vision 123(1), pp. 32–73.
- Lin, T.-Y. et al. (2017) Feature pyramid networks for object detection. In CVPR, pp. 2117–2125.
- Gao, P. et al. (2018) Dynamic fusion with intra- and inter-modality attention flow for visual question answering. arXiv preprint arXiv:1812.05252.
- Peters, M. et al. (2018) Deep contextualized word representations. In NAACL.
- Radford, A. et al. (2018) Improving language understanding with unsupervised learning. Technical report, OpenAI.
- Ren, S. et al. (2015) Faster R-CNN: towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems, pp. 91–99.
- Teney, D. et al. (2018) Tips and tricks for visual question answering: learnings from the 2017 challenge. In CVPR, pp. 4223–4232.
- Vaswani, A. et al. (2017) Attention is all you need. In Advances in Neural Information Processing Systems, pp. 5998–6008.
- Xie, S. et al. (2017) Aggregated residual transformations for deep neural networks. In CVPR, pp. 1492–1500.
- Yu, Z. et al. (2018) Beyond bilinear: generalized multimodal factorized high-order pooling for visual question answering. IEEE Transactions on Neural Networks and Learning Systems 29(12), pp. 5947–5959.