Multimodal Inverse Cloze Task for Knowledge-based Visual Question Answering

01/11/2023
by Paul Lerner, et al.

We present a new pre-training method, Multimodal Inverse Cloze Task, for Knowledge-based Visual Question Answering about named Entities (KVQAE). KVQAE is a recently introduced task that consists of answering questions about named entities grounded in a visual context using a Knowledge Base. The interaction between the modalities is therefore paramount for retrieving information and must be captured with complex fusion models. As these models require a lot of training data, we design this pre-training task from existing work in textual Question Answering. It treats a sentence as a pseudo-question and its context as a pseudo-relevant passage, and extends this to multimodal documents by considering the images near the text. Our method is applicable to different neural network architectures and leads to a 9 relative-F1 gain for retrieval and reading comprehension, respectively, over a no-pre-training baseline.
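
The pseudo-question construction described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The names `PseudoExample` and `make_ict_example` are hypothetical, and so is the data layout, in which `images[i]` is the image nearest to sentence `i` (or `None`). Picking the first other image as the passage image is likewise an assumption made only for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PseudoExample:
    """One pre-training pair (hypothetical container, not from the paper)."""
    question_text: str
    question_image: Optional[str]
    passage_text: str
    passage_image: Optional[str]


def make_ict_example(
    sentences: List[str],
    images: List[Optional[str]],
    index: int,
    remove_sentence: bool = True,
) -> PseudoExample:
    """Turn sentence `index` into a pseudo-question and the rest into a passage.

    Assumption: images[i] identifies the image nearest to sentences[i],
    or is None when no image is nearby.
    """
    if len(sentences) != len(images):
        raise ValueError("sentences and images must have the same length")
    if not 0 <= index < len(sentences):
        raise IndexError("index out of range")

    # The context is the surrounding text; the chosen sentence is dropped
    # from it unless remove_sentence is False.
    context = [
        s for i, s in enumerate(sentences)
        if not (remove_sentence and i == index)
    ]

    # Assumption: the passage image is the first image attached to any
    # other sentence of the same document.
    passage_image = next(
        (img for i, img in enumerate(images) if i != index and img is not None),
        None,
    )

    return PseudoExample(
        question_text=sentences[index],
        question_image=images[index],
        passage_text=" ".join(context),
        passage_image=passage_image,
    )
```

Calling `make_ict_example(sentences, images, 0)` on a document yields one (pseudo-question, pseudo-passage) pair. Iterating over every index would produce one pair per sentence, which a retriever or reader could then be pre-trained on.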

