Passage Retrieval for Outside-Knowledge Visual Question Answering

05/09/2021
by Chen Qu, et al.

In this work, we address multi-modal information needs that combine text questions and images by focusing on passage retrieval for outside-knowledge visual question answering. This task requires access to outside knowledge, which we define here as a large unstructured passage collection. We first conduct sparse retrieval with BM25 and study expanding the question with object names and image captions. We verify that visual clues play an important role and that captions tend to be more informative than object names in sparse retrieval. We then construct a dual-encoder dense retriever whose query encoder is LXMERT, a multi-modal pre-trained transformer. We further show that dense retrieval significantly outperforms sparse retrieval that uses object expansion. Moreover, dense retrieval matches the performance of sparse retrieval that leverages human-generated captions.
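
To make the sparse-retrieval step concrete, here is a minimal sketch of BM25 scoring with question expansion. It is not the authors' implementation: the `BM25` class, the `expand_query` helper, the parameter values `k1=1.2` and `b=0.75`, and the toy passages, question, and caption are all assumptions made for illustration. It only shows how appending a caption to a text-poor question could change which passage ranks first.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption, not the paper's preprocessing.
    return text.lower().split()


class BM25:
    """Toy BM25 scorer over a small in-memory passage list (illustrative only)."""

    def __init__(self, passages, k1=1.2, b=0.75):
        self.k1 = k1
        self.b = b
        self.docs = [Counter(tokenize(p)) for p in passages]
        self.doc_len = [sum(d.values()) for d in self.docs]
        self.avg_len = sum(self.doc_len) / len(self.docs)
        self.n = len(self.docs)
        # Document frequency: number of passages containing each term.
        self.df = Counter()
        for d in self.docs:
            self.df.update(d.keys())

    def idf(self, term):
        df = self.df.get(term, 0)
        return math.log(1 + (self.n - df + 0.5) / (df + 0.5))

    def score(self, query, idx):
        doc = self.docs[idx]
        norm = self.k1 * (1 - self.b + self.b * self.doc_len[idx] / self.avg_len)
        total = 0.0
        for term in tokenize(query):
            tf = doc.get(term, 0)
            if tf:
                total += self.idf(term) * tf * (self.k1 + 1) / (tf + norm)
        return total

    def rank(self, query):
        # Passage indices sorted from highest to lowest BM25 score.
        scores = [self.score(query, i) for i in range(self.n)]
        return sorted(range(self.n), key=lambda i: scores[i], reverse=True)


def expand_query(question, caption="", object_names=()):
    # Hypothetical helper: append the caption and object names to the question.
    parts = [question, caption, " ".join(object_names)]
    return " ".join(p for p in parts if p)


# Invented toy data, not taken from the paper's collection.
passages = [
    "a fire hydrant supplies water to firefighters during a fire",
    "the golden retriever is a breed of dog from scotland",
    "water boils at one hundred degrees celsius",
]
question = "what is this used for"
caption = "a red fire hydrant on a sidewalk"

bm25 = BM25(passages)
plain_ranking = bm25.rank(question)
expanded_ranking = bm25.rank(expand_query(question, caption))
print("question only:", plain_ranking)
print("question + caption:", expanded_ranking)
```

In this toy setup the question alone shares only a function word with the collection, so the visual content has to come from the expansion text. A dense retriever would instead encode the question and image jointly and score passages by vector similarity.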

research
06/28/2023

Pre-Training Multi-Modal Dense Retrievers for Outside-Knowledge Visual Question Answering

This paper studies a category of visual question answering tasks, in whi...
research
04/16/2021

Cross-Modal Retrieval Augmentation for Multi-Modal Classification

Recent advances in using retrieval components over external knowledge so...
research
08/22/2022

Revising Image-Text Retrieval via Multi-Modal Entailment

An outstanding image-text retrieval model depends on high-quality labele...
research
09/19/2019

Look, Read and Enrich. Learning from Scientific Figures and their Captions

Compared to natural images, understanding scientific figures is particul...
research
08/09/2021

Multi-modal Retrieval of Tables and Texts Using Tri-encoder Models

Open-domain extractive question answering works well on textual data by ...
research
09/17/2020

Generation-Augmented Retrieval for Open-domain Question Answering

Conventional sparse retrieval methods such as TF-IDF and BM25 are simple...
research
06/14/2019

Improving Visual Question Answering by Referring to Generated Paragraph Captions

Paragraph-style image captions describe diverse aspects of an image as o...