Enhancing Multi-modal and Multi-hop Question Answering via Structured Knowledge and Unified Retrieval-Generation

12/16/2022
by Qian Yang, et al.

Multi-modal and multi-hop question answering aims to answer a question based on multiple input sources from different modalities. Previous methods retrieve the evidence separately and feed the retrieved evidence to a language model to generate the corresponding answer. However, these methods fail to build connections between candidates and thus cannot model inter-dependent relations among them during retrieval. Moreover, without alignments between modalities, the reasoning process over multi-modal candidates can be unbalanced. To address these limitations, we propose a Structured Knowledge and Unified Retrieval-Generation based method (SKURG). We align the sources from different modalities via their shared entities and map them into a shared semantic space via structured knowledge. We then use a unified retrieval-generation decoder that integrates intermediate retrieval results into answer generation and adaptively determines the number of retrieval steps. We evaluate SKURG on two multi-modal and multi-hop datasets, WebQA and MultimodalQA. The results demonstrate that SKURG achieves state-of-the-art performance on both retrieval and answer generation.
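To make the unified retrieval-generation idea concrete, here is a minimal toy sketch of an adaptive multi-hop retrieval loop. It is not the paper's model: the lexical-overlap scorer, the `threshold` stopping rule, and all function names are illustrative stand-ins for the learned decoder that, per the abstract, integrates intermediate retrieval results and decides on its own how many retrieval steps to take.

```python
def score_candidates(question, retrieved, candidates):
    """Toy relevance score: words shared between the question plus
    already-retrieved evidence and each remaining candidate."""
    context = set(question.lower().split())
    for evidence in retrieved:
        context |= set(evidence.lower().split())
    return {c: len(context & set(c.lower().split())) for c in candidates}

def unified_retrieve_generate(question, candidates, max_hops=4, threshold=1):
    """Retrieve evidence hop by hop, conditioning each hop on what was
    retrieved so far, and stop adaptively.

    Retrieval ends when no candidate scores above `threshold` or the hop
    budget runs out -- a crude stand-in for the adaptive step count the
    abstract describes. Returns the retrieved evidence chain, standing in
    for answer generation conditioned on that evidence.
    """
    retrieved = []
    pool = list(candidates)
    for _ in range(max_hops):
        scores = score_candidates(question, retrieved, pool)
        best = max(pool, key=lambda c: scores[c], default=None)
        if best is None or scores[best] <= threshold:
            break  # model decides no further evidence is needed
        retrieved.append(best)
        pool.remove(best)
    return retrieved
```

Because each hop's scores include previously retrieved evidence, a second-hop source can be selected through entities it shares with the first hop even when it overlaps little with the question itself, which is the inter-dependence that separate per-candidate retrieval misses.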


