HybridQA: A Dataset of Multi-Hop Question Answering over Tabular and Textual Data

04/15/2020
by   Wenhu Chen, et al.

Existing question answering datasets focus on homogeneous information, based either on text alone or on KB/table information alone. However, as human knowledge is distributed over heterogeneous forms, relying on homogeneous information might lead to severe coverage problems. To fill this gap, we present HybridQA, a new large-scale question-answering dataset that requires reasoning over heterogeneous information. Each question is aligned with a structured Wikipedia table and multiple free-form corpora linked with the entities in the table. The questions are designed to aggregate both tabular and textual information, i.e., the lack of either form would render the question unanswerable. We test three different models: 1) a table-only model, 2) a text-only model, and 3) a hybrid model that combines both tabular and textual information to build a reasoning path towards the answer. The experimental results show that the first two baselines obtain compromised scores below 20%, while the hybrid model significantly boosts the EM score to over 50%, which demonstrates the necessity of aggregating both structured and unstructured information in HybridQA. However, the hybrid model's score is still far behind human performance; hence we believe HybridQA to be an ideal and challenging benchmark for studying question answering under heterogeneous information. The dataset and code are available at <https://github.com/wenhuchen/HybridQA>.
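
To make the setting concrete, the following is a minimal sketch of a two-hop question over a table and its linked passages, together with an exact-match (EM) check. Everything in it is an assumption for illustration: the `table` and `passages` structures are toy data and not the released dataset format, `answer_two_hop` is a hypothetical hand-written lookup rather than any of the three models, and the normalization in `exact_match` follows a common SQuAD-style convention that may differ from the official evaluation script.

```python
import re
import string

# Hypothetical toy data; NOT the released HybridQA file format.
table = {
    "header": ["Year", "Host city"],
    "rows": [
        ["2004", "Athens"],
        ["2008", "Beijing"],
    ],
}

# Free-form passages keyed by the table cell (entity) they are linked to.
passages = {
    "Athens": "Athens is the capital of Greece.",
    "Beijing": "Beijing is the capital of China.",
}


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, gold):
    """Return True if the normalized strings are identical."""
    return normalize(prediction) == normalize(gold)


def answer_two_hop(year):
    """Answer 'Which country's capital hosted the games in <year>?'.

    Hop 1 uses only the table (year -> host city).
    Hop 2 uses only the linked passage (host city -> country).
    """
    for row_year, city in table["rows"]:
        if row_year == year:
            passage = passages[city]
            # Toy extraction: the country is the last word of the passage.
            return passage.rstrip(".").split()[-1]
    return None


if __name__ == "__main__":
    prediction = answer_two_hop("2008")
    print(prediction, exact_match(prediction, "China"))
```

In this toy case the table alone yields only the host city and the passage alone cannot be selected without the table, which mirrors the claim that the lack of either form renders the question unanswerable.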


