HiTab: A Hierarchical Table Dataset for Question Answering and Natural Language Generation

08/15/2021
by   Zhoujun Cheng, et al.

Tables are often created with hierarchies, but existing work on table reasoning mainly focuses on flat tables and neglects hierarchical ones. Hierarchical tables challenge existing methods through hierarchical indexing, as well as through implicit calculation and semantic relationships. This work presents HiTab, a free and open dataset for studying question answering (QA) and natural language generation (NLG) over hierarchical tables. HiTab is a cross-domain dataset constructed from a wealth of statistical reports (analyses) and Wikipedia pages, and it has three unique characteristics: (1) nearly all tables are hierarchical; (2) both the target sentences for NLG and the questions for QA are revised from original, meaningful, and diverse descriptive sentences authored by the analysts and professionals who wrote the reports; and (3) to reveal the complex numerical reasoning in statistical analyses, fine-grained annotations of entity and quantity alignment are provided. HiTab provides 10,686 QA pairs and descriptive sentences with well-annotated quantity and entity alignment on 3,597 tables, with broad coverage of table hierarchies and numerical reasoning types. Targeting hierarchical structure, we devise a novel hierarchy-aware logical form for symbolic reasoning over tables, which proves highly effective. Targeting complex numerical reasoning, we propose partially supervised training that uses the annotations of entity and quantity alignment, which helps models substantially reduce spurious predictions in the QA task. In the NLG task, we find that entity and quantity alignment also helps NLG models generate better results in a conditional generation setting. Experimental results with state-of-the-art baselines suggest that this dataset presents a strong challenge and is a valuable benchmark for future research.
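
To make "hierarchical indexing" concrete, the following is a minimal Python sketch of how a cell in a hierarchical table is addressed by a path through the row headers and a path through the column headers, rather than by a single row and column name. The toy table, its values, and the helper `select` are hypothetical illustrations; they are not taken from HiTab and do not represent the paper's logical form or data format.

```python
from typing import Dict, List, Tuple

# A header path is a tuple of header labels from the top level down.
Path = Tuple[str, ...]

# Hypothetical toy table (invented values): rows are Country > Province,
# columns are Year > Sex. Each cell is keyed by (row path, column path).
CELLS: Dict[Tuple[Path, Path], float] = {
    (("Canada", "Ontario"), ("2019", "Men")): 10.0,
    (("Canada", "Ontario"), ("2019", "Women")): 12.0,
    (("Canada", "Ontario"), ("2020", "Men")): 11.0,
    (("Canada", "Ontario"), ("2020", "Women")): 13.0,
    (("Canada", "Quebec"), ("2019", "Men")): 7.0,
    (("Canada", "Quebec"), ("2019", "Women")): 8.0,
    (("Canada", "Quebec"), ("2020", "Men")): 6.0,
    (("Canada", "Quebec"), ("2020", "Women")): 9.0,
}


def select(
    cells: Dict[Tuple[Path, Path], float],
    row_prefix: Path = (),
    col_prefix: Path = (),
) -> List[float]:
    """Return values of all cells whose header paths start with the given prefixes."""
    return [
        value
        for (row, col), value in cells.items()
        if row[: len(row_prefix)] == row_prefix
        and col[: len(col_prefix)] == col_prefix
    ]


# A full path on both axes identifies exactly one cell.
ontario_men_2019 = select(CELLS, ("Canada", "Ontario"), ("2019", "Men"))

# A partial path selects a whole subtree, which is where implicit
# calculation relationships (such as a parent total) come from.
canada_2019_total = sum(select(CELLS, ("Canada",), ("2019",)))
```

In this sketch, a question such as "how many people in Ontario in 2020?" requires resolving a partial column path (`("2020",)`) and aggregating over its children, which is the kind of reasoning that flat-table methods do not model directly.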

research
03/17/2019

Question Answering via Web Extracted Tables and Pipelined Models

In this paper, we describe a dataset and baseline result for a question ...
research
09/19/2023

Enhancing Open-Domain Table Question Answering via Syntax- and Structure-aware Dense Retrieval

Open-domain table question answering aims to provide answers to a questi...
research
05/24/2023

TACR: A Table-alignment-based Cell-selection and Reasoning Model for Hybrid Question-Answering

Hybrid Question-Answering (HQA), which targets reasoning over tables and...
research
07/08/2022

OmniTab: Pretraining with Natural and Synthetic Data for Few-shot Table-based Question Answering

The information in tables can be an important complement to text, making...
research
05/17/2021

TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance

Hybrid data combining both tabular and textual content (e.g., financial ...
research
05/12/2023

Open-WikiTable: Dataset for Open Domain Question Answering with Complex Reasoning over Table

Despite recent interest in open domain question answering (ODQA) over ta...
research
11/23/2022

DyRRen: A Dynamic Retriever-Reranker-Generator Model for Numerical Reasoning over Tabular and Textual Data

Numerical reasoning over hybrid data containing tables and long texts ha...
