Long Text and Multi-Table Summarization: Dataset and Method

02/08/2023
by Shuaiqi Liu, et al.

Automatic document summarization aims to produce a concise summary that covers the salient information in an input document. Within a report, that information can be scattered across both textual and non-textual content. However, existing document summarization datasets and methods usually focus on the text and filter out non-textual content. Omitting tabular data can limit the informativeness of the produced summaries, especially when a summary needs to include quantitative descriptions of critical metrics found in tables. Existing datasets and methods cannot meet the requirements of summarizing the long text and multiple tables in each report. To address the scarcity of available data, we propose FINDSum, the first large-scale dataset for long text and multi-table summarization. Built on 21,125 annual reports from 3,794 companies, it has two subsets for summarizing each company's results of operations and liquidity. To summarize the long text and dozens of tables in each report, we present three types of summarization methods. We also propose a set of evaluation metrics to assess the usage of numerical information in produced summaries. Dataset analyses and experimental results indicate the importance of jointly considering input textual and tabular data when summarizing report documents.


