TGSum: Build Tweet Guided Multi-Document Summarization Dataset

11/26/2015
by Ziqiang Cao, et al.

Summarization research has been significantly hampered by the cost of acquiring reference summaries. This paper proposes an effective way to automatically collect large-scale, news-related multi-document summaries by drawing on social media reactions. We use two types of social labels in tweets: hashtags and hyperlinks. Hashtags are used to cluster documents into topic sets, and a tweet with a hyperlink often highlights key points of the linked document. For each linked document cluster, we synthesize a reference summary that covers most of these key points. To this end, we adopt the ROUGE metrics to measure the coverage ratio and develop an Integer Linear Programming solution to find the sentence set that reaches the upper bound of ROUGE. Because summary sentences may be selected from both documents and high-quality tweets, the generated reference summaries can be abstractive. Both the informativeness and the readability of the collected summaries are verified by manual judgment. In addition, we train a Support Vector Regression summarizer on DUC generic multi-document summarization benchmarks. With the collected data as an extra training resource, the summarizer's performance improves considerably on all the test sets. We release this dataset for further research.
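The selection step described in the abstract can be illustrated with a small sketch. This is not the authors' formulation: the abstract names an Integer Linear Programming solution, whereas the code below swaps in an exhaustive search over sentence subsets, which finds the same kind of optimum but is only practical for a handful of candidates. The choice of clipped bigram recall as the ROUGE variant, the word budget, and all function names (`bigrams`, `rouge2_recall`, `best_subset`) are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def bigrams(text):
    """Return a multiset of lowercase word bigrams for one sentence."""
    words = text.lower().split()
    return Counter(zip(words, words[1:]))


def rouge2_recall(summary_sentences, reference_tweets):
    """Clipped bigram recall of a candidate summary against the linked tweets."""
    ref = Counter()
    for tweet in reference_tweets:
        ref += bigrams(tweet)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    cand = Counter()
    for sentence in summary_sentences:
        cand += bigrams(sentence)
    # Each reference bigram counts at most as often as it occurs in the summary.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


def best_subset(candidates, reference_tweets, max_words):
    """Exhaustive stand-in for the ILP: best-scoring subset within a word budget."""
    best, best_score = (), 0.0
    for k in range(1, len(candidates) + 1):
        for subset in combinations(candidates, k):
            if sum(len(s.split()) for s in subset) > max_words:
                continue  # violates the (assumed) length constraint
            score = rouge2_recall(subset, reference_tweets)
            if score > best_score:
                best, best_score = subset, score
    return list(best), best_score


# Hypothetical toy data: two tweets linking to the same document cluster.
tweets = [
    "quake hits coastal city",
    "rescue teams reach coastal city",
]
# Candidate sentences; per the abstract these could also include tweets.
candidates = [
    "a strong quake hits coastal city overnight",
    "rescue teams reach coastal city by dawn",
    "officials will hold a press briefing later",
]

chosen, score = best_subset(candidates, tweets, max_words=14)
print(chosen, score)
```

On this toy input the first two candidates together cover every tweet bigram within the 14-word budget, so they are selected and the third sentence is left out. An ILP solver would express the same objective with binary variables for sentences and n-grams, avoiding the exponential enumeration used here.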


research
05/12/2000

Centroid-based summarization of multiple documents: sentence extraction, utility-based evaluation, and user studies

We present a multi-document summarizer, called MEAD, which generates sum...
research
07/08/2015

Multi-Document Summarization via Discriminative Summary Reranking

Existing multi-document summarization systems usually rely on a specific...
research
01/06/2017

Enumeration of Extractive Oracle Summaries

To analyze the limitations and the future directions of the extractive s...
research
04/11/2023

LBMT team at VLSP2022-Abmusu: Hybrid method with text correlation and generative models for Vietnamese multi-document summarization

Multi-document summarization is challenging because the summaries should...
research
09/01/2020

SuperPAL: Supervised Proposition ALignment for Multi-Document Summarization and Derivative Sub-Tasks

Multi-document summarization (MDS) is a challenging task, often decompos...
research
06/26/2023

Vietnamese multi-document summary using subgraph selection approach – VLSP 2022 AbMuSu Shared Task

Document summarization is a task to generate a fluent, condensed summary ...
research
06/28/2023

Leveraging GPT-4 for Food Effect Summarization to Enhance Product-Specific Guidance Development via Iterative Prompting

Food effect summarization from New Drug Application (NDA) is an essentia...
