Global Voices: Crossing Borders in Automatic News Summarization

10/01/2019
by Khanh Nguyen, et al.

We construct Global Voices, a multilingual dataset for evaluating cross-lingual summarization methods. We extract social-network descriptions of Global Voices news articles to cheaply collect evaluation data for into-English and from-English summarization in 15 languages. In particular, for the into-English summarization task, we crowd-source a high-quality evaluation dataset based on guidelines that emphasize accuracy, coverage, and understandability. To ensure the quality of this dataset, we collect human ratings to filter out bad summaries and conduct a human survey, which shows that the remaining summaries are preferred over the social-network summaries. We study the effect of translation quality in cross-lingual summarization, comparing a translate-then-summarize approach with several baselines. Our results highlight limitations of the ROUGE metric that are overlooked in monolingual summarization.
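For readers unfamiliar with the metric mentioned above, ROUGE scores a candidate summary by its n-gram overlap with a reference summary. The following is a minimal sketch of the unigram variant (ROUGE-1), not the authors' evaluation setup: the function name `rouge1`, the lowercasing, and the whitespace tokenization are all assumptions made for illustration.

```python
from collections import Counter


def rouge1(candidate: str, reference: str) -> dict:
    """Illustrative ROUGE-1: clipped unigram overlap between two texts.

    Assumption: tokens are lowercased and split on whitespace, which is
    cruder than the tokenization used by standard ROUGE toolkits.
    """
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())

    # Counter intersection keeps the minimum count of each shared token,
    # so a repeated word is only credited as often as it appears in both.
    overlap = sum((cand & ref).values())

    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat", "the cat sat on the mat"))
```

Because the score depends only on shared surface tokens, a candidate that rephrases the reference with different words scores low under this sketch even when its meaning is preserved.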


research
08/31/2019

NCLS: Neural Cross-Lingual Summarization

Cross-lingual summarization (CLS) is the task to produce a summary in on...
research
12/08/2020

Cross-lingual Approach to Abstractive Summarization

Automatic text summarization extracts important information from texts a...
research
05/30/2022

X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

The number of scientific publications nowadays is rapidly increasing, ca...
research
06/22/2023

Cross-lingual Cross-temporal Summarization: Dataset, Models, Evaluation

While summarization has been extensively researched in natural language ...
research
03/07/2023

CroCoSum: A Benchmark Dataset for Cross-Lingual Code-Switched Summarization

Cross-lingual summarization (CLS) has attracted increasing interest in r...
research
04/21/2022

Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Relevant and timely information collected from social media during crise...
research
10/24/2022

EUR-Lex-Sum: A Multi- and Cross-lingual Dataset for Long-form Summarization in the Legal Domain

Existing summarization datasets come with two main drawbacks: (1) They t...
