X-SCITLDR: Cross-Lingual Extreme Summarization of Scholarly Documents

05/30/2022
by   Sotaro Takeshita, et al.

The number of scientific publications is growing rapidly, causing information overload and making it hard for researchers to keep up with current trends and lines of work. Consequently, recent work on text mining for scholarly publications has investigated automatic text summarization, including extreme summarization, for this domain. However, previous work has concentrated only on monolingual settings, primarily in English. In this paper, we fill this research gap and present an abstractive cross-lingual summarization dataset for four languages in the scholarly domain, which enables us to train and evaluate models that process English papers and generate summaries in German, Italian, Chinese and Japanese. We present our new X-SCITLDR dataset for multilingual summarization and thoroughly benchmark different models based on a state-of-the-art multilingual pre-trained model, including a two-stage 'summarize and translate' approach and a direct cross-lingual model. We additionally explore the benefits of intermediate-stage training, using English monolingual summarization and machine translation as intermediate tasks, and analyze performance in zero- and few-shot scenarios.
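
The difference between the two-stage 'summarize and translate' approach and the direct cross-lingual model can be illustrated with a minimal sketch. This is not the authors' implementation: all function and type names below are hypothetical, and the toy summarizer and translator are placeholders standing in for trained models, so that the control flow can be run as is.

```python
from typing import Callable

# Hypothetical type aliases; these names are assumptions, not from the paper.
Summarizer = Callable[[str], str]                   # English document -> English summary
Translator = Callable[[str, str], str]              # (English text, target language) -> translation
CrossLingualSummarizer = Callable[[str, str], str]  # (English document, target language) -> summary


def summarize_and_translate(document: str, target_lang: str,
                            summarize: Summarizer, translate: Translator) -> str:
    """Two-stage pipeline: summarize in English first, then translate the summary."""
    english_summary = summarize(document)
    return translate(english_summary, target_lang)


def direct_cross_lingual(document: str, target_lang: str,
                         model: CrossLingualSummarizer) -> str:
    """Direct pipeline: a single model reads English and writes the target-language summary."""
    return model(document, target_lang)


# Toy stand-ins (assumptions) so the sketch runs without any trained model.
def toy_summarize(document: str) -> str:
    # Keep only the first sentence as a crude one-sentence summary.
    return document.split(". ")[0].rstrip(".") + "."


def toy_translate(text: str, target_lang: str) -> str:
    # Tag the text with the language code instead of actually translating it.
    return f"[{target_lang}] {text}"


def toy_cross_lingual(document: str, target_lang: str) -> str:
    # Placeholder for a model trained end to end on cross-lingual pairs.
    return toy_translate(toy_summarize(document), target_lang)


if __name__ == "__main__":
    paper = "We present a dataset. It covers four languages."
    for lang in ("de", "it", "zh", "ja"):
        print(summarize_and_translate(paper, lang, toy_summarize, toy_translate))
        print(direct_cross_lingual(paper, lang, toy_cross_lingual))
```

In the two-stage variant, errors from the summarizer propagate into the translator, whereas the direct variant needs paired cross-lingual training data; the sketch only shows how the components are composed, not how either is trained.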


