Revisiting Cross-Lingual Summarization: A Corpus-based Study and A New Benchmark with Improved Annotation

07/08/2023
by   Yulong Chen, et al.

Most existing cross-lingual summarization (CLS) work constructs CLS corpora by directly translating pre-annotated summaries from one language to another, so the resulting corpora can contain errors from both the summarization and the translation processes. To address this issue, we propose ConvSumX, a cross-lingual conversation summarization benchmark built with a new annotation schema that explicitly considers the source input context. ConvSumX consists of 2 sub-tasks under different real-world scenarios, each covering 3 language directions. We conduct a thorough analysis of ConvSumX and 3 widely used manually annotated CLS corpora and empirically find that ConvSumX is more faithful to the input text. Additionally, based on the same intuition, we propose a 2-Step method, which takes both the conversation and the summary as input to simulate the human annotation process. Experimental results show that the 2-Step method surpasses strong baselines on ConvSumX under both automatic and human evaluation. Analysis shows that both the source input text and the summary are crucial for modeling cross-lingual summaries.
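
The data flow of a 2-Step approach can be illustrated with a minimal sketch. The abstract only states that the method takes both the conversation and the summary as input; everything else here is an assumption for illustration: that the first step produces a source-language summary, the function names `two_step_summarize`, `toy_summarize`, and `toy_generate`, and the toy stand-ins used in place of real models.

```python
from typing import Callable, List, Tuple


def two_step_summarize(
    conversation: List[str],
    summarize: Callable[[str], str],
    cross_lingual_generate: Callable[[str, str], str],
) -> Tuple[str, str]:
    """Return (source-language summary, target-language summary).

    Both callables are hypothetical placeholders for real models.
    """
    source_text = "\n".join(conversation)
    # Step 1 (assumed): summarize the conversation in its own language.
    mono_summary = summarize(source_text)
    # Step 2: generate the target-language summary from BOTH the
    # conversation and the summary, not from the summary alone.
    target_summary = cross_lingual_generate(source_text, mono_summary)
    return mono_summary, target_summary


# Toy stand-ins so the sketch runs without any model.
def toy_summarize(text: str) -> str:
    # Pretend the first utterance is the summary.
    return text.split("\n")[0]


def toy_generate(source_text: str, mono_summary: str) -> str:
    # Tag the output to show that both inputs were available.
    n_turns = len(source_text.split("\n"))
    return f"[target-language | {n_turns} turns seen] {mono_summary}"


dialogue = ["A: Lunch at noon?", "B: Sure, see you then."]
mono, target = two_step_summarize(dialogue, toy_summarize, toy_generate)
```

Passing the conversation into the second step is the point of the design: a pipeline that translates the summary alone cannot consult the source context to resolve ambiguities or correct summarization errors.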
