Making Science Simple: Corpora for the Lay Summarisation of Scientific Literature

10/18/2022
by Tomas Goldsack, et al.

Lay summarisation aims to jointly summarise and simplify a given text, thus making its content more comprehensible to non-experts. Automatic approaches for lay summarisation can provide significant value in broadening access to scientific literature, enabling a greater degree of both interdisciplinary knowledge sharing and public understanding when it comes to research findings. However, current corpora for this task are limited in their size and scope, hindering the development of broadly applicable data-driven approaches. Aiming to rectify these issues, we present two novel lay summarisation datasets, PLOS (large-scale) and eLife (medium-scale), each of which contains biomedical journal articles alongside expert-written lay summaries. We provide a thorough characterisation of our lay summaries, highlighting differing levels of readability and abstractiveness between datasets that can be leveraged to support the needs of different applications. Finally, we benchmark our datasets using mainstream summarisation approaches and perform a manual evaluation with domain experts, demonstrating their utility and casting light on the key challenges of this task.
