Paragraph-level Simplification of Medical Texts

04/12/2021
by Ashwin Devaraj, et al.

We consider the problem of learning to simplify medical texts. This is important because most reliable, up-to-date information in biomedicine is dense with jargon and thus practically inaccessible to a lay audience. Furthermore, manual simplification does not scale to the rapidly growing body of biomedical literature, which motivates the need for automated approaches. Unfortunately, there are no large-scale resources available for this task. In this work, we introduce a new corpus of parallel texts in English comprising technical and lay summaries of all published evidence pertaining to different clinical topics. We then propose a new metric based on likelihood scores from a masked language model pretrained on scientific texts. We show that this automated measure better differentiates between technical and lay summaries than existing heuristics. We introduce and evaluate baseline encoder-decoder Transformer models for simplification and propose a novel augmentation to these in which we explicitly penalize the decoder for producing "jargon" terms; we find that this yields improvements over baselines in terms of readability.
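The abstract does not say how the decoder is penalized for producing "jargon" terms. As an illustration only, the sketch below assumes an unlikelihood-style term that adds -w * log(1 - p(token)) for each jargon token at each decoding step. The function names, the per-token weights, and the `alpha` mixing coefficient are hypothetical and are not taken from the paper.

```python
import math

EPS = 1e-12  # guards against log(0)


def jargon_penalty(step_probs, jargon_weights):
    """Assumed unlikelihood-style penalty (not the paper's stated method).

    step_probs: list of dicts mapping token -> decoder probability, one per step.
    jargon_weights: dict mapping jargon token -> non-negative weight.
    Returns the sum over steps and jargon tokens of -w * log(1 - p(token)).
    """
    total = 0.0
    for probs in step_probs:
        for token, weight in jargon_weights.items():
            p = probs.get(token, 0.0)
            total += -weight * math.log(max(1.0 - p, EPS))
    return total


def total_loss(step_probs, target_tokens, jargon_weights, alpha=1.0):
    """Negative log-likelihood of the reference plus the weighted penalty."""
    nll = 0.0
    for probs, target in zip(step_probs, target_tokens):
        nll += -math.log(max(probs.get(target, 0.0), EPS))
    return nll + alpha * jargon_penalty(step_probs, jargon_weights)


if __name__ == "__main__":
    # Toy one-step distribution over a two-word vocabulary.
    steps = [{"heart": 0.5, "myocardial": 0.5}]
    print(total_loss(steps, ["heart"], {"myocardial": 1.0}))
```

Under this assumption, the penalty is zero when the decoder assigns no probability to jargon tokens and grows as that probability approaches one, so it pushes probability mass toward plainer vocabulary without changing the reference targets.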


