Clustering of Deep Contextualized Representations for Summarization of Biomedical Texts

08/06/2019
by Milad Moradi, et al.

In recent years, summarizers that incorporate domain knowledge into the process of text summarization have outperformed generic methods, especially for summarization of biomedical texts. However, construction and maintenance of domain knowledge bases are resource-intensive tasks requiring significant manual annotation. In this paper, we demonstrate that contextualized representations extracted from the pre-trained deep language model BERT can be effectively used to measure the similarity between sentences and to quantify the informative content. The results show that our BERT-based summarizer can improve the performance of biomedical summarization. Although the summarizer does not use any source of domain knowledge, it can capture the context of sentences more accurately than the comparison methods. The source code and data are available at https://github.com/BioTextSumm/BERT-based-Summ.
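To make the idea of clustering sentence representations concrete, here is a minimal sketch in Python. It is not the authors' implementation: the abstract does not say which clustering algorithm, similarity measure, or selection rule the summarizer uses, so the greedy centroid clustering, the cosine similarity, the threshold, and the function names `cluster_sentences` and `select_summary` are all assumptions made for illustration. The three-dimensional vectors are made-up stand-ins for the much larger contextualized vectors a BERT model would produce.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def centroid(vectors, indices):
    """Component-wise mean of the vectors at the given indices."""
    dim = len(vectors[indices[0]])
    return [sum(vectors[i][d] for i in indices) / len(indices) for d in range(dim)]

def cluster_sentences(vectors, threshold):
    """Greedy single-pass clustering (an assumed, illustrative choice).

    Each sentence joins the most similar existing cluster if the similarity
    to that cluster's centroid reaches the threshold; otherwise it starts
    a new cluster. Returns a list of clusters, each a list of sentence indices.
    """
    clusters = []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for members in clusters:
            sim = cosine(vec, centroid(vectors, members))
            if sim >= best_sim:
                best, best_sim = members, sim
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters

def select_summary(vectors, clusters, k):
    """Pick one sentence from each of the k largest clusters.

    The sentence closest to its cluster centroid is taken as the
    representative; indices are returned in document order.
    """
    largest = sorted(clusters, key=len, reverse=True)[:k]
    chosen = []
    for members in largest:
        center = centroid(vectors, members)
        chosen.append(max(members, key=lambda i: cosine(vectors[i], center)))
    return sorted(chosen)

# Hypothetical toy vectors standing in for sentence representations.
vectors = [
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
    [1.0, 0.0, 0.1],
]
clusters = cluster_sentences(vectors, threshold=0.8)
summary = select_summary(vectors, clusters, k=2)
print(clusters, summary)
```

In a real pipeline the toy vectors would be replaced by representations extracted from a pre-trained model, and the threshold and number of selected sentences would need tuning against reference summaries.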


