Clustering of Deep Contextualized Representations for Summarization of Biomedical Texts

by Milad Moradi, et al.

In recent years, summarizers that incorporate domain knowledge into the process of text summarization have outperformed generic methods, especially for summarization of biomedical texts. However, construction and maintenance of domain knowledge bases are resource-intensive tasks requiring significant manual annotation. In this paper, we demonstrate that contextualized representations extracted from the pre-trained deep language model BERT can be effectively used to measure the similarity between sentences and to quantify the informative content. The results show that our BERT-based summarizer can improve the performance of biomedical summarization. Although the summarizer does not use any sources of domain knowledge, it can capture the context of sentences more accurately than the comparison methods. The source code and data are available at
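The general idea in the abstract (embed sentences, group similar ones, pick representatives) can be illustrated with a minimal sketch. This is not the authors' released code. It assumes sentence embeddings have already been extracted from a model such as BERT and uses small toy vectors in their place. It also swaps in a simple cosine-based k-means for whatever clustering procedure the paper uses, and picks the sentence nearest each cluster centroid as a stand-in for the paper's informativeness scoring. The function names `unit_rows` and `cluster_and_select` are hypothetical.

```python
import numpy as np


def unit_rows(x):
    """Scale each row to unit length so dot products equal cosine similarity."""
    x = np.asarray(x, dtype=float)
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.clip(norms, 1e-12, None)


def cluster_and_select(embeddings, num_clusters, num_iters=20):
    """Cluster sentence embeddings by cosine similarity and pick one
    representative sentence index per cluster.

    Assumption: `embeddings` is an (n_sentences, dim) array that would come
    from a contextual encoder such as BERT; any vectors work for this sketch.
    """
    unit = unit_rows(embeddings)

    # Deterministic farthest-point initialization: start from sentence 0,
    # then repeatedly add the sentence least similar to the chosen centers.
    centers = [0]
    while len(centers) < num_clusters:
        nearest_sim = (unit @ unit[centers].T).max(axis=1)
        centers.append(int(nearest_sim.argmin()))
    centroids = unit[centers].copy()

    # Cosine-based k-means: assign to the most similar centroid, then
    # recompute each centroid as the normalized mean of its members.
    for _ in range(num_iters):
        labels = (unit @ centroids.T).argmax(axis=1)
        for k in range(num_clusters):
            members = unit[labels == k]
            if len(members):
                mean = members.mean(axis=0)
                centroids[k] = mean / max(np.linalg.norm(mean), 1e-12)
    labels = (unit @ centroids.T).argmax(axis=1)

    # Representative of each cluster: the member closest to its centroid.
    selected = []
    for k in range(num_clusters):
        idx = np.flatnonzero(labels == k)
        if idx.size:
            selected.append(int(idx[(unit[idx] @ centroids[k]).argmax()]))
    return labels, sorted(selected)


# Toy stand-ins for sentence embeddings: two obvious groups of three.
toy_embeddings = [
    [1.0, 0.0], [0.9, 0.1], [0.8, 0.2],
    [0.0, 1.0], [0.1, 0.9], [0.2, 0.8],
]
labels, summary_indices = cluster_and_select(toy_embeddings, num_clusters=2)
print(labels.tolist())   # [0, 0, 0, 1, 1, 1]
print(summary_indices)   # [1, 4]
```

In a real pipeline the selected indices would map back to the original sentences, emitted in document order to form the extractive summary.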




BioBERT: a pre-trained biomedical language representation model for biomedical text mining

Biomedical text mining is becoming increasingly important as the number ...

Graph-Augmented Cyclic Learning Framework for Similarity Estimation of Medical Clinical Notes

Semantic textual similarity (STS) in the clinical domain helps improve d...

A Survey on Biomedical Text Summarization with Pre-trained Language Model

The exponential growth of biomedical texts such as biomedical literature...

Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

Citation texts are sometimes not very informative or in some cases inacc...

Enriching language models with graph-based context information to better understand textual data

A considerable number of texts encountered daily are somehow connected w...

SuMe: A Dataset Towards Summarizing Biomedical Mechanisms

Can language models read biomedical texts and explain the biomedical mec...

PARADE: A New Dataset for Paraphrase Identification Requiring Computer Science Domain Knowledge

We present a new benchmark dataset called PARADE for paraphrase identifi...
