Event-based clinical findings extraction from radiology reports with pre-trained language model

by Wilson Lau, et al.

Radiology reports contain a diverse and rich set of clinical abnormalities documented by radiologists during their interpretation of the images. Comprehensive semantic representations of radiological findings would enable a wide range of secondary use applications to support diagnosis, triage, outcomes prediction, and clinical research. In this paper, we present a new corpus of radiology reports annotated with clinical findings. Our annotation schema captures detailed representations of pathologic findings that are observable on imaging ("lesions") and other types of clinical problems ("medical problems"). The schema used an event-based representation to capture fine-grained details, including assertion, anatomy, characteristics, size, count, etc. Our gold standard corpus contained a total of 500 annotated computed tomography (CT) reports. We extracted triggers and argument entities using two state-of-the-art deep learning architectures, including BERT. We then predicted the linkages between trigger and argument entities (referred to as argument roles) using a BERT-based relation extraction model. We achieved the best extraction performance using a BERT model pre-trained on 3 million radiology reports from our institution: 90.9 ... argument roles. To assess model generalizability, we used an external validation set randomly sampled from the MIMIC Chest X-ray (MIMIC-CXR) database. The extraction performance on this validation set was 95.6 for finding triggers and 79.1 ... model generalized well to the cross-institutional data with a different imaging modality. We extracted the finding events from all the radiology reports in the MIMIC-CXR database and provided the extractions to the research community.
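The event-based representation described in the abstract (a finding trigger linked to argument entities through argument roles) can be pictured with a small data structure. The following is a minimal sketch only, not the authors' schema or code: the class names `Span` and `FindingEvent`, the helper `find_span`, the label and role strings, and the example sentence are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass(frozen=True)
class Span:
    """A labeled text span, located by character offsets in the report."""
    label: str  # illustrative label, e.g. "Lesion", "Anatomy", "Size"
    start: int
    end: int
    text: str


@dataclass
class FindingEvent:
    """One finding: a trigger span plus the argument spans linked to it."""
    trigger: Span
    # Each entry pairs an argument role name with the linked argument span.
    arguments: List[Tuple[str, Span]] = field(default_factory=list)

    def link(self, role: str, span: Span) -> None:
        """Attach an argument span to this event under the given role."""
        self.arguments.append((role, span))

    def by_role(self, role: str) -> List[str]:
        """Return the text of every argument linked under the given role."""
        return [span.text for r, span in self.arguments if r == role]


def find_span(report: str, label: str, phrase: str) -> Span:
    """Hypothetical helper: locate the first occurrence of a phrase."""
    start = report.index(phrase)
    return Span(label, start, start + len(phrase), phrase)


# Made-up sentence, not taken from the corpus.
report = "There is a 5 mm nodule in the right upper lobe."

event = FindingEvent(trigger=find_span(report, "Lesion", "nodule"))
event.link("Size", find_span(report, "Size", "5 mm"))
event.link("Anatomy", find_span(report, "Anatomy", "right upper lobe"))

print(event.trigger.text, event.by_role("Size"), event.by_role("Anatomy"))
```

In this sketch, entity extraction would produce the `Span` objects and relation extraction would decide which `link` calls to make, mirroring the two-stage pipeline the abstract outlines.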






Code Repositories


An easy-to-use deep learning framework to extract events (named entities, relations) from unstructured text using BERT based pre-trained language models.
