BI-RADS BERT Using Section Tokenization to Understand Radiology Reports

10/14/2021
by Grey Kuling, et al.

Radiology reports are the main form of communication between radiologists and other clinicians and contain important information for patient care. However, in order to use this information for research, it is necessary to convert the raw text into structured data suitable for analysis. Domain-specific contextual word embeddings have been shown to achieve impressive accuracy at such natural language processing tasks in medicine. In this work we pre-trained a contextual embedding BERT model using breast radiology reports and developed a classifier that incorporated the embedding with auxiliary global textual features in order to perform a section tokenization task. This model achieved 98% accuracy in segregating free-text reports into the sections of information outlined in the Breast Imaging Reporting and Data System (BI-RADS) lexicon, a significant improvement over the Classic BERT model without auxiliary information. We then evaluated whether using section tokenization improved the downstream extraction of the following fields: modality/procedure, previous cancer, menopausal status, purpose of exam, breast density, and background parenchymal enhancement. Using the BERT model pre-trained on breast radiology reports combined with section tokenization resulted in an overall accuracy of 95.9% in field extraction, a 17% improvement over field extraction with models without section tokenization and with Classic BERT embeddings. Our work shows the strength of using BERT in radiology report analysis and the advantages of section tokenization in identifying key features of patient factors recorded in breast radiology reports.
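The core idea described above is a per-sentence classifier whose input concatenates a contextual sentence embedding with auxiliary global textual features, and whose output is a BI-RADS-style section label. The following is a minimal sketch of that architecture, not the authors' implementation: the embedding function is a deterministic stand-in for a BERT [CLS] vector, the section names and auxiliary features are illustrative assumptions, and the linear weights are untrained.

```python
import hashlib
import numpy as np

# Illustrative BI-RADS-style section labels (assumed, not the paper's exact set).
SECTIONS = ["Title", "Clinical History", "Findings", "Impression"]

def sentence_embedding(sentence, dim=8):
    # Stand-in for a BERT contextual embedding: a deterministic
    # hash-seeded random vector, so the sketch runs without a model.
    seed = int(hashlib.md5(sentence.encode()).hexdigest(), 16) % (2**32)
    return np.random.default_rng(seed).standard_normal(dim)

def auxiliary_features(sentence, position, total):
    # Hypothetical "global textual features" of the kind the abstract
    # alludes to: relative position in the report, sentence length,
    # and presence of digits.
    return np.array([
        position / total,
        len(sentence.split()) / 20.0,
        float(any(c.isdigit() for c in sentence)),
    ])

def classify(sentence, position, total, W, b):
    # Concatenate the contextual embedding with the auxiliary features,
    # then apply a linear classifier over section labels.
    x = np.concatenate([
        sentence_embedding(sentence),
        auxiliary_features(sentence, position, total),
    ])
    logits = W @ x + b
    return SECTIONS[int(np.argmax(logits))]

# Toy usage with random, untrained weights (seeded for reproducibility):
rng = np.random.default_rng(0)
W = rng.standard_normal((len(SECTIONS), 8 + 3))
b = np.zeros(len(SECTIONS))

report = [
    "MRI breast with contrast.",
    "History of left breast cancer.",
    "No suspicious enhancement.",
    "BI-RADS 2: benign.",
]
labels = [classify(s, i, len(report), W, b) for i, s in enumerate(report)]
print(labels)
```

In the actual system, `sentence_embedding` would be replaced by the pre-trained breast-radiology BERT encoder and `W`, `b` learned by fine-tuning on labeled reports; the sketch only shows how the two feature streams combine before classification.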

