On the diminishing return of labeling clinical reports

10/27/2020
by Jean Baptiste Lamare et al.

Ample evidence suggests that, for natural language processing (NLP) problems in non-medical domains, better machine learning models can be steadily obtained by training on ever larger datasets. Whether the same holds for medical NLP has so far not been thoroughly investigated. This work shows that it is not always the case. We report the somewhat counter-intuitive observation that performant medical NLP models can be obtained with a small amount of labeled data, contrary to common belief, most likely because of the domain specificity of the problem. We quantify the effect of training data size on abnormality classification, using a fixed test set composed of two of the largest public chest X-ray radiology report datasets. The trained models not only use the training data efficiently but also outperform the current state-of-the-art rule-based systems by a significant margin.
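The measurement described above (train on progressively larger labeled subsets, score every model on one fixed test set, and watch how much each extra label buys) can be sketched as a learning-curve loop. This is a minimal sketch under stated assumptions, not the authors' code: the helper names `nested_subsets`, `learning_curve` and `marginal_gains` and the `train_and_score` callback are hypothetical, and drawing nested random subsets is one common design choice that the paper may or may not use.

```python
from __future__ import annotations

import random
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def nested_subsets(pool: Sequence[T], sizes: Sequence[int], seed: int = 0) -> list[list[T]]:
    """Draw nested random training subsets, smallest first.

    One fixed shuffle is cut at each size, so every larger subset
    contains every smaller one.
    """
    if any(s < 1 or s > len(pool) for s in sizes):
        raise ValueError("each size must be between 1 and len(pool)")
    order = list(pool)
    random.Random(seed).shuffle(order)
    # Deduplicate sizes so consecutive curve points never share a size.
    return [order[:s] for s in sorted(set(sizes))]


def learning_curve(
    train_pool: Sequence[T],
    test_set: object,
    sizes: Sequence[int],
    train_and_score: Callable[[list[T], object], float],
    seed: int = 0,
) -> list[tuple[int, float]]:
    """Train on each subset and score against the same fixed test set."""
    curve = []
    for subset in nested_subsets(train_pool, sizes, seed):
        # Hypothetical callback: fit any report classifier on `subset`
        # and return a metric (e.g. F1) measured on `test_set`.
        curve.append((len(subset), train_and_score(subset, test_set)))
    return curve


def marginal_gains(curve: Sequence[tuple[int, float]]) -> list[tuple[int, float]]:
    """Metric improvement per additional labeled report between consecutive sizes."""
    return [
        (n1, (s1 - s0) / (n1 - n0))
        for (n0, s0), (n1, s1) in zip(curve, curve[1:])
    ]
```

Nesting the subsets means each larger training set contains every smaller one, so differences along the curve reflect the added reports rather than a different random draw. A `marginal_gains` sequence that shrinks toward zero is what a diminishing return on labeling looks like.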
