Clinical Reading Comprehension: A Thorough Analysis of the emrQA Dataset

05/01/2020
by Xiang Yue, et al.

Machine reading comprehension has made great progress in recent years owing to large-scale annotated datasets. In the clinical domain, however, creating such datasets is quite difficult because annotation requires domain expertise. Recently, Pampari et al. (EMNLP'18) tackled this issue by using expert-annotated question templates and existing i2b2 annotations to create emrQA, the first large-scale dataset for question answering (QA) based on clinical notes. In this paper, we provide an in-depth analysis of this dataset and the clinical reading comprehension (CliniRC) task. From our qualitative analysis, we find that (i) emrQA answers are often incomplete, and (ii) emrQA questions are often answerable without using domain knowledge. From our quantitative experiments, surprising results include that (iii) a model trained on a small sampled subset (5%-20%) of the data performs roughly as well as the model trained on the entire dataset, (iv) this performance is close to that of human experts, and (v) BERT models do not beat the best-performing base model. Following our analysis of emrQA, we further explore two desirable aspects of CliniRC systems: the ability to utilize clinical domain knowledge and the ability to generalize to unseen questions and contexts. We argue that both should be considered when creating future datasets.
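To make the template-based construction of emrQA more concrete, here is a minimal, hypothetical sketch of how expert-written question templates can be instantiated with existing annotations to produce QA pairs. It is not the actual emrQA pipeline: the template strings, the "|medication|" slot syntax, the MedicationAnnotation record and its field names, and the generate_qa helper are all illustrative assumptions loosely modeled on i2b2-style medication annotations.

```python
from dataclasses import dataclass


# Hypothetical annotation record, loosely modeled on i2b2-style medication
# annotations. Field names are assumptions, not the real i2b2 schema.
@dataclass
class MedicationAnnotation:
    medication: str
    dosage: str
    evidence_line: str  # the clinical-note line the annotation came from


# Hypothetical expert-written question templates; "|medication|" is an
# assumed placeholder slot filled from each annotation.
TEMPLATES = [
    "What is the dosage of |medication|?",
    "How much |medication| does the patient take?",
]


def generate_qa(note_id: str, annotations: list[MedicationAnnotation]) -> list[dict]:
    """Instantiate every template with every annotation, pairing each
    generated question with the annotated answer and its evidence line."""
    examples = []
    for ann in annotations:
        for template in TEMPLATES:
            examples.append({
                "note_id": note_id,
                "question": template.replace("|medication|", ann.medication),
                "answer": ann.dosage,
                "evidence": ann.evidence_line,
            })
    return examples


if __name__ == "__main__":
    anns = [MedicationAnnotation("lisinopril", "10 mg", "Lisinopril 10 mg PO daily.")]
    for ex in generate_qa("note-001", anns):
        print(ex["question"], "->", ex["answer"])
```

Because every question is produced by filling a slot in a fixed template, a handful of templates multiplied by many annotations yields a large dataset cheaply, and each answer is whatever span the underlying annotation happened to cover.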
