Crowdsourcing Ground Truth for Medical Relation Extraction

01/09/2017
by Anca Dumitrache, et al.

Cognitive computing systems require human-labeled data for evaluation, and often for training. The standard practice for gathering this data minimizes disagreement between annotators, and we have found that this results in data that fails to account for the ambiguity inherent in language. We have proposed the CrowdTruth method for collecting ground truth through crowdsourcing, which reconsiders the role of people in machine learning based on the observation that disagreement between annotators provides a useful signal for phenomena such as ambiguity in the text. We report on using this method to build an annotated data set for medical relation extraction for the cause and treat relations, and on how this data performed in a supervised training experiment. We demonstrate that by modeling ambiguity, labeled data gathered from crowd workers can (1) reach the quality level of domain experts for this task while reducing the cost, and (2) provide better training data at scale than distant supervision. We further propose and validate new weighted measures for precision, recall, and F-measure that account for ambiguity in both human and machine performance on this task.
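The abstract does not give the formulas behind its disagreement-aware scores or weighted measures, so the following is only a minimal sketch under stated assumptions. It assumes that each worker's choices for a sentence are summed into a sentence vector. It also assumes that a sentence's score for a relation is the cosine between that vector and the relation's unit vector. Finally, it assumes that each sentence's contribution to precision and recall is weighted by that score (s for gold-positive sentences, 1 - s for gold-negative ones). The label set, the 0.5 threshold, and all function names are hypothetical illustrations, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical label set for this sketch; the paper's task covered
# the "cause" and "treat" relations.
RELATIONS = ["cause", "treat", "none"]


def sentence_vector(worker_annotations: Sequence[Sequence[str]]) -> List[int]:
    """Sum each worker's selected relations into one count vector (assumed scheme)."""
    counts = [0] * len(RELATIONS)
    for chosen in worker_annotations:
        for rel in chosen:
            counts[RELATIONS.index(rel)] += 1
    return counts


def relation_score(vec: Sequence[int], relation: str) -> float:
    """Cosine between the sentence vector and the unit vector for `relation`.

    Against a unit vector, the cosine reduces to vec[r] / ||vec||.
    """
    norm = math.sqrt(sum(v * v for v in vec))
    if norm == 0:
        return 0.0
    return vec[RELATIONS.index(relation)] / norm


def weighted_prf(
    scores: Sequence[float],
    predictions: Sequence[bool],
    threshold: float = 0.5,  # hypothetical cut-off for the gold label
) -> Tuple[float, float, float]:
    """Precision, recall and F1 where each sentence counts by its crowd score.

    Assumed weighting: a sentence scoring s counts s toward TP/FN when it is
    gold-positive, and 1 - s toward FP when it is gold-negative.
    """
    tp = fp = fn = 0.0
    for s, pred in zip(scores, predictions):
        gold = s >= threshold
        if pred and gold:
            tp += s
        elif pred and not gold:
            fp += 1 - s
        elif not pred and gold:
            fn += s
        # True negatives do not enter precision or recall.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

As an example, suppose four workers annotate a sentence as [cause], [cause], [treat], and [cause, treat]. The sentence vector is then [3, 2, 0] and the cause score is 3/√13 ≈ 0.83. Under this weighting, a confident, clear-cut sentence that the model gets wrong costs more than an ambiguous one near the threshold. This reflects the idea of accounting for ambiguity in both human and machine performance.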

research
11/14/2017

False Positive and Cross-relation Signals in Distant Supervision Data

Distant supervision (DS) is a well-established method for relation extra...
research
09/24/2018

Empirical Methodology for Crowdsourcing Ground Truth

The process of gathering ground truth data through human annotation is a...
research
01/04/2023

Learning Ambiguity from Crowd Sequential Annotations

Most crowdsourcing learning methods treat disagreement between annotator...
research
12/08/2020

Improving Human-Labeled Data through Dynamic Automatic Conflict Resolution

This paper develops and implements a scalable methodology for (a) estima...
research
06/27/2023

"Is a picture of a bird a bird": Policy recommendations for dealing with ambiguity in machine vision models

Many questions that we ask about the world do not have a single clear an...
research
12/25/2020

Distributional Ground Truth: Non-Redundant Crowdsourcing Data Quality Control in UI Labeling Tasks

HCI increasingly employs Machine Learning and Image Recognition, in part...
research
06/26/2019

Eliciting Knowledge from Experts:Automatic Transcript Parsing for Cognitive Task Analysis

Cognitive task analysis (CTA) is a type of analysis in applied psycholog...
