Analyzing the Granularity and Cost of Annotation in Clinical Sequence Labeling

08/23/2021
by Haozhan Sun, et al.

As recent top studies show, well-annotated datasets are more important than ever to researchers in supervised machine learning (ML). However, the dataset annotation process and its associated human labor costs remain overlooked. In this work, we analyze the relationship between annotation granularity and ML performance in sequence labeling, using clinical records from nursing shift-change handover. We first study a model derived from textual language features alone, without additional information based on nursing knowledge. We find that this sequence tagger performs well in most categories at this granularity. We then add manual annotations by a nurse and find that sequence tagging performance remains nearly the same. Finally, we offer a guideline and reference for the community, arguing that annotating at a detailed granularity is unnecessary, and even not recommended, because of its low return on investment. We therefore recommend that researchers and practitioners emphasize other features, such as textual knowledge, as a cost-effective source for improving sequence labeling performance.

