LabelVizier: Interactive Validation and Relabeling for Technical Text Annotations

03/31/2023
by Xiaoyu Zhang, et al.

With the rapid accumulation of text data produced by data-driven techniques, the task of extracting "data annotations" (concise, high-quality data summaries from unstructured raw text) has become increasingly important. Recent advances in weak supervision and crowd-sourcing techniques provide promising solutions for efficiently creating annotations (labels) for large-scale technical text data. However, such annotations may fail in practice because of changes in annotation requirements, application scenarios, and modeling goals, in which case label validation and relabeling by domain experts are required. To address this issue, we present LabelVizier, a human-in-the-loop workflow that incorporates domain knowledge and user-specific requirements to reveal actionable insights into annotation flaws and then produce better-quality labels for large-scale multi-label datasets. We implement our workflow as an interactive notebook to facilitate flexible error profiling, in-depth annotation validation for three error types, and efficient annotation relabeling at different data scales. We evaluated the efficiency and generalizability of our workflow with two use cases and four expert reviews. The results indicate that LabelVizier is applicable in various application scenarios and assists domain experts with different knowledge backgrounds in efficiently improving technical text annotation quality.
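The abstract does not describe how LabelVizier profiles annotation errors. As a rough illustration of what a first-pass profile of a multi-label dataset might look like, the sketch below counts how often each label is used and how often label pairs co-occur, then flags rarely used labels as candidates for expert review. This is a generic technique, not the authors' implementation. The toy records, the `min_count` threshold, and the function names `profile_labels` and `flag_rare_labels` are all hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each record is (raw text, set of assigned labels).
records = [
    ("pump seal leaking", {"leak", "pump"}),
    ("replace pump bearing", {"pump", "replace"}),
    ("hydraulic leak at fitting", {"leak"}),
    ("pump noisy, check bearing", {"pump"}),
]

def profile_labels(records):
    """Count label frequencies and pairwise label co-occurrences."""
    label_counts = Counter()
    pair_counts = Counter()
    for _text, labels in records:
        label_counts.update(labels)
        # sorted() gives each unordered label pair a canonical (a, b) key
        pair_counts.update(combinations(sorted(labels), 2))
    return label_counts, pair_counts

def flag_rare_labels(label_counts, min_count):
    """Return labels used fewer than min_count times (hypothetical review rule)."""
    return sorted(label for label, n in label_counts.items() if n < min_count)

if __name__ == "__main__":
    counts, pairs = profile_labels(records)
    print(dict(counts))
    print(dict(pairs))
    print(flag_rare_labels(counts, min_count=2))
```

A frequency-and-co-occurrence profile is only a starting point: rare labels or unexpected label pairs point a domain expert to records worth inspecting, but deciding whether a label is actually wrong still requires the kind of expert validation the abstract describes.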

research
05/31/2023

Automated Annotation with Generative AI Requires Validation

Generative large language models (LLMs) can be a powerful tool for augme...
research
11/27/2021

Label Assistant: A Workflow for Assisted Data Annotation in Image Segmentation Tasks

Recent research in the field of computer vision strongly focuses on deep...
research
04/24/2020

TeamTat: a collaborative text annotation tool

Manually annotated data is key to developing text-mining and information...
research
11/30/2021

Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis

Obtaining annotations for large training sets is expensive, especially i...
research
05/22/2020

Knowledge Annotation for Intelligent Textbooks

With the increased popularity of electronic textbooks, there is a growin...
research
05/22/2020

Concept Annotation for Intelligent Textbooks

With the increased popularity of electronic textbooks, there is a growin...
research
06/25/2021

Semantic annotation for computational pathology: Multidisciplinary experience and best practice recommendations

Recent advances in whole slide imaging (WSI) technology have led to the ...
