OneLabeler: A Flexible System for Building Data Labeling Tools

03/27/2022
by Yu Zhang, et al.

Labeled datasets are essential for supervised machine learning. Various data labeling tools have been built to collect labels in different usage scenarios. However, developing labeling tools is time-consuming, costly, and demands software development expertise. In this paper, we propose a conceptual framework for data labeling and, based on that framework, OneLabeler, a system that supports easy building of labeling tools for diverse usage scenarios. The framework consists of common modules and states in labeling tools, summarized through coding of existing tools. OneLabeler supports configuration and composition of common software modules through visual programming to build data labeling tools. A module can be a human, machine, or mixed computation procedure in data labeling. We demonstrate the expressiveness and utility of the system through ten example labeling tools built with OneLabeler. A user study with developers provides evidence that OneLabeler supports efficient building of diverse data labeling tools.
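
To make the idea of composing human and machine modules over shared state more concrete, here is a minimal Python sketch. It is not OneLabeler's API, and it uses plain function composition rather than visual programming. Every name in it (`LabelingState`, `machine_default_labeling`, `make_human_labeling`, `compose`) is a hypothetical introduced only for illustration, and the keyword rule is a toy stand-in for a real model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# All names below are hypothetical; this is not OneLabeler's actual API.


@dataclass
class LabelingState:
    """Shared state that every module reads and updates."""
    items: List[str]
    labels: Dict[str, Optional[str]] = field(default_factory=dict)


# A module is any procedure that maps a state to an updated state.
Module = Callable[[LabelingState], LabelingState]


def machine_default_labeling(state: LabelingState) -> LabelingState:
    """Machine module: propose a default label with a toy keyword rule."""
    for item in state.items:
        if item not in state.labels:
            # None marks items the rule cannot decide on.
            state.labels[item] = "positive" if "good" in item else None
    return state


def make_human_labeling(answer_fn: Callable[[str], str]) -> Module:
    """Human module: ask an annotator (simulated by answer_fn) about
    every item that still has no label."""
    def module(state: LabelingState) -> LabelingState:
        for item in state.items:
            if state.labels.get(item) is None:
                state.labels[item] = answer_fn(item)
        return state
    return module


def compose(*modules: Module) -> Module:
    """Chain modules so each one receives the previous one's state."""
    def workflow(state: LabelingState) -> LabelingState:
        for module in modules:
            state = module(state)
        return state
    return workflow
```

In this sketch, a mixed procedure is simply a workflow that chains a machine module with a human module, for example `compose(machine_default_labeling, make_human_labeling(ask_annotator))`, where `ask_annotator` is whatever callback collects a label from a person.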


