Data Programming by Demonstration: A Framework for Interactively Learning Labeling Functions

09/03/2020
by   Sara Evensen, et al.
1

Data programming is a programmatic weak supervision approach to efficiently curate large-scale labeled training data. Writing data programs (labeling functions) requires, however, both programming literacy and domain expertise. Many subject matter experts have neither programming proficiency nor time to effectively write data programs. Furthermore, regardless of one's expertise in coding or machine learning, transferring domain expertise into labeling functions by enumerating rules and thresholds is not only time consuming but also inherently difficult. Here we propose a new framework, data programming by demonstration (DPBD), to generate labeling rules using interactive demonstrations of users. DPBD aims to relieve the burden of writing labeling functions from users, enabling them to focus on higher-level semantics such as identifying relevant signals for labeling tasks. We operationalize our framework with Ruler, an interactive system that synthesizes labeling rules for document classification by using span-level annotations of users on document examples. We compare Ruler with conventional data programming through a user study conducted with 10 data scientists creating labeling functions for sentiment and spam classification tasks. We find that Ruler is easier to use and learn and offers higher overall satisfaction, while providing discriminative model performances comparable to ones achieved by conventional data programming.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/24/2021

TagRuler: Interactive Tool for Span-Level Data Programming by Demonstration

Despite rapid developments in the field of machine learning research, co...
research
11/28/2017

Snorkel: Rapid Training Data Creation with Weak Supervision

Labeling training data is increasingly the largest bottleneck in deployi...
research
03/27/2022

OneLabeler: A Flexible System for Building Data Labeling Tools

Labeled datasets are essential for supervised machine learning. Various ...
research
08/01/2023

VideoPro: A Visual Analytics Approach for Interactive Video Programming

Constructing supervised machine learning models for real-world video ana...
research
03/02/2022

Nemo: Guiding and Contextualizing Weak Supervision for Interactive Data Programming

Weak Supervision (WS) techniques allow users to efficiently create large...
research
05/25/2016

Data Programming: Creating Large Training Sets, Quickly

Large labeled training sets are the critical building blocks of supervis...
research
03/14/2018

Adversarial Data Programming: Using GANs to Relax the Bottleneck of Curated Labeled Data

Paucity of large curated hand-labeled training data for every domain-of-...

Please sign up or login with your details

Forgot password? Click here to reset