Data Programming using Continuous and Quality-Guided Labeling Functions

11/22/2019
by Oishik Chatterjee, et al.

Scarcity of labeled data is a bottleneck for supervised learning models. One paradigm that has evolved to address this problem is data programming. An existing data programming paradigm lets human supervision be provided as a set of discrete labeling functions (LFs) that assign possibly noisy labels to input instances, together with a generative model for consolidating these weak labels. We enhance and generalize this paradigm by supporting functions that output a continuous score (instead of a hard label) that noisily correlates with the labels. We show across five applications that continuous LFs are more natural to program and lead to improved recall. We also show that the accuracy of existing generative models is unstable with respect to initialization, number of training epochs, and learning rate. We give the data programmer control over the training process by letting them provide an intuitive quality guide with each LF, and we propose an elegant method of incorporating these guides into the generative model. Our overall method, called CAGE, makes the data programming paradigm more reliable than other tricks based on initialization, sign-penalties, or soft-accuracy constraints.
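
To make the distinction between discrete and continuous LFs concrete, here is a minimal sketch in Python. It is not the CAGE implementation or any library's API: the spam task, the names `discrete_lf`, `continuous_lf`, `TRIGGER_WORDS`, and `ABSTAIN`, and the choice of score (the fraction of tokens that are trigger words) are all assumptions made for illustration only.

```python
from typing import Optional, Tuple

# All names below are hypothetical and chosen only for illustration.
ABSTAIN = None  # marker meaning "this LF does not fire on the instance"
SPAM = 1        # the label this pair of LFs votes for

TRIGGER_WORDS = {"free", "winner", "prize"}


def discrete_lf(text: str) -> Optional[int]:
    """Discrete LF: emit a hard (possibly noisy) SPAM vote, or abstain."""
    tokens = set(text.lower().split())
    return SPAM if tokens & TRIGGER_WORDS else ABSTAIN


def continuous_lf(text: str) -> Tuple[Optional[int], float]:
    """Continuous LF: emit the same vote plus a score in [0, 1].

    The score here is the fraction of tokens that are trigger words, an
    assumed stand-in for any signal that noisily correlates with the label.
    """
    tokens = text.lower().split()
    if not tokens:
        return ABSTAIN, 0.0
    score = sum(t in TRIGGER_WORDS for t in tokens) / len(tokens)
    return (SPAM, score) if score > 0 else (ABSTAIN, 0.0)
```

In this sketch the discrete LF discards how strongly an instance matches the heuristic, while the continuous LF passes that strength along so that a downstream model can weigh it. How CAGE actually consumes such scores, and how quality guides enter the generative model, is described in the full paper and is not reproduced here.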


