The Word is Mightier than the Label: Learning without Pointillistic Labels using Data Programming

08/24/2021
by Chufan Gao, et al.

Most advanced supervised Machine Learning (ML) models rely on vast amounts of point-by-point labelled training examples. Hand-labelling such volumes of data can be tedious, expensive, and error-prone. Recent studies have explored diverse sources of weak supervision to produce competitive end-model classifiers. In this paper, we survey recent work on weak supervision and, in particular, investigate the Data Programming (DP) framework. Taking a set of potentially noisy heuristics as input, DP assigns denoised probabilistic labels to each data point in a dataset using a probabilistic graphical model of the heuristics. We analyze the mathematical fundamentals behind DP and demonstrate its power by applying it to two real-world text classification tasks. Furthermore, we compare DP with pointillistic active and semi-supervised learning techniques traditionally applied in data-sparse settings.
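
To make the idea of turning noisy heuristics into probabilistic labels concrete, here is a minimal Python sketch. It is not the paper's model: the labeling functions, the keyword rules, and the fixed `ASSUMED_ACCURACY` values are all hypothetical, and the combination rule is a simplified stand-in that assumes conditionally independent heuristics and a uniform class prior. DP itself estimates heuristic accuracies from unlabeled data through a probabilistic graphical model rather than taking them as given.

```python
import math

# Label conventions for this sketch (hypothetical): a heuristic may abstain.
ABSTAIN, NEG, POS = -1, 0, 1

# Hypothetical labeling functions for a toy sentiment task.
def lf_contains_great(text):
    return POS if "great" in text.lower() else ABSTAIN

def lf_contains_terrible(text):
    return NEG if "terrible" in text.lower() else ABSTAIN

def lf_exclamation(text):
    return POS if text.endswith("!") else ABSTAIN

LFS = [lf_contains_great, lf_contains_terrible, lf_exclamation]

# Assumed accuracies, fixed by hand for illustration only.
# In DP these would be learned from agreements and disagreements
# among the heuristics on unlabeled data.
ASSUMED_ACCURACY = [0.9, 0.9, 0.6]

def probabilistic_label(text, lfs=LFS, accuracies=ASSUMED_ACCURACY):
    """Return P(y = POS | votes), assuming independent heuristics
    and a uniform prior over the two classes."""
    log_odds = 0.0
    for lf, acc in zip(lfs, accuracies):
        vote = lf(text)
        if vote == ABSTAIN:
            continue  # abstaining heuristics contribute no evidence
        weight = math.log(acc / (1.0 - acc))
        log_odds += weight if vote == POS else -weight
    return 1.0 / (1.0 + math.exp(-log_odds))
```

Under these assumptions, a text on which every heuristic abstains receives a probability of 0.5, and conflicting votes from equally accurate heuristics cancel out. The resulting soft labels could then be used to train an end-model classifier in place of hand-assigned labels.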

research
01/09/2022

Weak Supervision for Affordable Modeling of Electrocardiogram Data

Analysing electrocardiograms (ECGs) is an inexpensive and non-invasive, ...
research
06/18/2021

Dependency Structure Misspecification in Multi-Source Weak Supervision Models

Data programming (DP) has proven to be an attractive alternative to cost...
research
08/22/2020

Data Programming using Semi-Supervision and Subset Selection

The paradigm of data programming has shown a lot of promise in us...
research
02/04/2020

Iterative Data Programming for Expanding Text Classification Corpora

Real-world text classification tasks often require many labeled training...
research
02/28/2022

Resolving label uncertainty with implicit posterior models

We propose a method for jointly inferring labels across a collection of ...
research
04/21/2016

A Novel Approach to Dropped Pronoun Translation

Dropped Pronouns (DP) in which pronouns are frequently dropped in the so...
research
06/18/2022

Weakly Supervised Classification of Vital Sign Alerts as Real or Artifact

A significant proportion of clinical physiologic monitoring alarms are f...