Data Consistency for Weakly Supervised Learning

02/08/2022
by Chidubem Arachie, et al.

In many applications, training machine learning models requires large amounts of human-annotated data, and obtaining precise labels is expensive. Training with weak supervision provides a low-cost alternative. We propose a novel weak supervision algorithm that processes noisy labels, i.e., weak signals, while also considering the features of the training data to produce accurate labels for training. Our method searches over classifiers of the data representation to find plausible labelings. We call this paradigm data-consistent weak supervision. A key facet of our framework is that it can estimate labels for data examples that have low or no coverage from the weak supervision. In addition, we make no assumptions about the joint distribution of the weak signals and the true labels of the data. Instead, we use the weak signals and the data features to solve a constrained optimization that enforces data consistency among the labels we generate. Empirical evaluation on different datasets shows that our method significantly outperforms state-of-the-art weak supervision methods on both text and image classification tasks.
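To make the idea of labels that are a function of the data features more concrete, here is a minimal Python sketch. It is not the authors' algorithm: the abstract describes a constrained optimization, whereas this sketch swaps in a simple squared-disagreement penalty minimized by gradient descent over a logistic classifier. The function name `fit_consistent_labels`, the abstain value `-1`, the synthetic data, and all hyperparameters are assumptions made for illustration only.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_consistent_labels(X, W, lr=0.5, steps=500):
    """Hypothetical helper: soft labels produced by a logistic model of X.

    X: (n, d) feature matrix.
    W: (m, n) weak signals with votes in {0, 1}; -1 marks an abstain.
    Returns an (n,) array of soft labels in [0, 1] for every example,
    including examples that no weak signal covers.
    """
    n, d = X.shape
    Xb = np.hstack([X, np.ones((n, 1))])   # append a bias column
    theta = np.zeros(d + 1)
    covered = W >= 0                       # (m, n) mask of actual votes
    n_votes = max(int(covered.sum()), 1)
    for _ in range(steps):
        p = sigmoid(Xb @ theta)            # labels are a function of features
        # Disagreement between the model's labels and each non-abstain vote.
        resid = np.where(covered, p[None, :] - W, 0.0)
        grad_p = 2.0 * resid.sum(axis=0) / n_votes
        # Chain rule through the sigmoid: dp/dz = p * (1 - p).
        theta -= lr * (Xb.T @ (grad_p * p * (1.0 - p)))
    return sigmoid(Xb @ theta)


# Synthetic demo (assumed data): two well-separated classes, three noisy
# weak signals that only ever vote on the first half of the examples.
rng = np.random.default_rng(0)
n = 400
y_true = rng.integers(0, 2, size=n)
X = rng.normal(size=(n, 2)) + 2.0 * (2 * y_true - 1)[:, None]

W = np.full((3, n), -1.0)
for j in range(3):
    cover = np.zeros(n, dtype=bool)
    cover[:200] = rng.random(200) < 0.7    # partial coverage, first half only
    flips = rng.random(n) < 0.2            # roughly 20% of votes are wrong
    votes = np.where(flips, 1 - y_true, y_true)
    W[j, cover] = votes[cover]

labels = fit_consistent_labels(X, W)
acc_uncovered = np.mean((labels[200:] > 0.5) == y_true[200:])
```

Because the labels come from a classifier over the features rather than from the votes directly, the examples in the second half, which receive no weak votes at all, still get labels. This mirrors, in a simplified form, the coverage property the abstract highlights; `acc_uncovered` measures how well that works on this toy data.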


