Fast and Three-rious: Speeding Up Weak Supervision with Triplet Methods

02/27/2020 ∙ by Daniel Y. Fu, et al.

Weak supervision is a popular method for building machine learning models without relying on ground truth annotations. Instead, it generates probabilistic training labels by estimating the accuracies of multiple noisy labeling sources (e.g., heuristics, crowd workers). Existing approaches use latent variable estimation to model the noisy sources, but these methods can be computationally expensive, scaling superlinearly in the data. In this work, we show that, for a class of latent variable models highly applicable to weak supervision, we can find a closed-form solution to model parameters, obviating the need for iterative solutions like stochastic gradient descent (SGD). We use this insight to build FlyingSquid, a weak supervision framework that runs orders of magnitude faster than previous weak supervision approaches and requires fewer assumptions. In particular, we prove bounds on generalization error without assuming that the latent variable model can exactly parameterize the underlying data distribution. Empirically, we validate FlyingSquid on benchmark weak supervision datasets and find that it achieves the same or higher quality compared to previous approaches without the need to tune an SGD procedure, recovers model parameters 170 times faster on average, and enables new video analysis and online learning applications.
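The closed-form estimation the abstract alludes to can be illustrated with the classic "triplet" identity. Below is a minimal sketch, not the FlyingSquid library's API, assuming binary labels Y in {-1, +1}, labeling functions that are conditionally independent given Y, and better-than-random sources: under those assumptions E[λ_i λ_j] = E[λ_i Y] · E[λ_j Y], so any triplet of sources determines each source's accuracy in closed form, with no SGD. The function name and synthetic data here are hypothetical.

```python
import numpy as np

def triplet_accuracies(L):
    """Recover a_i = E[lambda_i * Y] for three labeling functions in closed form.

    L: (n_points, 3) array of votes in {-1, +1}. Assumes the votes are
    conditionally independent given the latent label Y and that each source
    is better than random (a_i > 0). Uses E[l_i l_j] = a_i a_j, which gives
    a_i = sqrt(E[l_i l_j] * E[l_i l_k] / E[l_j l_k]).
    """
    # Empirical second moments between pairs of labeling functions.
    m01 = np.mean(L[:, 0] * L[:, 1])
    m02 = np.mean(L[:, 0] * L[:, 2])
    m12 = np.mean(L[:, 1] * L[:, 2])
    a0 = np.sqrt(np.abs(m01 * m02 / m12))
    a1 = np.sqrt(np.abs(m01 * m12 / m02))
    a2 = np.sqrt(np.abs(m02 * m12 / m01))
    return np.array([a0, a1, a2])

# Synthetic check: simulate Y and three noisy voters with known accuracies.
rng = np.random.default_rng(0)
n = 100_000
Y = rng.choice([-1, 1], size=n)
true_p = np.array([0.9, 0.7, 0.6])  # P(lambda_i == Y)
L = np.where(rng.random((n, 3)) < true_p, Y[:, None], -Y[:, None])
print(triplet_accuracies(L))  # approx 2 * true_p - 1 = [0.8, 0.4, 0.2]
```

Once each source's accuracy is recovered this way, the estimates can be combined, for example through an accuracy-weighted vote, to produce the probabilistic training labels the abstract describes; the point of the paper is that this step is closed-form rather than an iterative latent variable fit.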


Related research

07/05/2021 ∙ End-to-End Weak Supervision
Aggregating multiple sources of weak supervision (WS) can ease the data-...

05/15/2019 ∙ Passage Ranking with Weak Supervision
In this paper, we propose a weak supervision framework for neural rankin...
11/10/2019 ∙ Meta Label Correction for Learning with Weak Supervision
Leveraging weak or noisy supervision for building effective machine lear...

10/25/2016 ∙ Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data
A challenge in training discriminative models like neural networks is ob...

06/02/2023 ∙ An Adaptive Method for Weak Supervision with Drifting Data
We introduce an adaptive method with formal quality guarantees for weak ...

06/02/2022 ∙ Weakly Supervised Representation Learning with Sparse Perturbations
The theory of representation learning aims to build methods that provabl...
