Learning the Structure of Generative Models without Labeled Data

03/02/2017
by   Stephen H. Bach, et al.
0

Curating labeled training data has become the primary bottleneck in machine learning. Recent frameworks address this bottleneck with generative models to synthesize labels at scale from weak supervision sources. The generative model's dependency structure directly affects the quality of the estimated labels, but selecting a structure automatically without any labeled data is a distinct challenge. We propose a structure estimation method that maximizes the ℓ_1-regularized marginal pseudolikelihood of the observed data. Our analysis shows that the amount of unlabeled data required to identify the true structure scales sublinearly in the number of possible dependencies for a broad class of models. Simulations show that our method is 100× faster than a maximum likelihood approach and selects 1/4 as many extraneous dependencies. We also show that our method provides an average of 1.5 F1 points of improvement over existing, user-developed information extraction applications on real-world data such as PubMed journal abstracts.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/07/2017

Inferring Generative Model Structure with Static Analysis

Obtaining enough labeled data to robustly train complex discriminative m...
research
03/14/2019

Learning Dependency Structures for Weak Supervision Models

Labeling training data is a key bottleneck in the modern machine learnin...
research
10/25/2016

Socratic Learning: Augmenting Generative Models to Incorporate Latent Subsets in Training Data

A challenge in training discriminative models like neural networks is ob...
research
02/06/2021

Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling

Neural natural language generation (NLG) and understanding (NLU) models ...
research
02/20/2018

Actively Avoiding Nonsense in Generative Models

A generative model may generate utter nonsense when it is fit to maximiz...
research
10/21/2019

Multi-Resolution Weak Supervision for Sequential Data

Since manually labeling training data is slow and expensive, recent indu...

Please sign up or login with your details

Forgot password? Click here to reset