Understanding Programmatic Weak Supervision via Source-aware Influence Function

05/25/2022
by Jieyu Zhang, et al.

Programmatic Weak Supervision (PWS) aggregates the votes of multiple weak supervision sources into probabilistic training labels, which are in turn used to train an end model. With its increasing popularity, it is critical to have tools that let users understand the influence of each component in the pipeline (e.g., the source votes or the training data) and interpret the end model's behavior. To achieve this, we build on the Influence Function (IF) and propose the source-aware IF, which leverages the generation process of the probabilistic labels to decompose the end model's training objective and then calculates the influence associated with each (data, source, class) tuple. These primitive influence scores can then be used to estimate the influence of individual components of PWS, such as source votes, supervision sources, and training data. On datasets of diverse domains, we demonstrate multiple use cases: (1) interpreting incorrect predictions from multiple angles that reveal insights for debugging the PWS pipeline, (2) identifying mislabeling of sources with a gain of 9 performance by removing harmful components in the training objective (13 better than ordinary IF).
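To make the (data, source, class) decomposition concrete, here is a minimal sketch in Python. It is not the authors' implementation, and it rests on simplifying assumptions that are not stated in the abstract: the label model is a plain average of non-abstaining votes (abstain encoded as -1), the end model is an L2-regularized softmax regression, and the Hessian is computed exactly. All function names (`tuple_weights`, `source_aware_influence`, and so on) are hypothetical. Under these assumptions the training loss is a weighted sum of per-class cross-entropy terms, one per (data, source, class) tuple, so the classical influence estimate can be computed per tuple and then summed along any axis.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def tuple_weights(votes, k):
    # Assumed label model: w[i, j, c] = 1[source j votes c on example i]
    # divided by the number of non-abstaining sources on example i.
    onehot = (votes[:, :, None] == np.arange(k)[None, None, :]).astype(float)
    counts = np.maximum((votes >= 0).sum(axis=1), 1)
    return onehot / counts[:, None, None]

def grad_and_hessian(theta, X, Y, lam):
    # Gradient and exact Hessian of mean soft-label cross-entropy + L2 penalty.
    n, d = X.shape
    k = Y.shape[1]
    P = softmax(X @ theta)
    S = Y.sum(axis=1, keepdims=True)  # 1 if any source voted, else 0
    G = X.T @ (S * P - Y) / n + lam * theta
    H = lam * np.eye(d * k)
    for x, p, s in zip(X, P, S[:, 0]):
        H += s * np.kron(np.outer(x, x), np.diag(p) - np.outer(p, p)) / n
    return G.ravel(), H

def fit(X, Y, lam=0.1, steps=25):
    # Newton's method on the regularized (strongly convex) objective.
    theta = np.zeros((X.shape[1], Y.shape[1]))
    for _ in range(steps):
        g, H = grad_and_hessian(theta, X, Y, lam)
        theta -= np.linalg.solve(H, g).reshape(theta.shape)
    return theta

def class_loss_grads(theta, X):
    # grads[i, c] = gradient of -log p_c(x_i) w.r.t. the flattened parameters.
    P = softmax(X @ theta)
    k = P.shape[1]
    diff = P[:, None, :] - np.eye(k)[None, :, :]  # (n, k, k): p - e_c
    return (X[:, None, :, None] * diff[:, :, None, :]).reshape(len(X), k, -1)

def source_aware_influence(theta, X, votes, x_test, y_test, lam=0.1):
    # scores[i, j, c]: first-order estimate of the change in test loss if the
    # (i, j, c) term were dropped from the objective (no label renormalization).
    n, k = len(X), theta.shape[1]
    W = tuple_weights(votes, k)
    _, H = grad_and_hessian(theta, X, W.sum(axis=1), lam)
    g_test = class_loss_grads(theta, x_test[None, :])[0, y_test]
    v = np.linalg.solve(H, g_test)
    per_class = class_loss_grads(theta, X) @ v  # (n, k)
    return W * per_class[:, None, :] / n

# Toy data: three hypothetical sources of differing quality.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 3))
y_true = (X[:, 0] > 0).astype(int)
votes = np.stack([
    np.where(rng.random(40) < 0.8, y_true, -1),      # accurate, sometimes abstains
    np.where(rng.random(40) < 0.5, 1 - y_true, -1),  # votes the wrong class
    rng.integers(0, 2, size=40),                     # random votes
], axis=1)

theta = fit(X, tuple_weights(votes, 2).sum(axis=1))
x_test, y_test = np.array([1.0, 0.0, 0.0]), 1
scores = source_aware_influence(theta, X, votes, x_test, y_test)

source_influence = scores.sum(axis=(0, 2))  # one value per source
data_influence = scores.sum(axis=(1, 2))    # one value per training example
```

In this sign convention a negative score means that dropping the term is estimated to lower the test loss, so strongly negative tuples are candidates for harmful votes. Summing over different axes gives per-source or per-example influence without recomputing anything, which is the practical appeal of working at the tuple level.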

