Label Augmentation with Reinforced Labeling for Weak Supervision

04/13/2022
by   Gurkan Solmaz, et al.
0

Weak supervision (WS) is an alternative to the traditional supervised learning to address the need for ground truth. Data programming is a practical WS approach that allows programmatic labeling data samples using labeling functions (LFs) instead of hand-labeling each data point. However, the existing approach fails to fully exploit the domain knowledge encoded into LFs, especially when the LFs' coverage is low. This is due to the common data programming pipeline that neglects to utilize data features during the generative process. This paper proposes a new approach called reinforced labeling (RL). Given an unlabeled dataset and a set of LFs, RL augments the LFs' outputs to cases not covered by LFs based on similarities among samples. Thus, RL can lead to higher labeling coverage for training an end classifier. The experiments on several domains (classification of YouTube comments, wine quality, and weather prediction) result in considerable gains. The new approach produces significant performance improvement, leading up to +21 points in accuracy and +61 points in F1 scores compared to the state-of-the-art data programming approach.

READ FULL TEXT

page 17

page 19

page 20

page 21

research
03/11/2019

GOGGLES: Automatic Training Data Generation with Affinity Coding

Generating large labeled training data is becoming the biggest bottlenec...
research
11/30/2021

Automatic Synthesis of Diverse Weak Supervision Sources for Behavior Analysis

Obtaining annotations for large training sets is expensive, especially i...
research
06/18/2021

Dependency Structure Misspecification in Multi-Source Weak Supervision Models

Data programming (DP) has proven to be an attractive alternative to cost...
research
04/28/2022

WeaNF: Weak Supervision with Normalizing Flows

A popular approach to decrease the need for costly manual annotation of ...
research
11/22/2019

Data Programming using Continuous and Quality-Guided Labeling Functions

Scarcity of labeled data is a bottleneck for supervised learning models....
research
04/07/2020

Inspector Gadget: A Data Programming-based Labeling System for Industrial Images

As machine learning for images becomes democratized in the Software 2.0 ...
research
12/14/2018

Bootstrapping Conversational Agents With Weak Supervision

Many conversational agents in the market today follow a standard bot dev...

Please sign up or login with your details

Forgot password? Click here to reset