XPASC: Measuring Generalization in Weak Supervision by Explainability and Association

06/03/2022
by Luisa März, et al.

Weak supervision is leveraged in a wide range of domains and tasks because it can create massive amounts of labeled data with little manual effort. Standard approaches use labeling functions to specify signals that are relevant for the labeling. It has been conjectured that weakly supervised models over-rely on those signals and, as a result, suffer from overfitting. To verify this assumption, we introduce a novel method, XPASC (eXPlainability-Association SCore), for measuring the generalization of a model trained with a weakly supervised dataset. Considering the occurrences of features, classes and labeling functions in a dataset, XPASC takes into account the relevance of each feature for the predictions of the model as well as the associations of the feature with the class and the labeling function, respectively. The association in XPASC can be measured in two variants: XPASC-CHI SQUARE measures associations relative to their statistical significance, while XPASC-PPMI measures association strength more generally. We use XPASC to analyze KnowMAN, an adversarial architecture intended to control the degree of generalization from the labeling functions and thus to mitigate the problem of overfitting. On the one hand, we show that KnowMAN is able to control the degree of generalization through a hyperparameter. On the other hand, results and qualitative analysis show that generalization and performance do not relate one-to-one, and that the highest degree of generalization does not necessarily imply the best performance. Therefore, methods that allow for controlling the amount of generalization can achieve the right degree of benign overfitting. Our contributions in this study are i) the XPASC score to measure generalization in weakly supervised models, ii) evaluation of XPASC across datasets and models and iii) the release of the XPASC implementation.
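The two association measures named in the abstract, chi-square and PPMI, can be illustrated with their textbook definitions over co-occurrence counts. The sketch below is not the XPASC implementation and does not reproduce how the paper combines association with feature relevance; the function names and the count arguments (`n_xy`, `n_x`, `n_y`, `n`) are assumptions made for illustration only. Here `x` stands for a feature and `y` for either a class or a labeling function.

```python
import math


def ppmi(n_xy, n_x, n_y, n):
    """Positive pointwise mutual information (base 2) from raw counts.

    n_xy: samples where feature x and label y co-occur
    n_x:  samples containing feature x
    n_y:  samples carrying label y
    n:    total number of samples
    (All argument names are hypothetical, chosen for this sketch.)
    """
    if n_xy == 0:
        return 0.0
    pmi = math.log2((n_xy * n) / (n_x * n_y))
    return max(0.0, pmi)  # negative associations are clipped to zero


def chi_square(n_xy, n_x, n_y, n):
    """Pearson chi-square statistic for the 2x2 table of x versus y."""
    observed = [
        n_xy,                    # x present, y present
        n_x - n_xy,              # x present, y absent
        n_y - n_xy,              # x absent,  y present
        n - n_x - n_y + n_xy,    # x absent,  y absent
    ]
    expected = [
        n_x * n_y / n,
        n_x * (n - n_y) / n,
        (n - n_x) * n_y / n,
        (n - n_x) * (n - n_y) / n,
    ]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


if __name__ == "__main__":
    # A feature seen in 20 of 100 samples, always together with a label
    # that covers 50 samples.
    print(ppmi(20, 20, 50, 100))
    print(chi_square(20, 20, 50, 100))
```

Both functions return zero when the feature and the label are statistically independent. Chi-square grows with sample size for a fixed association strength, whereas PPMI depends only on the ratio of observed to expected co-occurrence, which matches the abstract's distinction between statistical significance and association strength.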

research
09/16/2021

KnowMAN: Weakly Supervised Multinomial Adversarial Networks

The absence of labeled data for training neural models is often addresse...
research
04/14/2022

ULF: Unsupervised Labeling Function Correction using Cross-Validation for Weak Supervision

A way to overcome expensive and time-consuming manual data labeling is w...
research
07/10/2023

Onion Universe Algorithm: Applications in Weakly Supervised Learning

We introduce Onion Universe Algorithm (OUA), a novel classification meth...
research
04/28/2022

WeaNF: Weak Supervision with Normalizing Flows

A popular approach to decrease the need for costly manual annotation of ...
research
04/14/2021

A Weakly Supervised Model for Solving Math word Problems

Solving math word problems (MWPs) is an important and challenging proble...
research
06/21/2021

Demonstration of Panda: A Weakly Supervised Entity Matching System

Entity matching (EM) refers to the problem of identifying tuple pairs in...
