Data Programming using Semi-Supervision and Subset Selection

08/22/2020
by Ayush Maheshwari, et al.

The paradigm of data programming <cit.> has shown considerable promise in using weak supervision, in the form of rules and labelling functions, to learn in scenarios where labelled data is not available. Semi-supervised learning, which augments a small amount of labelled data with a large unlabelled dataset, is another promising approach. In this work, we argue that because data programming approaches use no labelled data at all, they can yield sub-optimal performance, particularly when the labelling functions are noisy. Our first contribution is a joint learning framework that combines unsupervised consensus from labelling functions with semi-supervised learning, training a single model that efficiently uses the rules/labelling functions together with semi-supervised loss functions on the feature space. Second, we study a subset selection approach for choosing the examples to be used as the labelled set. We evaluate our techniques on synthetic data as well as four publicly available datasets and show improvement over state-of-the-art techniques. Source code for the paper is available at <https://github.com/ayushbits/Semi-Supervised-LFs-Subset-Selection>.
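The abstract does not spell out the paper's objective or its selection criterion, so the following NumPy sketch is only a rough illustration of the two ideas, not the authors' formulation. The joint loss assumes a simple vote-averaging consensus over labelling functions (with -1 meaning "abstain"), a cross-entropy term pulling predictions on unlabelled data towards that consensus, and an entropy-minimisation term standing in for a generic semi-supervised loss; the weights `alpha` and `beta` and all function names are hypothetical. For subset selection it swaps in greedy facility-location maximisation over cosine similarities, a common way to pick representative examples, which may differ from the criterion used in the paper.

```python
import numpy as np

ABSTAIN = -1  # assumed convention: a labelling function that does not fire outputs -1


def lf_soft_labels(lf_outputs, n_classes):
    """Consensus of labelling functions as soft labels (plain vote averaging).

    lf_outputs: int array (n_examples, n_lfs) with entries in
    {ABSTAIN, 0, ..., n_classes - 1}. Rows where every LF abstains
    receive a uniform distribution.
    """
    lf_outputs = np.asarray(lf_outputs)
    n = lf_outputs.shape[0]
    votes = np.zeros((n, n_classes))
    for c in range(n_classes):
        votes[:, c] = (lf_outputs == c).sum(axis=1)
    totals = votes.sum(axis=1, keepdims=True)
    uniform = np.full((n, n_classes), 1.0 / n_classes)
    return np.where(totals > 0, votes / np.maximum(totals, 1), uniform)


def softmax(logits):
    z = logits - logits.max(axis=1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def joint_loss(logits_l, y_l, logits_u, lf_outputs_u, alpha=1.0, beta=0.1, eps=1e-12):
    """Hypothetical joint objective (not the paper's exact loss):
    cross-entropy on the labelled set
    + alpha * cross-entropy towards the LF consensus on unlabelled data
    + beta * mean prediction entropy on unlabelled data (entropy minimisation).
    """
    y_l = np.asarray(y_l)
    p_l = softmax(np.asarray(logits_l, dtype=float))
    supervised = -np.mean(np.log(p_l[np.arange(len(y_l)), y_l] + eps))
    p_u = softmax(np.asarray(logits_u, dtype=float))
    q_u = lf_soft_labels(lf_outputs_u, p_u.shape[1])
    consensus = -np.mean(np.sum(q_u * np.log(p_u + eps), axis=1))
    entropy = -np.mean(np.sum(p_u * np.log(p_u + eps), axis=1))
    return supervised + alpha * consensus + beta * entropy


def greedy_facility_location(features, budget):
    """Greedily pick `budget` rows maximising sum_i max_{j in S} cos_sim(i, j).

    A stand-in subset selection criterion, chosen here for illustration only.
    """
    x = np.asarray(features, dtype=float)
    x = x / np.linalg.norm(x, axis=1, keepdims=True)
    sim = x @ x.T  # symmetric cosine-similarity matrix
    best = np.zeros(len(x))  # similarity of each point to its closest selected point
    selected = []
    for _ in range(min(budget, len(x))):
        # Marginal gain of candidate j: sum_i max(best_i, sim[i, j]) - sum_i best_i
        gains = np.maximum(sim, best[None, :]).sum(axis=1) - best.sum()
        gains[selected] = -np.inf  # never pick the same example twice
        j = int(np.argmax(gains))
        selected.append(j)
        best = np.maximum(best, sim[:, j])
    return selected


if __name__ == "__main__":
    feats = np.array([[1.0, 0.0], [0.99, 0.1], [0.0, 1.0], [0.1, 0.99]])
    print(greedy_facility_location(feats, 2))  # one index from each cluster
```

In a real pipeline the logits would come from a trainable model and the joint loss would be minimised with an autodiff framework; NumPy is used here only to keep the sketch self-contained. The selected indices would form the small labelled set whose true labels feed the supervised term.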
