Active Learning by Greedy Split and Label Exploration

06/17/2019
by Alyssa Herbst, et al.

Annotating large unlabeled datasets can be a major bottleneck for machine learning applications. We introduce a scheme for inferring labels of unlabeled data at a fraction of the cost of labeling the entire dataset. We refer to the scheme as greedy split and label exploration (GSAL). GSAL greedily queries an oracle (or human labeler) and partitions a dataset to find subsets whose examples mostly share the same label. GSAL can then infer labels by majority vote of the known labels in each subset. GSAL decides whether to split a subset or to query a label from it by maximizing a lower bound on the expected number of correctly labeled examples. GSAL improves upon existing hierarchical labeling schemes by using supervised models to partition the data, thereby avoiding reliance on unsupervised clustering methods that may not accurately group data by label. We design GSAL with strategies to avoid bias that could be introduced through this adaptive partitioning. We evaluate GSAL by labeling three datasets and find that it outperforms existing strategies for adaptive labeling.
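The split side of this decision can be illustrated with a minimal Python sketch. It is not GSAL's actual bound: it assumes the queried labels are a uniform random sample from each subset and substitutes a one-sided Hoeffding-style lower bound on the majority-label fraction. The function names and the `delta` parameter are hypothetical. The candidate partition (which GSAL obtains from a supervised model) is passed in precomputed, the alternative of querying one more label is not modeled, and the bias from reusing the same labels to choose and score a partition, which the paper addresses with dedicated strategies, is left uncorrected.

```python
import math
from collections import Counter


def majority_label(labels):
    """Label inferred for every unlabeled example in a subset (majority vote)."""
    return Counter(labels).most_common(1)[0][0]


def purity_lower_bound(labels, delta=0.05):
    """One-sided Hoeffding-style lower bound on the majority-label fraction.

    Assumes the queried labels are a uniform random sample from the subset.
    Choosing the majority class from the same sample makes it slightly optimistic.
    """
    k = len(labels)
    if k == 0:
        return 0.0  # no labels queried yet: nothing can be guaranteed
    p_hat = Counter(labels).most_common(1)[0][1] / k
    return max(0.0, p_hat - math.sqrt(math.log(1 / delta) / (2 * k)))


def expected_correct_lb(size, labels, delta=0.05):
    """Lower bound on correct labels if all `size` examples get the majority label."""
    return size * purity_lower_bound(labels, delta)


def should_split(parent_size, parent_labels, children, delta=0.05):
    """Greedy test: split only if the children's combined bound beats the parent's.

    `children` is a candidate partition given as (size, queried_labels) pairs.
    NOTE: reusing the same labels to choose and to score a partition biases this
    comparison; this hypothetical sketch does not correct for that.
    """
    parent_score = expected_correct_lb(parent_size, parent_labels, delta)
    child_score = sum(expected_correct_lb(n, labs, delta) for n, labs in children)
    return child_score > parent_score


if __name__ == "__main__":
    parent = ["a"] * 12 + ["b"] * 8              # 20 labels queried from 1000 examples
    children = [(500, ["a"] * 10),                # candidate split into two halves
                (500, ["a"] * 2 + ["b"] * 8)]
    print(should_split(1000, parent, children))   # True: the split raises the bound
    print(majority_label(children[1][1]))         # b
```

Because each child holds fewer queried labels than its parent, its bound is looser, so a split pays off only when it separates the labels substantially. Splitting an already pure subset (for example, 20 labels that are all `"a"`) lowers the combined bound, and `should_split` returns `False`.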
