Combining MixMatch and Active Learning for Better Accuracy with Fewer Labels

by Shuang Song, et al.

We propose using active-learning-based techniques to further improve MixMatch, the state-of-the-art semi-supervised learning algorithm. We provide a thorough empirical evaluation of several active-learning and baseline methods, which demonstrates a significant improvement on the benchmark CIFAR-10, CIFAR-100, and SVHN datasets (as much as 1.5 ...). We also provide an empirical analysis of the cost trade-off between incrementally gathering more labeled versus unlabeled data. This analysis can be used to measure the relative value of labeled and unlabeled data at different points of the learning curve: although the incremental value of labeled data can be as much as 20x that of unlabeled data, it quickly diminishes to less than 3x once more than 2,000 labeled examples are observed. Code can be found at
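The abstract does not say which active-learning criteria were evaluated, so the following is only a minimal sketch of one common choice, margin-based uncertainty sampling, and not the authors' method. It assumes a trained model (for example, one trained with MixMatch) has already produced class probabilities for each example in the unlabeled pool. The names `margin`, `select_for_labeling`, `pool_probs`, and `budget` are hypothetical and introduced only for illustration.

```python
from typing import List, Sequence


def margin(probs: Sequence[float]) -> float:
    """Gap between the two largest class probabilities.

    A small gap means the model is torn between two classes,
    i.e. the example is uncertain. Requires at least two classes.
    """
    top_two = sorted(probs, reverse=True)[:2]
    return top_two[0] - top_two[1]


def select_for_labeling(pool_probs: List[Sequence[float]], budget: int) -> List[int]:
    """Return indices of the `budget` pool examples with the smallest margin."""
    ranked = sorted(range(len(pool_probs)), key=lambda i: margin(pool_probs[i]))
    return ranked[:budget]


# Hypothetical predicted probabilities for four unlabeled examples (three classes).
pool_probs = [
    [0.90, 0.05, 0.05],  # confident prediction, large margin
    [0.40, 0.35, 0.25],  # uncertain
    [0.55, 0.44, 0.01],  # fairly uncertain
    [0.34, 0.33, 0.33],  # most uncertain
]

# Indices of the two examples we would send to an annotator next.
chosen = select_for_labeling(pool_probs, budget=2)
print(chosen)
```

In an active-learning loop of this kind, the selected examples would be labeled, moved from the unlabeled pool to the labeled set, and the semi-supervised model retrained before the next selection round.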



