Large deviations for the perceptron model and consequences for active learning

12/09/2019
by Hugo Cui, et al.

Active learning is a branch of machine learning that deals with problems where unlabeled data is abundant yet obtaining labels is expensive. The learning algorithm can query a limited number of samples to obtain the corresponding labels, which are subsequently used for supervised learning. In this work, we consider the task of choosing the subset of samples to be labeled from a fixed finite pool of samples. We assume the pool of samples to be a random matrix and the ground-truth labels to be generated by a single-layer teacher random neural network. We employ replica methods to analyze the large deviations of the accuracy achieved after supervised learning on a subset of the original pool. These large deviations then bound the optimal performance achievable by any active learning algorithm. We show that the optimal learning performance can be efficiently approached by simple message-passing active learning algorithms. We also provide a comparison with the performance of some other popular active learning strategies.
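
The pool-based setting described in the abstract can be illustrated with a minimal sketch. This is not the authors' method: the subset is chosen uniformly at random as a baseline stand-in for an active learning strategy, the student is trained with the classic perceptron update rule, and all names (`make_pool`, `train_perceptron`, `generalization_error`, `budget`) are hypothetical. The only result relied on is the standard identity for a teacher-student perceptron with Gaussian inputs, where the generalization error equals arccos(q)/π for normalized teacher-student overlap q.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def make_pool(n_samples, dim, rng):
    """Draw a Gaussian pool and label it with a random teacher perceptron."""
    teacher = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    pool = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_samples)]
    labels = [1 if dot(x, teacher) >= 0 else -1 for x in pool]
    return pool, labels, teacher


def train_perceptron(samples, labels, epochs=50):
    """Classic perceptron rule: update on every misclassified sample."""
    w = [0.0] * len(samples[0])
    for _ in range(epochs):
        mistakes = 0
        for x, y in zip(samples, labels):
            if y * dot(w, x) <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
                mistakes += 1
        if mistakes == 0:  # all labeled samples are classified correctly
            break
    return w


def generalization_error(student, teacher):
    """Error on fresh Gaussian inputs: arccos(overlap) / pi."""
    norm = math.sqrt(dot(student, student) * dot(teacher, teacher))
    if norm == 0.0:
        return 0.5  # an all-zero student carries no information
    overlap = max(-1.0, min(1.0, dot(student, teacher) / norm))
    return math.acos(overlap) / math.pi


rng = random.Random(0)
pool, labels, teacher = make_pool(n_samples=200, dim=20, rng=rng)

# Assumption: a uniformly random subset, used here only as a baseline.
budget = 40
chosen = rng.sample(range(len(pool)), budget)

student = train_perceptron([pool[i] for i in chosen], [labels[i] for i in chosen])
error = generalization_error(student, teacher)
print(f"labeled {budget} of {len(pool)} samples, generalization error = {error:.3f}")
```

Replacing the `rng.sample` line with a different selection rule is where an active learning strategy would plug in; the large-deviation analysis in the paper concerns how much the resulting error can differ across such choices of subset.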

research
10/22/2021

A Simple Baseline for Low-Budget Active Learning

Active learning focuses on choosing a subset of unlabeled data to be lab...
research
11/01/2019

Picking groups instead of samples: A close look at Static Pool-based Meta-Active Learning

Active Learning techniques are used to tackle learning problems where ob...
research
10/12/2020

Active learning with RESSPECT: Resource allocation for extragalactic astronomical transients

The recent increase in volume and complexity of available astronomical d...
research
08/25/2023

Active learning for fast and slow modeling attacks on Arbiter PUFs

Modeling attacks, in which an adversary uses machine learning techniques...
research
08/15/2023

BI-LAVA: Biocuration with Hierarchical Image Labeling through Active Learning and Visual Analysis

In the biomedical domain, taxonomies organize the acquisition modalities...
research
06/26/2020

CheXpert++: Approximating the CheXpert labeler for Speed, Differentiability, and Probabilistic Output

It is often infeasible or impossible to obtain ground truth labels for m...
research
12/02/2020

Message Passing Adaptive Resonance Theory for Online Active Semi-supervised Learning

Active learning is widely used to reduce labeling effort and training ti...
