Minimax Lower Bounds for Realizable Transductive Classification

02/09/2016
by   Ilya Tolstikhin, et al.

Transductive learning considers a training set of m labeled samples and a test set of u unlabeled samples, with the goal of best labeling that particular test set. In contrast, inductive learning considers a training set of m labeled samples drawn i.i.d. from P(X,Y), with the goal of best labeling any future samples drawn i.i.d. from P(X). This comparison suggests that transduction is a much easier type of inference than induction, but is this really the case? This paper provides a negative answer to this question by proving the first known minimax lower bounds for transductive, realizable, binary classification. Our lower bounds show that m should be at least Ω(d/ϵ + log(1/δ)/ϵ) when ϵ-learning a concept class H of finite VC-dimension d<∞ with confidence 1-δ, for all m ≤ u. This result yields three important conclusions. First, general transduction is as hard as general induction, since both problems have Ω(d/m) minimax values. Second, the use of unlabeled data does not help general transduction, since supervised learning algorithms such as ERM and (Hanneke, 2015) match our transductive lower bounds while ignoring the unlabeled test set. Third, our transductive lower bounds imply lower bounds for semi-supervised learning, which add to the important discussion about the role of unlabeled data in machine learning.
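As a quick sanity check on the stated bound, the sketch below evaluates the expression inside Ω(d/ϵ + log(1/δ)/ϵ) for sample parameter values. Note that Ω(·) hides an unspecified constant factor, so the function name `sample_lower_bound` and the numbers it produces are illustrative only, not values from the paper.

```python
import math

def sample_lower_bound(d, eps, delta):
    """Order of the transductive minimax lower bound: d/eps + log(1/delta)/eps.

    The true bound is Omega of this quantity; the hidden constant is
    omitted here, so the result shows scaling behavior, not an exact count.
    """
    return d / eps + math.log(1.0 / delta) / eps

# Example: VC-dimension d = 10, accuracy eps = 0.1, confidence 0.95 (delta = 0.05)
m = sample_lower_bound(d=10, eps=0.1, delta=0.05)
print(round(m))  # prints 130: order of labeled samples needed, up to constants
```

Halving ϵ doubles both terms, while tightening δ only grows the bound logarithmically, matching the usual PAC-style trade-off between accuracy and confidence.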


