A note on the price of bandit feedback for mistake-bounded online learning

01/18/2021
by Jesse Geneson, et al.

The standard model and the bandit model are two generalizations of the mistake-bound model to online multiclass classification. In both models the learner guesses a classification in each round, but in the standard model the learner receives the correct classification after each guess, while in the bandit model the learner is only told whether or not the guess was correct. For any set F of multiclass classifiers, define opt_std(F) and opt_bandit(F) to be the optimal worst-case number of prediction mistakes in the standard and bandit models, respectively. Long (Theoretical Computer Science, 2020) claimed that for all M > 2 and infinitely many k, there exists a set F of functions from a set X to a set Y of size k such that opt_std(F) = M and opt_bandit(F) ≥ (1 - o(1))(|Y| ln |Y|) opt_std(F). The proof of this result depended on the following lemma, which is false, e.g., for every prime p ≥ 5, s = 1 (the all-ones vector), t = 2 (the all-twos vector), and all z. Lemma: Fix n ≥ 2 and a prime p, and let u be chosen uniformly at random from {0, …, p-1}^n. For any s, t ∈ {1, …, p-1}^n with s ≠ t and for any z ∈ {0, …, p-1}, we have Pr(t · u ≡ z (mod p) | s · u ≡ z (mod p)) = 1/p. We show that this lemma is false precisely when s and t are multiples of each other mod p. Then, using a new lemma, we fix Long's proof.
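The counterexample to the lemma can be checked by brute force for small parameters. The following is a minimal sketch, not code from the paper: the function name `conditional_probability` and the choice p = 5, n = 2 are assumptions made for illustration. It enumerates every u in {0, …, p-1}^n and computes the conditional probability exactly.

```python
from fractions import Fraction
from itertools import product


def conditional_probability(s, t, z, p):
    """Exact Pr(t.u = z mod p | s.u = z mod p) for u uniform on {0..p-1}^n.

    Hypothetical helper for illustration; it enumerates all p**n vectors u,
    so it is only practical for small p and n.
    """
    n = len(s)
    conditioning = 0  # number of u with s.u = z (mod p)
    joint = 0         # number of u with s.u = z and t.u = z (mod p)
    for u in product(range(p), repeat=n):
        if sum(a * b for a, b in zip(s, u)) % p == z:
            conditioning += 1
            if sum(a * b for a, b in zip(t, u)) % p == z:
                joint += 1
    # s has only nonzero entries, so s.u is uniform mod p and conditioning > 0.
    return Fraction(joint, conditioning)


p = 5
# t = 2s (mod p): the lemma's value 1/p is never attained.
multiples = [conditional_probability((1, 1), (2, 2), z, p) for z in range(p)]
# s and t not multiples of each other: the value is 1/p for every z.
independent = [conditional_probability((1, 1), (1, 2), z, p) for z in range(p)]
print(multiples)
print(independent)
```

When t = 2s mod p, the event s · u ≡ z forces t · u ≡ 2z, which equals z only if z = 0. The conditional probability is therefore 1 for z = 0 and 0 otherwise, never 1/p. For the second pair, where s and t are not multiples of each other mod p, the enumeration returns 1/p for every z.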


research
09/03/2022

Sharp bounds on the price of bandit feedback for several models of mistake-bounded online learning

We determine sharp bounds on the price of bandit feedback for several va...
research
05/30/2021

Sharper bounds for online learning of smooth functions of a single variable

We investigate the generalization of the mistake-bound model to continuo...
research
05/17/2022

Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

In this paper, we present online algorithm called Delaytron for learning...
research
08/06/2023

Self-Directed Linear Classification

In online classification, a learner is presented with a sequence of exam...
research
02/08/2019

Bandit Principal Component Analysis

We consider a partial-feedback variant of the well-studied online PCA pr...
research
05/17/2021

Multiclass Classification using dilute bandit feedback

This paper introduces a new online learning framework for multiclass cla...
research
01/04/2023

Online Learning of Smooth Functions

In this paper, we study the online learning of real-valued functions whe...
