Multiclass Classification using dilute bandit feedback

05/17/2021
by Gaurav Batra, et al.

This paper introduces a new online learning framework for multiclass classification called learning with diluted bandit feedback. At every time step, the algorithm predicts a candidate label set instead of a single label for the observed example. It then receives feedback from the environment on whether the actual label lies in this candidate label set. This feedback is called "diluted bandit feedback". Learning in this setting is even more challenging than in the bandit feedback setting, as there is more uncertainty in the supervision. We propose an algorithm for multiclass classification using dilute bandit feedback (MC-DBF), which uses an exploration-exploitation strategy to predict the candidate set in each trial. We show that the proposed algorithm achieves an O(T^{1-1/(m+2)}) mistake bound if the candidate label set size (in each step) is m. We demonstrate the effectiveness of the proposed approach with extensive simulations.
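The feedback protocol described in the abstract can be illustrated with a minimal sketch. This is not the authors' MC-DBF algorithm: the function names, the per-label `scores` list, and the exploration rate `gamma` are all hypothetical, and the uniform-random exploration step is an assumed stand-in for whatever exploration scheme the paper actually uses. The last helper simply evaluates the exponent 1 - 1/(m+2) from the stated mistake bound.

```python
import random


def dilute_feedback(candidate_set, true_label):
    """Return 1 if the true label lies in the candidate set, else 0.

    This single bit is the only supervision the learner receives.
    """
    return int(true_label in candidate_set)


def predict_candidate_set(scores, m, gamma, rng):
    """Pick a candidate label set of size m (hypothetical helper).

    With probability gamma, explore: draw a uniformly random m-subset.
    Otherwise, exploit: take the m labels with the highest scores.
    """
    num_labels = len(scores)
    if rng.random() < gamma:
        return set(rng.sample(range(num_labels), m))
    ranked = sorted(range(num_labels), key=lambda y: scores[y], reverse=True)
    return set(ranked[:m])


def mistake_bound_exponent(m):
    """Exponent of T in the stated O(T^{1-1/(m+2)}) mistake bound."""
    return 1.0 - 1.0 / (m + 2)


if __name__ == "__main__":
    rng = random.Random(0)
    scores = [0.1, 0.9, 0.5, 0.3]  # assumed per-label scores for one example
    candidates = predict_candidate_set(scores, m=2, gamma=0.1, rng=rng)
    print(candidates, dilute_feedback(candidates, true_label=2))
    print(mistake_bound_exponent(1), mistake_bound_exponent(4))
```

Under the stated bound, the exponent is 2/3 for m = 1 and moves toward 1 as m grows, which is consistent with the abstract's remark that larger candidate sets give the learner less informative supervision.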


research
08/08/2023

Multiclass Online Learnability under Bandit Feedback

We study online multiclass classification under bandit feedback. We exte...
research
05/17/2022

Delaytron: Efficient Learning of Multiclass Classifiers with Delayed Bandit Feedbacks

In this paper, we present online algorithm called Delaytron for learning...
research
06/09/2023

Online Learning with Set-Valued Feedback

We study a variant of online multiclass classification where the learner...
research
06/05/2020

Learning Multiclass Classifier Under Noisy Bandit Feedback

This paper addresses the problem of multiclass classification with corru...
research
06/23/2023

Nearest Neighbour with Bandit Feedback

In this paper we adapt the nearest neighbour rule to the contextual band...
research
02/04/2019

Online Multiclass Classification Based on Prediction Margin for Partial Feedback

We consider the problem of online multiclass classification with partial...
research
01/18/2021

A note on the price of bandit feedback for mistake-bounded online learning

The standard model and the bandit model are two generalizations of the m...
