Nonstochastic Bandits with Infinitely Many Experts

02/09/2021
∙
by   X. Flora Meng, et al.

We study the problem of nonstochastic bandits with infinitely many experts: a learner aims to maximize the total reward by taking actions sequentially based on bandit feedback while benchmarking against a countably infinite set of experts. We propose a variant of Exp4.P that, for finitely many experts, enables inference of correct expert rankings while preserving the order of the regret upper bound. We then incorporate the variant into a meta-algorithm that works on infinitely many experts. We prove a high-probability upper bound of Õ(i^*K + √(KT)) on the regret, up to polylog factors, where i^* is the unknown position of the best expert, K is the number of actions, and T is the time horizon. We also provide an example of structured experts and discuss how to expedite learning in such a case. Our meta-learning algorithm achieves the tightest regret upper bound for the setting considered when i^* = Õ(√(T/K)). If a prior distribution is assumed to exist for i^*, the probability of satisfying a tight regret bound increases with T, and this increase can be fast.
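
The condition on i^* can be checked with one line of arithmetic. The sketch below is an illustration rather than the paper's own derivation, and it assumes that polylog factors are suppressed throughout, as the Õ notation in the abstract does. Substituting the condition into the first term of the bound shows that the term is absorbed by the second:

```latex
% Assumption: polylog factors are suppressed, as in the \tilde{O} notation of the abstract.
i^* = \tilde{O}\!\left(\sqrt{T/K}\right)
\;\Longrightarrow\;
i^* K = \tilde{O}\!\left(K\sqrt{T/K}\right) = \tilde{O}\!\left(\sqrt{KT}\right),
\qquad\text{so}\qquad
\tilde{O}\!\left(i^* K + \sqrt{KT}\right) = \tilde{O}\!\left(\sqrt{KT}\right).
```

In this regime the dependence on the position of the best expert disappears from the order of the bound, leaving only the √(KT) term.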


