Bounded Regret for Finitely Parameterized Multi-Armed Bandits

03/03/2020
by Kishan Panaganti, et al.

We consider finitely parameterized multi-armed bandits, in which the model of the underlying stochastic environment is characterized by a common unknown parameter. The true parameter is unknown to the learning agent, but the finite set of possible parameters is known a priori. We propose a simple, easy-to-implement algorithm, which we call FP-UCB, that uses the information about the underlying parameter set for faster learning. In particular, we show that FP-UCB achieves bounded regret under a structural condition on the underlying parameter set. We also show that, if the parameter set does not satisfy this structural condition, FP-UCB achieves logarithmic regret, but with a smaller leading constant than the standard UCB algorithm. We validate the superior performance of FP-UCB through extensive numerical simulations.
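
For context on the baseline the abstract compares against, the following is a minimal sketch of the standard UCB1 index policy on Bernoulli arms. It is not the authors' FP-UCB algorithm, whose details are not given in this abstract. The function name `ucb1`, the Bernoulli reward model, the exploration bonus `sqrt(2 ln t / n_i)`, and the example arm means are all assumptions chosen for illustration.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run standard UCB1 on Bernoulli arms (illustrative baseline only).

    arm_means: hypothetical true success probabilities, one per arm.
    Returns (cumulative pseudo-regret, pull counts per arm).
    """
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # total observed reward of each arm
    best = max(arm_means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            # Pull every arm once so each empirical mean is defined.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound.
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2.0 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        # Pseudo-regret: gap between the best mean and the chosen arm's mean.
        regret += best - arm_means[arm]
    return regret, counts


if __name__ == "__main__":
    total_regret, pulls = ucb1([0.2, 0.8], horizon=2000)
    print(total_regret, pulls)
```

The regret of this baseline grows logarithmically with the horizon. The abstract's claim is that knowing the finite parameter set lets FP-UCB do better: bounded regret under the structural condition, and a smaller constant otherwise.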

research
01/24/2019

Regret Minimisation in Multi-Armed Bandits Using Bounded Arm Memory

In this paper, we propose a constant word (RAM model) algorithm for regr...
research
01/28/2022

Adaptive Best-of-Both-Worlds Algorithm for Heavy-Tailed Multi-Armed Bandits

In this paper, we generalize the concept of heavy-tailed multi-armed ban...
research
03/27/2017

A Scale Free Algorithm for Stochastic Bandits with Bounded Kurtosis

Existing strategies for finite-armed stochastic bandits mostly depend on...
research
02/04/2021

Transfer Learning in Bandits with Latent Continuity

Structured stochastic multi-armed bandits provide accelerated regret rat...
research
03/01/2018

The K-Nearest Neighbour UCB algorithm for multi-armed bandits with covariates

In this paper we propose and explore the k-Nearest Neighbour UCB algorit...
research
10/22/2019

Smoothness-Adaptive Stochastic Bandits

We consider the problem of non-parametric multi-armed bandits with stoch...
research
11/14/2013

Fundamental Limits of Online and Distributed Algorithms for Statistical Learning and Estimation

Many machine learning approaches are characterized by information constr...
