Regret bounds for Narendra-Shapiro bandit algorithms

02/17/2015
by Sébastien Gadat, et al.

Narendra-Shapiro (NS) algorithms are bandit-type algorithms that were introduced in the 1960s (with a view to applications in psychology and learning automata), and whose convergence has been studied intensively in the stochastic algorithm literature. In this paper, we address the following question: are NS bandit algorithms competitive from a regret point of view? In our main result, we show that competitive bounds can be obtained for such algorithms in their penalized version (introduced by Lamberton and Pagès). More precisely, up to an over-penalization modification, the pseudo-regret R̅_n of the penalized two-armed bandit algorithm is uniformly bounded by C√n (where C is made explicit in the paper). We also generalize existing results on convergence and rates of convergence to the multi-armed case of the over-penalized bandit algorithm, including convergence, after a suitable renormalization, toward the invariant measure of a Piecewise Deterministic Markov Process (PDMP). Finally, ergodic properties of this PDMP are given in the multi-armed case.
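To make the objects in the abstract concrete, here is a minimal Python sketch of a penalized two-armed NS-type update, with a running sum of per-step gaps whose expectation is the pseudo-regret. It is an illustration under stated assumptions, not the paper's exact scheme: the function name, the step sizes γ_n = γ_1/√n and ρ_n = ρ_1/√n, the initial value 0.5, and the Bernoulli rewards are all choices made here, and the over-penalization modification mentioned above is not implemented.

```python
import random


def penalized_ns_two_armed(p1, p2, n_steps, gamma1=0.5, rho1=0.5, seed=0):
    """Simulate a penalized two-armed NS-type bandit (illustrative sketch).

    x is the probability of playing arm 1. Step sizes gamma_n and rho_n
    are assumptions made for this example, not taken from the paper.
    Returns the final x and the accumulated gap sum_t (max(p1, p2) - p_{I_t}).
    """
    rng = random.Random(seed)
    x = 0.5                      # assumed initial probability of arm 1
    best = max(p1, p2)
    gap_sum = 0.0
    for n in range(1, n_steps + 1):
        gamma = gamma1 / n ** 0.5    # reward step size (assumed schedule)
        rho = rho1 / n ** 0.5        # penalty scaling (assumed schedule)
        play_first = rng.random() < x
        p = p1 if play_first else p2
        success = rng.random() < p   # Bernoulli reward of the played arm
        gap_sum += best - p          # its expectation is the pseudo-regret
        if play_first:
            if success:
                x += gamma * (1.0 - x)       # reinforce arm 1
            else:
                x -= gamma * rho * x         # penalize arm 1
        else:
            if success:
                x -= gamma * x               # reinforce arm 2
            else:
                x += gamma * rho * (1.0 - x) # penalize arm 2
    return x, gap_sum


if __name__ == "__main__":
    x_final, gap = penalized_ns_two_armed(0.8, 0.4, 10_000)
    print(f"P(arm 1) after 10000 steps: {x_final:.3f}, accumulated gap: {gap:.1f}")
```

Averaging the accumulated gap over many seeds and comparing it with √n for increasing n gives a rough empirical check of the kind of bound stated in the abstract, keeping in mind that the bound there concerns the over-penalized variant.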


research
09/20/2020

Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms

EXP-based algorithms are often used for exploration in multi-armed bandi...
research
02/14/2012

Graphical Models for Bandit Problems

We introduce a rich class of graphical models for multi-armed bandit pro...
research
02/17/2017

Beyond the Hazard Rate: More Perturbation Algorithms for Adversarial Multi-armed Bandits

Recent work on follow the perturbed leader (FTPL) algorithms for the adv...
research
10/11/2018

Regularized Contextual Bandits

We consider the stochastic contextual bandit problem with additional reg...
research
11/04/2019

Optimistic Optimization for Statistical Model Checking with Regret Bounds

We explore application of multi-armed bandit algorithms to statistical m...
research
10/15/2018

Regret vs. Bandwidth Trade-off for Recommendation Systems

We consider recommendation systems that need to operate under wireless b...
research
11/02/2022

On the Safety of Interpretable Machine Learning: A Maximum Deviation Approach

Interpretable and explainable machine learning has seen a recent surge o...