Boundary Crossing Probabilities for General Exponential Families

05/24/2017
by   Odalric-Ambrym Maillard, et al.

We consider parametric exponential families of dimension K on the real line. We study a variant of the boundary crossing probabilities that arise in the multi-armed bandit literature, in the case when the real-valued distributions form an exponential family of dimension K. Formally, our result is a concentration inequality that bounds the probability that B^ψ(θ̂_n, θ^⋆) ≥ f(t/n)/n, where θ^⋆ is the parameter of an unknown target distribution, θ̂_n is the empirical parameter estimate built from n observations, ψ is the log-partition function of the exponential family, and B^ψ is the corresponding Bregman divergence. From the perspective of stochastic multi-armed bandits, we pay special attention to the case when the boundary function f is logarithmic, as it enables the analysis of the regret of state-of-the-art strategies, whose analysis was left open in such generality. Indeed, previous results hold only for the case K=1, while we provide results for arbitrary finite dimension K, thus considerably extending the existing results. Perhaps surprisingly, we highlight that the proof techniques needed to achieve these strong results already existed three decades ago in the work of T.L. Lai, and were apparently forgotten in the bandit community. We provide a modern rewriting of these beautiful techniques that we believe are useful beyond their application to stochastic multi-armed bandits.
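To make the quantities in the abstract concrete, here is a minimal numerical sketch for the one-dimensional (K=1) Bernoulli family, where the log-partition function is ψ(θ) = log(1 + e^θ) and the Bregman divergence B^ψ reduces to a Kullback–Leibler divergence between Bernoulli distributions. The boundary function f and the clipping of the empirical estimate are illustrative choices, not the paper's exact construction:

```python
import math
import random

def psi(theta):
    # Log-partition function of the Bernoulli family in natural parameters
    return math.log(1.0 + math.exp(theta))

def psi_prime(theta):
    # Mean parameter: derivative of the log-partition function
    return 1.0 / (1.0 + math.exp(-theta))

def bregman(theta1, theta0):
    # Bregman divergence B^psi(theta1, theta0) induced by psi
    return psi(theta1) - psi(theta0) - (theta1 - theta0) * psi_prime(theta0)

def natural_param(p):
    # Map a Bernoulli mean p in (0, 1) to its natural parameter
    return math.log(p / (1.0 - p))

def boundary_crossed(samples, theta_star, t, f=lambda x: math.log(max(x, 1.0))):
    # Check the boundary crossing event B^psi(theta_hat_n, theta_star) >= f(t/n)/n
    # for an illustrative logarithmic boundary function f.
    n = len(samples)
    p_hat = sum(samples) / n
    p_hat = min(max(p_hat, 1e-6), 1.0 - 1e-6)  # clip to keep the estimate interior
    theta_hat = natural_param(p_hat)
    return bregman(theta_hat, theta_star) >= f(t / n) / n

# Simulate n = 200 Bernoulli(0.3) observations and test the event at horizon t = 1000.
random.seed(0)
theta_star = natural_param(0.3)
samples = [1 if random.random() < 0.3 else 0 for _ in range(200)]
print(boundary_crossed(samples, theta_star, t=1000))
```

For this family, B^ψ(θ_1, θ_0) equals KL(Bernoulli(μ_0) ‖ Bernoulli(μ_1)) with μ_i = ψ'(θ_i), which is the divergence that appears in KL-based bandit indices.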


