Profitable Bandits

05/08/2018
by Mastane Achab, et al.

Originally motivated by default risk management applications, this paper investigates a novel problem, referred to here as the profitable bandit problem. At each step, an agent chooses a subset of the K possible actions. For each action chosen, she then receives the sum of a random number of rewards. Her objective is to maximize her cumulative earnings. To this end, we adapt and study three well-known strategies that have been proved most efficient in other settings: kl-UCB, Bayes-UCB and Thompson Sampling. For each of them, we prove a finite-time regret bound which, together with a lower bound we also derive, establishes asymptotic optimality. We further compare these three strategies from both a theoretical and an empirical perspective, giving simple, self-contained proofs that emphasize their similarities as well as their differences. While both Bayesian strategies automatically adapt to the geometry of the information, the numerical experiments carried out show a slight advantage for Thompson Sampling in practice.
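To make the setting concrete, here is a minimal Thompson Sampling sketch for a simplified variant of the problem described above. The modeling choices are assumptions for illustration, not the paper's actual formulation: each of the K actions yields Bernoulli rewards, playing an action incurs a fixed cost `tau`, the agent may play any subset of actions per round, and Beta(1, 1) priors are placed on the unknown success probabilities. An action is played whenever its posterior sample suggests it is profitable.

```python
import random


def thompson_profitable(arm_probs, tau, horizon, seed=0):
    """Thompson Sampling sketch for a simplified profitable-bandit setting.

    Assumptions (illustrative, not from the paper): arm k pays Bernoulli
    rewards with unknown mean arm_probs[k], each pull costs a fixed tau,
    and at every round the agent may play ANY subset of the K arms.
    """
    rng = random.Random(seed)
    K = len(arm_probs)
    successes = [0] * K  # Beta posterior: alpha = 1 + successes
    failures = [0] * K   # Beta posterior: beta  = 1 + failures
    profit = 0.0
    for _ in range(horizon):
        for k in range(K):
            # Draw a posterior sample; play arm k only if it looks profitable,
            # i.e. its sampled mean exceeds the per-pull cost tau.
            sample = rng.betavariate(1 + successes[k], 1 + failures[k])
            if sample > tau:
                reward = 1 if rng.random() < arm_probs[k] else 0
                successes[k] += reward
                failures[k] += 1 - reward
                profit += reward - tau
    return profit, successes, failures
```

With a clearly profitable arm (mean 0.9) and a clearly unprofitable one (mean 0.1) against a cost of 0.5, the posterior samples quickly concentrate, so the first arm is played at almost every round while the second is abandoned after a few exploratory pulls.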


