Satisficing in Time-Sensitive Bandit Learning

03/07/2018
by Daniel Russo, et al.

Much of the recent literature on bandit learning focuses on algorithms that aim to converge on an optimal action. One shortcoming is that this orientation does not account for time sensitivity, which can play a crucial role when learning an optimal action requires much more information than near-optimal ones. Indeed, popular approaches such as upper-confidence-bound methods and Thompson sampling can fare poorly in such situations. We consider instead learning a satisficing action, which is near-optimal while requiring less information, and propose satisficing Thompson sampling, an algorithm that serves this purpose. We establish a general bound on expected discounted regret and study the application of satisficing Thompson sampling to linear and infinite-armed bandits, demonstrating arbitrarily large benefits over Thompson sampling. We also discuss the relation between the notion of satisficing and the theory of rate distortion, which offers guidance on the selection of satisficing actions.
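
To make the idea concrete, below is a minimal, hypothetical sketch of a satisficing rule layered on Thompson sampling for a many-armed Bernoulli bandit with independent Beta(1, 1) priors. The satisficing threshold 1 - epsilon, the lowest-index tie-breaking rule, and the toy environment are illustrative assumptions, not the construction analyzed in the paper.

```python
# Illustrative sketch (not the paper's exact algorithm): a satisficing variant
# of Thompson sampling on a many-armed Bernoulli bandit with Beta(1, 1) priors.
# Rather than chasing the sampled optimum, the agent plays the lowest-indexed
# arm whose sampled mean clears the threshold 1 - epsilon, settling for a
# near-optimal arm that is cheaper to identify.
import numpy as np

def satisficing_thompson_sampling(arms, epsilon=0.1, horizon=1000, seed=0):
    rng = np.random.default_rng(seed)
    n_arms = len(arms)
    successes = np.ones(n_arms)   # Beta posterior alpha parameters
    failures = np.ones(n_arms)    # Beta posterior beta parameters
    rewards = []
    for _ in range(horizon):
        # Sample a mean for every arm from its current posterior.
        theta = rng.beta(successes, failures)
        # Satisficing rule: the first arm deemed "good enough" under the
        # sampled model; fall back to the sampled optimum if none qualifies.
        good_enough = np.flatnonzero(theta >= 1.0 - epsilon)
        arm = good_enough[0] if good_enough.size else int(np.argmax(theta))
        reward = rng.random() < arms[arm]   # Bernoulli reward
        successes[arm] += reward
        failures[arm] += 1 - reward
        rewards.append(reward)
    return np.mean(rewards)

# Example: many arms with means drawn uniformly at random; a 0.05-satisficing
# agent can settle on an early near-optimal arm instead of exploring them all.
true_means = np.random.default_rng(1).uniform(size=200)
print(satisficing_thompson_sampling(true_means, epsilon=0.05))
```

In this toy version, the agent stops searching once some arm it has already tried looks good enough under the sampled model, which is where the information savings over standard Thompson sampling come from.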
