An Information-Theoretic Analysis of Nonstationary Bandit Learning

02/09/2023
by Seungki Min, et al.

In nonstationary bandit learning problems, the decision-maker must continually gather information and adapt their action selection as the latent state of the environment evolves. In each time period, some latent optimal action maximizes expected reward under the environment state. We view the optimal action sequence as a stochastic process, and take an information-theoretic approach to analyze attainable performance. We bound limiting per-period regret in terms of the entropy rate of the optimal action process. The bound applies to a wide array of problems studied in the literature and reflects the problem's information structure through its information ratio.
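To make the central quantity concrete: if the optimal action evolves as an ergodic Markov chain, its entropy rate is H = -Σ_s π(s) Σ_s' P(s,s') log P(s,s'), where π is the stationary distribution. The sketch below (an illustrative assumption, not the paper's construction) computes this for a two-armed problem where the optimal arm flips with a small probability each period:

```python
import numpy as np

def stationary_distribution(P):
    """Left eigenvector of P for eigenvalue 1, normalized to a distribution."""
    vals, vecs = np.linalg.eig(P.T)
    pi = np.real(vecs[:, np.argmin(np.abs(vals - 1))])
    return pi / pi.sum()

def entropy_rate(P):
    """Entropy rate (in nats) of a stationary Markov chain with transition matrix P."""
    pi = stationary_distribution(P)
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    return float(-np.sum(pi[:, None] * P * logP))

# Hypothetical example: the optimal arm switches with probability eps per period.
# A slowly changing environment (small eps) has a small entropy rate, and the
# paper's bound correspondingly permits small limiting per-period regret.
eps = 0.05
P = np.array([[1 - eps, eps],
              [eps, 1 - eps]])
print(entropy_rate(P))  # ≈ binary entropy of eps ≈ 0.1985 nats
```

For this symmetric chain the entropy rate reduces to the binary entropy of the switching probability, which is why it vanishes as the environment becomes stationary (eps → 0).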


