Mirror Descent and the Information Ratio

09/25/2020
by Tor Lattimore, et al.

We establish a connection between the stability of mirror descent and the information ratio of Russo and Van Roy [2014]. Our analysis shows that mirror descent with suitable loss estimators and exploratory distributions achieves a bound on the adversarial regret that matches the bound on the Bayesian regret for information-directed sampling. Along the way, we develop the theory for information-directed sampling and provide an efficient algorithm for adversarial bandits whose regret upper bound exactly matches the best known information-theoretic upper bound.
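
For intuition only, here is a minimal sketch of the classic Exp3 algorithm, which can be read as mirror descent on the probability simplex with the negative-entropy potential, importance-weighted loss estimators, and an optional uniform exploratory distribution. It is not the algorithm proposed in the paper; the function name `exp3`, its parameters, and the toy loss table are illustrative assumptions.

```python
import math
import random


def exp3(losses, learning_rate, exploration=0.0, seed=0):
    """Run Exp3 on a T x K table of losses in [0, 1].

    Illustrative sketch only (not the paper's algorithm).
    Returns the list of played arms and the total loss suffered.
    """
    rng = random.Random(seed)
    num_arms = len(losses[0])
    # Running sums of the importance-weighted loss estimates, one per arm.
    cumulative_estimates = [0.0] * num_arms
    played, total_loss = [], 0.0
    for round_losses in losses:
        # Mirror descent step with negative entropy:
        # weights proportional to exp(-eta * estimated cumulative loss).
        # Subtracting the minimum keeps the exponentials numerically stable.
        low = min(cumulative_estimates)
        weights = [math.exp(-learning_rate * (c - low)) for c in cumulative_estimates]
        norm = sum(weights)
        # Mix with a uniform exploratory distribution.
        probs = [(1 - exploration) * w / norm + exploration / num_arms for w in weights]
        arm = rng.choices(range(num_arms), weights=probs)[0]
        loss = round_losses[arm]
        # Importance-weighted estimator: unbiased for the full loss vector.
        cumulative_estimates[arm] += loss / probs[arm]
        played.append(arm)
        total_loss += loss
    return played, total_loss


# Toy adversary (an assumption for the demo): arm 0 always has loss 0, arm 1 loss 1.
T, K = 2000, 2
toy_losses = [[0.0, 1.0] for _ in range(T)]
eta = math.sqrt(2 * math.log(K) / (T * K))
played_arms, suffered = exp3(toy_losses, learning_rate=eta)
```

On this toy instance the best arm in hindsight has zero loss, so `suffered` equals the regret; the learning rate above is the standard tuning for Exp3 rather than anything taken from the paper.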

research
05/28/2019

Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

The information-theoretic analysis by Russo and Van Roy (2014) in combin...
research
11/11/2020

Asymptotically Optimal Information-Directed Sampling

We introduce a computationally efficient algorithm for finite stochastic...
research
05/27/2022

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

We study the Bayesian regret of the renowned Thompson Sampling algorithm...
research
06/09/2022

Regret Bounds for Information-Directed Reinforcement Learning

Information-directed sampling (IDS) has revealed its potential as a data...
research
01/29/2018

Information Directed Sampling and Bandits with Heteroscedastic Noise

In the stochastic bandit problem, the goal is to maximize an unknown fun...
research
09/30/2020

An Upper Bound for Wiretap Multi-way Channels

A general model for wiretap multi-way channels is introduced that includ...
research
05/30/2022

Adversarial Bandits Robust to S-Switch Regret

We study the adversarial bandit problem under S number of switching best...
