Connections Between Mirror Descent, Thompson Sampling and the Information Ratio

05/28/2019
by Julian Zimmert, et al.

The information-theoretic analysis by Russo and Van Roy (2014), in combination with minimax duality, has proved a powerful tool for the analysis of online learning algorithms in full and partial information settings. In most applications there is a tantalising similarity to the classical analysis based on mirror descent. We make a formal connection, showing that the information-theoretic bounds in most applications can be derived from existing techniques for online convex optimisation. In addition, for k-armed adversarial bandits we provide an efficient algorithm with regret that matches the best information-theoretic upper bound, and we improve the best known regret guarantees for online linear optimisation on ℓ_p-balls and for bandits with graph feedback.
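
For readers unfamiliar with the term, the information ratio of Russo and Van Roy can be sketched as follows. The notation here is an assumption made for illustration and is not taken from the abstract: A^* is the optimal action, A_t the action played in round t, Y_t the resulting observation, Δ_t the instantaneous regret, and E_t and I_t are the expectation and mutual information conditioned on the history before round t.

```latex
% Information ratio in round t (notation assumed, see lead-in)
\Gamma_t = \frac{\bigl(\mathbb{E}_t[\Delta_t]\bigr)^2}{I_t\bigl(A^*;\,(A_t, Y_t)\bigr)}

% If \Gamma_t \le \bar{\Gamma} almost surely for every round t, the Bayesian
% regret over n rounds is bounded in terms of the entropy H(A^*) of the
% optimal action under the prior:
\mathbb{E}\Bigl[\sum_{t=1}^{n} \Delta_t\Bigr] \le \sqrt{\bar{\Gamma}\, H(A^*)\, n}

% Example: for Thompson sampling on a k-armed bandit with rewards in [0,1],
% Russo and Van Roy show \bar{\Gamma} \le k/2, and H(A^*) \le \log k, so
\mathbb{E}\Bigl[\sum_{t=1}^{n} \Delta_t\Bigr] \le \sqrt{\tfrac{1}{2}\, k\, n \log k}
```

The ratio trades off the squared expected regret of a round against the information that round reveals about the optimal action; a uniformly small ratio means the learner never pays much regret without learning something in return.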

research
09/25/2020

Mirror Descent and the Information Ratio

We establish a connection between the stability of mirror descent and th...
research
07/12/2019

Exploration by Optimisation in Partial Monitoring

We provide a simple and efficient algorithm for adversarial k-action d-o...
research
02/01/2019

An Information-Theoretic Approach to Minimax Regret in Partial Monitoring

We prove a new minimax theorem connecting the worst-case Bayesian regret...
research
03/14/2023

Information-Theoretic Regret Bounds for Bandits with Fixed Expert Advice

We investigate the problem of bandits with expert advice when the expert...
research
06/01/2021

Minimax Regret for Bandit Convex Optimisation of Ridge Functions

We analyse adversarial bandit convex optimisation with an adversary that...
research
05/27/2022

Lifting the Information Ratio: An Information-Theoretic Analysis of Thompson Sampling for Contextual Bandits

We study the Bayesian regret of the renowned Thompson Sampling algorithm...
research
08/08/2022

Optimistic Optimisation of Composite Objective with Exponentiated Update

This paper proposes a new family of algorithms for the online optimisati...
