PAC-Bayes Bounds for Bandit Problems: A Survey and Experimental Comparison

11/29/2022
by Hamish Flynn et al.

PAC-Bayes has recently re-emerged as an effective theory from which one can derive principled learning algorithms with tight performance guarantees. However, applications of PAC-Bayes to bandit problems are relatively rare, which is unfortunate: many decision-making problems in healthcare, finance and the natural sciences can be modelled as bandit problems, and in many of these applications principled algorithms with strong performance guarantees would be highly valuable. This survey provides an overview of PAC-Bayes performance bounds for bandit problems and an experimental comparison of these bounds. Our experimental comparison revealed that the available PAC-Bayes upper bounds on cumulative regret are loose, whereas the available PAC-Bayes lower bounds on expected reward can be surprisingly tight. We found that an offline contextual bandit algorithm that learns a policy by optimising a PAC-Bayes bound was able to learn randomised neural network policies with competitive expected reward and non-vacuous performance guarantees.
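To make the last point concrete, below is a minimal sketch of the kind of objective such an offline contextual bandit algorithm optimises. It assumes logged data from a known logging policy, rewards in [0, 1], a Gaussian posterior over the weights of a softmax linear policy (a stand-in for the randomised neural network policies in the paper), and a McAllester-style bound with a clipped importance-weighted reward estimate; the survey's actual bounds and policy classes differ, and all names (pac_bayes_lower_bound, policy_probs, tau, etc.) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logged bandit data (sizes are illustrative): contexts X,
# logged actions A, rewards R in [0, 1], and the logging policy's
# probabilities P0 of the logged actions.
n, d, n_actions = 1000, 5, 3
X = rng.normal(size=(n, d))
A = rng.integers(0, n_actions, size=n)
R = rng.uniform(size=n)
P0 = np.full(n, 1.0 / n_actions)          # uniform logging policy

tau = 0.1                                 # importance-weight clipping level
delta = 0.05                              # confidence parameter


def policy_probs(W, X):
    """Softmax linear policy pi_W(a | x)."""
    logits = X @ W
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)


def pac_bayes_lower_bound(mu, sigma, n_samples=32):
    """High-probability lower bound on the expected reward of the Gaussian
    posterior Q = N(mu, sigma^2 I) over policy weights, with prior
    P = N(0, I): a clipped importance-weighted reward estimate minus a
    McAllester-style sqrt(KL / n) complexity penalty. The bounds compared
    in the survey differ in the exact form of the penalty."""
    est = 0.0
    for _ in range(n_samples):
        W = mu + sigma * rng.normal(size=mu.shape)    # sample a policy from Q
        pi = policy_probs(W, X)[np.arange(n), A]      # pi(a_i | x_i)
        iw = pi / np.maximum(P0, tau)                 # clipped weights <= 1/tau
        est += np.mean(iw * R) / n_samples
    # Closed-form KL between diagonal Gaussians N(mu, sigma^2 I) and N(0, I).
    kl = 0.5 * np.sum(mu**2 + sigma**2 - 1.0 - 2.0 * np.log(sigma))
    penalty = (1.0 / tau) * np.sqrt(
        (kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n)
    )
    return est - penalty


# One would maximise this bound over (mu, sigma) with a gradient-based
# optimiser; here we simply evaluate it at the prior mean.
mu = np.zeros((d, n_actions))
print("PAC-Bayes lower bound:", pac_bayes_lower_bound(mu, sigma=0.5))
```

Because the bound holds simultaneously for all posteriors Q, the value of the optimised objective is itself a valid performance guarantee for the learned randomised policy, which is how non-vacuous guarantees of the kind reported above can be certified from logged data alone.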
