PAC-Bayesian Lifelong Learning For Multi-Armed Bandits

03/07/2022
by Hamish Flynn, et al.

We present a PAC-Bayesian analysis of lifelong learning. In the lifelong learning problem, a sequence of learning tasks is observed one at a time, and the goal is to transfer information acquired from previous tasks to new learning tasks. We consider the case in which each learning task is a multi-armed bandit problem. We derive lower bounds on the expected average reward that would be obtained if a given multi-armed bandit algorithm were run in a new task with a particular prior and for a set number of steps. We propose lifelong learning algorithms that use our new bounds as learning objectives. Our proposed algorithms are evaluated in several lifelong multi-armed bandit problems and are found to perform better than a baseline method that does not use generalisation bounds.
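To make the setting concrete, here is a minimal illustrative sketch of the lifelong bandit loop the abstract describes: tasks arrive one at a time, each task is solved with a prior-dependent bandit algorithm (Gaussian Thompson sampling here), and information from finished tasks is fed back into the shared prior. This is an assumption-laden toy, not the paper's PAC-Bayesian method: the prior update below is a naive exponential moving average rather than an optimised generalisation bound, and all names (`run_thompson`, `lifelong_loop`) are hypothetical.

```python
import random

def run_thompson(true_means, prior_mean, prior_var, noise_var=1.0,
                 steps=200, rng=None):
    """Gaussian Thompson sampling on one bandit task; returns average reward."""
    rng = rng or random.Random(0)
    k = len(true_means)
    # Per-arm posterior parameters, initialised from the shared (transferred) prior.
    post_mean = [prior_mean] * k
    post_var = [prior_var] * k
    total = 0.0
    for _ in range(steps):
        # Sample a plausible mean for each arm and pull the best sample.
        samples = [rng.gauss(post_mean[i], post_var[i] ** 0.5) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        reward = rng.gauss(true_means[arm], noise_var ** 0.5)
        total += reward
        # Conjugate Gaussian posterior update for the pulled arm only.
        precision = 1.0 / post_var[arm] + 1.0 / noise_var
        post_mean[arm] = (post_mean[arm] / post_var[arm]
                          + reward / noise_var) / precision
        post_var[arm] = 1.0 / precision
    return total / steps

def lifelong_loop(task_means_list, prior_mean=0.0, prior_var=1.0):
    """Observe tasks one at a time, updating the shared prior after each task.

    Stand-in for a lifelong learner: the paper instead tunes the prior by
    optimising a PAC-Bayesian lower bound on expected average reward.
    """
    rng = random.Random(42)
    avg_rewards = []
    for true_means in task_means_list:
        avg = run_thompson(true_means, prior_mean, prior_var, rng=rng)
        avg_rewards.append(avg)
        # Naive meta-update: nudge the prior mean toward this task's payoff.
        prior_mean = 0.8 * prior_mean + 0.2 * avg
    return avg_rewards
```

When the task distribution is stable (e.g. the same arm tends to pay off across tasks), the drifting prior starts each new task closer to the truth, which is the transfer effect the paper quantifies with generalisation bounds.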


research
04/20/2023

Optimal Activation of Halting Multi-Armed Bandit Models

We study new types of dynamic allocation problems the Halting Bandit mod...
research
07/25/2013

Sequential Transfer in Multi-armed Bandit with Finite Set of Models

Learning from prior tasks and transferring that experience to improve fu...
research
09/26/2013

Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens

In this paper we propose a multi-armed bandit inspired, pool based activ...
research
03/24/2015

A Note on Information-Directed Sampling and Thompson Sampling

This note introduces three Bayesian-style multi-armed bandit algorithms: ...
research
07/23/2021

Finite-time Analysis of Globally Nonstationary Multi-Armed Bandits

We consider nonstationary multi-armed bandit problems where the model pa...
research
03/10/2018

Enhancing Evolutionary Optimization in Uncertain Environments by Allocating Evaluations via Multi-armed Bandit Algorithms

Optimization problems with uncertain fitness functions are common in the...
research
05/19/2022

Multi-Armed Bandits in Brain-Computer Interfaces

The multi-armed bandit (MAB) problem models a decision-maker that optimi...
