Efficient Online Bayesian Inference for Neural Bandits

12/01/2021
by Gerardo Duran-Martin, et al.

In this paper we present a new algorithm for online (sequential) inference in Bayesian neural networks, and show its suitability for tackling contextual bandit problems. The key idea is to combine the extended Kalman filter (which locally linearizes the likelihood function at each time step) with a (learned or random) low-dimensional affine subspace for the parameters; the use of a subspace enables us to scale our algorithm to models with ∼ 1M parameters. While most other neural bandit methods need to store the entire past dataset in order to avoid the problem of "catastrophic forgetting", our approach uses constant memory. This is possible because we represent uncertainty about all the parameters in the model, not just the final linear layer. We show good results on the "Deep Bayesian Bandit Showdown" benchmark, as well as MNIST and a recommender system.
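The key idea above, an extended Kalman filter applied not to the full parameter vector but to coordinates in a low-dimensional affine subspace, can be illustrated with a toy sketch. This is not the paper's code: the model, names, and dimensions are illustrative assumptions (a linear toy model stands in for the neural network, and the projection is random), but it shows why the memory cost is constant in the stream length, with only O(K^2) posterior state stored.

```python
import numpy as np

# Illustrative sketch (not the paper's implementation): EKF updates on
# subspace coordinates z, with full parameters theta = theta0 + A @ z.
rng = np.random.default_rng(0)
D, K = 50, 5                              # full and subspace dimensions (toy sizes)
A = rng.normal(size=(D, K)) / np.sqrt(D)  # random affine subspace basis
theta0 = np.zeros(D)

# Gaussian belief over z: mean mu and covariance Sigma. Only O(K^2) state
# is kept, regardless of how many observations stream in.
mu = np.zeros(K)
Sigma = np.eye(K)

def predict(theta, x):
    """Toy model, linear in theta; a neural net would replace this."""
    return x @ theta

def ekf_update(mu, Sigma, x, y, obs_var=0.1):
    """One EKF step on z, linearizing the model around the current mean."""
    theta = theta0 + A @ mu
    # Jacobian w.r.t. z via the chain rule through theta = theta0 + A z;
    # for the linear toy model, d(pred)/d(theta) = x.
    H = (x @ A).reshape(1, K)
    y_hat = predict(theta, x)
    S = H @ Sigma @ H.T + obs_var         # innovation covariance (1x1 here)
    Kg = Sigma @ H.T / S                  # Kalman gain
    mu = mu + (Kg * (y - y_hat)).ravel()
    Sigma = Sigma - Kg @ H @ Sigma
    return mu, Sigma

# Process a stream of observations from a random linear target.
theta_true = rng.normal(size=D)
for _ in range(200):
    x = rng.normal(size=D)
    y = x @ theta_true + 0.1 * rng.normal()
    mu, Sigma = ekf_update(mu, Sigma, x, y)
```

Because the belief over z is updated recursively, no past data needs to be replayed, which is the constant-memory property the abstract contrasts with replay-based neural bandit methods.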


Related research:

01/24/2019 · Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching
We study the neural-linear bandit model for solving sequential decision-...

05/31/2023 · Low-rank extended Kalman filtering for online learning of neural networks from streaming data
We propose an efficient online approximate Bayesian inference algorithm ...

02/07/2021 · Online Limited Memory Neural-Linear Bandits with Likelihood Matching
We study neural-linear bandits for solving problems where both explorati...

07/13/2021 · No Regrets for Learning the Prior in Bandits
We propose AdaTS, a Thompson sampling algorithm that adapts sequentially...

02/21/2020 · Online Learning in Contextual Bandits using Gated Linear Networks
We introduce a new and completely online contextual bandit algorithm cal...

08/06/2021 · Joint AP Probing and Scheduling: A Contextual Bandit Approach
We consider a set of APs with unknown data rates that cooperatively serv...

07/09/2020 · Influence Diagram Bandits: Variational Thompson Sampling for Structured Bandit Problems
We propose a novel framework for structured bandits, which we call an in...
