Online Limited Memory Neural-Linear Bandits with Likelihood Matching

02/07/2021
by Ofir Nabati, et al.

We study neural-linear bandits for problems in which both exploration and representation learning play an important role. Neural-linear bandits leverage the representation power of Deep Neural Networks (DNNs) and combine it with efficient exploration mechanisms designed for linear contextual bandits, applied on top of the last hidden layer. Recent analysis of DNNs in the "infinite-width" regime suggests that when these models are trained with gradient descent, the optimal solution is close to the initialization point and the DNN can be viewed as a kernel machine. As a result, linear exploration algorithms can be applied on top of a DNN via the kernel construction. In practice, however, the kernel changes during the learning process and the agent's performance degrades. This can be resolved by recomputing the uncertainty estimates from stored data. Nevertheless, when the buffer's size is limited, a phenomenon called catastrophic forgetting emerges. Instead, we propose a likelihood matching algorithm that is resilient to catastrophic forgetting and is completely online. We perform simulations on a variety of datasets and observe that our algorithm achieves performance comparable to the unlimited-memory approach while remaining resilient to catastrophic forgetting.
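To make the "linear exploration on top of the last hidden layer" idea concrete, here is a minimal sketch of linear Thompson sampling over a feature map. It is an illustration under stated assumptions, not the authors' algorithm: the feature map is a fixed random ReLU layer standing in for a trained network's last hidden layer, the noise variance is assumed known, and the names `LinearThompsonArm`, `features`, and `run` are hypothetical. It does not implement likelihood matching, and because the features never change it does not exhibit the kernel drift described above.

```python
import numpy as np


class LinearThompsonArm:
    """Bayesian linear regression for one arm (assumes a known noise variance)."""

    def __init__(self, dim, prior_precision=1.0, noise_var=1.0):
        self.precision = prior_precision * np.eye(dim)  # posterior precision matrix
        self.b = np.zeros(dim)                          # accumulated phi * reward / noise_var
        self.noise_var = noise_var

    def update(self, phi, reward):
        # Rank-one posterior update from a single (features, reward) pair.
        self.precision += np.outer(phi, phi) / self.noise_var
        self.b += phi * reward / self.noise_var

    def posterior(self):
        cov = np.linalg.inv(self.precision)
        cov = (cov + cov.T) / 2.0  # symmetrize against round-off
        return cov @ self.b, cov

    def sample_reward(self, phi, rng):
        # Thompson sampling: draw weights from the posterior, then score the features.
        mean, cov = self.posterior()
        w = rng.multivariate_normal(mean, cov)
        return float(phi @ w)


def features(context, W):
    """Hypothetical stand-in for a DNN's last hidden layer: a fixed random ReLU layer."""
    return np.maximum(W @ context, 0.0)


def run(n_steps=500, n_arms=3, ctx_dim=4, feat_dim=8, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(feat_dim, ctx_dim))          # fixed "representation"
    true_w = rng.normal(size=(n_arms, feat_dim))      # synthetic reward weights
    arms = [LinearThompsonArm(feat_dim) for _ in range(n_arms)]
    picks = []
    for _ in range(n_steps):
        phi = features(rng.normal(size=ctx_dim), W)
        a = int(np.argmax([arm.sample_reward(phi, rng) for arm in arms]))
        reward = float(true_w[a] @ phi + 0.1 * rng.normal())
        arms[a].update(phi, reward)                   # only the chosen arm learns
        picks.append(a)
    return arms, picks
```

Calling `run()` returns the per-arm posteriors and the sequence of chosen arms. In a real neural-linear agent the features would come from a network that keeps training, which is exactly when the stored precision matrices become stale and the problem described in the abstract arises.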

research
01/24/2019

Deep Neural Linear Bandits: Overcoming Catastrophic Forgetting through Likelihood Matching

We study the neural-linear bandit model for solving sequential decision-...
research
09/16/2018

Memory Efficient Experience Replay for Streaming Learning

In supervised machine learning, an agent is typically trained once and t...
research
05/03/2020

Continuous Learning in a Single-Incremental-Task Scenario with Spike Features

Deep Neural Networks (DNNs) have two key deficiencies, their dependence ...
research
12/01/2021

Efficient Online Bayesian Inference for Neural Bandits

In this paper we present a new algorithm for online (sequential) inferen...
research
12/03/2020

Neural Contextual Bandits with Deep Representation and Shallow Exploration

We study a general class of contextual bandits, where each context-actio...
research
10/08/2019

Automatic Construction of Multi-layer Perceptron Network from Streaming Examples

Autonomous construction of deep neural network (DNNs) is desired for dat...
research
06/10/2020

Gaussian Gated Linear Networks

We propose the Gaussian Gated Linear Network (G-GLN), an extension to th...
