Recurrent Neural-Linear Posterior Sampling for Non-Stationary Contextual Bandits

07/09/2020
by   Aditya Ramesh, et al.
0

An agent in a non-stationary contextual bandit problem should balance between exploration and the exploitation of (periodic or structured) patterns present in its previous experiences. Handcrafting an appropriate historical context is an attractive alternative to transform a non-stationary problem into a stationary problem that can be solved efficiently. However, even a carefully designed historical context may introduce spurious relationships or lack a convenient representation of crucial information. In order to address these issues, we propose an approach that learns to represent the relevant context for a decision based solely on the raw history of interactions between the agent and the environment. This approach relies on a combination of features extracted by recurrent neural networks with a contextual linear bandit algorithm based on posterior sampling. Our experiments on a diverse selection of contextual and non-contextual non-stationary problems show that our recurrent approach consistently outperforms its feedforward counterpart, which requires handcrafted historical contexts, while being more widely applicable than conventional non-stationary bandit algorithms.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/05/2017

Efficient Contextual Bandits in Non-stationary Worlds

Most contextual bandit algorithms minimize regret to the best fixed poli...
research
02/13/2020

Multiscale Non-stationary Stochastic Bandits

Classic contextual bandit algorithms for linear models, such as LinUCB, ...
research
03/12/2023

Energy Regularized RNNs for Solving Non-Stationary Bandit Problems

We consider a Multi-Armed Bandit problem in which the rewards are non-st...
research
12/27/2017

Active Search for High Recall: a Non-Stationary Extension of Thompson Sampling

We consider the problem of Active Search, where a maximum of relevant ob...
research
02/03/2019

A New Algorithm for Non-stationary Contextual Bandits: Efficient, Optimal, and Parameter-free

We propose the first contextual bandit algorithm that is parameter-free,...
research
08/02/2023

Maximizing Success Rate of Payment Routing using Non-stationary Bandits

This paper discusses the system architecture design and deployment of no...
research
01/21/2019

Parallel Contextual Bandits in Wireless Handover Optimization

As cellular networks become denser, a scalable and dynamic tuning of wir...

Please sign up or login with your details

Forgot password? Click here to reset