Maximum entropy exploration in contextual bandits with neural networks and energy based models

10/12/2022
by Adam Elwood et al.

Contextual bandits can solve a huge range of real-world problems. However, current popular algorithms for solving them rely either on linear models or on unreliable uncertainty estimation in non-linear models, both of which are needed to handle the exploration-exploitation trade-off. Inspired by theories of human cognition, we introduce novel techniques that use maximum entropy exploration, relying on neural networks to find optimal policies in settings with both continuous and discrete action spaces. We present two classes of models: one that uses neural networks as reward estimators, and another that uses energy-based models to represent the probability of obtaining an optimal reward given an action. We evaluate the performance of these models in static and dynamic contextual bandit simulation environments. We show that both techniques outperform well-known standard algorithms, with energy-based models achieving the best overall performance. This provides practitioners with new techniques that perform well in static and dynamic settings, and are particularly well suited to non-linear scenarios with continuous action spaces.
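To make the reward-estimator variant concrete, here is a minimal sketch of maximum entropy (softmax/Boltzmann) exploration over estimated rewards in a discrete-action contextual bandit. All names (`MaxEntBandit`, `select`, `update`), the linear per-arm estimator, and the temperature value are illustrative assumptions, not the paper's actual architecture, which uses neural networks:

```python
import numpy as np

def softmax(x, temperature=1.0):
    # Maximum-entropy (Boltzmann) distribution over estimated rewards;
    # subtracting the max keeps the exponentials numerically stable.
    z = x / temperature
    z = z - z.max()
    p = np.exp(z)
    return p / p.sum()

class MaxEntBandit:
    """Illustrative sketch only: a linear reward estimator per arm with
    softmax (maximum-entropy) exploration, standing in for the paper's
    neural reward estimators."""

    def __init__(self, n_arms, dim, lr=0.1, temperature=0.5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.W = np.zeros((n_arms, dim))  # one weight vector per arm
        self.lr = lr
        self.temperature = temperature

    def select(self, context):
        r_hat = self.W @ context                    # estimated reward per arm
        probs = softmax(r_hat, self.temperature)    # max-entropy policy
        return self.rng.choice(len(probs), p=probs)

    def update(self, context, arm, reward):
        # One SGD step on the squared error of the chosen arm's estimate.
        err = reward - self.W[arm] @ context
        self.W[arm] += self.lr * err * context
```

Lowering the temperature concentrates the policy on the highest-estimated arm (more exploitation); raising it flattens the distribution toward uniform (more exploration), which is the knob the maximum-entropy formulation exposes.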


Related research

- Neural Exploitation and Exploration of Contextual Bandits (05/05/2023): In this paper, we study utilizing neural networks for the exploitation a…
- Semiparametric Contextual Bandits (03/12/2018): This paper studies semiparametric contextual bandits, a generalization o…
- Graph Neural Bandits (08/21/2023): Contextual bandits algorithms aim to choose the optimal arm with the hig…
- Statistical Complexity and Optimal Algorithms for Non-linear Ridge Bandits (02/12/2023): We consider the sequential decision-making problem where the mean outcom…
- Online Learning in Contextual Bandits using Gated Linear Networks (02/21/2020): We introduce a new and completely online contextual bandit algorithm cal…
- Neural Contextual Bandits via Reward-Biased Maximum Likelihood Estimation (03/08/2022): Reward-biased maximum likelihood estimation (RBMLE) is a classic princip…
- An Empirical Study of Neural Kernel Bandits (11/05/2021): Neural bandits have enabled practitioners to operate efficiently on prob…
