Deep Contextual Multi-armed Bandits

by Mark Collier, et al.

Contextual multi-armed bandit problems arise frequently in important industrial applications. Existing solutions model the context either linearly, which enables uncertainty driven (principled) exploration, or non-linearly, by using epsilon-greedy exploration policies. Here we present a deep learning framework for contextual multi-armed bandits that is both non-linear and enables principled exploration at the same time. We tackle the exploration vs. exploitation trade-off through Thompson sampling by exploiting the connection between inference time dropout and sampling from the posterior over the weights of a Bayesian neural network. In order to adjust the level of exploration automatically as more data is made available to the model, the dropout rate is learned rather than considered a hyperparameter. We demonstrate that our approach substantially reduces regret on two tasks (the UCI Mushroom task and the Casino Parity task) when compared to 1) non-contextual bandits, 2) epsilon-greedy deep contextual bandits, and 3) fixed dropout rate deep contextual bandits. Our approach is currently being applied to marketing optimization problems at HubSpot.


1 Introduction

The contextual bandit problem is a variant of the extensively studied multi-armed bandit problem [3]. Both contextual and non-contextual bandits involve making a sequence of decisions on which action to take from an action space A. After an action is taken, a stochastic reward r is revealed for the chosen action only. The goal is to maximize these rewards. Maximization of reward is often recast as minimization of regret [8]. As the reward is only revealed for the chosen action, bandit problems involve trading off exploration, to try potentially new and better actions, against exploitation of known good actions.

In a contextual bandit, before making each decision the bandit is shown some context x. Asymptotically optimal approaches to trading off exploration and exploitation can be derived for the non-contextual multi-armed bandit [4]. But deriving such solutions in the contextual case has required simplifying assumptions, such as a linear relationship between the context and the reward [18]. These simplifying assumptions are often unsatisfactory.

For example, at HubSpot we wish to optimize the time of day to send an email in order to maximize the probability of the email converting (being opened, clicked on or replied to). We know that the context of the particular email being sent, such as the timezone of the email sender, has an effect on whether the email converts for a given send time. Additionally, we only observe whether the email converts for the time we actually choose to send it. Thus, we model this as a contextual bandit problem: the action space is the possible times to send an email, the context is the set of features describing the email being sent, and the reward is whether the email converts. We know from other, non-bandit (classification) tasks involving the same email data, such as predicting the bounce rate of an email, that deep neural networks substantially outperform linear models at modeling our email data.

Many other marketing problems can also be cast as contextual bandit problems [19] and we wish to model the context of these problems using deep neural networks, while still managing the exploration vs. exploitation trade-off. We propose doing so by applying recent advances in Bayesian neural networks, whereby it is possible to obtain samples from an approximate posterior over the weights in a neural network. Given these samples it is trivial to use Thompson sampling [27] to manage the exploration vs. exploitation trade-off.

2 Background

2.1 Contextual Multi-Armed Bandits

We provide a more formal definition of the contextual bandit problem. Suppose we have an agent acting in an environment. At each timestep the agent is presented with some context x from the environment. The agent must choose to take some action from a set of k possible actions {a_1, a_2, …, a_k}. When the agent takes an action a_i it receives a real-valued reward r for the action taken; it does not, however, observe a reward for untaken actions. For simplicity we restrict r ∈ {0, 1}, although the treatment is trivially extended to any real-valued reward. The agent wishes to maximize the cumulative reward over time, and in our formulation the agent is assumed to interact with the environment over an infinite time horizon.

The agent must build a model of the expected reward E[r | x, a] from observed (x, a, r) triplets. But the agent only observes triplets involving actions it has taken, so the agent drives its own data generation process. Thus, it is necessary for the agent to trade off exploration and exploitation. If the agent learns early on to associate high rewards with a particular action, it may not take actions which have higher expected reward in new, unseen contexts. Equally, the agent cannot always take random actions, as it would forgo the higher expected reward of known good actions in particular contexts.

There are many approaches to trading off exploration and exploitation [2, 3, 25]. Two popular and simple approaches are epsilon-greedy exploration and Thompson sampling. Under an epsilon-greedy policy, a random action is taken ε percent of the time and the best predicted action (1 − ε) percent of the time.

Thompson sampling [27] is a heuristic for trading off exploration and exploitation when using a parametric likelihood function p(r | x, a, θ) and a posterior over the parameters θ. At each timestep we sample parameters θ* from the posterior and then choose the action with the highest expected reward E[r | x, a, θ*]. Thompson sampling has been shown to be a very effective heuristic for managing this trade-off [1, 10].
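For intuition, here is a minimal sketch of Thompson sampling in the simplest conjugate setting: a non-contextual Bernoulli bandit with a Beta posterior per arm, as in the non-contextual baseline of Section 4. The payout probabilities below are made up for illustration.

```python
import random

# Hypothetical two-armed Bernoulli bandit: true payout probabilities.
TRUE_PROBS = [0.4, 0.6]

# Beta(1, 1) prior on each arm: alpha tracks successes, beta tracks failures.
alpha = [1, 1]
beta = [1, 1]

random.seed(0)
for t in range(2000):
    # Sample a plausible payout probability for each arm from its posterior...
    samples = [random.betavariate(alpha[i], beta[i]) for i in range(2)]
    # ...and play the arm whose sample is largest (Thompson sampling).
    arm = max(range(2), key=lambda i: samples[i])
    reward = 1 if random.random() < TRUE_PROBS[arm] else 0
    # Conjugate Beta/Bernoulli posterior update.
    alpha[arm] += reward
    beta[arm] += 1 - reward

# As the posteriors concentrate, play shifts to the better arm.
plays = [alpha[i] + beta[i] - 2 for i in range(2)]
print(plays)
```

Early on, wide posteriors make the sampled values noisy, so both arms are tried; as evidence accumulates, the sampling naturally becomes exploitative.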

2.2 Bayesian Neural Networks

Neural networks are typically trained with maximum likelihood estimation (MLE) or maximum a posteriori (MAP) estimation of the network weights W. This yields only a point estimate of the weights, so a Bayesian approach to ascertaining uncertainty estimates from the network is not possible. In contrast, in a Bayesian neural network a prior is placed on the network weights and a posterior distribution over these weights is learned. In general, because full Bayesian inference is intractable, one conducts approximate Bayesian inference to learn the posterior.

Approximate inference in Bayesian neural networks has a long history [6, 9, 16, 20, 21, 24]. Only recently have approaches to approximate inference been proposed which scale well to larger networks and datasets [7, 12, 14].

Of particular importance to our application is a recent result [12] in which dropout training [26] in a feed-forward neural network of arbitrary depth, with arbitrary non-linearities, is shown to be equivalent to an approximation of the probabilistic deep Gaussian process [11]. In practice, this Bayesian interpretation of dropout means that if we leave dropout turned on at inference time, each stochastic forward pass through the network corresponds to a sample from an approximate posterior over the network weights. This property of dropout at inference time has previously been used to undertake Thompson sampling to drive exploration in reinforcement learning problems [12].
Under the Bayesian interpretation of dropout, a continuous relaxation of Bernoulli dropout using the Concrete distribution [22] can be realized [13]. This continuous relaxation, named Concrete Dropout, treats the dropout parameter (the probability of turning off a node in Bernoulli dropout) as a parameter of the network that can be trained using standard gradient descent methods. Appropriate regularization terms are added to the objective function, trading off fitting the data against maximizing the entropy of the dropout parameter. As the amount of data increases, the data likelihood term overwhelms the dropout regularization term, which leads to well-calibrated uncertainty estimates and thus appropriate levels of exploration.
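As an illustration (not code from the paper), a relaxed dropout mask under the Concrete distribution can be drawn as follows, following the form in [13]; the function name and temperature value are our own choices.

```python
import math
import random

def concrete_dropout_mask(p, n, temperature=0.1, rng=random):
    """Draw a relaxed dropout mask of size n with entries in (0, 1).

    p is the dropout probability. As temperature -> 0 the entries approach
    hard 0/1 Bernoulli draws, but remain differentiable with respect to p,
    which is what lets the dropout rate be learned by gradient descent.
    """
    mask = []
    for _ in range(n):
        u = rng.random()  # uniform noise, u ~ U(0, 1)
        logit = (math.log(p) - math.log(1.0 - p)
                 + math.log(u) - math.log(1.0 - u))
        z = 1.0 / (1.0 + math.exp(-logit / temperature))  # sigmoid; z ~ "drop"
        mask.append(1.0 - z)  # multiply activations by 1 - z to keep them
    return mask
```

With a low temperature the entries of `concrete_dropout_mask(0.2, 100)` sit close to 0 or 1, with roughly an 80% keep rate, mimicking Bernoulli dropout while staying differentiable in p.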

3 Deep Contextual Multi-Armed Bandits

In our model, we use a neural network f_W(x, a) to approximate the expected reward E[r | x, a], where W are the network weights. We train the network using Concrete Dropout, and do not disable dropout at inference time.

At each timestep we are presented with a context x and a set of possible actions {a_1, a_2, …, a_k}. We sample dropout noise z from a uniform distribution, which is used, in effect, to sample from the posterior over the weights W. We unroll the actions into (x, a_i) pairs which are passed to the network for inference. We choose the action whose (x, a_i) pair has the highest expected reward. Theoretically, for Thompson sampling we should update the model after each observation. In practice, realtime online learning is unsuitable for industrial settings, so we retrain the model on an exponential scale in the number of data points. See Algorithm 1.

input : n, the number of initial steps to take actions randomly before training; c, the coefficient for retraining on an exponential scale; and a neural network architecture that defines a parametric function f_W(x, a).
t ← n;
while True do
      for i ← 1 to t do
            receive context x and possible actions {a_1, a_2, …, a_k};
            sample dropout noise z;
            compute the corresponding weights W(z);
            for j ← 1 to k do
                  compute the predicted reward f(x, a_j) under weights W(z);
            end for
            choose the action a_j with the highest predicted reward;
      end for
      retrain the network on all observed (x, a, r) triplets;
      t ← c · t;
end while
Algorithm 1 Deep Contextual Multi-Armed Bandits
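The key mechanics of the action-selection step in Algorithm 1 (one dropout mask sampled per decision and shared across all candidate actions, then an argmax over predicted rewards) can be sketched with a toy NumPy network; the weights below are random placeholders, not a trained model, and the Concrete Dropout training itself is omitted.

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy one-hidden-layer network with 8 ReLU units and a scalar reward head.
N_ACTIONS = 2
W1 = rng.normal(size=(3 + N_ACTIONS, 8))  # input = context (3 dims) + one-hot action
b1 = np.zeros(8)
W2 = rng.normal(size=(8, 1))
b2 = np.zeros(1)
DROPOUT_P = 0.5

def predict_reward(x, a_onehot, z):
    """One stochastic forward pass; z is a sampled dropout mask."""
    h = np.maximum(np.concatenate([x, a_onehot]) @ W1 + b1, 0.0)  # ReLU layer
    h = h * z / (1.0 - DROPOUT_P)  # inverted dropout, left on at inference
    return (h @ W2 + b2).item()

def choose_action(x):
    """Thompson sampling step: ONE dropout mask is shared across all candidate
    actions, so every action is scored under the same weight sample."""
    z = (rng.random(8) > DROPOUT_P).astype(float)
    rewards = [predict_reward(x, np.eye(N_ACTIONS)[a], z)
               for a in range(N_ACTIONS)]
    return int(np.argmax(rewards))

context = rng.normal(size=3)
print(choose_action(context))
```

Sharing a single mask per decision matters: drawing a fresh mask per action would compare rewards under different weight samples, which is no longer Thompson sampling over the joint action set.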

Initially, when the amount of data is small, the uncertainty on the network weights is high. Therefore, samples from the posterior have high variance, which leads to a high level of exploration, as desired. As the model observes more (x, a, r) triplets, the uncertainty on the weights decreases, and the network explores less and less, converging to an almost purely exploitative strategy. Additionally, by using Concrete Dropout, the level of exploration is automatically tuned according to the size of the network, the amount of data observed, and the complexity of the context. Thus, we do not have to create hand-crafted dropout rate or epsilon annealing schedules for each problem. This is particularly important in real world settings, where it may not be possible to run a large number of simulations to determine an appropriate problem-specific annealing schedule.

In summary, our approach satisfies three key requirements:

  1. It models the context using a deep non-linear neural network.

  2. It explores the space in a principled and efficient way that requires little manual tuning.

  3. Its time complexity is consistent with real-world, latency-constrained tasks.

4 Experiments

We test our approach on two tasks with synthetic data, which allows us to run many simulations. We compare against three baseline approaches: 1) a non-contextual bandit which uses Thompson sampling under a Beta/Binomial conjugate Bayesian analysis, 2) an epsilon-greedy bandit which chooses a random action 5% of the time and the best predicted action the remainder of the time, and 3) a bandit with a fixed dropout rate.

In the below experiments, for the contextual models (including the epsilon-greedy and fixed dropout rate bandits), we use a neural network with 2 hidden layers of 256 units each and ReLU activation functions. All networks are trained using the Adam optimizer [17] with an initial learning rate of 0.001. We retrain the contextual model on an exponential scale, after observing n, cn, c²n, … examples for fixed n and c.
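The retraining schedule can be sketched as follows; the values of n and c used in the paper are not recoverable from this copy, so those below are placeholders.

```python
def retrain_points(n, c, limit):
    """Dataset sizes at which the model is retrained: n, c*n, c^2*n, ..."""
    points, t = [], n
    while t <= limit:
        points.append(t)
        t *= c
    return points

# Hypothetical n = 100, c = 2, up to 2,000 observations.
print(retrain_points(100, 2, 2000))  # [100, 200, 400, 800, 1600]
```

Geometric spacing keeps the total retraining cost proportional to the cost of the final fit, which is what makes the scheme practical in latency-constrained settings.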

4.1 Mushroom Task

We make a slight modification to a previously used contextual bandit problem [7, 15]. The UCI Mushroom dataset [5] contains 8,124 mushrooms classified as poisonous or edible, with up to 22 features for each mushroom. At each timestep, our agent is presented with the features of a mushroom and must decide whether to eat it or not. The agent gets a reward of 1 for eating an edible mushroom, a reward of 0 for eating a poisonous mushroom, and, with probability p, a reward of 1 for not eating a mushroom. In the below experiments p is fixed. Regret is measured against an oracle which can see whether each mushroom is edible, always eats edible mushrooms and never eats poisonous ones.
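The reward structure can be made precise with a short sketch. The function and variable names are ours, and the paper's value of p is not recoverable from this copy, so it is left as an explicit parameter.

```python
import random

def mushroom_reward(edible, eat, p, rng=random):
    """Reward for the mushroom task: 1 for eating an edible mushroom,
    0 for eating a poisonous one, and 1 with probability p for abstaining."""
    if eat:
        return 1 if edible else 0
    return 1 if rng.random() < p else 0

def oracle_action(edible):
    """The oracle eats exactly the edible mushrooms."""
    return edible
```

Regret at each step is the oracle's expected reward minus the agent's, summed over the run.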

Figure 1: Mushroom Task - comparison between epsilon-greedy, Thompson sampling with Bernoulli dropout, and Thompson sampling with Concrete Dropout. Concrete Dropout, with a final cumulative regret (FCR) of 252, performs significantly better than Bernoulli dropout, which has an FCR of 658, while the FCR for the epsilon-greedy agent is 1,718.

Figure 1 demonstrates that, as expected, the epsilon-greedy agent accumulates regret linearly, due to taking random actions 5% of the time. Both the fixed rate dropout and Concrete Dropout agents rapidly learn to “solve” the task, but Concrete Dropout not only reaches a final cumulative regret less than half that of the fixed rate dropout agent, it also learns to stop exploring earlier. Not shown is a non-contextual bandit whose final regret is 24,048, which demonstrates the value of modeling the context for this task. The discontinuities in the graph arise because we retrain on an exponential scale.

4.2 Casino Parity Task

We propose a new contextual bandit task which is a twist on the traditional non-contextual multi-armed bandit problem. Suppose there are L bandits in a casino, each of type A or B. Bandits of type A pay out with probability p_A and bandits of type B pay out with probability p_B. Assume for simplicity that payouts are always one dollar. Every bandit in the casino has an ID encoded as a binary string. Unknown to the gambler, bandits of type A have an even parity ID and bandits of type B an odd parity ID. At each timestep the gambler is presented the ID of a bandit of either type A or B, and the gambler must choose to play or not. If the gambler plays, they receive reward according to the bandit type's payout probability; otherwise they are randomly assigned one of the L bandits, which they must play. Without loss of generality, if p_A > p_B, the optimal strategy is clearly to always play the even parity bandits and never play the odd parity bandits. Rewards are stochastic, and the context requires a non-linear model [23]. Whenever p_A ≠ p_B, the context can be leveraged to minimize regret.

In the below experiments L, p_A and p_B are fixed. Regret is measured relative to an oracle which can see the bandit type and which, if p_A > p_B, always plays type A bandits and always refuses to play type B bandits, and vice versa.
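The context here is a parity function of the ID bits, which no linear model can represent [23]. A sketch of the type assignment and the oracle, with names of our own choosing:

```python
def bandit_type(bandit_id):
    """Type A iff the bandit's binary ID has even parity (even count of 1 bits)."""
    return "A" if bin(bandit_id).count("1") % 2 == 0 else "B"

def oracle_plays(bandit_id, p_a, p_b):
    """The oracle plays exactly when the presented bandit's type pays better."""
    better_type = "A" if p_a >= p_b else "B"
    return bandit_type(bandit_id) == better_type
```

Learning `bandit_type` from raw ID bits is essentially the classic parity (XOR) problem, which is why a deep non-linear model is required to exploit this context.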

Figure 2: Casino Parity Task - comparison between epsilon-greedy, Thompson sampling with Bernoulli dropout, and Thompson sampling with Concrete Dropout. Concrete Dropout with final cumulative regret (FCR) of 1,258 performs better than Bernoulli dropout which has FCR of 1,536 and the epsilon-greedy agent which has FCR of 2,718.

In Figure 2 we see that Concrete Dropout with Thompson sampling, which is our proposed deep contextual multi-armed bandit, again outperforms fixed rate Bernoulli dropout and the epsilon-greedy agent. We note that for a non-contextual bandit the final cumulative regret was 25,102.

5 Conclusion

We have presented an approach to the contextual multi-armed bandit problem which enables modeling the context x with a deep non-linear neural network while still allowing principled and efficient exploration of the joint context and action space.

We proposed deep contextual multi-armed bandits which apply recent work based on a Bayesian interpretation of dropout [12, 13]. By combining standard dropout training with inference time dropout, we are able to sample from an approximate posterior over the weights in a Bayesian neural network. This enables Thompson sampling, a heuristic which has been shown to be effective in addressing the exploration vs. exploitation trade-off.

Concrete Dropout, a continuous relaxation of Bernoulli dropout, has been shown to provide well-calibrated uncertainty estimates. In practical applications it is not possible to run many simulations to determine a good dropout annealing schedule. By applying Concrete Dropout in deep contextual multi-armed bandits we avoid having to define such a schedule for each task. Through standard gradient descent training we learn dropout rates which appropriately trade off exploration and exploitation. As we wish to deploy deep contextual multi-armed bandits to many different tasks which may require vastly different levels of exploration, adaptively learning the appropriate dropout rate is a key advantage of our approach over using standard Bernoulli dropout.

Deep contextual multi-armed bandits empirically outperform non-contextual bandits, bandits with epsilon-greedy exploration and fixed dropout rate bandits on the two contextual bandit tasks presented in this paper. Additionally we note that we are applying our approach to a number of contextual bandit problems in the marketing domain at HubSpot.


The authors would like to thank Marco Lagi, Adam Starikiewicz, Vedant Misra and George Banis for their helpful comments on drafts of this paper.


  • [1] Agrawal, S., Goyal, N.: Thompson sampling for contextual bandits with linear payoffs. In: International Conference on Machine Learning. pp. 127–135 (2013)
  • [2] Auer, P.: Using confidence bounds for exploitation-exploration trade-offs. Journal of Machine Learning Research 3(Nov), 397–422 (2002)
  • [3] Auer, P., Cesa-Bianchi, N., Fischer, P.: Finite-time analysis of the multiarmed bandit problem. Machine learning 47(2-3), 235–256 (2002)
  • [4] Auer, P., Cesa-Bianchi, N., Freund, Y., Schapire, R.E.: The nonstochastic multiarmed bandit problem. SIAM journal on computing 32(1), 48–77 (2002)
  • [5] Bache, K., Lichman, M.: UCI Machine Learning Repository (2013)
  • [6] Barber, D., Bishop, C.M.: Ensemble learning in bayesian neural networks. NATO ASI SERIES F COMPUTER AND SYSTEMS SCIENCES 168, 215–238 (1998)
  • [7] Blundell, C., Cornebise, J., Kavukcuoglu, K., Wierstra, D.: Weight uncertainty in neural network. In: International Conference on Machine Learning. pp. 1613–1622 (2015)
  • [8] Bubeck, S., Cesa-Bianchi, N., et al.: Regret analysis of stochastic and nonstochastic multi-armed bandit problems. Foundations and Trends® in Machine Learning 5(1), 1–122 (2012)
  • [9] Buntine, W.L., Weigend, A.S.: Bayesian back-propagation. Complex systems 5(6), 603–643 (1991)
  • [10] Chapelle, O., Li, L.: An empirical evaluation of thompson sampling. In: Advances in neural information processing systems. pp. 2249–2257 (2011)
  • [11] Damianou, A., Lawrence, N.: Deep gaussian processes. In: Artificial Intelligence and Statistics. pp. 207–215 (2013)
  • [12] Gal, Y., Ghahramani, Z.: Dropout as a bayesian approximation: Representing model uncertainty in deep learning. In: International Conference on Machine Learning. pp. 1050–1059 (2016)
  • [13] Gal, Y., Hron, J., Kendall, A.: Concrete dropout. In: Advances in Neural Information Processing Systems. pp. 3584–3593 (2017)
  • [14] Graves, A.: Practical variational inference for neural networks. In: Advances in Neural Information Processing Systems. pp. 2348–2356 (2011)
  • [15] Guez, A.: Sample-Based Search Methods For Bayes-Adaptive Planning. Ph.D. thesis, UCL (University College London) (2015)
  • [16] Hinton, G.E., Van Camp, D.: Keeping the neural networks simple by minimizing the description length of the weights. In: Proceedings of the sixth annual conference on Computational learning theory. pp. 5–13. ACM (1993)
  • [17] Kingma, D.P., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)
  • [18] Li, L., Chu, W., Langford, J., Schapire, R.E.: A contextual-bandit approach to personalized news article recommendation. In: Proceedings of the 19th international conference on World wide web. pp. 661–670. ACM (2010)
  • [19] Lu, T., Pál, D., Pál, M.: Contextual multi-armed bandits. In: Proceedings of the Thirteenth international conference on Artificial Intelligence and Statistics. pp. 485–492 (2010)
  • [20] MacKay, D.J.: A practical bayesian framework for backpropagation networks. Neural computation 4(3), 448–472 (1992)
  • [21] MacKay, D.J.: Probable networks and plausible predictions—a review of practical bayesian methods for supervised neural networks. Network: Computation in Neural Systems 6(3), 469–505 (1995)
  • [22] Maddison, C.J., Mnih, A., Teh, Y.W.: The concrete distribution: A continuous relaxation of discrete random variables. arXiv preprint arXiv:1611.00712 (2016)
  • [23] Minsky, M., Papert, S.A.: Perceptrons: an introduction to computational geometry. MIT press (1969)
  • [24] Neal, R.M.: Bayesian Learning for Neural Networks. Ph.D. thesis, University of Toronto (1995)
  • [25] Silver, D., Huang, A., Maddison, C.J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al.: Mastering the game of go with deep neural networks and tree search. nature 529(7587), 484–489 (2016)
  • [26] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R.: Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research 15(1), 1929–1958 (2014)
  • [27] Thompson, W.R.: On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25(3/4), 285–294 (1933)