Productization Challenges of Contextual Multi-Armed Bandits

07/10/2019
by David Abensur, et al.

Contextual multi-armed bandits are a well-known and widely adopted class of online optimization algorithms, used in many Web experiences to tailor content or presentation to incoming user traffic. Much has been published on the theoretical guarantees (e.g., regret bounds) of proposed algorithmic variants, but relatively little attention has been devoted to the challenges encountered when productizing contextual bandit schemes at large scale. This work enumerates several productization challenges we encountered while leveraging contextual bandits for two concrete use cases at scale. We discuss how to (1) determine the context (engineer the features) that models the bandit arms; (2) sanity-check the health of the optimization process; (3) evaluate the process offline; (4) add potential actions (arms) on the fly to a running process; (5) subject the decision process to constraints; and (6) iteratively improve the online learning algorithm. For each challenge, we explain the issue, describe our approach, and relate it to prior art where applicable.
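The abstract does not specify which bandit algorithm underlies the authors' production system. As a point of reference for the setting it describes, below is a minimal Python/NumPy sketch of a disjoint LinUCB-style contextual bandit (in the style of Li et al., 2010). The class name, arm identifiers, and simulated reward model are illustrative assumptions, not the paper's implementation. Because each arm keeps its own ridge-regression statistics, registering a fresh (A, b) pair is one simple way to add arms on the fly, as in challenge (4).

    import numpy as np

    class LinUCB:
        """Minimal disjoint LinUCB sketch (illustrative; not the paper's system).
        Each arm keeps its own ridge-regression statistics (A, b), so arms can
        be added to a running process without retraining the others."""

        def __init__(self, dim, alpha=1.0):
            self.dim = dim        # context (feature) dimension
            self.alpha = alpha    # exploration strength
            self.arms = {}        # arm_id -> (A, b)

        def add_arm(self, arm_id):
            # New arm: identity matrix as ridge prior, zero reward vector.
            self.arms[arm_id] = (np.eye(self.dim), np.zeros(self.dim))

        def select(self, x):
            # Pick the arm with the highest upper confidence bound for context x.
            best_id, best_ucb = None, -np.inf
            for arm_id, (A, b) in self.arms.items():
                A_inv = np.linalg.inv(A)
                theta = A_inv @ b  # per-arm ridge estimate of the payoff model
                ucb = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
                if ucb > best_ucb:
                    best_id, best_ucb = arm_id, ucb
            return best_id

        def update(self, arm_id, x, reward):
            # Rank-one update of the chosen arm's statistics.
            A, b = self.arms[arm_id]
            self.arms[arm_id] = (A + np.outer(x, x), b + reward * x)

    # Toy usage with hypothetical arms and a simulated Bernoulli click reward.
    rng = np.random.default_rng(0)
    bandit = LinUCB(dim=3, alpha=1.0)
    for arm in ("layout_a", "layout_b"):
        bandit.add_arm(arm)
    true_theta = {"layout_a": np.array([0.1, 0.5, 0.0]),
                  "layout_b": np.array([0.4, 0.0, 0.2])}
    for _ in range(1000):
        x = rng.random(3)                                   # user/context features
        arm = bandit.select(x)
        reward = float(rng.random() < true_theta[arm] @ x)  # simulated click
        bandit.update(arm, x, reward)

In the disjoint formulation each arm's estimate theta is independent, which also eases the sanity checks of challenge (2): per-arm payoff estimates can be monitored directly while the process runs.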
