Conservative Exploration using Interleaving

06/03/2018
by Sumeet Katariya, et al.

In many practical problems, a learning agent may want to learn the best action in hindsight without ever taking a bad action, that is, an action significantly worse than the default production action. In general this is impossible, because the agent must explore unknown actions, some of which can be bad, in order to learn better ones. However, when the actions are combinatorial, it may be possible if an unknown action can be evaluated by interleaving it with the production action. We formalize this concept as learning in stochastic combinatorial semi-bandits with exchangeable actions. We design efficient learning algorithms for this problem, bound their n-step regret, and evaluate them on both synthetic and real-world problems. Our real-world experiments show that our algorithms can learn to recommend the K most attractive movies without ever violating a strict production constraint, both overall and subject to a diversity constraint.
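
To make the interleaving idea concrete, here is a minimal toy sketch in Python; it is not the algorithm analyzed in the paper. Item rewards are Bernoulli with unknown means, the production slate is a fixed suboptimal list, and the interleaving step is simplified to swapping at most one uncertain item into the displayed slate per round. The problem sizes, the UCB index, and the one-swap rule are all assumptions made for this illustration.

```python
# Toy sketch of interleaved exploration in a stochastic combinatorial
# semi-bandit (NOT the paper's algorithm; all constants and rules below
# are assumptions made for illustration).
import numpy as np

rng = np.random.default_rng(0)

L, K, n = 20, 5, 2000                     # number of items, slate size, horizon
theta = rng.uniform(0.1, 0.9, L)          # unknown item attraction probabilities
production = np.argsort(-theta)[K:2 * K]  # a safe but suboptimal default slate

counts = np.zeros(L)   # observations per item
means = np.zeros(L)    # empirical mean reward per item

def ucb(i, t):
    """Optimistic estimate of item i's reward at round t."""
    if counts[i] == 0:
        return 1.0
    return means[i] + np.sqrt(2.0 * np.log(t + 1) / counts[i])

total = 0.0
for t in range(n):
    # Candidate slate: the K items with the highest UCB indices.
    candidate = sorted(range(L), key=lambda i: -ucb(i, t))[:K]

    # Interleaving step (toy version): keep the production slate but swap in
    # at most one candidate item, so the displayed slate is always within one
    # item of the production slate.
    slate = list(production)
    new_items = [i for i in candidate if i not in slate]
    if new_items:
        slate[-1] = new_items[0]

    # Semi-bandit feedback: a Bernoulli reward is observed for every shown item.
    rewards = rng.binomial(1, theta[slate])
    for i, r in zip(slate, rewards):
        counts[i] += 1
        means[i] += (r - means[i]) / counts[i]
    total += rewards.sum()

print("average per-round reward:", total / n)
print("optimal per-round reward:", np.sort(theta)[-K:].sum())
```

Because the displayed slate differs from the production slate in at most one position, its expected reward stays close to the production reward while the agent still gathers feedback on uncertain items; this is the intuition behind evaluating unknown actions by interleaving them with the production action.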


Related research

- 02/08/2020: Improved Algorithms for Conservative Exploration in Bandits
- 02/11/2019: Exploiting Structure of Uncertainty for Efficient Combinatorial Semi-Bandits
- 06/28/2014: Efficient Learning in Large-Scale Combinatorial Semi-Bandits
- 06/01/2022: Incentivizing Combinatorial Bandit Exploration
- 10/30/2020: The Combinatorial Multi-Bandit Problem and its Application to Energy Management
- 05/30/2022: Generalizing Hierarchical Bayesian Bandits
- 04/24/2023: Moving Forward by Moving Backward: Embedding Action Impact over Action Semantics
