Online Action Learning in High Dimensions: A New Exploration Rule for Contextual ε_t-Greedy Heuristics

09/29/2020
by   Claudio C. Flores, et al.

Bandit problems are pervasive across fields of research and arise in many practical applications. Examples, including dynamic pricing, assortment selection, and the design of auctions and incentives, permeate a large number of sequential treatment experiments. Different applications impose distinct restrictions on the set of viable actions: some favor diversity of outcomes, while others require harmful actions to be closely monitored or largely avoided. In this paper, we extend one of the most popular bandit solutions, the original ε_t-greedy heuristic, to high-dimensional contexts. Moreover, we introduce a competing exploration mechanism that relies on searching sets built from order statistics. We view our proposals as alternatives both for cases where pluralism is valued and, in the opposite direction, for cases where the end-user must carefully tune the range of exploration over new actions. We derive reasonable bounds on the cumulative regret of a decaying ε_t-greedy heuristic in both cases, and we provide an upper bound for the initialization phase implying that the regret bounds under order-statistics exploration are at most equal to, and typically better than, those obtained when random searching is the sole exploration mechanism. Additionally, we show that end-users have sufficient flexibility to avoid harmful actions, since any cardinality of the higher-order statistics can be chosen to achieve a stricter upper bound. In a simulation exercise, we show that the algorithms proposed in this paper outperform simple and adapted counterparts.
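A decaying ε_t-greedy rule with an order-statistics searching set can be sketched as below. This is a minimal illustration only: the exploration schedule ε_t = min(1, c/t), the Bernoulli reward model, and all names and parameters (`decaying_eps_greedy`, `m`, `c`) are assumptions for exposition, not the paper's exact construction, which additionally handles high-dimensional contexts.

```python
import numpy as np

def decaying_eps_greedy(true_means, horizon, m=None, c=5.0, seed=0):
    """Minimal sketch of a decaying eps_t-greedy bandit.

    true_means : Bernoulli success probability of each arm.
    m          : if set, exploration draws only from the m arms with the
                 highest *estimated* means (an order-statistics searching
                 set); m=None recovers uniform random exploration.
    """
    rng = np.random.default_rng(seed)
    K = len(true_means)
    counts = np.zeros(K)
    est = np.zeros(K)          # running estimates of arm means
    cum_reward = 0.0
    for t in range(1, horizon + 1):
        eps_t = min(1.0, c / t)                    # decaying exploration rate
        if rng.random() < eps_t:
            # explore: uniform over all arms, or over the top-m by estimate
            pool = np.arange(K) if m is None else np.argsort(est)[-m:]
            arm = int(rng.choice(pool))
        else:
            arm = int(np.argmax(est))              # exploit current best
        r = float(rng.random() < true_means[arm])  # Bernoulli reward draw
        counts[arm] += 1
        est[arm] += (r - est[arm]) / counts[arm]   # incremental mean update
        cum_reward += r
    return cum_reward, est
```

Setting a small `m` restricts exploration to high-performing arms, which is the sense in which an end-user can keep harmful actions out of the searching set; `m = K` (or `m=None`) recovers the purely random exploration benchmark.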

