Multi-armed Bandit Requiring Monotone Arm Sequences

06/07/2021
by Ningyuan Chen, et al.

In many online learning or multi-armed bandit problems, the actions taken or arms pulled are ordinal and required to be monotone over time. Examples include dynamic pricing, in which firms use markup pricing policies to please early adopters and deter strategic waiting, and clinical trials, in which the dose allocation usually follows the dose escalation principle to prevent dose-limiting toxicities. We consider the continuum-armed bandit problem when the arm sequence is required to be monotone. We show that when the unknown objective function is Lipschitz continuous, the regret is O(T). When in addition the objective function is unimodal or quasiconcave, the regret is Õ(T^(3/4)) under the proposed algorithm, which is also shown to be the optimal rate. This deviates from the optimal rate Õ(T^(2/3)) in the continuum-armed bandit literature and demonstrates the cost to learning efficiency imposed by the monotonicity requirement.
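To make the monotonicity constraint concrete, here is a minimal Python sketch of a policy whose arm sequence never decreases. It is not the algorithm proposed in the paper. It is a simple illustrative heuristic, and the function names (`monotone_sweep`, `cumulative_regret`), the grid discretization, and the parameters `num_arms`, `pulls_per_arm`, and `tolerance` are all assumptions made for this example. The policy sweeps upward over a grid on [0, 1], samples each arm a fixed number of times, and stops as soon as the sample mean falls clearly below the best mean seen so far. Because it cannot move back down, it commits to the arm where it stopped.

```python
import random


def monotone_sweep(reward, horizon, num_arms, pulls_per_arm, tolerance):
    """Return a non-decreasing arm sequence of length `horizon` on [0, 1].

    Illustrative heuristic only (assumes horizon >= 1 and num_arms >= 2).
    `reward(x)` returns one noisy observation of the objective at arm x.
    """
    grid = [i / (num_arms - 1) for i in range(num_arms)]
    pulled = []
    best_mean = float("-inf")
    for x in grid:
        n = min(pulls_per_arm, horizon - len(pulled))
        if n <= 0:
            break  # budget exhausted
        samples = [reward(x) for _ in range(n)]
        pulled.extend([x] * n)
        mean = sum(samples) / n
        if mean < best_mean - tolerance:
            break  # looks past the peak; moving back is not allowed
        best_mean = max(best_mean, mean)
    # Commit to the last arm for the remaining rounds.
    pulled.extend([pulled[-1]] * (horizon - len(pulled)))
    return pulled


def cumulative_regret(pulled, objective, optimum_value):
    """Sum of gaps between the optimum and the mean reward of each pulled arm."""
    return sum(optimum_value - objective(x) for x in pulled)


if __name__ == "__main__":
    rng = random.Random(0)
    objective = lambda x: 1.0 - abs(x - 0.5)  # unimodal, peak at 0.5
    noisy = lambda x: objective(x) + rng.gauss(0.0, 0.1)
    arms = monotone_sweep(noisy, horizon=1000, num_arms=11,
                          pulls_per_arm=20, tolerance=0.2)
    print(arms[-1], cumulative_regret(arms, objective, 1.0))
```

The sketch shows why the constraint is costly: to detect that it has passed the peak, the policy must overshoot, and every later round pays for that overshoot because it cannot return to a lower arm.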


