Multi-Armed Bandit Problem and Batch UCB Rule

02/01/2019
by Alexander Kolnogorov, et al.

We obtain an upper bound on the loss function of a strategy for the multi-armed bandit problem with Gaussian distributions of incomes. The considered strategy is an asymptotic generalization of the strategy proposed by J. Bather for the multi-armed bandit problem; it uses the UCB rule, i.e., it chooses the action corresponding to the maximum of the upper bound of the confidence interval for the current estimate of the expected one-step income. The results are obtained with the help of an invariant description of the control on the unit horizon in the domain of close distributions, because it is there that the loss function attains its maximal values. The UCB rule is widely used in machine learning. It can also be applied to the optimization of batch data processing when two alternative processing methods with different, a priori unknown, efficiencies are available.
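To illustrate the UCB rule described above, here is a minimal sketch for Gaussian arms: each step, play the arm whose sample mean plus confidence-interval half-width is largest. The exploration constant `c` and the `sqrt(log t / n)` width are common textbook choices and are assumptions here; the paper's Bather-type strategy and its batch variant differ in detail.

```python
import math
import random

def ucb_choose(counts, means, t, c=2.0):
    """Return the index of the arm with the largest upper confidence bound.

    counts[k] -- number of times arm k has been played so far
    means[k]  -- sample mean of arm k's incomes
    t         -- current step number (1-based)
    c         -- exploration constant (an assumption, not from the paper)
    """
    # Play every arm once before applying the rule.
    for k, n in enumerate(counts):
        if n == 0:
            return k
    ucb = [means[k] + c * math.sqrt(math.log(t) / counts[k])
           for k in range(len(counts))]
    return max(range(len(ucb)), key=lambda k: ucb[k])

# Toy simulation with two Gaussian arms of unknown means.
random.seed(0)
true_means = [0.0, 1.0]          # hypothetical arm efficiencies
counts = [0, 0]
means = [0.0, 0.0]
for t in range(1, 501):
    k = ucb_choose(counts, means, t)
    reward = random.gauss(true_means[k], 1.0)
    counts[k] += 1
    means[k] += (reward - means[k]) / counts[k]  # incremental mean update
```

Under this rule the better arm ends up being played far more often, while the `log t` term in the bonus keeps the worse arm from being abandoned prematurely.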


