Gaussian One-Armed Bandit and Optimization of Batch Data Processing

01/25/2019
by   Alexander Kolnogorov, et al.

We consider the minimax setup for the Gaussian one-armed bandit problem, i.e., the two-armed bandit problem with Gaussian distributions of incomes in which the distribution corresponding to the first arm is known. This setup arises naturally in the optimization of batch data processing when two alternative processing methods are available and the efficiency of the first method is known a priori. One must estimate the efficiency of the second method and ensure that the more efficient of the two methods is predominantly used. By the main theorem of game theory, the minimax strategy and the minimax risk are sought as the Bayesian ones corresponding to the worst-case prior distribution. As a result, we obtain a recursive integro-difference equation and, in the limiting case as the number of batches goes to infinity, a second-order partial differential equation. These equations make it possible to determine the minimax risk and the minimax strategy by numerical methods. We show that if the number of batches is large enough, batch data processing has almost no influence on the control performance, i.e., on the value of the minimax risk. Moreover, in the case of Bernoulli incomes and a large number of batches, batch data processing provides almost the same minimax risk as optimal one-by-one data processing.
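To make the setup concrete, the following is a minimal Monte Carlo sketch and not the minimax strategy derived in the paper. It assumes the known first arm has mean 0, the second arm has unknown mean m and unit variance, and data arrive in equal batches. The stopping rule (abandon the second arm once its cumulative income falls below a fixed multiple of the square root of the horizon) and all names and parameter values are hypothetical choices made for illustration only.

```python
import random


def simulate_regret(m, n_batches=50, batch_size=100, threshold=-1.0,
                    n_runs=500, seed=0):
    """Estimate the regret of a simple batch stopping rule (illustrative only).

    Assumptions (not from the paper): arm 1 has known mean 0, arm 2 has
    unknown mean m and unit variance, and arm 2 is abandoned for good once
    its cumulative income drops below threshold * sqrt(horizon).
    """
    rng = random.Random(seed)
    horizon = n_batches * batch_size
    total_loss = 0.0
    for _ in range(n_runs):
        cum = 0.0    # cumulative income observed on arm 2
        pulls = 0    # number of items processed with arm 2
        for _ in range(n_batches):
            # The sum of batch_size draws from N(m, 1) is N(batch_size*m, batch_size).
            cum += rng.gauss(batch_size * m, batch_size ** 0.5)
            pulls += batch_size
            if cum < threshold * horizon ** 0.5:
                break  # switch to the known arm for the rest of the horizon
        # Expected loss is |m| times the number of items sent to the worse arm.
        worse_arm_pulls = pulls if m <= 0 else horizon - pulls
        total_loss += abs(m) * worse_arm_pulls
    return total_loss / n_runs


if __name__ == "__main__":
    # The worst case over a grid of m values gives a crude estimate of the
    # maximal regret of this particular rule.
    grid = [x / 100 for x in range(-10, 11)]
    print(max(simulate_regret(m) for m in grid))
```

Under these assumptions, the maximum over the grid is the worst-case regret of this one rule; the minimax risk in the abstract is the smallest such worst case over all admissible strategies, which the paper obtains from its integro-difference and differential equations rather than by simulation.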


research
08/15/2019

Exponential two-armed bandit problem

We consider exponential two-armed bandit problem in which incomes are de...
research
02/01/2019

Multi-Armed Bandit Problem and Batch UCB Rule

We obtain the upper bound of the loss function for a strategy in the mul...
research
07/13/2019

A new approach to Poissonian two-armed bandit problem

We consider a continuous time two-armed bandit problem in which incomes ...
research
06/03/2021

A Closer Look at the Worst-case Behavior of Multi-armed Bandit Algorithms

One of the key drivers of complexity in the classical (stochastic) multi...
research
02/11/2022

A PDE-Based Analysis of the Symmetric Two-Armed Bernoulli Bandit

This work addresses a version of the two-armed Bernoulli bandit problem ...
research
12/10/2014

Generalised Entropy MDPs and Minimax Regret

Bayesian methods suffer from the problem of how to specify prior beliefs...
research
07/03/2020

Hedging using reinforcement learning: Contextual k-Armed Bandit versus Q-learning

The construction of replication strategies for contingent claims in the ...
