Filtered Poisson Process Bandit on a Continuum

07/20/2020
by James A. Grant, et al.

We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic factor.
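The paper's algorithm is not reproduced in the abstract, so the following is only a minimal illustrative Python sketch of the general setup: a UCB rule run over a fixed uniform discretisation of the action space, with simulated filtered Poisson feedback. The intensity and filter_prob functions, the thinning-based simulator in pull, the grid size K ≈ T^(1/3) (the choice suggested by a Lipschitz reward and the O(T^(2/3)) rate), and the Poisson-variance-style bonus are all assumptions made here for illustration; the paper's actual method uses data-adaptive discretisation and its own confidence bounds.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup (not from the paper): an action x in [0, 1] filters a
# non-homogeneous Poisson process on [0, 1]; a point at location s is
# revealed with probability filter_prob(x, s), and the reward is the number
# of revealed points. intensity() is unknown to the learner; filter_prob()
# is known. Both functional forms are arbitrary example choices.
def intensity(s):
    return 5.0 * (1.0 + np.sin(2 * np.pi * s))

def filter_prob(x, s):
    return np.exp(-10.0 * (x - s) ** 2)

def pull(x):
    """Sample one filtered Poisson realisation; return the revealed count."""
    lam_max = 10.0                      # upper bound on intensity() above
    n = rng.poisson(lam_max)            # simulate the NHPP by thinning
    s = rng.uniform(0.0, 1.0, size=n)
    pts = s[rng.uniform(size=n) < intensity(s) / lam_max]
    revealed = rng.uniform(size=pts.size) < filter_prob(x, pts)
    return int(revealed.sum())

# UCB over a fixed uniform grid of K ~ T^{1/3} cells (a static stand-in for
# the paper's data-adaptive discretisation).
T = 10_000
K = round(T ** (1 / 3))
arms = (np.arange(K) + 0.5) / K         # cell midpoints as representative actions
counts = np.zeros(K)
sums = np.zeros(K)

for t in range(T):
    if t < K:                           # pull each arm once to initialise
        a = t
    else:
        means = sums / counts
        # Heuristic bonus using the Poisson mean as a variance proxy,
        # floored at 1; not the confidence bound from the paper.
        bonus = np.sqrt(2 * np.maximum(means, 1.0) * np.log(t + 1) / counts)
        a = int(np.argmax(means + bonus))
    r = pull(arms[a])
    counts[a] += 1
    sums[a] += r

print("most-played action:", arms[int(np.argmax(counts))])
```

Under this toy setup the most-played action should concentrate near the region where the filtered expected count is largest; a data-adaptive scheme, as in the paper, would instead refine the grid where the observed counts warrant higher resolution.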


