Gaussian Process Bandits with Aggregated Feedback

12/24/2021
by Mengyan Zhang, et al.

We consider the continuum-armed bandit problem under a novel setting: recommending the best arms within a fixed budget when only aggregated feedback is available. This is motivated by applications where precise rewards are impossible or expensive to obtain, while an aggregated reward or feedback, such as the average over a subset, is available. We constrain the set of reward functions by assuming that they are drawn from a Gaussian process, and propose the Gaussian Process Optimistic Optimisation (GPOO) algorithm. GPOO adaptively constructs a tree whose nodes are subsets of the arm space, where the feedback for a node is the aggregated reward of its representatives. We propose a new notion of simple regret with respect to aggregated feedback on the recommended arms. We provide a theoretical analysis of the proposed algorithm and recover single-point feedback as a special case. We illustrate GPOO and compare it with related algorithms on simulated data.
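
To make the idea of aggregated feedback concrete, the following is a minimal sketch of how a Gaussian process posterior can be updated when each observation is a noisy average of the reward over a small set of points. It is not the GPOO algorithm itself and is not taken from the paper: the squared-exponential kernel, the lengthscale, the noise variance, and the names `rbf` and `aggregated_posterior` are all assumptions chosen for illustration. The sketch relies only on the fact that an average of Gaussian process values is itself Gaussian, so its covariance with any point is the average of the kernel.

```python
import numpy as np


def rbf(a, b, lengthscale=0.2):
    """Squared-exponential kernel between 1-D point arrays a and b (assumed kernel)."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / lengthscale) ** 2)


def aggregated_posterior(x_star, groups, y, noise_var=1e-2, lengthscale=0.2):
    """Posterior mean and variance of f(x_star) given averaged observations.

    groups: list of 1-D arrays; y[j] is a noisy average of f over groups[j].
    """
    # Cov(avg_i, avg_j) is the mean of k over all pairs of points in groups i and j.
    K_aa = np.array(
        [[rbf(gi, gj, lengthscale).mean() for gj in groups] for gi in groups]
    )
    # Cov(f(x*), avg_j) is the mean of k(x*, .) over the points in group j.
    K_sa = np.stack(
        [rbf(x_star, g, lengthscale).mean(axis=1) for g in groups], axis=1
    )
    A = K_aa + noise_var * np.eye(len(groups))
    mean = K_sa @ np.linalg.solve(A, y)
    # Prior variance is 1 for this kernel; subtract the explained part.
    var = 1.0 - np.einsum("ij,ji->i", K_sa, np.linalg.solve(A, K_sa.T))
    return mean, var


# Hypothetical usage: two averaged observations on the arm space [0, 1].
groups = [np.array([0.2, 0.4]), np.array([0.6, 0.8])]
y = np.array([0.5, -0.3])
x_star = np.linspace(0.0, 1.0, 5)
mean, var = aggregated_posterior(x_star, groups, y)
print(mean, var)
```

When every group contains a single point, the averages reduce to ordinary point evaluations and the update coincides with standard Gaussian process regression, which mirrors the single-point special case mentioned in the abstract.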

research
10/02/2019

Stochastic Bandits with Delayed Composite Anonymous Feedback

We explore a novel setting of the Multi-Armed Bandit (MAB) problem inspi...
research
12/26/2022

Gaussian Process Classification Bandits

Classification bandits are multi-armed bandit problems whose task is to ...
research
11/29/2021

Contextual Combinatorial Volatile Bandits with Satisfying via Gaussian Processes

In many real-world applications of combinatorial bandits such as content...
research
06/03/2021

Sleeping Combinatorial Bandits

In this paper, we study an interesting combination of sleeping and combi...
research
04/11/2023

: Fair Multi-Armed Bandits with Guaranteed Rewards per Arm

Classic no-regret online prediction algorithms, including variants of th...
research
02/23/2023

Reward Learning as Doubly Nonparametric Bandits: Optimal Design and Scaling Laws

Specifying reward functions for complex tasks like object manipulation o...
research
06/21/2021

Smooth Sequential Optimisation with Delayed Feedback

Stochastic delays in feedback lead to unstable sequential learning using...
