Multi-Armed Bandits with Generalized Temporally-Partitioned Rewards

03/01/2023
by Ronald C. van den Broek, et al.

Decision-making problems of a sequential nature, where past decisions may affect the future, are used to model many practically important applications. In some real-world applications, feedback about a decision is delayed and may arrive as partial rewards that are observed with different delays. Motivated by such scenarios, we propose a novel problem formulation called multi-armed bandits with generalized temporally-partitioned rewards. To formalize how feedback about a decision is partitioned across several time steps, we introduce the β-spread property. We derive a lower bound on the performance of any uniformly efficient algorithm for the considered problem. Moreover, we provide an algorithm called TP-UCB-FR-G and prove an upper bound on its performance measure. In some scenarios, our upper bound improves upon the state of the art. We provide experimental results validating the proposed algorithm and our theoretical results.
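To make the setting concrete, the following is a minimal sketch of how delayed partial rewards might be simulated. It is not the paper's TP-UCB-FR-G algorithm and does not encode the β-spread property, neither of which is specified in the abstract. The function names `pull` and `simulate`, the Gaussian reward noise, and the fixed `weights` vector that splits each reward across consecutive time steps are all assumptions made purely for illustration.

```python
import random


def pull(arm_mean, weights, rng):
    """Draw one cumulative reward in [0, 1] and split it into partial rewards.

    `weights` is a hypothetical fixed split (assumed to sum to 1): entry d is
    the fraction of the reward that is observed d steps after the pull.
    """
    total = min(1.0, max(0.0, rng.gauss(arm_mean, 0.1)))
    return [total * w for w in weights]


def simulate(arm_means, weights, choices, seed=0):
    """Return the reward observed at each time step for a fixed pull sequence.

    `choices[t]` is the arm pulled at step t. Partial rewards from different
    pulls that arrive at the same step are summed in this sketch.
    """
    rng = random.Random(seed)
    observed = [0.0] * (len(choices) + len(weights) - 1)
    for t, arm in enumerate(choices):
        partials = pull(arm_means[arm], weights, rng)
        for delay, partial in enumerate(partials):
            observed[t + delay] += partial
    return observed


if __name__ == "__main__":
    # Two hypothetical arms; each reward is spread over three time steps.
    print(simulate([0.3, 0.7], [0.5, 0.3, 0.2], [0, 1, 1, 0]))
```

In this sketch a learner that looks only at `observed[t]` sees a mixture of partial rewards from several earlier pulls, which is the difficulty the problem formulation above is meant to capture.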


