Combinatorial Semi-Bandit in the Non-Stationary Environment

02/10/2020
by Wei Chen, et al.

In this paper, we investigate the non-stationary combinatorial semi-bandit problem in both the switching case and the dynamic case. In the general case where (a) the reward function is non-linear, (b) arms may be probabilistically triggered, and (c) only an approximate offline oracle exists <cit.>, our algorithm achieves Õ(√(ST)) distribution-dependent regret in the switching case and Õ(V^(1/3) T^(2/3)) regret in the dynamic case, where S is the number of switchings and V is the total amount of “distribution changes”. The regret bounds in both scenarios are nearly optimal, but our algorithm needs to know the parameter S or V in advance. We further show that, by employing another technique, our algorithm no longer needs to know S or V, but the regret bounds could become suboptimal. In the special case where the reward function is linear and an exact oracle is available, we design a parameter-free algorithm that achieves nearly optimal regret in both the switching case and the dynamic case without knowing the parameters in advance.
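To get a feel for how the two regret rates scale with the horizon T, the following minimal sketch evaluates only their leading terms. It is an illustration, not part of the paper's algorithm: the function names and the example values S = 10 and V = 10 are hypothetical, and the constants, logarithmic factors, and problem-dependent factors hidden by the Õ notation are dropped.

```python
import math


def switching_rate(S: float, T: float) -> float:
    """Leading term sqrt(S * T) of the switching-case bound (constants and logs dropped)."""
    return math.sqrt(S * T)


def dynamic_rate(V: float, T: float) -> float:
    """Leading term V^(1/3) * T^(2/3) of the dynamic-case bound (constants and logs dropped)."""
    return V ** (1.0 / 3.0) * T ** (2.0 / 3.0)


def per_round(rate: float, T: float) -> float:
    """Average regret per round implied by a cumulative rate."""
    return rate / T


if __name__ == "__main__":
    S = 10.0  # hypothetical number of switchings
    V = 10.0  # hypothetical total distribution change
    for T in (1e4, 1e6, 1e8):
        sw = per_round(switching_rate(S, T), T)
        dy = per_round(dynamic_rate(V, T), T)
        print(f"T={T:.0e}  switching/T={sw:.5f}  dynamic/T={dy:.5f}")
```

Both per-round quantities shrink as T grows while S and V stay fixed, which is what sublinear regret means here; the dynamic-case rate shrinks more slowly because of its T^(2/3) dependence.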


