Non-Stationary Bandits with Intermediate Observations

06/03/2020
by Claire Vernade, et al.

Online recommender systems often face long delays in receiving feedback, especially when optimizing for long-term metrics. While mitigating the effects of delays in learning is well understood in stationary environments, the problem becomes much more challenging when the environment changes. In fact, if the timescale of the change is comparable to the delay, it is impossible to learn about the environment, since the available observations are already obsolete. However, these issues can be addressed if intermediate signals are available without delay, such that, given those signals, the long-term behavior of the system is stationary. To model this situation, we introduce the problem of stochastic, non-stationary, delayed bandits with intermediate observations. We develop a computationally efficient algorithm based on UCRL and prove sublinear regret guarantees for its performance. Experimental results demonstrate that our method is able to learn in non-stationary delayed environments where existing methods fail.
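The abstract does not spell out the algorithm, so the following is only a minimal Python sketch of the problem structure it describes, under stated assumptions: finitely many arms and intermediate states, a fixed reward delay, and Bernoulli rewards. The key idea it illustrates is the decomposition the abstract relies on. The arm-to-state distribution may drift, so it is estimated from a sliding window of immediate observations. The state-to-reward map is assumed stationary, so it is estimated from all delayed rewards received so far. This is a simplified sliding-window UCB index, not the authors' UCRL-based method; the class `IntermediateObsBandit`, the function `simulate`, the `window` and `delay` parameters, and all simulated probabilities are hypothetical.

```python
import math
import random
from collections import deque


class IntermediateObsBandit:
    """Hypothetical sliding-window UCB sketch (not the paper's UCRL-based method).

    Assumption: state -> expected reward is stationary, so it uses every
    delivered reward; arm -> state may drift, so only the last `window`
    (arm, state) pairs are used to estimate it.
    """

    def __init__(self, n_arms, n_states, window, delay):
        self.n_arms, self.n_states = n_arms, n_states
        self.window = deque(maxlen=window)   # recent (arm, state) pairs
        self.pending = deque()               # (due_round, state, reward), FIFO
        self.delay = delay
        self.reward_sum = [0.0] * n_states   # delivered rewards per state
        self.reward_cnt = [0] * n_states
        self.t = 0

    def select_arm(self):
        self.t += 1
        log_t = math.log(max(self.t, 2))
        # Optimistic estimate of E[reward | state] from all delivered rewards.
        mu_ucb = []
        for s in range(self.n_states):
            n = self.reward_cnt[s]
            if n == 0:
                mu_ucb.append(1.0)
            else:
                mu_ucb.append(min(1.0, self.reward_sum[s] / n + math.sqrt(2 * log_t / n)))
        best_arm, best_val = 0, -1.0
        for a in range(self.n_arms):
            states = [s for (arm, s) in self.window if arm == a]
            if not states:
                return a  # pull any arm with no observation in the current window
            # Average of mu_ucb under the windowed empirical state distribution of arm a.
            val = sum(mu_ucb[s] for s in states) / len(states)
            val += math.sqrt(2 * log_t / len(states))  # bonus for the state estimate
            if val > best_val:
                best_arm, best_val = a, val
        return best_arm

    def observe(self, arm, state, reward):
        """Record the state immediately; the reward is delivered `delay` rounds later."""
        self.window.append((arm, state))
        self.pending.append((self.t + self.delay, state, reward))
        while self.pending and self.pending[0][0] <= self.t:
            _, s, r = self.pending.popleft()
            self.reward_sum[s] += r
            self.reward_cnt[s] += 1


def simulate(horizon=4000, delay=500, window=200, seed=0):
    """Hypothetical two-arm, two-state environment whose arm -> state map flips midway."""
    rng = random.Random(seed)
    reward_prob = [0.9, 0.1]  # stationary: P(reward = 1 | state)
    agent = IntermediateObsBandit(n_arms=2, n_states=2, window=window, delay=delay)
    choices = []
    for t in range(horizon):
        # Non-stationary: arm 0 leads to the good state 0 mostly in the first half.
        p_state0 = [0.8, 0.2] if t < horizon // 2 else [0.2, 0.8]
        arm = agent.select_arm()
        state = 0 if rng.random() < p_state0[arm] else 1
        reward = 1.0 if rng.random() < reward_prob[state] else 0.0
        agent.observe(arm, state, reward)
        choices.append(arm)
    return choices
```

Calling `simulate()` returns the sequence of chosen arms. Because only the arm-to-state estimate is windowed, the agent should be able to follow the switch at the midpoint even though each reward arrives 500 rounds late; a method that waited for delayed rewards per arm would, in this setup, be reacting to observations that are already obsolete.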

Related research

Non-stationary Delayed Combinatorial Semi-Bandit with Causally Related Rewards (07/18/2023)
Sequential decision-making under uncertainty is often associated with lo...

Learning from Delayed Outcomes with Intermediate Observations (07/24/2018)
Optimizing for long term value is desirable in many practical applicatio...

Unifying Clustered and Non-stationary Bandits (09/05/2020)
Non-stationary bandits and online clustering of bandits lift the restric...

Non-Stationary Latent Bandits (12/01/2020)
Users of recommender systems often behave in a non-stationary fashion, d...

Sub-Structural Niching in Non-Stationary Environments (02/04/2005)
Niching enables a genetic algorithm (GA) to maintain diversity in a popu...

Online optimization and regret guarantees for non-additive long-term constraints (02/17/2016)
We consider online optimization in the 1-lookahead setting, where the ob...

A Non-stationary Service Curve Model for Estimation of Cellular Sleep Scheduling (08/13/2016)
While steady-state solutions of backlog and delay have been derived for ...