Non-stationary Contextual Bandits and Universal Learning

02/14/2023
by Moise Blanchard, et al.

We study the fundamental limits of learning in contextual bandits, where a learner's rewards depend on their actions and a known context, which extends the canonical multi-armed bandit to the case where side-information is available. We are interested in universally consistent algorithms, which achieve sublinear regret compared to any measurable fixed policy, without any function class restriction. For stationary contextual bandits, when the underlying reward mechanism is time-invariant, [Blanchard et al.] characterized learnable context processes for which universal consistency is achievable; and further gave algorithms ensuring universal consistency whenever this is achievable, a property known as optimistic universal consistency. It is well understood, however, that reward mechanisms can evolve over time, possibly depending on the learner's actions. We show that optimistic universal learning for non-stationary contextual bandits is impossible in general, contrary to all previously studied settings in online learning – including standard supervised learning. We also give necessary and sufficient conditions for universal learning under various non-stationarity models, including online and adversarial reward mechanisms. In particular, the set of learnable processes for non-stationary rewards is still extremely general – larger than i.i.d., stationary or ergodic – but in general strictly smaller than that for supervised learning or stationary contextual bandits, shedding light on new non-stationary phenomena.
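To make the protocol in the abstract concrete, the following is a minimal sketch of the contextual bandit interaction loop and of regret measured against a fixed policy. All names (`run_bandit`, the toy context process, the explore-then-exploit learner) are hypothetical illustrations, not the paper's algorithm; universal consistency asks that regret against *any* fixed measurable policy grow sublinearly in the horizon T.

```python
import random

def run_bandit(T, contexts, reward_fn, learner, policy):
    """At each step t the learner sees context x_t, picks action a_t,
    and receives reward r_t(x_t, a_t). Returns the learner's total
    reward and the total reward of a fixed comparator policy."""
    learner_total = policy_total = 0.0
    for t in range(T):
        x = contexts[t]
        a = learner(t, x)
        learner_total += reward_fn(t, x, a)          # learner's realized reward
        policy_total += reward_fn(t, x, policy(x))   # fixed policy's reward
    return learner_total, policy_total

# Toy stationary instance: binary contexts, two actions,
# and a reward that pays 1 when the action matches the context.
random.seed(0)
T = 1000
contexts = [random.randint(0, 1) for _ in range(T)]
reward_fn = lambda t, x, a: 1.0 if a == x else 0.0

# A naive explore-then-exploit learner (illustration only).
def learner(t, x):
    return random.randint(0, 1) if t < 100 else x

policy = lambda x: x  # here, the optimal fixed policy
learner_reward, policy_reward = run_bandit(T, contexts, reward_fn, learner, policy)
regret = policy_reward - learner_reward  # sublinear in T for a consistent learner
```

In the non-stationary settings the paper studies, `reward_fn` may change with `t` (and even depend on past actions), which is exactly where the abstract's impossibility and characterization results apply.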


