MNL-Bandit in non-stationary environments

03/04/2023
by Ayoub Foussoul, et al.

In this paper, we study the MNL-Bandit problem in a non-stationary environment and present an algorithm with worst-case dynamic regret of Õ(min{√(NTL), N^{1/3} (Δ_∞^K)^{1/3} T^{2/3} + √(NT)}). Here N is the number of arms, T is the time horizon, L is the number of switches, and Δ_∞^K is a variation measure of the unknown parameters. We also show that our algorithm is near-optimal (up to logarithmic factors). Our algorithm builds upon the epoch-based algorithm for the stationary MNL-Bandit problem in Agrawal et al. (2016). However, non-stationarity poses several challenges, and we introduce new techniques and ideas to address them. In particular, we give a tight characterization of the bias that non-stationarity introduces in the estimators, and we derive new concentration bounds.
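
To make the two regimes in the regret bound concrete, the following minimal sketch evaluates the order of each branch and returns the smaller one. It assumes that constants and logarithmic factors are dropped (so the values are rates, not actual regret), that T denotes the time horizon, and that `delta` stands for the variation measure Δ_∞^K. The function name and argument names are hypothetical and do not come from the paper.

```python
import math


def dynamic_regret_rate(N, T, L, delta):
    """Order of the dynamic regret bound, ignoring constants and log factors.

    N: number of arms, T: time horizon (assumed), L: number of switches,
    delta: variation measure of the unknown parameters (Delta_inf^K).
    """
    # Branch governed by the number of switches: sqrt(N * T * L).
    switching = math.sqrt(N * T * L)
    # Branch governed by total variation:
    # N^(1/3) * delta^(1/3) * T^(2/3) + sqrt(N * T).
    variation = (N * delta) ** (1 / 3) * T ** (2 / 3) + math.sqrt(N * T)
    return min(switching, variation)


if __name__ == "__main__":
    # Few switches but large variation: the switching branch is smaller.
    print(dynamic_regret_rate(N=10, T=1000, L=1, delta=100.0))
    # Many switches but no variation: the variation branch is smaller.
    print(dynamic_regret_rate(N=10, T=1000, L=1000, delta=0.0))
```

With few switches the √(NTL) branch is the smaller one, while with many small changes the variation branch takes over, which is why the bound is stated as a minimum of the two.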
