Multinomial Logit Bandit with Low Switching Cost

07/09/2020
by Kefan Dong, et al.

We study the multinomial logit (MNL) bandit with limited adaptivity, where algorithms change their exploration actions as infrequently as possible while still achieving almost optimal minimax regret. We propose two measures of adaptivity: the assortment switching cost and the more fine-grained item switching cost. We present an anytime algorithm (AT-DUCB) with O(N log T) assortment switches, almost matching the lower bound Ω(N log T / log log T). In the fixed-horizon setting, our algorithm FH-DUCB incurs O(N log log T) assortment switches, matching the asymptotic lower bound. We also present the ESUCB algorithm with item switching cost O(N log^2 T).
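To make the two adaptivity measures concrete, the following minimal sketch counts both costs over a sequence of offered assortments. The abstract does not give formal definitions, so this assumes that the assortment switching cost is the number of rounds in which the offered set differs from the previous one, and that the item switching cost is the total size of the symmetric difference between consecutive sets. The function names and the sample history are hypothetical and purely illustrative.

```python
from typing import List, Set


def assortment_switches(assortments: List[Set[int]]) -> int:
    """Count rounds where the offered assortment differs from the previous round.

    Assumed definition: sum over t of 1[S_t != S_{t+1}].
    """
    return sum(1 for prev, cur in zip(assortments, assortments[1:]) if prev != cur)


def item_switches(assortments: List[Set[int]]) -> int:
    """Count items added or removed between consecutive rounds.

    Assumed definition: sum over t of |S_t symmetric-difference S_{t+1}|.
    """
    return sum(len(prev ^ cur) for prev, cur in zip(assortments, assortments[1:]))


# Hypothetical history of assortments offered over five rounds.
history = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {5, 6}, {5, 6}]

print(assortment_switches(history))  # 2
print(item_switches(history))        # 7
```

Under these assumed definitions, replacing one item costs a single assortment switch but two item switches (one removal, one addition), which is why the item switching cost is the finer-grained of the two measures.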


research
03/21/2021

Online Convex Optimization with Continuous Switching Constraint

In many sequential decision making applications, the change of decision ...
research
05/30/2019

Provably Efficient Q-Learning with Low Switching Cost

We take initial steps in studying PAC-MDP algorithms with limited adapti...
research
10/24/2019

Minimax Regret of Switching-Constrained Online Convex Optimization: No Phase Transition

We study the problem of switching-constrained online convex optimization...
research
08/24/2020

Algorithms and Lower Bounds for the Worker-Task Assignment Problem

We study the problem of assigning workers to tasks where each task has d...
research
11/10/2020

Efficient Algorithms for Stochastic Repeated Second-price Auctions

Developing efficient sequential bidding strategies for repeated auctions...
research
03/05/2018

Online learning over a finite action set with limited switching

This paper studies the value of switching actions in the Prediction From...
research
06/30/2020

Lower Bounds for Dynamic Distributed Task Allocation

We study the problem of distributed task allocation in multi-agent syste...
