Fully Gap-Dependent Bounds for Multinomial Logit Bandit

11/19/2020
by   Jiaqi Yang, et al.
0

We study the multinomial logit (MNL) bandit problem, where at each time step, the seller offers an assortment of size at most K from a pool of N items, and the buyer purchases an item from the assortment according to a MNL choice model. The objective is to learn the model parameters and maximize the expected revenue. We present (i) an algorithm that identifies the optimal assortment S^* within O(∑_i = 1^N Δ_i^-2) time steps with high probability, and (ii) an algorithm that incurs O(∑_i ∉ S^* KΔ_i^-1log T) regret in T time steps. To our knowledge, our algorithms are the first to achieve gap-dependent bounds that fully depends on the suboptimality gaps of all items. Our technical contributions include an algorithmic framework that relates the MNL-bandit problem to a variant of the top-K arm identification problem in multi-armed bandits, a generalized epoch-based offering procedure, and a layer-based adaptive estimation procedure.

READ FULL TEXT

Please sign up or login with your details

Forgot password? Click here to reset