Simultaneously Learning Stochastic and Adversarial Bandits under the Position-Based Model

07/12/2022
by Cheng Chen, et al.

Online learning to rank (OLTR) interactively learns to choose lists of items from a large collection, guided by click models that describe users' click behavior. Most recent work on this problem focuses on the stochastic environment, where item attractiveness is assumed to be invariant during the learning process. In many real-world scenarios, however, the environment can be dynamic or even change arbitrarily. This work studies the OLTR problem in both stochastic and adversarial environments under the position-based model (PBM). We propose a method based on the follow-the-regularized-leader (FTRL) framework with Tsallis entropy and develop a new self-bounding constraint designed specifically for PBM. We prove that the proposed algorithm simultaneously achieves O(log T) regret in the stochastic environment and O(m√(nT)) regret in the adversarial environment, where T is the number of rounds, n is the number of items, and m is the number of positions. We also provide a lower bound of order Ω(m√(nT)) for adversarial PBM, which matches our upper bound and improves over the state-of-the-art lower bound. Experiments show that our algorithm can learn simultaneously in both stochastic and adversarial environments and is competitive with existing methods designed for a single environment.
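
The abstract names its key ingredient, FTRL with Tsallis entropy, without spelling out the update. As a point of reference only, the sketch below implements a standard 1/2-Tsallis-entropy FTRL (Tsallis-INF-style) update for the simplest one-position case, i.e. an ordinary multi-armed bandit; it is not the paper's PBM algorithm, and the bisection bracket, the learning-rate schedule eta_t = 1/sqrt(t), and all function names are illustrative assumptions.

```python
import numpy as np


def tsallis_ftrl_weights(L_hat, eta):
    """FTRL step with the 1/2-Tsallis-entropy regularizer: the minimizer over the
    simplex has the form p_i = 4 / (eta * (L_hat_i - x))^2, where the normalizer
    x < min(L_hat) is found by bisection so that the weights sum to one."""
    n = len(L_hat)
    lo = L_hat.min() - 2.0 * np.sqrt(n) / eta  # here every p_i <= 1/n, so sum(p) <= 1
    hi = L_hat.min() - 2.0 / eta               # here the leading arm alone gets p = 1, so sum(p) >= 1
    for _ in range(60):                        # bisection on the monotone normalization equation
        x = 0.5 * (lo + hi)
        if (4.0 / (eta * (L_hat - x)) ** 2).sum() > 1.0:
            hi = x
        else:
            lo = x
    p = 4.0 / (eta * (L_hat - lo)) ** 2
    return p / p.sum()                         # absorb the residual bisection error


def run_tsallis_inf(losses, seed=0):
    """losses: (T, n) array of per-round losses in [0, 1]. Returns the arm pulled each round."""
    rng = np.random.default_rng(seed)
    T, n = losses.shape
    L_hat = np.zeros(n)                        # cumulative importance-weighted loss estimates
    pulls = np.empty(T, dtype=int)
    for t in range(1, T + 1):
        eta = 1.0 / np.sqrt(t)                 # decreasing learning rate (one common schedule)
        p = tsallis_ftrl_weights(L_hat, eta)
        arm = rng.choice(n, p=p)               # play a random arm from the FTRL distribution
        L_hat[arm] += losses[t - 1, arm] / p[arm]  # unbiased importance-weighted loss estimate
        pulls[t - 1] = arm
    return pulls
```

The paper's algorithm additionally has to select a ranked list of m items per round and build loss estimates that account for the position-dependent examination probabilities of the PBM; that is where the new self-bounding constraint mentioned in the abstract comes in, and it is not reproduced in this sketch.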
