Bernoulli Rank-1 Bandits for Click Feedback

03/19/2017
by   Sumeet Katariya, et al.
0

The probability that a user will click a search result depends both on its relevance and its position on the results page. The position based model explains this behavior by ascribing to every item an attraction probability, and to every position an examination probability. To be clicked, a result must be both attractive and examined. The probabilities of an item-position pair being clicked thus form the entries of a rank-1 matrix. We propose the learning problem of a Bernoulli rank-1 bandit where at each step, the learning agent chooses a pair of row and column arms, and receives the product of their Bernoulli-distributed values as a reward. This is a special case of the stochastic rank-1 bandit problem considered in recent work that proposed an elimination based algorithm Rank1Elim, and showed that Rank1Elim's regret scales linearly with the number of rows and columns on "benign" instances. These are the instances where the minimum of the average row and column rewards μ is bounded away from zero. The issue with Rank1Elim is that it fails to be competitive with straightforward bandit strategies as μ→ 0. In this paper we propose Rank1ElimKL which simply replaces the (crude) confidence intervals of Rank1Elim with confidence intervals based on Kullback-Leibler (KL) divergences, and with the help of a novel result concerning the scaling of KL divergences we prove that with this change, our algorithm will be competitive no matter the value of μ. Experiments with synthetic data confirm that on benign instances the performance of Rank1ElimKL is significantly better than that of even Rank1Elim, while experiments with models derived from real data confirm that the improvements are significant across the board, regardless of whether the data is benign or not.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
08/10/2016

Stochastic Rank-1 Bandits

We propose stochastic rank-1 bandits, a class of online learning problem...
research
12/13/2017

Stochastic Low-Rank Bandits

Many problems in computer vision and recommender systems involve low-ran...
research
06/22/2023

Logarithmic Regret for Matrix Games against an Adversary with Noisy Bandit Feedback

This paper considers a variant of zero-sum matrix games where at each ti...
research
09/14/2020

Hellinger KL-UCB based Bandit Algorithms for Markovian and i.i.d. Settings

In the regret-based formulation of multi-armed bandit (MAB) problems, ex...
research
02/15/2023

Bandit Social Learning: Exploration under Myopic Behavior

We study social learning dynamics where the agents collectively follow a...
research
01/09/2018

Better and Simpler Error Analysis of the Sinkhorn-Knopp Algorithm for Matrix Scaling

Given a non-negative n × m real matrix A, the matrix scaling problem is...

Please sign up or login with your details

Forgot password? Click here to reset