Sequential Matrix Completion

10/23/2017
by Annie Marsden, et al.

We propose a novel algorithm for sequential matrix completion in a recommender system setting, where the (i,j)th entry of the matrix corresponds to user i's rating of product j. The objective of the algorithm is to provide a sequential policy for recommending user-product pairs that yields the highest possible ratings after a finite time horizon. The algorithm combines a Gamma process factor model with two posterior-focused bandit policies, Thompson Sampling and Information-Directed Sampling. While Thompson Sampling shows competitive performance in simulations, state-of-the-art performance is obtained from Information-Directed Sampling, which bases its recommendations on a ratio between the expected reward and a measure of information gain. To our knowledge, this is the first implementation of Information-Directed Sampling on large real datasets. This approach contributes to a recent line of research on bandit approaches to collaborative filtering, including Kawale et al. (2015), Li et al. (2010), Bresler et al. (2014), Li et al. (2016), Deshpande & Montanari (2012), and Zhao et al. (2013). As noted in Kawale et al. (2015) and Zhao et al. (2013), the setting of this paper presents significant challenges to bounding regret after finite horizons. We discuss these challenges in relation to simpler models for bandits with side information, such as linear or Gaussian process bandits, and hope the experiments presented here motivate further research toward theoretical guarantees.
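
To make the contrast between the two policies concrete, here is a minimal sketch of how each could pick a recommendation from posterior draws. It is not the paper's implementation. It assumes the K candidate user-product pairs are flattened into K arms and that M posterior draws of each arm's mean rating are already available; the Gamma process factor model that would produce those draws is not implemented. The function names `thompson_choice` and `ids_choice` are hypothetical. The Information-Directed Sampling rule shown is a deterministic, variance-based variant common in the IDS literature, which minimizes squared expected regret divided by an information-gain proxy; the exact ratio used in the paper may differ.

```python
import numpy as np


def thompson_choice(samples, rng):
    """Thompson Sampling: act greedily with respect to one random posterior draw.

    samples: (M, K) array, M posterior draws of the mean rating of K arms.
    """
    m = rng.integers(samples.shape[0])
    return int(np.argmax(samples[m]))


def ids_choice(samples, eps=1e-12):
    """Deterministic variance-based IDS (an assumed variant, see lead-in).

    Picks the arm minimizing (expected regret)^2 / (information-gain proxy).
    """
    n_arms = samples.shape[1]
    mean = samples.mean(axis=0)        # posterior mean rating per arm
    best = samples.argmax(axis=1)      # optimal arm under each draw

    # Expected regret of each arm: E[max_a theta_a] - E[theta_a].
    delta = samples.max(axis=1).mean() - mean

    # Information-gain proxy: how much each arm's mean shifts once we
    # condition on which arm is optimal, weighted by P(arm is optimal).
    gain = np.zeros(n_arms)
    for a_star in np.unique(best):
        mask = best == a_star
        cond_mean = samples[mask].mean(axis=0)
        gain += mask.mean() * (cond_mean - mean) ** 2

    ratio = delta ** 2 / np.maximum(gain, eps)
    return int(np.argmin(ratio))


# Toy posterior: arm 0 is known to pay 0.5; arm 1 pays 0 or 1 with equal odds.
samples = np.array([[0.5, 0.0], [0.5, 1.0]] * 50)
rng = np.random.default_rng(0)
print("Thompson Sampling picks arm", thompson_choice(samples, rng))
print("IDS picks arm", ids_choice(samples))
```

In this toy posterior both arms have the same expected regret, but only arm 1 reveals anything about which arm is optimal, so the IDS rule selects it, while Thompson Sampling selects either arm depending on the draw.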


