Partial Bandit and Semi-Bandit: Making the Most Out of Scarce Users' Feedback

09/16/2020
by Alexandre Letard, et al.

Recent works on Multi-Armed Bandits (MAB) and Combinatorial Multi-Armed Bandits (COM-MAB) show good results on a global accuracy metric. In recommender systems, this can be achieved through personalization. However, with a combinatorial online learning approach, personalization implies a large amount of user feedback, and such feedback can be hard to acquire when users need to be solicited directly and frequently. For many fields of activity undergoing the digitization of their business, online learning is unavoidable; a number of approaches allowing implicit user feedback retrieval have therefore been implemented. Nevertheless, implicit feedback can be misleading or inefficient for the agent's learning. Herein, we propose a novel approach for considering user feedback that reduces the amount of explicit feedback required by COM-MAB algorithms while providing levels of global accuracy and learning efficiency similar to those of classical competitive methods, and we evaluate it using three distinct strategies. Despite a limited amount of feedback returned by users (as low as 20%), our approach obtains results comparable to those of state-of-the-art approaches.
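The abstract does not include code, so the sketch below is only a rough illustration of the setting it describes, not the authors' method: a standard combinatorial UCB agent that receives semi-bandit feedback on its recommended super-arm only on a fraction of rounds, mimicking scarce explicit feedback. The class name ScarceFeedbackCombUCB, the feedback_rate parameter, and the toy Bernoulli environment are all illustrative assumptions.

```python
import numpy as np

class ScarceFeedbackCombUCB:
    """Illustrative combinatorial UCB agent (assumption, not the paper's
    algorithm): it updates only on rounds where the user actually returns
    explicit feedback, which happens with probability `feedback_rate`."""

    def __init__(self, n_arms, k, feedback_rate=0.2, seed=0):
        self.n_arms = n_arms                # number of base arms (items)
        self.k = k                          # size of the recommended super-arm
        self.feedback_rate = feedback_rate  # fraction of rounds with feedback
        self.counts = np.zeros(n_arms)      # per-arm observation counts
        self.means = np.zeros(n_arms)       # per-arm empirical mean rewards
        self.t = 0
        self.rng = np.random.default_rng(seed)

    def select(self):
        """Pick the k arms with the highest UCB index (unseen arms first)."""
        self.t += 1
        with np.errstate(divide="ignore", invalid="ignore"):
            bonus = np.sqrt(2.0 * np.log(self.t) / self.counts)
        index = np.where(self.counts > 0, self.means + bonus, np.inf)
        return np.argsort(index)[-self.k:]

    def update(self, arms, rewards):
        """Semi-bandit update: one reward per played base arm, applied only
        when explicit feedback is returned this round."""
        if self.rng.random() > self.feedback_rate:
            return  # no feedback from the user; the agent learns nothing
        for a, r in zip(arms, rewards):
            self.counts[a] += 1
            self.means[a] += (r - self.means[a]) / self.counts[a]

# Toy usage: Bernoulli rewards drawn from unknown click probabilities.
rng = np.random.default_rng(1)
true_p = rng.random(50)
agent = ScarceFeedbackCombUCB(n_arms=50, k=5, feedback_rate=0.2)
for _ in range(10_000):
    arms = agent.select()
    rewards = (rng.random(len(arms)) < true_p[arms]).astype(float)
    agent.update(arms, rewards)
print("top arms learned:", np.argsort(agent.means)[-5:])
```

Even with feedback on only ~20% of rounds, an index policy of this kind eventually concentrates on the best arms; the paper's contribution concerns how to make that learning efficient, which this sketch does not attempt to reproduce.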


Related research:

- BanditMF: Multi-Armed Bandit Based Matrix Factorization Recommender System (06/21/2021). Multi-armed bandits (MAB) provide a principled online learning approach ...
- Multi-Dueling Bandits and Their Application to Online Ranker Evaluation (08/22/2016). New ranking algorithms are continually being developed and refined, nece...
- Master-slave Deep Architecture for Top-K Multi-armed Bandits with Non-linear Bandit Feedback and Diversity Constraints (08/24/2023). We propose a novel master-slave architecture to solve the top-K combinat...
- A Multi-Armed Bandit-based Approach to Mobile Network Provider Selection (12/08/2020). We argue for giving users the ability to lease bandwidth temporarily fro...
- Bandit-PAM: Almost Linear Time k-Medoids Clustering via Multi-Armed Bandits (06/11/2020). Clustering is a ubiquitous task in data science. Compared to the commonl...
- Preference-based Online Learning with Dueling Bandits: A Survey (07/30/2018). In machine learning, the notion of multi-armed bandits refers to a class...
