Efficient Explorative Key-term Selection Strategies for Conversational Contextual Bandits

03/01/2023
by Zhiyong Wang, et al.

Conversational contextual bandits elicit user preferences by occasionally querying for explicit feedback on key-terms to accelerate learning. However, existing approaches have two limitations. First, the information gained from key-term-level conversations and arm-level recommendations is not appropriately combined to speed up learning. Second, asking explorative key-terms that quickly elicit the user's potential interests across domains can accelerate the convergence of user preference estimation, yet no existing work exploits this. To tackle these issues, we first propose “ConLinUCB”, a general framework for conversational bandits with better information incorporation, which combines arm-level and key-term-level feedback to estimate the user preference in a single step at each round. Based on this framework, we further design two bandit algorithms with explorative key-term selection strategies, ConLinUCB-BS and ConLinUCB-MCR. We prove tighter regret upper bounds for our proposed algorithms. In particular, ConLinUCB-BS achieves a regret bound of O(√(dT log T)), improving on the previous result of O(d√T log T). Extensive experiments on synthetic and real-world data show significant advantages of our algorithms in learning accuracy (up to 54% improvement) and computational efficiency (up to 72% improvement) over the classic ConUCB algorithm, demonstrating potential benefits for recommender systems.
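To make the one-step information incorporation concrete, below is a minimal Python sketch of how a combined estimator of this kind could work: arm-level and key-term-level feedback both update a single ridge-regression estimate, and the key-term query is chosen exploratively by its confidence radius (in the spirit of ConLinUCB-MCR, read as maximum confidence radius). All names and parameters here (`ConLinUCBSketch`, `lam`, `alpha`) are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

class ConLinUCBSketch:
    """Illustrative sketch (not the authors' code): one ridge-regression
    estimator shared by arm-level rewards and key-term-level feedback."""

    def __init__(self, d, lam=1.0, alpha=1.0):
        self.M = lam * np.eye(d)  # combined Gram matrix of both feedback levels
        self.b = np.zeros(d)      # combined feedback-weighted feature sum
        self.alpha = alpha        # assumed exploration width

    def update(self, x, feedback):
        # The same one-step update applies whether x is an arm feature
        # vector or a key-term feature vector.
        self.M += np.outer(x, x)
        self.b += feedback * x

    def select_arm(self, arm_features):
        # Standard LinUCB rule on the combined estimate.
        theta = np.linalg.solve(self.M, self.b)
        M_inv = np.linalg.inv(self.M)
        scores = [x @ theta + self.alpha * np.sqrt(x @ M_inv @ x)
                  for x in arm_features]
        return int(np.argmax(scores))

    def select_key_term(self, key_term_features):
        # Explorative choice in the spirit of ConLinUCB-MCR: query the
        # key-term whose estimate is currently most uncertain, i.e. the
        # one with the largest confidence radius under M.
        M_inv = np.linalg.inv(self.M)
        radii = [np.sqrt(x @ M_inv @ x) for x in key_term_features]
        return int(np.argmax(radii))
```

For contrast, ConUCB maintains two coupled estimators (a key-term-level one and an arm-level one regularized toward it); collapsing both feedback levels into a single estimator, as sketched above, is what the abstract credits for the gains in learning accuracy and computational efficiency.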



Related research

09/06/2022
Hierarchical Conversational Preference Elicitation with Bandit Feedback
The recent advances of conversational recommendations provide a promisin...

08/21/2022
Comparison-based Conversational Recommender System with Relative Bandit Feedback
With the recent advances of conversational recommendations, the recommen...

06/18/2019
Simple Algorithms for Dueling Bandits
In this paper, we present simple algorithms for Dueling Bandits. We prov...

02/20/2020
Regret Minimization in Stochastic Contextual Dueling Bandits
We consider the problem of stochastic K-armed dueling bandit in the cont...

02/29/2020
Contextual-Bandit Based Personalized Recommendation with Time-Varying User Interests
A contextual bandit problem is studied in a highly non-stationary enviro...

01/02/2019
Warm-starting Contextual Bandits: Robustly Combining Supervised and Bandit Feedback
We investigate the feasibility of learning from both fully-labeled super...

08/21/2020
Offline Contextual Multi-armed Bandits for Mobile Health Interventions: A Case Study on Emotion Regulation
Delivering treatment recommendations via pervasive electronic devices su...
