Towards Learning Reward Functions from User Interactions

08/15/2017
by Ziming Li, et al.

In the physical world, people have dynamic preferences: the same situation can lead to satisfaction for some people and to frustration for others. This calls for personalization. The same observation holds for online behavior with interactive systems. It is natural to represent the behavior of users who engage with an interactive system, such as a search engine or a recommender system, as a sequence of actions in which each next action depends on the current situation and on the user's reward for taking a particular action. By and large, current online evaluation metrics for such systems are static and do not reflect differences in user behavior. They rarely capture or model the reward experienced by a user while interacting with the system. We argue that knowing a user's reward function is essential for an interactive system, both for learning and for evaluation. We propose to learn users' reward functions directly from observed interaction traces. In particular, we show how users' reward functions can be uncovered using inverse reinforcement learning techniques. We also show how to incorporate user features into the learning process. Our main contribution is a novel and dynamic approach to recovering a user's reward function. We present an analytic approach to this problem and complement it with initial experiments on the interaction logs of a cultural heritage institution, which demonstrate the feasibility of the approach by uncovering different reward functions for different user groups.
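To make the idea of recovering a reward function from interaction traces concrete, here is a minimal sketch. The abstract does not say which inverse reinforcement learning algorithm the authors use, so this example substitutes maximum-entropy IRL with a linear reward over state features as a stand-in. Everything in it is an assumption made for illustration: the function names (`soft_policy`, `expected_visits`, `fit_reward`), the three-state toy system with deterministic transitions, and the eight invented traces. None of it comes from the paper's data or method.

```python
import numpy as np

def soft_policy(theta, features, transitions, horizon):
    """Finite-horizon soft value iteration. Returns a policy of shape (T, S, A)."""
    n_states, n_actions = transitions.shape
    reward = features @ theta                      # linear reward per state
    value = np.zeros(n_states)                     # value after the last step
    policy = np.zeros((horizon, n_states, n_actions))
    for t in reversed(range(horizon)):
        q = reward[:, None] + value[transitions]   # (S, A), deterministic moves
        q_max = q.max(axis=1, keepdims=True)
        value = q_max[:, 0] + np.log(np.exp(q - q_max).sum(axis=1))
        policy[t] = np.exp(q - value[:, None])     # softmax over actions
    return policy

def expected_visits(policy, transitions, start, horizon):
    """Expected number of visits to each state over `horizon` steps."""
    n_states = transitions.shape[0]
    dist = start.copy()
    total = np.zeros(n_states)
    for t in range(horizon):
        total += dist
        nxt = np.zeros(n_states)
        # Push probability mass from each (state, action) to its next state.
        np.add.at(nxt, transitions, dist[:, None] * policy[t])
        dist = nxt
    return total

def fit_reward(traces, features, transitions, horizon, lr=0.5, steps=500):
    """Gradient ascent on the max-ent likelihood: match feature counts."""
    n_states = features.shape[0]
    starts = [trace[0] for trace in traces]
    start = np.bincount(starts, minlength=n_states) / len(traces)
    empirical = np.mean(
        [features[trace[:horizon]].sum(axis=0) for trace in traces], axis=0
    )
    theta = np.zeros(features.shape[1])
    for _ in range(steps):
        policy = soft_policy(theta, features, transitions, horizon)
        expected = expected_visits(policy, transitions, start, horizon) @ features
        theta += lr * (empirical - expected)
    return theta

# Hypothetical toy system: 3 pages (states), 2 links per page (actions).
# transitions[s, a] is the page reached by following link a on page s.
transitions = np.array([[1, 2], [0, 2], [0, 1]])
features = np.eye(3)                               # one indicator per page
# Invented traces of visited pages; each has length `horizon`.
traces = (
    [[0, 2, 0]] * 3 + [[0, 2, 1]] * 2 + [[0, 1, 2]] * 2 + [[0, 1, 0]]
)
theta = fit_reward(traces, features, transitions, horizon=3)
print("learned reward weights per page:", theta)
```

In this toy setup the invented users visit page 2 more often than page 1, so the fitted weight for page 2 comes out higher than the weight for page 1. Fitting the same model separately to the traces of different user groups would yield one weight vector per group, which is the kind of comparison the abstract describes.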
