Information Directed Reward Learning for Reinforcement Learning

02/24/2021
by David Lindner, et al.

For many reinforcement learning (RL) applications, specifying a reward is difficult. In this paper, we consider an RL setting where the agent can obtain information about the reward only by querying an expert that can, for example, evaluate individual states or provide binary preferences over trajectories. From such expensive feedback, we aim to learn a model of the reward function that allows standard RL algorithms to achieve high expected return with as few expert queries as possible. For this purpose, we propose Information Directed Reward Learning (IDRL), which uses a Bayesian model of the reward function and selects queries that maximize the information gain about the difference in return between potentially optimal policies. In contrast to prior active reward learning methods designed for specific types of queries, IDRL naturally accommodates different query types. Moreover, by shifting the focus from reducing the reward approximation error to improving the policy induced by the reward model, it achieves similar or better performance with significantly fewer queries. We support our findings with extensive evaluations in multiple environments and with different types of queries.
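The query-selection idea described above can be illustrated with a minimal sketch. It rests on simplifying assumptions that are not taken from the paper. The reward is linear in state features, r(s) = φ(s)ᵀθ, with a Gaussian prior on θ. Each policy is summarized by hypothetical expected feature counts ψ, so its return is ψᵀθ. Each query returns a noisy linear observation of θ, as a state evaluation would. Under these assumptions, the information gain about a return difference has a closed form. The helper names are hypothetical. The pair-selection rule, which picks the candidate policies whose return difference has the largest posterior variance, is an illustrative stand-in rather than the paper's exact procedure. Binary preference queries are not linear-Gaussian and are not covered here.

```python
import numpy as np

# Illustrative assumptions (not the paper's implementation):
#   reward r(s) = phi(s) @ theta, prior theta ~ N(mu, Sigma)
#   return of policy pi: G(pi) = psi_pi @ theta (psi_pi = expected feature counts)
#   a query with feature vector x (e.g. phi(s) for a state evaluation)
#   returns y = x @ theta + Gaussian noise with variance noise_var > 0


def most_uncertain_pair(Sigma, psis):
    """Hypothetical helper: pick the pair of candidate policies whose
    return difference has the largest posterior variance."""
    best_pair, best_var = None, -1.0
    for i in range(len(psis)):
        for j in range(i + 1, len(psis)):
            d = psis[i] - psis[j]
            var = d @ Sigma @ d
            if var > best_var:
                best_pair, best_var = (i, j), var
    return best_pair


def info_gain(Sigma, a, x, noise_var):
    """Mutual information (nats) between z = a @ theta and y = x @ theta + noise."""
    var_z = a @ Sigma @ a
    if var_z <= 0.0:
        return 0.0  # return difference already known; nothing left to learn
    cov_zy = a @ Sigma @ x
    var_y = x @ Sigma @ x + noise_var
    var_z_post = var_z - cov_zy**2 / var_y  # posterior variance of z given y
    return 0.5 * np.log(var_z / var_z_post)


def select_query(Sigma, psis, candidates, noise_var=0.1):
    """Choose the candidate query most informative about G(pi_i) - G(pi_j)."""
    i, j = most_uncertain_pair(Sigma, psis)
    a = psis[i] - psis[j]  # G(pi_i) - G(pi_j) = a @ theta
    gains = [info_gain(Sigma, a, x, noise_var) for x in candidates]
    return int(np.argmax(gains)), gains


def update(mu, Sigma, x, y, noise_var=0.1):
    """Standard Bayesian linear-Gaussian posterior update after observing y."""
    Sx = Sigma @ x
    k = Sx / (x @ Sx + noise_var)  # gain vector
    mu_new = mu + k * (y - x @ mu)
    Sigma_new = Sigma - np.outer(k, Sx)
    return mu_new, Sigma_new


# Toy usage: three features, two candidate policies, four candidate state queries.
mu, Sigma = np.zeros(3), np.eye(3)
psis = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])]
candidates = [np.eye(3)[0], np.eye(3)[1], np.eye(3)[2], np.array([0.5, 0.5, 0.0])]
best, gains = select_query(Sigma, psis, candidates)
mu, Sigma = update(mu, Sigma, candidates[best], y=1.0)
```

In this toy setting, the last two candidates have zero information gain. The third feature is visited by neither policy. The fourth state is weighted equally by both policies, so its reward cannot affect which policy looks better. A reward-approximation criterion would still consider these states informative. That contrast is the shift in focus that the abstract emphasizes.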
