Provably Feedback-Efficient Reinforcement Learning via Active Reward Learning

04/18/2023
by Dingwen Kong, et al.

An appropriate reward function is of paramount importance in specifying a task in reinforcement learning (RL). Yet, designing a correct reward function is known to be extremely challenging in practice, even for simple tasks. Human-in-the-loop (HiL) RL allows humans to communicate complex goals to the RL agent by providing various types of feedback. However, despite its great empirical successes, HiL RL usually requires too much feedback from the human teacher and suffers from a lack of theoretical understanding. In this paper, we address this issue from a theoretical perspective, aiming to provide provably feedback-efficient algorithmic frameworks that use human-in-the-loop feedback to specify the rewards of given tasks. We provide an active-learning-based RL algorithm that first explores the environment without a specified reward function and then asks a human teacher only a few queries about the rewards of the task at selected state-action pairs. Afterward, the algorithm is guaranteed to return a nearly optimal policy for the task with high probability. We show that, even in the presence of random noise in the feedback, the algorithm needs only O(H dim_R^2) queries to the reward function to output an ϵ-optimal policy for any ϵ > 0. Here H is the horizon of the RL environment, and dim_R specifies the complexity of the function class representing the reward function. In contrast, standard RL algorithms must query the reward function at Ω(poly(d, 1/ϵ)) state-action pairs, where d depends on the complexity of the environmental transition.
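The abstract describes a two-phase recipe: explore the environment without rewards, then spend a small query budget on a human teacher, then plan with the learned reward. The sketch below is a minimal toy illustration of that recipe on a tabular MDP. The uniform exploration policy, the visitation-based query rule, the Gaussian noise model, and all numeric choices (S, A, H, the query budget) are assumptions made for illustration, not the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)

# ----- Toy tabular MDP (illustrative stand-in, not from the paper) -----
S, A, H = 6, 3, 5                             # states, actions, horizon
P = rng.dirichlet(np.ones(S), size=(S, A))    # true transitions, shape (S, A, S)
r_true = rng.uniform(0.0, 1.0, size=(S, A))   # true reward, unknown to the agent

def human_label(s, a, noise=0.1):
    # Assumed query model: the teacher returns the reward plus Gaussian noise.
    return r_true[s, a] + rng.normal(0.0, noise)

# ----- Phase 1: reward-free exploration to estimate the transitions -----
counts = np.zeros((S, A, S))
for _ in range(2000):
    s = rng.integers(S)
    for _ in range(H):
        a = rng.integers(A)                   # uniform exploration as a stand-in
        s_next = rng.choice(S, p=P[s, a])
        counts[s, a, s_next] += 1
        s = s_next
P_hat = (counts + 1.0) / (counts.sum(-1, keepdims=True) + S)  # smoothed MLE

# ----- Phase 2: a small budget of reward queries to the teacher -----
# Crude stand-in for an active query rule: spend the budget on the (s, a)
# pairs exploration visited most, repeating each query to average out the
# teacher's noise; unqueried pairs default to a pessimistic reward of 0.
budget, n_repeats = 12, 5
visits = counts.sum(-1)
r_hat = np.zeros((S, A))
for idx in np.argsort(-visits, axis=None)[:budget]:
    s, a = np.unravel_index(idx, visits.shape)
    r_hat[s, a] = np.mean([human_label(s, a) for _ in range(n_repeats)])

# ----- Phase 3: plan on the learned model and reward (value iteration) -----
V = np.zeros(S)
policy = np.zeros((H, S), dtype=int)
for h in reversed(range(H)):
    Q = r_hat + P_hat @ V                     # (S, A) Bellman backup
    policy[h] = Q.argmax(axis=1)
    V = Q.max(axis=1)

print("Reward queries used:", budget * n_repeats)
print("Greedy first-step action per state:", policy[0])
```

Note that the total number of reward queries here depends only on the fixed budget, not on the amount of exploration data, which mirrors the abstract's point that reward feedback can be far scarcer than environment interaction.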


Related research

02/24/2021: Information Directed Reward Learning for Reinforcement Learning
For many reinforcement learning (RL) applications, specifying a reward i...

09/16/2019: Leveraging human Domain Knowledge to model an empirical Reward function for a Reinforcement Learning problem
Traditional Reinforcement Learning (RL) problems depend on an exhaustive...

08/30/2023: Iterative Reward Shaping using Human Feedback for Correcting Reward Misspecification
A well-defined reward function is crucial for successful training of an ...

03/02/2023: Active Reward Learning from Multiple Teachers
Reward learning algorithms utilize human feedback to infer a reward func...

11/01/1997: Dynamic Non-Bayesian Decision Making
The model of a non-Bayesian agent who faces a repeated game with incompl...

03/09/2020: Human AI interaction loop training: New approach for interactive reinforcement learning
Reinforcement Learning (RL) in various decision-making tasks of machine ...

03/10/2021: Maximum Entropy RL (Provably) Solves Some Robust RL Problems
Many potential applications of reinforcement learning (RL) require guara...
