Programmatic Reward Design by Example

12/14/2021
by   Weichao Zhou, et al.
0

Reward design is a fundamental problem in reinforcement learning (RL). A misspecified or poorly designed reward can result in low sample efficiency and undesired behaviors. In this paper, we propose the idea of programmatic reward design, i.e. using programs to specify the reward functions in RL environments. Programs allow human engineers to express sub-goals and complex task scenarios in a structured and interpretable way. The challenge of programmatic reward design, however, is that while humans can provide the high-level structures, properly setting the low-level details, such as the right amount of reward for a specific sub-task, remains difficult. A major contribution of this paper is a probabilistic framework that can infer the best candidate programmatic reward function from expert demonstrations. Inspired by recent generative-adversarial approaches, our framework searches for the most likely programmatic reward function under which the optimally generated trajectories cannot be differentiated from the demonstrated trajectories. Experimental results show that programmatic reward functionslearned using this framework can significantly outperform those learned using existing reward learning algo-rithms, and enable RL agents to achieve state-of-the-artperformance on highly complex tasks.

READ FULL TEXT

page 15

page 17

page 19

research
04/20/2022

A Hierarchical Bayesian Approach to Inverse Reinforcement Learning with Symbolic Reward Machines

A misspecified reward can degrade sample efficiency and induce undesired...
research
05/15/2023

An Offline Time-aware Apprenticeship Learning Framework for Evolving Reward Functions

Apprenticeship learning (AL) is a process of inducing effective decision...
research
02/24/2021

Information Directed Reward Learning for Reinforcement Learning

For many reinforcement learning (RL) applications, specifying a reward i...
research
10/18/2022

Unpacking Reward Shaping: Understanding the Benefits of Reward Engineering on Sample Complexity

Reinforcement learning provides an automated framework for learning beha...
research
11/17/2020

Avoiding Tampering Incentives in Deep RL via Decoupled Approval

How can we design agents that pursue a given objective when all feedback...
research
09/19/2022

Understanding reinforcement learned crowds

Simulating trajectories of virtual crowds is a commonly encountered task...
research
10/19/2018

Supervising strong learners by amplifying weak experts

Many real world learning tasks involve complex or hard-to-specify object...

Please sign up or login with your details

Forgot password? Click here to reset