Simplifying Reward Design through Divide-and-Conquer

06/07/2018
by Ellis Ratner, et al.

Designing a good reward function is essential to robot planning and reinforcement learning, but it can also be challenging and frustrating. The reward needs to work across multiple different environments, and that often requires many iterations of tuning. We introduce a novel divide-and-conquer approach that enables the designer to specify a reward separately for each environment. By treating these separate reward functions as observations about the underlying true reward, we derive an approach to infer a common reward across all environments. We conduct user studies, measuring user effort and solution quality, in an abstract grid world domain and in a motion planning domain for a 7-DOF manipulator. We show that our method is faster, easier to use, and produces a higher-quality solution than the typical method of designing a reward jointly across all environments. We additionally conduct a series of experiments that measure the sensitivity of these results to different properties of the reward design task, such as the number of environments, the number of feasible solutions per environment, and the fraction of the total features that vary within each environment. We find that independent reward design outperforms the standard joint reward design process, but works best when the design problem can be divided into simpler subproblems.
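The abstract's central idea, treating per-environment proxy rewards as noisy observations of a shared true reward, can be sketched as a small Bayesian inference over linear reward weights. The sketch below is an illustrative assumption, not the paper's implementation: candidate weight vectors, the Boltzmann-style observation model, the rationality parameter `beta`, and all function names are hypothetical, and the toy environments are random feature matrices.

```python
import numpy as np

rng = np.random.default_rng(0)

def proxy_return(w_select, w_eval, features):
    # Return, evaluated under w_eval, of the trajectory that is
    # optimal under w_select (rows of `features` are trajectories).
    idx = np.argmax(features @ w_select)
    return features[idx] @ w_eval

def posterior_over_true_weights(candidates, proxies, envs, beta=5.0):
    # Each per-environment proxy weight vector is treated as evidence
    # about the true weights: a proxy is more probable under a candidate
    # w* when the proxy-optimal trajectory scores well under w*.
    # (Unnormalized Boltzmann likelihood; a simplifying assumption.)
    log_post = np.zeros(len(candidates))
    for w_proxy, feats in zip(proxies, envs):
        for k, w_true in enumerate(candidates):
            log_post[k] += beta * proxy_return(w_proxy, w_true, feats)
    log_post -= log_post.max()          # numerical stability
    post = np.exp(log_post)
    return post / post.sum()

# Toy setup: 3 environments, 4 candidate trajectories each, 2 features.
envs = [rng.standard_normal((4, 2)) for _ in range(3)]
true_w = np.array([1.0, -0.5])
# Independent per-environment designs: noisy copies of the true weights.
proxies = [true_w + 0.1 * rng.standard_normal(2) for _ in envs]
candidates = [np.array([1.0, -0.5]),
              np.array([-1.0, 0.5]),
              np.array([0.0, 1.0])]

post = posterior_over_true_weights(candidates, proxies, envs)
print(post)
```

Combining evidence across environments is what lets each individual reward design stay simple: no single proxy has to be correct everywhere, since the posterior pools their agreement.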

