Reward Constrained Policy Optimization

05/28/2018
by Chen Tessler, et al.

Teaching agents to perform tasks using reinforcement learning is no easy feat. Because the goal of a reinforcement learning agent is to maximize the accumulated reward, agents often find loopholes and misspecifications in the reward signal that lead to unwanted behavior. A common remedy is regularization through reward shaping: the agent is given an additional weighted reward signal meant to steer it toward the desired behavior. The weight is treated as a hyperparameter and selected through trial and error, a time-consuming and computationally intensive task. In this work, we present a novel multi-timescale approach for constrained policy optimization, called 'Reward Constrained Policy Optimization' (RCPO), which enables policy regularization without the use of reward shaping. We prove the convergence of our approach and provide empirical evidence of its ability to train constraint-satisfying policies.
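To make the contrast in the abstract concrete, here is a minimal toy sketch, not the paper's algorithm. It assumes a one-parameter "policy" p in [0, 1], a hypothetical concave task reward 2p - p^2, a penalty equal to p, and a constraint that the penalty stay at or below a threshold. Reward shaping would fix the penalty weight by hand; the sketch instead adapts the weight as a Lagrange multiplier on a slower timescale than the policy, which is one common way to realize a multi-timescale constrained update. All function names, step sizes, and the toy problem itself are illustrative assumptions.

```python
def task_reward(p):
    """Hypothetical concave task reward; unconstrained optimum at p = 1."""
    return 2.0 * p - p * p


def shaped_reward(p, weight):
    """Reward shaping: task reward minus a weighted penalty (penalty = p here)."""
    return task_reward(p) - weight * p


def train(steps=5000, threshold=0.2, lr_policy=0.1, lr_lam=0.01):
    """Two-timescale updates: fast policy ascent, slow multiplier ascent."""
    p, lam = 0.5, 0.0
    for _ in range(steps):
        # Fast timescale: gradient ascent on the penalized objective
        # d/dp [2p - p^2 - lam * p] = 2 - 2p - lam, with p kept in [0, 1].
        grad_p = 2.0 - 2.0 * p - lam
        new_p = min(1.0, max(0.0, p + lr_policy * grad_p))
        # Slow timescale: raise the multiplier while the constraint is
        # violated (penalty above threshold), lower it otherwise; keep lam >= 0.
        lam = max(0.0, lam + lr_lam * (p - threshold))
        p = new_p
    return p, lam


if __name__ == "__main__":
    p, lam = train()
    print(f"penalty={p:.4f} multiplier={lam:.4f} shaped={shaped_reward(p, lam):.4f}")
```

In this toy setting the penalty weight is not a hand-tuned hyperparameter: it is adjusted automatically until the penalty settles at the threshold. Whether and how this carries over to real policies is what the full text addresses.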


research
12/28/2022

Lexicographic Multi-Objective Reinforcement Learning

In this work we introduce reinforcement learning techniques for solving ...
research
02/16/2020

First Order Optimization in Policy Space for Constrained Deep Reinforcement Learning

In reinforcement learning, an agent attempts to learn high-performing be...
research
02/24/2022

Learning Transferable Reward for Query Object Localization with Policy Adaptation

We propose a reinforcement learning based approach to query object local...
research
05/14/2022

Cliff Diving: Exploring Reward Surfaces in Reinforcement Learning Environments

Visualizing optimization landscapes has led to many fundamental insights...
research
09/21/2018

Interpretable Multi-Objective Reinforcement Learning through Policy Orchestration

Autonomous cyber-physical agents and systems play an increasingly large ...
research
12/07/2022

Specifying Behavior Preference with Tiered Reward Functions

Reinforcement-learning agents seek to maximize a reward signal through e...
research
10/07/2020

Projection-Based Constrained Policy Optimization

We consider the problem of learning control policies that optimize a rew...
