Constrained Reinforcement Learning via Dissipative Saddle Flow Dynamics

12/03/2022
by   Tianqi Zheng, et al.

In constrained reinforcement learning (C-RL), an agent seeks to learn from the environment a policy that maximizes the expected cumulative reward while satisfying minimum requirements on secondary cumulative rewards. Several algorithms rooted in sample-based primal-dual methods have recently been proposed to solve this problem in policy space. However, such methods are based on stochastic gradient descent-ascent algorithms whose trajectories are connected to the optimal policy only after a mixing output stage that depends on the algorithm's history. As a result, there is a mismatch between the behavioral policy and the optimal one. In this work, we propose a novel algorithm for constrained RL that does not suffer from these limitations. Leveraging recent results on regularized saddle-flow dynamics, we develop a novel stochastic gradient descent-ascent algorithm whose trajectories converge to the optimal policy almost surely.
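
The gap between last-iterate and history-dependent behavior described in the abstract can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it is deterministic, it uses a hypothetical bilinear Lagrangian L(x, y) = x * y with its saddle point at the origin, and the auxiliary variables `z`, `w` and the weight `rho` are assumptions chosen to mimic one common form of regularized saddle dynamics. It only shows that plain gradient descent-ascent iterates can drift away from the saddle point, while a regularized variant can drive the iterates themselves to it.

```python
import math


def plain_gda(x, y, step, iters):
    """Simultaneous gradient descent in x and ascent in y on L(x, y) = x * y."""
    for _ in range(iters):
        # dL/dx = y (descent), dL/dy = x (ascent)
        x, y = x - step * y, y + step * x
    return x, y


def regularized_gda(x, y, step, iters, rho=1.0):
    """Descent-ascent on an augmented Lagrangian (an assumed regularization):

        L_aug = x * y + (rho / 2) * (x - z)**2 - (rho / 2) * (y - w)**2

    minimized over (x, z) and maximized over (y, w).
    """
    z, w = x, y  # hypothetical auxiliary variables, started at the primal/dual values
    for _ in range(iters):
        gx = y + rho * (x - z)   # dL_aug/dx
        gz = -rho * (x - z)      # dL_aug/dz
        gy = x - rho * (y - w)   # dL_aug/dy
        gw = rho * (y - w)       # dL_aug/dw
        x, z = x - step * gx, z - step * gz  # descent on the minimizing variables
        y, w = y + step * gy, w + step * gw  # ascent on the maximizing variables
    return x, y


if __name__ == "__main__":
    start = (1.0, 1.0)
    px, py = plain_gda(*start, step=0.1, iters=2000)
    rx, ry = regularized_gda(*start, step=0.1, iters=2000)
    print("distance to saddle, plain GDA:      ", math.hypot(px, py))
    print("distance to saddle, regularized GDA:", math.hypot(rx, ry))
```

In this toy setting the plain iterates spiral outward, so only a post-processed output such as an average of past iterates could approach the saddle point, whereas the regularized iterates approach it directly.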

research
02/06/2022

Stochastic Gradient Descent with Dependent Data for Offline Reinforcement Learning

In reinforcement learning (RL), offline learning decoupled learning from...
research
02/23/2021

State Augmented Constrained Reinforcement Learning: Overcoming the Limitations of Learning with Rewards

Constrained reinforcement learning involves multiple rewards that must i...
research
05/28/2021

Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

The reinforcement learning problem of finding a control policy that mini...
research
09/15/2020

Soft policy optimization using dual-track advantage estimator

In reinforcement learning (RL), we always expect the agent to explore as...
research
06/02/2020

Learning optimal environments using projected stochastic gradient ascent

In this work, we generalize the direct policy search algorithms to an al...
research
08/23/2018

Playing 20 Question Game with Policy-Based Reinforcement Learning

The 20 Questions (Q20) game is a well known game which encourages deduct...
research
04/12/2021

An Efficient Algorithm for Deep Stochastic Contextual Bandits

In stochastic contextual bandit (SCB) problems, an agent selects an acti...
