Average-Constrained Policy Optimization

02/02/2023
by Akhil Agnihotri, et al.

Reinforcement Learning (RL) with constraints is becoming an increasingly important problem in various applications. In many of these applications, the average criterion is more suitable than the discounted one. Yet RL for average-criterion constrained MDPs (CMDPs) remains challenging: algorithms designed for discounted constrained RL problems often perform poorly in the average CMDP setting. In this paper, we introduce a new (possibly the first) policy optimization algorithm for constrained MDPs with the average criterion. The Average-Constrained Policy Optimization (ACPO) algorithm is inspired by the well-known PPO-type algorithms based on trust region methods. We develop basic sensitivity theory for average MDPs and then use the corresponding bounds in the design of the algorithm. We provide theoretical guarantees on its performance and, through extensive experiments in various challenging MuJoCo environments, show that it outperforms other state-of-the-art algorithms adapted to the average CMDP setting.
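To make the average criterion concrete: for a fixed stationary policy pi in an ergodic CMDP, the long-run average reward is J_R(pi) = sum_s d_pi(s) r(s), where d_pi is the stationary state distribution induced by pi, and a constraint requires the analogous average cost J_C(pi) to stay at or below a limit. The sketch below evaluates both quantities for a toy two-state chain. It is only an illustration under stated assumptions: the transition matrix, per-state rewards and costs, and the limit 0.2 are hypothetical, the policy is assumed to be already folded into a state-to-state transition matrix, and the code evaluates a policy rather than implementing the ACPO update itself.

```python
from typing import List, Tuple

Matrix = List[List[float]]


def stationary_distribution(P: Matrix, iters: int = 10_000, tol: float = 1e-13) -> List[float]:
    """Approximate the stationary distribution d (with d = d P) by power iteration.

    Assumes the chain induced by the policy is ergodic (irreducible and aperiodic),
    so the iteration converges to a unique distribution from any starting point.
    """
    n = len(P)
    d = [1.0 / n] * n  # uniform initial distribution
    for _ in range(iters):
        # One step of the chain: new_d[j] = sum_i d[i] * P[i][j]
        new_d = [sum(d[i] * P[i][j] for i in range(n)) for j in range(n)]
        if max(abs(a - b) for a, b in zip(new_d, d)) < tol:
            return new_d
        d = new_d
    return d


def evaluate_policy(P_pi: Matrix, reward: List[float], cost: List[float],
                    limit: float) -> Tuple[float, float, bool]:
    """Long-run average reward and cost of a fixed policy, plus the constraint check.

    Hypothetical setup: P_pi[i][j] is the probability of moving from state i to
    state j under the policy (actions already marginalized out), and reward/cost
    are per-state values.
    """
    d = stationary_distribution(P_pi)
    avg_reward = sum(di * ri for di, ri in zip(d, reward))  # J_R(pi)
    avg_cost = sum(di * ci for di, ci in zip(d, cost))      # J_C(pi)
    return avg_reward, avg_cost, avg_cost <= limit


# Hypothetical two-state example: state 0 pays reward, state 1 incurs cost.
P_pi = [[0.9, 0.1],
        [0.5, 0.5]]
avg_r, avg_c, feasible = evaluate_policy(P_pi, reward=[1.0, 0.0], cost=[0.0, 1.0], limit=0.2)
print(f"J_R = {avg_r:.4f}, J_C = {avg_c:.4f}, feasible = {feasible}")
```

For this matrix the stationary distribution is (5/6, 1/6), so the script reports J_R of about 0.8333 and J_C of about 0.1667, which satisfies the hypothetical limit of 0.2. Unlike a discounted return, neither number depends on a discount factor or on the starting state, which is what distinguishes the average setting.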
