Anchor-Changing Regularized Natural Policy Gradient for Multi-Objective Reinforcement Learning

06/10/2022
by   Ruida Zhou, et al.
7

We study policy optimization for Markov decision processes (MDPs) with multiple reward value functions, which are to be jointly optimized according to given criteria such as proportional fairness (smooth concave scalarization), hard constraints (constrained MDP), and max-min trade-off. We propose an Anchor-changing Regularized Natural Policy Gradient (ARNPG) framework, which can systematically incorporate ideas from well-performing first-order methods into the design of policy optimization algorithms for multi-objective MDP problems. Theoretically, the designed algorithms based on the ARNPG framework achieve Õ(1/T) global convergence with exact gradients. Empirically, the ARNPG-guided algorithms also demonstrate superior performance compared to some existing policy gradient-based approaches in both exact gradients and sample-based scenarios.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/20/2023

Regularized Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity

This paper focuses on reinforcement learning for the regularized robust ...
research
07/13/2020

Fast Global Convergence of Natural Policy Gradient Methods with Entropy Regularization

Natural policy gradient (NPG) methods are among the most widely used pol...
research
09/15/2022

Multi-Objective Policy Gradients with Topological Constraints

Multi-objective optimization models that encode ordered sequential const...
research
06/14/2018

Stochastic Variance-Reduced Policy Gradient

In this paper, we propose a novel reinforcement- learning algorithm cons...
research
06/24/2020

When Will Generative Adversarial Imitation Learning Algorithms Attain Global Convergence

Generative adversarial imitation learning (GAIL) is a popular inverse re...
research
08/22/2023

Robust Lagrangian and Adversarial Policy Gradient for Robust Constrained Markov Decision Processes

The robust constrained Markov decision process (RCMDP) is a recent task-...
research
11/28/2001

Gradient-based Reinforcement Planning in Policy-Search Methods

We introduce a learning method called "gradient-based reinforcement plan...

Please sign up or login with your details

Forgot password? Click here to reset