Policy Gradient for Reinforcement Learning with General Utilities

10/03/2022
by Navdeep Kumar, et al.

In Reinforcement Learning (RL), the goal of an agent is to discover an optimal policy that maximizes the expected cumulative reward. This objective may also be viewed as finding a policy that optimizes a linear function of its state-action occupancy measure, hereafter referred to as Linear RL. However, many supervised and unsupervised RL problems, such as apprenticeship learning, pure exploration, and variational intrinsic control, are not covered by the Linear RL framework, because their objectives are non-linear functions of the occupancy measure. RL with non-linear utilities appears unwieldy, as methods such as the Bellman equation, value iteration, policy gradient, and dynamic programming, which have had tremendous success in Linear RL, do not generalize trivially. In this paper, we derive the policy gradient theorem for RL with general utilities. The policy gradient theorem has proven to be a cornerstone of Linear RL due to its elegance and ease of implementation, and our policy gradient theorem for RL with general utilities shares the same elegance and ease of implementation. Based on the derived policy gradient theorem, we also present a simple sample-based algorithm. We believe our results will be of interest to the community and offer inspiration for future work in this generalized setting.
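The abstract's starting point, that the expected discounted return is a linear function of the state-action occupancy measure while objectives such as pure exploration are not, can be checked numerically on a toy problem. The following is a minimal sketch under stated assumptions, not the authors' theorem or algorithm: the 2-state, 2-action MDP, the reward `r`, the tabular softmax parametrization, and all helper names (`occupancy`, `entropy_utility`, `entropy_grad`, `finite_diff`) are hypothetical, and the entropy of the normalized occupancy stands in for a generic non-linear utility F. The last part checks the plain chain-rule observation that, at a fixed policy, the gradient of F(λ(θ)) equals the gradient of the linear objective ⟨r̂, λ(θ)⟩ with the reward frozen at r̂ = ∂F/∂λ; the paper's own statement may be phrased differently.

```python
import math

# Hypothetical 2-state, 2-action MDP (illustrative numbers, not from the paper).
# P[s][a][s2] = probability of moving from s to s2 when taking action a in s.
P = [[[0.9, 0.1], [0.2, 0.8]],
     [[0.7, 0.3], [0.05, 0.95]]]
MU = [1.0, 0.0]   # initial state distribution
GAMMA = 0.9
S, A = 2, 2


def softmax_policy(theta):
    """Tabular softmax policy: pi[s][a] proportional to exp(theta[s][a])."""
    pi = []
    for s in range(S):
        m = max(theta[s])
        exps = [math.exp(x - m) for x in theta[s]]
        z = sum(exps)
        pi.append([e / z for e in exps])
    return pi


def occupancy(pi, horizon=1000):
    """Discounted occupancy lambda(s,a) = sum_t gamma^t Pr(s_t=s, a_t=a),
    truncated after `horizon` steps (gamma^horizon is negligible here)."""
    lam = [[0.0] * A for _ in range(S)]
    d = list(MU)  # state distribution at time t
    disc = 1.0
    for _ in range(horizon):
        nxt = [0.0] * S
        for s in range(S):
            for a in range(A):
                p_sa = d[s] * pi[s][a]
                lam[s][a] += disc * p_sa
                for s2 in range(S):
                    nxt[s2] += p_sa * P[s][a][s2]
        d = nxt
        disc *= GAMMA
    return lam


def policy_value(pi, r, iters=1000):
    """Iterative policy evaluation: expected discounted return starting from MU."""
    v = [0.0] * S
    for _ in range(iters):
        v = [sum(pi[s][a] * (r[s][a] + GAMMA * sum(P[s][a][s2] * v[s2] for s2 in range(S)))
                 for a in range(A))
             for s in range(S)]
    return sum(MU[s] * v[s] for s in range(S))


def linear_utility(lam, r):
    """Linear RL objective <r, lambda>."""
    return sum(r[s][a] * lam[s][a] for s in range(S) for a in range(A))


def entropy_utility(lam):
    """Non-linear utility (illustrative): entropy of the normalized occupancy (1 - gamma) * lambda."""
    c = 1 - GAMMA
    return -sum(c * x * math.log(c * x) for row in lam for x in row)


def entropy_grad(lam):
    """dF/dlambda(s,a) for entropy_utility, used below as a frozen pseudo-reward."""
    c = 1 - GAMMA
    return [[-c * (math.log(c * x) + 1) for x in row] for row in lam]


def finite_diff(f, theta, h=1e-5):
    """Central finite-difference gradient of f with respect to each theta[s][a]."""
    grad = [[0.0] * A for _ in range(S)]
    for s in range(S):
        for a in range(A):
            plus = [row[:] for row in theta]
            plus[s][a] += h
            minus = [row[:] for row in theta]
            minus[s][a] -= h
            grad[s][a] = (f(plus) - f(minus)) / (2 * h)
    return grad


theta = [[0.3, -0.2], [0.1, 0.5]]
pi = softmax_policy(theta)
lam = occupancy(pi)
r = [[1.0, 0.0], [0.0, 2.0]]  # hypothetical reward

# Linear RL: <r, lambda> should match the expected discounted return.
print(linear_utility(lam, r), policy_value(pi, r))

# General utility: freeze r_hat = dF/dlambda at the current lambda, then compare gradients.
r_hat = entropy_grad(lam)
g_true = finite_diff(lambda th: entropy_utility(occupancy(softmax_policy(th))), theta)
g_lin = finite_diff(lambda th: linear_utility(occupancy(softmax_policy(th)), r_hat), theta)
print(g_true)
print(g_lin)
```

Exact enumeration over states, as done here, is only practical for tiny tabular problems; a sample-based method would instead estimate these quantities from trajectories, which this sketch does not attempt.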
