Reinforcement Learning with General Utilities: Simpler Variance Reduction and Large State-Action Space

06/02/2023
by Anas Barakat, et al.

We consider the reinforcement learning (RL) problem with general utilities, which consists in maximizing a function of the state-action occupancy measure. Beyond the standard cumulative-reward RL setting, this problem includes as particular cases constrained RL, pure exploration, and learning from demonstrations, among others. For this problem, we propose a simpler single-loop, parameter-free normalized policy gradient algorithm. Implementing a recursive momentum variance reduction mechanism, our algorithm achieves 𝒪̃(ϵ^{-3}) and 𝒪̃(ϵ^{-2}) sample complexities for ϵ-first-order stationarity and ϵ-global optimality, respectively, under suitable assumptions. We further address the setting of large finite state-action spaces via linear function approximation of the occupancy measure and show an 𝒪̃(ϵ^{-4}) sample complexity for a simple policy gradient method with a linear regression subroutine.
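To make the notion of a "function of the state-action occupancy measure" concrete, the following minimal sketch computes a discounted occupancy measure for a tiny tabular MDP and evaluates two utilities on it: a linear one (the standard cumulative reward) and a state-entropy one (one common way to formalize pure exploration). It is not the algorithm proposed in the paper. The function names, the two-state MDP, the truncation horizon, and the choice of entropy as the exploration utility are all assumptions made for illustration.

```python
import math

def occupancy_measure(P, pi, rho, gamma, horizon=2000):
    """Approximate lam[s][a] = sum_t gamma^t * Pr(s_t = s, a_t = a).

    P[s][a][s2] is the transition probability, pi[s][a] the policy,
    rho[s] the initial state distribution. The infinite sum is truncated
    at `horizon` steps (an assumption; the tail is O(gamma^horizon)).
    """
    n_states, n_actions = len(pi), len(pi[0])
    lam = [[0.0] * n_actions for _ in range(n_states)]
    d = list(rho)          # state distribution at the current time step
    discount = 1.0
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                mass = d[s] * pi[s][a]
                lam[s][a] += discount * mass
                for s2 in range(n_states):
                    nxt[s2] += mass * P[s][a][s2]
        d = nxt
        discount *= gamma
    return lam

def linear_utility(lam, r):
    """Standard RL objective: inner product of occupancy and reward."""
    return sum(lam[s][a] * r[s][a]
               for s in range(len(lam)) for a in range(len(lam[0])))

def entropy_utility(lam, gamma):
    """Entropy of the normalized state occupancy (an exploration utility)."""
    mu = [(1.0 - gamma) * sum(row) for row in lam]
    return -sum(m * math.log(m) for m in mu if m > 0.0)

# Hypothetical two-state MDP: action 0 stays put, action 1 switches state.
P = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]
pi = [[0.5, 0.5], [0.5, 0.5]]   # uniform policy
rho = [1.0, 0.0]                # always start in state 0
gamma = 0.9

lam = occupancy_measure(P, pi, rho, gamma)
print(linear_utility(lam, [[1.0, 0.0], [0.0, 0.0]]))
print(entropy_utility(lam, gamma))
```

Both utilities take the same object, the occupancy measure, as input; only the function applied to it changes. The linear utility is the special case in which policy gradient reduces to the usual cumulative-reward setting, while the entropy utility is nonlinear in the occupancy measure and so falls outside it.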
