Provably Efficient Model-Free Constrained RL with Linear Function Approximation

06/23/2022 · by Arnob Ghosh, et al.
We study the constrained reinforcement learning problem, in which an agent aims to maximize the expected cumulative reward subject to a constraint on the expected total value of a utility function. In contrast to existing model-based approaches or model-free methods that rely on a 'simulator', we aim to develop the first model-free, simulator-free algorithm that achieves a sublinear regret and a sublinear constraint violation even in large-scale systems. To this end, we consider the episodic constrained Markov decision process with linear function approximation, where the transition dynamics and the reward function can be represented as a linear function of some known feature mapping. We show that Õ(√(d^3 H^3 T)) regret and Õ(√(d^3 H^3 T)) constraint violation bounds can be achieved, where d is the dimension of the feature mapping, H is the length of the episode, and T is the total number of steps. Our bounds are attained without explicitly estimating the unknown transition model or requiring a simulator, and they depend on the state space only through the dimension of the feature mapping. Hence our bounds hold even when the number of states goes to infinity. Our main results are achieved via novel adaptations of the standard LSVI-UCB algorithm. In particular, we first introduce primal-dual optimization into the LSVI-UCB algorithm to balance regret and constraint violation. More importantly, we replace the standard greedy selection with respect to the state-action value function in LSVI-UCB with a soft-max policy. This turns out to be key in establishing uniform concentration for the constrained case via its approximation-smoothness trade-off. We also show that one can achieve zero constraint violation while maintaining the same order of regret with respect to T.
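The two algorithmic ideas highlighted in the abstract, a primal-dual Lagrangian combination of the reward and utility value estimates and a soft-max policy in place of LSVI-UCB's greedy argmax, can be illustrated with a short sketch. This is not the authors' implementation: the environment interface, the feature map `phi`, and all hyperparameters (`lambda_reg`, `beta`, `alpha`, `eta`, the utility threshold `b`) are illustrative assumptions, and the bonus and regression details are simplified.

```python
import numpy as np

def lsvi_ucb_primal_dual(env, phi, d, H, K, b,
                         lambda_reg=1.0, beta=1.0, alpha=10.0, eta=0.1):
    """Sketch of episodic primal-dual LSVI-UCB with a soft-max policy.

    Assumed interface (illustrative, not from the paper):
      env.reset()      -> state
      env.step(a)      -> (next_state, reward, utility), reward/utility in [0, 1]
      env.num_actions  -> number of actions
      phi(s, a)        -> length-d feature vector of the known linear-MDP features
      b                -> required expected total utility per episode (constraint)
    """
    Y = 0.0                                    # dual variable for the constraint
    data = [[] for _ in range(H)]              # per-step transition buffers
    w_r = [np.zeros(d) for _ in range(H)]      # reward value-function weights
    w_g = [np.zeros(d) for _ in range(H)]      # utility value-function weights
    Lam_inv = [np.eye(d) / lambda_reg for _ in range(H)]

    for k in range(K):
        # Backward least-squares value iteration for reward and utility.
        for h in reversed(range(H)):
            A = lambda_reg * np.eye(d)
            b_r, b_g = np.zeros(d), np.zeros(d)
            for (s, a, r, g, s_next) in data[h]:
                f = phi(s, a)
                A += np.outer(f, f)
                if h == H - 1:
                    v_r = v_g = 0.0            # no value beyond the horizon
                else:                          # soft-max targets from step h+1
                    v_r, v_g = softmax_values(env, phi, s_next, w_r[h+1],
                                              w_g[h+1], Lam_inv[h+1],
                                              beta, alpha, Y, H)
                b_r += f * (r + v_r)
                b_g += f * (g + v_g)
            Lam_inv[h] = np.linalg.inv(A)
            w_r[h], w_g[h] = Lam_inv[h] @ b_r, Lam_inv[h] @ b_g

        # Roll out one episode with the soft-max (not greedy) policy.
        s = env.reset()
        ep_utility = 0.0
        for h in range(H):
            q = combined_q(env, phi, s, w_r[h], w_g[h], Lam_inv[h], beta, Y, H)
            p = np.exp(alpha * (q - q.max()))
            p /= p.sum()                       # soft-max replaces greedy argmax
            a = np.random.choice(env.num_actions, p=p)
            s_next, r, g = env.step(a)
            data[h].append((s, a, r, g, s_next))
            ep_utility += g
            s = s_next

        # Dual (projected sub-gradient) step on the constraint violation.
        Y = max(0.0, Y + eta * (b - ep_utility))
    return w_r, w_g, Y


def combined_q(env, phi, s, w_r, w_g, A_inv, beta, Y, H):
    """Optimistic Lagrangian Q-values: Q_r + Y * Q_g, each truncated to [0, H]."""
    qs = []
    for a in range(env.num_actions):
        f = phi(s, a)
        bonus = beta * np.sqrt(f @ A_inv @ f)  # UCB exploration bonus
        q_r = min(H, max(0.0, f @ w_r + bonus))
        q_g = min(H, max(0.0, f @ w_g + bonus))
        qs.append(q_r + Y * q_g)
    return np.array(qs)


def softmax_values(env, phi, s, w_r, w_g, A_inv, beta, alpha, Y, H):
    """Soft-max-weighted state values used as regression targets."""
    q = combined_q(env, phi, s, w_r, w_g, A_inv, beta, Y, H)
    p = np.exp(alpha * (q - q.max()))
    p /= p.sum()
    v_r = v_g = 0.0
    for a in range(env.num_actions):
        f = phi(s, a)
        bonus = beta * np.sqrt(f @ A_inv @ f)
        v_r += p[a] * min(H, max(0.0, f @ w_r + bonus))
        v_g += p[a] * min(H, max(0.0, f @ w_g + bonus))
    return v_r, v_g
```

In this sketch the dual variable Y grows whenever the realized episode utility falls below the threshold b, which tilts the soft-max policy toward actions with higher estimated utility; the soft-max temperature alpha controls the approximation-smoothness trade-off that the abstract credits for uniform concentration in the constrained case.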

