A Simple Reward-free Approach to Constrained Reinforcement Learning

by Sobhan Miryoosefi, et al.

In constrained reinforcement learning (RL), a learning agent seeks not only to optimize the overall reward but also to satisfy additional safety, diversity, or budget constraints. Consequently, existing constrained RL solutions require several new algorithmic ingredients that differ notably from standard RL. Reward-free RL, on the other hand, was developed independently in the unconstrained literature: it learns the transition dynamics without using reward information, and is thus naturally capable of addressing RL with multiple objectives under common dynamics. This paper bridges reward-free RL and constrained RL. In particular, we propose a simple meta-algorithm such that, given any reward-free RL oracle, the approachability and constrained RL problems can be solved directly with negligible overhead in sample complexity. Using existing reward-free RL solvers, our framework provides sharp sample complexity results for constrained RL in the tabular MDP setting, matching the best existing results up to a factor of horizon dependence. Our framework also extends directly to tabular two-player Markov games and gives a new result for constrained RL with linear function approximation.
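
As a rough illustration of why a model learned without rewards can serve several objectives at once, the sketch below evaluates a vector-valued return (reward and cost together) for a fixed policy on a tabular MDP. It is not the paper's meta-algorithm: the function name `evaluate_vector_return`, the toy two-state MDP, and the cost budget of 2 are all assumptions made for this example, and the transition tensor `P` simply stands in for whatever estimate a reward-free oracle might return.

```python
import numpy as np

def evaluate_vector_return(P, r, policy, H, s0=0):
    """Finite-horizon policy evaluation for a d-dimensional reward.

    P:      (S, A, S) transition probabilities (assumed already estimated).
    r:      (S, A, d) reward vectors, e.g. [reward, cost].
    policy: (H, S) integer array of deterministic actions per step and state.
    Returns the d-dimensional expected return from state s0.
    """
    S = P.shape[0]
    d = r.shape[2]
    idx = np.arange(S)
    V = np.zeros((S, d))                 # value after the final step
    for h in reversed(range(H)):
        a = policy[h]                    # action chosen in each state at step h
        V = r[idx, a] + P[idx, a] @ V    # backward induction, all objectives at once
    return V[s0]

# Hypothetical toy MDP: action 0 stays in place, action 1 switches state.
P = np.zeros((2, 2, 2))
P[0, 0, 0] = P[1, 0, 1] = 1.0
P[0, 1, 1] = P[1, 1, 0] = 1.0

# Staying in state 0 yields reward 1 and cost 1; everything else yields nothing.
r = np.zeros((2, 2, 2))
r[0, 0] = [1.0, 1.0]

H = 3
always_stay = np.zeros((H, 2), dtype=int)
leave_early = np.array([[0, 0], [1, 0], [0, 0]])

ret_stay = evaluate_vector_return(P, r, always_stay, H)
ret_leave = evaluate_vector_return(P, r, leave_early, H)

budget = 2.0                             # assumed cost budget for illustration
feasible_stay = ret_stay[1] <= budget
feasible_leave = ret_leave[1] <= budget
```

Because the same `P` is reused for every coordinate of `r`, checking a new constraint only requires another backward pass over the model and no further interaction with the environment.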



On Reward-Free Reinforcement Learning with Linear Function Approximation

Reward-free reinforcement learning (RL) is a framework which is suitable...

Reinforcement Learning with Convex Constraints

In standard reinforcement learning (RL), a learning agent seeks to optim...

Efficient decorrelation of features using Gramian in Reinforcement Learning

Learning good representations is a long standing problem in reinforcemen...

On Reward-Free RL with Kernel and Neural Function Approximations: Single-Agent MDP and Markov Game

To achieve sample efficiency in reinforcement learning (RL), it necessit...

EAGER: Asking and Answering Questions for Automatic Reward Shaping in Language-guided RL

Reinforcement learning (RL) in long horizon and sparse reward tasks is n...

Variance-Based Rewards for Approximate Bayesian Reinforcement Learning

The explore-exploit dilemma is one of the central challenges in Reinforce...

Bilinear Classes: A Structural Framework for Provable Generalization in RL

This work introduces Bilinear Classes, a new structural framework, which...