A Policy Efficient Reduction Approach to Convex Constrained Deep Reinforcement Learning

08/29/2021
by Tianchi Cai, et al.

Although well established in general reinforcement learning (RL), value-based methods are rarely explored in constrained RL (CRL) because they cannot find policies that randomize among multiple actions. To apply value-based methods to CRL, a recent groundbreaking line of game-theoretic approaches uses a mixed policy, which randomizes among a set of carefully generated policies, to converge to the desired constraint-satisfying policy. However, these approaches require storing a large set of policies, which is not policy efficient and may incur prohibitive memory costs in constrained deep RL. To address this problem, we propose an alternative approach. Our approach first reformulates the CRL problem as an equivalent distance optimization problem. With a specially designed linear optimization oracle, we derive a meta-algorithm that solves it using any off-the-shelf RL algorithm and any conditional gradient (CG) type algorithm as subroutines. We then propose a new variant of the CG-type algorithm, which generalizes the minimum norm point (MNP) method. The proposed method matches the convergence rate of the existing game-theoretic approaches and achieves the worst-case optimal policy efficiency. Experiments on a navigation task show that our method reduces memory costs by an order of magnitude while achieving better performance, demonstrating both its effectiveness and efficiency.
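To make the reduction concrete, the following is a minimal sketch of the general idea: minimize the distance between the measurement vector of a mixed policy and a convex target set, using a conditional gradient loop that only calls a linear optimization oracle. It is not the paper's proposed MNP-style variant; it uses the vanilla Frank-Wolfe step size instead. Everything named here is an assumption for illustration: the target set is taken to be a box, the function names (`project_onto_box`, `linear_oracle`, `frank_wolfe_distance`) are hypothetical, and the oracle is a toy stand-in that picks from a fixed table of measurement vectors rather than running an RL algorithm.

```python
import numpy as np


def project_onto_box(x, lower, upper):
    """Euclidean projection onto the box [lower, upper] (assumed target set)."""
    return np.clip(x, lower, upper)


def linear_oracle(direction, candidates):
    """Toy stand-in for the RL subroutine (an assumption, not the paper's oracle).

    Returns the index of the candidate measurement vector that minimizes
    the inner product with `direction`.
    """
    return int(np.argmin(candidates @ direction))


def frank_wolfe_distance(candidates, lower, upper, num_iters=200, tol=1e-12):
    """Vanilla Frank-Wolfe on f(x) = 0.5 * dist(x, box)^2.

    `candidates[i]` is the measurement vector of the i-th base policy.
    Returns mixture weights over the candidates and the mixed measurement vector.
    """
    candidates = np.asarray(candidates, dtype=float)
    weights = np.zeros(candidates.shape[0])
    weights[0] = 1.0                      # start from the first base policy
    x = candidates[0].copy()
    for t in range(num_iters):
        # Gradient of 0.5 * dist^2 is the residual to the projection.
        grad = x - project_onto_box(x, lower, upper)
        if np.linalg.norm(grad) < tol:    # already inside the target set
            break
        idx = linear_oracle(grad, candidates)
        gamma = 2.0 / (t + 2.0)           # standard Frank-Wolfe step size
        weights *= 1.0 - gamma
        weights[idx] += gamma
        x = (1.0 - gamma) * x + gamma * candidates[idx]
    return weights, x
```

In this sketch the mixture weights play the role of the mixed policy. Vanilla Frank-Wolfe can keep adding candidates to the mixture as it iterates, which is the kind of storage growth that motivates a more policy-efficient CG variant.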


research
06/14/2020

Optimistic Distributionally Robust Policy Optimization

Trust Region Policy Optimization (TRPO) and Proximal Policy Optimization...
research
09/06/2023

Marketing Budget Allocation with Offline Constrained Deep Reinforcement Learning

We study the budget allocation problem in online marketing campaigns tha...
research
06/03/2021

Iterative Empirical Game Solving via Single Policy Best Response

Policy-Space Response Oracles (PSRO) is a general algorithmic framework ...
research
02/03/2023

Distributional constrained reinforcement learning for supply chain optimization

This work studies reinforcement learning (RL) in the context of multi-pe...
research
05/30/2019

Don't Forget Your Teacher: A Corrective Reinforcement Learning Framework

Although reinforcement learning (RL) can provide reliable solutions in m...
research
01/22/2018

Get Your Workload in Order: Game Theoretic Prioritization of Database Auditing

For enhancing the privacy protections of databases, where the increasing...
research
03/02/2021

NavTuner: Learning a Scene-Sensitive Family of Navigation Policies

The advent of deep learning has inspired research into end-to-end learni...
