Diverse Policy Optimization for Structured Action Space

02/23/2023
by Wenhao Li, et al.

Enhancing the diversity of policies is beneficial for robustness, exploration, and transfer in reinforcement learning (RL). In this paper, we seek diverse policies in an under-explored setting: RL tasks with structured action spaces, which have two properties, composability and local dependencies. The complex action structure, the non-uniform reward landscape, and the delicate hyperparameter tuning that these properties entail prevent existing approaches from scaling well. We propose a simple and effective RL method, Diverse Policy Optimization (DPO), which follows the probabilistic RL framework and models policies in structured action spaces as energy-based models (EBMs). GFlowNet, a recently proposed generative model, is introduced as an efficient and diverse sampler for the EBM-based policy. DPO follows a joint optimization framework: the outer layer uses the diverse policies sampled by the GFlowNet to update the EBM-based policy, which in turn supports GFlowNet training in the inner layer. Experiments on the ATSC and Battle benchmarks demonstrate that DPO can efficiently discover surprisingly diverse policies in challenging scenarios and substantially outperform existing state-of-the-art methods.
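To make the idea of an energy-based policy over a structured action space concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a tiny action space composed of three binary components, a hypothetical score function `toy_q` that rewards agreement between neighbouring components (a stand-in for local dependencies), and a temperature parameter; all of these names and values are assumptions made for illustration. The policy assigns each composite action a probability proportional to exp(Q / temperature), so several equally good actions keep equal probability instead of collapsing to one.

```python
import itertools
import math
import random


def energy_based_policy(q_values, temperature=1.0):
    """Map each action to a probability proportional to exp(Q / temperature)."""
    max_q = max(q_values.values())  # subtract the max for numerical stability
    weights = {a: math.exp((q - max_q) / temperature) for a, q in q_values.items()}
    total = sum(weights.values())
    return {a: w / total for a, w in weights.items()}


def toy_q(action):
    """Hypothetical score: count neighbouring components that agree."""
    return float(sum(1 for x, y in zip(action, action[1:]) if x == y))


# Composite actions built from three binary components (assumed toy setting).
actions = list(itertools.product([0, 1], repeat=3))
q_values = {a: toy_q(a) for a in actions}
policy = energy_based_policy(q_values, temperature=0.5)

# Draw a few composite actions from the energy-based policy.
rng = random.Random(0)
samples = rng.choices(list(policy), weights=list(policy.values()), k=5)
```

Exhaustive enumeration as used here is only feasible for toy problems, since the number of composite actions grows exponentially with the number of components. That scaling problem is the reason the abstract describes using a GFlowNet as the sampler rather than normalizing over all actions explicitly.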

research
07/25/2023

Counterfactual Explanation Policies in RL

As Reinforcement Learning (RL) agents are increasingly employed in diver...
research
04/08/2021

ACERAC: Efficient reinforcement learning in fine time discretization

We propose a framework for reinforcement learning (RL) in fine time disc...
research
04/18/2022

Training and Evaluation of Deep Policies using Reinforcement Learning and Generative Models

We present a data-efficient framework for solving sequential decision-ma...
research
01/03/2023

Safe Reinforcement Learning for an Energy-Efficient Driver Assistance System

Reinforcement learning (RL)-based driver assistance systems seek to impr...
research
05/30/2023

Generating Behaviorally Diverse Policies with Latent Diffusion Models

Recent progress in Quality Diversity Reinforcement Learning (QD-RL) has ...
research
11/30/2022

Policy Optimization over General State and Action Spaces

Reinforcement learning (RL) problems over general state and action space...
research
05/31/2019

Diversity-Inducing Policy Gradient: Using Maximum Mean Discrepancy to Find a Set of Diverse Policies

Standard reinforcement learning methods aim to master one way of solving...
