Policy Representation via Diffusion Probability Model for Reinforcement Learning

05/22/2023
by Long Yang, et al.

Popular reinforcement learning (RL) algorithms tend to produce a unimodal policy distribution, which limits the expressiveness of complex policies and weakens exploration. The diffusion probability model is a powerful tool for learning complicated multimodal distributions and has shown promising potential for applications to RL. In this paper, we formally build a theoretical foundation for policy representation via the diffusion probability model and provide a practical implementation of diffusion policy for online model-free RL. Concretely, we characterize the diffusion policy as a stochastic process, which is a new approach to representing a policy. We then present a convergence guarantee for diffusion policy, which provides a theory for understanding its multimodality. Furthermore, we propose DIPO, an implementation of model-free online RL with DIffusion POlicy. To the best of our knowledge, DIPO is the first algorithm to solve model-free online RL problems with the diffusion model. Finally, extensive empirical results show the effectiveness and superiority of DIPO on the standard continuous-control MuJoCo benchmark.
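To make the unimodal-versus-multimodal point concrete, the following is a minimal sketch and not the DIPO algorithm. It assumes a toy one-dimensional action space whose target distribution is an equal-weight mixture of two Gaussians, with a score function known in closed form, and draws actions by running a noisy iterative process (Langevin-style updates) from Gaussian noise. The function names, mode locations, step size, and step count are all illustrative assumptions; in a learned diffusion policy the score would instead come from a trained network conditioned on the state.

```python
import math
import random


def mixture_score(a, modes=(-1.0, 1.0), std=0.2):
    """Score d/da log p(a) of an equal-weight Gaussian mixture over actions.

    The modes and std are illustrative assumptions, not values from the paper.
    """
    logits = [-((a - m) ** 2) / (2 * std ** 2) for m in modes]
    top = max(logits)  # subtract the max for numerical stability
    weights = [math.exp(l - top) for l in logits]
    total = sum(weights)
    # Responsibility-weighted pull toward each mode.
    return sum(w / total * (m - a) / std ** 2 for w, m in zip(weights, modes))


def sample_action(rng, steps=200, step_size=0.005):
    """Start from Gaussian noise and apply Langevin-style updates."""
    a = rng.gauss(0.0, 1.0)
    for _ in range(steps):
        noise = rng.gauss(0.0, 1.0)
        a = a + step_size * mixture_score(a) + math.sqrt(2 * step_size) * noise
    return a


def sample_actions(n, seed=0):
    """Draw n independent actions with a fixed seed for reproducibility."""
    rng = random.Random(seed)
    return [sample_action(rng) for _ in range(n)]


if __name__ == "__main__":
    actions = sample_actions(500)
    mean = sum(actions) / len(actions)
    frac_positive = sum(a > 0 for a in actions) / len(actions)
    # A single Gaussian fitted to these samples would centre on the mean,
    # which lies between the two modes where few sampled actions fall.
    print(f"sample mean: {mean:.3f}")
    print(f"fraction near the positive mode: {frac_positive:.3f}")
```

In this toy setting the sampled actions split between the two modes, whereas a single Gaussian fitted to the same samples would place its mean between them, in a region the sampler rarely visits.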

research
11/30/2021

Model-Free μ Synthesis via Adversarial Reinforcement Learning

Motivated by the recent empirical success of policy-based reinforcement ...
research
08/12/2022

Diffusion Policies as an Expressive Policy Class for Offline Reinforcement Learning

Offline reinforcement learning (RL), which aims to learn an optimal poli...
research
04/26/2023

Reinforcement Learning with Partial Parametric Model Knowledge

We adapt reinforcement learning (RL) methods for continuous control to b...
research
03/04/2021

Conservative Optimistic Policy Optimization via Multiple Importance Sampling

Reinforcement Learning (RL) has been able to solve hard problems such as...
research
02/23/2023

To the Noise and Back: Diffusion for Shared Autonomy

Shared autonomy is an operational concept in which a user and an autonom...
research
10/10/2018

The Laplacian in RL: Learning Representations with Efficient Approximations

The smallest eigenvectors of the graph Laplacian are well-known to provi...
research
12/11/2018

Efficient Model-Free Reinforcement Learning Using Gaussian Process

Efficient Reinforcement Learning usually takes advantage of demonstratio...
