Evolutionary Stochastic Policy Distillation

04/27/2020
by   Hao Sun, et al.
9

Solving the Goal-Conditioned Reward Sparse (GCRS) task is a challenging reinforcement learning problem due to the sparsity of reward signals. In this work, we propose a new formulation of GCRS tasks from the perspective of the drifted random walk on the state space, and design a novel method called Evolutionary Stochastic Policy Distillation (ESPD) to solve them based on the insight of reducing the First Hitting Time of the stochastic process. As a self-imitate approach, ESPD enables a target policy to learn from a series of its stochastic variants through the technique of policy distillation (PD). The learning mechanism of ESPD can be considered as an Evolution Strategy (ES) that applies perturbations upon policy directly on the action space, with a SELECT function to check the superiority of stochastic variants and then use PD to update the policy. The experiments based on the MuJoCo robotics control suite show the high learning efficiency of the proposed method.

READ FULL TEXT
research
08/28/2022

Goal-Conditioned Q-Learning as Knowledge Distillation

Many applications of reinforcement learning can be formalized as goal-co...
research
05/24/2023

Inverse Reinforcement Learning with the Average Reward Criterion

We study the problem of Inverse Reinforcement Learning (IRL) with an ave...
research
10/23/2018

Hierarchical Approaches for Reinforcement Learning in Parameterized Action Space

We explore Deep Reinforcement Learning in a parameterized action space. ...
research
10/30/2019

Policy Continuation with Hindsight Inverse Dynamics

Solving goal-oriented tasks is an important but challenging problem in r...
research
08/16/2021

Neural-to-Tree Policy Distillation with Policy Improvement Criterion

While deep reinforcement learning has achieved promising results in chal...
research
06/18/2021

Learning to Plan via a Multi-Step Policy Regression Method

We propose a new approach to increase inference performance in environme...
research
05/17/2021

Learning to Win, Lose and Cooperate through Reward Signal Evolution

Solving a reinforcement learning problem typically involves correctly pr...

Please sign up or login with your details

Forgot password? Click here to reset