Asynchronous Episodic Deep Deterministic Policy Gradient: Towards Continuous Control in Computationally Complex Environments

03/03/2019
by Zhizheng Zhang, et al.
Deep Deterministic Policy Gradient (DDPG) has proven to be a successful reinforcement learning (RL) algorithm for continuous control tasks. However, DDPG still suffers from data insufficiency and training inefficiency, especially in computationally complex environments. In this paper, we propose Asynchronous Episodic DDPG (AE-DDPG), an extension of DDPG that achieves more effective learning in less training time. First, we design a modified scheme for collecting data asynchronously. For asynchronous RL algorithms in general, sample efficiency and/or training stability diminish as the degree of parallelism increases. We address this problem from the perspectives of both data generation and data utilization. Specifically, we redesign experience replay by introducing the idea of episodic control so that the agent can latch onto good trajectories rapidly. In addition, we inject a new type of noise into the action space to enrich exploration behavior. Experiments demonstrate that AE-DDPG achieves higher rewards and consumes less time than most popular RL algorithms on the Learning to Run task, which has a computationally complex environment. Beyond control tasks in computationally complex environments, AE-DDPG also achieves higher rewards and a 2- to 4-fold improvement in sample efficiency on average compared with other variants of DDPG in MuJoCo environments. Furthermore, we verify the effectiveness of each proposed component through extensive ablation studies.
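
To make the idea of an episodic experience replay more concrete, here is a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the mechanism, so the class name `EpisodicReplaySketch`, the rule "an episode is high-return if it matches or beats the best return seen so far", and the `elite_fraction` mixing parameter are all assumptions made for illustration.

```python
import random
from collections import deque


class EpisodicReplaySketch:
    """Hypothetical replay buffer with a separate pool for high-return episodes.

    Assumption: an episode counts as "good" when its undiscounted return is at
    least the best return observed so far. The paper may use a different rule.
    """

    def __init__(self, capacity=10000, elite_fraction=0.5, seed=0):
        self.regular = deque(maxlen=capacity)  # every transition seen
        self.elite = deque(maxlen=capacity)    # transitions from good episodes
        self.elite_fraction = elite_fraction   # share of each batch drawn from elite
        self.best_return = float("-inf")
        self._episode = []                     # transitions of the episode in progress
        self._rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        transition = (state, action, reward, next_state, done)
        self.regular.append(transition)
        self._episode.append(transition)
        if done:
            self._finish_episode()

    def _finish_episode(self):
        # Index 2 of each transition tuple is the reward.
        episode_return = sum(t[2] for t in self._episode)
        if episode_return >= self.best_return:
            self.best_return = episode_return
            self.elite.extend(self._episode)
        self._episode = []

    def sample(self, batch_size):
        # Draw part of the batch from good episodes and the rest from all data.
        n_elite = min(int(batch_size * self.elite_fraction), len(self.elite))
        n_regular = min(batch_size - n_elite, len(self.regular))
        batch = self._rng.sample(list(self.elite), n_elite)
        batch += self._rng.sample(list(self.regular), n_regular)
        return batch
```

In a sketch like this, asynchronous workers would call `add` as they step their environments, and the learner would call `sample` to build each training batch, so that transitions from well-performing trajectories are revisited more often than uniform sampling alone would allow.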

research
12/10/2020

An Efficient Asynchronous Method for Integrating Evolutionary and Gradient-based Policy Search

Deep reinforcement learning (DRL) algorithms and evolution strategies (E...
research
11/12/2021

AWD3: Dynamic Reduction of the Estimation Bias

Value-based deep Reinforcement Learning (RL) algorithms suffer from the ...
research
12/10/2021

Edge-Compatible Reinforcement Learning for Recommendations

Most reinforcement learning (RL) recommendation systems designed for edg...
research
07/20/2023

Exploring reinforcement learning techniques for discrete and continuous control tasks in the MuJoCo environment

We leverage the fast physics simulator, MuJoCo to run tasks in a continu...
research
07/27/2020

Self-Adapting Recurrent Models for Object Pushing from Learning in Simulation

Planar pushing remains a challenging research topic, where building the ...
research
12/17/2020

High-Throughput Synchronous Deep RL

Deep reinforcement learning (RL) is computationally demanding and requir...
research
02/17/2020

Adaptive Experience Selection for Policy Gradient

Policy gradient reinforcement learning (RL) algorithms have achieved imp...
