Learning Memory-Dependent Continuous Control from Demonstrations

02/18/2021
by Siqing Hou, et al.

Efficient exploration has been a long-standing challenge in reinforcement learning, especially when rewards are sparse. A developmental system can overcome this difficulty by learning from both demonstrations and self-exploration. However, existing methods are not applicable to most real-world robotic control problems because they assume that environments follow Markov decision processes (MDPs) and thus do not extend to partially observable environments, where historical observations are necessary for decision making. This paper builds on the idea of replaying demonstrations for memory-dependent continuous control by proposing a novel algorithm, Recurrent Actor-Critic with Demonstration and Experience Replay (READER). Experiments on several memory-crucial continuous control tasks show that our method significantly reduces the number of interactions with the environment while requiring only a reasonably small number of demonstration samples. The algorithm also shows better sample efficiency and learning capability than a baseline reinforcement learning algorithm for memory-based control from demonstrations.
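The core idea the abstract describes, replaying demonstrations alongside the agent's own experience for a recurrent (memory-based) policy, can be sketched as a replay buffer that stores whole episodes (so a recurrent network can be unrolled over history) and draws a fixed fraction of each batch from demonstrations. This is a minimal illustrative sketch, not the paper's actual implementation; the class name, the `demo_ratio` parameter, and the episode format are all assumptions for illustration.

```python
import random
from collections import deque

class MixedReplayBuffer:
    """Illustrative buffer mixing demonstration and self-exploration data.

    Whole episodes are stored rather than single transitions, so that a
    recurrent actor-critic can replay the observation history it needs in
    a partially observable environment. `demo_ratio` (a hypothetical
    parameter) fixes the fraction of each batch taken from demonstrations.
    """

    def __init__(self, capacity, demo_ratio=0.25, seed=0):
        self.demos = []                           # fixed demonstration episodes
        self.experience = deque(maxlen=capacity)  # agent's own rollouts
        self.demo_ratio = demo_ratio
        self.rng = random.Random(seed)

    def add_demo(self, episode):
        # episode: list of (obs, action, reward, next_obs, done) tuples
        self.demos.append(episode)

    def add_experience(self, episode):
        self.experience.append(episode)

    def sample(self, batch_size):
        # Draw a fixed share of episodes from demonstrations, the rest
        # from self-collected experience.
        n_demo = int(batch_size * self.demo_ratio)
        batch = [self.rng.choice(self.demos) for _ in range(n_demo)]
        batch += [self.rng.choice(list(self.experience))
                  for _ in range(batch_size - n_demo)]
        return batch
```

In an actual recurrent actor-critic, each sampled episode would be unrolled through the recurrent layers to rebuild the hidden state before computing policy and value losses; the sketch above only covers the data-mixing side of the idea.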

