Mixing Human Demonstrations with Self-Exploration in Experience Replay for Deep Reinforcement Learning

07/14/2021
by Dylan Klein, et al.

We investigate the effect of including human demonstration data in the replay buffer for Deep Reinforcement Learning. We use a policy gradient method with a modified experience replay buffer in which a human demonstration experience is sampled with a given probability. We analyze different ratios of demonstration data in a task where an agent attempts to reach a goal while avoiding obstacles. Our results suggest that while agents trained by pure self-exploration and pure demonstration achieved similar success rates, the pure demonstration model converged faster to solutions requiring fewer steps.
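The core mechanism described above can be sketched as a replay buffer that draws each sampled transition from a fixed pool of human demonstrations with some probability, and from the agent's own exploration otherwise. The class and parameter names below are illustrative assumptions, not the authors' implementation:

```python
import random

class MixedReplayBuffer:
    """Hypothetical sketch: a replay buffer that samples human demonstration
    transitions with probability demo_prob, and self-exploration transitions
    otherwise. Not the paper's actual implementation."""

    def __init__(self, demo_data, demo_prob=0.25, capacity=10000):
        self.demos = list(demo_data)   # fixed pool of human demonstration transitions
        self.self_play = []            # transitions collected by the agent itself
        self.demo_prob = demo_prob
        self.capacity = capacity

    def add(self, transition):
        # Only self-exploration experience is appended; the demo pool stays fixed.
        self.self_play.append(transition)
        if len(self.self_play) > self.capacity:
            self.self_play.pop(0)      # drop the oldest transition (FIFO)

    def sample(self, batch_size):
        batch = []
        for _ in range(batch_size):
            # Fall back to demos while the self-play buffer is still empty.
            if self.demos and (not self.self_play
                               or random.random() < self.demo_prob):
                batch.append(random.choice(self.demos))
            else:
                batch.append(random.choice(self.self_play))
        return batch
```

Setting `demo_prob` to 0 recovers pure self-exploration, while 1 recovers pure demonstration, which corresponds to the ratios compared in the study.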
