
Backward Curriculum Reinforcement Learning

by KyungMin Ko, et al.

Current reinforcement learning algorithms train the agent on forward-generated trajectories. These trajectories give the agent little guidance, so it explores as widely as possible. Although much of the strength of reinforcement learning comes from sufficient exploration, this comes at the cost of sample efficiency, which is an important factor in an algorithm's performance. Prior work has improved sample efficiency through reward shaping and changes to the network structure, but these methods require many steps to implement. In this work, we propose a novel reverse curriculum reinforcement learning method. Reverse curriculum learning starts training the agent on the backward trajectory of the episode rather than the original forward trajectory. This gives the agent a strong reward signal, so it can learn in a more sample-efficient manner. Moreover, our method requires only a minor change to the algorithm: reversing the order of the trajectory before training the agent. It can therefore be applied easily to any state-of-the-art algorithm.
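The trajectory reversal described in the abstract could look roughly like the following sketch. It is not the authors' implementation: the `Transition` type, the function names, the discount factor `gamma`, and the choice to pair each step with its discounted return are all assumptions made for illustration. The only idea taken from the abstract is that the collected episode is reversed before it is handed to the learner, so the steps closest to the final reward are seen first.

```python
from typing import List, NamedTuple, Tuple


class Transition(NamedTuple):
    """One step of an episode (hypothetical container, not from the paper)."""
    state: int
    action: int
    reward: float


def discounted_returns(rewards: List[float], gamma: float) -> List[float]:
    """Compute the discounted return G_t for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk from the last step to the first so each return reuses the next one.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def backward_ordered_batch(
    episode: List[Transition], gamma: float = 0.99
) -> List[Tuple[Transition, float]]:
    """Pair each transition with its return, then reverse the episode order.

    Returns are computed on the original forward episode; only the order in
    which the samples are presented to the learner is reversed.
    """
    returns = discounted_returns([tr.reward for tr in episode], gamma)
    paired = list(zip(episode, returns))
    paired.reverse()  # last step of the episode comes first
    return paired


if __name__ == "__main__":
    # Toy episode with a single sparse reward at the end (made-up data).
    episode = [
        Transition(state=0, action=1, reward=0.0),
        Transition(state=1, action=1, reward=0.0),
        Transition(state=2, action=0, reward=1.0),
    ]
    for transition, g in backward_ordered_batch(episode, gamma=0.5):
        print(transition.state, g)
```

In this sketch the learner's update rule is left untouched; a policy-gradient or value-based update would simply consume the list in the order returned. Whether the returns are computed before or after reversal, and how the reversed data is batched, are design choices the abstract does not specify.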




Backward Imitation and Forward Reinforcement Learning via Bi-directional Model Rollouts

Traditional model-based reinforcement learning (RL) methods generate for...

Reverse Curriculum Generation for Reinforcement Learning

Many relevant tasks require an agent to reach a certain state, or to man...

Trajectory-based Learning for Ball-in-Maze Games

Deep Reinforcement Learning has shown tremendous success in solving seve...

Solving Sokoban with backward reinforcement learning

In some puzzles, the strategy we need to use near the goal can be quite ...

Robust Reinforcement Learning via Genetic Curriculum

Achieving robust performance is crucial when applying deep reinforcement...

Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update

We propose Episodic Backward Update - a new algorithm to boost the perfo...

Self-Paced Contextual Reinforcement Learning

Generalization and adaptation of learned skills to novel situations is a...