A Probabilistic Interpretation of Self-Paced Learning with Applications to Reinforcement Learning

by Pascal Klink, et al.

Across machine learning, the use of curricula has shown strong empirical potential to improve learning from data by avoiding local optima of training objectives. For reinforcement learning (RL), curricula are especially interesting, as the underlying optimization has a strong tendency to get stuck in local optima due to the exploration-exploitation trade-off. Recently, a number of approaches for automatically generating curricula for RL have been shown to increase performance while requiring less expert knowledge than manually designed curricula. However, these approaches are seldom investigated from a theoretical perspective, which prevents a deeper understanding of their mechanics. In this paper, we present an approach for automated curriculum generation in RL with a clear theoretical underpinning. More precisely, we formalize the well-known self-paced learning paradigm as inducing a distribution over training tasks, which trades off between task complexity and the objective to match a desired task distribution. Experiments show that training on this induced distribution helps to avoid poor local optima across RL algorithms in different tasks with uninformative rewards and challenging exploration requirements.




Self-Paced Deep Reinforcement Learning

Generalization and reuse of agent behaviour across a variety of learning...

Parallelized Reverse Curriculum Generation

For reinforcement learning (RL), it is challenging for an agent to maste...

Unsupervised Curricula for Visual Meta-Reinforcement Learning

In principle, meta-reinforcement learning algorithms leverage experience...

PBCS: Efficient Exploration and Exploitation Using a Synergy between Reinforcement Learning and Motion Planning

The exploration-exploitation trade-off is at the heart of reinforcement ...

Making Sense of Reinforcement Learning and Probabilistic Inference

Reinforcement learning (RL) combines a control problem with statistical ...

Distribution-Free One-Pass Learning

In many large-scale machine learning applications, data are accumulated ...

Nested-Wasserstein Self-Imitation Learning for Sequence Generation

Reinforcement learning (RL) has been widely studied for improving sequen...