Bottom-Up Meta-Policy Search

10/22/2019
by Luckeciano C. Melo, et al.

Despite recent progress in agents that learn through interaction, challenges remain in sample efficiency and in generalization to behaviors unseen during training. To mitigate these problems, we propose and apply a first-order meta-learning algorithm called Bottom-Up Meta-Policy Search (BUMPS), which uses a two-phase optimization procedure. First, in a meta-training phase, it distills a few expert policies into a meta-policy capable of generalizing to tasks unseen during training. Second, it applies a fast adaptation strategy named Policy Filtering, which evaluates a few policies sampled from the meta-policy distribution and selects the one that best solves the task. We conducted all experiments in the RoboCup 3D Soccer Simulation domain, in the context of kick motion learning. We show that, given our experimental setup, BUMPS works in scenarios where simple multi-task Reinforcement Learning does not. Finally, we performed experiments to evaluate each component of the algorithm.
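The Policy Filtering step described in the abstract (sample a few policies from the meta-policy distribution, evaluate each on the task, keep the best) can be illustrated with a minimal sketch. It is not the authors' implementation. The Gaussian meta-policy over a flat parameter vector, the names `make_gaussian_meta_policy` and `policy_filtering`, and the toy return function in the usage note are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical representation: a policy is a flat list of parameters.
Policy = List[float]


def make_gaussian_meta_policy(
    mean: Sequence[float], std: float, rng: random.Random
) -> Callable[[], Policy]:
    """Return a sampler that draws policies around `mean`.

    Assumption: the meta-policy is an isotropic Gaussian over parameters.
    The abstract only says policies are sampled from a distribution.
    """
    def sample() -> Policy:
        return [rng.gauss(m, std) for m in mean]

    return sample


def policy_filtering(
    sample_policy: Callable[[], Policy],
    evaluate: Callable[[Policy], float],
    num_candidates: int = 5,
) -> Policy:
    """Sample a few candidate policies and return the highest-scoring one."""
    if num_candidates < 1:
        raise ValueError("num_candidates must be at least 1")
    candidates = [sample_policy() for _ in range(num_candidates)]
    # Each candidate is evaluated exactly once, since rollouts are costly.
    returns = [evaluate(candidate) for candidate in candidates]
    best_index = max(range(num_candidates), key=returns.__getitem__)
    return candidates[best_index]
```

As a usage example under the same assumptions, `policy_filtering(make_gaussian_meta_policy([0.0, 0.0], 1.0, random.Random(0)), lambda p: -sum(x * x for x in p))` returns whichever of the five sampled parameter vectors lies closest to the origin. In a real setting `evaluate` would run task rollouts in the simulator and return the measured task performance.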

research
09/30/2019

Meta-Q-Learning

This paper introduces Meta-Q-Learning (MQL), a new off-policy algorithm ...
research
09/07/2018

Learning Invariances for Policy Generalization

While recent progress has spawned very powerful machine learning systems...
research
05/20/2023

On First-Order Meta-Reinforcement Learning with Moreau Envelopes

Meta-Reinforcement Learning (MRL) is a promising framework for training ...
research
04/05/2022

Model Based Meta Learning of Critics for Policy Gradients

Being able to seamlessly generalize across different tasks is fundamenta...
research
04/29/2023

Meta-Reinforcement Learning Based on Self-Supervised Task Representation Learning

Meta-reinforcement learning enables artificial agents to learn from rela...
research
10/16/2018

ProMP: Proximal Meta-Policy Search

Credit assignment in Meta-reinforcement learning (Meta-RL) is still poor...
research
11/04/2017

Composing Meta-Policies for Autonomous Driving Using Hierarchical Deep Reinforcement Learning

Rather than learning new control policies for each new task, it is possi...
