NoRML: No-Reward Meta Learning

03/04/2019
by Yuxiang Yang, et al.

Efficiently adapting to new environments and changes in dynamics is critical for agents to operate successfully in the real world. Approaches based on reinforcement learning (RL) typically rely on external reward feedback for adaptation. However, in many scenarios this reward signal might not be readily available for the target task, or the difference between the environments can be implicit and only observable from the dynamics. To this end, we introduce a method that allows for self-adaptation of learned policies: No-Reward Meta Learning (NoRML). NoRML extends Model-Agnostic Meta-Learning (MAML) for RL and uses the observable dynamics of the environment instead of an explicit reward function in MAML's fine-tuning step. Our method has a more expressive update step than MAML, while maintaining MAML's gradient-based foundation. Additionally, to allow more targeted exploration, we implement an extension to MAML that effectively disconnects the meta-policy parameters from the fine-tuned policies' parameters. We first study our method on a number of synthetic control problems and then validate it on common benchmark environments, showing that NoRML outperforms MAML when the dynamics change between tasks.
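
As a rough illustration of the fine-tuning step described above, the following is a minimal sketch and not the authors' implementation. It assumes a scalar Gaussian policy, a toy linear environment whose hidden `gain` changes between tasks, a learned advantage that is linear in the transition `(s, a, s_next)`, and a learned parameter `offset` standing in for the decoupling of the meta-policy from the fine-tuned policy. All function names, the dynamics, and the numeric values are hypothetical, and the outer meta-training loop that would learn `theta`, `offset`, `alpha`, and `psi` is omitted.

```python
import random

SIGMA = 1.0  # fixed exploration noise of the toy Gaussian policy (assumption)


def rollout(theta, gain, steps, rng):
    """Collect (s, a, s_next) transitions; no reward is ever recorded."""
    s, transitions = 1.0, []
    for _ in range(steps):
        a = theta * s + rng.gauss(0.0, SIGMA)  # a ~ N(theta * s, SIGMA^2)
        s_next = 0.5 * s + gain * a            # toy dynamics; gain is the hidden task variable
        transitions.append((s, a, s_next))
        s = s_next
    return transitions


def grad_log_pi(theta, s, a):
    """Derivative with respect to theta of log N(a; theta * s, SIGMA^2)."""
    return (a - theta * s) * s / SIGMA ** 2


def learned_advantage(psi, s, a, s_next):
    """Hypothetical learned advantage, linear in the observed transition."""
    return psi[0] * s + psi[1] * a + psi[2] * s_next


def adapt(theta, offset, alpha, psi, transitions):
    """One reward-free policy-gradient step, starting from theta + offset."""
    grad = sum(
        learned_advantage(psi, s, a, s_next) * grad_log_pi(theta, s, a)
        for s, a, s_next in transitions
    ) / len(transitions)
    return theta + offset + alpha * grad


# Illustrative meta-parameters; in practice these would be meta-learned.
meta = dict(theta=0.5, offset=0.1, alpha=0.1, psi=(0.0, 0.0, 1.0))
for gain in (1.0, -1.0):
    data = rollout(meta["theta"], gain, steps=20, rng=random.Random(0))
    adapted = adapt(meta["theta"], meta["offset"], meta["alpha"], meta["psi"], data)
    print(f"gain={gain:+.1f} -> adapted theta={adapted:.3f}")
```

In this sketch the exploration data are gathered with `theta` alone, while the adapted parameter starts from `theta + offset`. The only task-specific signal entering `adapt` is the observed next state, which is how a change in dynamics can steer the update without any reward.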

research
03/02/2020

Rapidly Adaptable Legged Robots via Evolutionary Meta-Learning

Learning adaptable policies is crucial for robots to operate autonomousl...
research
11/19/2019

MANGA: Method Agnostic Neural-policy Generalization and Adaptation

In this paper we target the problem of transferring policies across mult...
research
07/06/2020

Meta-Learning through Hebbian Plasticity in Random Networks

Lifelong learning and adaptability are two defining aspects of biologica...
research
10/09/2020

Characterizing Policy Divergence for Personalized Meta-Reinforcement Learning

Despite ample motivation from costly exploration and limited trajectory ...
research
01/24/2019

SAM: A Modular Framework for Self-Adapting Web Menus

This paper presents SAM, a modular and extensible JavaScript framework f...
research
10/26/2022

Uncertainty-based Meta-Reinforcement Learning for Robust Radar Tracking

Nowadays, Deep Learning (DL) methods often overcome the limitations of t...
research
04/14/2023

Learning to Learn Group Alignment: A Self-Tuning Credo Framework with Multiagent Teams

Mixed incentives among a population with multiagent teams has been shown...
