Learning a subspace of policies for online adaptation in Reinforcement Learning

10/11/2021
by Jean-Baptiste Gaya, et al.

Deep Reinforcement Learning (RL) is mainly studied in a setting where the training and testing environments are similar. In many practical applications, however, these environments may differ. For instance, in control systems, the robot(s) on which a policy is learned might differ from the robot(s) on which the policy will run. Such differences can be caused by internal factors (e.g., calibration issues, system attrition, defective modules) or by external changes (e.g., weather conditions). There is therefore a need to develop RL methods that generalize well to variations of the training conditions. In this article, we consider the simplest, yet hard to tackle, generalization setting, where the test environment is unknown at training time, forcing the agent to adapt to the system's new dynamics. This online adaptation process can be computationally expensive (e.g., fine-tuning) and cannot rely on meta-RL techniques, since there is just a single training environment. To address this, we propose an approach where we learn a subspace of policies within the parameter space. This subspace contains an infinite number of policies that are all trained to solve the training environment while having different parameter values. As a consequence, two policies in that subspace process information differently and exhibit different behaviors when facing variations of the training environment. Our experiments, carried out over a large variety of benchmarks, compare our approach with baselines, including diversity-based methods. In comparison, our approach is simple to tune, does not need any extra component (e.g., a discriminator), and learns policies able to gather a high reward on unseen environments.
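
As a rough illustration of what selecting a policy from a subspace of parameters could look like, here is a minimal sketch. It rests on several assumptions that are not stated in the abstract: the subspace is taken to be a line segment between two anchor parameter vectors, adaptation is done by evaluating a few evenly spaced points on that segment, and `toy_return` is a hypothetical stand-in for the average reward collected from rollouts in the unseen test environment. All names are illustrative and not taken from the authors' code.

```python
def interpolate(theta_a, theta_b, alpha):
    """Return the parameter vector at position alpha in [0, 1] on the
    segment between two anchors (assumed shape of the subspace)."""
    return [(1.0 - alpha) * a + alpha * b for a, b in zip(theta_a, theta_b)]


def select_policy(theta_a, theta_b, evaluate, num_candidates=5):
    """Evaluate evenly spaced points of the segment with `evaluate`
    and return (alpha, parameters, return) of the best candidate."""
    if num_candidates < 2:
        raise ValueError("num_candidates must be at least 2")
    best_alpha, best_theta, best_return = None, None, float("-inf")
    for i in range(num_candidates):
        alpha = i / (num_candidates - 1)
        theta = interpolate(theta_a, theta_b, alpha)
        episode_return = evaluate(theta)
        if episode_return > best_return:
            best_alpha, best_theta, best_return = alpha, theta, episode_return
    return best_alpha, best_theta, best_return


def toy_return(theta):
    """Hypothetical test-environment score: highest at theta = [0.5, 0.5].
    A real setup would run episodes with the policy defined by theta."""
    return -sum((value - 0.5) ** 2 for value in theta)


# Two hypothetical anchor parameter vectors learned at training time.
anchor_a = [0.0, 1.0]
anchor_b = [1.0, 0.0]

best_alpha, best_theta, best_return = select_policy(anchor_a, anchor_b, toy_return)
print(best_alpha, best_theta, best_return)
```

The point of the sketch is only that adaptation here requires no gradient step: it costs a handful of evaluations, one per candidate policy, rather than fine-tuning on the new environment.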


research
10/06/2022

Self-Adaptive Driving in Nonstationary Environments through Conjectural Online Lookahead Adaptation

Powered by deep representation learning, reinforcement learning (RL) pro...
research
06/23/2023

Comparing the Efficacy of Fine-Tuning and Meta-Learning for Few-Shot Policy Imitation

In this paper we explore few-shot imitation learning for control problem...
research
10/27/2020

One Solution is Not All You Need: Few-Shot Extrapolation via Structured MaxEnt RL

While reinforcement learning algorithms can learn effective policies for...
research
06/26/2022

Improving Policy Optimization with Generalist-Specialist Learning

Generalization in deep reinforcement learning over unseen environment va...
research
04/21/2020

Efficient Adaptation for End-to-End Vision-Based Robotic Manipulation

One of the great promises of robot learning systems is that they will be...
research
09/24/2022

Open-Ended Diverse Solution Discovery with Regulated Behavior Patterns for Cross-Domain Adaptation

While Reinforcement Learning can achieve impressive results for complex ...
research
10/17/2017

Map-based Multi-Policy Reinforcement Learning: Enhancing Adaptability of Robots by Deep Reinforcement Learning

In order for robots to perform mission-critical tasks, it is essential t...
