Stepsize Learning for Policy Gradient Methods in Contextual Markov Decision Processes

06/13/2023
by Luca Sabbioni, et al.

Policy-based algorithms are among the most widely adopted techniques in model-free RL, thanks to their strong theoretical grounding and good properties in continuous action spaces. Unfortunately, these methods require precise, problem-specific hyperparameter tuning to achieve good performance, and they tend to struggle when asked to accomplish a series of heterogeneous tasks. In particular, the choice of step size has a crucial impact on their ability to learn a high-performing policy: it affects both the speed and the stability of training, and it is often the main culprit for poor results. In this paper, we tackle these issues with a Meta Reinforcement Learning approach, introducing a new formulation, the meta-MDP, that can be used to solve any hyperparameter selection problem in RL with contextual processes. After providing a theoretical Lipschitz bound on the difference in performance across tasks, we adopt the proposed framework to train a batch RL algorithm that dynamically recommends the most adequate step size for different policies and tasks. Finally, we present an experimental campaign showing the advantages of selecting an adaptive learning rate in heterogeneous environments.
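
To make the step-size sensitivity concrete, the following is a minimal toy sketch and not the method proposed in the paper. It assumes a one-dimensional quadratic objective J(theta) = -0.5 * c * (theta - target)^2, where the curvature c stands in for the task context; the function names, the two contexts, and the candidate step sizes are all hypothetical choices made for illustration.

```python
def gradient_ascent(curvature, step_size, theta0=0.0, target=1.0, n_steps=50):
    """Plain gradient ascent on J(theta) = -0.5 * curvature * (theta - target)**2.

    `curvature` plays the role of the task context (an assumption of this toy).
    """
    theta = theta0
    for _ in range(n_steps):
        grad = -curvature * (theta - target)  # dJ/dtheta
        theta = theta + step_size * grad      # fixed step-size update
    return theta


def final_error(curvature, step_size, target=1.0):
    """Distance from the optimum after a fixed budget of updates."""
    return abs(gradient_ascent(curvature, step_size, target=target) - target)


if __name__ == "__main__":
    # Two hypothetical tasks that differ only in curvature.
    contexts = {"flat task": 0.5, "sharp task": 5.0}
    for name, curvature in contexts.items():
        for step_size in (0.1, 0.5):
            err = final_error(curvature, step_size)
            print(f"{name:10s} step_size={step_size:.1f} final_error={err:.3e}")
```

In this toy, each update multiplies the error by |1 - step_size * c|, so the larger step size converges fastest on the flat task but diverges on the sharp one, while the smaller step size is stable on both but slow on the flat task. No single fixed value suits both contexts, which is the kind of situation a context-dependent step-size choice is meant to address.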


