PC-MLP: Model-based Reinforcement Learning with Policy Cover Guided Exploration

07/15/2021
by Yuda Song, et al.

Model-based Reinforcement Learning (RL) is a popular learning paradigm due to its potential sample efficiency compared to model-free RL. However, existing empirical model-based RL approaches lack the ability to explore. This work studies a computationally and statistically efficient model-based algorithm for both Kernelized Nonlinear Regulators (KNR) and linear Markov Decision Processes (MDPs). For both models, our algorithm guarantees polynomial sample complexity and requires only access to a planning oracle. Experimentally, we first demonstrate the flexibility and efficacy of our algorithm on a set of exploration-challenging control tasks where existing empirical model-based RL approaches completely fail. We then show that our approach retains excellent performance even on common dense-reward control benchmarks that do not require heavy exploration. Finally, we demonstrate that our method can also perform reward-free exploration efficiently. Our code can be found at https://github.com/yudasong/PCMLP.
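
The abstract does not spell out the mechanics, so here is a rough illustration of the two ingredients it names. The sketch assumes the KNR form s' = W φ(s, a) + noise with a known feature map φ. Under that assumption, a model-based learner can fit W by ridge regression. It can then encourage exploration with an elliptical bonus that is large for feature directions the policy cover (the collection of previously learned policies) has rarely visited. A planning oracle would then optimize reward plus bonus under the fitted model. This is a minimal numpy sketch, not the authors' implementation (that lives in the linked repository). The names fit_knr, cover_covariance and exploration_bonus, and the constants lam, beta and cap, are hypothetical.

```python
import numpy as np

def fit_knr(phi, next_states, lam=1e-3):
    # Ridge regression for W in the assumed KNR model s' = W phi(s, a) + noise.
    # phi: (n, d) features, next_states: (n, ds); returns W_hat with shape (ds, d).
    d = phi.shape[1]
    gram = phi.T @ phi + lam * np.eye(d)
    return np.linalg.solve(gram, phi.T @ next_states).T

def cover_covariance(cover_phi, lam=1.0):
    # Regularized covariance of features visited by the policy cover (all past policies).
    d = cover_phi.shape[1]
    return cover_phi.T @ cover_phi + lam * np.eye(d)

def exploration_bonus(phi, sigma, beta=1.0, cap=1.0):
    # Elliptical bonus beta * ||phi||_{sigma^{-1}}, clipped at cap (hypothetical constants).
    sigma_inv = np.linalg.inv(sigma)
    widths = np.sqrt(np.einsum("nd,de,ne->n", phi, sigma_inv, phi))
    return np.minimum(cap, beta * widths)

# Usage: recover a random linear model from noisy transitions.
rng = np.random.default_rng(0)
d, ds, n = 4, 2, 500
W_true = rng.normal(size=(ds, d))
phi = rng.normal(size=(n, d))
next_states = phi @ W_true.T + 0.01 * rng.normal(size=(n, ds))
W_hat = fit_knr(phi, next_states)
print(np.abs(W_hat - W_true).max())  # small estimation error

# Cover data that only ever excites the first feature direction.
cover_phi = np.zeros((100, d))
cover_phi[:, 0] = 1.0
sigma = cover_covariance(cover_phi)
seen, unseen = np.eye(d)[0], np.eye(d)[1]
bonus = exploration_bonus(np.stack([seen, unseen]), sigma)
print(bonus)  # approximately [0.0995, 1.0]
```

The explored direction earns a bonus of about 1/sqrt(101), while the unexplored one hits the cap. A planner maximizing reward plus this bonus is therefore pushed toward parts of the feature space the current cover has not reached.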

research
05/01/2019

Efficient Model-free Reinforcement Learning in Metric Spaces

Model-free Reinforcement Learning (RL) algorithms such as Q-learning [Wa...
research
07/16/2020

PC-PG: Policy Cover Directed Exploration for Provable Policy Gradient Learning

Direct policy gradient methods for reinforcement learning are a successf...
research
07/10/2018

Algorithmic Framework for Model-based Reinforcement Learning with Theoretical Guarantees

While model-based reinforcement learning has empirically been shown to s...
research
02/08/2021

Provable Model-based Nonlinear Bandit and Reinforcement Learning: Shelve Optimism, Embrace Virtual Curvature

This paper studies model-based bandit and reinforcement learning (RL) wi...
research
07/26/2019

On Hard Exploration for Reinforcement Learning: a Case Study in Pommerman

How to best explore in domains with sparse, delayed, and deceptive rewar...
research
04/26/2023

FLEX: an Adaptive Exploration Algorithm for Nonlinear Systems

Model-based reinforcement learning is a powerful tool, but collecting da...
research
05/04/2021

Data-Efficient Reinforcement Learning for Malaria Control

Sequential decision-making under cost-sensitive tasks is prohibitively d...