Learning Robust Controllers Via Probabilistic Model-Based Policy Search

10/26/2021
by Valentin Charvet, et al.

Model-based Reinforcement Learning approximates the true environment with a learned world model and uses that model to approximate the optimal policy. This family of algorithms is usually more sample-efficient than its model-free counterparts. We investigate whether controllers learned in this way are robust and able to generalize under small perturbations of the environment. Our work is inspired by the PILCO algorithm, a method for probabilistic policy search. We show that enforcing a lower bound on the likelihood noise in the Gaussian Process dynamics model regularizes the policy updates and yields more robust controllers. We demonstrate the empirical benefits of our method on a simulation benchmark.
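
To make the idea of a lower bound on the likelihood noise concrete, here is a minimal sketch of Gaussian Process regression in which the noise variance is clamped from below before prediction. It is an illustration under stated assumptions, not the authors' implementation: the function names (`rbf`, `solve`, `gp_predict`), the `noise_floor` parameter, the fixed kernel hyperparameters, and the toy data are all hypothetical, and the abstract does not specify how the bound is enforced during hyperparameter training.

```python
import math


def rbf(a, b, lengthscale=1.0, signal_var=1.0):
    """Squared-exponential kernel between two scalar inputs."""
    return signal_var * math.exp(-0.5 * ((a - b) / lengthscale) ** 2)


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def gp_predict(X, y, x_star, noise_var, noise_floor=0.0):
    """Posterior mean and latent variance at x_star.

    Assumption for illustration: the lower bound is applied by clamping
    the noise variance, i.e. noise = max(noise_var, noise_floor).
    """
    noise = max(noise_var, noise_floor)
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    k_star = [rbf(a, x_star) for a in X]
    alpha = solve(K, y)        # (K + noise * I)^{-1} y
    v = solve(K, k_star)       # (K + noise * I)^{-1} k_star
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v))
    return mean, var


# Toy data (hypothetical): three noiseless samples of sin(x).
X = [-1.0, 0.0, 1.0]
y = [math.sin(x) for x in X]

mean_free, var_free = gp_predict(X, y, 0.5, noise_var=1e-6)
mean_floor, var_floor = gp_predict(X, y, 0.5, noise_var=1e-6, noise_floor=1e-2)
```

With the floor active, the model keeps more predictive uncertainty than the near-noiseless fit does, which is the sense in which a noise lower bound can act as a regularizer. When the fitted noise already exceeds the floor, the clamp has no effect.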
