Cautious Bayesian Optimization for Efficient and Scalable Policy Search

11/18/2020
by Lukas P. Fröhlich, et al.

Sample efficiency is one of the key factors when applying policy search to real-world problems. In recent years, Bayesian Optimization (BO) has become prominent in robotics because it is sample-efficient and requires little prior knowledge. However, one drawback of BO is its poor performance in high-dimensional search spaces, which stems from its focus on global search. In the policy search setting, local optimization is typically sufficient because initial policies are often available, e.g., via meta-learning, kinesthetic demonstrations or sim-to-real approaches. In this paper, we propose to constrain the policy search space to a sublevel-set of the Bayesian surrogate model's predictive uncertainty. This simple yet effective way of constraining the policy update both enables BO to scale to high-dimensional spaces (>100 dimensions) and reduces the risk of damaging the system. We demonstrate the effectiveness of our approach on a wide range of problems, including a motor skills task, adapting deep RL agents to new reward signals and a sim-to-real task for an inverted pendulum system.
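To make the idea of an uncertainty sublevel-set concrete, the following is a minimal sketch and not the authors' implementation. It assumes a zero-mean Gaussian process surrogate with a squared-exponential kernel, a finite set of candidate policy parameters, and an upper-confidence-bound acquisition function. The names `cautious_step`, `gamma` (the uncertainty threshold) and `beta` (the exploration weight), as well as all hyperparameter values, are hypothetical choices made for illustration.

```python
import numpy as np

# Assumed kernel hyperparameters (illustrative values, not from the paper).
LENGTHSCALE = 0.5
SIGNAL_VAR = 1.0


def rbf_kernel(A, B):
    """Squared-exponential kernel between the rows of A and the rows of B."""
    sq_dists = (
        np.sum(A**2, axis=1)[:, None]
        + np.sum(B**2, axis=1)[None, :]
        - 2.0 * A @ B.T
    )
    return SIGNAL_VAR * np.exp(-0.5 * np.maximum(sq_dists, 0.0) / LENGTHSCALE**2)


def gp_posterior(X_train, y_train, X_query, noise_var=1e-4):
    """Posterior mean and standard deviation of a zero-mean GP at X_query."""
    K = rbf_kernel(X_train, X_train) + noise_var * np.eye(len(X_train))
    K_s = rbf_kernel(X_train, X_query)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = SIGNAL_VAR - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def cautious_step(X_train, y_train, candidates, gamma=0.5, beta=2.0):
    """Pick the candidate with the highest UCB among those whose predictive
    standard deviation is at most gamma (the uncertainty sublevel-set).
    Returns None if no candidate satisfies the constraint."""
    mean, std = gp_posterior(X_train, y_train, candidates)
    feasible = std <= gamma
    if not np.any(feasible):
        return None
    ucb = np.where(feasible, mean + beta * std, -np.inf)
    return candidates[np.argmax(ucb)]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim = 5
    # Hypothetical initial policy (e.g. from a demonstration) and its return.
    X_train = np.zeros((1, dim))
    y_train = np.array([0.0])
    # Candidates at mixed distances from the initial policy.
    candidates = rng.normal(scale=0.5, size=(200, dim))
    print(cautious_step(X_train, y_train, candidates))
```

In this sketch, candidates far from all evaluated policies have a predictive standard deviation close to the prior value and are excluded, so each update stays near regions the model has already observed. A smaller `gamma` gives more conservative steps, while a very large `gamma` recovers an unconstrained search over the candidate set.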


research
09/22/2021

Multi-Objective Bayesian Optimization over High-Dimensional Search Spaces

The ability to optimize multiple competing objective functions with high...
research
07/06/2018

A survey on policy search algorithms for learning robot controllers in a handful of trials

Most policy search algorithms require thousands of training episodes to ...
research
03/02/2020

Robust Policy Search for Robot Navigation with Stochastic Meta-Policies

Bayesian optimization is an efficient nonlinear optimization method wher...
research
01/21/2020

Bayesian Optimization for Policy Search in High-Dimensional Systems via Automatic Domain Selection

Bayesian Optimization (BO) is an effective method for optimizing expensi...
research
04/16/2020

Divergent Search for Few-Shot Image Classification

When data is unlabelled and the target task is not known a priori, diver...
research
12/06/2016

Factored Contextual Policy Search with Bayesian Optimization

Scarce data is a major challenge to scaling robot learning to truly comp...
research
04/28/2019

Learning walk and trot from the same objective using different types of exploration

In quadruped gait learning, policy search methods that scale high dimens...
