Local policy search with Bayesian optimization

06/22/2021
by Sarah Müller, et al.

Reinforcement learning (RL) aims to find an optimal policy by interaction with an environment. Consequently, learning complex behavior requires a vast number of samples, which can be prohibitive in practice. Nevertheless, instead of systematically reasoning and actively choosing informative samples, policy gradients for local search are often obtained from random perturbations. These random samples yield high variance estimates and hence are sub-optimal in terms of sample complexity. Actively selecting informative samples is at the core of Bayesian optimization, which constructs a probabilistic surrogate of the objective from past samples to reason about informative subsequent ones. In this paper, we propose to join both worlds. We develop an algorithm utilizing a probabilistic model of the objective function and its gradient. Based on the model, the algorithm decides where to query a noisy zeroth-order oracle to improve the gradient estimates. The resulting algorithm is a novel type of policy search method, which we compare to existing black-box algorithms. The comparison reveals improved sample complexity and reduced variance in extensive empirical evaluations on synthetic objectives. Further, we highlight the benefits of active sampling on popular RL benchmarks.
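The idea of estimating a gradient from a probabilistic model of the objective can be illustrated with a small sketch. It is not the authors' implementation. It assumes a zero-mean Gaussian process with an RBF kernel and fixed hyperparameters, a toy quadratic objective, and uniformly sampled query points, so it shows only the gradient-posterior step and omits the active choice of query locations. All names (`rbf_kernel`, `gp_gradient_mean`, `noisy_oracle`) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, signal_var=1.0):
    """RBF kernel matrix between the rows of A (n, d) and B (m, d)."""
    sq_dists = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)
    return signal_var * np.exp(-0.5 * sq_dists / lengthscale**2)


def gp_gradient_mean(x_star, X, y, lengthscale=1.0, signal_var=1.0, noise_var=1e-4):
    """Posterior mean of the objective's gradient at x_star.

    Uses d/dx* k(x*, x_i) = -k(x*, x_i) * (x* - x_i) / lengthscale^2
    for the RBF kernel. Hyperparameters are assumed fixed, not learned.
    """
    K = rbf_kernel(X, X, lengthscale, signal_var) + noise_var * np.eye(len(X))
    # Centering y does not change the gradient but suits the zero-mean prior.
    alpha = np.linalg.solve(K, y - y.mean())
    k_star = rbf_kernel(x_star[None, :], X, lengthscale, signal_var)[0]  # (n,)
    dk_star = -(x_star[None, :] - X) / lengthscale**2 * k_star[:, None]  # (n, d)
    return dk_star.T @ alpha


def noisy_oracle(x, rng, noise_std=0.01):
    """Toy noisy zeroth-order oracle: f(x) = x . x plus Gaussian noise."""
    return float(x @ x) + noise_std * rng.standard_normal()


rng = np.random.default_rng(0)
theta = np.array([1.0, -2.0])  # current parameters (illustrative)

# Query the oracle at points near theta; here they are sampled uniformly,
# whereas an active method would choose them based on the model.
X = theta + 0.3 * rng.uniform(-1.0, 1.0, size=(30, 2))
y = np.array([noisy_oracle(x, rng) for x in X])

grad_estimate = gp_gradient_mean(theta, X, y)
true_grad = 2.0 * theta  # analytic gradient of the toy objective
print("GP gradient estimate:", grad_estimate)
print("True gradient:       ", true_grad)
```

In a local search loop, an estimate of this kind would replace the gradient obtained from random perturbations, and the model's uncertainty about the gradient could then guide where to query next.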

research
04/09/2021

Learning Sampling Policy for Faster Derivative Free Optimization

Zeroth-order (ZO, also known as derivative-free) methods, which estimate...
research
09/14/2020

Variance-Reduced Off-Policy Memory-Efficient Policy Search

Off-policy policy optimization is a challenging problem in reinforcement...
research
03/02/2020

Robust Policy Search for Robot Navigation with Stochastic Meta-Policies

Bayesian optimization is an efficient nonlinear optimization method wher...
research
03/30/2021

Greedy-GQ with Variance Reduction: Finite-time Analysis and Improved Complexity

Greedy-GQ is a value-based reinforcement learning (RL) algorithm for opt...
research
06/19/2023

Practical First-Order Bayesian Optimization Algorithms

First Order Bayesian Optimization (FOBO) is a sample efficient sequentia...
research
10/21/2022

Local Bayesian optimization via maximizing probability of descent

Local optimization presents a promising approach to expensive, high-dime...
research
06/13/2012

Improving Gradient Estimation by Incorporating Sensor Data

An efficient policy search algorithm should estimate the local gradient ...
