Learning Sampling Policy for Faster Derivative Free Optimization

04/09/2021
by   Zhou Zhai, et al.

Zeroth-order (ZO, also known as derivative-free) methods, which estimate the gradient from only two function evaluations, have attracted much attention recently because of their broad applications in the machine learning community. The two function evaluations are normally generated with random perturbations drawn from a standard Gaussian distribution. To speed up ZO methods, many techniques, such as variance-reduced stochastic ZO gradients and learning an adaptive Gaussian distribution, have recently been proposed to reduce the variance of ZO gradients. However, whether there is room to further improve the convergence of ZO methods remains an open problem. To explore this problem, in this paper we propose a new reinforcement-learning-based ZO algorithm (ZO-RL) that learns the sampling policy for generating the perturbations in ZO optimization instead of using random sampling. To find the optimal policy, we adopt an actor-critic RL algorithm called deep deterministic policy gradient (DDPG), which uses two neural-network function approximators. The learned sampling policy guides the perturbed points in the parameter space to estimate a more accurate ZO gradient. To the best of our knowledge, ZO-RL is the first algorithm to learn the sampling policy for ZO optimization using reinforcement learning, and it is orthogonal to existing methods; in particular, ZO-RL can be combined with existing ZO algorithms to accelerate them further. Experimental results on different ZO optimization problems show that ZO-RL effectively reduces the variance of the ZO gradient by learning a sampling policy, and converges faster than existing ZO algorithms in different scenarios.
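The two-function-evaluation scheme the abstract describes is the standard two-point Gaussian-smoothing estimator. Below is a minimal sketch of it in NumPy; the function name `zo_gradient` and its parameters are illustrative (this is not the paper's released code), and the random `u ~ N(0, I)` draw is exactly the baseline sampling that ZO-RL proposes to replace with a learned policy.

```python
import numpy as np

def zo_gradient(f, x, mu=1e-3, n_samples=1, rng=None):
    """Two-point Gaussian-smoothing ZO gradient estimate of f at x.

    Each sample draws a perturbation u ~ N(0, I) and uses the
    symmetric-difference estimator
        (f(x + mu*u) - f(x - mu*u)) / (2*mu) * u,
    averaging over n_samples random directions.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = x.shape[0]
    g = np.zeros(d)
    for _ in range(n_samples):
        u = rng.standard_normal(d)  # random search direction (baseline sampling)
        g += (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu) * u
    return g / n_samples

# Example: f(x) = ||x||^2 has true gradient 2x; averaging many random
# perturbations drives the noisy single-sample estimate toward it.
f = lambda x: float(x @ x)
x = np.array([1.0, 2.0, 3.0])
rng = np.random.default_rng(0)
g_hat = zo_gradient(f, x, n_samples=50_000, rng=rng)
```

The single-sample estimate is unbiased for the smoothed objective but high-variance, which is why the variance-reduction and learned-sampling ideas discussed above matter: a sampling distribution concentrated along informative directions needs far fewer perturbations for the same accuracy.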

Related research

- 10/19/2021 · Neural Network Compatible Off-Policy Natural Actor-Critic Algorithm
- 03/07/2019 · When random search is not enough: Sample-Efficient and Noise-Robust Blackbox Optimization of RL Policies
- 06/22/2021 · Local policy search with Bayesian optimization
- 05/29/2019 · Linear interpolation gives better gradients than Gaussian smoothing in derivative-free optimization
- 11/27/2022 · Generalizing Gaussian Smoothing for Random Search
- 05/26/2022 · Approximate Q-learning and SARSA(0) under the ε-greedy Policy: a Differential Inclusion Analysis
- 07/25/2018 · Backprop-Q: Generalized Backpropagation for Stochastic Computation Graphs
