Zeroth-Order Actor-Critic

01/29/2022
by Yuheng Lei, et al.

Zeroth-order optimization methods and policy-gradient-based first-order methods are two promising alternatives for solving reinforcement learning (RL) problems, with complementary advantages. The former work with arbitrary policies, drive state-dependent and temporally extended exploration, and possess a robustness-seeking property, but suffer from high sample complexity; the latter are more sample efficient but are restricted to differentiable policies and tend to learn less robust policies. We propose the Zeroth-Order Actor-Critic (ZOAC) algorithm, which unifies these two approaches in an on-policy actor-critic architecture to preserve the advantages of both. In each iteration, ZOAC alternates between collecting rollouts with timestep-wise perturbation in parameter space, first-order policy evaluation (PEV), and zeroth-order policy improvement (PIM). We evaluate the proposed method on a range of challenging continuous control benchmarks using different types of policies, where ZOAC outperforms both zeroth-order and first-order baseline algorithms.
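For intuition, the sketch below illustrates the kind of iteration the abstract describes: timestep-wise Gaussian perturbation of the policy parameters, a TD-style first-order update of the critic (PEV), and an advantage-weighted zeroth-order update of the actor (PIM). This is a minimal conceptual sketch, not the authors' implementation; the toy linear policy, linear critic, stand-in environment (toy_env_step), and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Hypothetical, simplified sketch of one ZOAC-style training loop.
rng = np.random.default_rng(0)
obs_dim, act_dim = 4, 1
theta = np.zeros((obs_dim, act_dim))   # actor (policy) parameters
w = np.zeros(obs_dim)                  # critic (value function) weights
sigma, alpha_pi, alpha_v, gamma = 0.1, 0.05, 0.1, 0.99

def toy_env_step(s, a):
    """Stand-in dynamics: state drifts toward the origin plus the action;
    reward penalizes squared distance from the origin."""
    s_next = 0.9 * s + 0.1 * np.concatenate([a, np.zeros(obs_dim - act_dim)])
    return s_next, -float(s @ s)

def value(s, w):
    return float(w @ s)

for iteration in range(50):
    s = rng.normal(size=obs_dim)
    grad_est = np.zeros_like(theta)
    n_steps = 32
    for t in range(n_steps):
        # Timestep-wise perturbation in parameter space.
        eps = rng.normal(size=theta.shape)
        a = s @ (theta + sigma * eps)
        s_next, r = toy_env_step(s, a)
        # First-order PEV: one-step TD update of the critic.
        td_error = r + gamma * value(s_next, w) - value(s, w)
        w += alpha_v * td_error * s
        # Zeroth-order PIM: accumulate advantage-weighted perturbation directions.
        grad_est += td_error * eps / sigma
        s = s_next
    theta += alpha_pi * grad_est / n_steps

print("final policy parameters:\n", theta)
```

The point of the sketch is the division of labor: the critic is trained with gradients, while the actor only sees perturbation directions weighted by scalar advantage estimates, so the policy itself need not be differentiable.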
