Synthesizing Programmatic Policies with Actor-Critic Algorithms and ReLU Networks

08/04/2023
by Spyros Orfanos, et al.

Programmatically Interpretable Reinforcement Learning (PIRL) encodes policies as human-readable computer programs. Several algorithms were recently introduced to handle the lack of a gradient signal for guiding the search in the space of programmatic policies. Most such PIRL algorithms first train a neural policy, which is then used as an oracle to guide the search in the programmatic space. In this paper, we show that, depending on the language used to encode the programmatic policies, such PIRL-specific algorithms are not needed, because one can use actor-critic algorithms to obtain a programmatic policy directly. We use a connection between ReLU neural networks and oblique decision trees to translate the policy learned with actor-critic algorithms into a programmatic policy. This translation from ReLU networks allows us to synthesize policies encoded in programs with if-then-else structures, linear transformations of the input values, and PID operations. Empirical results on several control problems show that this translation approach is capable of learning short and effective policies. Moreover, the translated policies are at least competitive with, and often far superior to, the policies that PIRL algorithms synthesize.
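
The connection between ReLU networks and oblique decision trees mentioned in the abstract can be illustrated with a minimal sketch. It assumes a network with a single hidden ReLU layer and a scalar output, and it is not the paper's implementation. All function names and the weight layout (`W1`, `b1`, `w2`, `b2`) are hypothetical choices made for this example. Each hidden unit contributes one oblique test of the form `w . x + b > 0`, and each combination of test outcomes selects a linear function of the input, which is what gives the if-then-else structure with linear transformations.

```python
from itertools import product


def dot(u, v):
    """Inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def relu_forward(x, W1, b1, w2, b2):
    """Output of a one-hidden-layer ReLU network with a scalar output."""
    hidden = [max(0.0, dot(w, x) + b) for w, b in zip(W1, b1)]
    return dot(w2, hidden) + b2


def to_oblique_tree(W1, b1, w2, b2):
    """Map every activation pattern to the linear model valid in that region.

    The result is a dict: pattern (tuple of bools) -> (weights, bias).
    It has 2**len(W1) leaves, so this only makes sense for small networks.
    """
    n_inputs = len(W1[0])
    leaves = {}
    for pattern in product((False, True), repeat=len(W1)):
        weights = [0.0] * n_inputs
        bias = b2
        for active, w, b, out in zip(pattern, W1, b1, w2):
            if active:  # an active unit passes its pre-activation through
                weights = [acc + out * wi for acc, wi in zip(weights, w)]
                bias += out * b
        leaves[pattern] = (weights, bias)
    return leaves


def tree_forward(x, W1, b1, leaves):
    """Evaluate the tree: run the oblique tests, then apply the leaf's linear model."""
    pattern = tuple(dot(w, x) + b > 0 for w, b in zip(W1, b1))
    weights, bias = leaves[pattern]
    return dot(weights, x) + bias
```

For any input, `tree_forward` should agree with `relu_forward` up to floating-point rounding, since both compute the same piecewise-linear function. The number of leaves grows exponentially with the number of hidden units, which is one reason a sketch like this is only practical for small networks.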

research
12/02/2020

Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER

We prove under commonly used assumptions the convergence of actor-critic...
research
01/29/2022

Zeroth-Order Actor-Critic

Zeroth-order optimization methods and policy gradient based first-order ...
research
05/25/2021

Unbiased Asymmetric Actor-Critic for Partially Observable Reinforcement Learning

In partially observable reinforcement learning, offline training gives a...
research
09/08/2023

Actor critic learning algorithms for mean-field control with moment neural networks

We develop a new policy gradient and actor-critic algorithm for solving ...
research
07/19/2022

Actor-Critic based Improper Reinforcement Learning

We consider an improper reinforcement learning setting where a learner i...
research
02/03/2023

Two-Stage Constrained Actor-Critic for Short Video Recommendation

The wide popularity of short videos on social media poses new opportunit...
research
04/21/2022

PG3: Policy-Guided Planning for Generalized Policy Generation

A longstanding objective in classical planning is to synthesize policies...
