Multi-Objective Evolution for Generalizable Policy Gradient Algorithms

by Juan Jose Garau Luis, et al.

Performance, generalizability, and stability are three Reinforcement Learning (RL) challenges relevant to many practical applications, where they often arise in combination. Still, state-of-the-art RL algorithms fall short when addressing multiple RL objectives simultaneously, and current human-driven design practices may not be well suited to multi-objective RL. In this paper we present MetaPG, an evolutionary method that discovers new RL algorithms represented as graphs, following a multi-objective search criterion in which different RL objectives are encoded in separate fitness scores. Our findings show that, when using a graph-based implementation of Soft Actor-Critic (SAC) to initialize the population, our method is able to find new algorithms that improve upon SAC's performance and generalizability by 3% and 17%, respectively, and reduce instability by up to 65%. In addition, we analyze the graph structure of the best algorithms in the population and offer an interpretation of specific elements that help trade performance for generalizability and vice versa. We validate our findings in three different continuous control tasks: RWRL Cartpole, RWRL Walker, and Gym Pendulum.
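To make the idea of separate fitness scores concrete, the following is a minimal sketch of how candidates scored on several objectives can be compared without collapsing the scores into one number. It is not MetaPG's implementation: the abstract does not state which selection rule is used, so the Pareto-dominance filter, the function names `dominates` and `pareto_front`, and the example scores are all assumptions made for illustration. Each score tuple is read as (performance, generalizability, stability), with higher values being better.

```python
from typing import List, Sequence

# Hypothetical helper: a fitness vector is one score per RL objective,
# e.g. (performance, generalizability, stability). Higher is better.


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` is at least as good as `b` on every objective
    and strictly better on at least one."""
    at_least_as_good = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return at_least_as_good and strictly_better


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of candidates that no other candidate dominates."""
    front = []
    for i, candidate in enumerate(scores):
        dominated = any(
            dominates(other, candidate)
            for j, other in enumerate(scores)
            if j != i
        )
        if not dominated:
            front.append(i)
    return front


# Made-up fitness scores for four candidate algorithms (illustrative only).
example_scores = [
    (0.90, 0.50, -0.30),
    (0.93, 0.60, -0.10),
    (0.80, 0.70, -0.20),
    (0.70, 0.40, -0.40),
]
front_indices = pareto_front(example_scores)
```

With these made-up scores, candidates 1 and 2 survive: neither is better than the other on every objective, which is the kind of trade-off between performance and generalizability that the abstract describes.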




Generalized Off-Policy Actor-Critic

We propose a new objective, the counterfactual objective, unifying exist...

Q-Pensieve: Boosting Sample Efficiency of Multi-Objective RL Through Memory Sharing of Q-Snapshots

Many real-world continuous control problems are in the dilemma of weighi...

A Scale-Independent Multi-Objective Reinforcement Learning with Convergence Analysis

Many sequential decision-making problems need optimization of different ...

Evolve To Control: Evolution-based Soft Actor-Critic for Scalable Reinforcement Learning

Advances in Reinforcement Learning (RL) have successfully tackled sample...

Rethinking Expected Cumulative Reward Formalism of Reinforcement Learning: A Micro-Objective Perspective

The standard reinforcement learning (RL) formulation considers the expec...

Reinforcement Learning Guided Multi-Objective Exam Paper Generation

To reduce the repetitive and complex work of instructors, exam paper gen...
