Qualitative Differences Between Evolutionary Strategies and Reinforcement Learning Methods for Control of Autonomous Agents

05/16/2022
by Nicola Milano, et al.

In this paper we analyze the qualitative differences between evolutionary strategies and reinforcement learning algorithms by focusing on two popular state-of-the-art algorithms: the OpenAI-ES evolutionary strategy and the Proximal Policy Optimization (PPO) reinforcement learning algorithm, which are the most similar representatives of the two families. We analyze how the methods differ with respect to: (i) general efficacy, (ii) ability to cope with sparse rewards, (iii) propensity/capacity to discover minimal solutions, (iv) dependency on reward shaping, and (v) ability to cope with variations of the environmental conditions. The analysis of the performance and of the behavioral strategies displayed by the agents trained with the two methods on benchmark problems enables us to demonstrate qualitative differences which were not identified in previous studies, to identify the relative weaknesses of the two methods, and to propose ways to ameliorate some of those weaknesses. We show that the characteristics of the reward function have a strong impact, which varies qualitatively not only between OpenAI-ES and PPO but also across alternative reinforcement learning algorithms, thus demonstrating the importance of tailoring the characteristics of the reward function to the algorithm used.
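To make the contrast between the two families concrete, the core update rules can be sketched side by side. This is a minimal sketch, not the authors' code. `es_gradient` follows the mirrored-sampling gradient estimator used by OpenAI-ES, which needs only one scalar fitness (episode return) per perturbed parameter vector; it omits the rank-based fitness shaping, weight decay, and Adam optimizer used in practice. `ppo_clipped_objective` computes only PPO's clipped surrogate term from per-step probability ratios and advantage estimates, without the value-function and entropy terms. The function names, the toy quadratic fitness, and all parameter defaults are illustrative assumptions.

```python
import random


def es_gradient(fitness, theta, sigma=0.1, n_pairs=50, rng=None):
    """Mirrored-sampling ES gradient estimate (sketch, assumed defaults).

    grad ~= 1 / (2 * n_pairs * sigma) * sum_i (F(theta + sigma*eps_i)
                                              - F(theta - sigma*eps_i)) * eps_i
    Fitness shaping, weight decay and Adam (used by OpenAI-ES) are omitted.
    """
    rng = rng or random.Random(0)
    dim = len(theta)
    grad = [0.0] * dim
    for _ in range(n_pairs):
        # One Gaussian perturbation, evaluated in both directions.
        eps = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        f_plus = fitness([t + sigma * e for t, e in zip(theta, eps)])
        f_minus = fitness([t - sigma * e for t, e in zip(theta, eps)])
        for i in range(dim):
            grad[i] += (f_plus - f_minus) * eps[i]
    return [g / (2 * n_pairs * sigma) for g in grad]


def ppo_clipped_objective(ratios, advantages, clip_eps=0.2):
    """Mean of PPO's clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    total = 0.0
    for r, a in zip(ratios, advantages):
        clipped = min(max(r, 1.0 - clip_eps), 1.0 + clip_eps)
        total += min(r * a, clipped * a)
    return total / len(ratios)


if __name__ == "__main__":
    # Hypothetical toy fitness with its maximum at theta = [3, 3].
    def toy_fitness(th):
        return -sum((t - 3.0) ** 2 for t in th)

    print(es_gradient(toy_fitness, [0.0, 0.0], n_pairs=500))
    print(ppo_clipped_objective([1.5, 0.5, 1.0], [1.0, -1.0, 2.0]))
```

On the toy fitness the ES estimate should point roughly toward the true gradient (about 6 per coordinate at the origin). The PPO term caps the benefit of moving the policy ratio beyond 1 ± `clip_eps`, so it relies on per-step advantage estimates, whereas ES sees only whole-episode fitness. This difference is one reason the shape of the reward function can affect the two methods differently.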
