Solving the scalarization issues of Advantage-based Reinforcement Learning Algorithms

04/08/2020
by Federico A. Galatolo, et al.

In this paper we investigate some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. We show how a naive scalarization leads to gradient overlapping, and we also argue that the entropy regularization term merely injects uncontrolled noise into the system. We propose two methods: one that avoids gradient overlapping (NOG) while keeping the same loss formulation, and one that avoids the noise injection (TE) while still generating action distributions with a desired entropy. A comprehensive pilot experiment has been carried out showing how the proposed methods speed up the training of A2C-based reinforcement learning algorithms.
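For context, the following is a minimal sketch of the conventional scalarized A2C loss that the abstract refers to, written in PyTorch-style Python. The function name, coefficient names, and default values are illustrative assumptions and are not taken from the paper; the NOG and TE methods themselves are described only in the full text.

```python
import torch

def naive_a2c_loss(log_probs, values, returns, entropies,
                   value_coef=0.5, entropy_coef=0.01):
    """Naive scalarization of the three A2C objectives into one loss.

    All arguments are per-step tensors collected from a rollout.
    Coefficient names and defaults are illustrative, not from the paper.
    """
    advantages = (returns - values).detach()        # advantage estimates
    actor_loss = -(log_probs * advantages).mean()   # policy-gradient term
    critic_loss = (returns - values).pow(2).mean()  # value-regression term
    # Entropy bonus: the regularization term the paper argues injects
    # uncontrolled noise into the optimization.
    entropy_bonus = entropies.mean()
    return actor_loss + value_coef * critic_loss - entropy_coef * entropy_bonus
```

Because all three terms are summed before a single backward pass, their gradients blend in the shared network parameters; this blending appears to be the gradient overlapping that the paper's NOG method is designed to avoid.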


