Entropy Regularized Reinforcement Learning with Cascading Networks

10/16/2022
by Riccardo Della Vecchia, et al.

Deep Reinforcement Learning (Deep RL) has achieved impressive results on high-dimensional problems, yet its learning process remains unstable even on the simplest tasks. Deep RL uses neural networks as function approximators, and these neural models are largely inspired by developments in the (un)supervised machine learning community. Compared to these learning frameworks, one of the major difficulties of RL is the absence of i.i.d. data. One way to cope with this difficulty is to control the rate of change of the policy at every iteration. In this work, we challenge the common practice in the (un)supervised learning community of using a fixed neural architecture by using a neural model that grows in size at each policy update. This allows a closed-form entropy-regularized policy update, which leads to better control of the rate of change of the policy at each iteration and helps cope with the non-i.i.d. nature of RL. Initial experiments on classical RL benchmarks show promising results, with remarkable convergence on some RL tasks when compared to other deep RL baselines, while exhibiting limitations on others.
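To make the idea of a closed-form entropy-regularized update more concrete, here is a minimal sketch for a tabular problem with discrete actions. It assumes the standard KL-regularized softmax form, where the new policy is proportional to the old policy times exp(eta * Q); the abstract does not spell out the exact update, so this form, the function names, the step size `eta`, and the toy Q-values are all illustrative assumptions rather than the authors' implementation. Under this assumption the log-policy is a running sum of Q-functions, which is one way to see why a model that grows at each update could represent the new policy exactly.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

def kl_regularized_update(policy, q_values, eta):
    """Assumed closed-form update: pi_new(a|s) proportional to pi_old(a|s) * exp(eta * Q(s, a)).

    policy:   array of shape (n_states, n_actions), rows sum to 1, all entries > 0
    q_values: array of shape (n_states, n_actions)
    eta:      hypothetical step size; smaller values keep pi_new closer to pi_old
    """
    return softmax(np.log(policy) + eta * q_values)

def kl_divergence(p, q):
    """Per-state KL(p || q) for strictly positive distributions."""
    return np.sum(p * (np.log(p) - np.log(q)), axis=-1)

# Toy problem with made-up numbers: 2 states, 3 actions, uniform initial policy.
uniform = np.full((2, 3), 1.0 / 3.0)
q1 = np.array([[1.0, 0.0, -1.0], [0.0, 2.0, 0.0]])
q2 = np.array([[0.5, 0.5, 0.0], [1.0, 0.0, -1.0]])

# The step size controls how far the policy moves in a single update.
small_step = kl_regularized_update(uniform, q1, eta=0.1)
large_step = kl_regularized_update(uniform, q1, eta=1.0)

# Two successive updates equal one softmax over the summed Q-functions,
# so the policy's logits accumulate one new term per iteration.
sequential = kl_regularized_update(
    kl_regularized_update(uniform, q1, eta=0.5), q2, eta=0.5
)
summed = softmax(0.5 * (q1 + q2))
```

In this sketch, `kl_divergence(small_step, uniform)` is smaller than `kl_divergence(large_step, uniform)` in every state, which illustrates how the step size bounds the rate of change of the policy, and `sequential` matches `summed` up to floating-point error.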

research
06/10/2020

The Impact of Non-stationarity on Generalisation in Deep Reinforcement Learning

Non-stationarity arises in Reinforcement Learning (RL) even in stationar...
research
09/07/2019

Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning

Model-free deep reinforcement learning (RL) algorithms have been widely ...
research
07/13/2021

Cautious Policy Programming: Exploiting KL Regularization in Monotonic Policy Improvement for Reinforcement Learning

In this paper, we propose cautious policy programming (CPP), a novel val...
research
07/01/2019

FiDi-RL: Incorporating Deep Reinforcement Learning with Finite-Difference Policy Search for Efficient Learning of Continuous Control

In recent years significant progress has been made in dealing with chall...
research
06/11/2021

GDI: Rethinking What Makes Reinforcement Learning Different From Supervised Learning

Deep Q Network (DQN) firstly kicked the door of deep reinforcement learn...
research
05/25/2020

Formal Methods with a Touch of Magic

Machine learning and formal methods have complementary benefits and draw...
research
06/24/2019

Deep Conservative Policy Iteration

Conservative Policy Iteration (CPI) is a founding algorithm of Approxima...
