Non-Deterministic Policy Improvement Stabilizes Approximated Reinforcement Learning

12/22/2016
by Wendelin Böhmer, et al.

This paper investigates a type of instability that is linked to the greedy policy improvement in approximated reinforcement learning. We show empirically that non-deterministic policy improvement can stabilize methods like LSPI by controlling the stochasticity of the improvement step. Additionally, we show that a suitable representation of the value function also stabilizes the solution to some degree. The presented approach is simple and should be easily transferable to more sophisticated algorithms like deep reinforcement learning.
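As a rough illustration of the idea, and not the paper's exact algorithm, the sketch below replaces the greedy argmax in an LSPI-style iteration with a softmax (Boltzmann) improvement step whose temperature controls the stochasticity. The function names, the `temperature` parameter, and the linear feature setup are assumptions made for this example.

```python
import numpy as np

def softmax_policy(q_values, temperature):
    """Non-deterministic policy improvement: a Boltzmann distribution over
    Q-values instead of a greedy argmax. Lower temperature -> closer to greedy."""
    z = q_values / temperature
    z -= z.max()                               # numerical stability
    p = np.exp(z)
    return p / p.sum()

def expected_next_features(phi_next, w, temperature):
    """phi_next: (N, num_actions, d) features phi(s', a) for each next state and
    candidate action. Returns E_pi[phi(s', a')] under the softmax improvement of
    the current linear Q-function q(s, a) = phi(s, a) @ w."""
    q = phi_next @ w                           # (N, num_actions)
    probs = np.apply_along_axis(softmax_policy, 1, q, temperature)
    return (probs[..., None] * phi_next).sum(axis=1)   # (N, d)

def lspi_iteration(phi, rewards, phi_next, w, gamma, temperature, reg=1e-6):
    """One LSTD-Q policy-evaluation step against the stochastically improved
    policy: solve A w = b with A = Phi^T (Phi - gamma * Phi'_pi), b = Phi^T r."""
    phi_pi = expected_next_features(phi_next, w, temperature)
    A = phi.T @ (phi - gamma * phi_pi)
    b = phi.T @ rewards
    return np.linalg.solve(A + reg * np.eye(A.shape[1]), b)
```

Iterating `lspi_iteration` to a fixed point is approximate policy iteration; a temperature bounded away from zero limits how abruptly the policy can change between iterations, which is one way to read the stabilizing effect the abstract describes, while a temperature near zero recovers the usual greedy improvement.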


