Discrete-to-Deep Supervised Policy Learning

05/05/2020
by Budi Kurniawan, et al.

Neural networks are effective function approximators, but they are hard to train in the reinforcement learning (RL) context, mainly because samples are correlated. For years, researchers have worked around this problem by employing experience replay or an asynchronous parallel-agent system. This paper proposes Discrete-to-Deep Supervised Policy Learning (D2D-SPL) for training neural networks in RL. D2D-SPL discretises the continuous state space into discrete states and uses actor-critic to learn a policy. It then selects from each discrete state an input value and the action with the highest numerical preference as an input/target pair. Finally, it uses the input/target pairs from all discrete states to train a classifier. D2D-SPL uses a single agent, needs no experience replay and learns much faster than state-of-the-art methods. We test our method on two RL environments: Cartpole and an aircraft manoeuvring simulator.
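
The discretisation and pair-selection steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`discretise`, `extract_pairs`), the bin edges, the preference values and the choice of one stored observation as each state's input value are all assumptions made for illustration. The actor-critic learning step and the classifier training are omitted; the preference table is written by hand.

```python
from bisect import bisect_right


def discretise(observation, bin_edges):
    """Map a continuous observation to a tuple of bin indices, one per dimension."""
    return tuple(bisect_right(edges, x) for x, edges in zip(observation, bin_edges))


def extract_pairs(preferences, representatives):
    """Build one (input, target) pair per discrete state.

    preferences: dict mapping a discrete state to a list of action preferences
                 (assumed to come from a tabular actor-critic learner).
    representatives: dict mapping a discrete state to one continuous input
                     observed in that state (an assumed selection rule).
    """
    pairs = []
    for state, prefs in preferences.items():
        if state not in representatives:
            continue  # no input value recorded for this state
        # Target is the index of the action with the highest preference.
        target = max(range(len(prefs)), key=prefs.__getitem__)
        pairs.append((representatives[state], target))
    return pairs


# Hypothetical two-dimensional state space with hand-picked bin edges.
bin_edges = [[-1.0, 0.0, 1.0], [-0.5, 0.5]]
obs_a = (0.3, -0.7)
obs_b = (-1.4, 0.1)
state_a = discretise(obs_a, bin_edges)
state_b = discretise(obs_b, bin_edges)

# Hand-written preferences standing in for a learned actor-critic table.
preferences = {state_a: [0.2, 1.5], state_b: [0.9, -0.3]}
representatives = {state_a: obs_a, state_b: obs_b}

pairs = extract_pairs(preferences, representatives)
```

The resulting `pairs` list is what would then be passed to a supervised classifier, with the continuous input as the feature vector and the action index as the class label.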

research
09/01/2022

Actor Prioritized Experience Replay

A widely-studied deep reinforcement learning (RL) technique known as Pri...
research
07/01/2017

Sample-efficient Actor-Critic Reinforcement Learning with Supervised Data for Dialogue Management

Deep reinforcement learning (RL) methods have significant potential for ...
research
03/12/2023

Synthetic Experience Replay

A key theme in the past decade has been that when large neural networks ...
research
07/16/2018

Remember and Forget for Experience Replay

Experience replay (ER) is crucial for attaining high data-efficiency in ...
research
10/28/2021

Hindsight Goal Ranking on Replay Buffer for Sparse Reward Environment

This paper proposes a method for prioritizing the replay experience refe...
research
11/18/2020

Weighted Entropy Modification for Soft Actor-Critic

We generalize the existing principle of the maximum Shannon entropy in r...
research
01/13/2021

Continuous Deep Q-Learning with Simulator for Stabilization of Uncertain Discrete-Time Systems

Applications of reinforcement learning (RL) to stabilization problems of...
