Randomized Policy Learning for Continuous State and Action MDPs

06/08/2020
by Hiteshi Sharma, et al.

Deep reinforcement learning methods have achieved state-of-the-art results in a variety of challenging, high-dimensional domains ranging from video games to locomotion. The key to their success has been the use of deep neural networks to approximate the policy and value function. Yet substantial tuning of the network weights is required for good results. We instead use randomized function approximation. Such randomized networks are not only cheaper to train than fully connected networks but also improve numerical performance. We present RANDPOL, a generalized policy iteration algorithm for MDPs with continuous state and action spaces, in which both the policy and the value function are represented with randomized networks. We also give finite-time guarantees on the performance of the algorithm. We then evaluate it numerically on challenging environments and compare it with deep neural network based algorithms.
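To make the idea of randomized function approximation concrete, the following is a minimal sketch, not the RANDPOL algorithm itself. The abstract does not specify the network architecture, so everything here is an assumption made for illustration: the cosine features, the Gaussian sampling of hidden weights, the stochastic gradient fit, the toy regression target, and all function names. The only point it illustrates is that the hidden layer is sampled once and left fixed, while only the linear output weights are fitted.

```python
import math
import random


def make_random_features(num_features, state_dim, seed=0):
    """Sample hidden-layer parameters once; they are never trained (assumed design)."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
               for _ in range(num_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return weights, offsets


def features(state, weights, offsets):
    """Map a state to fixed random cosine features."""
    return [math.cos(sum(w_i * s_i for w_i, s_i in zip(w, state)) + b)
            for w, b in zip(weights, offsets)]


def predict(state, weights, offsets, alpha):
    """Linear combination of the random features with output weights alpha."""
    phi = features(state, weights, offsets)
    return sum(a * p for a, p in zip(alpha, phi))


def mean_squared_error(states, targets, weights, offsets, alpha):
    errors = [(predict(s, weights, offsets, alpha) - y) ** 2
              for s, y in zip(states, targets)]
    return sum(errors) / len(errors)


def fit_output_weights(states, targets, weights, offsets, epochs=100, lr=0.05):
    """Fit only the output weights by stochastic gradient descent on squared error."""
    alpha = [0.0] * len(weights)
    for _ in range(epochs):
        for s, y in zip(states, targets):
            phi = features(s, weights, offsets)
            residual = sum(a * p for a, p in zip(alpha, phi)) - y
            # Dividing by the feature count keeps the step size stable.
            alpha = [a - lr * residual * p / len(phi) for a, p in zip(alpha, phi)]
    return alpha


if __name__ == "__main__":
    # Toy stand-in for a value function on a one-dimensional state space.
    states = [[i / 49] for i in range(50)]
    targets = [math.sin(3 * s[0]) for s in states]
    weights, offsets = make_random_features(20, 1)
    alpha = fit_output_weights(states, targets, weights, offsets)
    print(mean_squared_error(states, targets, weights, offsets, alpha))
```

Because the hidden parameters are fixed, fitting reduces to a linear problem in the output weights, which is the sense in which such networks can be cheaper to train than a fully connected network whose weights are all tuned.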


research
11/13/2015

Deep Reinforcement Learning in Parameterized Action Space

Recent work has shown that deep neural networks are capable of approxima...
research
05/21/2017

Shallow Updates for Deep Reinforcement Learning

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQ...
research
06/06/2018

Randomized Value Functions via Multiplicative Normalizing Flows

Randomized value functions offer a promising approach towards the challe...
research
11/07/2013

Exploring Deep and Recurrent Architectures for Optimal Control

Sophisticated multilayer neural networks have achieved state of the art ...
research
12/11/2018

Deep neural networks algorithms for stochastic control problems on finite horizon, part I: convergence analysis

This paper develops algorithms for high-dimensional stochastic control p...
research
11/03/2016

Learning Locomotion Skills Using DeepRL: Does the Choice of Action Space Matter?

The use of deep reinforcement learning allows for high-dimensional state...
research
02/08/2020

Provably Efficient Adaptive Approximate Policy Iteration

Model-free reinforcement learning algorithms combined with value functio...
