Improving Generalization in Mountain Car Through the Partitioned Parameterized Policy Approach via Quasi-Stochastic Gradient Descent

05/28/2021
by Caleb M. Bowyer, et al.

The reinforcement learning problem of finding a control policy that minimizes the time to reach the goal in the Mountain Car environment is considered. Specifically, a class of parameterized nonlinear feedback policies is optimized so that the car reaches the top of the highest mountain peak in minimum time. The optimization is carried out using quasi-Stochastic Gradient Descent (qSGD) methods. In seeking the minimum-time policy, a new parameterized policy approach is considered that learns a separate policy parameter for each region of the state space, rather than relying on a single macroscopic policy parameter for the entire state space. This partitioned parameterized policy approach is shown to outperform the uniform parameterized policy approach and to generalize better than prior methods, under which the Mountain Car became trapped in circular trajectories in the state space.
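
To make the idea of a partitioned parameterized policy concrete, the following is a minimal sketch and not the authors' implementation. It assumes the textbook Mountain Car dynamics and constants, a hypothetical partition of the state space into equal-width position bins, a hypothetical per-region feedback law tanh(gain * velocity + bias), and a two-point update with deterministic sinusoidal probing in the spirit of qSGD. All function names, constants, frequencies, and step sizes are illustrative assumptions.

```python
import math

# Textbook Mountain Car constants (assumed here, not taken from the paper).
X_MIN, X_MAX, V_MAX, GOAL = -1.2, 0.6, 0.07, 0.5

def step(x, v, u):
    """Advance the car one time step under a force u in [-1, 1]."""
    v = v + 0.001 * u - 0.0025 * math.cos(3.0 * x)
    v = max(-V_MAX, min(V_MAX, v))
    x = x + v
    if x < X_MIN:  # inelastic left wall
        x, v = X_MIN, 0.0
    return x, v

def region(x, n_regions):
    """Hypothetical partition: equal-width bins over position."""
    frac = (x - X_MIN) / (X_MAX - X_MIN)
    return min(n_regions - 1, max(0, int(frac * n_regions)))

def policy(x, v, theta):
    """Hypothetical feedback law with one (gain, bias) pair per region."""
    gain, bias = theta[region(x, len(theta))]
    return math.tanh(gain * v + bias)

def time_to_goal(theta, x0=-0.5, v0=0.0, horizon=500):
    """Number of steps until the goal is reached, capped at the horizon."""
    x, v = x0, v0
    for t in range(horizon):
        if x >= GOAL:
            return t
        x, v = step(x, v, policy(x, v, theta))
    return horizon

def qsgd(theta, n_iters=200, eps=0.5, lr=0.01):
    """Two-point descent with sinusoidal (non-random) probing; a qSGD-style sketch."""
    flat = [p for pair in theta for p in pair]
    d = len(flat)
    freqs = [1.0 + 0.37 * i for i in range(d)]  # arbitrary distinct frequencies

    def unflatten(f):
        return [(f[2 * i], f[2 * i + 1]) for i in range(d // 2)]

    for k in range(n_iters):
        xi = [math.sin(freqs[i] * k) for i in range(d)]
        plus = time_to_goal(unflatten([p + eps * s for p, s in zip(flat, xi)]))
        minus = time_to_goal(unflatten([p - eps * s for p, s in zip(flat, xi)]))
        slope = (plus - minus) / (2.0 * eps)  # directional difference along xi
        flat = [p - lr * slope * s for p, s in zip(flat, xi)]
    return unflatten(flat)
```

In this sketch, a theta with a single (gain, bias) pair plays the role of the uniform parameterized policy, while a list with several pairs corresponds to the partitioned approach; for example, `qsgd([(0.0, 0.0)] * 4)` tunes four regions starting from the zero policy. Because the time-to-goal objective is integer-valued, the finite-difference slope is coarse, so the step sizes above would likely need tuning in practice.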
