Non-Parametric Stochastic Policy Gradient with Strategic Retreat for Non-Stationary Environment

03/24/2022
by   Apan Dastider, et al.
0

In modern robotics, effectively computing optimal control policies under dynamically varying environments poses substantial challenges to the off-the-shelf parametric policy gradient methods, such as the Deep Deterministic Policy Gradient (DDPG) and Twin Delayed Deep Deterministic policy gradient (TD3). In this paper, we propose a systematic methodology to dynamically learn a sequence of optimal control policies non-parametrically, while autonomously adapting with the constantly changing environment dynamics. Specifically, our non-parametric kernel-based methodology embeds a policy distribution as the features in a non-decreasing Euclidean space, therefore allowing its search space to be defined as a very high (possible infinite) dimensional RKHS (Reproducing Kernel Hilbert Space). Moreover, by leveraging the similarity metric computed in RKHS, we augmented our non-parametric learning with the technique of AdaptiveH- adaptively selecting a time-frame window of finishing the optimal part of whole action-sequence sampled on some preceding observed state. To validate our proposed approach, we conducted extensive experiments with multiple classic benchmarks and one simulated robotics benchmark equipped with dynamically changing environments. Overall, our methodology has outperformed the well-established DDPG and TD3 methodology by a sizeable margin in terms of learning performance.

READ FULL TEXT

page 1

page 6

page 8

research
10/31/2020

A Policy Gradient Algorithm for Learning to Learn in Multiagent Reinforcement Learning

A fundamental challenge in multiagent reinforcement learning is to learn...
research
02/11/2023

A Policy Gradient Framework for Stochastic Optimal Control Problems with Global Convergence Guarantee

In this work, we consider the stochastic optimal control problem in cont...
research
11/06/2018

Are Deep Policy Gradient Algorithms Truly Policy Gradient Algorithms?

We study how the behavior of deep policy gradient algorithms reflects th...
research
03/26/2021

Composable Learning with Sparse Kernel Representations

We present a reinforcement learning algorithm for learning sparse non-pa...
research
06/30/2020

Policy Gradient Optimization of Thompson Sampling Policies

We study the use of policy gradient algorithms to optimize over a class ...
research
05/18/2023

Deep Metric Tensor Regularized Policy Gradient

Policy gradient algorithms are an important family of deep reinforcement...
research
11/22/2018

Solving Chance Constrained Optimization under Non-Parametric Uncertainty Through Hilbert Space Embedding

In this paper, we present an efficient algorithm for solving a class of ...

Please sign up or login with your details

Forgot password? Click here to reset