Policy Learning based on Deep Koopman Representation

05/24/2023
by   Wenjian Hao, et al.
0

This paper proposes a policy learning algorithm based on the Koopman operator theory and policy gradient approach, which seeks to approximate an unknown dynamical system and search for optimal policy simultaneously, using the observations gathered through interaction with the environment. The proposed algorithm has two innovations: first, it introduces the so-called deep Koopman representation into the policy gradient to achieve a linear approximation of the unknown dynamical system, all with the purpose of improving data efficiency; second, the accumulated errors for long-term tasks induced by approximating system dynamics are avoided by applying Bellman's principle of optimality. Furthermore, a theoretical analysis is provided to prove the asymptotic convergence of the proposed algorithm and characterize the corresponding sampling complexity. These conclusions are also supported by simulations on several challenging benchmark environments.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/24/2023

Adaptive Policy Learning to Additional Tasks

This paper develops a policy learning method for tuning a pre-trained po...
research
01/06/2021

Smoothed functional-based gradient algorithms for off-policy reinforcement learning

We consider the problem of control in an off-policy reinforcement learni...
research
05/28/2021

Joint Optimization of Multi-Objective Reinforcement Learning with Policy Gradient Based Algorithm

Many engineering problems have multiple objectives, and the overall aim ...
research
03/22/2023

Matryoshka Policy Gradient for Entropy-Regularized RL: Convergence and Global Optimality

A novel Policy Gradient (PG) algorithm, called Matryoshka Policy Gradien...
research
10/29/2020

Low-Variance Policy Gradient Estimation with World Models

In this paper, we propose World Model Policy Gradient (WMPG), an approac...
research
05/26/2023

A Policy Gradient Method for Confounded POMDPs

In this paper, we propose a policy gradient method for confounded partia...

Please sign up or login with your details

Forgot password? Click here to reset