Adaptive Policy Learning to Additional Tasks

05/24/2023
by Wenjian Hao, et al.

This paper develops a policy learning method for tuning a pre-trained policy to adapt to additional tasks without altering the original task. The proposed method, Adaptive Policy Gradient (APG), combines Bellman's principle of optimality with the policy gradient approach to improve the convergence rate. A theoretical analysis guarantees a convergence rate of 𝒪(1/T) and a sample complexity of 𝒪(1/ϵ), where T denotes the number of iterations and ϵ denotes the accuracy of the resulting stationary policy. Numerical simulations on several challenging tasks, including cartpole, lunar lander, and robot arm, show that APG achieves performance similar to existing deterministic policy gradient methods while using much less data and converging faster.
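
The two rates quoted in the abstract are linked by a short calculation, sketched below under stated assumptions. The error measure e_T, the constant C, and the premise that each iteration uses a bounded number of samples are illustrative and not taken from the paper, which may define its accuracy measure differently.

```latex
% Assumption: after T iterations the error measure e_T satisfies
% a bound of the form C / T for some constant C > 0 (hypothetical names).
\[
  e_T \le \frac{C}{T}
  \quad\Longrightarrow\quad
  e_T \le \epsilon \ \text{ whenever } \ T \ge \frac{C}{\epsilon}.
\]
% Hence the number of iterations needed to reach accuracy \epsilon is
\[
  T_\epsilon = \left\lceil \frac{C}{\epsilon} \right\rceil
             = \mathcal{O}\!\left(\frac{1}{\epsilon}\right),
\]
% and if each iteration consumes a bounded number of samples,
% the total sample count is also \mathcal{O}(1/\epsilon).
```

For instance, under this assumed bound, halving the target accuracy ϵ roughly doubles the required number of iterations.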

research
05/31/2023

On the Linear Convergence of Policy Gradient under Hadamard Parameterization

The convergence of deterministic policy gradient under the Hadamard para...
research
05/24/2023

Policy Learning based on Deep Koopman Representation

This paper proposes a policy learning algorithm based on the Koopman ope...
research
03/09/2020

Stochastic Recursive Momentum for Policy Gradient Methods

In this paper, we propose a novel algorithm named STOchastic Recursive M...
research
10/17/2022

On the convergence of policy gradient methods to Nash equilibria in general stochastic games

Learning in stochastic games is a notoriously difficult problem because,...
research
02/01/2022

PAGE-PG: A Simple and Loopless Variance-Reduced Policy Gradient Method with Probabilistic Gradient Estimation

Despite their success, policy gradient methods suffer from high variance...
research
06/30/2021

Inverse Design of Grating Couplers Using the Policy Gradient Method from Reinforcement Learning

We present a proof-of-concept technique for the inverse design of electr...
research
02/06/2021

A Hybrid Approach for Reinforcement Learning Using Virtual Policy Gradient for Balancing an Inverted Pendulum

Using the policy gradient algorithm, we train a single-hidden-layer neur...
