L2C2: Locally Lipschitz Continuous Constraint towards Stable and Smooth Reinforcement Learning

02/15/2022
by Taisuke Kobayashi, et al.

This paper proposes a new regularization technique for reinforcement learning (RL) that makes the learned policy and value functions smooth and stable. RL is known for an unstable learning process and for acquired policies that are sensitive to noise. Several methods have been proposed to resolve these problems; taken together, they indicate that the smoothness of the policy and value functions learned in RL is key to addressing them. However, if these functions are made excessively smooth, they lose expressiveness, and the globally optimal solution may not be obtained. This paper therefore considers RL under a local Lipschitz continuity constraint, called L2C2. By designing the spatio-temporally local compact space for L2C2 from the state transition at each time step, moderate smoothness can be achieved without loss of expressiveness. Numerical simulations with noise verified that the proposed L2C2 improves task performance while smoothing out the robot actions generated by the learned policy.
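The abstract describes the constraint only at a high level. As a rough illustration, and not the paper's actual formulation, the Python sketch below estimates a local Lipschitz ratio of a function around a single state transition (s, s_next) and turns it into a penalty that is zero while the function stays smooth enough. The sampling region (the axis-aligned box spanned by s and s_next), the hinge-squared penalty, the threshold `k`, and all function names are assumptions made for this example.

```python
import math
import random
from typing import Callable, List, Sequence

# Hypothetical type aliases for this sketch: states and outputs are plain float vectors.
Vector = Sequence[float]
Func = Callable[[Vector], List[float]]


def sample_near_transition(s: Vector, s_next: Vector, rng: random.Random) -> List[float]:
    """Draw a point from the axis-aligned box whose corners are s and s_next.

    Assumption: this box stands in for a local region built from one state transition.
    """
    return [a + (b - a) * rng.random() for a, b in zip(s, s_next)]


def local_lipschitz_estimate(f: Func, s: Vector, s_next: Vector,
                             rng: random.Random, n_samples: int = 16) -> float:
    """Largest observed ratio ||f(s) - f(s_bar)|| / ||s - s_bar|| over sampled s_bar."""
    f_s = f(s)
    best = 0.0
    for _ in range(n_samples):
        s_bar = sample_near_transition(s, s_next, rng)
        dist_in = math.dist(s, s_bar)
        if dist_in == 0.0:  # skip degenerate samples to avoid division by zero
            continue
        best = max(best, math.dist(f_s, f(s_bar)) / dist_in)
    return best


def lipschitz_penalty(f: Func, s: Vector, s_next: Vector, k: float,
                      rng: random.Random, n_samples: int = 16) -> float:
    """Hypothetical hinge-squared penalty: zero when the local estimate stays below k."""
    excess = local_lipschitz_estimate(f, s, s_next, rng, n_samples) - k
    return max(0.0, excess) ** 2
```

For a linear map such as x -> 2x, every sampled ratio equals 2 (up to rounding), so the penalty vanishes for any k >= 2 and grows quadratically once k falls below 2. In an actual RL setup, a term of this kind would be added to the policy or value loss and differentiated with an autodiff framework; this sketch only evaluates it.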
