Proximal Policy Optimization for Tracking Control Exploiting Future Reference Information

07/20/2021
by Jana Mayer, et al.

In recent years, reinforcement learning (RL) has gained increasing attention in control engineering, where policy gradient methods in particular are widely used. In this work, we improve the tracking performance of proximal policy optimization (PPO) for arbitrary reference signals by incorporating information about future reference values. We present two variants that extend the input of the actor and the critic with future reference values. In the first variant, global future reference values are added to the input. In the second variant, we introduce a novel kind of residual space with future reference values that is applicable to model-free reinforcement learning. We evaluate our approach against a PI controller on a simple drive train model. We expect our method to generalize to arbitrary references better than previous approaches, which points towards the applicability of RL to the control of real systems.
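The abstract does not spell out the exact form of either input, so the following is only a minimal sketch under stated assumptions. It shows how an observation vector for the actor and critic might be built in each variant. The function names, the preview horizon `horizon`, the choice to repeat the last reference value past the end of the signal, and especially the residual form (each current or future reference value minus the current output `y_k`) are all hypothetical illustrations, not the authors' definitions.

```python
import numpy as np


def future_refs(reference, k, horizon):
    """Return r[k+1], ..., r[k+horizon].

    Assumption: past the end of the signal, the last reference value is repeated.
    """
    reference = np.asarray(reference, dtype=float)
    idx = np.minimum(np.arange(k + 1, k + horizon + 1), len(reference) - 1)
    return reference[idx]


def global_observation(y_k, reference, k, horizon):
    """Variant 1 (sketch): current output, current reference and raw future references."""
    reference = np.asarray(reference, dtype=float)
    return np.concatenate(([y_k, reference[k]], future_refs(reference, k, horizon)))


def residual_observation(y_k, reference, k, horizon):
    """Variant 2 (assumed form): each current/future reference value minus the current output."""
    reference = np.asarray(reference, dtype=float)
    r = np.concatenate(([reference[k]], future_refs(reference, k, horizon)))
    return r - y_k


# Usage: the same vector would be fed to both the PPO actor and the critic.
ref = [0.0, 1.0, 2.0, 3.0]
obs_global = global_observation(0.5, ref, k=1, horizon=3)
obs_residual = residual_observation(0.5, ref, k=1, horizon=3)
```

Under this assumed residual form, adding the same constant to both the reference and the output leaves the observation unchanged. That offers one possible intuition for why a residual representation could generalize better to reference levels not seen during training.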
