Reinforcement Learning with Partial Parametric Model Knowledge

04/26/2023
by   Shuyuan Wang, et al.
We adapt reinforcement learning (RL) methods for continuous control to bridge the gap between complete ignorance and perfect knowledge of the environment. Our method, Partial Knowledge Least Squares Policy Iteration (PLSPI), takes inspiration from both model-free RL and model-based control. It exploits the incomplete information available from a partial model while retaining RL's data-driven adaptation towards optimal performance. The linear quadratic regulator provides a case study; numerical experiments demonstrate the effectiveness and resulting benefits of the proposed method.
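
For orientation on the linear quadratic regulator case study, the "perfect knowledge" end of the spectrum can be sketched as a discrete-time Riccati recursion that computes the optimal feedback gain when the system matrices are fully known. This is not PLSPI itself, only the model-based baseline it is positioned against; the function name `lqr_gain`, the example matrices `A`, `B`, and the cost weights `Q`, `R` are illustrative assumptions, not values from the paper.

```python
import numpy as np


def lqr_gain(A, B, Q, R, iters=2000, tol=1e-10):
    """Iterate the discrete-time Riccati recursion until P stops changing.

    Returns the feedback gain K (control law u = -K x) and the
    cost-to-go matrix P. Assumes A and B are known exactly.
    """
    P = Q.copy()
    for _ in range(iters):
        # Greedy gain for the current cost-to-go matrix P
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A'PA - A'PB (R + B'PB)^{-1} B'PA
        P_next = Q + A.T @ P @ (A - B @ K)
        converged = np.max(np.abs(P_next - P)) < tol
        P = P_next
        if converged:
            break
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    return K, P


# Hypothetical two-state system (a sampled double integrator), for illustration only
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)
R = np.array([[1.0]])

K, P = lqr_gain(A, B, Q, R)
```

A model-free method would have to estimate the same gain from data alone; a partial-knowledge method sits between the two, which is the setting the abstract describes.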

research
12/09/2018

The Gap Between Model-Based and Model-Free Methods on the Linear Quadratic Regulator: An Asymptotic Viewpoint

The effectiveness of model-based versus model-free methods is a long-sta...
research
07/19/2013

Model-Based Policy Gradients with Parameter-Based Exploration by Least-Squares Conditional Density Estimation

The goal of reinforcement learning (RL) is to let an agent learn an opti...
research
09/13/2022

Data efficient reinforcement learning and adaptive optimal perimeter control of network traffic dynamics

Existing data-driven and feedback traffic control strategies do not cons...
research
05/20/2023

Model-based adaptation for sample efficient transfer in reinforcement learning control of parameter-varying systems

In this paper, we leverage ideas from model-based control to address the...
research
03/13/2021

RL-Controller: a reinforcement learning framework for active structural control

To maintain structural integrity and functionality during the designed l...
research
05/27/2021

A Modular and Transferable Reinforcement Learning Framework for the Fleet Rebalancing Problem

Mobility on demand (MoD) systems show great promise in realizing flexibl...
research
05/22/2023

Policy Representation via Diffusion Probability Model for Reinforcement Learning

Popular reinforcement learning (RL) algorithms tend to produce a unimoda...
