D2C 2.0: Decoupled Data-Based Approach for Learning to Control Stochastic Nonlinear Systems via Model-Free ILQR

02/18/2020
by   Karthikeya S Parunandi, et al.
0

In this paper, we propose a structured linear parameterization of a feedback policy to solve the model-free stochastic optimal control problem. This parametrization is corroborated by a decoupling principle that is shown to be near-optimal under a small noise assumption, both in theory and by empirical analyses. Further, we incorporate a model-free version of the Iterative Linear Quadratic Regulator (ILQR) in a sample-efficient manner into our framework. Simulations on systems over a range of complexities reveal that the resulting algorithm is able to harness the superior second-order convergence properties of ILQR. As a result, it is fast and is scalable to a wide variety of higher dimensional systems. Comparisons are made with a state-of-the-art reinforcement learning algorithm, the Deep Deterministic Policy Gradient (DDPG) technique, in order to demonstrate the significant merits of our approach in terms of training-efficiency.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
01/15/2018

Global Convergence of Policy Gradient Methods for Linearized Control Problems

Direct policy gradient methods for reinforcement learning and continuous...
research
08/20/2020

Model-free optimal control of discrete-time systems with additive and multiplicative noises

This paper investigates the optimal control problem for a class of discr...
research
02/25/2021

Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √(T) Regret

We consider the task of learning to control a linear dynamical system un...
research
07/13/2020

Structured Policy Iteration for Linear Quadratic Regulator

Linear quadratic regulator (LQR) is one of the most popular frameworks t...
research
06/12/2020

Combining Model-Based and Model-Free Methods for Nonlinear Control: A Provably Convergent Policy Gradient Approach

Model-free learning-based control methods have seen great success recent...
research
08/23/2021

Model-Free Learning of Optimal Deterministic Resource Allocations in Wireless Systems via Action-Space Exploration

Wireless systems resource allocation refers to perpetual and challenging...
research
09/19/2023

Oracle Complexity Reduction for Model-free LQR: A Stochastic Variance-Reduced Policy Gradient Approach

We investigate the problem of learning an ϵ-approximate solution for the...

Please sign up or login with your details

Forgot password? Click here to reset