Distributed Reinforcement Learning for Decentralized Linear Quadratic Control: A Derivative-Free Policy Optimization Approach

12/19/2019
by Yingying Li, et al.

This paper considers a distributed reinforcement learning problem for decentralized linear quadratic control with partial state observations and local costs. We propose the Zero-Order Distributed Policy Optimization algorithm (ZODPO), which learns linear local controllers in a distributed fashion by combining ideas from policy gradient methods, zero-order optimization, and consensus algorithms. In ZODPO, each agent estimates the global cost by consensus and then performs local policy gradient updates in parallel, based on zero-order gradient estimation. ZODPO requires only limited communication and storage, even in large-scale systems. Further, we investigate the nonasymptotic performance of ZODPO and show that the sample complexity to approach a stationary point is polynomial in the inverse of the error tolerance and in the problem dimensions, demonstrating the scalability of ZODPO. We also show that the controllers generated by ZODPO are stabilizing with high probability. Lastly, we numerically test ZODPO on a multi-zone HVAC system.
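
The two ingredients named in the abstract, consensus-based estimation of a global cost and a zero-order (derivative-free) gradient estimate, can be illustrated with a toy sketch. This is not the paper's ZODPO algorithm: the function names (`consensus_average`, `sample_unit_sphere`, `zero_order_step`), the one-point sphere-sampling estimator, the shared random generator, and the definition of the global cost as the average of the local costs are all assumptions made for illustration only.

```python
import math
import random


def consensus_average(values, weights, rounds):
    """Run `rounds` steps of linear consensus x <- W x.

    With a doubly stochastic weight matrix on a connected graph, every
    entry approaches the average of the initial values.
    """
    x = list(values)
    n = len(x)
    for _ in range(rounds):
        x = [sum(weights[i][j] * x[j] for j in range(n)) for i in range(n)]
    return x


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def zero_order_step(params, local_costs, weights, radius, step, rounds, rng):
    """One toy iteration: perturb, evaluate, agree on the cost, descend.

    params      -- list of per-agent parameter lists (hypothetical layout)
    local_costs -- local_costs[i](all_params) returns agent i's cost
    weights     -- consensus weight matrix (assumed doubly stochastic)
    """
    dim = sum(len(p) for p in params)
    joint_dir = sample_unit_sphere(dim, rng)

    # Split the joint perturbation direction into per-agent blocks.
    dirs, k = [], 0
    for p in params:
        dirs.append(joint_dir[k:k + len(p)])
        k += len(p)

    # Every agent applies its own block of the perturbation.
    perturbed = [[v + radius * d for v, d in zip(p, u)]
                 for p, u in zip(params, dirs)]

    # Each agent observes only its local cost, then the agents run
    # consensus so that each holds an estimate of the average cost.
    observed = [cost(perturbed) for cost in local_costs]
    estimates = consensus_average(observed, weights, rounds)

    # One-point zero-order gradient estimate, restricted to each agent's
    # block: (dim / radius) * cost_estimate * direction.
    new_params = []
    for p, u, est in zip(params, dirs, estimates):
        grad = [(dim / radius) * est * d for d in u]
        new_params.append([v - step * g for v, g in zip(p, grad)])
    return new_params
```

Calling `zero_order_step` repeatedly with a seeded `random.Random` and a small `step` gives a noisy descent on the averaged cost. A single one-point estimate has high variance, so any practical use of this sketch would need averaging over many samples or a small step size.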

research
12/06/2021

MDPGT: Momentum-based Decentralized Policy Gradient Tracking

We propose a novel policy gradient method for multi-agent reinforcement ...
research
06/18/2020

Cooperative Multi-Agent Reinforcement Learning with Partial Observations

In this paper, we propose a distributed zeroth-order policy optimization...
research
01/04/2021

Derivative-Free Policy Optimization for Risk-Sensitive and Robust Control Design: Implicit Regularization and Sample Complexity

Direct policy search serves as one of the workhorses in modern reinforce...
research
12/02/2020

Sample Complexity of Policy Gradient Finding Second-Order Stationary Points

The goal of policy-based reinforcement learning (RL) is to search the ma...
research
01/10/2022

Distributed Cooperative Multi-Agent Reinforcement Learning with Directed Coordination Graph

Existing distributed cooperative multi-agent reinforcement learning (MAR...
research
12/20/2018

Derivative-Free Methods for Policy Optimization: Guarantees for Linear Quadratic Systems

We study derivative-free methods for policy optimization over the class ...
research
09/12/2022

On the Optimization Landscape of Dynamic Output Feedback: A Case Study for Linear Quadratic Regulator

The convergence of policy gradient algorithms in reinforcement learning ...
