Learning Zero-Sum Linear Quadratic Games with Improved Sample Complexity

09/08/2023
by Jiduan Wu, et al.

Zero-sum Linear Quadratic (LQ) games are fundamental in optimal control and can be used (i) as a dynamic game formulation for risk-sensitive or robust control, or (ii) as a benchmark setting for multi-agent reinforcement learning with two competing agents in continuous state-control spaces. In contrast to the well-studied single-agent linear quadratic regulator problem, zero-sum LQ games entail solving a challenging nonconvex-nonconcave min-max problem with an objective function that lacks coercivity. Recently, Zhang et al. discovered an implicit regularization property of natural policy gradient methods, which is crucial for safety-critical control systems because it preserves the robustness of the controller during learning. Moreover, in the model-free setting, where the model parameters are unknown, Zhang et al. proposed the first polynomial sample complexity algorithm that reaches an ϵ-neighborhood of the Nash equilibrium while maintaining the desirable implicit regularization property. In this work, we propose a simpler nested Zeroth-Order (ZO) algorithm that improves the sample complexity by several orders of magnitude. Our main result guarantees an 𝒪(ϵ^{-3}) sample complexity under the same assumptions using a single-point ZO estimator. Furthermore, when the estimator is replaced by a two-point estimator, our method enjoys a better 𝒪(ϵ^{-2}) sample complexity. Our key improvements rely on a more sample-efficient nested algorithm design and finer control of the ZO natural gradient estimation error.
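To make the distinction between single-point and two-point ZO estimators concrete, the following is a minimal sketch of the standard sphere-smoothing gradient estimators on a toy quadratic. It is not the nested natural policy gradient algorithm of the paper: the function names (`sample_unit_sphere`, `zo_gradient`, `quadratic`), the smoothing radius, and the sample counts are all illustrative assumptions. It only shows why querying the objective at two symmetric points per direction typically gives a much lower-variance estimate than a single query.

```python
import math
import random


def sample_unit_sphere(dim, rng):
    """Draw a direction uniformly from the unit sphere in R^dim."""
    g = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(v * v for v in g))
    return [v / norm for v in g]


def zo_gradient(f, x, radius, num_samples, rng, two_point=False):
    """Estimate the gradient of f at x from function values only.

    Single-point: (dim / radius) * f(x + radius * u) * u
    Two-point:    (dim / (2 * radius)) * (f(x + radius * u) - f(x - radius * u)) * u
    averaged over num_samples random unit directions u.
    """
    dim = len(x)
    grad = [0.0] * dim
    for _ in range(num_samples):
        u = sample_unit_sphere(dim, rng)
        x_plus = [xi + radius * ui for xi, ui in zip(x, u)]
        if two_point:
            x_minus = [xi - radius * ui for xi, ui in zip(x, u)]
            scale = dim * (f(x_plus) - f(x_minus)) / (2.0 * radius)
        else:
            scale = dim * f(x_plus) / radius
        for i in range(dim):
            grad[i] += scale * u[i] / num_samples
    return grad


def quadratic(x):
    """Toy objective (an assumption for illustration): f(x) = ||x||^2."""
    return sum(v * v for v in x)


def l2_error(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


rng = random.Random(0)
x0 = [1.0, -2.0, 0.5]
true_grad = [2.0 * v for v in x0]  # exact gradient of ||x||^2

g_single = zo_gradient(quadratic, x0, radius=0.1, num_samples=20000, rng=rng)
g_two = zo_gradient(quadratic, x0, radius=0.1, num_samples=20000, rng=rng,
                    two_point=True)

err1 = l2_error(g_single, true_grad)
err2 = l2_error(g_two, true_grad)
print("single-point error:", err1)
print("two-point error:   ", err2)
```

With the same number of sampled directions, the two-point error should come out noticeably smaller on this toy problem, because the function-value offset cancels in the difference. This is the intuition behind the gap between the two sample complexity rates quoted above, although the actual analysis concerns natural gradient estimates in the LQ game setting.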

