Efficient Optimistic Exploration in Linear-Quadratic Regulators via Lagrangian Relaxation

07/13/2020
by Marc Abeille, et al.

We study the exploration-exploitation dilemma in the linear quadratic regulator (LQR) setting. Inspired by the extended value iteration algorithm used in optimistic algorithms for finite MDPs, we propose to relax the optimistic optimization of OFU-LQ and cast it into a constrained extended LQR problem, where an additional control variable implicitly selects the system dynamics within a confidence interval. We then move to the corresponding Lagrangian formulation, for which we prove strong duality. As a result, we show that an ϵ-optimistic controller can be computed efficiently by solving at most O(log(1/ϵ)) Riccati equations. Finally, we prove that relaxing the original problem does not impact the learning performance, thus recovering the Õ(√(T)) regret of OFU-LQ. To the best of our knowledge, this is the first computationally efficient confidence-based algorithm for LQR with worst-case optimal regret guarantees.
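To give a feel for why a Lagrangian formulation leads to a count of O(log(1/ϵ)) Riccati solves, here is a minimal toy sketch. It is not the paper's extended LQR or its algorithm: it assumes a scalar system x' = a x + b u + w with unit-variance noise, a hypothetical control-energy constraint E[u²] ≤ budget, and a bisection over the multiplier μ in which each step solves one scalar Riccati equation with control penalty r + μ. All function names and numerical values are illustrative assumptions.

```python
import math


def solve_scalar_dare(a, b, q, r, tol=1e-12, max_iter=10_000):
    """Fixed-point iteration on the scalar discrete-time Riccati equation
    p = q + a^2 p - (a b p)^2 / (r + b^2 p)."""
    p = q
    for _ in range(max_iter):
        p_next = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
        if abs(p_next - p) < tol:
            return p_next
        p = p_next
    raise RuntimeError("Riccati iteration did not converge")


def control_energy(a, b, q, r_eff):
    """Stationary E[u^2] of the LQR controller u = -k x designed with
    control penalty r_eff (toy assumption: unit-variance process noise)."""
    p = solve_scalar_dare(a, b, q, r_eff)
    k = a * b * p / (r_eff + b * b * p)      # optimal feedback gain
    a_cl = a - b * k                         # closed-loop dynamics
    state_var = 1.0 / (1.0 - a_cl * a_cl)    # stationary E[x^2]
    return k * k * state_var


def bisect_multiplier(a, b, q, r, budget, mu_hi=100.0, eps=1e-6):
    """Bisection on the multiplier mu of the constraint E[u^2] <= budget.
    Assumes the constraint is violated at mu = 0 and satisfied at mu_hi.
    Returns the multiplier estimate and the number of Riccati solves."""
    mu_lo, solves = 0.0, 0
    while mu_hi - mu_lo > eps:
        mu = 0.5 * (mu_lo + mu_hi)
        solves += 1                          # one Riccati solve per step
        if control_energy(a, b, q, r + mu) > budget:
            mu_lo = mu                       # constraint violated: raise mu
        else:
            mu_hi = mu                       # constraint met: lower mu
    return 0.5 * (mu_lo + mu_hi), solves


if __name__ == "__main__":
    mu, solves = bisect_multiplier(a=1.0, b=1.0, q=1.0, r=1.0, budget=0.2)
    print(f"multiplier ~ {mu:.6f} after {solves} Riccati solves")
```

In this toy instance the constraint is tight at μ = 5 (effective penalty 6, Riccati solution p = 3, gain k = 1/3), and the number of Riccati solves grows as log2(mu_hi/ϵ), which mirrors the logarithmic dependence on 1/ϵ stated in the abstract. The paper's actual procedure operates on the extended LQR with a confidence-set constraint and should be taken from the full text.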

02/17/2019

Learning Linear-Quadratic Regulators Efficiently with only √(T) Regret

We present the first computationally-efficient algorithm with O(√(T)) r...
10/23/2020

An Asymptotically Optimal Primal-Dual Incremental Algorithm for Contextual Linear Bandits

In the contextual linear bandit setting, algorithms built on the optimis...
08/30/2020

A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees

In this work we provide provable regret guarantees for an online meta-le...
06/19/2020

Learning Controllers for Unstable Linear Quadratic Regulators from a Single Trajectory

We present the first approach for learning – from a single trajectory – ...
10/21/2020

Meta-Learning Guarantees for Online Receding Horizon Control

In this paper we provide provable regret guarantees for an online meta-l...
06/17/2022

Thompson Sampling Achieves Õ(√(T)) Regret in Linear Quadratic Control

Thompson Sampling (TS) is an efficient method for decision-making under ...
11/10/2019

On the Equivalence of SDP Feasibility and a Convex Hull Relaxation for System of Quadratic Equations

We show semidefinite programming (SDP) feasibility problem is equivalen...