Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently

by Asaf Cassel, et al.

We consider the problem of learning in Linear Quadratic Control systems whose transition parameters are initially unknown. Recent results in this setting have demonstrated efficient learning algorithms with regret growing with the square root of the number of decision steps. We present new efficient algorithms that achieve, perhaps surprisingly, regret that scales only (poly)logarithmically with the number of steps in two scenarios: when only the state transition matrix A is unknown, and when only the state-action transition matrix B is unknown and the optimal policy satisfies a certain non-degeneracy condition. On the other hand, we give a lower bound that shows that when the latter condition is violated, square root regret is unavoidable.
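For orientation, here is a minimal sketch of the benchmark that regret is measured against in this setting: the optimal linear policy when the parameters are known. It is not the paper's algorithm. It assumes the standard scalar form of the problem, with dynamics x_{t+1} = a·x_t + b·u_t + w_t and per-step cost q·x_t² + r·u_t², where a and b play the roles of the matrices A and B. The function name `scalar_lqr` and the cost weights `q` and `r` are illustrative and do not appear in the abstract.

```python
def scalar_lqr(a, b, q, r, iters=1000):
    """Iterate the scalar discrete-time Riccati recursion toward its fixed point.

    Returns (k, p): the gain of the policy u = -k * x and the Riccati
    solution p. Assumes q > 0 and r > 0 (illustrative cost weights).
    """
    p = q
    for _ in range(iters):
        # Scalar Riccati update: p = q + a^2 p - (a b p)^2 / (r + b^2 p)
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    # Optimal gain for the (approximately) converged p
    k = a * b * p / (r + b * b * p)
    return k, p


# Hypothetical parameters chosen only for illustration.
a, b = 1.0, 1.0
k, p = scalar_lqr(a, b, q=1.0, r=1.0)
closed_loop = a - b * k  # magnitude below 1 means the controlled system is stable
```

Under these assumptions, and with noise of variance σ², the optimal long-run average cost is σ²·p. Regret after T steps is the learner's accumulated cost minus T times that value. A learner that does not know a (or b) has to estimate it from observed transitions while controlling the system, which is where the regret arises.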




Minimal Expected Regret in Linear Quadratic Control

We consider the problem of online learning in Linear Quadratic Control s...

Learning Linear-Quadratic Regulators Efficiently with only √(T) Regret

We present the first computationally-efficient algorithm with O(√(T)) r...

On Uninformative Optimal Policies in Adaptive LQR with Unknown B-Matrix

This paper presents local asymptotic minimax regret lower bounds for ada...

Scalable regret for learning to control network-coupled subsystems with unknown dynamics

We consider the problem of controlling an unknown linear quadratic Gauss...

Learning to Control in Metric Space with Optimal Regret

We study online reinforcement learning for finite-horizon deterministic ...

Adaptive Control of Quadratic Costs in Linear Stochastic Differential Equations

We study a canonical problem in adaptive control; design and analysis of...

Regret Lower Bounds for Learning Linear Quadratic Gaussian Systems

This paper presents local minimax regret lower bounds for adaptively con...