On Optimality of Adaptive Linear-Quadratic Regulators
Adaptive regulation of linear systems is a canonical problem in stochastic control. The performance of adaptive control policies is assessed through regret with respect to the optimal regulator, which reflects the increase in operating cost due to uncertainty about the parameters driving the system's dynamics. However, available results in the literature do not provide a sharp quantitative characterization of the effect of the unknown dynamics parameters on the regret. Further, the adaptive policies proposed in the literature raise implementation concerns. Finally, results on the accuracy with which the system's parameters are identified are scarce and rather incomplete. This study aims to comprehensively address these three issues. First, by introducing a novel decomposition of adaptive policies, we establish a sharp expression for the regret of an arbitrary policy in terms of its deviations from the optimal regulator. Second, we show that adaptive policies based on a slight modification of the widely used Certainty Equivalence scheme are optimal. Specifically, we establish regret of (nearly) square-root rate for two families of randomized adaptive policies. The presented regret bounds are obtained using anti-concentration results on the random matrices employed when randomizing the estimates of the unknown dynamics parameters. Moreover, we study the minimal additional information on the dynamics matrices under which the regret becomes of logarithmic order. Finally, we specify the rate at which the proposed adaptive policies identify the unknown parameters of the system.
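To make the Certainty Equivalence scheme with randomized estimates concrete, the following is a minimal Python sketch, not the paper's exact algorithm: the closed-loop dynamics matrices are estimated by least squares from the state-input history, the estimate is perturbed by a Gaussian randomization whose magnitude decays over time (the `sigma_rand / sqrt(t)` schedule is an illustrative choice, not taken from the paper), and the regulator is redesigned as if the randomized estimate were the truth. The system matrices, episode schedule, and noise scales below are all hypothetical.

```python
import numpy as np
from scipy.linalg import solve_discrete_are


def dare_gain(A, B, Q, R):
    """Solve the discrete algebraic Riccati equation and return the
    certainty-equivalent LQ feedback gain K, with u_t = K x_t."""
    P = solve_discrete_are(A, B, Q, R)
    return -np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)


def randomized_ce_lqr(A, B, Q, R, T=2000, sigma_rand=0.1, seed=0):
    """Run a randomized certainty-equivalence adaptive regulator on the
    (unknown to the controller) true system x_{t+1} = A x_t + B u_t + w_t."""
    rng = np.random.default_rng(seed)
    n, m = B.shape
    x = np.zeros(n)
    Z_hist, Y_hist = [], []       # regressors [x_t, u_t] and targets x_{t+1}
    K = np.zeros((m, n))          # initial uninformative gain
    for t in range(T):
        u = K @ x + 0.1 * rng.standard_normal(m)   # small exploration noise
        x_next = A @ x + B @ u + rng.standard_normal(n)
        Z_hist.append(np.concatenate([x, u]))
        Y_hist.append(x_next)
        x = x_next
        # Redesign at episode ends t = 2^k - 1, once enough data is collected.
        if t > n + m and (t & (t + 1)) == 0:
            Z, Y = np.array(Z_hist), np.array(Y_hist)
            Theta = np.linalg.lstsq(Z, Y, rcond=None)[0].T   # [A_hat, B_hat]
            # Randomize the estimate before the certainty-equivalent design;
            # the decaying scale is an assumption for illustration.
            Theta += sigma_rand / np.sqrt(t + 1) * rng.standard_normal(Theta.shape)
            A_hat, B_hat = Theta[:, :n], Theta[:, n:]
            try:
                K = dare_gain(A_hat, B_hat, Q, R)
            except (np.linalg.LinAlgError, ValueError):
                pass  # keep the previous gain if the Riccati solve fails
    return K


# Hypothetical example system: a noisy double integrator.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q, R = np.eye(2), np.eye(1)
K = randomized_ce_lqr(A, B, Q, R)
```

The randomization step is what distinguishes this from plain Certainty Equivalence: perturbing the least-squares estimate before solving the Riccati equation supplies the exploration that, per the abstract, is analyzed through anti-concentration of the resulting random matrices.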