Adaptive Control of Quadratic Costs in Linear Stochastic Differential Equations
We study a canonical problem in adaptive control: the design and analysis of policies for minimizing quadratic costs in unknown continuous-time linear dynamical systems. We address important challenges, including the accuracy of learning the unknown parameters of the underlying stochastic differential equation and a full analysis of the performance degradation due to sub-optimal actions (i.e., regret). We then propose an easy-to-implement algorithm for balancing exploration versus exploitation, followed by theoretical guarantees showing a regret bound that grows as the square root of time. Further, we present tight results for assuring system stability and for specifying fundamental limits for regret. To establish the presented results, multiple novel technical frameworks are developed, which can be of independent interest.
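For intuition about the benchmark against which regret is measured, the following is a minimal sketch of the known-parameter optimal policy in the scalar case. It assumes dynamics dx = (a x + b u) dt + dW and running cost q x^2 + r u^2; the symbols a, b, q, r, the function name, and the restriction to one dimension are illustrative assumptions and are not taken from the abstract.

```python
import math

def scalar_lq_solution(a, b, q, r):
    """Closed-form solution of the scalar continuous-time Riccati equation
    2*a*p - (b**2 / r) * p**2 + q = 0, taking the positive root.
    Hypothetical helper for illustration only (assumes b != 0, q > 0, r > 0)."""
    p = r * (a + math.sqrt(a * a + b * b * q / r)) / (b * b)
    gain = b * p / r            # optimal linear feedback: u = -gain * x
    closed_loop = a - b * gain  # drift coefficient of the controlled state
    return p, gain, closed_loop

# Example: an open-loop unstable system (a > 0) is stabilized by the feedback.
p, gain, closed_loop = scalar_lq_solution(a=1.0, b=1.0, q=1.0, r=1.0)
riccati_residual = 2 * 1.0 * p - (1.0 ** 2 / 1.0) * p ** 2 + 1.0
```

In this sketch the closed-loop drift equals -sqrt(a^2 + b^2 q / r), which is negative whenever b is nonzero, so the controlled state is stable. An adaptive policy does not know a and b and must estimate them from the observed trajectory, which is where the exploration versus exploitation trade-off arises.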