Augmented RBMLE-UCB Approach for Adaptive Control of Linear Quadratic Systems
We consider the problem of controlling a stochastic linear system with quadratic costs when its system parameters are not known to the agent, a setting called the adaptive LQG control problem. We re-examine an approach called "Reward-Biased Maximum Likelihood Estimate" (RBMLE), which was proposed more than forty years ago and predates both the "Upper Confidence Bound" (UCB) method and the definition of "regret". It simply adds a term favoring parameters with larger rewards to the estimation criterion. We propose an augmented approach that combines the penalty of the RBMLE method with the constraint of the UCB method, uniting the two approaches to optimization in the face of uncertainty. We first establish theoretically that this method retains 𝒪(√T) regret, the best known so far. We then show through a comprehensive simulation study that this augmented RBMLE method considerably outperforms the UCB and Thompson sampling approaches, typically achieving less than half the regret of the better of the two. The simulation study includes all examples from earlier papers as well as a large collection of randomly generated systems.
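To make the idea of a reward-biased estimate restricted to a UCB-style confidence set concrete, here is a minimal sketch of one possible reading. It is not the authors' algorithm, and every specific below is an illustrative assumption. The system is scalar, x_{t+1} = a x_t + b u_t + w_t with w_t ~ N(0, 1), and the stage cost is x² + u². The unknown (a, b) is drawn from a hypothetical finite grid `CANDIDATES`. The confidence set is taken to be the candidates whose log-likelihood lies within a hypothetical radius `beta` of the maximum. The bias weight α(t) = √(t+1) is also a hypothetical schedule. Because costs are minimized, "larger reward" means a smaller optimal average cost J*(θ), so the biased criterion maximized over the confidence set is log L(θ) − α(t) J*(θ).

```python
import functools
import math
import random

# Hypothetical parameter grid: a in {0.5, ..., 1.2}, b in {0.2, ..., 0.8} (b > 0 keeps every candidate stabilizable).
CANDIDATES = [(i / 10, j / 10) for i in range(5, 13) for j in range(2, 9)]


@functools.lru_cache(maxsize=None)
def riccati_scalar(a, b, q=1.0, r=1.0, iters=1000):
    """Solve the scalar discrete-time algebraic Riccati equation by value iteration."""
    p = q
    for _ in range(iters):
        p = q + a * a * p - (a * b * p) ** 2 / (r + b * b * p)
    return p


def gain(a, b, q=1.0, r=1.0):
    """Optimal feedback gain K for the control law u = -K x under candidate (a, b)."""
    p = riccati_scalar(a, b, q, r)
    return a * b * p / (r + b * b * p)


def avg_cost(a, b, sigma2=1.0, q=1.0, r=1.0):
    """Optimal long-run average cost J*(a, b) = sigma^2 * P for the scalar system."""
    return sigma2 * riccati_scalar(a, b, q, r)


def log_likelihood(theta, data, sigma2=1.0):
    """Gaussian log-likelihood of theta = (a, b), dropping theta-independent constants."""
    a, b = theta
    sse = sum((x1 - a * x - b * u) ** 2 for x, u, x1 in data)
    return -sse / (2.0 * sigma2)


def rbmle_ucb_select(data, candidates, t, sigma2=1.0, beta=5.0):
    """Reward-biased ML estimate restricted to a likelihood-based confidence set."""
    lls = {th: log_likelihood(th, data, sigma2) for th in candidates}
    best = max(lls.values())
    # UCB-style constraint (assumed form): keep candidates within beta of the max log-likelihood.
    conf_set = [th for th in candidates if lls[th] >= best - beta]
    alpha = math.sqrt(t + 1)  # hypothetical bias schedule alpha(t)
    # RBMLE-style penalty: favor candidates with smaller optimal cost (larger reward).
    return max(conf_set, key=lambda th: lls[th] - alpha * avg_cost(*th, sigma2=sigma2))


def run(a_true=0.9, b_true=0.5, T=200, seed=0):
    """Simulate the adaptive loop; returns the cumulative cost over T steps."""
    rng = random.Random(seed)
    data, x, total = [], 0.0, 0.0
    for t in range(T):
        a_hat, b_hat = rbmle_ucb_select(data, CANDIDATES, t)
        u = -gain(a_hat, b_hat) * x  # certainty-equivalent control for the selected parameters
        total += x * x + u * u
        x_next = a_true * x + b_true * u + rng.gauss(0.0, 1.0)
        data.append((x, u, x_next))
        x = x_next
    return total
```

Setting `beta=0` reduces the rule to plain maximum likelihood (certainty equivalence), while a large `beta` lets the reward bias dominate the choice. The grid, noise level, and both schedules are illustrative choices for this sketch rather than values taken from the paper.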