Regret Analysis of Learning-Based MPC with Partially-Unknown Cost Function

08/04/2021
by Ilgin Dogan, et al.

The exploration/exploitation trade-off is an inherent challenge in data-driven and adaptive control. Although this trade-off has been studied for multi-armed bandits, reinforcement learning (RL) for finite Markov chains, and RL for linear control systems, it is less well studied for learning-based control of nonlinear control systems. A significant theoretical challenge in the nonlinear setting is that, unlike the linear case, there is no explicit characterization of an optimal controller for a given set of cost and system parameters. In this paper, we propose using a finite-horizon oracle controller with perfect knowledge of all system parameters as a reference for optimal control actions. First, this allows us to propose a new notion of regret with respect to this oracle finite-horizon controller. Second, it allows us to develop a learning-based policy that we prove achieves low regret (i.e., square-root regret up to a log-squared factor) with respect to this oracle finite-horizon controller. This policy is developed in the context of learning-based model predictive control (LBMPC). We conduct a statistical analysis to prove finite-sample concentration bounds for the estimation step of our policy, and then we perform a control-theoretic analysis using techniques from MPC and optimization theory to show that this policy ensures closed-loop stability and achieves low regret. We conclude with numerical experiments on a model of heating, ventilation, and air-conditioning (HVAC) systems that show the low regret of our policy in a setting where the cost function is partially unknown to the controller.
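As a rough illustration of what regret against an oracle controller measures, the sketch below accumulates the per-step cost gap between a learning policy and an oracle, and evaluates a square-root-times-log-squared envelope for comparison. This is not the paper's definition: the function names `cumulative_regret` and `sqrt_log2_envelope`, the `scale` constant, and the use of a simple per-step cost difference are all assumptions made for illustration only.

```python
import math


def cumulative_regret(policy_costs, oracle_costs):
    """Running sum of per-step cost gaps (policy minus oracle).

    Hypothetical helper: assumes both controllers report one scalar
    stage cost per time step over the same horizon.
    """
    if len(policy_costs) != len(oracle_costs):
        raise ValueError("cost sequences must have the same length")
    total = 0.0
    regret = []
    for c_policy, c_oracle in zip(policy_costs, oracle_costs):
        total += c_policy - c_oracle
        regret.append(total)
    return regret


def sqrt_log2_envelope(T, scale=1.0):
    """Value of scale * sqrt(T) * log(T)^2 for T >= 1.

    The constant `scale` is an illustrative assumption; the abstract
    only states the growth rate, not any constant.
    """
    if T < 1:
        raise ValueError("T must be at least 1")
    return scale * math.sqrt(T) * math.log(T) ** 2
```

Under these assumptions, a policy has low regret in the abstract's sense if the last entry returned by `cumulative_regret` grows no faster than `sqrt_log2_envelope(T)` as the horizon `T` increases.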

research
07/28/2021

Competitive Control

We consider control from the perspective of competitive analysis. Unlike...
research
11/26/2020

Regret Bounds for Adaptive Nonlinear Control

We study the problem of adaptively controlling a known discrete-time non...
research
10/21/2020

Meta-Learning Guarantees for Online Receding Horizon Control

In this paper we provide provable regret guarantees for an online meta-l...
research
08/30/2020

A Meta-Learning Control Algorithm with Provable Finite-Time Guarantees

In this work we provide provable regret guarantees for an online meta-le...
research
02/23/2021

Recurrent Model Predictive Control

This paper proposes an off-line algorithm, called Recurrent Model Predic...
research
01/07/2020

Infinite-Horizon Differentiable Model Predictive Control

This paper proposes a differentiable linear quadratic Model Predictive C...
research
05/08/2015

An Asymptotically Optimal Policy for Uniform Bandits of Unknown Support

Consider the problem of a controller sampling sequentially from a finite...