Online Learning for Receding Horizon Control with Provable Regret Guarantees

11/30/2021
by Deepan Muthirayan, et al.

We address the problem of learning to control an unknown linear dynamical system with time-varying cost functions through the framework of online Receding Horizon Control (RHC). We consider the setting where the control algorithm does not know the true system model and has access only to a fixed-length preview of the future cost functions, that is, a preview whose length does not grow with the control horizon. We characterize the performance of an algorithm by its dynamic regret, defined as the difference between the cumulative cost incurred by the algorithm and that of the best sequence of actions in hindsight. We propose two online RHC algorithms for this problem: the Certainty Equivalence RHC (CE-RHC) algorithm and the Optimistic RHC (O-RHC) algorithm. We show that, under the standard stability assumption for the model estimate, the CE-RHC algorithm achieves 𝒪(T^{2/3}) dynamic regret. We then extend this result to the setting where the stability assumption holds only for the true system model by proposing the O-RHC algorithm. We show that the O-RHC algorithm also achieves 𝒪(T^{2/3}) dynamic regret, at the cost of some additional computation.
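To make the dynamic regret metric concrete, the following is a minimal sketch, not the paper's CE-RHC or O-RHC algorithm. It assumes a scalar system x_{t+1} = a x_t + b u_t, quadratic stage costs q_t x^2 + r u^2 with time-varying q_t, and a fixed (not learned online) model estimate (a_hat, b_hat). All function names and parameter values are hypothetical. The controller replans at every step over a fixed-length preview window using the estimated model, and its cumulative cost is compared with that of the best action sequence in hindsight, computed here with the true model and the full cost sequence.

```python
def first_gain(a, b, qs, r):
    """Backward Riccati recursion for a scalar finite-horizon LQ problem.

    qs holds the previewed state-cost weights; there is no terminal cost.
    Returns the feedback gain k for the first stage, so that u = -k * x.
    """
    p = 0.0  # cost-to-go weight beyond the preview window
    k = 0.0
    for q in reversed(qs):
        denom = r + b * b * p
        k = a * b * p / denom
        p = q + a * a * p - (a * b * p) ** 2 / denom
    return k


def rhc_cost(a, b, a_hat, b_hat, qs, r, x0, window):
    """Cumulative cost of receding horizon control on the true system (a, b).

    At each step the controller plans with the estimate (a_hat, b_hat) over
    the next `window` cost weights and applies only the first control.
    """
    x, total = x0, 0.0
    for t in range(len(qs)):
        k = first_gain(a_hat, b_hat, qs[t:t + window], r)
        u = -k * x
        total += qs[t] * x * x + r * u * u
        x = a * x + b * u  # the true dynamics, unknown to the controller
    return total


def dynamic_regret(a, b, a_hat, b_hat, qs, r, x0, window):
    """Algorithm cost minus the cost of the best action sequence in hindsight."""
    # Hindsight benchmark: true model and a preview covering the whole horizon.
    best = rhc_cost(a, b, a, b, qs, r, x0, len(qs))
    return rhc_cost(a, b, a_hat, b_hat, qs, r, x0, window) - best


# Hypothetical example values, chosen only for illustration.
T = 50
qs = [1.0 + 0.5 * (t % 5) for t in range(T)]  # time-varying cost weights
regret = dynamic_regret(a=1.2, b=1.0, a_hat=1.1, b_hat=0.9,
                        qs=qs, r=1.0, x0=1.0, window=5)
print("dynamic regret of the sketch controller:", regret)
```

In this sketch the regret comes from two sources the abstract names: the model mismatch between (a_hat, b_hat) and (a, b), and the preview window being shorter than the horizon. With the true model and a window covering the full horizon, the regret is zero up to rounding.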


