Making Non-Stochastic Control (Almost) as Easy as Stochastic

06/10/2020
by Max Simchowitz et al.

Recent literature has made much progress in understanding online LQR: a modern learning-theoretic take on the classical control problem in which a learner attempts to optimally control an unknown linear dynamical system with fully observed state, perturbed by i.i.d. Gaussian noise. It is now understood that the optimal regret over a time horizon T against the optimal control law scales as Θ(√T). In this paper, we show that the same regret rate (against a suitable benchmark) is attainable even in the considerably more general non-stochastic control model, where the system is driven by arbitrary adversarial noise (Agarwal et al. 2019). In other words, stochasticity confers little benefit in online LQR. We attain the optimal O(√T) regret when the dynamics are unknown to the learner, and poly(log T) regret when they are known, provided that the cost functions are strongly convex (as in LQR). Our algorithm is based on a novel variant of online Newton step (Hazan et al. 2007), which adapts to the geometry induced by possibly adversarial disturbances, and our analysis hinges on generic "policy regret" bounds for certain structured losses in the OCO-with-memory framework (Anava et al. 2015). Moreover, our results accommodate the full generality of the non-stochastic control setting: adversarially chosen (possibly non-quadratic) costs, partial state observation, and fully adversarial process and observation noise.
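For readers unfamiliar with online Newton step, the following is a minimal sketch of the classical algorithm of Hazan et al. (2007), not the novel variant developed in this paper. It assumes hypothetical strongly convex losses f_t(x) = ||x - c_t||^2 over a Euclidean ball, uses illustrative names (ons_on_quadratics, project_ball_A_norm, regret) that do not come from the paper, and ignores both the control dynamics and the memory structure of OCO with memory. Regret is measured against the best fixed point in hindsight, the standard OCO benchmark. The settings for gamma and eps follow the usual textbook choice for alpha-exp-concave losses.

```python
import numpy as np


def project_ball_A_norm(y, A, radius):
    """Generalized projection: argmin over ||x|| <= radius of (x - y)^T A (x - y).

    KKT conditions give x(lam) = (A + lam*I)^{-1} A y for a multiplier lam >= 0.
    ||x(lam)|| is decreasing in lam, so bisection finds the boundary solution.
    """
    if np.linalg.norm(y) <= radius:
        return y
    eye = np.eye(len(y))

    def x_of(lam):
        return np.linalg.solve(A + lam * eye, A @ y)

    lo, hi = 0.0, 1.0
    while np.linalg.norm(x_of(hi)) > radius:  # grow hi until x(hi) is feasible
        hi *= 2.0
    for _ in range(60):  # bisection on the multiplier
        mid = 0.5 * (lo + hi)
        if np.linalg.norm(x_of(mid)) > radius:
            lo = mid
        else:
            hi = mid
    return x_of(hi)  # feasible by construction


def ons_on_quadratics(centers, radius=1.0):
    """Classical ONS on hypothetical losses f_t(x) = ||x - c_t||^2 over a ball."""
    T, n = centers.shape
    G = 4.0 * radius        # bound on ||grad f_t(x)|| = ||2(x - c_t)||
    D = 2.0 * radius        # diameter of the ball
    alpha = 2.0 / G**2      # 2-strongly convex with G-bounded gradients => (2/G^2)-exp-concave
    gamma = 0.5 * min(1.0 / (G * D), alpha)
    eps = 1.0 / (gamma**2 * D**2)

    x = np.zeros(n)
    A = eps * np.eye(n)
    iterates = np.empty((T, n))
    for t, c in enumerate(centers):
        iterates[t] = x                                  # play x_t before c_t is revealed
        g = 2.0 * (x - c)                                # gradient of f_t at x_t
        A += np.outer(g, g)                              # rank-one curvature update
        y = x - (1.0 / gamma) * np.linalg.solve(A, g)    # Newton-style step
        x = project_ball_A_norm(y, A, radius)            # project back in the A-norm
    return iterates


def regret(iterates, centers, radius=1.0):
    """Regret against the best fixed point in hindsight (projected mean of centers)."""
    x_star = centers.mean(axis=0)
    norm = np.linalg.norm(x_star)
    if norm > radius:
        x_star = x_star * (radius / norm)
    learner = np.sum((iterates - centers) ** 2)
    best = np.sum((x_star - centers) ** 2)
    return learner - best


# Usage: a deterministic, non-stochastic sequence of centers with ||c_t|| <= 0.99.
T = 1000
t = np.arange(1, T + 1)
centers = 0.7 * np.column_stack([np.cos(0.3 * t), np.sin(0.7 * t)])
xs = ons_on_quadratics(centers)
reg = regret(xs, centers)
print(reg)
```

Here A accumulates outer products of past gradients, so steps shrink along directions in which gradients have been large. The paper's variant instead adapts to the geometry induced by the (possibly adversarial) disturbances; its exact update is not reproduced here.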


research
01/25/2020

Improper Learning for Non-Stochastic Control

We consider the problem of controlling a possibly unknown linear dynamic...
research
02/29/2020

Logarithmic Regret for Adversarial Online Control

We introduce a new algorithm for online linear-quadratic control in a kn...
research
02/07/2020

The Power of Linear Controllers in LQR Control

The Linear Quadratic Regulator (LQR) framework considers the problem of ...
research
01/27/2020

Naive Exploration is Optimal for Online LQR

We consider the problem of online adaptive control of the linear quadrat...
research
05/29/2019

Learning to Crawl

Web crawling is the problem of keeping a cache of webpages fresh, i.e., ...
research
02/06/2020

No-Regret Prediction in Marginally Stable Systems

We consider the problem of online prediction in a marginally stable line...
research
09/04/2019

Stochastic Linear Optimization with Adversarial Corruption

We extend the model of stochastic bandits with adversarial corruption (L...