Information Theoretic Regret Bounds for Online Nonlinear Control

06/22/2020
by   Sham Kakade, et al.

This work studies the problem of sequential control in an unknown, nonlinear dynamical system, where we model the underlying system dynamics as an unknown function in a known Reproducing Kernel Hilbert Space. This framework yields a general setting that permits discrete and continuous control inputs as well as non-smooth, non-differentiable dynamics. Our main result, the Lower Confidence-based Continuous Control (LC^3) algorithm, enjoys a near-optimal O(√T) regret bound against the optimal controller in episodic settings, where T is the number of episodes. The bound has no explicit dependence on the dimension of the system dynamics, which could be infinite, but instead depends only on information-theoretic quantities. We empirically show its application to a number of nonlinear control tasks and demonstrate the benefit of exploration for learning model dynamics.
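
For orientation, the episodic regret mentioned above can be written out as follows. This is a sketch under assumed notation that does not appear in the abstract: J(π) stands for the expected total cost of a controller π over one episode, π_t for the controller played in episode t, and π* for the optimal controller. The cost-minimization convention is also an assumption, suggested by the name "Lower Confidence".

```latex
% Assumed notation (not from the abstract):
%   J(\pi)     expected total cost of controller \pi over one episode
%   \pi_t      controller played in episode t
%   \pi^\star  optimal controller
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} \bigl( J(\pi_t) - J(\pi^\star) \bigr),
\qquad
\mathrm{Regret}_T = O\bigl(\sqrt{T}\bigr)
\;\Longrightarrow\;
\frac{\mathrm{Regret}_T}{T} = O\!\left(\frac{1}{\sqrt{T}}\right).
```

Under this reading, an O(√T) bound means the average per-episode gap to the optimal controller shrinks to zero as the number of episodes grows.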


research
05/30/2018

An Information-Theoretic Analysis for Thompson Sampling with Many Actions

Information-theoretic Bayesian regret bounds of Russo and Van Roy captur...
research
07/16/2021

Robust Online Control with Model Misspecification

We study online control of an unknown nonlinear dynamical system that is...
research
06/07/2021

Random features for adaptive nonlinear control and prediction

A key assumption in the theory of adaptive control for nonlinear systems...
research
01/31/2012

Empowerment for Continuous Agent-Environment Systems

This paper develops generalizations of empowerment to continuous states....
research
02/23/2019

Online Control with Adversarial Disturbances

We study the control of a linear dynamical system with adversarial distu...
research
03/19/2021

Towards a Dimension-Free Understanding of Adaptive Linear Control

We study the problem of adaptive control of the linear quadratic regulat...
research
06/01/2020

Nonlinear observability algorithms with known and unknown inputs: analysis and implementation

The observability of a dynamical system is affected by the presence of e...
