Bandit Linear Control

07/01/2020
by Asaf Cassel, et al.

We consider the problem of controlling a known linear dynamical system under stochastic noise, adversarially chosen costs, and bandit feedback. Unlike the full feedback setting where the entire cost function is revealed after each decision, here only the cost incurred by the learner is observed. We present a new and efficient algorithm that, for strongly convex and smooth costs, obtains regret that grows with the square root of the time horizon T. We also give extensions of this result to general convex, possibly non-smooth costs, and to non-stochastic system noise. A key component of our algorithm is a new technique for addressing bandit optimization of loss functions with memory.
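To make the feedback model concrete, the following is a minimal sketch of the interaction protocol only, not the algorithm from the paper. It assumes a scalar system with known parameters `A` and `B`, Gaussian noise, and a fixed linear policy `u = -K * x`. The function name, parameter values, and quadratic costs are hypothetical choices made for illustration. The point it shows is that the learner receives one scalar cost per round and never sees the cost function itself.

```python
import random


def simulate_bandit_control(A, B, K, costs, noise_std=0.1, seed=0):
    """Roll out the scalar system x_{t+1} = A*x_t + B*u_t + w_t.

    The control is a fixed linear policy u_t = -K*x_t (an assumption made
    for illustration). Returns only the scalar costs the learner observes.
    """
    rng = random.Random(seed)
    x = 0.0
    observed = []
    for cost in costs:
        u = -K * x                     # learner's decision for this round
        observed.append(cost(x, u))    # bandit feedback: one number, not `cost`
        w = rng.gauss(0.0, noise_std)  # stochastic system noise
        x = A * x + B * u + w          # known linear dynamics
    return observed


# Hypothetical time-varying quadratic costs (strongly convex and smooth).
# The default argument q freezes a different weight into each round's cost.
costs = [
    (lambda x, u, q=1.0 + 0.5 * (t % 2): q * x * x + u * u)
    for t in range(100)
]

feedback = simulate_bandit_control(A=0.9, B=1.0, K=0.5, costs=costs)
total_cost = sum(feedback)
```

In the full-feedback setting the learner would instead be handed each `cost` callable after acting and could evaluate it at other points. Here any gradient information has to be inferred from the scalar values in `feedback`.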

research
08/12/2020

Non-Stochastic Control with Bandit Feedback

We study the problem of controlling a linear dynamical system with adver...
research
10/25/2020

Geometric Exploration for Online Control

We study the control of an unknown linear dynamical system under general...
research
02/12/2022

Adaptive Bandit Convex Optimization with Heterogeneous Curvature

We consider the problem of adversarial bandit convex optimization, that ...
research
05/26/2022

On stochastic stabilization via non-smooth control Lyapunov functions

Control Lyapunov function is a central tool in stabilization. It general...
research
03/02/2022

Efficient Online Linear Control with Stochastic Convex Costs and Unknown Dynamics

We consider the problem of controlling an unknown linear dynamical syste...
research
06/02/2016

Stochastic Structured Prediction under Bandit Feedback

Stochastic structured prediction under bandit feedback follows a learnin...
research
11/27/2019

The Nonstochastic Control Problem

We consider the problem of controlling an unknown linear dynamical syste...
