
Online Policy Gradient for Model Free Learning of Linear Quadratic Regulators with √(T) Regret
We consider the task of learning to control a linear dynamical system un...
The Pendulum Arrangement: Maximizing the Escape Time of Heterogeneous Random Walks
We identify a fundamental phenomenon of heterogeneous one dimensional ra...
Bandit Linear Control
We consider the problem of controlling a known linear dynamical system u...
Logarithmic Regret for Learning Linear Quadratic Regulators Efficiently
We consider the problem of learning in Linear Quadratic Control systems ...
A General Approach to Multi-Armed Bandits Under Risk Criteria
Different risk-related criteria have received recent interest in learnin...
Asaf Cassel
