In the stochastic linear contextual bandit setting there exist several
m...
Since its introduction a decade ago, relative entropy policy search
(REP...
This work extends the analysis of the theoretical results presented with...
Online learning is a powerful tool for analyzing iterative algorithms.
H...
Existing on-policy imitation learning algorithms, such as DAgger, assume...
We study the dynamic regret of a new class of online learning problems, ...
Generalizing manipulation skills to new situations requires extracting
i...
On-policy imitation learning algorithms such as DAgger evolve a robot co...
Learning from human demonstrations can facilitate automation but is risk...
Learning to accomplish tasks such as driving, grasping or surgery from
s...