Continuous Control with Contexts, Provably

10/30/2019
by Simon S. Du, et al.

A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of an unseen environment as input and generates a policy accordingly. This paper studies how to build a decoder for a fundamental continuous control task, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses an upper confidence bound (UCB) to refine the estimate of the decoder and to balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys an O(√T) regret bound in the online setting, where T is the number of environments the agent has played. This also implies that after playing O(1/ϵ^2) environments, the agent is able to transfer the learned knowledge to obtain an ϵ-suboptimal policy for an unseen environment. To our knowledge, this is the first provably efficient algorithm to build a decoder in the continuous control setting. While our main focus is theoretical, we also present experiments that demonstrate the effectiveness of our algorithm.
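
For readers unfamiliar with LQR, the following is a minimal background sketch of the single-environment problem that a decoder's output policy would have to solve. It is not the paper's algorithm: it has no context, no decoder, and no UCB step. The system matrices A, B and cost matrices Q, R are illustrative assumptions (a discretized double integrator), and the function name `lqr_gain` is hypothetical. The sketch iterates the standard discrete-time Riccati recursion to obtain a linear feedback policy u = -K x.

```python
import numpy as np

def lqr_gain(A, B, Q, R, iters=500):
    """Iterate the discrete-time Riccati recursion; return (K, P) for u = -K x."""
    P = Q.copy()
    for _ in range(iters):
        BtP = B.T @ P
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + BtP @ B, BtP @ A)
        # P = Q + A^T P A - A^T P B K, written compactly
        P = Q + A.T @ P @ (A - B @ K)
    return K, P

# Assumed example system: position/velocity state, force input, step 0.1.
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
Q = np.eye(2)            # state cost (assumption)
R = np.array([[1.0]])    # control cost (assumption)

K, P = lqr_gain(A, B, Q, R)

# Spectral radius of the closed-loop matrix A - B K; below 1 means stable.
rho = max(abs(np.linalg.eigvals(A - B @ K)))
```

In the setting the abstract describes, each environment would come with its own dynamics determined by its context, so a gain like K would have to be produced per environment rather than computed once from known matrices.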
