Continuous Control with Contexts, Provably

10/30/2019
by Simon S. Du, et al.

A fundamental challenge in artificial intelligence is to build an agent that generalizes and adapts to unseen environments. A common strategy is to build a decoder that takes the context of the unseen environment as input and generates a policy accordingly. The current paper studies how to build such a decoder for the fundamental continuous-control task, the linear quadratic regulator (LQR), which can model a wide range of real-world physical environments. We present a simple algorithm for this problem, which uses an upper confidence bound (UCB) to refine the estimate of the decoder and balance the exploration-exploitation trade-off. Theoretically, our algorithm enjoys an O(√T) regret bound in the online setting, where T is the number of environments the agent has played. This also implies that after playing O(1/ϵ^2) environments, the agent is able to transfer the learned knowledge to obtain an ϵ-suboptimal policy for an unseen environment. To our knowledge, this is the first provably efficient algorithm for building a decoder in the continuous control setting. While our main focus is theoretical, we also present experiments that demonstrate the effectiveness of our algorithm.
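To make the setup concrete, the sketch below shows one way such a context-to-policy decoder could look: a ridge-regression map from a context vector to the system matrices (A, B) of an LQR instance, with an elliptical confidence width standing in for a UCB-style bonus, and a Riccati recursion to turn the decoded dynamics into a feedback gain. This is a minimal illustration under simplifying assumptions, not the paper's algorithm; the class ContextualLQRDecoder, the bonus form, and the assumption that each played environment returns exactly identified (A, B) matrices are all hypothetical choices made for the sketch.

```python
import numpy as np

def lqr_gain(A, B, Q, R, horizon=50):
    """Finite-horizon Riccati recursion; returns the feedback gain K (u = -K x)."""
    P = Q.copy()
    for _ in range(horizon):
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        P = Q + A.T @ P @ (A - B @ K)
    return K

class ContextualLQRDecoder:
    """Ridge-regression decoder from a context c to system matrices (A, B),
    with an elliptical confidence width used as a UCB-style exploration bonus.
    Names and the exact bonus form are illustrative assumptions."""

    def __init__(self, ctx_dim, state_dim, act_dim, reg=1.0, beta=1.0):
        self.n, self.m = state_dim, act_dim
        self.beta = beta
        self.V = reg * np.eye(ctx_dim)            # regularized context Gram matrix
        self.S = np.zeros((ctx_dim, state_dim * (state_dim + act_dim)))

    def predict(self, c):
        """Point estimate of (A, B) for context c, plus a confidence width."""
        W = np.linalg.solve(self.V, self.S)       # ridge-regression solution
        AB = (c @ W).reshape(self.n, self.n + self.m)
        width = self.beta * np.sqrt(c @ np.linalg.solve(self.V, c))
        return AB[:, :self.n], AB[:, self.n:], width

    def update(self, c, A_obs, B_obs):
        """Refine the decoder with the matrices identified in a new environment."""
        target = np.hstack([A_obs, B_obs]).ravel()
        self.V += np.outer(c, c)
        self.S += np.outer(c, target)

# Toy usage: refine the decoder over several environments, then plan for an
# unseen context with the decoded dynamics.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, m, d = 3, 2, 4
    dec = ContextualLQRDecoder(d, n, m)
    Q, R = np.eye(n), np.eye(m)
    W_true = 0.1 * rng.normal(size=(d, n * (n + m)))   # ground-truth linear decoder
    for _ in range(20):                                # environments played so far
        c = rng.normal(size=d)
        AB = (c @ W_true).reshape(n, n + m)
        dec.update(c, AB[:, :n], AB[:, n:])            # pretend sys-id was exact
    A_hat, B_hat, width = dec.predict(rng.normal(size=d))
    K = lqr_gain(A_hat, B_hat, Q, R)
    print("confidence width:", round(width, 3), "gain shape:", K.shape)
```

The confidence width shrinks as the context Gram matrix grows, which is the quantity a UCB-style rule would use to decide how optimistically to plan in a new environment.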


Related research

Learning Transferable Graph Exploration (10/28/2019)
This paper considers the problem of efficient exploration of unseen envi...

Provable Sim-to-real Transfer in Continuous Domain with Partial Observations (10/27/2022)
Sim-to-real transfer trains RL agents in the simulated environments and ...

Provably Efficient Lifelong Reinforcement Learning with Linear Function Approximation (06/01/2022)
We study lifelong reinforcement learning (RL) in a regret minimization s...

The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously (07/11/2017)
This paper introduces the Intentional Unintentional (IU) agent. This age...

Learning to Navigate Unseen Environments: Back Translation with Environmental Dropout (04/08/2019)
A grand goal in AI is to build a robot that can accurately navigate base...

Guiding Reinforcement Learning Exploration Using Natural Language (07/26/2017)
In this work we present a technique to use natural language to help rein...

Learning convex bounds for linear quadratic control policy synthesis (06/01/2018)
Learning to make decisions from observed data in dynamic environments re...
