One-Shot High-Fidelity Imitation: Training Large-Scale Deep Nets with RL

10/11/2018
by Tom Le Paine, et al.

Humans are experts at high-fidelity imitation -- closely mimicking a demonstration, often in one attempt. Humans use this ability to quickly solve a task instance, and to bootstrap learning of new tasks. Achieving these abilities in autonomous agents is an open problem. In this paper, we introduce an off-policy RL algorithm (MetaMimic) to narrow this gap. MetaMimic can learn both (i) policies for high-fidelity one-shot imitation of diverse novel skills, and (ii) policies that enable the agent to solve tasks more efficiently than the demonstrators. MetaMimic relies on the principle of storing all experiences in a memory and replaying these to learn massive deep neural network policies by off-policy RL. This paper introduces, to the best of our knowledge, the largest existing neural networks for deep RL and shows that larger networks with normalization are needed to achieve one-shot high-fidelity imitation on a challenging manipulation task. The results also show that both types of policy can be learned from vision, in spite of the task rewards being sparse, and without access to demonstrator actions.
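The abstract describes storing experiences in a memory and replaying them for off-policy learning. A minimal sketch of that general idea follows. It is not the MetaMimic implementation: the names `Transition` and `ReplayMemory`, the field layout, and the uniform sampling scheme are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import NamedTuple, Optional


class Transition(NamedTuple):
    """One step of experience (hypothetical field layout)."""
    observation: list
    action: int
    reward: float
    next_observation: list
    done: bool


class ReplayMemory:
    """Stores transitions and returns uniformly sampled batches.

    capacity=None keeps every transition; a finite capacity evicts the
    oldest entries first. Both behaviors are assumptions, not details
    taken from the paper.
    """

    def __init__(self, capacity: Optional[int] = None, seed: Optional[int] = None):
        self._storage = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def add(self, transition: Transition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> list:
        # Never request more items than are stored.
        batch_size = min(batch_size, len(self._storage))
        return self._rng.sample(list(self._storage), batch_size)

    def __len__(self) -> int:
        return len(self._storage)


if __name__ == "__main__":
    memory = ReplayMemory()
    for step in range(100):
        memory.add(Transition([step], 0, 0.0, [step + 1], False))
    batch = memory.sample(8)
    print(len(memory), len(batch))
```

An off-policy learner would draw batches like this repeatedly, so each stored transition can contribute to many gradient updates regardless of which policy generated it.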
