Chain-of-Thought Predictive Control

04/03/2023
by Zhiwei Jia, et al.

We study generalizable policy learning from demonstrations for complex low-level control tasks (e.g., contact-rich object manipulation). We propose an imitation learning method that incorporates temporal abstraction and the planning capabilities of Hierarchical RL (HRL) in a novel and effective manner. As a step towards decision foundation models, our design can utilize scalable, albeit highly sub-optimal, demonstrations. Specifically, we find that certain short subsequences of the demos, which we call the chain-of-thought (CoT), reflect the demos' hierarchical structure by marking the completion of subgoals in the tasks. Our model learns to dynamically predict the entire CoT as coherent and structured long-term action guidance, and it consistently outperforms typical two-stage subgoal-conditioned policies. Such CoT also facilitates generalizable policy learning, as it exemplifies the decision patterns shared among demos (even those with heavy noise and randomness). Our method, Chain-of-Thought Predictive Control (CoTPC), significantly outperforms existing methods on challenging low-level manipulation tasks when learning from scalable yet highly sub-optimal demos.
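
To make the notion of a CoT concrete, the sketch below pulls out the timesteps of one demonstration at which each subgoal is first marked complete. It is only an illustration of the idea stated in the abstract, not the authors' implementation: the function name `extract_cot_indices` and the per-timestep boolean annotation `subgoals_done` are hypothetical, and the abstract does not say how subgoal completion is actually detected.

```python
from typing import List, Sequence


def extract_cot_indices(subgoals_done: Sequence[Sequence[bool]]) -> List[int]:
    """Return the timesteps at which each subgoal first becomes complete.

    subgoals_done[t][k] is True if subgoal k is complete at timestep t.
    This annotation format is an assumption made for illustration only.
    """
    if not subgoals_done:
        return []
    num_subgoals = len(subgoals_done[0])
    cot: List[int] = []
    for k in range(num_subgoals):
        # Scan the demo in time order and keep the first completion step.
        for t, flags in enumerate(subgoals_done):
            if flags[k]:
                cot.append(t)
                break
    # Subgoals that never complete contribute no index.
    return sorted(cot)


# A toy 6-step demo with two subgoals (hypothetical data).
demo_flags = [
    [False, False],
    [False, False],
    [True, False],
    [True, False],
    [True, True],
    [True, True],
]
cot_indices = extract_cot_indices(demo_flags)
```

In this toy case the CoT is a short subsequence of the demo (two key timesteps out of six), which is the kind of compact, structured guidance the abstract describes the model as learning to predict.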

research
04/01/2020

Learning Sparse Rewarded Tasks from Sub-Optimal Demonstrations

Model-free deep reinforcement learning (RL) has demonstrated its superio...
research
05/09/2022

Disturbance-Injected Robust Imitation Learning with Task Achievement

Robust imitation learning using disturbance injections overcomes issues ...
research
10/25/2019

Relay Policy Learning: Solving Long-Horizon Tasks via Imitation and Reinforcement Learning

We present relay policy learning, a method for imitation and reinforceme...
research
03/08/2022

Learning Sensorimotor Primitives of Sequential Manipulation Tasks from Visual Demonstrations

This work aims to learn how to perform complex robot manipulation tasks ...
research
06/10/2023

PEAR: Primitive enabled Adaptive Relabeling for boosting Hierarchical Reinforcement Learning

Hierarchical reinforcement learning (HRL) has the potential to solve com...
research
09/29/2020

Learning Skills to Patch Plans Based on Inaccurate Models

Planners using accurate models can be effective for accomplishing manipu...
research
12/04/2019

Learning from Interventions using Hierarchical Policies for Safe Learning

Learning from Demonstrations (LfD) via Behavior Cloning (BC) works well ...
