K-VIL: Keypoints-based Visual Imitation Learning

by Jianfeng Gao, et al.

Visual imitation learning provides efficient and intuitive solutions for robotic systems to acquire novel manipulation skills. However, simultaneously learning geometric task constraints and control policies from visual inputs alone remains a challenging problem. In this paper, we propose an approach for keypoint-based visual imitation learning (K-VIL) that automatically extracts sparse, object-centric, and embodiment-independent task representations from a small number of human demonstration videos. The task representation is composed of keypoint-based geometric constraints on principal manifolds, their associated local frames, and the movement primitives required for task execution. Our approach extracts such task representations from a single demonstration video and incrementally updates them as new demonstrations become available. To reproduce manipulation skills in novel scenes using the learned set of prioritized geometric constraints, we introduce a novel keypoint-based admittance controller. We evaluate our approach in several real-world applications, showcasing its ability to deal with cluttered scenes, new instances of categorical objects, and large variations in object pose and shape, as well as its efficiency and robustness in both one-shot and few-shot imitation learning settings. Videos and source code are available at https://sites.google.com/view/k-vil.
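The abstract does not detail the keypoint-based admittance controller, so as a rough illustration only, the sketch below implements a generic Cartesian admittance law that drives a single keypoint toward a constraint target while yielding to external forces. All function names, state variables, and gain matrices here are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def admittance_step(x, v, x_des, f_ext, M, D, K, dt):
    """One integration step of a generic admittance law for a keypoint.

    Solves M*a + D*v + K*(x - x_des) = f_ext for the acceleration a,
    then integrates velocity and position with explicit Euler steps.
    x, v      : current keypoint position and velocity (3-vectors)
    x_des     : target position from a geometric constraint (3-vector)
    f_ext     : measured external force acting on the keypoint
    M, D, K   : virtual inertia, damping, and stiffness (3x3 matrices)
    """
    a = np.linalg.solve(M, f_ext - D @ v - K @ (x - x_des))
    v = v + a * dt
    x = x + v * dt
    return x, v

# Illustrative rollout: with zero external force the keypoint is
# attracted to the constraint target; a nonzero f_ext would make it
# compliantly deviate instead of tracking rigidly.
x, v = np.zeros(3), np.zeros(3)
x_des = np.array([1.0, 1.0, 1.0])
M, D, K = np.eye(3), 2.0 * np.eye(3), np.eye(3)
for _ in range(1000):
    x, v = admittance_step(x, v, x_des, np.zeros(3), M, D, K, dt=0.01)
```

In a full controller, each prioritized geometric constraint would contribute its own target and stiffness, with priorities resolving conflicts between constraints; the sketch above covers only the single-keypoint compliance behavior.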



Imitation from Observation: Learning to Imitate Behaviors from Raw Video via Context Translation

Imitation learning is an effective approach for autonomous systems to ac...

Demonstrate Once, Imitate Immediately (DOME): Learning Visual Servoing for One-Shot Imitation Learning

We present DOME, a novel method for one-shot imitation learning, where a...

Graph-Structured Visual Imitation

We cast visual imitation as a visual correspondence problem. Our robotic...

Generalizable task representation learning from human demonstration videos: a geometric approach

We study the problem of generalizable task learning from human demonstra...

One-shot Visual Imitation via Attributed Waypoints and Demonstration Augmentation

In this paper, we analyze the behavior of existing techniques and design...

Learning by Watching: Physical Imitation of Manipulation Skills from Human Videos

We present an approach for physical imitation from human videos for robo...

Imitation of Manipulation Skills Using Multiple Geometries

Daily manipulation tasks are characterized by regular characteristics as...
