Learning Rewards and Skills to Follow Commands with A Data Efficient Visual-Audio Representation

01/23/2023
by   Peixin Chang, et al.

Based on recent advancements in representation learning, we propose a novel framework for command-following robots with raw sensor inputs. Previous RL-based methods are either difficult to improve continuously after deployment or require a large number of new labels during fine-tuning. Motivated by the (self-)supervised contrastive learning literature, we propose a novel representation, named VAR++, that generates an intrinsic reward function for command-following robot tasks by associating images with sound commands. After the robot is deployed in a new domain, the representation can be updated intuitively and data-efficiently by non-experts, and the robot can fulfill sound commands without any hand-crafted reward functions. We demonstrate our approach on various sound types and robotic tasks, including navigation and manipulation with raw sensor inputs. In simulated experiments, we show that our system continually self-improves in previously unseen scenarios with less new labeled data, yet achieves better performance than previous methods.
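The abstract describes associating images with sound commands so that the learned representation itself yields an intrinsic reward. A minimal sketch of that idea, assuming an InfoNCE-style contrastive objective over paired image/audio embeddings and a cosine-similarity reward; the function names and the exact objective are illustrative assumptions, not the paper's actual VAR++ implementation:

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Scale embeddings to unit length so dot products are cosine similarities."""
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def contrastive_loss(img_emb, aud_emb, temperature=0.1):
    """InfoNCE-style loss: pull matched image/audio pairs together,
    push mismatched pairs apart. Rows of img_emb and aud_emb are
    assumed to be aligned pairs (pair i on the diagonal)."""
    img = l2_normalize(img_emb)
    aud = l2_normalize(aud_emb)
    logits = img @ aud.T / temperature                    # (N, N) similarity matrix
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(img))
    return -log_probs[idx, idx].mean()                    # NLL of the matched pairs

def intrinsic_reward(img_emb, cmd_emb):
    """Reward the agent when the current observation's embedding
    aligns with the embedding of the sound command."""
    return float(l2_normalize(img_emb) @ l2_normalize(cmd_emb))
```

With such a representation, no hand-crafted reward is needed: the RL agent is simply rewarded for reaching observations whose embedding matches the command's embedding, and non-experts can refine the representation by supplying new image-sound pairs.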



research
09/07/2021

Robot Sound Interpretation: Learning Visual-Audio Representations for Voice-Controlled Robots

Inspired by sensorimotor theory, we propose a novel pipeline for voice-c...
research
10/06/2018

Robustness via Retrying: Closed-Loop Robotic Manipulation with Self-Supervised Learning

Prediction is an appealing objective for self-supervised learning of beh...
research
11/17/2020

Can Semantic Labels Assist Self-Supervised Visual Representation Learning?

Recently, contrastive learning has largely advanced the progress of unsu...
research
05/19/2022

Image Augmentation Based Momentum Memory Intrinsic Reward for Sparse Reward Visual Scenes

Many scenes in real life can be abstracted to the sparse reward visual s...
research
10/03/2022

That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation

Learning to produce contact-rich, dynamic behaviors from raw sensory dat...
research
02/28/2022

A Mutually Reinforced Framework for Pretrained Sentence Embeddings

The lack of labeled data is a major obstacle to learning high-quality se...
research
05/04/2021

Self-Improving Semantic Perception on a Construction Robot

We propose a novel robotic system that can improve its semantic percepti...
