That Sounds Right: Auditory Self-Supervision for Dynamic Robot Manipulation

10/03/2022
by Abitha Thankaraj, et al.

Learning to produce contact-rich, dynamic behaviors from raw sensory data has been a longstanding challenge in robotics. Prominent approaches primarily focus on using visual or tactile sensing, where unfortunately one fails to capture high-frequency interaction, while the other can be too delicate for large-scale data collection. In this work, we propose a data-centric approach to dynamic manipulation that uses an often-ignored source of information: sound. We first collect a dataset of 25k interaction-sound pairs across five dynamic tasks using commodity contact microphones. Then, given this data, we leverage self-supervised learning to accelerate behavior prediction from sound. Our experiments indicate that this self-supervised 'pretraining' is crucial to achieving high performance, with a 34.5% lower MSE than plain supervised learning and a 54.3% lower MSE than visual training. Finally, when asked to generate desired sound profiles, online rollouts of our models on a UR10 robot can produce dynamic behavior that achieves an average of 11.5% improvement over supervised learning on audio similarity metrics.
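The abstract does not specify which audio similarity metric the evaluation uses. As an illustrative sketch only (not the paper's actual metric), one simple way to compare a produced sound against a desired sound profile is cosine similarity between log-magnitude spectrograms; the function names and window/hop parameters below are assumptions for illustration:

```python
import numpy as np

def stft_magnitude(audio, frame_len=512, hop=256):
    """Magnitude spectrogram via a Hann-windowed short-time Fourier transform."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(audio) - frame_len) // hop
    frames = np.stack([audio[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def audio_similarity(a, b):
    """Cosine similarity between flattened log-spectrogram features of two clips."""
    fa = np.log1p(stft_magnitude(a)).ravel()
    fb = np.log1p(stft_magnitude(b)).ravel()
    return float(fa @ fb / (np.linalg.norm(fa) * np.linalg.norm(fb) + 1e-8))

# Synthetic stand-ins for contact sounds: two clips with the same pitch
# (one phase-shifted) should score higher than a clip at a different pitch.
t = np.linspace(0, 1, 16000, endpoint=False)
hit_a = np.sin(2 * np.pi * 440 * t)
hit_b = np.sin(2 * np.pi * 440 * t + 0.5)   # same tone, phase-shifted
hit_c = np.sin(2 * np.pi * 2000 * t)        # different tone
print(audio_similarity(hit_a, hit_b) > audio_similarity(hit_a, hit_c))
```

Comparing magnitudes rather than raw waveforms makes the score insensitive to phase, which is usually what one wants when judging whether two sounds "match."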



research
06/11/2020

Telling Left from Right: Learning Spatial Correspondence between Sight and Sound

Self-supervised audio-visual learning aims to capture useful representat...
research
07/28/2020

Self-supervised Neural Audio-Visual Sound Source Localization via Probabilistic Spatial Modeling

Detecting sound source objects within visual observation is important fo...
research
03/21/2023

Dexterity from Touch: Self-Supervised Pre-Training of Tactile Representations with Robotic Play

Teaching dexterity to multi-fingered robots has been a longstanding chal...
research
10/24/2018

Making Sense of Vision and Touch: Self-Supervised Learning of Multimodal Representations for Contact-Rich Tasks

Contact-rich manipulation tasks in unstructured environments often requi...
research
01/23/2023

Learning Rewards and Skills to Follow Commands with A Data Efficient Visual-Audio Representation

Based on the recent advancements in representation learning, we propose ...
research
08/04/2017

CASSL: Curriculum Accelerated Self-Supervised Learning

Recent self-supervised learning approaches focus on using a few thousand...
