Robot Sound Interpretation: Combining Sight and Sound in Learning-Based Control

09/19/2019
by Peixin Chang, et al.

We explore the interpretation of sound for robot decision-making, inspired by human speech comprehension. While previous methods use natural language processing to translate sound to text, we propose an end-to-end deep neural network that learns control policies directly from images and sound signals. The network is trained using reinforcement learning with auxiliary losses on the sight and sound network branches. We demonstrate our approach on two robots, a TurtleBot3 and a Kuka-IIWA arm, which hear a command word, identify the associated target object, and perform precise control to reach the target. For both systems, we perform ablation studies in simulation to show the effectiveness of our network empirically. We also successfully transfer the policy learned in simulation to a real-world TurtleBot3, which effectively understands word commands, searches for the object, and moves toward it with more intuitive motion than a traditional motion planner with perfect information.
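As a rough illustration of the idea of a two-branch network trained with a reinforcement learning objective plus auxiliary losses, here is a minimal pure-Python sketch. It is not the authors' architecture: the layer sizes, the choice of auxiliary targets (a command-word class on the sound branch, a scalar regression target on the sight branch), the loss weights, and all names such as `TwoBranchPolicy` and `total_loss` are assumptions made only for illustration.

```python
import math
import random


def linear(x, weights, bias):
    """Dense layer: `weights` holds one row of coefficients per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    return [max(0.0, a) for a in v]


def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    exps = [math.exp(a - m) for a in v]
    s = sum(exps)
    return [e / s for e in exps]


def init(rng, n_out, n_in):
    """Small random weights and zero biases for an n_in -> n_out layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


class TwoBranchPolicy:
    """Hypothetical sight + sound policy with one auxiliary head per branch."""

    def __init__(self, image_dim, sound_dim, hidden, n_actions, n_words, seed=0):
        rng = random.Random(seed)
        self.img = init(rng, hidden, image_dim)       # sight branch
        self.snd = init(rng, hidden, sound_dim)       # sound branch
        self.pi = init(rng, n_actions, 2 * hidden)    # policy head on fused features
        self.word_head = init(rng, n_words, hidden)   # assumed auxiliary head (sound)
        self.target_head = init(rng, 1, hidden)       # assumed auxiliary head (sight)

    def forward(self, image, sound):
        h_img = relu(linear(image, *self.img))
        h_snd = relu(linear(sound, *self.snd))
        fused = h_img + h_snd  # list concatenation of the two feature vectors
        action_probs = softmax(linear(fused, *self.pi))
        word_probs = softmax(linear(h_snd, *self.word_head))
        target_pred = linear(h_img, *self.target_head)[0]
        return action_probs, word_probs, target_pred


def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])


def total_loss(rl_loss, aux_sight, aux_sound, w_sight=1.0, w_sound=1.0):
    """RL loss plus weighted auxiliary losses (the weights are assumptions)."""
    return rl_loss + w_sight * aux_sight + w_sound * aux_sound
```

In a real training loop, `rl_loss` would come from whichever policy-gradient method is used, and the auxiliary terms would be computed from labels available in simulation; the sketch only shows how the three terms could be combined into one objective.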


