Robot Sound Interpretation: Combining Sight and Sound in Learning-Based Control

by Peixin Chang, et al.

We explore the interpretation of sound for robot decision-making, inspired by human speech comprehension. While previous methods use natural language processing to translate sound to text, we propose an end-to-end deep neural network that learns control policies directly from images and sound signals. The network is trained using reinforcement learning with auxiliary losses on the sight and sound network branches. We demonstrate our approach on two robots, a TurtleBot3 and a Kuka-IIWA arm, which hear a command word, identify the associated target object, and perform precise control to reach the target. For both systems, we perform ablation studies in simulation to show the effectiveness of our network empirically. We also successfully transfer the policy learned in the simulator to a real-world TurtleBot3, which effectively understands word commands, searches for the object, and moves toward its location with more intuitive motion than a traditional motion planner with perfect information.
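The training objective described above, reinforcement learning plus auxiliary losses on the sight and sound branches, can be illustrated with a minimal sketch. This is not the authors' implementation: the names `LossWeights`, `total_loss`, and `fuse`, the weight values, and the use of simple concatenation to combine branch features are all assumptions made for illustration, since the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LossWeights:
    # Hypothetical coefficients; the abstract gives no values for these.
    sight_aux: float = 0.5
    sound_aux: float = 0.5


def fuse(image_features: List[float], sound_features: List[float]) -> List[float]:
    """Combine the two branch outputs into one vector.

    Concatenation is an assumption; the abstract does not say how the
    sight and sound branches are merged.
    """
    return image_features + sound_features


def total_loss(rl_loss: float,
               sight_aux_loss: float,
               sound_aux_loss: float,
               weights: LossWeights = LossWeights()) -> float:
    """Weighted sum of the RL objective and the per-branch auxiliary losses."""
    return (rl_loss
            + weights.sight_aux * sight_aux_loss
            + weights.sound_aux * sound_aux_loss)


if __name__ == "__main__":
    features = fuse([0.1, 0.2], [0.3])
    print(len(features), total_loss(1.0, 2.0, 4.0))
```

In a setup like this, the auxiliary terms give each branch its own training signal in addition to the reward-driven RL loss; how the paper actually weights or defines them is not stated in the abstract.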






