Enhanced Robot Speech Recognition Using Biomimetic Binaural Sound Source Localization

02/13/2019
by Jorge, et al.

Inspired by the behavior of humans talking in noisy environments, we propose an embodied embedded cognition approach that uses binaural sound source localization (SSL) to improve automatic speech recognition (ASR) systems for robots in challenging environments, such as those with ego noise. The approach is verified by measuring the impact of SSL with a humanoid robot head on the performance of an ASR system. More specifically, a robot orients itself toward the angle at which the signal-to-noise ratio (SNR) of speech is maximized for one microphone before performing an ASR task. First, a spiking neural network inspired by the midbrain auditory system, based on our previous work, is applied to calculate the angle of the sound signal. Then, a feedforward neural network is used to handle high levels of ego noise and reverberation in the signal. Finally, the sound signal is fed into an ASR system. For ASR, we use a system developed by our group and compare its performance with and without support from SSL. We test our SSL and ASR systems on two humanoid platforms with different structural and material properties. With our approach, we halve the sentence error rate with respect to the common downmixing of both channels. Surprisingly, the ASR performance is more than two times better when the angle between the humanoid head and the sound source allows sound waves to be reflected most intensely from the pinna to the ear microphone than when sound waves arrive perpendicularly to the membrane.
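
The abstract gives no implementation details, so the following Python sketch only illustrates the quantities it mentions: a source angle estimated from two ear microphones, the per-microphone SNR used to pick an orientation, the two-channel downmixing baseline, and the sentence error rate. It deliberately substitutes a classical cross-correlation estimate of the interaural time difference (ITD) for the authors' spiking neural network, assumes a far-field free-field model with a speed of sound of 343 m/s, and omits the feedforward denoising network and the ASR system entirely. All function names, the 0.15 m microphone spacing and the SNR table are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value for air at about 20 degrees Celsius


def estimate_azimuth_itd(left, right, fs, mic_distance):
    """Estimate source azimuth (degrees) from the interaural time difference.

    Classical cross-correlation stand-in, NOT the spiking network from the paper.
    Positive angles mean the sound reached the right microphone first.
    """
    corr = np.correlate(left, right, mode="full")
    # Index i of a "full" correlation corresponds to lag i - (len(right) - 1).
    lag = int(np.argmax(corr)) - (len(right) - 1)
    itd = lag / fs  # seconds by which the left channel lags the right channel
    # Far-field model: itd = mic_distance * sin(theta) / c, clipped to a valid sine.
    s = np.clip(itd * SPEED_OF_SOUND / mic_distance, -1.0, 1.0)
    return float(np.degrees(np.arcsin(s)))


def snr_db(speech, noise):
    """SNR in decibels from separate speech and noise recordings of one microphone."""
    return 10.0 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))


def best_relative_angle(snr_by_angle):
    """Return the head-to-source angle with the highest measured SNR for one ear."""
    return max(snr_by_angle, key=snr_by_angle.get)


def downmix(left, right):
    """Baseline: average both channels into a single mono signal."""
    return 0.5 * (left + right)


def sentence_error_rate(references, hypotheses):
    """Fraction of sentences whose transcription is not an exact word-level match."""
    errors = sum(ref.split() != hyp.split() for ref, hyp in zip(references, hypotheses))
    return errors / len(references)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    right = rng.standard_normal(4000)
    left = np.concatenate([np.zeros(3), right[:-3]])  # left lags right by 3 samples
    print(estimate_azimuth_itd(left, right, fs=16000, mic_distance=0.15))
    # Hypothetical SNR table (dB) for one ear microphone, keyed by relative angle.
    print(best_relative_angle({0: 5.0, 45: 9.5, 90: 7.0}))  # 45
```

Comparing `sentence_error_rate` on transcriptions of the downmixed signal against transcriptions of the single ear channel after reorientation mirrors the comparison described in the abstract; the actual ASR system and recordings are outside the scope of this sketch.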
