Improved Robust ASR for Social Robots in Public Spaces

01/14/2020
by   Charles Jankowski, et al.

Social robots deployed in public spaces present a challenging task for automatic speech recognition (ASR) because of a variety of factors, including background noise at signal-to-noise ratios (SNRs) of 20 to 5 dB. Existing ASR models perform well at the higher SNRs in this range, but degrade considerably as the noise level increases. This work explores methods for improving ASR performance in such conditions. We use the AiShell-1 Chinese speech corpus and the Kaldi ASR toolkit for evaluations. We were able to exceed state-of-the-art ASR performance at SNRs lower than 20 dB, demonstrating the feasibility of achieving relatively high-performing ASR with open-source toolkits and hundreds of hours of training data, which is commonly available.
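To make the 20 to 5 dB range concrete, the sketch below shows one common way to mix a noise signal into a clean signal at a chosen SNR. It is an illustration only: the abstract does not say how the noisy data was produced, and the function names (`power`, `snr_db`, `mix_at_snr`), the sine wave standing in for speech, and the Gaussian noise are all assumptions made for this example.

```python
import math
import random


def power(signal):
    """Mean squared amplitude of a sequence of samples."""
    return sum(x * x for x in signal) / len(signal)


def snr_db(speech, noise):
    """SNR in decibels: 10 * log10(P_speech / P_noise)."""
    return 10.0 * math.log10(power(speech) / power(noise))


def mix_at_snr(speech, noise, target_snr_db):
    """Scale the noise so the speech-to-noise ratio equals target_snr_db,
    then add it to the speech. Returns (mixed, scaled_noise)."""
    ratio = 10.0 ** (target_snr_db / 10.0)
    gain = math.sqrt(power(speech) / (power(noise) * ratio))
    scaled = [gain * n for n in noise]
    mixed = [s + n for s, n in zip(speech, scaled)]
    return mixed, scaled


# Hypothetical stand-ins: a 220 Hz tone for "speech", Gaussian samples for noise.
rng = random.Random(0)
speech = [math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in range(16000)]

for target in (20, 15, 10, 5):
    _, scaled = mix_at_snr(speech, noise, target)
    # The measured SNR should match the requested one up to rounding error.
    print(target, round(snr_db(speech, scaled), 2))
```

Each 10 dB drop in SNR corresponds to a tenfold increase in noise power relative to the speech, so the 5 dB condition carries roughly 32 times more relative noise power than the 20 dB condition.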


