Accented Speech Recognition Inspired by Human Perception

04/09/2021
by Xiangyun Chu, et al.

Although automatic speech recognition performance has improved over the last several years, machines continue to perform significantly worse than humans on accented speech. In addition, the most significant improvements on accented speech arise primarily from overwhelming the problem with hundreds or even thousands of hours of data, whereas humans typically require much less data to adapt to a new accent. This paper explores methods inspired by human perception and evaluates whether they improve recognition of accented speech, with a specific focus on recognizing speech with a novel accent relative to that of the training data. Our experiments are run on small, accessible datasets that are available to the research community. We explore four methodologies: pre-exposure to multiple accents, grapheme- and phoneme-based pronunciations, dropout (to improve generalization to a novel accent), and the identification of the layers in the neural network that can specifically be associated with accent modeling. Our results indicate that methods based on human perception are promising for reducing word error rate (WER) on novel accents and for understanding how accented speech is modeled in neural networks.

research
04/30/2019

English Broadcast News Speech Recognition by Humans and Machines

With recent advances in deep learning, considerable attention has been g...
research
11/28/2017

Exploiting Nontrivial Connectivity for Automatic Speech Recognition

Nontrivial connectivity has allowed the training of very deep networks b...
research
04/06/2022

Successes and critical failures of neural networks in capturing human-like speech recognition

Natural and artificial audition can in principle evolve different soluti...
research
12/13/2016

Evaluating Automatic Speech Recognition Systems in Comparison With Human Perception Results Using Distinctive Feature Measures

This paper describes methods for evaluating automatic speech recognition...
research
06/11/2021

Improving RNN-T ASR Performance with Date-Time and Location Awareness

In this paper, we explore the benefits of incorporating context into a R...
research
06/12/2020

"Notic My Speech" – Blending Speech Patterns With Multimedia

Speech as a natural signal is composed of three parts - visemes (visual ...
research
03/27/2020

Can you hear me now? Sensitive comparisons of human and machine perception

The rise of sophisticated machine-recognition systems has brought with i...
