Anchored Speech Recognition with Neural Transducers

10/20/2022
by Desh Raj, et al.

Neural transducers have gained popularity in production ASR systems, achieving human-level recognition accuracy on standard benchmark datasets. However, their performance degrades significantly in the presence of crosstalk, especially when the background speech or noise is non-negligible relative to the primary speech (i.e., at a low signal-to-noise ratio). Anchored speech recognition refers to a class of methods that use information from an anchor segment (e.g., a wake-word) to recognize device-directed speech while ignoring interfering background speech or noise. In this paper, we investigate anchored speech recognition in the context of neural transducers. We use a tiny auxiliary network to extract context information from the anchor segment, and explore encoder biasing and joiner gating to guide the transducer towards the target speech. Moreover, to improve the robustness of context embedding extraction, we propose auxiliary training objectives to disentangle lexical content from speaking style. Our proposed methods are evaluated on synthetic LibriSpeech-based mixtures, where they improve word error rates by up to 36% compared to a background augmentation baseline.

research
02/06/2019

End-to-end Anchored Speech Recognition

Voice-controlled house-hold devices, like Amazon Echo or Google Home, fa...
research
06/02/2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

Although recent advances in deep learning technology improved automatic ...
research
05/19/2022

Content-Context Factorized Representations for Automated Speech Recognition

Deep neural networks have largely demonstrated their ability to perform ...
research
09/14/2016

An Adaptive Psychoacoustic Model for Automatic Speech Recognition

Compared with automatic speech recognition (ASR), the human auditory sys...
research
02/24/2021

Thoughts on the potential to compensate a hearing loss in noise

The effect of hearing impairment on speech perception was described by P...
research
01/13/2022

The Effectiveness of Time Stretching for Enhancing Dysarthric Speech for Improved Dysarthric Speech Recognition

In this paper, we investigate several existing and a new state-of-the-ar...
research
06/08/2020

Learning to Count Words in Fluent Speech enables Online Speech Recognition

Sequence to Sequence models, in particular the Transformer, achieve stat...
