Contextual Adapters for Personalized Speech Recognition in Neural Transducers

Recognizing personal rare words in end-to-end automatic speech recognition (E2E ASR) models is challenging because training data for such words is scarce. A standard remedy is shallow fusion at inference time. However, shallow fusion depends on external language models and boosts weights deterministically, which limits its performance. In this paper, we propose training neural contextual adapters for personalization in neural-transducer-based ASR models. Our approach not only biases recognition towards user-defined words but is also flexible enough to work with pretrained ASR models. Using an in-house dataset, we demonstrate that contextual adapters can be applied to any general-purpose pretrained ASR model to improve personalization. Our method outperforms shallow fusion while retaining the functionality of the pretrained models, because it does not alter any of their weights. We further show that adapter-style training is superior to full fine-tuning of the ASR models on datasets with user-defined content.
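
To make the "deterministic weight boosting" limitation concrete, the following is a minimal sketch of a fixed-bonus biasing scheme in the spirit of shallow fusion. It is not the paper's baseline: production shallow fusion typically combines scores with an external language model or biasing graph during beam search, whereas this toy version simply rescores an n-best list. The function name `shallow_fusion_rescore`, the `boost` value, and the example hypotheses and scores are all hypothetical.

```python
def shallow_fusion_rescore(hypotheses, bias_words, boost=1.5):
    """Rerank (text, asr_log_prob) pairs with a fixed bonus per matched bias word.

    Simplified illustration only: the bonus is a hand-set constant, which is
    the kind of deterministic boosting the abstract describes as limiting.
    """
    rescored = []
    for text, asr_log_prob in hypotheses:
        tokens = text.lower().split()
        # Count how many user-defined words appear in this hypothesis.
        matches = sum(1 for word in bias_words if word.lower() in tokens)
        rescored.append((text, asr_log_prob + boost * matches))
    # Highest combined score first.
    return sorted(rescored, key=lambda pair: pair[1], reverse=True)


# Hypothetical n-best list: the ASR model prefers the common spelling.
hypotheses = [("call john smith", -2.0), ("call jon smyth", -2.5)]
best = shallow_fusion_rescore(hypotheses, bias_words=["jon", "smyth"])[0][0]
unbiased = shallow_fusion_rescore(hypotheses, bias_words=[])[0][0]
```

Because `boost` is a constant chosen by hand, the same bonus applies regardless of the audio or context. A trained contextual adapter, by contrast, learns how strongly to bias while the pretrained model's weights stay frozen.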

research
08/07/2018

Deep context: end-to-end contextual speech recognition

In automatic speech recognition (ASR) what a user says depends on the pa...
research
09/02/2022

Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

Contextual ASR, which takes a list of bias terms as input along with aud...
research
10/05/2021

Fast Contextual Adaptation with Neural Associative Memory for On-Device Personalized Speech Recognition

Fast contextual adaptation has shown to be effective in improving Automa...
research
06/27/2018

Unsupervised and Efficient Vocabulary Expansion for Recurrent Neural Network Language Models in ASR

In automatic speech recognition (ASR) systems, recurrent neural network ...
research
02/15/2021

Personalization Strategies for End-to-End Speech Recognition Systems

The recognition of personalized content, such as contact names, remains ...
research
04/20/2022

Detecting Unintended Memorization in Language-Model-Fused ASR

End-to-end (E2E) models are often being accompanied by language models (...
research
06/04/2023

SpellMapper: A non-autoregressive neural spellchecker for ASR customization with candidate retrieval based on n-gram mappings

Contextual spelling correction models are an alternative to shallow fusi...
