On-the-fly Text Retrieval for End-to-End ASR Adaptation

03/20/2023
by Bolaji Yusuf, et al.

End-to-end speech recognition models are improved by incorporating external text sources, typically by fusion with an external language model. Such language models have to be retrained whenever the corpus of interest changes. Furthermore, since they store the entire corpus in their parameters, rare words can be challenging to recall. In this work, we propose augmenting a transducer-based ASR model with a retrieval language model, which directly retrieves from an external text corpus plausible completions for a partial ASR hypothesis. These completions are then integrated into subsequent predictions by an adapter, which is trained once, so that the corpus of interest can be switched without incurring the computational overhead of retraining. Our experiments show that the proposed model significantly improves the performance of a transducer baseline on a pair of question-answering datasets. Further, it outperforms shallow fusion on recognition of named entities by about 7% relative; when the two are combined, the relative improvement increases to 13%.
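
The retrieval idea in the abstract can be illustrated with a toy sketch. This is not the authors' model: it swaps in exact matching on the last few words of the partial hypothesis in place of whatever retrieval mechanism the paper uses, and it omits the adapter and the transducer entirely. The names `build_index`, `retrieve_completions`, and `context_size` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_index(corpus, context_size=2):
    """Map every run of `context_size` words to the words that follow it.

    Assumption: whitespace tokenization and lowercasing are good enough
    for a toy corpus; a real system would use the ASR model's own units.
    """
    index = defaultdict(list)
    for sentence in corpus:
        tokens = sentence.lower().split()
        for i in range(len(tokens) - context_size):
            context = tuple(tokens[i:i + context_size])
            index[context].append(tokens[i + context_size])
    return index


def retrieve_completions(index, partial_hypothesis, context_size=2):
    """Return candidate next words for a partial hypothesis.

    Looks up the last `context_size` words of the hypothesis and returns
    the distinct continuations seen in the corpus, in first-seen order.
    """
    tokens = partial_hypothesis.lower().split()
    if len(tokens) < context_size:
        return []
    context = tuple(tokens[-context_size:])
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(index.get(context, [])))


# Toy usage: switching the corpus only requires rebuilding the index.
books = ["who wrote the great gatsby", "who wrote the odyssey"]
films = ["who directed the godfather"]

book_index = build_index(books)
film_index = build_index(films)

book_candidates = retrieve_completions(book_index, "tell me who wrote the")
film_candidates = retrieve_completions(film_index, "tell me who wrote the")
```

In this sketch, changing the corpus of interest means building a new index rather than retraining anything, which mirrors the motivation stated in the abstract. How the retrieved completions influence the recognizer's subsequent predictions is the adapter's job in the paper and is deliberately left out here.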


