Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

09/02/2022
by Jennifer Drexler Fox, et al.

Contextual ASR, which takes a list of bias terms as input along with audio, has drawn recent interest as ASR use becomes more widespread. We are releasing contextual biasing lists to accompany the Earnings21 dataset, creating a public benchmark for this task. We present baseline results on this benchmark using a pretrained end-to-end ASR model from the WeNet toolkit. We show results for shallow fusion contextual biasing applied to two different decoding algorithms. Our baseline results confirm observations that end-to-end models struggle in particular with words that are rarely or never seen during training, and that existing shallow fusion techniques do not adequately address this problem. We propose an alternate spelling prediction model that improves recall of rare words by 34.7% compared to contextual biasing without alternate spellings. This model is conceptually similar to ones used in prior work, but is simpler to implement as it does not rely on either a pronunciation dictionary or an existing text-to-speech system.

research
05/26/2022

Contextual Adapters for Personalized Speech Recognition in Neural Transducers

Personal rare word recognition in end-to-end Automatic Speech Recognitio...
research
04/05/2021

Contextualized Streaming End-to-End Speech Recognition with Trie-Based Deep Biasing and Shallow Fusion

How to leverage dynamic contextual information in end-to-end speech reco...
research
11/05/2021

Context-Aware Transformer Transducer for Speech Recognition

End-to-end (E2E) automatic speech recognition (ASR) systems often have d...
research
03/20/2023

On-the-fly Text Retrieval for End-to-End ASR Adaptation

End-to-end speech recognition models are improved by incorporating exter...
research
04/15/2022

Improving Rare Word Recognition with LM-aware MWER Training

Language models (LMs) significantly improve the recognition accuracy of ...
research
07/02/2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Incorporating biasing words obtained as contextual knowledge is critical...
research
05/30/2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

The incorporation of biasing words obtained through contextual knowledge...
