Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator

05/18/2022
by Guangzhi Sun, et al.

Contextual knowledge is essential for reducing speech recognition errors on high-value long-tail words. This paper proposes a novel tree-constrained pointer generator (TCPGen) component that enables end-to-end ASR models to bias towards a list of long-tail words obtained using external contextual information. With only a small overhead in memory use and computation cost, TCPGen structures thousands of biasing words efficiently into a symbolic prefix-tree and creates a neural shortcut between the tree and the final ASR output to facilitate the recognition of the biasing words. To enhance TCPGen, we further propose a novel minimum biasing word error (MBWE) loss that directly optimises biasing word errors during training, along with a biasing-word-driven language model discounting (BLMD) method applied at test time. All contextual ASR systems were evaluated on the public Librispeech audiobook corpus and on data from the dialogue state tracking challenges (DSTC), with the biasing lists extracted from the dialogue-system ontology. TCPGen achieved consistent word error rate (WER) reductions, which were particularly significant on the biasing words, with around 40% relative reductions in the recognition error rates. MBWE and BLMD further improved the effectiveness of TCPGen and achieved larger WER reductions on the biasing words. TCPGen also achieved zero-shot learning of words not in the audio training set, with large WER reductions on the out-of-vocabulary words in the biasing list.
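
To make the symbolic prefix-tree concrete, the following is a minimal sketch of how a biasing list might be organised so that, given the units decoded so far, only the continuations that stay on the tree are considered. It is an illustration under stated assumptions, not the authors' implementation: the names `PrefixTreeNode`, `build_prefix_tree` and `valid_next_units` are hypothetical, single characters stand in for whatever subword units the ASR model uses, and the neural shortcut (the pointer distribution over the valid units) is not modelled here.

```python
from typing import Dict, Iterable, List, Set


class PrefixTreeNode:
    """One node of the biasing prefix tree (hypothetical helper class)."""

    def __init__(self) -> None:
        self.children: Dict[str, "PrefixTreeNode"] = {}
        self.is_word_end = False  # True if a biasing word ends at this node


def build_prefix_tree(biasing_words: Iterable[List[str]]) -> PrefixTreeNode:
    """Insert each biasing word, given as a sequence of units, into the tree."""
    root = PrefixTreeNode()
    for units in biasing_words:
        node = root
        for unit in units:
            # Reuse the child for a shared prefix, or create a new branch.
            node = node.children.setdefault(unit, PrefixTreeNode())
        node.is_word_end = True
    return root


def valid_next_units(root: PrefixTreeNode, prefix: List[str]) -> Set[str]:
    """Return the units that extend `prefix` towards some biasing word.

    An empty set means the prefix has left the tree (or reached a leaf),
    so no biasing word can be continued from it.
    """
    node = root
    for unit in prefix:
        child = node.children.get(unit)
        if child is None:
            return set()
        node = child
    return set(node.children)


# Assumption: characters as units, purely for readability.
biasing_list = [list("turner"), list("turnip"), list("tully")]
root = build_prefix_tree(biasing_list)

print(valid_next_units(root, list("tu")))    # the units 'r' and 'l'
print(valid_next_units(root, list("turn")))  # the units 'e' and 'i'
print(valid_next_units(root, list("x")))     # empty: prefix is off the tree
```

Because words with a common prefix share nodes, the lookup cost at each decoding step depends on the length of the current prefix rather than on the size of the biasing list, which is consistent with the small memory and computation overhead described above.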
