Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

05/30/2023
by   Guangzhi Sun, et al.

Incorporating biasing words obtained from contextual knowledge is crucial in automatic speech recognition (ASR) applications. This paper proposes a method for end-to-end contextual ASR that uses graph neural network (GNN) encodings within the tree-constrained pointer generator method. At each tree node, the GNN node encoding incorporates information about all word pieces on the branches rooted at that node, which provides lookahead for future word pieces during ASR decoding. This results in a more precise prediction of the generation probability of the biasing words. The study explores three GNN encoding techniques, namely tree recursive neural networks, graph convolutional networks (GCN), and GraphSAGE, along with different combinations of the complementary GCN and GraphSAGE structures. The systems were evaluated on the LibriSpeech and AMI corpora, following the visual-grounded contextual ASR pipeline. Using GNN encodings achieved consistent and significant reductions in word error rate (WER), particularly for words that are rare or were not seen during training. Notably, the most effective combination of GNN encodings obtained more than 60% WER reduction for rare and unseen words compared to standard end-to-end systems.
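The lookahead idea in the abstract can be illustrated with a minimal sketch. It is not the authors' model: the word-piece splits, the `piece_embedding` helper, and the weight-free combination rule are all assumptions made for illustration, loosely in the spirit of a tree recursive encoding. Each node's encoding is computed bottom-up from its own word piece and the encodings of its children, so it depends on every word piece in the subtree rooted at that node.

```python
import math
import random


def build_prefix_tree(biasing_words):
    """Build a word-piece prefix tree from lists of word pieces."""
    root = {"piece": "<root>", "children": {}}
    for pieces in biasing_words:
        node = root
        for piece in pieces:
            node = node["children"].setdefault(
                piece, {"piece": piece, "children": {}}
            )
    return root


def piece_embedding(piece, dim=4):
    """Toy deterministic embedding (assumption: stands in for a learned table)."""
    rng = random.Random(piece)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def encode(node, dim=4):
    """Encode a node from its own piece plus the mean of its children's encodings.

    Simplified, weight-free stand-in for a learned tree encoder: the result
    for a node depends on all word pieces in the subtree below it.
    """
    own = piece_embedding(node["piece"], dim)
    child_encs = [encode(child, dim) for child in node["children"].values()]
    if child_encs:
        mean = [sum(vec[i] for vec in child_encs) / len(child_encs)
                for i in range(dim)]
    else:
        mean = [0.0] * dim
    node["enc"] = [math.tanh(o + m) for o, m in zip(own, mean)]
    return node["enc"]


# Hypothetical word-piece segmentations of three biasing words.
tree = build_prefix_tree([["tur", "ner"], ["tur", "bo"], ["vin", "cent"]])
encode(tree)
tur_node = tree["children"]["tur"]
```

In this sketch, the encoding stored at the `tur` node changes whenever the set of word pieces below it changes, which is the property a decoder could exploit when deciding how much probability to assign to a biasing-word path.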

research
07/02/2022

Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

Incorporating biasing words obtained as contextual knowledge is critical...
research
09/01/2021

Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition

Contextual knowledge is important for real-world automatic speech recogn...
research
05/18/2022

Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator

Contextual knowledge is essential for reducing speech recognition errors...
research
06/02/2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

End-to-end automatic speech recognition (ASR) and large language models,...
research
09/02/2022

Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

Contextual ASR, which takes a list of bias terms as input along with aud...
research
05/16/2023

Enhancing Keyphrase Extraction from Long Scientific Documents using Graph Embeddings

In this study, we investigate using graph neural network (GNN) represent...
