Tree-constrained Pointer Generator with Graph Neural Network Encodings for Contextual Speech Recognition

07/02/2022
by Guangzhi Sun, et al.

Incorporating biasing words obtained as contextual knowledge is critical for many automatic speech recognition (ASR) applications. This paper proposes the use of graph neural network (GNN) encodings in a tree-constrained pointer generator (TCPGen) component for end-to-end contextual ASR. The biasing words are stored in a prefix tree and encoded with a tree-based GNN. Each tree node's encoding therefore incorporates information about all wordpieces on the branches rooted at that node, which provides lookahead for future wordpieces during end-to-end ASR decoding and allows the generation probability of the biasing words to be predicted more accurately. Systems were evaluated on the Librispeech corpus using simulated biasing tasks, and on the AMI corpus using a novel visually grounded contextual ASR pipeline, proposed in this work, that extracts biasing words from the slides accompanying each meeting. Results showed that TCPGen with GNN encodings achieved about a further 15% relative WER reduction on the biasing words compared to the original TCPGen, with a negligible increase in the computation cost of decoding.
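The prefix-tree lookahead idea can be illustrated with a small sketch. The abstract does not specify the GNN architecture, so everything below is an assumption made for illustration. The helper names (`build_prefix_tree`, `encode`, `valid_next_wordpieces`), the mean-over-children aggregation, the `alpha` weight, and the toy 2-D wordpiece embeddings are all hypothetical and are not the paper's method. What the sketch does show is the structural property the abstract relies on: each node's vector is computed bottom-up, so it depends on every wordpiece in the subtree rooted at that node. The tree also gives the set of valid next wordpieces for a partial biasing word.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict[str, Node] = field(default_factory=dict)
    is_word_end: bool = False
    encoding: list[float] | None = None


def build_prefix_tree(biasing_words: list[list[str]]) -> Node:
    """Insert each biasing word, given as a wordpiece sequence, into a prefix tree."""
    root = Node()
    for pieces in biasing_words:
        node = root
        for wp in pieces:
            node = node.children.setdefault(wp, Node())
        node.is_word_end = True
    return root


def encode(node: Node, emb: dict[str, list[float]],
           token: str | None = None, alpha: float = 0.5) -> list[float]:
    """Hypothetical bottom-up tree encoding (a stand-in for the paper's GNN).

    node vector = own wordpiece embedding + alpha * mean(child vectors),
    so every node summarises all wordpieces on the branches below it.
    """
    dim = len(next(iter(emb.values())))
    own = emb[token] if token is not None else [0.0] * dim  # root has no wordpiece
    child_encs = [encode(child, emb, wp, alpha) for wp, child in node.children.items()]
    if child_encs:
        agg = [sum(vals) / len(child_encs) for vals in zip(*child_encs)]
    else:
        agg = [0.0] * dim  # leaf: nothing to look ahead to
    node.encoding = [o + alpha * a for o, a in zip(own, agg)]
    return node.encoding


def valid_next_wordpieces(root: Node, prefix: list[str]) -> list[str]:
    """Wordpieces that can extend `prefix` into a biasing word (empty if off-tree)."""
    node = root
    for wp in prefix:
        if wp not in node.children:
            return []
        node = node.children[wp]
    return list(node.children)


# Toy usage: hypothetical biasing words split into hypothetical wordpieces.
words = [["tur", "ing"], ["tur", "bine"], ["gra", "ph"]]
emb = {"tur": [1.0, 0.0], "ing": [0.0, 1.0], "bine": [0.0, 2.0],
       "gra": [2.0, 0.0], "ph": [0.0, 4.0]}
root = build_prefix_tree(words)
encode(root, emb)
print(root.children["tur"].encoding)         # [1.0, 0.75]
print(valid_next_wordpieces(root, ["tur"]))  # ['ing', 'bine']
```

In this toy example the vector for the `tur` node already carries information from both `ing` and `bine`. A decoder that attends over node vectors instead of plain wordpiece embeddings could therefore "see" which completions lie ahead. How TCPGen actually consumes such encodings is described in the full paper, not in this abstract.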

research
05/30/2023

Graph Neural Networks for Contextual ASR with the Tree-Constrained Pointer Generator

The incorporation of biasing words obtained through contextual knowledge...
research
06/02/2023

Can Contextual Biasing Remain Effective with Whisper and GPT-2?

End-to-end automatic speech recognition (ASR) and large language models,...
research
09/01/2021

Tree-constrained Pointer Generator for End-to-end Contextual Speech Recognition

Contextual knowledge is important for real-world automatic speech recogn...
research
05/18/2022

Minimising Biasing Word Errors for Contextual ASR with the Tree-Constrained Pointer Generator

Contextual knowledge is essential for reducing speech recognition errors...
research
09/02/2022

Improving Contextual Recognition of Rare Words with an Alternate Spelling Prediction Model

Contextual ASR, which takes a list of bias terms as input along with aud...
research
01/25/2020

Lattice-based Improvements for Voice Triggering Using Graph Neural Networks

Voice-triggered smart assistants often rely on detection of a trigger-ph...
research
10/06/2021

NUS-IDS at FinCausal 2021: Dependency Tree in Graph Neural Network for Better Cause-Effect Span Detection

Automatic identification of cause-effect spans in financial documents is...