Improving End-to-End Contextual Speech Recognition with Fine-Grained Contextual Knowledge Selection

01/30/2022
by Minglun Han, et al.

Most current methods in end-to-end contextual speech recognition bias the recognition process towards contextual knowledge. Since all-neural contextual biasing methods rely on phrase-level contextual modeling and attention-based relevance modeling, they may confuse similar context-specific phrases, which hurts predictions at the token level. In this work, we focus on mitigating this confusion with fine-grained contextual knowledge selection (FineCoS). In FineCoS, we introduce fine-grained knowledge to reduce the uncertainty of token predictions. Specifically, we first apply phrase selection to narrow the range of phrase candidates, and then conduct token attention on the tokens in the selected phrase candidates. Moreover, we re-normalize the attention weights of the most relevant phrases during inference to obtain more focused phrase-level contextual representations, and inject position information to better discriminate phrases or tokens. On LibriSpeech and an in-house 160,000-hour dataset, we explore the proposed methods based on a controllable all-neural biasing method, collaborative decoding (ColDec). The proposed methods provide up to a 6.1% relative word error rate reduction on LibriSpeech and a 16.4% relative character error rate reduction on the in-house dataset over ColDec.
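
The two-stage selection described in the abstract can be illustrated with a rough sketch. This is not the authors' implementation: the function name `select_and_attend`, the dot-product scoring, the `top_k` parameter, and the toy vectors are all assumptions made for illustration, and position information is left out. It only shows the general idea of selecting the most relevant phrases, re-normalizing their attention weights, and then attending over the tokens of the selected phrases alone.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors (assumed scoring function)."""
    return sum(x * y for x, y in zip(a, b))

def select_and_attend(query, phrase_vecs, token_vecs, top_k=2):
    """Hypothetical two-stage selection, not the paper's exact method.

    query:       decoder state vector
    phrase_vecs: one embedding per candidate phrase
    token_vecs:  for each phrase, a list of its token embeddings
    """
    # Stage 1: phrase-level attention over all candidate phrases.
    phrase_weights = softmax([dot(query, p) for p in phrase_vecs])

    # Phrase selection: keep only the top_k most relevant phrases.
    order = sorted(range(len(phrase_vecs)),
                   key=lambda i: phrase_weights[i], reverse=True)
    selected = order[:top_k]

    # Re-normalize the weights of the selected phrases so they sum to 1.
    mass = sum(phrase_weights[i] for i in selected)
    renorm = {i: phrase_weights[i] / mass for i in selected}

    # Stage 2: token attention restricted to tokens of selected phrases.
    token_ids = [(i, j) for i in selected for j in range(len(token_vecs[i]))]
    token_weights = softmax([dot(query, token_vecs[i][j]) for i, j in token_ids])
    return selected, renorm, dict(zip(token_ids, token_weights))

# Toy example with made-up 2-dimensional embeddings.
query = [1.0, 0.0]
phrase_vecs = [[2.0, 0.0], [1.5, 0.5], [-1.0, 1.0]]
token_vecs = [[[1.0, 0.0], [0.5, 0.5]], [[0.8, 0.1]], [[-1.0, 0.0]]]
selected, renorm, token_w = select_and_attend(query, phrase_vecs, token_vecs)
```

In this toy run the third phrase is dropped at the phrase-selection step, so its token never competes for attention mass in the second stage, which is the kind of narrowing the abstract attributes to FineCoS.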

research
02/18/2022

End-to-end contextual ASR based on posterior distribution adaptation for hybrid CTC/attention system

End-to-end (E2E) speech recognition architectures assemble all component...
research
12/06/2021

Learning to Reason from General Concepts to Fine-grained Tokens for Discriminative Phrase Detection

Phrase detection requires methods to identify if a phrase is relevant to...
research
06/06/2021

Attend and Select: A Segment Attention based Selection Mechanism for Microblog Hashtag Generation

Automatic microblog hashtag generation can help us better and faster und...
research
07/08/2020

Research on multi-dimensional end-to-end phrase recognition algorithm based on background knowledge

At present, the deep end-to-end method based on supervised learning is u...
research
04/18/2023

Approximate Nearest Neighbour Phrase Mining for Contextual Speech Recognition

This paper presents an extension to train end-to-end Context-Aware Trans...
research
05/09/2023

Robust Acoustic and Semantic Contextual Biasing in Neural Transducers for Speech Recognition

Attention-based contextual biasing approaches have shown significant imp...
research
10/28/2022

Assessing Phrase Break of ESL speech with Pre-trained Language Models

This work introduces an approach to assessing phrase break in ESL learne...
