Two Stage Contextual Word Filtering for Context bias in Unified Streaming and Non-streaming Transducer

01/17/2023
by Zhanheng Yang, et al.

It is difficult for an end-to-end (E2E) ASR system to recognize words, such as named entities, that appear infrequently in the training data. A widely used method to mitigate this issue is to feed contextual information into the acoustic model. This requires a contextual word list that enumerates all possible contextual word candidates. Previous work has shown that the size and quality of this list are crucial: a compact and accurate list can boost performance significantly. In this paper, we propose an efficient approach to obtain a high-quality contextual word list for a unified streaming and non-streaming Conformer-Transducer (C-T) model. Specifically, we use the phone-level streaming output to first filter the predefined contextual word list. During the subsequent non-streaming inference, the words in the filtered list serve as contextual information fused into the non-causal encoder and the decoder to generate the final recognition results. Our approach takes advantage of the streaming recognition hypotheses, improves the accuracy of the contextual ASR system, and speeds up inference as well. Experiments on two datasets demonstrate over 20% relative character error rate reduction (CERR) compared to the baseline system. Meanwhile, the real-time factor (RTF) of our system remains within 0.15 even when the size of the contextual word list grows beyond 6,000.
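The abstract does not say how the phone-level streaming output is matched against the contextual word list. The following is a minimal sketch of the first-stage filter, assuming each contextual word has a phone-level pronunciation and that a word is kept when its phone sequence approximately matches some span of the streaming phone hypothesis. The edit-distance criterion is a technique chosen here for illustration and is not necessarily the paper's method. The function names, the `max_error_rate` threshold, and the toy lexicon are all hypothetical.

```python
from typing import Dict, List, Sequence


def min_window_edit_distance(pattern: Sequence[str], text: Sequence[str]) -> int:
    """Smallest edit distance between `pattern` and any contiguous span of `text`.

    Approximate substring matching: the alignment may start and end anywhere
    in `text` at no cost, so phones belonging to other words are not penalized.
    """
    # prev[j]: best cost of aligning pattern[:i] so that it ends at text[:j].
    prev = [0] * (len(text) + 1)  # empty pattern prefix matches anywhere for free
    for i in range(1, len(pattern) + 1):
        cur = [i] + [0] * len(text)  # aligning against an empty span costs i deletions
        for j in range(1, len(text) + 1):
            substitute = prev[j - 1] + (pattern[i - 1] != text[j - 1])
            delete = prev[j] + 1     # phone missing from the streaming output
            insert = cur[j - 1] + 1  # extra phone in the streaming output
            cur[j] = min(substitute, delete, insert)
        prev = cur
    return min(prev)  # free end position in `text`


def filter_context_words(
    context_lexicon: Dict[str, List[str]],
    streaming_phones: Sequence[str],
    max_error_rate: float = 0.25,  # hypothetical tolerance, relative to word length
) -> List[str]:
    """Hypothetical first-stage filter: keep words whose phones roughly occur in the streaming output."""
    kept = []
    for word, phones in context_lexicon.items():
        if not phones:
            continue
        distance = min_window_edit_distance(phones, streaming_phones)
        if distance <= max_error_rate * len(phones):
            kept.append(word)
    return kept


# Toy, hypothetical data: ARPAbet-style pronunciations and a streaming phone
# hypothesis for "the conformer model" in which the R phone was dropped.
lexicon = {
    "conformer": ["K", "AH", "N", "F", "AO", "R", "M", "ER"],
    "transducer": ["T", "R", "AE", "N", "Z", "D", "UW", "S", "ER"],
    "wenet": ["W", "IY", "N", "EH", "T"],
}
streaming = ["DH", "AH", "K", "AH", "N", "F", "AO", "M", "ER", "M", "AA", "D", "AH", "L"]

filtered = filter_context_words(lexicon, streaming)
print(filtered)  # ['conformer']
```

In this sketch, only the surviving words would be handed to the non-streaming pass as its contextual list, so the cost of contextual fusion depends on the filtered list rather than on the full predefined list. The threshold trades recall (tolerating streaming phone errors) against list size.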
