End-to-end contextual speech recognition using class language models and a token passing decoder

12/05/2018
by Zhehuai Chen, et al.

End-to-end (E2E) modeling of automatic speech recognition (ASR) blends all the components of a traditional speech recognition system into a unified model. Although this simplifies the training and decoding pipelines, the unified model is hard to adapt when there is a mismatch between training and test data. In this work, we focus on contextual speech recognition, which is particularly challenging for E2E models because it introduces significant mismatch between training and test data. To improve performance in the presence of complex contextual information, we propose to use class-based language models (CLMs) whose classes can be populated with context-dependent information in real time. To enable this approach to scale to a large number of class members and to minimize search errors, we propose, for the first time for E2E systems, a token passing decoder with efficient token recombination. We evaluate the proposed system on general and contextual ASR and achieve a 62% relative word error rate (WER) reduction for contextual ASR without hurting performance on general ASR. We show that the proposed method performs well without modification of the decoding hyper-parameters across tasks, making it a general solution for E2E ASR.
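The two ideas named in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes the textbook class-based factorization log P(w | h) = log P(c | h) + log P(w | c) with a uniform distribution over class members, and a Viterbi-style recombination that keeps the best token per decoder state. The names `Token`, `recombine`, `class_lm_logprob`, and the `@contact` class are hypothetical and chosen only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    """A decoding hypothesis (hypothetical structure, for illustration only)."""
    state: tuple   # assumed decoder/LM state, e.g. the most recent words
    score: float   # accumulated log-probability
    words: tuple   # full word history, kept for readout


def recombine(tokens):
    """Keep only the best-scoring token for each distinct state."""
    best = {}
    for tok in tokens:
        kept = best.get(tok.state)
        if kept is None or tok.score > kept.score:
            best[tok.state] = tok
    return list(best.values())


def class_lm_logprob(word, class_name, class_logprob, members):
    """Score a word through its class.

    Assumes log P(word | h) = log P(class | h) + log P(word | class),
    with P(word | class) uniform over the current class members.
    """
    class_members = members[class_name]
    if word not in class_members:
        return float("-inf")
    return class_logprob + math.log(1.0 / len(class_members))


# Class members are supplied at request time, e.g. from a user's contact list.
members = {"@contact": ["alice", "bob", "carol", "dave"]}
score = class_lm_logprob("bob", "@contact", math.log(0.5), members)

# Two hypotheses reach the same state; only the higher-scoring one survives.
tokens = [
    Token(state=("a",), score=-1.0, words=("a",)),
    Token(state=("a",), score=-0.5, words=("x", "a")),
    Token(state=("b",), score=-2.0, words=("b",)),
]
survivors = recombine(tokens)
```

Because the `members` dictionary is read only at scoring time, replacing its contents changes which words a class can emit without retraining anything, which is the property the abstract relies on for context-dependent class population.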


