Fast and accurate factorized neural transducer for text adaption of end-to-end speech recognition models

12/05/2022
by Rui Zhao, et al.

The neural transducer is now the most popular end-to-end model for speech recognition because it is naturally suited to streaming. However, adapting it with text-only data is challenging. The factorized neural transducer (FNT) model was proposed to mitigate this problem, but its improved ability to adapt on text-only data came at the cost of lower accuracy than the standard neural transducer model. We propose several methods to improve the performance of the FNT model: adding a CTC criterion during training, adding a KL divergence loss during adaptation, using a pre-trained language model to seed the vocabulary predictor, and an efficient adaptation approach that interpolates the vocabulary predictor with an n-gram language model. A combination of these approaches yields a relative word-error-rate reduction of 9.48% from the standard FNT model. Furthermore, n-gram interpolation with the vocabulary predictor greatly improves adaptation speed while maintaining satisfactory adaptation performance.
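
As an illustration of the last idea, interpolating the vocabulary predictor with an n-gram language model can be sketched as a per-step mixture of two next-token distributions. The abstract does not state the interpolation form, so the sketch below assumes linear interpolation in probability space; the function name `interpolate_log_probs`, the parameter `ngram_weight`, and the toy three-token vocabulary are hypothetical and are not taken from the paper.

```python
import math


def interpolate_log_probs(predictor_log_probs, ngram_log_probs, ngram_weight=0.3):
    """Mix two next-token distributions given as log-probabilities.

    Assumption (not from the paper): linear interpolation in probability
    space, p = (1 - w) * p_predictor + w * p_ngram, returned as log p.
    """
    if len(predictor_log_probs) != len(ngram_log_probs):
        raise ValueError("both distributions must cover the same vocabulary")
    if not 0.0 <= ngram_weight <= 1.0:
        raise ValueError("ngram_weight must lie in [0, 1]")

    mixed = []
    for lp_pred, lp_ngram in zip(predictor_log_probs, ngram_log_probs):
        p = (1.0 - ngram_weight) * math.exp(lp_pred) + ngram_weight * math.exp(lp_ngram)
        # Guard against log(0) when both models assign zero probability.
        mixed.append(math.log(p) if p > 0.0 else float("-inf"))
    return mixed


# Toy example over a hypothetical three-token vocabulary.
predictor = [math.log(0.7), math.log(0.2), math.log(0.1)]
ngram = [math.log(0.1), math.log(0.1), math.log(0.8)]
mixed = interpolate_log_probs(predictor, ngram, ngram_weight=0.5)
```

Under this assumption, only the n-gram model has to be re-estimated on the new text, while the neural vocabulary predictor stays fixed, which is one plausible reading of why this form of adaptation is fast.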


research
09/27/2021

Factorized Neural Transducer for Efficient Language Model Adaptation

In recent years, end-to-end (E2E) based automatic speech recognition (AS...
research
05/17/2019

End-to-end Adaptation with Backpropagation through WFST for On-device Speech Recognition System

An on-device DNN-HMM speech recognition system efficiently works with a ...
research
08/05/2020

Efficient MDI Adaptation for n-gram Language Models

This paper presents an efficient algorithm for n-gram language model ada...
research
09/19/2019

A Comparison of Hybrid and End-to-End Models for Syllable Recognition

This paper presents a comparison of a traditional hybrid speech recognit...
research
11/17/2022

LongFNT: Long-form Speech Recognition with Factorized Neural Transducer

Traditional automatic speech recognition (ASR) systems usually focus on ...
research
09/18/2023

Improved Factorized Neural Transducer Model For text-only Domain Adaptation

End-to-end models, such as the neural Transducer, have been successful i...
research
12/11/2018

Scalable language model adaptation for spoken dialogue systems

Language models (LM) for interactive speech recognition systems are trai...
