Improved Factorized Neural Transducer Model For text-only Domain Adaptation

09/18/2023
by Junzhe Liu, et al.

End-to-end models such as the neural Transducer have been successful at jointly integrating acoustic and linguistic information to achieve excellent recognition performance. However, adapting these models with text-only data is challenging. The factorized neural Transducer (FNT) aims to address this issue by introducing a separate vocabulary decoder to predict the vocabulary, which allows traditional text-only adaptation to be applied effectively. Nonetheless, this approach has limitations in fusing acoustic and language information seamlessly. Moreover, a degradation in word error rate (WER) on general test sets has been observed, raising doubts about its overall performance. In response, we present an improved factorized neural Transducer (IFNT) model structure designed to comprehensively integrate acoustic and language information while enabling effective text adaptation. We evaluate the proposed methods through in-domain experiments on GigaSpeech and out-of-domain experiments adapting to EuroParl, TED-LIUM, and Medical datasets. After text-only adaptation, IFNT yields 7.9% relative WER improvements over the standard neural Transducer with shallow fusion, and relative WER reductions ranging from 1.6% upward on the test sets compared to the FNT model.
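
The abstract reports its results as relative WER changes between systems. As a minimal sketch of how such a figure is usually derived (this is not code from the paper, and the function names `word_error_rate` and `relative_wer_reduction` are illustrative assumptions), WER can be computed as the word-level edit distance divided by the reference length, and the relative reduction as the drop in WER divided by the baseline WER:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_wer_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer (positive means better)."""
    if baseline_wer <= 0:
        raise ValueError("baseline WER must be positive")
    return (baseline_wer - new_wer) / baseline_wer
```

For instance, scoring the hypothesis "the cat sit on mat" against the reference "the cat sat on the mat" counts one substitution and one deletion over six reference words. A relative reduction is then a ratio of two such WER values, which is why it can be sizeable even when the absolute WER difference is small.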


