A Deliberation-based Joint Acoustic and Text Decoder

03/23/2023
by Sepand Mavandadi, et al.

We propose a new two-pass end-to-end (E2E) speech recognition model that improves ASR performance by training on a combination of paired data and unpaired text data. Previously, the joint acoustic and text decoder (JATD) has shown promising results by using text data during model training, and the recently introduced deliberation architecture has reduced recognition errors by leveraging first-pass decoding results. Our method, dubbed Deliberation-JATD, combines the spelling-correcting abilities of deliberation with JATD's use of unpaired text data to further improve performance. The proposed model produces substantial gains across multiple test sets, especially those focused on rare words, where it reduces word error rate (WER) by at least 12% relative. It does so without increasing model size or requiring multi-stage training, making Deliberation-JATD an efficient candidate for on-device applications.


