Multitask Training with Text Data for End-to-End Speech Recognition

10/27/2020
by Peidong Wang, et al.

We propose a multitask training method for attention-based end-to-end speech recognition models to better incorporate language-level information. We regularize the decoder in a sequence-to-sequence architecture by training it jointly on the speech recognition task and a next-token prediction language modeling task. Trained on either the 100-hour subset of LibriSpeech or the full 960-hour dataset, the proposed method leads to an 11% relative performance improvement over the baseline and is comparable to language model shallow fusion, without requiring an additional neural network during decoding. Analyses of sample output sentences and of the word error rate on rare words demonstrate that the proposed method incorporates language-level information effectively.
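
The joint objective described in the abstract can be illustrated with a minimal sketch. It assumes the two tasks are combined as a weighted sum of cross-entropy losses over the same decoder's output distributions. The interpolation weight `lm_weight`, the function names, and the toy probability tables are hypothetical and are not taken from the paper, whose exact formulation may differ.

```python
import math


def cross_entropy(step_probs, targets):
    """Mean negative log-likelihood of the target token ids.

    step_probs[i] is the decoder's output distribution at step i,
    targets[i] is the reference token id at that step.
    """
    total = sum(math.log(probs[t]) for probs, t in zip(step_probs, targets))
    return -total / len(targets)


def multitask_loss(asr_probs, asr_targets, lm_probs, lm_targets, lm_weight=0.5):
    """Weighted sum of an ASR loss and a next-token LM loss (assumed form).

    asr_probs: decoder outputs conditioned on speech (paired data).
    lm_probs:  decoder outputs on text-only data, predicting the next token.
    lm_weight: hypothetical interpolation weight between the two tasks.
    """
    asr_loss = cross_entropy(asr_probs, asr_targets)
    lm_loss = cross_entropy(lm_probs, lm_targets)
    return (1.0 - lm_weight) * asr_loss + lm_weight * lm_loss


# Toy example with a 2-token vocabulary and two decoding steps per task.
asr_probs = [[0.5, 0.5], [0.25, 0.75]]
lm_probs = [[0.9, 0.1], [0.2, 0.8]]
loss = multitask_loss(asr_probs, [0, 1], lm_probs, [0, 1], lm_weight=0.3)
print(f"joint loss: {loss:.4f}")
```

Because both terms are computed from the same decoder, only one network is needed at decoding time. Shallow fusion, by contrast, combines the scores of a separate language model with the recognizer's scores during the search.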

research
10/14/2022

LeVoice ASR Systems for the ISCSLP 2022 Intelligent Cockpit Speech Recognition Challenge

This paper describes LeVoice automatic speech recognition systems to tra...
research
01/28/2022

Neural-FST Class Language Model for End-to-End Speech Recognition

We propose Neural-FST Class Language Model (NFCLM) for end-to-end speech...
research
09/19/2023

End-to-End Speech Recognition Contextualization with Large Language Models

In recent years, Large Language Models (LLMs) have garnered significant ...
research
05/21/2020

Worse WER, but Better BLEU? Leveraging Word Embedding as Intermediate in Multitask End-to-End Speech Translation

Speech translation (ST) aims to learn transformations from speech in the...
research
05/26/2021

Multitask Learning for Grapheme-to-Phoneme Conversion of Anglicisms in German Speech Recognition

Loanwords, such as Anglicisms, are a challenge in German speech recognit...
research
07/02/2019

Attention model for articulatory features detection

Articulatory distinctive features, as well as phonetic transcription, pl...
research
03/30/2017

Simplified End-to-End MMI Training and Voting for ASR

A simplified speech recognition system that uses the maximum mutual info...
