Correction of Automatic Speech Recognition with Transformer Sequence-to-sequence Model

10/23/2019
by   Oleksii Hrinchuk, et al.
0

In this work, we introduce a simple yet efficient post-processing model for automatic speech recognition (ASR). Our model has Transformer-based encoder-decoder architecture which "translates" ASR model output into grammatically and semantically correct text. We investigate different strategies for regularizing and optimizing the model and show that extensive data augmentation and the initialization with pre-trained weights are required to achieve good performance. On the LibriSpeech benchmark, our method demonstrates significant improvement in word error rate over the baseline acoustic model with greedy decoding, especially on much noisier dev-other and test-other portions of the evaluation dataset. Our model also outperforms baseline with 6-gram language model re-scoring and approaches the performance of re-scoring with Transformer-XL neural language model.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
02/22/2021

Generating Human Readable Transcript for Automatic Speech Recognition with Pre-trained Language Model

Modern Automatic Speech Recognition (ASR) systems can achieve high perfo...
research
03/26/2021

BART based semantic correction for Mandarin automatic speech recognition system

Although automatic speech recognition (ASR) systems achieved significant...
research
10/25/2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-Attention model shows significant impr...
research
07/04/2021

Cross-Modal Transformer-Based Neural Correction Models for Automatic Speech Recognition

We propose a cross-modal transformer-based neural correction models that...
research
12/10/2021

Building a great multi-lingual teacher with sparsely-gated mixture of experts for speech recognition

The sparsely-gated Mixture of Experts (MoE) can magnify a network capaci...
research
04/20/2020

WHALETRANS: E2E WHisper to nAturaL spEech conversion using modified TRANSformer network

In this article, we investigate whispered-to natural-speech conversion m...
research
01/04/2020

Transformer-based language modeling and decoding for conversational speech recognition

We propose a way to use a transformer-based language model in conversati...

Please sign up or login with your details

Forgot password? Click here to reset