Automatic Spelling Correction with Transformer for CTC-based End-to-End Speech Recognition

03/27/2019
by   Shiliang Zhang, et al.

Connectionist Temporal Classification (CTC) based end-to-end speech recognition systems usually need to incorporate an external language model through WFST-based decoding to achieve promising results. This is even more important for Mandarin speech recognition, where homophones cause many substitution errors. The linguistic information introduced by the language model helps to resolve these substitution errors. In this work, we propose a transformer-based spelling correction model that automatically corrects errors, especially the substitution errors, made by a CTC-based Mandarin speech recognition system. Specifically, we investigate using the recognition results generated by CTC-based systems as input and the ground-truth transcriptions as output to train a transformer with an encoder-decoder architecture, much as in machine translation. Results on a 20,000-hour Mandarin speech recognition task show that the proposed spelling correction model can achieve a CER of 3.41 [...] 22.9 [...] decoded with and without a language model, respectively.
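The abstract reports results as character error rate (CER) and describes training on pairs of CTC recognition output and ground-truth transcription. As a rough illustration only, the sketch below computes CER over such pairs using the standard Levenshtein edit distance. The function names `edit_distance` and `cer` and the two example sentences are assumptions made for this illustration, not taken from the paper, and the paper's own scoring tool may differ in detail.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the first i-1 chars of ref
    # and the first j chars of hyp.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference characters
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1]


def cer(pairs: list[tuple[str, str]]) -> float:
    """CER over (hypothesis, reference) pairs: total edits / total reference chars."""
    total_chars = sum(len(ref) for _, ref in pairs)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(ref, hyp) for hyp, ref in pairs)
    return total_edits / total_chars


# Hypothetical (hypothesis, reference) pairs. The first hypothesis contains a
# homophone substitution: 汽 and 气 are both pronounced "qi".
pairs = [
    ("今天天汽很好", "今天天气很好"),
    ("我在北京", "我在北京"),
]
print(f"CER = {cer(pairs):.2%}")
```

Pairs in this same (hypothesis, reference) shape are what the abstract describes feeding to the correction model as source and target sequences; the scoring above would then be applied to the corrected output instead of the raw hypothesis.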

research
12/14/2020

A review of on-device fully neural end-to-end automatic speech recognition algorithms

In this paper, we review various end-to-end automatic speech recognition...
research
01/04/2020

Transformer-based language modeling and decoding for conversational speech recognition

We propose a way to use a transformer-based language model in conversati...
research
04/08/2019

Exploring Methods for the Automatic Detection of Errors in Manual Transcription

Quality of data plays an important role in most deep learning tasks. In ...
research
08/02/2021

User-Initiated Repetition-Based Recovery in Multi-Utterance Dialogue Systems

Recognition errors are common in human communication. Similar errors oft...
research
07/24/2022

Improving Mandarin Speech Recogntion with Block-augmented Transformer

Recently Convolution-augmented Transformer (Conformer) has shown promisi...
research
10/25/2022

Linguistic-Enhanced Transformer with CTC Embedding for Speech Recognition

The recent emergence of joint CTC-Attention model shows significant impr...
research
02/18/2021

Fixing Errors of the Google Voice Recognizer through Phonetic Distance Metrics

Speech recognition systems for the Spanish language, such as Google's, p...
