Syllable-Based Sequence-to-Sequence Speech Recognition with the Transformer in Mandarin Chinese

04/28/2018
by Shiyu Zhou, et al.

Sequence-to-sequence attention-based models, which integrate the acoustic, pronunciation, and language models into a single neural network, have recently shown very promising results on automatic speech recognition (ASR) tasks. Among these models, the Transformer, a sequence-to-sequence attention-based architecture that relies entirely on self-attention without using RNNs or convolutions, achieved a new single-model state-of-the-art BLEU score on neural machine translation (NMT) tasks. Motivated by this outstanding performance, we extend the Transformer to speech and adopt it as the basic architecture for sequence-to-sequence attention-based models on Mandarin Chinese ASR tasks. Furthermore, we compare a syllable-based model with a context-independent phoneme (CI-phoneme) based model under the Transformer architecture. Additionally, we propose a greedy cascading decoder with the Transformer for mapping CI-phoneme and syllable sequences into word sequences. Experiments on the HKUST dataset demonstrate that the syllable-based model with the Transformer outperforms its CI-phoneme-based counterpart, achieving a character error rate (CER) of 28.77%, which is competitive with the state-of-the-art CER of 28.0% obtained by a joint CTC-attention based encoder-decoder network.
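To make the decoding procedure concrete, below is a minimal sketch (not the authors' released code) of greedy cascading decoding built on PyTorch's nn.Transformer: a first Transformer emits a syllable sequence, and a second Transformer greedily maps it into a character sequence. Every vocabulary size, hyperparameter, and token id here is an illustrative assumption rather than a value from the paper, and positional encodings are omitted for brevity.

```python
import torch
import torch.nn as nn

SOS, EOS = 1, 2  # assumed special-token ids (not from the paper)

class Seq2SeqTransformer(nn.Module):
    """Token-to-token Transformer; one instance per cascade stage."""
    def __init__(self, vocab_in, vocab_out, d_model=256):
        super().__init__()
        self.src_emb = nn.Embedding(vocab_in, d_model)
        self.tgt_emb = nn.Embedding(vocab_out, d_model)
        self.core = nn.Transformer(d_model=d_model, nhead=4,
                                   num_encoder_layers=3,
                                   num_decoder_layers=3,
                                   batch_first=True)
        self.out = nn.Linear(d_model, vocab_out)

    def forward(self, src_ids, tgt_ids):
        # Causal mask so each output position attends only to the past.
        mask = self.core.generate_square_subsequent_mask(tgt_ids.size(1))
        h = self.core(self.src_emb(src_ids), self.tgt_emb(tgt_ids),
                      tgt_mask=mask)
        return self.out(h)

@torch.no_grad()
def greedy_decode(model, src_ids, max_len=50):
    """Greedy (argmax) autoregressive decoding, one token at a time."""
    tgt = torch.tensor([[SOS]])
    for _ in range(max_len):
        logits = model(src_ids, tgt)
        nxt = logits[:, -1].argmax(dim=-1, keepdim=True)
        tgt = torch.cat([tgt, nxt], dim=1)
        if nxt.item() == EOS:
            break
    return tgt

# Cascade greedily: stage 1 emits syllables, stage 2 maps them to characters.
syl_model = Seq2SeqTransformer(vocab_in=500, vocab_out=1200)    # assumed sizes
char_model = Seq2SeqTransformer(vocab_in=1200, vocab_out=4000)  # assumed sizes
inputs = torch.randint(3, 500, (1, 30))  # stand-in for the acoustic inputs
syllables = greedy_decode(syl_model, inputs)
characters = greedy_decode(char_model, syllables)
```

Greedy decoding simply takes the argmax at every step rather than maintaining a beam, which keeps the cascade cheap; the reported CER is then the character-level edit distance between the decoded and reference transcripts, divided by the reference length.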

