Sequence-to-sequence Automatic Speech Recognition with Word Embedding Regularization and Fused Decoding

10/28/2019
by Alexander H. Liu, et al.

In this paper, we investigate the benefit that off-the-shelf word embeddings can bring to sequence-to-sequence (seq-to-seq) automatic speech recognition (ASR). We first introduce word embedding regularization, which maximizes the cosine similarity between a transformed decoder feature and the target word embedding. Building on the regularized decoder, we further propose a fused decoding mechanism, which allows the decoder to take semantic consistency into account during decoding by absorbing the information carried by the transformed decoder feature, learned to be close to the target word embedding. Initial results on LibriSpeech demonstrate that pre-trained word embeddings can significantly lower ASR recognition error at negligible cost, and that the choice of word embedding algorithm among Skip-gram, CBOW, and BERT matters.
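The regularization described above is essentially a cosine-similarity objective between a projected decoder state and the frozen pre-trained embedding of the ground-truth word. Below is a minimal PyTorch sketch of such a regularizer, assuming a standard seq-to-seq decoder; the names (embedding_regularization_loss, proj, lambda_reg) are illustrative assumptions, not the paper's released implementation.

# Hypothetical sketch of the word embedding regularization term.
# All variable and function names here are assumed for illustration.
import torch
import torch.nn.functional as F

def embedding_regularization_loss(decoder_feat, target_ids, embedding_table, proj):
    """Cosine-similarity regularizer between a transformed decoder feature
    and the pre-trained embedding of the target word.

    decoder_feat:    (batch, time, d_dec) decoder hidden states
    target_ids:      (batch, time) ground-truth word ids
    embedding_table: (vocab, d_emb) frozen pre-trained word embeddings
    proj:            torch.nn.Linear mapping d_dec -> d_emb
    """
    transformed = proj(decoder_feat)                    # (batch, time, d_emb)
    target_emb = embedding_table[target_ids]            # (batch, time, d_emb)
    cos = F.cosine_similarity(transformed, target_emb, dim=-1)
    return (1.0 - cos).mean()                           # minimizing this maximizes similarity

# In training, this term would be added to the usual cross-entropy objective
# with a weighting coefficient (lambda_reg below is a hypothetical name):
#   loss = ce_loss + lambda_reg * embedding_regularization_loss(...)

During fused decoding, the same transformed decoder feature could additionally be compared against the full embedding table to bias the output distribution toward semantically consistent words; the exact fusion scheme is detailed in the full paper rather than the abstract.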


