Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

07/01/2020
by Bowen Shi, et al.

Segmental models are sequence prediction models in which scores of hypotheses are based on entire variable-length segments of frames. We consider segmental models for whole-word ("acoustic-to-word") speech recognition, with the segment feature vectors defined using acoustic word embeddings. Such models are computationally challenging as the number of paths is proportional to the vocabulary size, which can be orders of magnitude larger than when using subword units like phones. We describe an efficient approach for end-to-end whole-word segmental models, with forward-backward and Viterbi decoding performed on a GPU and a simple segment scoring function that reduces space complexity. In addition, we investigate the use of pre-training via jointly trained acoustic word embeddings (AWEs) and acoustically grounded word embeddings (AGWEs) of written word labels. We find that word error rate can be reduced by a large margin by pre-training the acoustic representation with AWEs, and additional (smaller) gains can be obtained by pre-training the word prediction layer with AGWEs. Our final models improve over comparable A2W models.
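
To make the decoding problem concrete, here is a minimal NumPy sketch of segmental Viterbi decoding over whole words. It is an illustration under stated assumptions, not the authors' implementation: the segment score is assumed to be a dot product between a segment embedding and a word embedding, `segment_embedding` is a hypothetical mean-pooling stand-in for a learned acoustic word embedding, and the names `segmental_viterbi`, `word_embs`, and `max_len` are invented for this example. It runs on the CPU, whereas the abstract describes a GPU implementation.

```python
import numpy as np


def segment_embedding(frames, start, end):
    """Hypothetical stand-in for an acoustic word embedding:
    the mean of the frames in [start, end)."""
    return frames[start:end].mean(axis=0)


def segmental_viterbi(frames, word_embs, max_len):
    """Find the best-scoring segmentation of `frames` into word segments.

    frames:    (T, D) array of frame features.
    word_embs: (V, D) array with one embedding per vocabulary word.
    max_len:   maximum segment length in frames.

    Returns (best_score, segments), where each segment is a
    (start, end, word_index) triple covering frames [start, end).
    """
    T = frames.shape[0]
    best = np.full(T + 1, -np.inf)  # best[t]: best score of any path ending at frame t
    best[0] = 0.0
    back = [None] * (T + 1)         # back[t]: (start, word) of the last segment

    for t in range(1, T + 1):
        for s in range(max(0, t - max_len), t):
            # Assumed score: dot product with every word embedding, shape (V,).
            scores = word_embs @ segment_embedding(frames, s, t)
            v = int(np.argmax(scores))
            cand = best[s] + scores[v]
            if cand > best[t]:
                best[t] = cand
                back[t] = (s, v)

    # Trace back from the final frame to recover the segmentation.
    segments = []
    t = T
    while t > 0:
        s, v = back[t]
        segments.append((s, t, v))
        t = s
    segments.reverse()
    return float(best[T]), segments


if __name__ == "__main__":
    frames = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    word_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
    score, segments = segmental_viterbi(frames, word_embs, max_len=2)
    print(score, segments)
```

The inner loop scores every vocabulary word for every candidate segment, so the work grows with the number of frames, the maximum segment length, and the vocabulary size. That is the cost the abstract refers to when it says the number of paths is proportional to the vocabulary size.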

research
03/29/2019

Acoustically Grounded Word Embeddings for Improved Acoustics-to-Word Speech Recognition

Direct acoustics-to-word (A2W) systems for end-to-end automatic speech r...
research
06/10/2019

Word-level Speech Recognition with a Dynamic Lexicon

We propose a direct-to-word sequence model with a dynamic lexicon. Our w...
research
02/18/2019

Learned In Speech Recognition: Contextual Acoustic Word Embeddings

End-to-end acoustic-to-word speech recognition models have recently gain...
research
10/05/2015

Deep convolutional acoustic word embeddings using word-pair side information

Recent studies have been revisiting whole words as the basic modelling u...
research
07/21/2017

An Error-Oriented Approach to Word Embedding Pre-Training

We propose a novel word embedding pre-training approach that exploits wr...
research
06/05/2023

End-to-End Word-Level Pronunciation Assessment with MASK Pre-training

Pronunciation assessment is a major challenge in the computer-aided pron...
research
03/09/2016

Unsupervised word segmentation and lexicon discovery using acoustic word embeddings

In settings where only unlabelled speech data is available, speech techn...
