Learned In Speech Recognition: Contextual Acoustic Word Embeddings

02/18/2019
by Shruti Palaskar, et al.

End-to-end acoustic-to-word speech recognition models have recently gained popularity because they are easy to train, scale well to large amounts of training data, and do not require a lexicon. In addition, word models may be easier to integrate with downstream tasks such as spoken language understanding, because inference (search) is much simpler than with phoneme, character, or other sub-word units. In this paper, we describe methods to construct contextual acoustic word embeddings directly from a supervised sequence-to-sequence acoustic-to-word speech recognition model, using the learned attention distribution. On a suite of 16 standard sentence evaluation tasks, our embeddings show competitive performance against a word2vec model trained on the speech transcriptions. In addition, we evaluate these embeddings on a spoken language understanding task and observe that they match the performance of text-based embeddings in a pipeline that first performs speech recognition and then constructs word embeddings from the transcriptions.
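To make the idea concrete, here is a minimal sketch of one plausible reading of the approach: for each word emitted by the acoustic-to-word decoder, pool the encoder's frame-level states using that word's attention distribution to obtain a contextual acoustic word embedding. The function name, tensor shapes, and random inputs below are illustrative assumptions, not the authors' exact implementation.

```python
import torch

def contextual_acoustic_word_embeddings(encoder_states, attention_weights):
    """
    Build one embedding per decoded word by pooling encoder states with the
    attention distribution the decoder produced when emitting that word.

    encoder_states:    (T, d) acoustic encoder outputs for one utterance
    attention_weights: (N, T) one attention distribution per decoded word
                       (rows are assumed to sum to 1)
    returns:           (N, d) contextual acoustic word embeddings
    """
    # Each word's embedding is the attention-weighted sum of encoder frames.
    return attention_weights @ encoder_states


# Toy usage with random tensors standing in for a real A2W model's outputs.
T, d, N = 120, 256, 7                             # frames, encoder dim, words
H = torch.randn(T, d)                             # hypothetical encoder states
alpha = torch.softmax(torch.randn(N, T), dim=-1)  # hypothetical attention rows
embeddings = contextual_acoustic_word_embeddings(H, alpha)
print(embeddings.shape)                           # torch.Size([7, 256])
```

Because the pooling weights come from the decoder's attention over the acoustic frames, the same word type receives a different embedding in different utterances, which is what makes the embeddings contextual.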

Related research

06/10/2019
Word-level Speech Recognition with a Dynamic Lexicon
We propose a direct-to-word sequence model with a dynamic lexicon. Our w...

02/20/2019
Audio-Linguistic Embeddings for Spoken Sentences
We propose spoken sentence embeddings which capture both acoustic and li...

10/16/2019
Joint Learning of Word and Label Embeddings for Sequence Labelling in Spoken Language Understanding
We propose an architecture to jointly learn word and label embeddings fo...

04/12/2023
Acoustic absement in detail: Quantifying acoustic differences across time-series representations of speech data
The speech signal is a consummate example of time-series data. The acous...

07/01/2020
Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings
Segmental models are sequence prediction models in which scores of hypot...

01/08/2023
Analyzing the Representational Geometry of Acoustic Word Embeddings
Acoustic word embeddings (AWEs) are vector representations such that dif...

10/30/2017
Deep word embeddings for visual speech recognition
In this paper we present a deep learning architecture for extracting wor...
