Deep convolutional acoustic word embeddings using word-pair side information

by Herman Kamper, et al.
Toyota Technological Institute at Chicago

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units. Such whole-word segmental systems rely on a function that maps a variable-length speech segment to a vector in a fixed-dimensional space; the resulting acoustic word embeddings need to allow for accurate discrimination between different word types, directly in the embedding space. We compare several old and new approaches in a word discrimination task. Our best approach uses side information in the form of known word pairs to train a Siamese convolutional neural network (CNN): a pair of tied networks that take two speech segments as input and produce their embeddings, trained with a hinge loss that separates same-word pairs and different-word pairs by some margin. A word classifier CNN performs similarly, but requires much stronger supervision. Both types of CNNs yield large improvements over the best previously published results on the word discrimination task.
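The margin-based hinge loss mentioned in the abstract can be illustrated with a small sketch. The abstract does not give the exact form of the loss, so the triplet-style formulation over cosine distances below is an assumption, as are the function names `cosine_distance` and `siamese_hinge_loss`, the margin value of 0.5, and the toy two-dimensional embeddings. In a real system the embeddings would come from the tied CNNs rather than being written by hand.

```python
import numpy as np


def cosine_distance(a, b):
    """Cosine distance 1 - cos(a, b) between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return 1.0 - float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))


def siamese_hinge_loss(anchor, same, different, margin=0.5):
    """Hinge loss on a same-word pair and a different-word pair.

    Assumed form: max(0, margin + d(anchor, same) - d(anchor, different)).
    The loss is zero once the different-word pair is farther apart than
    the same-word pair by at least `margin`.
    """
    d_same = cosine_distance(anchor, same)
    d_diff = cosine_distance(anchor, different)
    return max(0.0, margin + d_same - d_diff)


# Toy embeddings (hypothetical): `same` matches the anchor, `different` is orthogonal.
anchor = np.array([1.0, 0.0])
same = np.array([1.0, 0.0])
different = np.array([0.0, 1.0])

# Pairs already separated by more than the margin incur no loss.
loss_separated = siamese_hinge_loss(anchor, same, different)

# Swapping the roles violates the margin and yields a positive loss.
loss_violated = siamese_hinge_loss(anchor, different, same)
```

Because the loss depends only on distances between embeddings, it needs to know which segment pairs are the same word, not the word labels themselves. This is consistent with the abstract's remark that the word classifier CNN requires much stronger supervision.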
