Deep convolutional acoustic word embeddings using word-pair side information

10/05/2015
by Herman Kamper, et al.

Recent studies have been revisiting whole words as the basic modelling unit in speech recognition and query applications, instead of phonetic units. Such whole-word segmental systems rely on a function that maps a variable-length speech segment to a vector in a fixed-dimensional space; the resulting acoustic word embeddings need to allow for accurate discrimination between different word types, directly in the embedding space. We compare several old and new approaches in a word discrimination task. Our best approach uses side information in the form of known word pairs to train a Siamese convolutional neural network (CNN): a pair of tied networks that take two speech segments as input and produce their embeddings, trained with a hinge loss that separates same-word pairs and different-word pairs by some margin. A word classifier CNN performs similarly, but requires much stronger supervision. Both types of CNNs yield large improvements over the best previously published results on the word discrimination task.
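
The margin-based objective described above can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the margin value, or how pairs are combined, so everything here is an assumption made for illustration: cosine distance, a margin of 0.5, a triplet-style form that compares one same-word pair against one different-word pair sharing an anchor, and the hypothetical names `cosine_distance` and `pair_hinge_loss`. The embeddings are plain lists standing in for the outputs of the tied networks.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two embedding vectors (0 when they point the same way)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def pair_hinge_loss(anchor, same, different, margin=0.5):
    """Hinge loss over one same-word pair and one different-word pair.

    The loss is zero once the different-word pair is at least `margin`
    farther apart than the same-word pair; otherwise it grows linearly.
    The margin value and the cosine distance are illustrative assumptions.
    """
    d_same = cosine_distance(anchor, same)
    d_diff = cosine_distance(anchor, different)
    return max(0.0, margin + d_same - d_diff)


# Toy embeddings: in a real system these would come from the tied networks.
anchor = [1.0, 0.0]
same_word = [1.0, 0.0]
different_word = [0.0, 1.0]

well_separated = pair_hinge_loss(anchor, same_word, different_word)
badly_separated = pair_hinge_loss(anchor, different_word, same_word)
```

In this toy case the well-separated triple incurs no loss, while swapping the roles of the two segments produces a positive loss that training would push down by moving the same-word embeddings closer together.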

research
11/08/2016

Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

Acoustic word embeddings --- fixed-dimensional vector representations of...
research
06/10/2018

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

We propose to learn acoustic word embeddings with temporal context for q...
research
06/16/2021

Do Acoustic Word Embeddings Capture Phonological Similarity? An Empirical Study

Several variants of deep neural networks have been successfully employed...
research
08/01/2019

Learning Joint Acoustic-Phonetic Word Embeddings

Most speech recognition tasks pertain to mapping words across two modali...
research
07/01/2020

Whole-Word Segmental Speech Recognition with Acoustic Word Embeddings

Segmental models are sequence prediction models in which scores of hypot...
research
03/30/2022

Asymmetric Proxy Loss for Multi-View Acoustic Word Embeddings

Acoustic word embeddings (AWEs) are discriminative representations of sp...
research
07/20/2020

Acoustic Neighbor Embeddings

This paper proposes a novel acoustic word embedding called Acoustic Neig...
