Discriminative Acoustic Word Embeddings: Recurrent Neural Network-Based Approaches

11/08/2016
by Shane Settle, et al.

Acoustic word embeddings, which are fixed-dimensional vector representations of variable-length spoken word segments, have begun to be considered for tasks such as speech recognition and query-by-example search. Such embeddings can be learned discriminatively so that they are similar for speech segments corresponding to the same word and dissimilar for segments corresponding to different words. Recent work has found that acoustic word embeddings can outperform dynamic time warping on query-by-example search and related word discrimination tasks. However, the space of embedding models and training approaches is still relatively unexplored. In this paper, we present new discriminative embedding models based on recurrent neural networks (RNNs). We consider training losses that have been successful in prior work, in particular a cross-entropy loss for word classification and a contrastive loss that explicitly aims to separate same-word and different-word pairs in a "Siamese network" training setting. We find that both classifier-based and Siamese RNN embeddings improve over previously reported results on a word discrimination task, with Siamese RNNs outperforming classification models. In addition, we present analyses of the learned embeddings and the effects of variables such as dimensionality and network structure.
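To make the idea of a contrastive loss over same-word and different-word pairs concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the abstract does not specify the loss formula, so the margin-based hinge form, the use of cosine distance, the margin value of 0.5, and the names `cosine_distance` and `contrastive_hinge_loss` are all assumptions made for illustration. The toy two-dimensional vectors stand in for embeddings that an RNN would produce.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def contrastive_hinge_loss(anchor, same_word, different_word, margin=0.5):
    """Hypothetical margin-based loss (an assumed form, not from the source).

    The loss is zero once the different-word embedding is at least `margin`
    farther from the anchor than the same-word embedding is.
    """
    d_same = cosine_distance(anchor, same_word)
    d_diff = cosine_distance(anchor, different_word)
    return max(0.0, margin + d_same - d_diff)


# Toy embeddings (made up for illustration, not real data).
anchor = [1.0, 0.0]
same_word = [0.9, 0.1]        # close in direction to the anchor
different_word = [0.0, 1.0]   # orthogonal to the anchor

well_separated = contrastive_hinge_loss(anchor, same_word, different_word)
badly_separated = contrastive_hinge_loss(anchor, different_word, same_word)
print(well_separated, badly_separated)
```

In this sketch the first call yields zero loss because the pairs are already separated by more than the margin, while swapping the roles of the two segments yields a positive loss that training would push down.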


research
06/12/2017

Query-by-Example Search with Discriminative Neural Acoustic Word Embeddings

Query-by-example search often uses dynamic time warping (DTW) for compar...
research
08/28/2023

Neural approaches to spoken content embedding

Comparing spoken segments is a central operation to speech processing. T...
research
10/05/2015

Deep convolutional acoustic word embeddings using word-pair side information

Recent studies have been revisiting whole words as the basic modelling u...
research
05/24/2020

Acoustic Word Embedding System for Code-Switching Query-by-example Spoken Term Detection

In this paper, we propose a deep convolutional neural network-based acou...
research
09/18/2021

Fast query-by-example speech search using separable model

Traditional Query-by-Example (QbE) speech search approaches usually use ...
research
11/07/2018

Improved Audio Embeddings by Adjacency-Based Clustering with Applications in Spoken Term Detection

Embedding audio signal segments into vectors with fixed dimensionality i...
research
07/27/2020

Evaluating the reliability of acoustic speech embeddings

Speech embeddings are fixed-size acoustic representations of variable-le...
