Evaluating the reliability of acoustic speech embeddings

07/27/2020
by Robin Algayres, et al.

Speech embeddings are fixed-size acoustic representations of variable-length speech sequences. They are increasingly used for a variety of tasks ranging from information retrieval to unsupervised term discovery and speech segmentation. However, there is currently no clear methodology to compare or optimise the quality of these embeddings in a task-neutral way. Here, we systematically compare two popular metrics, ABX discrimination and Mean Average Precision (MAP), on 5 languages across 17 embedding methods, ranging from supervised to fully unsupervised, and using different loss functions (autoencoders, correspondence autoencoders, siamese). We then use ABX and MAP to predict performance on a new downstream task: the unsupervised estimation of the frequencies of speech segments in a given corpus. We find that, overall, ABX and MAP correlate with one another and with frequency estimation. However, substantial discrepancies appear in the fine-grained distinctions across languages and/or embedding methods. This makes it unrealistic at present to propose a task-independent silver-bullet method for computing the intrinsic quality of speech embeddings. There is a need for more detailed analysis of the metrics currently used to evaluate such embeddings.
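
To make the two metrics named in the abstract concrete, here is a minimal sketch of how an ABX error rate and a MAP score could be computed on a handful of labelled embeddings. It is an illustration under stated assumptions, not the paper's evaluation protocol: the cosine distance, the half-error rule for ties, the per-query retrieval form of MAP, the function names and the toy vectors are all choices made for this example.

```python
import numpy as np


def cosine_distance_matrix(X):
    """Pairwise cosine distances between the rows of X."""
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    return 1.0 - Xn @ Xn.T


def abx_error(X, labels):
    """ABX error rate over all triplets (A, B, X).

    A and X share a label, B has a different one. A triplet is an
    error when X is farther from A than from B (assumption: a tie
    counts as half an error).
    """
    D = cosine_distance_matrix(X)
    labels = np.asarray(labels)
    n = len(labels)
    errors, total = 0.0, 0
    for a in range(n):
        for x in range(n):
            if x == a or labels[x] != labels[a]:
                continue
            for b in range(n):
                if labels[b] == labels[a]:
                    continue
                total += 1
                if D[x, a] > D[x, b]:
                    errors += 1.0
                elif D[x, a] == D[x, b]:
                    errors += 0.5
    return errors / total


def mean_average_precision(X, labels):
    """Retrieval-style MAP: each item queries all the others.

    Assumption: items are ranked by increasing cosine distance and an
    item is relevant when it carries the same label as the query.
    """
    D = cosine_distance_matrix(X)
    labels = np.asarray(labels)
    average_precisions = []
    for q in range(len(labels)):
        others = np.delete(np.arange(len(labels)), q)
        order = others[np.argsort(D[q, others], kind="stable")]
        relevant = (labels[order] == labels[q]).astype(float)
        if relevant.sum() == 0:
            continue  # no same-label item to retrieve for this query
        precision_at_k = np.cumsum(relevant) / np.arange(1, len(relevant) + 1)
        average_precisions.append((precision_at_k * relevant).sum() / relevant.sum())
    return float(np.mean(average_precisions))


# Toy data (hypothetical): four 2-D "embeddings" forming two tight clusters.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
good_labels = [0, 0, 1, 1]  # labels agree with the clusters
bad_labels = [0, 1, 0, 1]   # labels cut across the clusters

for name, labels in [("good", good_labels), ("bad", bad_labels)]:
    print(name, abx_error(toy_embeddings, labels),
          mean_average_precision(toy_embeddings, labels))
```

A lower ABX error and a higher MAP both indicate that same-label items sit closer together than different-label items, which is why the two scores tend to move together on clean data like this toy set. The abstract's point is that this agreement weakens at finer granularity, across languages and embedding methods.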

