Improved acoustic word embeddings for zero-resource languages using multilingual transfer

06/02/2020
by   Herman Kamper, et al.

Acoustic word embeddings are fixed-dimensional representations of variable-length speech segments. Such embeddings can form the basis for speech search, indexing and discovery systems when conventional speech recognition is not possible. In zero-resource settings where unlabelled speech is the only available resource, we need a method that gives robust embeddings on an arbitrary language. Here we explore multilingual transfer: we train a single supervised embedding model on labelled data from multiple well-resourced languages and then apply it to unseen zero-resource languages. We consider three multilingual recurrent neural network (RNN) models: a classifier trained on the joint vocabularies of all training languages; a Siamese RNN trained to discriminate between same and different words from multiple languages; and a correspondence autoencoder (CAE) RNN trained to reconstruct word pairs. In a word discrimination task on six target languages, all of these models outperform state-of-the-art unsupervised models trained on the zero-resource languages themselves, giving relative improvements of more than 30% in average precision. When using only a few training languages, the multilingual CAE performs better, but with more training languages the other multilingual models perform similarly. Using more training languages is generally beneficial, but improvements are marginal on some languages. We present probing experiments which show that the CAE encodes more phonetic, word duration, language identity and speaker information than the other multilingual models.
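The core idea above, mapping a variable-length sequence of acoustic frames to a single fixed-dimensional vector with a recurrent encoder, can be sketched in a few lines. This is a minimal illustrative example, not the authors' implementation: the weights are random stand-ins and the encoder is a plain tanh RNN rather than a trained multilingual model.

```python
import numpy as np

rng = np.random.default_rng(0)

FEAT_DIM, HID_DIM = 13, 32  # e.g. 13 MFCC coefficients per frame (illustrative)
W_in = rng.normal(0, 0.1, (HID_DIM, FEAT_DIM))   # input-to-hidden weights
W_rec = rng.normal(0, 0.1, (HID_DIM, HID_DIM))   # recurrent weights

def embed(frames: np.ndarray) -> np.ndarray:
    """Encode a (T, FEAT_DIM) speech segment as a unit-length HID_DIM vector."""
    h = np.zeros(HID_DIM)
    for x in frames:                      # one recurrent step per acoustic frame
        h = np.tanh(W_in @ x + W_rec @ h)
    return h / np.linalg.norm(h)          # final hidden state, L2-normalised

# Segments of different durations still yield embeddings of the same size,
# so words can be compared directly with cosine similarity.
seg_a = rng.normal(size=(50, FEAT_DIM))   # a shorter word
seg_b = rng.normal(size=(80, FEAT_DIM))   # a longer word
e_a, e_b = embed(seg_a), embed(seg_b)
print(e_a.shape, e_b.shape)               # both (32,)
print(float(e_a @ e_b))                   # cosine similarity between the words
```

In the paper's Siamese setup, an encoder like this would be trained so that embeddings of the same word (across speakers and languages) have high cosine similarity while different words are pushed apart; the CAE-RNN instead pairs the encoder with a decoder that reconstructs the other member of a word pair.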

Related research

- Multilingual acoustic word embedding models for processing zero-resource languages (02/06/2020)
- Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation (03/19/2021)
- Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings (07/05/2023)
- Acoustic span embeddings for multilingual query-by-example search (11/24/2020)
- Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language (06/24/2021)
- A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings (12/03/2020)
- Evaluating the reliability of acoustic speech embeddings (07/27/2020)
