Acoustic Word Embeddings for Untranscribed Target Languages with Continued Pretraining and Learned Pooling

06/03/2023
by Ramon Sanabria, et al.

Acoustic word embeddings are typically created by training a pooling function using pairs of word-like units. For unsupervised systems, these are mined using k-nearest neighbor (KNN) search, which is slow. Recently, mean-pooled representations from a pre-trained self-supervised English model were suggested as a promising alternative, but their performance on target languages was not fully competitive. Here, we explore improvements to both approaches: we use continued pre-training to adapt the self-supervised model to the target language, and we use a multilingual phone recognizer (MPR) to mine phone n-gram pairs for training the pooling function. Evaluating on four languages, we show that both methods outperform a recent approach on word discrimination. Moreover, the MPR method is orders of magnitude faster than KNN, and is highly data efficient. We also show a small improvement from performing learned pooling on top of the continued pre-trained representations.
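To make the mean-pooling baseline described above concrete, the sketch below (not the authors' released code) shows how frame-level features from a pre-trained self-supervised model can be mean-pooled into an acoustic word embedding and compared with cosine similarity for a word discrimination check. The model name, the transformers API usage, and the choice of the final layer are illustrative assumptions; the paper's continued pre-training and learned pooling steps are not shown here.

```python
# Minimal sketch: mean-pooled acoustic word embeddings from a pre-trained
# self-supervised encoder, plus a cosine-similarity word discrimination score.
# Model name and layer choice are assumptions, not the paper's exact setup.
import torch
import torch.nn.functional as F
from transformers import AutoFeatureExtractor, HubertModel

MODEL_NAME = "facebook/hubert-base-ls960"  # stand-in for any wav2vec2/HuBERT-style encoder

feature_extractor = AutoFeatureExtractor.from_pretrained(MODEL_NAME)
model = HubertModel.from_pretrained(MODEL_NAME).eval()


@torch.no_grad()
def mean_pooled_awe(waveform: torch.Tensor, sample_rate: int = 16000) -> torch.Tensor:
    """Embed one spoken word segment by mean-pooling frame-level features over time."""
    inputs = feature_extractor(
        waveform.numpy(), sampling_rate=sample_rate, return_tensors="pt"
    )
    frames = model(inputs.input_values).last_hidden_state  # shape (1, T, D)
    return frames.mean(dim=1).squeeze(0)                   # shape (D,)


def word_discrimination_score(segment_a: torch.Tensor, segment_b: torch.Tensor) -> float:
    """Cosine similarity between two segments; higher should indicate the same word type."""
    emb_a = mean_pooled_awe(segment_a)
    emb_b = mean_pooled_awe(segment_b)
    return F.cosine_similarity(emb_a, emb_b, dim=0).item()
```

In the paper's setting, continued pre-training would first adapt the encoder to the untranscribed target language, and a pooling function trained on MPR-mined phone n-gram pairs would replace the simple mean used in this sketch.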

Related research

10/28/2022 · Analyzing Acoustic Word Embeddings from Pre-trained Self-supervised Speech Models
Given the strong results of self-supervised models on various tasks, the...

03/02/2023 · Denoising-based UNMT is more robust to word-order divergence than MASS-based UNMT
We aim to investigate whether UNMT approaches with self-supervised pre-t...

01/03/2023 · Supervised Acoustic Embeddings And Their Transferability Across Languages
In speech recognition, it is essential to model the phonetic content of ...

06/24/2020 · Multilingual Jointly Trained Acoustic and Written Word Embeddings
Acoustic word embeddings (AWEs) are vector representations of spoken wor...

11/09/2022 · Accidental Learners: Spoken Language Identification in Multilingual Self-Supervised Models
In this paper, we extend previous self-supervised approaches for languag...

12/03/2020 · A Correspondence Variational Autoencoder for Unsupervised Acoustic Word Embeddings
We propose a new unsupervised model for mapping a variable-duration spee...

08/31/2020 · Discovering Bilingual Lexicons in Polyglot Word Embeddings
Bilingual lexicons and phrase tables are critical resources for modern M...
