Leveraging multilingual transfer for unsupervised semantic acoustic word embeddings

07/05/2023
by Christiaan Jacobs, et al.

Acoustic word embeddings (AWEs) are fixed-dimensional vector representations of speech segments that encode phonetic content so that different realisations of the same word have similar embeddings. In this paper we explore semantic AWE modelling. These AWEs should not only capture phonetics but also the meaning of a word (similar to textual word embeddings). We consider the scenario where we only have untranscribed speech in a target language. We introduce a number of strategies leveraging a pre-trained multilingual AWE model – a phonetic AWE model trained on labelled data from multiple languages excluding the target. Our best semantic AWE approach involves clustering word segments using the multilingual AWE model, deriving soft pseudo-word labels from the cluster centroids, and then training a Skipgram-like model on the soft vectors. In an intrinsic word similarity task measuring semantics, this multilingual transfer approach outperforms all previous semantic AWE methods. We also show – for the first time – that AWEs can be used for downstream semantic query-by-example search.
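The three steps named in the abstract (cluster, derive soft pseudo-word labels, train a Skipgram-like model) can be illustrated with a minimal NumPy sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the random vectors standing in for multilingual AWEs, plain k-means as the clustering method, a softmax over negative squared distances as the soft-label form, the `temperature` parameter, the adjacent-segment context window, and the linear model trained with a soft-target cross-entropy loss.

```python
import numpy as np

def kmeans(x, k, n_iter=20, seed=0):
    """Plain Lloyd's k-means; returns centroids of shape (k, dim)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for j in range(k):
            members = x[assign == j]
            if len(members) > 0:  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids

def soft_labels(x, centroids, temperature=1.0):
    """Soft pseudo-word labels: softmax over negative squared distances (assumed form)."""
    dists = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    logits = -dists / temperature
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)

def train_skipgram_like(soft, pairs, dim=16, lr=0.1, epochs=50, seed=0):
    """Predict the context segment's soft vector from the centre segment's soft vector."""
    rng = np.random.default_rng(seed)
    k = soft.shape[1]
    w_in = rng.normal(scale=0.1, size=(k, dim))
    w_out = rng.normal(scale=0.1, size=(dim, k))
    centre, context = soft[pairs[:, 0]], soft[pairs[:, 1]]
    for _ in range(epochs):
        hidden = centre @ w_in
        logits = hidden @ w_out
        logits -= logits.max(axis=1, keepdims=True)
        pred = np.exp(logits)
        pred /= pred.sum(axis=1, keepdims=True)
        # Gradient of cross-entropy with soft targets w.r.t. the logits.
        grad_logits = (pred - context) / len(pairs)
        grad_out = hidden.T @ grad_logits
        grad_in = centre.T @ (grad_logits @ w_out.T)
        w_out -= lr * grad_out
        w_in -= lr * grad_in
    return w_in

# Hypothetical data: random vectors in place of real multilingual AWEs,
# ordered as the word segments would appear in the speech stream.
rng = np.random.default_rng(0)
awes = rng.normal(size=(200, 8))
centroids = kmeans(awes, k=10)
soft = soft_labels(awes, centroids)

# Context pairs: each segment paired with its left and right neighbour.
idx = np.arange(len(awes) - 1)
pairs = np.concatenate([np.stack([idx, idx + 1], 1), np.stack([idx + 1, idx], 1)])

w_in = train_skipgram_like(soft, pairs)
semantic = soft @ w_in  # one semantic embedding per word segment
```

In this sketch the soft vector replaces the one-hot word index of a textual Skipgram model, so a segment that falls between two clusters contributes to both instead of being forced into a single pseudo-word.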

research
06/02/2020

Improved acoustic word embeddings for zero-resource languages using multilingual transfer

Acoustic word embeddings are fixed-dimensional representations of variab...
research
06/24/2021

Multilingual transfer of acoustic word embeddings improves when training on languages related to the target zero-resource language

Acoustic word embedding models map variable duration speech segments to ...
research
03/19/2021

Acoustic word embeddings for zero-resource languages using self-supervised contrastive learning and multilingual adaptation

Acoustic word embeddings (AWEs) are fixed-dimensional representations of...
research
05/16/2023

Distilling Semantic Concept Embeddings from Contrastively Fine-Tuned Language Models

Learning vectors that capture the meaning of concepts remains a fundamen...
research
02/16/2017

Fast and unsupervised methods for multilingual cognate clustering

In this paper we explore the use of unsupervised methods for detecting c...
research
12/14/2016

Multilingual Word Embeddings using Multigraphs

We present a family of neural-network--inspired models for computing con...
research
06/07/2016

Multilingual Visual Sentiment Concept Matching

The impact of culture in visual emotion perception has recently captured...
