Analyzing autoencoder-based acoustic word embeddings

04/03/2020
by Yevgen Matusevych, et al.

Recent studies have introduced methods for learning acoustic word embeddings (AWEs)—fixed-size vector representations of words which encode their acoustic features. Despite the widespread use of AWEs in speech processing research, they have only been quantitatively evaluated on their ability to discriminate between whole word tokens. To better understand the applications of AWEs in various downstream tasks and in cognitive modeling, we need to analyze the representation spaces of AWEs. Here we analyze basic properties of AWE spaces learned by a sequence-to-sequence encoder-decoder model in six typologically diverse languages. We first show that these AWEs preserve some information about words' absolute duration and speaker. At the same time, the representation space of these AWEs is organized such that the distance between words' embeddings increases with those words' phonetic dissimilarity. Finally, the AWEs exhibit a word onset bias, similar to patterns reported in various studies on human speech processing and lexical access. We argue that this is a promising result and encourage further evaluation of AWEs as a potentially useful tool in cognitive science, which could provide a link between speech processing and lexical memory.
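One property described above — that the distance between words' embeddings grows with their phonetic dissimilarity — can be illustrated with a small sketch. The lexicon, phone strings, and embedding vectors below are invented toy data, not the paper's model output; cosine distance over embeddings and Levenshtein distance over phone sequences are assumptions standing in for whatever measures the authors actually used.

```python
# Toy sketch: correlate AWE distances with phonetic dissimilarity.
# All data here is invented for illustration.
import math
from itertools import combinations

def cosine_distance(u, v):
    # 1 - cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (nu * nv)

def edit_distance(a, b):
    # Levenshtein distance over phone sequences: a simple proxy
    # for phonetic dissimilarity.
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]

def pearson(xs, ys):
    # Pearson correlation coefficient between two equal-length lists.
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical lexicon: word -> (phone sequence, AWE vector).
lexicon = {
    "cat": (["k", "ae", "t"], [0.9, 0.1, 0.0]),
    "cap": (["k", "ae", "p"], [0.8, 0.2, 0.1]),
    "dog": (["d", "ao", "g"], [0.0, 0.2, 0.9]),
}

emb_dists, phon_dists = [], []
for w1, w2 in combinations(lexicon, 2):
    (p1, e1), (p2, e2) = lexicon[w1], lexicon[w2]
    emb_dists.append(cosine_distance(e1, e2))
    phon_dists.append(edit_distance(p1, p2))

r = pearson(emb_dists, phon_dists)
print(f"correlation between embedding and phonetic distance: {r:.2f}")
```

On this toy lexicon the correlation comes out strongly positive, mirroring the organization of the representation space reported in the abstract: phonetically similar words ("cat"/"cap") sit closer together than dissimilar ones ("cat"/"dog").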


Related research

03/23/2018
Speech2Vec: A Sequence-to-Sequence Framework for Learning Word Embeddings from Speech
In this paper, we propose a novel deep neural network architecture, Spee...

11/01/2018
Truly unsupervised acoustic word embeddings using weak top-down constraints in encoder-decoder models
We investigate unsupervised models that can map a variable-duration spee...

10/24/2019
Combining Acoustics, Content and Interaction Features to Find Hot Spots in Meetings
Involvement hot spots have been proposed as a useful concept for meeting...

07/23/2023
SCRAPS: Speech Contrastive Representations of Acoustic and Phonetic Spaces
Numerous examples in the literature proved that deep learning models hav...

05/29/2020
Prosody leaks into the memories of words
The average predictability (aka informativity) of a word in context has ...

08/01/2019
Learning Joint Acoustic-Phonetic Word Embeddings
Most speech recognition tasks pertain to mapping words across two modali...
