Integrating Form and Meaning: A Multi-Task Learning Model for Acoustic Word Embeddings

09/14/2022
by Badr M. Abdullah, et al.

Models of acoustic word embeddings (AWEs) learn to map variable-length spoken word segments onto fixed-dimensional vector representations such that different acoustic exemplars of the same word are projected close together in the embedding space. Beyond their speech technology applications, AWE models have been shown to predict human performance on a variety of auditory lexical processing tasks. Current AWE models are based on neural networks and trained in a bottom-up fashion, integrating acoustic cues into a word representation under an acoustic or symbolic supervision signal. As a result, these models neither leverage nor capture high-level lexical knowledge during learning. In this paper, we propose a multi-task learning model that incorporates top-down lexical knowledge into the AWE training procedure. In addition to bottom-up, form-based supervision, our model learns a mapping from the acoustic input to a lexical representation that encodes high-level information such as word semantics. We experiment with three languages and demonstrate that incorporating lexical knowledge improves the discriminability of the embedding space and encourages the model to better separate lexical categories.
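To make the two-objective setup concrete, below is a minimal PyTorch sketch of one way such a multi-task AWE model could be wired up. It is an illustration under stated assumptions, not the paper's implementation: the `MultiTaskAWE` class name, the BiGRU encoder, the triplet margin loss standing in for form-based supervision, the cosine regression onto pretrained semantic word vectors, and the weighting term `alpha` are all hypothetical choices.

```python
# A sketch of a multi-task AWE model: one acoustic encoder, two objectives.
# All architecture and loss choices here are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiTaskAWE(nn.Module):
    """BiGRU acoustic encoder with a form head (the AWE itself) and a
    semantic head that regresses onto pretrained word vectors."""

    def __init__(self, n_feats=39, hidden=256, embed_dim=128, sem_dim=300):
        super().__init__()
        self.encoder = nn.GRU(n_feats, hidden, num_layers=2,
                              batch_first=True, bidirectional=True)
        self.to_awe = nn.Linear(2 * hidden, embed_dim)  # acoustic word embedding
        self.to_sem = nn.Linear(embed_dim, sem_dim)     # projection into semantic space

    def forward(self, feats, lengths):
        # feats: (batch, time, n_feats); lengths: true segment lengths
        packed = nn.utils.rnn.pack_padded_sequence(
            feats, lengths.cpu(), batch_first=True, enforce_sorted=False)
        _, h = self.encoder(packed)                 # h: (num_layers*2, batch, hidden)
        h = torch.cat([h[-2], h[-1]], dim=-1)       # last layer, fwd + bwd states
        awe = F.normalize(self.to_awe(h), dim=-1)   # unit-norm embedding
        return awe, self.to_sem(awe)


def multitask_loss(model, anc, len_a, pos, len_p, neg, len_n,
                   sem_targets, alpha=0.5):
    """Weighted sum of a form objective and a meaning objective."""
    awe_a, sem_a = model(anc, len_a)
    awe_p, _ = model(pos, len_p)
    awe_n, _ = model(neg, len_n)
    # Bottom-up, form-based supervision: exemplars of the same word type
    # (anchor, positive) end up closer than exemplars of different types.
    form = F.triplet_margin_loss(awe_a, awe_p, awe_n, margin=0.4)
    # Top-down lexical supervision: the anchor's embedding should also
    # predict the pretrained semantic vector of its word type.
    meaning = 1.0 - F.cosine_similarity(sem_a, sem_targets).mean()
    return form + alpha * meaning


# Usage with dummy tensors standing in for MFCC segments and word vectors:
model = MultiTaskAWE()
anc, pos, neg = (torch.randn(8, 100, 39) for _ in range(3))
lens = torch.randint(20, 100, (8,))
sem_targets = torch.randn(8, 300)  # e.g., pretrained word2vec/fastText vectors
loss = multitask_loss(model, anc, lens, pos, lens, neg, lens, sem_targets)
loss.backward()
```

In this sketch, `alpha` trades off the two objectives: the `meaning` term injects the top-down lexical signal into the same space the `form` term organizes, which is the mechanism the abstract credits for the improved discriminability and cleaner separation of lexical categories.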
