Supervised Acoustic Embeddings And Their Transferability Across Languages

01/03/2023
by Sreepratha Ram, et al.

In speech recognition, it is essential to model the phonetic content of the input signal while discarding irrelevant factors such as speaker variation and noise, which is challenging in low-resource settings. Self-supervised pre-training has been proposed as a way to improve both supervised and unsupervised speech recognition, yielding frame-level feature representations as well as Acoustic Word Embeddings (AWEs) for variable-length segments. However, self-supervised models alone cannot perfectly separate out the linguistic content, since they are trained to optimize indirect objectives. In this work, we experiment with different pre-trained self-supervised features as input to AWE models and show that they work best within a supervised framework. Models trained on English can be transferred to other languages with no adaptation, and they outperform self-supervised models trained solely on the target languages.
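The abstract does not spell out the architecture, but the general recipe it describes can be sketched: extract frame-level features from a frozen pre-trained self-supervised encoder, pool the variable-length frame sequence into a fixed-dimensional embedding, and train the pooling network with a supervised objective. The sketch below assumes wav2vec 2.0 as the encoder (via the HuggingFace transformers library), a bidirectional GRU pooler, and a word-classification loss; the class name SupervisedAWE, the embedding size, and the training objective are illustrative choices, not details taken from the paper.

import torch
import torch.nn as nn
from transformers import Wav2Vec2Model

class SupervisedAWE(nn.Module):
    """Pool variable-length self-supervised frame features into a
    fixed-dimensional acoustic word embedding, trained with a
    supervised word-classification objective (hypothetical setup)."""

    def __init__(self, num_words: int, embed_dim: int = 256):
        super().__init__()
        # Frozen pre-trained self-supervised encoder (one possible choice).
        self.encoder = Wav2Vec2Model.from_pretrained("facebook/wav2vec2-base")
        self.encoder.requires_grad_(False)
        hidden = self.encoder.config.hidden_size  # 768 for the base model
        # Recurrent pooling of the frame sequence into a single vector.
        self.rnn = nn.GRU(hidden, embed_dim, batch_first=True, bidirectional=True)
        self.proj = nn.Linear(2 * embed_dim, embed_dim)
        # Word classifier used only during supervised training.
        self.classifier = nn.Linear(embed_dim, num_words)

    def embed(self, waveform: torch.Tensor) -> torch.Tensor:
        # waveform: (batch, samples) at 16 kHz
        with torch.no_grad():
            frames = self.encoder(waveform).last_hidden_state  # (B, T, 768)
        _, h = self.rnn(frames)               # h: (2, B, embed_dim)
        h = torch.cat([h[0], h[1]], dim=-1)   # concatenate both directions
        return self.proj(h)                   # (B, embed_dim) embedding

    def forward(self, waveform: torch.Tensor) -> torch.Tensor:
        return self.classifier(self.embed(waveform))  # logits over word types

# Supervised training step on English word segments (dummy data).
model = SupervisedAWE(num_words=10000)
loss_fn = nn.CrossEntropyLoss()
segments = torch.randn(4, 16000)           # four 1-second segments (dummy)
labels = torch.randint(0, 10000, (4,))     # dummy word labels
loss = loss_fn(model(segments), labels)
loss.backward()

Because the classifier head is discarded after training, embed() alone can be applied to word segments in an unseen language, which corresponds to the zero-adaptation transfer setting the abstract evaluates.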


