Stuttering Detection Using Speaker Representations and Self-supervised Contextual Embeddings

06/01/2023
by Shakeel A. Sheikh, et al.

The adoption of advanced deep learning architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To address this, this work applies speech embeddings extracted from deep learning models pre-trained on large audio datasets for different tasks. In particular, we explore audio representations obtained from the emphasized channel attention, propagation, and aggregation time delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models, trained on the VoxCeleb and LibriSpeech datasets respectively. After extracting the embeddings, we benchmark several traditional classifiers on the SD tasks, such as K-nearest neighbour (KNN), Gaussian naive Bayes, and a neural network. Compared with standard SD systems trained only on the limited SEP-28k dataset, we obtain a relative improvement of 12.08% in unweighted average recall (UAR) over the baselines. Finally, we show that combining the two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve the UAR by up to 2.60%.
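
The abstract reports its results as relative improvements in UAR. As a minimal sketch of what those two quantities measure, the snippet below computes UAR (the mean of the per-class recalls) and a relative improvement over a baseline score. The function names, the toy labels, and the example scores are hypothetical illustrations and are not taken from the paper.

```python
from collections import defaultdict


def unweighted_average_recall(y_true, y_pred):
    """Mean of per-class recalls, so each class counts equally whatever its size."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return sum(correct[c] / total[c] for c in total) / len(total)


def relative_improvement(baseline, score):
    """Percentage gain of `score` over `baseline`."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (score - baseline) / baseline * 100.0


# Hypothetical, imbalanced toy labels: 8 fluent clips and 2 clips with a block.
y_true = ["fluent"] * 8 + ["block"] * 2
y_pred = ["fluent"] * 8 + ["fluent", "block"]

uar = unweighted_average_recall(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Recall is 8/8 for "fluent" and 1/2 for "block", so UAR is 0.75,
# while plain accuracy is 9/10 and hides the weak minority class.
print(f"UAR={uar:.2f} accuracy={accuracy:.2f}")

# Hypothetical scores: a baseline UAR of 0.40 raised to 0.50.
gain = relative_improvement(0.40, 0.50)
print(f"relative improvement: {gain:.1f}%")
```

UAR is a common choice for imbalanced classification tasks because a classifier that favours the majority class is penalised on every minority class it misses, which plain accuracy does not reveal.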


research
04/04/2022

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

The adoption of advanced deep learning (DL) architecture in stuttering d...
research
06/24/2022

BYOL-S: Learning Self-supervised Speech Representations by Bootstrapping

Methods for extracting audio and speech features have been studied since...
research
09/28/2022

MeWEHV: Mel and Wave Embeddings for Human Voice Tasks

A recent trend in speech processing is the use of embeddings created thr...
research
01/03/2023

Supervised Acoustic Embeddings And Their Transferability Across Languages

In speech recognition, it is essential to model the phonetic content of ...
research
10/26/2022

AVES: Animal Vocalization Encoder based on Self-Supervision

The lack of annotated training data in bioacoustics hinders the use of l...
research
12/27/2017

Multiple Instance Deep Learning for Weakly Supervised Audio Event Detection

State-of-the-art audio event detection (AED) systems rely on supervised ...
research
06/30/2021

Using Self-Supervised Feature Extractors with Attention for Automatic COVID-19 Detection from Speech

The ComParE 2021 COVID-19 Speech Sub-challenge provides a test-bed for t...
