Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

04/04/2022
by Shakeel Ahmad Sheikh, et al.

The adoption of advanced deep learning (DL) architectures in stuttering detection (SD) tasks is challenging due to the limited size of the available datasets. To this end, this work introduces the application of speech embeddings extracted from deep models pre-trained on massive audio datasets for different tasks. In particular, we explore audio representations obtained using the emphasized channel attention, propagation, and aggregation time-delay neural network (ECAPA-TDNN) and Wav2Vec2.0 models, trained on the VoxCeleb and LibriSpeech datasets, respectively. After extracting the embeddings, we benchmark several traditional classifiers, such as k-nearest neighbors, Gaussian naive Bayes, and a neural network, on the stuttering detection tasks. Compared with the standard SD system trained only on the limited SEP-28k dataset, we obtain a relative improvement of 16.74% in overall accuracy over the baseline. Finally, we show that combining the two embeddings and concatenating multiple layers of Wav2Vec2.0 can further improve SD performance by up to 1%.
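
The pipeline described in the abstract (extract fixed-size embeddings, concatenate them, then fit a traditional classifier) can be illustrated with a minimal sketch. This is not the authors' implementation: the embeddings below are synthetic Gaussian stand-ins rather than real ECAPA-TDNN or Wav2Vec2.0 outputs, the dimensions (8 and 16) are arbitrary, and the helper names `concat_embeddings`, `knn_predict`, `relative_improvement`, and `make_clip` are hypothetical. Only the k-nearest neighbor classifier is shown, written with the Python standard library so the example is self-contained.

```python
import math
import random
from collections import Counter


def concat_embeddings(emb_a, emb_b):
    """Concatenate two per-clip embedding vectors into one feature vector."""
    return list(emb_a) + list(emb_b)


def knn_predict(train_x, train_y, query, k=5):
    """Majority vote among the k training vectors closest in Euclidean distance."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, expressed in percent."""
    return 100.0 * (new - baseline) / baseline


def make_clip(label, rng, dim_a=8, dim_b=16):
    """Synthetic stand-in for one clip (assumption: NOT real model embeddings)."""
    shift = 1.5 * label  # class 1 is shifted away from class 0
    emb_a = [rng.gauss(shift, 1.0) for _ in range(dim_a)]
    emb_b = [rng.gauss(-shift, 1.0) for _ in range(dim_b)]
    return concat_embeddings(emb_a, emb_b)


rng = random.Random(0)
labels = [i % 2 for i in range(200)]  # 0 = fluent, 1 = stuttered (toy labels)
features = [make_clip(y, rng) for y in labels]

# Simple hold-out split: first 150 clips for training, last 50 for testing.
train_x, train_y = features[:150], labels[:150]
test_x, test_y = features[150:], labels[150:]

predictions = [knn_predict(train_x, train_y, x) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(f"accuracy on synthetic data: {accuracy:.2f}")
```

The `relative_improvement` helper reflects the usual definition of a relative gain, (new - baseline) / baseline, which is how a figure such as the reported improvement over the baseline would typically be computed. The accuracy printed here describes only the synthetic data and says nothing about results on SEP-28k.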


