
Personalizing ASR for Dysarthric and Accented Speech with Limited Data

by Joel Shor, et al.

Automatic speech recognition (ASR) systems have dramatically improved over the last few years. ASR systems are most often trained on 'typical' speech, which means that underrepresented groups don't experience the same level of improvement. In this paper, we present and evaluate finetuning techniques to improve ASR for users with non-standard speech. We focus on two types of non-standard speech: speech from people with amyotrophic lateral sclerosis (ALS) and accented speech. We train personalized models that achieve 62% and 35% relative word error rate (WER) improvement on these two groups, respectively, bringing the absolute WER for ALS speakers, on a test set of message bank phrases, down to 10% for mild dysarthria and 20% for more serious dysarthria. Most of the improvement comes from only 5 minutes of training data. Finetuning a particular subset of layers (with many fewer parameters) often gives better results than finetuning the entire model. This is the first step towards building state-of-the-art ASR models for dysarthric speech.
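The core technique in the abstract, finetuning only a subset of layers while freezing the rest, can be sketched in miniature. The toy two-layer linear model, random data, and squared-error objective below are illustrative assumptions, not the paper's actual ASR architecture or loss; the point is simply that gradient updates are applied to `W1` (the chosen layer subset) while `W2` stays frozen, so far fewer parameters change during personalization.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Pretrained" weights of a toy 2-layer linear model: pred = x @ W1.T @ W2.T.
# In the real setting these would be layers of a full ASR model.
W1 = rng.normal(size=(4, 3))  # layer we choose to finetune
W2 = rng.normal(size=(1, 4))  # layer we keep frozen

# Small "personalization" set, standing in for a few minutes of
# speaker-specific training data.
X = rng.normal(size=(16, 3))
y_true = rng.normal(size=(16, 1))

def loss(W1, W2):
    pred = X @ W1.T @ W2.T
    return float(np.mean((pred - y_true) ** 2))

lr = 0.002
before = loss(W1, W2)
for _ in range(500):
    pred = X @ W1.T @ W2.T            # forward pass, shape (16, 1)
    err = pred - y_true               # residual, shape (16, 1)
    # Gradient of the mean squared error w.r.t. W1 only;
    # W2 receives no update (it is "frozen").
    grad_W1 = (2.0 / len(X)) * (err @ W2).T @ X   # shape (4, 3)
    W1 -= lr * grad_W1
after = loss(W1, W2)
```

Because the objective is quadratic in `W1` with `W2` frozen, small-step gradient descent monotonically reduces the loss, mirroring how per-speaker finetuning of a layer subset adapts the model without disturbing the rest of its parameters.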



