Improving Children's Speech Recognition by Fine-tuning Self-supervised Adult Speech Representations

11/14/2022
by   Renee Lu, et al.

Children's speech recognition is a vital yet largely overlooked domain when building inclusive speech technologies. The major challenge impeding progress in this domain is the lack of adequate child speech corpora; however, recent advances in self-supervised learning have created a new opportunity for overcoming this problem of data scarcity. In this paper, we leverage self-supervised adult speech representations and use three well-known child speech corpora to build models for children's speech recognition. We assess the performance of fine-tuning on both native and non-native children's speech, examine the effect of cross-domain child corpora, and investigate the minimum amount of child speech required to fine-tune a model that outperforms a state-of-the-art adult model. We also analyze speech recognition performance across children's ages. Our results demonstrate that fine-tuning with cross-domain child corpora leads to relative improvements of up to 46.08% and 45.53% [...] improvements of 14.70% [...] of transcribed children's speech, it is possible to fine-tune a children's speech recognition system that outperforms a state-of-the-art adult model fine-tuned on 960 hours of adult speech.
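The abstract reports gains as relative improvements. As a rough illustration of how such figures are typically computed, the sketch below assumes the underlying metric is word error rate (WER), which the abstract does not state explicitly. The function names and the example numbers are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,                      # deletion
                curr[j - 1] + 1,                  # insertion
                prev[j - 1] + substitution_cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def absolute_improvement(baseline_wer: float, new_wer: float) -> float:
    """Difference in error rate, in the same units as the inputs."""
    return baseline_wer - new_wer


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Fraction of the baseline error rate that was removed."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical error rates, chosen only to show the arithmetic.
    baseline, fine_tuned = 0.40, 0.30
    print(f"absolute: {absolute_improvement(baseline, fine_tuned):.2%}")
    print(f"relative: {relative_improvement(baseline, fine_tuned):.2%}")
```

With these made-up values, a drop from 40% to 30% WER is a 10-point absolute improvement but a 25% relative improvement, which is why the two kinds of figure should not be compared directly.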


research
11/09/2020

Data Augmentation For Children's Speech Recognition – The "Ethiopian" System For The SLT 2021 Children Speech Recognition Challenge

This paper presents the "Ethiopian" system for the SLT 2021 Children Spe...
research
05/18/2023

Self-supervised Fine-tuning for Improved Content Representations by Speaker-invariant Clustering

Self-supervised speech representation models have succeeded in various t...
research
02/01/2021

On Scaling Contrastive Representations for Low-Resource Speech Recognition

Recent advances in self-supervised learning through contrastive training...
research
02/18/2023

Front-End Adapter: Adapting Front-End Input of Speech based Self-Supervised Learning for Speech Recognition

Recent years have witnessed a boom in self-supervised learning (SSL) in ...
research
04/07/2022

Detecting Dysfluencies in Stuttering Therapy Using wav2vec 2.0

Stuttering is a varied speech disorder that harms an individual's commun...
research
04/06/2021

Comparing CTC and LFMMI for out-of-domain adaptation of wav2vec 2.0 acoustic model

In this work, we investigate if the wav2vec 2.0 self-supervised pretrain...
research
09/13/2023

Enhancing Child Vocalization Classification in Multi-Channel Child-Adult Conversations Through Wav2vec2 Children ASR Features

Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder that oft...
