Speaker- and Age-Invariant Training for Child Acoustic Modeling Using Adversarial Multi-Task Learning

10/19/2022
by Mostafa Shahin, et al.

One of the major challenges in acoustic modelling of child speech is the rapid change in children's articulators as they grow, together with their differing growth rates and the resulting high variability within the same age group. This high acoustic variability, along with the scarcity of child speech corpora, has impeded the development of reliable speech recognition systems for children. In this paper, a speaker- and age-invariant training approach based on adversarial multi-task learning is proposed. The system consists of one shared generator network, which learns to generate speaker- and age-invariant features, connected to three discrimination networks for phoneme, age, and speaker. The generator network is trained to minimize the phoneme-discrimination loss and maximize the speaker- and age-discrimination losses in an adversarial multi-task learning fashion. The generator network is a Time Delay Neural Network (TDNN), while the three discriminators are feed-forward networks. The system was applied to the OGI speech corpora and achieved a 13
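
The training objective described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: it assumes the common gradient-reversal formulation of adversarial multi-task learning, in which the shared generator descends the phoneme loss gradient while the speaker and age gradients are sign-flipped before reaching it. The function names and the weights `lambda_speaker` and `lambda_age` are hypothetical and do not come from the paper.

```python
from typing import List, Sequence


def generator_gradient(
    grad_phoneme: Sequence[float],
    grad_speaker: Sequence[float],
    grad_age: Sequence[float],
    lambda_speaker: float = 1.0,  # assumed adversarial weight, not from the paper
    lambda_age: float = 1.0,      # assumed adversarial weight, not from the paper
) -> List[float]:
    """Combine per-task gradients w.r.t. the shared generator parameters.

    The phoneme gradient is kept as is (the generator minimizes that loss).
    The speaker and age gradients are subtracted, so a descent step on the
    result increases those two losses (the generator maximizes them).
    """
    return [
        gp - lambda_speaker * gs - lambda_age * ga
        for gp, gs, ga in zip(grad_phoneme, grad_speaker, grad_age)
    ]


def sgd_step(
    params: Sequence[float], grad: Sequence[float], learning_rate: float = 0.1
) -> List[float]:
    """Plain gradient-descent update on the generator parameters."""
    return [p - learning_rate * g for p, g in zip(params, grad)]


if __name__ == "__main__":
    # Toy gradients for a generator with two parameters.
    combined = generator_gradient([1.0, 2.0], [0.5, 0.5], [0.25, -0.25])
    print(combined)
    print(sgd_step([1.0, 1.0], combined, learning_rate=1.0))
```

In this formulation the three discriminators would still be updated with their own unmodified gradients; only the signal passed back into the shared generator is reversed for the speaker and age tasks.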

research
01/22/2023

Leveraging Speaker Embeddings with Adversarial Multi-task Learning for Age Group Classification

Recently, researchers have utilized neural network-based speaker embeddi...
research
12/09/2018

To Reverse the Gradient or Not: An Empirical Comparison of Adversarial and Multi-task Learning in Speech Recognition

Transcribed datasets typically contain speaker identity for each instanc...
research
06/24/2022

Burst2Vec: An Adversarial Multi-Task Approach for Predicting Emotion, Age, and Origin from Vocal Bursts

We present Burst2Vec, our multi-task learning approach to predict emotio...
research
09/03/2021

Phone Duration Modeling for Speaker Age Estimation in Children

Automatic inference of important paralinguistic information such as age ...
research
11/16/2022

Psychophysiology-aided Perceptually Fluent Speech Analysis of Children Who Stutter

This first-of-its-kind paper presents a novel approach named PASAD that ...
research
09/09/2022

Longitudinal Acoustic Speech Tracking Following Pediatric Traumatic Brain Injury

Recommendations for common outcome measures following pediatric traumati...
research
11/30/2018

Advance Prediction of Ventricular Tachyarrhythmias using Patient Metadata and Multi-Task Networks

We describe a novel neural network architecture for the prediction of ve...
