Senone-aware Adversarial Multi-task Training for Unsupervised Child to Adult Speech Adaptation

02/23/2021
by Richeng Duan, et al.

Acoustic modeling for child speech is challenging because physiological differences in the vocal tract cause high acoustic variability, and the scarcity of publicly available datasets makes the task harder still. In this work, we propose a feature adaptation approach that uses adversarial multi-task training to minimize the acoustic mismatch between adult and child speech at the senone (tied triphone state) level and to leverage large amounts of transcribed adult speech. We validate the proposed method on three tasks: child speech recognition, child pronunciation assessment, and child fluency score prediction. Empirical results indicate that our proposed approach consistently outperforms competitive baselines, achieving 7.7 speech recognition and up to 25.2
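To make the idea of senone-level adversarial multi-task training concrete, the following is a minimal sketch of a per-frame objective for the shared feature extractor. It is not the authors' implementation: the abstract does not give the loss, so the per-senone adult-vs-child discriminator, the use of the predicted senone for untranscribed child frames, the weight `lam`, and all function names are assumptions made purely for illustration.

```python
import math


def cross_entropy(probs, target):
    """Negative log-probability assigned to the target class."""
    return -math.log(probs[target])


def feature_extractor_loss(senone_probs, domain_probs_per_senone,
                           domain_label, senone_label=None, lam=0.5):
    """Per-frame loss seen by the shared feature extractor (illustrative).

    senone_probs: posterior over senones from the senone classifier.
    domain_probs_per_senone: for each senone, [P(adult), P(child)] from a
        hypothetical senone-specific domain discriminator.
    domain_label: 0 for adult speech, 1 for child speech.
    senone_label: reference senone for transcribed (adult) frames, or None
        for untranscribed (child) frames.
    lam: hypothetical trade-off weight for the adversarial term.
    """
    if senone_label is None:
        # Assumption: with no transcript, fall back to the most likely
        # senone and skip the senone classification loss.
        senone = max(range(len(senone_probs)), key=senone_probs.__getitem__)
        senone_loss = 0.0
    else:
        senone = senone_label
        senone_loss = cross_entropy(senone_probs, senone_label)

    # Senone-aware part: only the discriminator output for this frame's
    # senone is used, so domains are compared within the same senone.
    domain_loss = cross_entropy(domain_probs_per_senone[senone], domain_label)

    # Adversarial multi-task objective: minimize the senone loss while
    # maximizing the discriminator's loss (hence the minus sign).
    return senone_loss - lam * domain_loss


# Example: a transcribed adult frame and an untranscribed child frame.
discriminator = [[0.5, 0.5], [0.9, 0.1], [0.5, 0.5]]
adult = feature_extractor_loss([0.7, 0.2, 0.1], discriminator,
                               domain_label=0, senone_label=0)
child = feature_extractor_loss([0.1, 0.8, 0.1], discriminator,
                               domain_label=1)
print(adult, child)
```

In a real system the minus sign is commonly realized with a gradient reversal layer inside a neural network framework, and the discriminator itself is trained to minimize `domain_loss`; both details are omitted here to keep the sketch dependency-free.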


