Neural Network-Based Modeling of Phonetic Durations

09/06/2019
by Xizi Wei, et al.

A deep neural network (DNN)-based model has been developed to predict non-parametric distributions of phoneme durations in specified phonetic contexts, and it has been used to explore which factors influence durations most. The major factors in US English are pre-pausal lengthening, lexical stress, and speaking rate. The model can be used to check that text-to-speech (TTS) training speech follows the script and that words are pronounced as expected. Duration prediction is poorer for automatic speech recognition (ASR) training speech, because such corpora typically consist of single utterances from many speakers and are often noisy or casually spoken. Low-probability durations in ASR training material nevertheless mostly correspond to non-standard speech, some of it containing disfluencies. Children's speech is disproportionately represented among these utterances, since children show much more variation in timing.
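To make the idea of flagging low-probability durations concrete, here is a minimal Python sketch. It assumes the network emits one score per duration bin for each phoneme in context, and that a softmax turns those scores into a non-parametric distribution. The 10 ms bin width, the 50 bins, the 0.01 threshold, and all function names are illustrative assumptions, not details taken from the paper.

```python
import math

FRAME_MS = 10  # assumed bin width in milliseconds (not stated in the abstract)
N_BINS = 50    # assumed number of bins; the last bin absorbs longer durations


def softmax(logits):
    """Turn raw per-bin scores into a probability distribution over bins."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duration_to_bin(duration_ms, frame_ms=FRAME_MS, n_bins=N_BINS):
    """Map a duration in milliseconds to a bin index, clamping long durations."""
    return min(int(duration_ms // frame_ms), n_bins - 1)


def flag_low_probability(segments, threshold=0.01):
    """Return (label, probability) for segments with unlikely observed durations.

    Each segment is a tuple (label, observed_duration_ms, bin_probabilities).
    """
    flagged = []
    for label, duration_ms, probs in segments:
        p = probs[duration_to_bin(duration_ms, n_bins=len(probs))]
        if p < threshold:
            flagged.append((label, p))
    return flagged


# Stand-in for a model output: scores peaking at bin 8 (roughly 80-90 ms).
probs = softmax([-((i - 8) ** 2) / 8 for i in range(N_BINS)])
segments = [("ae_typical", 85, probs), ("ae_long", 400, probs)]
print(flag_low_probability(segments))
```

In a setting like the one described, flagged segments would be candidates for manual review, for example to check whether the speech departs from the script or contains a disfluency.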

