End-to-end Whispered Speech Recognition with Frequency-weighted Approaches and Layer-wise Transfer Learning

05/05/2020
by Heng-Jui Chang, et al.

Whispering is an important mode of human speech, but no end-to-end recognition results for it have been reported yet, probably due to the scarcity of available whispered speech data. In this paper, we present several approaches for end-to-end (E2E) recognition of whispered speech that account for its special characteristics and the scarcity of data. These include a frequency-weighted SpecAugment policy and a frequency-divided CNN feature extractor for better capturing the high-frequency structures of whispered speech, and a layer-wise transfer learning approach that pre-trains a model on normal speech and then fine-tunes it on whispered speech to bridge the gap between whispered and normal speech. We achieve an overall relative reduction of 19.8% [...]. The results indicate that, as long as we have a good E2E model pre-trained on normal speech, a relatively small set of whispered speech may suffice to obtain a reasonably good E2E whispered speech recognizer.
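To make the idea of a frequency-weighted masking policy concrete, the following is a minimal sketch and not the authors' implementation. It applies a single SpecAugment-style frequency mask whose starting bin is sampled non-uniformly, so that low-frequency bins are masked more often and high-frequency structure is preserved more often. The function name `frequency_weighted_mask`, the linear weighting, the direction of the bias, and the parameters `max_width` and `low_bias` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def frequency_weighted_mask(spec, max_width=8, low_bias=2.0, rng=None):
    """Zero out one band of frequency bins in a spectrogram.

    spec: array of shape (num_freq_bins, num_frames).
    The band start is sampled with a bias toward low-frequency bins
    (hypothetical weighting, not taken from the paper).
    Returns the masked copy and the (start, width) of the band.
    """
    rng = np.random.default_rng() if rng is None else rng
    num_bins = spec.shape[0]

    # Mask width in bins, drawn uniformly from [0, max_width].
    width = min(int(rng.integers(0, max_width + 1)), num_bins)

    # Valid start positions keep the whole band inside the spectrogram.
    num_starts = num_bins - width + 1

    # Assumed weighting: weights fall linearly from low_bias (lowest bin)
    # to 1.0 (highest valid start), then are normalized to probabilities.
    weights = np.linspace(low_bias, 1.0, num_starts)
    probs = weights / weights.sum()
    start = int(rng.choice(num_starts, p=probs))

    masked = spec.copy()  # leave the input untouched
    masked[start:start + width, :] = 0.0
    return masked, (start, width)


if __name__ == "__main__":
    features = np.ones((40, 100))  # e.g. 40 filterbank bins, 100 frames
    masked, (start, width) = frequency_weighted_mask(
        features, rng=np.random.default_rng(0)
    )
    print("masked band:", start, "to", start + width)
```

Setting `low_bias=1.0` makes the start position uniform, which corresponds to an ordinary, unweighted frequency mask under the same assumptions.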

research
04/03/2022

Deep Speech Based End-to-End Automated Speech Recognition (ASR) for Indian-English Accents

Automated Speech Recognition (ASR) is an interdisciplinary application o...
research
06/07/2023

Transfer Learning from Pre-trained Language Models Improves End-to-End Speech Summarization

End-to-end speech summarization (E2E SSum) directly summarizes input spe...
research
12/07/2022

Low-Resource End-to-end Sanskrit TTS using Tacotron2, WaveGlow and Transfer Learning

End-to-end text-to-speech (TTS) systems have been developed for European...
research
11/27/2019

AIPNet: Generative Adversarial Pre-training of Accent-invariant Networks for End-to-end Speech Recognition

As one of the major sources in speech variability, accents have posed a ...
research
11/12/2020

The CUHK-TUDELFT System for The SLT 2021 Children Speech Recognition Challenge

This technical report describes our submission to the 2021 SLT Children ...
research
06/14/2023

Research on an improved Conformer end-to-end Speech Recognition Model with R-Drop Structure

To address the issue of poor generalization ability in end-to-end speech...
research
03/04/2021

End-to-end acoustic modelling for phone recognition of young readers

Automatic recognition systems for child speech are lagging behind those ...
