Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives

03/31/2022
by Samik Sadhu, et al.

How important are different temporal speech modulations for speech recognition? We answer this question from two complementary perspectives. First, we quantify the amount of phonetic information in the modulation spectrum of speech by computing the mutual information between temporal modulations and frame-wise phoneme labels. Second, we ask which speech modulations an Automatic Speech Recognition (ASR) system prefers for its operation: data-driven weights are learnt over the modulation spectrum and optimized for an end-to-end ASR task. Both methods agree that speech information is mostly contained in slow modulations. Mutual information peaks around 3-6 Hz, which is also the range of modulations most preferred by the ASR system. In addition, we show that incorporating this knowledge into ASR systems significantly reduces their dependency on the amount of training data.
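The first perspective can be illustrated with a minimal sketch of a mutual-information estimate between one scalar feature per frame (for example, the energy at a single modulation frequency) and the frame-wise phoneme labels. The abstract does not say which estimator the authors use, so the histogram-based plug-in estimator below, the function name `mutual_information`, and the `n_bins` parameter are all assumptions made for illustration.

```python
import math
from collections import Counter


def mutual_information(features, labels, n_bins=8):
    """Plug-in estimate of I(X; Y) in bits.

    features: one scalar per frame (hypothetical: modulation energy
              at a single modulation frequency).
    labels:   one discrete phoneme label per frame.
    n_bins:   number of equal-width histogram bins (an assumption;
              the source does not specify a discretization).
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")
    if not features:
        raise ValueError("at least one frame is required")

    # Quantize the continuous feature into equal-width bins.
    lo, hi = min(features), max(features)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = [min(int((x - lo) / width), n_bins - 1) for x in features]

    n = len(features)
    joint = Counter(zip(bins, labels))  # counts of (bin, label) pairs
    p_bin = Counter(bins)               # marginal counts of bins
    p_lab = Counter(labels)             # marginal counts of labels

    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    mi = 0.0
    for (b, y), c in joint.items():
        mi += (c / n) * math.log2(c * n / (p_bin[b] * p_lab[y]))
    return mi
```

Repeating such an estimate for each modulation frequency would give a curve of phonetic information against modulation frequency. Note that plug-in estimates are biased upward when there are few frames per bin, so the bin count matters in practice.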

