Radically Old Way of Computing Spectra: Applications in End-to-End ASR

03/25/2021
by   Samik Sadhu, et al.
0

We propose a technique to compute spectrograms using Frequency Domain Linear Prediction (FDLP) that uses all-pole models to fit the squared Hilbert envelope of speech in different frequency sub-bands. The spectrogram of a complete speech utterance is computed by overlap-add of contiguous all-pole model responses. A long context window of 1.5 seconds allows us to capture the low frequency temporal modulations of speech in the spectrogram. For an end-to-end automatic speech recognition task, the FDLP spectrogram performs on par with the standard mel spectrogram features for clean read speech training and test data. For more realistic speech data with train-test domain mismatches or reverberations, FDLP spectrogram shows up to 25 improvements over mel spectrogram respectively.

READ FULL TEXT
research
03/24/2022

Complex Frequency Domain Linear Prediction: A Tool to Compute Modulation Spectrum of Speech

Conventional Frequency Domain Linear Prediction (FDLP) technique models ...
research
08/07/2020

Deep Learning Based Dereverberation of Temporal Envelopesfor Robust Speech Recognition

Automatic speech recognition in reverberant conditions is a challenging ...
research
09/30/2022

Blind Signal Dereverberation for Machine Speech Recognition

We present a method to remove unknown convolutive noise introduced to sp...
research
10/28/2022

Improving short-video speech recognition using random utterance concatenation

One of the limitations in end-to-end automatic speech recognition framew...
research
03/27/2022

Listen, Adapt, Better WER: Source-free Single-utterance Test-time Adaptation for Automatic Speech Recognition

Although deep learning-based end-to-end Automatic Speech Recognition (AS...
research
02/15/2020

Small energy masking for improved neural network training for end-to-end speech recognition

In this paper, we present a Small Energy Masking (SEM) algorithm, which ...
research
03/31/2022

Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives

How important are different temporal speech modulations for speech recog...

Please sign up or login with your details

Forgot password? Click here to reset