On the Relevance of Auditory-Based Gabor Features for Deep Learning in Automatic Speech Recognition

Previous studies support the idea of merging auditory-based Gabor features with deep learning architectures to achieve robust automatic speech recognition; however, the cause behind the gain of such a combination is still unknown. We believe these representations provide the deep learning decoder with more discriminable cues. Our aim in this paper is to validate this hypothesis by performing experiments on three different recognition tasks (Aurora 4, CHiME 2 and CHiME 3) and assessing the discriminability of the information encoded by Gabor filterbank features. Additionally, to identify the contribution of low, medium and high temporal modulation frequencies, subsets of the Gabor filterbank were used as features (dubbed LTM, MTM and HTM, respectively). With temporal modulation frequencies between 16 and 25 Hz, HTM consistently outperformed the other subsets in every condition, highlighting the robustness of these representations against channel distortions, low signal-to-noise ratios and acoustically challenging real-life scenarios, with relative improvements from 11 to 56%. To explain the results, a measure of similarity between phoneme classes from DNN activations is proposed and linked to their acoustic properties. We find this measure to be consistent with the observed error rates and highlight specific differences at the phoneme level to pinpoint the benefit of the proposed features.
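To make the notion of a temporal modulation frequency more concrete, the following is a minimal sketch of a single two-dimensional spectro-temporal Gabor filter. It is not the filterbank used in the paper. The function name `gabor_filter_2d`, the 100 Hz frame rate, the filter size, the Gaussian envelope and the envelope widths are all assumptions chosen for illustration, and 20 Hz is simply one value inside the 16 to 25 Hz range reported for HTM.

```python
import numpy as np

def gabor_filter_2d(omega_t_hz, omega_s, frame_rate_hz=100.0,
                    n_frames=41, n_channels=11):
    """Real part of an illustrative 2D spectro-temporal Gabor filter.

    omega_t_hz: temporal modulation frequency in Hz.
    omega_s:    spectral modulation frequency in cycles per channel.
    All defaults are assumptions, not values taken from the paper.
    """
    # Time axis in seconds and channel axis in channel offsets, both centred on 0.
    t = (np.arange(n_frames) - n_frames // 2) / frame_rate_hz
    k = (np.arange(n_channels) - n_channels // 2).astype(float)
    T, K = np.meshgrid(t, k)  # both have shape (n_channels, n_frames)

    # Sinusoidal carrier that sets the modulation frequencies.
    carrier = np.cos(2.0 * np.pi * (omega_t_hz * T + omega_s * K))

    # Gaussian envelope (assumed here) that localises the filter.
    sigma_t = t[-1] / 2.0
    sigma_k = k[-1] / 2.0
    envelope = np.exp(-0.5 * ((T / sigma_t) ** 2 + (K / sigma_k) ** 2))

    g = carrier * envelope
    return g - g.mean()  # remove the DC component


def dominant_temporal_modulation(filt, frame_rate_hz=100.0):
    """Frequency (Hz) of the largest spectral peak along the centre channel."""
    centre_row = filt[filt.shape[0] // 2]
    spectrum = np.abs(np.fft.rfft(centre_row))
    freqs = np.fft.rfftfreq(centre_row.size, d=1.0 / frame_rate_hz)
    return freqs[np.argmax(spectrum)]


# One filter in the high temporal modulation range (illustrative value).
htm_like = gabor_filter_2d(omega_t_hz=20.0, omega_s=0.0)
peak_hz = dominant_temporal_modulation(htm_like)
```

Under these assumptions, `peak_hz` lands close to the requested 20 Hz, limited by the frequency resolution of the short filter. Gabor filterbank features would be obtained by filtering a log-mel spectrogram with many such filters. A subset such as HTM corresponds to keeping only the filters whose temporal modulation frequency falls in the chosen band.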


research
04/08/2022

Exploiting Hidden Representations from a DNN-based Speech Recogniser for Speech Intelligibility Prediction in Hearing-impaired Listeners

An accurate objective speech intelligibility prediction algorithms is of...
research
11/23/2018

Improved Frequency Modulation Features for Multichannel Distant Speech Recognition

Frequency modulation features capture the fine structure of speech forma...
research
12/24/2013

Speech Recognition Front End Without Information Loss

Speech representation and modelling in high-dimensional spaces of acoust...
research
02/24/2021

Thoughts on the potential to compensate a hearing loss in noise

The effect of hearing impairment on speech perception was described by P...
research
11/03/2022

Leveraging Domain Features for Detecting Adversarial Attacks Against Deep Speech Recognition in Noise

In recent years, significant progress has been made in deep model-based ...
research
03/31/2022

Importance of Different Temporal Modulations of Speech: A Tale of Two Perspectives

How important are different temporal speech modulations for speech recog...
