Learning Multiscale Features Directly From Waveforms

03/31/2016
by Zhenyao Zhu, et al.

Deep learning has dramatically improved the performance of speech recognition systems through learning hierarchies of features optimized for the task at hand. However, true end-to-end learning, where features are learned directly from waveforms, has only recently reached the performance of hand-tailored representations based on the Fourier transform. In this paper, we detail an approach that uses convolutional filters to push past the inherent tradeoff between temporal and frequency resolution that exists for spectral representations. At increased computational cost, we show that increasing temporal resolution via reduced stride and increasing frequency resolution via additional filters delivers significant performance improvements. Further, we find more efficient representations by simultaneously learning at multiple scales, leading to an overall decrease in word error rate on a difficult internal speech test set by 20.7% relative to networks with the same number of parameters trained on spectrograms.
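The abstract does not spell out the architecture, so the following is only a rough, pure-Python sketch of the ideas it names, not the paper's configuration. It assumes a front-end of parallel strided 1-D convolutions over the raw waveform, one branch per time scale, each max-pooled down to a shared output hop (here 160 samples, i.e. 10 ms at an assumed 16 kHz sample rate) and concatenated along the channel axis. The filter widths, strides, filter counts, the ReLU nonlinearity, start-aligned frames, and the helper names `conv1d`, `max_pool_time`, `multiscale_frontend`, and `random_bank` are all hypothetical; the random filters stand in for learned ones.

```python
import math
import random
from typing import List, Tuple

Frames = List[List[float]]  # time frames x channels


def conv1d(signal: List[float], filters: Frames, stride: int) -> Frames:
    """Valid, strided 1-D cross-correlation of a waveform with a filter bank,
    followed by ReLU. A smaller stride yields more frames (finer time
    resolution); more filters yield more output channels."""
    width = len(filters[0])
    frames = []
    for start in range(0, len(signal) - width + 1, stride):
        window = signal[start:start + width]
        frames.append([max(0.0, sum(w * x for w, x in zip(f, window)))
                       for f in filters])
    return frames


def max_pool_time(frames: Frames, size: int) -> Frames:
    """Non-overlapping max pooling over time; a trailing partial window is dropped."""
    return [[max(col) for col in zip(*frames[i:i + size])]
            for i in range(0, len(frames) - size + 1, size)]


def multiscale_frontend(signal: List[float],
                        branches: List[Tuple[Frames, int]],
                        hop: int) -> Frames:
    """Run each hypothetical (filters, stride) branch, pool it to one frame per
    `hop` samples, and concatenate the branches' channels frame by frame."""
    pooled = []
    for filters, stride in branches:
        if hop % stride != 0:
            raise ValueError(f"stride {stride} must divide hop {hop}")
        pooled.append(max_pool_time(conv1d(signal, filters, stride), hop // stride))
    # Wider filters produce fewer valid frames; truncate to the shortest branch.
    n_frames = min(len(p) for p in pooled)
    return [[v for p in pooled for v in p[t]] for t in range(n_frames)]


def random_bank(rng: random.Random, n_filters: int, width: int) -> Frames:
    """Hypothetical stand-in for learned filters: scaled Gaussian weights."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(width)) for _ in range(width)]
            for _ in range(n_filters)]


# Usage: 0.1 s of a 440 Hz tone at an assumed 16 kHz sample rate.
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(1600)]
branches = [
    (random_bank(rng, 4, 16), 4),    # ~1 ms filters, fine stride
    (random_bank(rng, 4, 80), 16),   # ~5 ms filters
    (random_bank(rng, 4, 400), 80),  # ~25 ms filters, coarse stride
]
features = multiscale_frontend(signal, branches, hop=160)
print(len(features), len(features[0]))  # 8 12
```

In this sketch, halving a branch's stride doubles its frame rate before pooling, which buys temporal resolution at the cost of more multiply-adds, while adding filters adds channels. The pooling step lets branches with very different strides share one output frame rate so their channels can be stacked. A practical implementation would also handle padding, batching, and window-center alignment, which are omitted here.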
