Learning Filterbanks from Raw Speech for Phone Recognition

11/03/2017
by Neil Zeghidour, et al.
We train a bank of complex filters that operates on the raw waveform and feeds into a convolutional neural network for end-to-end phone recognition. These time-domain filterbanks (TD-filterbanks) are initialized as an approximation of mel-filterbanks (MFSC, for mel-frequency spectral coefficients), and then fine-tuned jointly with the remaining convolutional architecture. We perform phone recognition experiments on TIMIT and show that for several architectures, models trained on TD-filterbanks consistently outperform their counterparts trained on comparable MFSC. We get our best performance by learning all front-end steps, from pre-emphasis up to averaging. Finally, we observe that the filters at convergence have an asymmetric impulse response, and that some of them remain almost analytic.
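As a rough illustration of the front end described in the abstract, the sketch below builds a small bank of complex filters whose center frequencies are evenly spaced on the mel scale, applies them to a raw waveform, takes the squared modulus, and averages over time. Several things here are assumptions made for illustration and are not taken from the abstract: the choice of Gabor filters (a Gaussian envelope times a complex sinusoid), the bandwidth rule, the log compression, the plain time average standing in for a learned averaging step, and all sizes and function names. Nothing is trained; this only shows a plausible initialization and forward pass.

```python
import cmath
import math


def hz_to_mel(freq_hz):
    """Common mel-scale formula (one of several conventions)."""
    return 2595.0 * math.log10(1.0 + freq_hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_points(n_filters, sample_rate):
    """n_filters + 2 frequencies in Hz, evenly spaced in mel from 0 to Nyquist."""
    top = hz_to_mel(sample_rate / 2.0)
    return [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]


def gabor_taps(center_hz, bandwidth_hz, sample_rate, width):
    """Complex Gabor filter with unit L2 norm; width is assumed to be odd."""
    half = width // 2
    # Standard deviation of the Gaussian envelope, in samples (assumed rule).
    sigma = sample_rate / (2.0 * math.pi * bandwidth_hz)
    taps = []
    for n in range(-half, half + 1):
        envelope = math.exp(-0.5 * (n / sigma) ** 2)
        carrier = cmath.exp(2j * math.pi * center_hz * n / sample_rate)
        taps.append(envelope * carrier)
    norm = math.sqrt(sum(abs(t) ** 2 for t in taps))
    return [t / norm for t in taps]


def init_bank(n_filters, sample_rate, width):
    """One Gabor filter per mel-spaced center frequency (hypothetical helper)."""
    points = mel_points(n_filters, sample_rate)
    bank = []
    for i in range(n_filters):
        center = points[i + 1]
        # Assumed bandwidth: half the distance between the neighbouring points.
        bandwidth = (points[i + 2] - points[i]) / 2.0
        bank.append(gabor_taps(center, bandwidth, sample_rate, width))
    return bank


def band_log_energies(signal, bank):
    """Convolve, take the squared modulus, average over time, log-compress."""
    energies = []
    for taps in bank:
        width = len(taps)
        total = 0.0
        count = 0
        for start in range(len(signal) - width + 1):
            acc = 0j
            for k in range(width):
                # True convolution: the filter is flipped relative to the signal.
                acc += signal[start + k] * taps[width - 1 - k]
            total += abs(acc) ** 2
            count += 1
        energies.append(math.log(1.0 + total / count))
    return energies


# Usage: a pure tone placed at the center of filter 3 of an 8-filter bank.
SAMPLE_RATE = 8000
bank = init_bank(n_filters=8, sample_rate=SAMPLE_RATE, width=101)
tone_hz = mel_points(8, SAMPLE_RATE)[4]
signal = [math.sin(2.0 * math.pi * tone_hz * n / SAMPLE_RATE) for n in range(200)]
energies = band_log_energies(signal, bank)
```

In a trainable version of this idea, the filter taps (and possibly the averaging window) would become parameters of the first network layers and be updated by backpropagation together with the rest of the model; the pure-Python loops here would be replaced by a convolution layer in a deep learning framework.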

research
01/01/2019

Exploring spectro-temporal features in end-to-end convolutional neural networks

Triangular, overlapping Mel-scaled filters ("f-banks") are the current s...
research
06/19/2018

End-to-End Speech Recognition From the Raw Waveform

State-of-the-art speech recognition systems rely on fixed, hand-crafted ...
research
03/25/2018

Learning Environmental Sounds with Multi-scale Convolutional Neural Network

Deep learning has dramatically improved the performance of sounds recogn...
research
09/07/2017

Basic Filters for Convolutional Neural Networks Applied to Music: Training or Design?

When convolutional neural networks are used to tackle learning problems ...
research
11/07/2017

End-to-end learning for music audio tagging at scale

The lack of data tends to limit the outcomes of deep learning research -...
research
03/04/2021

End-to-End Mispronunciation Detection and Diagnosis From Raw Waveforms

Mispronunciation detection and diagnosis (MDD) is designed to identify p...
research
10/08/2021

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

In this paper, we conduct a cross-dataset study on parametric and non-pa...
