Speech and Speaker Recognition from Raw Waveform with SincNet

12/13/2018
by Mirco Ravanelli, et al.

Deep neural networks can learn complex and abstract representations, which are progressively obtained by combining simpler ones. A recent trend in speech and speaker recognition consists of discovering these representations directly from raw audio samples. Unlike standard hand-crafted features such as MFCCs or FBANK, the raw waveform can potentially help neural networks discover better and more customized representations. The high-dimensional raw inputs, however, can make training significantly more challenging. This paper summarizes our recent efforts to develop a neural architecture that efficiently processes speech from audio waveforms. In particular, we propose SincNet, a novel Convolutional Neural Network (CNN) that encourages the first layer to discover meaningful filters by exploiting parametrized sinc functions. In contrast to standard CNNs, which learn all the elements of each filter, SincNet learns only the low and high cutoff frequencies of band-pass filters directly from data. This inductive bias offers a very compact way to derive a customized front-end that depends only on a few parameters with a clear physical meaning. Our experiments, conducted on both speaker and speech recognition, show that the proposed architecture converges faster, performs better, and is more computationally efficient than standard CNNs.
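To make the "only two cutoff frequencies per filter" idea concrete, here is a minimal NumPy sketch of a band-pass filter built from a pair of cutoffs. It uses the standard windowed-sinc construction, which expresses a band-pass as the difference of two ideal low-pass filters, g[n] = 2f_2 sinc(2πf_2 n) − 2f_1 sinc(2πf_1 n) with sinc(x) = sin(x)/x and f_1, f_2 in cycles per sample. The Hamming window, the 251-tap length, the 16 kHz sample rate, the 80-filter bank with linearly spaced cutoffs, and the helper names `sinc_bandpass` and `sinc_filterbank` are all illustrative assumptions, not details taken from this abstract. The sketch only builds the filters; in an actual network the cutoffs would be trainable parameters updated by backpropagation in a deep learning framework.

```python
import numpy as np

def sinc_bandpass(f_low_hz, f_high_hz, length=251, sample_rate=16000):
    """Hypothetical helper: windowed-sinc band-pass defined by two cutoffs.

    length, sample_rate, and the Hamming window are assumed values
    chosen for illustration only.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd so the filter is centered")
    if not 0 <= f_low_hz < f_high_hz <= sample_rate / 2:
        raise ValueError("need 0 <= f_low < f_high <= Nyquist")
    # Normalize cutoffs to cycles per sample.
    f1 = f_low_hz / sample_rate
    f2 = f_high_hz / sample_rate
    # Integer taps symmetric around 0, e.g. -125..125 for length 251.
    n = np.arange(length) - (length - 1) // 2
    # np.sinc(x) = sin(pi x) / (pi x), so 2*f*np.sinc(2*f*n) is the
    # ideal low-pass impulse response with cutoff f.
    g = 2 * f2 * np.sinc(2 * f2 * n) - 2 * f1 * np.sinc(2 * f1 * n)
    # Taper the truncated ideal response to reduce spectral leakage.
    return g * np.hamming(length)

def sinc_filterbank(f_low_hz, f_high_hz, length=251, sample_rate=16000):
    """Hypothetical helper: one filter per (low, high) pair, 2 parameters each."""
    return np.stack([sinc_bandpass(lo, hi, length, sample_rate)
                     for lo, hi in zip(f_low_hz, f_high_hz)])

# Usage: 80 filters with assumed, linearly spaced 200 Hz-wide bands.
lows = np.linspace(50, 4000, 80)
highs = lows + 200
bank = sinc_filterbank(lows, highs)
print(bank.shape)                             # (80, 251)
print("standard conv weights:", 80 * 251)     # 20080
print("sinc layer parameters:", 2 * 80)       # 160
```

Under these assumed sizes, a standard first convolutional layer would learn 20080 free weights, while the sinc parametrization learns 160 cutoff values that fully determine the same number of taps, which is the compactness the abstract refers to.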
