Speaker Recognition from raw waveform with SincNet

07/29/2018
by Mirco Ravanelli, et al.

Deep learning is progressively gaining popularity as a viable alternative to i-vectors for speaker recognition. Promising results have recently been obtained with Convolutional Neural Networks (CNNs) fed directly with raw speech samples. Rather than employing standard hand-crafted features, these CNNs learn low-level speech representations from waveforms, potentially allowing the network to better capture important narrow-band speaker characteristics such as pitch and formants. Proper design of the neural network is crucial to achieve this goal. This paper proposes a novel CNN architecture, called SincNet, that encourages the first convolutional layer to discover more meaningful filters. SincNet is based on parametrized sinc functions, which implement band-pass filters. In contrast to standard CNNs, which learn all elements of each filter, the proposed method learns only the low and high cutoff frequencies directly from data. This offers a very compact and efficient way to derive a customized filter bank specifically tuned for the desired application. Our experiments, conducted on both speaker identification and speaker verification tasks, show that the proposed architecture converges faster and performs better than a standard CNN on raw waveforms.
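To make the two-parameter filter idea concrete, here is a minimal Python sketch (standard library only) of a single sinc-based band-pass kernel. It assumes the usual construction of a band-pass filter as the difference of two ideal low-pass sinc filters, g[n] = 2 f2 sinc(2π f2 n) - 2 f1 sinc(2π f1 n) with sinc(x) = sin(x)/x, truncated and smoothed with a Hamming window, where the cutoffs f1 < f2 are normalized frequencies in cycles per sample. The helpers `sinc_bandpass`, `gain_at`, and `first_layer_params` are hypothetical names for illustration only, not the authors' code; in an actual network f1 and f2 would be trainable parameters updated by gradient descent rather than fixed floats.

```python
import math


def sinc(x: float) -> float:
    """Unnormalized sinc, sin(x)/x, with the removable singularity sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(x) / x


def sinc_bandpass(f1: float, f2: float, length: int) -> list[float]:
    """Hamming-windowed band-pass kernel defined only by two cutoffs.

    f1, f2: low and high cutoffs in cycles/sample, 0 <= f1 < f2 <= 0.5.
    length: odd number of taps, so the kernel is centred on n = 0.
    """
    if length % 2 == 0:
        raise ValueError("length must be odd")
    if not 0.0 <= f1 < f2 <= 0.5:
        raise ValueError("need 0 <= f1 < f2 <= 0.5")
    half = length // 2
    kernel = []
    for i in range(length):
        n = i - half  # time index relative to the kernel centre
        # Ideal low-pass at f2 minus ideal low-pass at f1 gives a band-pass.
        g = 2 * f2 * sinc(2 * math.pi * f2 * n) - 2 * f1 * sinc(2 * math.pi * f1 * n)
        # Hamming window tapers the truncated sinc to reduce ripple/leakage.
        w = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(g * w)
    return kernel


def gain_at(kernel: list[float], f: float) -> float:
    """Magnitude of the kernel's frequency response at normalized frequency f."""
    half = len(kernel) // 2
    re = sum(h * math.cos(2 * math.pi * f * (i - half)) for i, h in enumerate(kernel))
    im = sum(h * math.sin(2 * math.pi * f * (i - half)) for i, h in enumerate(kernel))
    return math.hypot(re, im)


def first_layer_params(num_filters: int, kernel_length: int) -> tuple[int, int]:
    """Learned values in the first layer (biases ignored): standard CNN vs sinc-based."""
    standard = num_filters * kernel_length  # every tap of every filter is learned
    sinc_based = num_filters * 2            # only f1 and f2 per filter are learned
    return standard, sinc_based
```

For example, `sinc_bandpass(0.1, 0.2, 101)` passes frequencies around 0.15 cycles/sample with gain close to 1 and strongly attenuates frequencies such as 0.02 and 0.35 cycles/sample. The parameter count illustrates the compactness claim: for a hypothetical first layer of 80 filters with 251 taps each, `first_layer_params(80, 251)` returns `(20080, 160)`.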

research
12/13/2018

Speech and Speaker Recognition from Raw Waveform with SincNet

Deep neural networks can learn complex and abstract representations, tha...
research
05/31/2021

Speaker Identification from Raw Waveform with LineNet

Speaker Identification using i-vector has gradually been replaced by spe...
research
11/23/2018

Interpretable Convolutional Filters with SincNet

Deep learning is currently playing a crucial role toward higher levels o...
research
11/24/2022

A new Speech Feature Fusion method with cross gate parallel CNN for Speaker Recognition

In this paper, a new speech feature fusion method is proposed for speake...
research
11/12/2019

WaveletKernelNet: An Interpretable Deep Neural Network for Industrial Intelligent Diagnosis

Convolutional neural network (CNN), with ability of feature learning and...
research
12/01/2018

Learning Speaker Representations with Mutual Information

Learning good representations is of crucial importance in deep learning....
research
04/05/2020

Speaker Recognition using SincNet and X-Vector Fusion

In this paper, we propose an innovative approach to perform speaker reco...
