Prosodic-Enhanced Siamese Convolutional Neural Networks for Cross-Device Text-Independent Speaker Verification

07/31/2018
by Sobhan Soleymani, et al.

This paper proposes a novel cross-device text-independent speaker verification architecture. The majority of state-of-the-art deep architectures for speaker verification operate on Mel-frequency cepstral coefficients. In contrast, our proposed Siamese convolutional neural network architecture uses Mel-frequency spectrogram coefficients to exploit the dependency between adjacent spectro-temporal features. Although spectro-temporal features have proved highly reliable in speaker verification models, they represent only some aspects of the short-term, acoustic-level traits of the speaker's voice. The human voice, however, carries information at several linguistic levels, such as acoustic, lexical, prosodic, and phonetic, that can be utilized in speaker verification models. To compensate for these inherent shortcomings of spectro-temporal features, we enhance the Siamese convolutional neural network architecture with a multilayer perceptron network that incorporates prosodic, jitter, and shimmer features. The proposed end-to-end architecture performs feature extraction and verification simultaneously, and it displays significant improvement over classical signal processing approaches and deep algorithms for forensic cross-device speaker verification.
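As a rough illustration of the jitter and shimmer features mentioned above, the sketch below computes the common "local" variants: the mean absolute difference between consecutive glottal cycle periods (jitter) or peak amplitudes (shimmer), divided by the mean value. The abstract does not state which variants or extraction tools the authors use, so the function names, the choice of the local definition, and the sample values are all assumptions made for illustration only.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, relative to their mean."""
    if len(values) < 2:
        raise ValueError("need at least two cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def jitter_local(periods):
    """Cycle-to-cycle variation of the pitch period (assumed 'local' jitter)."""
    return local_perturbation(periods)


def shimmer_local(amplitudes):
    """Cycle-to-cycle variation of the peak amplitude (assumed 'local' shimmer)."""
    return local_perturbation(amplitudes)


# Hypothetical per-cycle measurements, not data from the paper.
periods = [0.0100, 0.0102, 0.0099, 0.0101]   # seconds per glottal cycle
amplitudes = [0.80, 0.82, 0.79, 0.81]        # peak amplitude per cycle

print("jitter:", jitter_local(periods))
print("shimmer:", shimmer_local(amplitudes))
```

In a pipeline like the one described, scalar features of this kind would be gathered into a vector and passed to the multilayer perceptron branch, while the Mel-frequency spectrogram feeds the Siamese convolutional branch.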


