Deep multi-metric learning for text-independent speaker verification

07/17/2020
by   Jiwei Xu, et al.

Text-independent speaker verification is an important artificial intelligence problem with a wide spectrum of applications, such as criminal investigation, payment certification, and interest-based customer services. Its purpose is to determine whether two given uncontrolled utterances originate from the same speaker. Extracting speech features for each speaker with deep neural networks is a promising direction, and a straightforward solution is to train a discriminative feature extraction network with a metric learning loss function. However, a single loss function often has certain limitations. We therefore use deep multi-metric learning and introduce three different losses for this problem: triplet loss, n-pair loss, and angular loss. The three loss functions work cooperatively to train a feature extraction network equipped with residual connections and squeeze-and-excitation attention. We conduct experiments on the large-scale VoxCeleb2 dataset, which contains over a million utterances from over 6,000 speakers, and the proposed deep neural network obtains an equal error rate of 3.48%, which is a very competitive result. Code for both training and testing, together with pretrained models, is available at <https://github.com/GreatJiweix/DmmlTiSV>, which is the first publicly available code repository for large-scale text-independent speaker verification with performance on par with state-of-the-art systems.
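As a rough illustration of how three metric learning losses could be combined into one training objective, the following NumPy sketch computes a triplet loss, an n-pair loss, and an angular loss on a batch of embeddings and sums them. It is not taken from the authors' repository: the function names, the margin of 0.2, the angle of 45 degrees, and the equal loss weights are all assumptions chosen for illustration, and the abstract does not say how the losses are actually weighted.

```python
import numpy as np


def triplet_loss(a, p, n, margin=0.2):
    """Hinge on squared distances: pull (a, p) together, push (a, n) apart.
    The margin value is an assumption, not a value from the paper."""
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_an = np.sum((a - n) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_ap - d_an + margin)))


def n_pair_loss(a, p):
    """N-pair loss: row i of `a` and `p` share a speaker, and different rows
    are assumed to come from different speakers, so every p[j] with j != i
    serves as a negative for a[i]."""
    logits = a @ p.T  # (N, N) similarity matrix
    # log(1 + sum_{j != i} exp(s_ij - s_ii)) equals logsumexp_j(s_ij) - s_ii
    m = logits.max(axis=1, keepdims=True)
    lse = m[:, 0] + np.log(np.sum(np.exp(logits - m), axis=1))
    return float(np.mean(lse - np.diag(logits)))


def angular_loss(a, p, n, alpha_deg=45.0):
    """Angular loss: bounds the angle at the negative in the triangle formed
    by anchor, positive, and negative. alpha_deg is an assumed setting."""
    c = (a + p) / 2.0  # midpoint of anchor and positive
    tan2 = np.tan(np.deg2rad(alpha_deg)) ** 2
    d_ap = np.sum((a - p) ** 2, axis=1)
    d_nc = np.sum((n - c) ** 2, axis=1)
    return float(np.mean(np.maximum(0.0, d_ap - 4.0 * tan2 * d_nc)))


def multi_metric_loss(a, p, n, weights=(1.0, 1.0, 1.0)):
    """Weighted sum of the three losses; equal weights are an assumption."""
    losses = (triplet_loss(a, p, n), n_pair_loss(a, p), angular_loss(a, p, n))
    return float(sum(w * l for w, l in zip(weights, losses)))


# Toy batch: two speakers with 2-D embeddings (hypothetical data).
anchors = np.array([[1.0, 0.0], [0.0, 1.0]])
positives = anchors.copy()
negatives = np.array([[0.0, 1.0], [1.0, 0.0]])
total = multi_metric_loss(anchors, positives, negatives)
```

In a real system the embeddings would come from the feature extraction network and the combined loss would be backpropagated through it; this sketch only evaluates the loss values.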


research
02/07/2019

End-to-end losses based on speaker basis vectors and all-speaker hard negative mining for speaker verification

In recent years, speaker verification has been primarily performed using...
research
10/21/2020

Multi-task Metric Learning for Text-independent Speaker Verification

In this work, we introduce metric learning (ML) to enhance the deep embe...
research
02/22/2022

Contrastive-mixup learning for improved speaker verification

This paper proposes a novel formulation of prototypical loss with mixup ...
research
09/05/2021

Efficient Attention Branch Network with Combined Loss Function for Automatic Speaker Verification Spoof Detection

Many endeavors have sought to develop countermeasure techniques as enhan...
research
07/22/2018

Unified Hypersphere Embedding for Speaker Recognition

Incremental improvements in accuracy of Convolutional Neural Networks ar...
research
07/17/2023

Exploring Binary Classification Loss For Speaker Verification

The mismatch between close-set training and open-set testing usually lea...
research
03/31/2020

A Comparison of Metric Learning Loss Functions for End-To-End Speaker Verification

Despite the growing popularity of metric learning approaches, very littl...
