VoxWatch: An open-set speaker recognition benchmark on VoxCeleb

06/30/2023
by Raghuveer Peri, et al.

Despite its broad practical applications, such as fraud prevention, open-set speaker identification (OSI) has received less attention in the speaker recognition community than speaker verification (SV). OSI deals with determining whether a test speech sample belongs to one of a set of pre-enrolled speakers (in-set) or comes from an out-of-set speaker. In addition to the typical challenges associated with speech variability, OSI is prone to the "false-alarm problem": as the size of the in-set speaker population (a.k.a. the watchlist) grows, the maximum score that an out-of-set sample obtains against the watchlist tends to increase, leading to higher false alarm rates. This is particularly challenging for applications in financial institutions and border security, where the watchlist typically contains on the order of several thousand speakers. Therefore, it is important to systematically quantify the false-alarm problem and to develop techniques that alleviate the impact of watchlist size on detection performance. Prior studies on this problem are sparse and lack a common benchmark for systematic evaluations. In this paper, we present the first public benchmark for OSI, developed using the VoxCeleb dataset. We quantify the effect of watchlist size and speech duration on the watchlist-based speaker detection task using three strong neural-network-based systems. In contrast to the findings of prior research, we show that the commonly adopted adaptive score normalization is not guaranteed to improve performance on this task. On the other hand, we show that score calibration and score fusion, two other techniques commonly used in SV, result in significant improvements in OSI performance.
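The false-alarm problem described above can be illustrated with a toy model. The sketch below is not the benchmark's protocol or any of the paper's systems. It assumes that an out-of-set sample's scores against the individual watchlist speakers are independent draws from a single Gaussian, and that detection takes the maximum score over the watchlist and compares it with one fixed threshold. Under those assumptions the false-alarm rate for a watchlist of N speakers is 1 - (1 - p)^N, where p is the false-alarm rate for a single enrolled speaker, so it rises toward 1 as N grows. The function names and the threshold value are hypothetical.

```python
import random
from statistics import NormalDist


def detect_on_watchlist(scores, threshold):
    """Max-score watchlist detection (illustrative): raise an alarm only if the
    best score over all watchlist speakers exceeds the threshold.
    Returns (is_alarm, index_of_best_scoring_speaker)."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return scores[best] > threshold, best


def far_analytic(threshold, watchlist_size, mu=0.0, sigma=1.0):
    """Out-of-set false-alarm rate under the TOY ASSUMPTION that the sample's
    scores against each watchlist speaker are i.i.d. N(mu, sigma^2)."""
    p_single = 1.0 - NormalDist(mu, sigma).cdf(threshold)  # FA rate with one speaker
    return 1.0 - (1.0 - p_single) ** watchlist_size


def far_simulated(threshold, watchlist_size, n_trials=2000, mu=0.0, sigma=1.0, seed=0):
    """Monte Carlo estimate of the same quantity, using detect_on_watchlist."""
    rng = random.Random(seed)
    alarms = 0
    for _ in range(n_trials):
        # One out-of-set trial: a score against every watchlist speaker.
        scores = [rng.gauss(mu, sigma) for _ in range(watchlist_size)]
        is_alarm, _ = detect_on_watchlist(scores, threshold)
        alarms += int(is_alarm)
    return alarms / n_trials


if __name__ == "__main__":
    threshold = 2.5  # hypothetical fixed operating threshold
    for n in (1, 10, 100, 1000):
        print(f"watchlist size {n:5d}: "
              f"analytic FAR = {far_analytic(threshold, n):.3f}, "
              f"simulated FAR = {far_simulated(threshold, n):.3f}")
```

Real embedding-based scores are correlated across watchlist speakers, so measured false-alarm rates need not follow this formula; the sketch only shows the direction of the effect that the benchmark quantifies.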



research
08/07/2016

Incorporation of Speech Duration Information in Score Fusion of Speaker Recognition Systems

In recent years identity-vector (i-vector) based speaker verification (S...
research
02/07/2020

LEAP System for SRE19 Challenge – Improvements and Error Analysis

The NIST Speaker Recognition Evaluation - Conversational Telephone Speec...
research
08/08/2020

Extrapolating false alarm rates in automatic speaker verification

Automatic speaker verification (ASV) vendors and corpus providers would ...
research
04/04/2022

On The Model Size Selection For Speaker Identification

In this paper we evaluate the relevance of the model size for speaker id...
research
03/28/2022

Investigation of Different Calibration Methods for Deep Speaker Embedding based Verification Systems

Deep speaker embedding extractors have already become new state-of-the-a...
research
05/25/2021

Utterance partitioning for speaker recognition: an experimental review and analysis with new findings under GMM-SVM framework

The performance of speaker recognition system is highly dependent on the...
research
05/10/2022

Gamified Speaker Comparison by Listening

We address speaker comparison by listening in a game-like environment, h...
