VoxBlink: A Large Scale Speaker Verification Dataset on Camera

08/14/2023
by Yuke Lin, et al.

In this paper, we introduce VoxBlink, a large-scale, high-quality audio-visual speaker verification dataset. We propose a robust automatic audio-visual data mining pipeline to curate this dataset, which contains 1.45M utterances from 38K speakers. Because automated data collection inevitably introduces noisy data, we also apply a multi-modal purification step to produce a cleaner version, named VoxBlink-clean, comprising 18K identities and 1.02M utterances. In contrast to VoxCeleb, VoxBlink is sourced from short videos of ordinary users, so the scenarios it covers align more closely with real-life situations. To the best of our knowledge, VoxBlink is one of the largest publicly available speaker verification datasets. Training on the VoxCeleb and VoxBlink-clean datasets together, we evaluate speaker verification models with multiple architectural backbones on the VoxCeleb test sets. Experimental results indicate a substantial performance improvement across the evaluated architectures when VoxBlink-clean is incorporated into the training process. The details of the dataset can be found on http://voxblink.github.io
