EACELEB: An East Asian Language Speaking Celebrity Dataset for Speaker Recognition

03/10/2022
by   Desmond Caulley, et al.

Large datasets are valuable for training speaker recognition systems, and research groups have constructed several over the years. VoxCeleb is one such large dataset, extracted from YouTube videos. This paper presents an audio-visual method for acquiring audio data from YouTube given a speaker's name as input. The system follows a pipeline similar to the VoxCeleb data acquisition method. However, our work focuses on fast data acquisition: once a face has been detected, we track it through subsequent frames rather than running face detection on every frame, which is computationally more expensive. We show that applying audio diarization to the acquired data yields equal error rates comparable to those of VoxCeleb. A secondary set of experiments shows that fine-tuning a pre-trained x-vector system on the acquired data reduces the error rate further. Like VoxCeleb, this work focuses primarily on collecting audio of celebrities; unlike VoxCeleb, our target speakers are celebrities from East Asian countries. Finally, we set up a speaker verification task to evaluate the accuracy of the acquired data. After diarization and fine-tuning, we achieved an equal error rate of approximately 4% across the entire dataset.
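The equal error rate reported above is the operating point of a verification system at which the false-acceptance rate equals the false-rejection rate. The following is a minimal sketch of one common way to estimate it from trial scores; it is not the authors' evaluation code. The function name `equal_error_rate`, the threshold sweep over observed scores, and the toy scores are all illustrative assumptions.

```python
from typing import Sequence, Tuple


def equal_error_rate(
    target_scores: Sequence[float],
    nontarget_scores: Sequence[float],
) -> Tuple[float, float]:
    """Estimate the EER and the threshold at which it occurs.

    target_scores: scores of same-speaker trials (higher = more similar).
    nontarget_scores: scores of different-speaker trials.
    A trial is accepted when its score is >= the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Candidate thresholds: every distinct score that was observed.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap = float("inf")
    best_eer = 1.0
    best_threshold = thresholds[0]
    for threshold in thresholds:
        # False acceptance: a different-speaker trial is accepted.
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        # False rejection: a same-speaker trial is rejected.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # With finite data the two rates rarely cross exactly,
            # so report their average at the closest point.
            best_gap = gap
            best_eer = (far + frr) / 2
            best_threshold = threshold
    return best_eer, best_threshold


if __name__ == "__main__":
    # Toy scores for illustration only; not data from the paper.
    same_speaker = [0.9, 0.8, 0.7, 0.4]
    different_speaker = [0.6, 0.3, 0.2, 0.1]
    eer, threshold = equal_error_rate(same_speaker, different_speaker)
    print(f"EER = {eer:.2%} at threshold {threshold}")
```

On real trial lists the sweep is usually done on sorted scores or with a library ROC routine for efficiency; the quadratic loop here is kept only for readability.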


