Speaker Recognition in Realistic Scenario Using Multimodal Data

02/25/2023
by Saqlain Hussain Shah, et al.

In recent years, an association between the faces and voices of celebrities has been established by leveraging large-scale audio-visual information from YouTube. The availability of large-scale audio-visual datasets has been instrumental in developing speaker recognition methods based on standard Convolutional Neural Networks. The aim of this paper is therefore to leverage large-scale audio-visual information to improve the speaker recognition task. To this end, we propose a two-branch network that learns joint representations of faces and voices in a multimodal system. Features are then extracted from the two-branch network to train a classifier for speaker recognition. We evaluate the proposed framework on VoxCeleb1, a large-scale audio-visual dataset. Our results show that adding facial information improves speaker recognition performance. Moreover, our results indicate that there is an overlap between face and voice.
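The abstract does not spell out the architecture, so the following is only a minimal sketch of the general idea: one branch per modality projects face and voice features into a shared space, and the resulting features train a speaker classifier. Everything in it is an assumption made for illustration and not the authors' method: the names `TwoBranchEmbedder`, `fit_centroids`, and `predict` are hypothetical, the random weights stand in for trained branches, concatenation is one possible fusion choice, and a nearest-centroid rule replaces whatever classifier the paper uses.

```python
import math
import random


def linear(weights, x):
    """Apply a weight matrix (list of rows) to a feature vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def l2_normalize(x):
    """Scale a vector to unit length (left unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x] if norm > 0 else list(x)


def random_matrix(rows, cols, rng):
    """Hypothetical stand-in for trained branch weights."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


class TwoBranchEmbedder:
    """One projection per modality into a shared space of size joint_dim."""

    def __init__(self, face_dim, voice_dim, joint_dim, seed=0):
        rng = random.Random(seed)
        self.face_w = random_matrix(joint_dim, face_dim, rng)
        self.voice_w = random_matrix(joint_dim, voice_dim, rng)

    def embed(self, face_feat, voice_feat):
        face_emb = l2_normalize(linear(self.face_w, face_feat))
        voice_emb = l2_normalize(linear(self.voice_w, voice_feat))
        # Assumed fusion step: concatenate the two unit-length embeddings.
        return face_emb + voice_emb


def fit_centroids(embeddings, labels):
    """Average the fused embeddings of each speaker."""
    sums, counts = {}, {}
    for emb, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(emb))
        for i, v in enumerate(emb):
            acc[i] += v
        counts[label] = counts.get(label, 0) + 1
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}


def predict(centroids, emb):
    """Return the speaker whose centroid is nearest in squared Euclidean distance."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, emb))
    return min(centroids, key=lambda label: dist(centroids[label]))
```

In use, each training sample would be passed through `embed`, the fused vectors and speaker labels given to `fit_centroids`, and a test sample assigned with `predict`. In a real system the two branches would be trained networks operating on face images and voice spectrograms rather than fixed random projections.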

