The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

10/22/2020
by Renyu Wang, et al.

This paper describes the system setup of our submission to the speaker diarisation track (Track 4) of the VoxCeleb Speaker Recognition Challenge 2020. Our diarisation system uses a well-trained neural network based speech enhancement model as a pre-processing front-end for the input speech signals. We replace conventional energy-based voice activity detection (VAD) with a neural network based VAD, which more accurately annotates segments containing only background music, noise, and other interference; this is crucial to diarisation performance. For speaker clustering, we apply agglomerative hierarchical clustering (AHC) of x-vectors followed by variational Bayesian hidden Markov model (VB-HMM) based iterative clustering. Experimental results demonstrate that our proposed system achieves substantial improvements over the baseline system, yielding a diarisation error rate (DER) of 10.45% and a Jaccard error rate (JER) of 22.46%.
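To make the AHC step in the abstract concrete, here is a minimal sketch of agglomerative clustering over segment embeddings. It is an illustration only and not the authors' implementation: the abstract does not state the similarity measure, linkage rule, or stopping criterion, so cosine distance, average linkage, the `threshold` value, and the toy two-dimensional vectors standing in for x-vectors are all assumptions.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two vectors (assumed metric, not from the paper)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def ahc(embeddings, threshold):
    """Average-linkage AHC; returns one cluster label per embedding.

    Merging stops once the closest pair of clusters is farther apart
    than `threshold` (a hypothetical stopping criterion).
    """
    # Start with every segment embedding in its own cluster.
    clusters = [[i] for i in range(len(embeddings))]

    def linkage(c1, c2):
        # Average pairwise distance between the members of two clusters.
        total = sum(
            cosine_distance(embeddings[i], embeddings[j]) for i in c1 for j in c2
        )
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                d = linkage(clusters[p], clusters[q])
                if best is None or d < best[0]:
                    best = (d, p, q)
        if best[0] > threshold:
            break
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]

    # Convert the cluster membership lists into per-segment labels.
    labels = [0] * len(embeddings)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy stand-ins for x-vectors: segments 0, 1 and 4 point one way, 2 and 3 another.
toy = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.9], [1.0, 0.0]]
labels = ahc(toy, threshold=0.3)
print(labels)  # [0, 0, 1, 1, 0]
```

In a pipeline like the one described, the labels from this step would serve only as an initial speaker assignment, which the VB-HMM stage then refines. The threshold controls how many speakers are found: a very large value merges everything into one cluster, while a negative value leaves every segment on its own.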

research
10/22/2020

Analysis of the BUT Diarization System for VoxConverse Challenge

This paper describes the system developed by the BUT team for the fourth...
research
02/14/2020

Speaker Diarization with Region Proposal Network

Speaker diarization is an important pre-processing step for many speech ...
research
09/23/2022

The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022

This technical report describes our system for track 1, 2 and 4 of the V...
research
05/19/2022

Bi-LSTM Scoring Based Similarity Measurement with Agglomerative Hierarchical Clustering (AHC) for Speaker Diarization

Majority of speech signals across different scenarios are never availabl...
research
07/19/2013

Speaker Independent Continuous Speech to Text Converter for Mobile Application

An efficient speech to text converter for mobile application is presente...
research
02/15/2019

An improved uncertainty propagation method for robust i-vector based speaker recognition

The performance of automatic speaker recognition systems degrades when f...
research
09/15/2023

A Real-Time Active Speaker Detection System Integrating an Audio-Visual Signal with a Spatial Querying Mechanism

We introduce a distinctive real-time, causal, neural network-based activ...
