Multi-target Filter and Detector for Unknown-number Speaker Diarization

03/30/2022
by Chin-yi Cheng, et al.

A strong representation of a target speaker can aid in extracting important information regarding the speaker and in detecting the corresponding temporal regions in a multi-speaker conversation. In this study, we propose a neural architecture that simultaneously extracts speaker representations that are consistent with the speaker diarization objective and detects the presence of each speaker frame by frame, regardless of the number of speakers in the conversation. A speaker representation (known as a z-vector) extractor and a frame-speaker contextualizer, which is realized by a residual network and processes data in both the temporal and speaker dimensions, are integrated into a unified framework. Testing on the CALLHOME corpus reveals that our model outperforms most methods presented to date. An evaluation in a more challenging case of concurrent speakers ranging from two to seven demonstrates that our model also achieves relative diarization error rate reductions of 26.35% and 6.4% over the [...] model and the attention-based model, respectively.
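The relative diarization error rate (DER) reductions quoted above follow the usual definition: the drop in DER divided by the baseline DER. A minimal Python sketch of that arithmetic is given below. The function name `relative_der_reduction` and the input values are assumptions chosen for illustration; they are not taken from the paper and are not its results.

```python
def relative_der_reduction(baseline_der: float, system_der: float) -> float:
    """Return the relative DER reduction of a system over a baseline, in percent.

    Both arguments are absolute DER values expressed in percent.
    """
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - system_der) / baseline_der


# Made-up numbers for illustration only; they are NOT results from the paper.
example = relative_der_reduction(baseline_der=20.0, system_der=15.0)
print(f"{example:.2f}% relative reduction")
```

With these hypothetical inputs, a drop from 20.0% to 15.0% DER is a 5-point absolute reduction but a 25% relative reduction, which is why relative figures should always be read together with the baseline they refer to.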
