TSUP Speaker Diarization System for Conversational Short-phrase Speaker Diarization Challenge

10/26/2022
by   Bowen Pang, et al.

This paper describes the TSUP team's submission to the ISCSLP 2022 conversational short-phrase speaker diarization (CSSD) challenge, which focuses on short-phrase conversations and introduces a new evaluation metric, the conversational diarization error rate (CDER). In this challenge, we explore three typical kinds of speaker diarization systems: spectral clustering (SC) based diarization, target-speaker voice activity detection (TS-VAD), and end-to-end neural diarization (EEND). Our major findings are summarized as follows. First, the SC approach is favored over the other two approaches under the new CDER metric. Second, hyperparameter tuning is essential to CDER for all three types of speaker diarization systems; specifically, CDER decreases when the sub-segment length is set longer. Finally, multi-system fusion through DOVER-Lap worsens CDER on the challenge data. Our submitted SC system ranked third in the challenge.
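
To make the SC approach and the sub-segment length hyperparameter more concrete, the following is a minimal sketch of clustering-based diarization and not the TSUP system itself. The abstract does not specify the embedding extractor, window sizes, or clustering details, so everything here is an assumption: the function names `sub_segments`, `kmeans`, and `spectral_cluster` are hypothetical, the window and hop values are illustrative, and the speaker embeddings are synthetic vectors standing in for the output of a real embedding model.

```python
import numpy as np


def sub_segments(start, end, win=1.5, hop=0.75):
    """Split one speech region [start, end) into sliding sub-segments.

    `win` is the sub-segment length in seconds (illustrative value, not
    taken from the paper). A longer `win` yields fewer, longer sub-segments.
    """
    segs = []
    t = start
    while t < end:
        segs.append((t, min(t + win, end)))
        if t + win >= end:
            break
        t += hop
    return segs


def kmeans(x, k, iters=50):
    """Plain k-means with farthest-point initialisation (deterministic)."""
    centers = [x[0]]
    for _ in range(1, k):
        d = np.min([np.sum((x - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(x[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):
        dist = ((x[:, None, :] - centers[None]) ** 2).sum(-1)
        labels = np.argmin(dist, axis=1)
        new = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def spectral_cluster(emb, n_speakers):
    """Cluster sub-segment embeddings with normalized spectral clustering."""
    x = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    aff = np.clip(x @ x.T, 0.0, None)  # cosine affinity, negatives set to 0
    np.fill_diagonal(aff, 0.0)
    deg = aff.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(deg, 1e-12))
    # Symmetric normalized Laplacian: I - D^{-1/2} A D^{-1/2}
    lap = np.eye(len(aff)) - d_inv_sqrt[:, None] * aff * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(lap)  # eigenvalues in ascending order
    spec = vecs[:, :n_speakers]
    spec = spec / (np.linalg.norm(spec, axis=1, keepdims=True) + 1e-12)
    return kmeans(spec, n_speakers)


# Synthetic stand-in for embeddings of 10 sub-segments from 2 speakers.
rng = np.random.default_rng(0)
centroids = np.zeros((2, 16))
centroids[0, 0] = 1.0
centroids[1, 1] = 1.0
embeddings = np.vstack([
    centroids[0] + 0.05 * rng.standard_normal((5, 16)),
    centroids[1] + 0.05 * rng.standard_normal((5, 16)),
])
labels = spectral_cluster(embeddings, n_speakers=2)
print(sub_segments(0.0, 3.0), labels)
```

In this sketch the number of speakers is given up front; a real system would typically estimate it, for example from the eigenvalue gaps of the Laplacian. Raising `win` reduces the number of sub-segments per speech region, which is the knob the abstract reports as affecting CDER.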

research
10/22/2020

Analysis of the BUT Diarization System for VoxConverse Challenge

This paper describes the system developed by the BUT team for the fourth...
research
09/20/2022

The BUCEA Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2022

This paper describes the BUCEA speaker diarization system for the 2022 V...
research
09/05/2021

The ByteDance Speaker Diarization System for the VoxCeleb Speaker Recognition Challenge 2021

This paper describes the ByteDance speaker diarization system for the fo...
research
03/09/2023

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Target speech extraction (TSE) systems are designed to extract target sp...
research
09/20/2022

MultiMediate '22: Backchannel Detection and Agreement Estimation in Group Interactions

Backchannels, i.e. short interjections of the listener, serve important ...
research
11/12/2022

Multi-Speaker and Wide-Band Simulated Conversations as Training Data for End-to-End Neural Diarization

End-to-end diarization presents an attractive alternative to standard ca...
research
04/09/2023

An investigation of speaker independent phrase break models in End-to-End TTS systems

This paper presents our work on phrase break prediction in the context o...
