Speaker Diarization with Region Proposal Network

02/14/2020
by   Zili Huang, et al.

Speaker diarization, an important pre-processing step for many speech applications, aims to solve the "who spoke when" problem. Although standard diarization systems achieve satisfactory results in various scenarios, they are composed of several independently optimized modules and cannot handle overlapped speech. In this paper, we propose a novel speaker diarization method: Region Proposal Network based Speaker Diarization (RPNSD). In this method, a neural network generates overlapped speech segment proposals and computes their speaker embeddings at the same time. Compared with standard diarization systems, RPNSD has a shorter pipeline and can handle overlapped speech. Experimental results on three diarization datasets show that RPNSD achieves remarkable improvements over the state-of-the-art x-vector baseline.
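To make the "who spoke when" output and the notion of overlapped speech concrete, here is a minimal Python sketch. It is not the RPNSD model and is not taken from the paper. It assumes a diarization result is a list of hypothetical (start, end, speaker) tuples and that each speaker's own segments do not overlap one another. The function name overlapped_regions is also an illustrative assumption.

```python
from typing import List, Tuple

# Hypothetical representation of diarization output:
# (start_sec, end_sec, speaker_label)
Segment = Tuple[float, float, str]


def overlapped_regions(segments: List[Segment]) -> List[Tuple[float, float]]:
    """Return the time intervals in which two or more speakers are active.

    Assumes segments of the same speaker never overlap each other.
    """
    events = []
    for start, end, _speaker in segments:
        events.append((start, 1))   # a speaker becomes active
        events.append((end, -1))    # a speaker becomes inactive
    # Sort by time; at equal times, ends (-1) come before starts (+1),
    # so segments that merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))

    regions = []
    active = 0
    overlap_start = None
    for time, delta in events:
        previous = active
        active += delta
        if previous < 2 <= active:
            overlap_start = time
        elif previous >= 2 > active:
            if time > overlap_start:
                regions.append((overlap_start, time))
            overlap_start = None
    return regions


segments = [(0.0, 5.0, "A"), (4.0, 8.0, "B"), (8.0, 10.0, "A")]
print(overlapped_regions(segments))
```

A system that assigns every frame to exactly one speaker cannot produce segments like the first two above, which share the interval from 4.0 to 5.0 seconds. Segment-level proposals with their own speaker embeddings, as described in the abstract, allow such overlapping output.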


research
10/22/2020

The HUAWEI Speaker Diarisation System for the VoxCeleb Speaker Diarisation Challenge

This paper describes system setup of our submission to speaker diarisati...
research
04/05/2022

What can predictive speech coders learn from speaker recognizers?

This paper compares the speech coder and speaker recognizer applications...
research
04/10/2023

Modeling Speaker-Listener Interaction for Backchannel Prediction

We present our latest findings on backchannel modeling novelly motivated...
research
01/16/2023

Improving Target Speaker Extraction with Sparse LDA-transformed Speaker Embeddings

As a practical alternative of speech separation, target speaker extracti...
research
07/06/2023

DSARSR: Deep Stacked Auto-encoders Enhanced Robust Speaker Recognition

Speaker recognition is a biometric modality that utilizes the speaker's ...
research
03/29/2022

NeuraGen-A Low-Resource Neural Network based approach for Gender Classification

Human voice is the source of several important information. This is in t...
research
06/16/2021

Detection of Consonant Errors in Disordered Speech Based on Consonant-vowel Segment Embedding

Speech sound disorder (SSD) refers to a type of developmental disorder i...
