Speaker Overlap-aware Neural Diarization for Multi-party Meeting Analysis

11/18/2022
by Zhihao Du, et al.

Recently, hybrid systems that combine clustering with neural diarization models have been successfully applied to multi-party meeting analysis. However, current models always treat overlapped speaker diarization as a multi-label classification problem, in which speaker dependency and overlaps are not well considered. To overcome these disadvantages, we reformulate the overlapped speaker diarization task as a single-label prediction problem via the proposed power set encoding (PSE). Through this formulation, speaker dependency and overlaps can be explicitly modeled. To fully leverage this formulation, we further propose the speaker overlap-aware neural diarization (SOND) model, which consists of a context-independent (CI) scorer to model global speaker discriminability, a context-dependent (CD) scorer to model local discriminability, and a speaker combining network (SCN) to combine and reassign speaker activities. Experimental results show that the proposed formulation can outperform state-of-the-art methods based on target speaker voice activity detection, and that performance can be further improved with SOND, resulting in a 6.30
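
The core idea of power set encoding is that each frame's set of active speakers, normally a multi-hot vector, becomes a single class index over speaker subsets, so one softmax over subsets replaces independent per-speaker sigmoids. The sketch below illustrates that idea under stated assumptions: the subset ordering, the optional cap `max_overlap` on simultaneous speakers, and the helper names `build_power_set`, `encode`, and `decode` are all hypothetical choices for illustration, not the authors' implementation.

```python
from itertools import combinations
from typing import List, Sequence, Tuple


def build_power_set(num_speakers: int, max_overlap: int) -> List[Tuple[int, ...]]:
    """Enumerate speaker subsets with at most max_overlap members.

    Hypothetical ordering: by subset size, then lexicographically.
    Index 0 is the empty set, i.e. silence.
    """
    classes: List[Tuple[int, ...]] = []
    for k in range(max_overlap + 1):
        classes.extend(combinations(range(num_speakers), k))
    return classes


def encode(frames: Sequence[Sequence[int]], classes: List[Tuple[int, ...]]) -> List[int]:
    """Map each multi-hot frame label to a single power-set class index."""
    lookup = {subset: idx for idx, subset in enumerate(classes)}
    labels: List[int] = []
    for frame in frames:
        # Active speakers in increasing order, matching combinations() output.
        active = tuple(i for i, v in enumerate(frame) if v)
        if active not in lookup:
            raise ValueError(f"overlap {active} exceeds the encoded power set")
        labels.append(lookup[active])
    return labels


def decode(labels: Sequence[int], classes: List[Tuple[int, ...]], num_speakers: int) -> List[List[int]]:
    """Map single-label class indices back to multi-hot speaker activities."""
    frames: List[List[int]] = []
    for label in labels:
        row = [0] * num_speakers
        for spk in classes[label]:
            row[spk] = 1
        frames.append(row)
    return frames


if __name__ == "__main__":
    classes = build_power_set(num_speakers=4, max_overlap=2)
    frames = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 1, 1, 0], [0, 0, 0, 1]]
    labels = encode(frames, classes)
    print(len(classes), labels)  # 11 [0, 1, 8, 4]
    assert decode(labels, classes, 4) == frames
```

With N speakers and at most K overlapping, the number of classes is the sum of C(N, k) for k = 0..K (11 for N = 4, K = 2), versus 2^N for the uncapped power set. Capping keeps the output layer small, at the cost of being unable to represent rarer, larger overlaps.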
