Improving Speaker Diarization using Semantic Information: Joint Pairwise Constraints Propagation

09/19/2023
by Luyao Cheng, et al.

Speaker diarization has gained considerable attention within the speech processing research community. Mainstream speaker diarization systems rely primarily on speakers' voice characteristics extracted from acoustic signals and often overlook the potential of semantic information. Because speech signals also convey the content of what is said, we aim to fully exploit these semantic cues using language models. In this work, we propose a novel approach that leverages semantic information in clustering-based speaker diarization systems. First, we introduce spoken language understanding modules to extract speaker-related semantic information and use this information to construct pairwise constraints. Second, we present a novel framework that integrates these constraints into the speaker diarization pipeline, improving the performance of the entire system. Extensive experiments on a public dataset demonstrate that the proposed approach consistently outperforms acoustic-only speaker diarization systems.
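
To make the idea of pairwise constraints concrete, the following is a minimal sketch, not the authors' method. It assumes constraints take the common must-link / cannot-link form and simply overwrites the matching entries of an acoustic affinity matrix before clustering; the paper's framework propagates constraints jointly, which this sketch does not attempt. The function names, the 0.5 threshold, the toy affinity values, and the connected-components clustering are all illustrative assumptions.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Pair = Tuple[int, int]


def apply_pairwise_constraints(
    affinity: Matrix,
    must_link: List[Pair],
    cannot_link: List[Pair],
) -> Matrix:
    """Return a copy of `affinity` with constrained entries overwritten.

    Assumption: must-link pairs are forced to similarity 1.0 and
    cannot-link pairs to 0.0. This is plain constraint imposition,
    not constraint propagation.
    """
    adjusted = [row[:] for row in affinity]
    for i, j in must_link:
        adjusted[i][j] = adjusted[j][i] = 1.0
    for i, j in cannot_link:
        adjusted[i][j] = adjusted[j][i] = 0.0
    return adjusted


def threshold_clusters(affinity: Matrix, threshold: float = 0.5) -> List[int]:
    """Toy clustering: connected components of the thresholded affinity graph."""
    n = len(affinity)
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            u = stack.pop()
            for v in range(n):
                if labels[v] == -1 and affinity[u][v] >= threshold:
                    labels[v] = current
                    stack.append(v)
        current += 1
    return labels


# Hypothetical acoustic similarities between four speech segments.
# Segments 1 and 2 are acoustically ambiguous (0.55).
acoustic = [
    [1.0, 0.9, 0.2, 0.2],
    [0.9, 1.0, 0.55, 0.2],
    [0.2, 0.55, 1.0, 0.9],
    [0.2, 0.2, 0.9, 1.0],
]

# Acoustic-only clustering merges all four segments into one speaker.
print(threshold_clusters(acoustic))  # [0, 0, 0, 0]

# Suppose a semantic module (hypothetical) signals a speaker change
# between segments 1 and 2, giving one cannot-link constraint.
constrained = apply_pairwise_constraints(acoustic, must_link=[], cannot_link=[(1, 2)])
print(threshold_clusters(constrained))  # [0, 0, 1, 1]
```

In this toy case a single cannot-link constraint splits a cluster that the acoustic similarities alone would merge, which is the kind of correction semantic cues are meant to supply.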


research
05/22/2023

Exploring Speaker-Related Information in Spoken Language Understanding for Better Speaker Diarization

Speaker diarization (SD) is a classic task in speech processing and is cr...
research
07/25/2022

ConceptBeam: Concept Driven Target Speech Extraction

We propose a novel framework for target speech extraction based on seman...
research
08/03/2022

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

In human speech, the attitude of a speaker cannot be fully expressed onl...
research
09/11/2023

Enhancing Speaker Diarization with Large Language Models: A Contextual Beam Search Approach

Large language models (LLMs) have shown great promise for capturing cont...
research
10/24/2022

Weak-Supervised Dysarthria-invariant Features for Spoken Language Understanding using an FHVAE and Adversarial Training

The scarcity of training data and the large speaker variation in dysarth...
research
11/17/2022

Token-level Speaker Change Detection Using Speaker Difference and Speech Content via Continuous Integrate-and-fire

In multi-talker scenarios such as meetings and conversations, speech pro...
research
10/27/2020

Deep generative factorization for speech signal

Various information factors are blended in speech signals, which forms t...
