A Real-time Speaker Diarization System Based on Spatial Spectrum

07/20/2021
by Siqi Zheng, et al.

In this paper, we describe a speaker diarization system that localizes and identifies all speakers present in a conversation or meeting. We propose a novel systematic approach to several long-standing challenges in speaker diarization: (1) segmenting and separating overlapping speech from two speakers; (2) estimating the number of speakers when participants may enter or leave the conversation at any time; (3) providing accurate speaker identification on short, text-independent utterances; (4) tracking speakers' movement during the conversation; and (5) detecting speaker changes in real time. First, a differential directional microphone array-based approach is used to capture the target speakers' voices in adverse far-field environments. Second, an online speaker-location joint clustering approach is proposed to keep track of speaker locations. Third, an instant speaker number detector is developed to trigger the mechanism that separates overlapped speech. The results suggest that our system effectively incorporates spatial information and achieves significant gains.
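
The abstract does not spell out how the online speaker-location joint clustering works, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes each speech segment arrives with a speaker embedding and a direction-of-arrival angle in degrees, and it assigns the segment to the nearest existing cluster under a weighted distance, or opens a new cluster when nothing is close enough. The class name, the threshold, the angle weight, and the update rules are all hypothetical choices made for illustration.

```python
import math


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def angle_distance(a_deg, b_deg):
    """Smallest absolute difference between two angles, scaled to [0, 1]."""
    diff = abs(a_deg - b_deg) % 360.0
    return min(diff, 360.0 - diff) / 180.0


class OnlineJointClusterer:
    """Hypothetical online clusterer over (embedding, angle) pairs."""

    def __init__(self, threshold=0.3, angle_weight=0.5):
        # Both values are illustrative assumptions, not taken from the paper.
        self.threshold = threshold
        self.angle_weight = angle_weight
        self.clusters = []  # each entry: {"centroid", "angle", "count"}

    def _distance(self, cluster, embedding, angle_deg):
        w = self.angle_weight
        voice = cosine_distance(cluster["centroid"], embedding)
        place = angle_distance(cluster["angle"], angle_deg)
        return (1.0 - w) * voice + w * place

    def assign(self, embedding, angle_deg):
        """Return the cluster index for one segment, creating a cluster if needed."""
        best, best_d = None, float("inf")
        for i, cluster in enumerate(self.clusters):
            d = self._distance(cluster, embedding, angle_deg)
            if d < best_d:
                best, best_d = i, d
        if best is None or best_d > self.threshold:
            # No cluster is close enough: treat this as a new speaker.
            self.clusters.append(
                {"centroid": list(embedding), "angle": angle_deg % 360.0, "count": 1}
            )
            return len(self.clusters) - 1
        cluster = self.clusters[best]
        cluster["count"] += 1
        n = cluster["count"]
        # Running mean of the embedding centroid.
        cluster["centroid"] = [
            m + (x - m) / n for m, x in zip(cluster["centroid"], embedding)
        ]
        # Move the stored angle halfway toward the new observation, along the
        # shorter arc, so that a slowly moving speaker is followed.
        delta = (angle_deg - cluster["angle"] + 180.0) % 360.0 - 180.0
        cluster["angle"] = (cluster["angle"] + 0.5 * delta) % 360.0
        return best
```

Calling `assign` once per incoming segment yields a running speaker label. Weighting location alongside voice similarity is one simple way to let spatial information help on short utterances, where the embedding alone may be unreliable.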


