Multi-microphone Automatic Speech Segmentation in Meetings Based on Circular Harmonics Features

06/07/2023
by Theo Mariotte, et al.

Speaker diarization is the task of answering "Who spoke and when?" in an audio stream. Pipeline systems rely on speech segmentation to extract speakers' segments and achieve robust speaker diarization. This paper proposes a common framework to solve three segmentation tasks in the distant speech scenario: Voice Activity Detection (VAD), Overlapped Speech Detection (OSD), and Speaker Change Detection (SCD). In the literature, only a few studies investigate the multi-microphone distant speech scenario. In this work, we propose a new set of spatial features based on direction-of-arrival estimations in the circular harmonic domain (CH-DOA). These spatial features are extracted from multi-microphone audio data and combined with standard acoustic features. Experiments on the AMI meeting corpus show that CH-DOA can improve segmentation while remaining robust when some microphones are deactivated.
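
To make the three segmentation tasks concrete, the sketch below derives frame-level VAD, OSD, and SCD targets from a list of speaker turns. It is an illustration only and not the authors' implementation: the function name `frame_labels`, the `(speaker, start, end)` tuple layout, and the frame step are hypothetical, and the SCD rule used here (a change is marked whenever the set of active speakers differs from the previous frame) is one common convention assumed for this example.

```python
from typing import List, Set, Tuple

# Hypothetical annotation format: (speaker, start_seconds, end_seconds).
Segment = Tuple[str, float, float]


def frame_labels(
    segments: List[Segment], duration: float, frame_step: float = 0.01
) -> Tuple[List[int], List[int], List[int]]:
    """Return per-frame binary labels (vad, osd, scd) for one recording."""
    n_frames = int(round(duration / frame_step))

    # Collect the set of active speakers for every frame.
    active: List[Set[str]] = [set() for _ in range(n_frames)]
    for speaker, start, end in segments:
        first = max(0, int(round(start / frame_step)))
        last = min(n_frames, int(round(end / frame_step)))
        for i in range(first, last):
            active[i].add(speaker)

    # VAD: at least one speaker is active.
    vad = [int(len(s) >= 1) for s in active]
    # OSD: at least two speakers are active simultaneously.
    osd = [int(len(s) >= 2) for s in active]
    # SCD (assumed convention): the active-speaker set changed since the
    # previous frame; the first frame is never marked as a change.
    scd = [int(i > 0 and active[i] != active[i - 1]) for i in range(n_frames)]
    return vad, osd, scd


if __name__ == "__main__":
    turns = [("A", 0.0, 3.0), ("B", 2.0, 5.0)]
    vad, osd, scd = frame_labels(turns, duration=6.0, frame_step=1.0)
    print("VAD:", vad)
    print("OSD:", osd)
    print("SCD:", scd)
```

With two turns that overlap for one frame, the overlap frame is positive for both VAD and OSD, and SCD fires at each frame where the speaker set changes, including the transition into silence. A shared framework can treat all three as frame-wise binary classification over the same input features.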

research
05/08/2012

A Novel Method For Speech Segmentation Based On Speakers' Characteristics

Speech Segmentation is the process change point detection for partitioni...
research
07/24/2023

Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Voice activity and overlapped speech detection (respectively VAD and OSD...
research
09/21/2020

End-to-End Speaker-Dependent Voice Activity Detection

Voice activity detection (VAD) is an essential pre-processing step for t...
research
04/08/2021

End-to-end speaker segmentation for overlap-aware resegmentation

Speaker segmentation consists in partitioning a conversation between one...
research
02/26/2023

Two-Stream Joint-Training for Speaker Independent Acoustic-to-Articulatory Inversion

Acoustic-to-articulatory inversion (AAI) aims to estimate the parameters...
research
04/08/2021

AISHELL-4: An Open Source Dataset for Speech Enhancement, Separation, Recognition and Speaker Diarization in Conference Scenario

In this paper, we present AISHELL-4, a sizable real-recorded Mandarin sp...
research
08/26/2020

DeepVOX: Discovering Features from Raw Audio for Speaker Recognition in Degraded Audio Signals

Automatic speaker recognition algorithms typically use pre-defined filte...
