Speaker Diarization Using Stereo Audio Channels: Preliminary Study on Utterance Clustering

09/10/2020
by Yingjun Dong, et al.

Speaker diarization is an actively researched topic in audio signal processing and machine learning, and utterance clustering is a critical part of it. In this study, we aim to improve utterance clustering performance by processing multichannel (stereo) audio signals. We generated processed audio signals by combining the left- and right-channel signals in several ways and then extracted embedded features (also called d-vectors) from them. We applied a Gaussian mixture model (GMM) for supervised utterance clustering: in the training phase, we used a parameter-sharing GMM to train a model for each speaker, and in the testing phase, we selected the speaker with the maximum likelihood as the detected speaker. Experiments with real audio recordings of multi-person discussion sessions showed that the proposed method, which uses multichannel audio signals, achieved significantly better performance than a conventional method that uses mono audio signals.
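The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation, and several parts are assumptions made for illustration only: the particular channel combinations (`mean`, `diff`), the function names, and the reading of "parameter sharing" as a single variance vector pooled across speakers. Each speaker is modeled here by a single diagonal Gaussian, which is a one-component simplification of a GMM, and the d-vector extractor is not shown; any fixed-length feature vectors can stand in for the embeddings.

```python
import numpy as np


def combine_channels(left, right):
    """Combine a stereo pair in a few illustrative ways.

    The combinations below are assumptions; the abstract only says the
    channels were combined "in a few different ways".
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    return {
        "left": left,
        "right": right,
        "mean": 0.5 * (left + right),  # mono downmix
        "diff": left - right,          # emphasizes inter-channel differences
    }


def fit_speaker_models(features, labels, var_floor=1e-3):
    """Fit one diagonal Gaussian per speaker with a shared variance.

    Sharing the variance across speakers is a hypothetical reading of
    "parameter sharing"; the source does not spell out what is shared.
    """
    features = np.asarray(features, dtype=float)
    speakers, index = np.unique(np.asarray(labels), return_inverse=True)
    means = np.stack(
        [features[index == k].mean(axis=0) for k in range(len(speakers))]
    )
    # Pooled within-speaker variance, floored for numerical stability.
    shared_var = ((features - means[index]) ** 2).mean(axis=0) + var_floor
    return speakers, means, shared_var


def predict_speaker(x, speakers, means, shared_var):
    """Return the speaker whose model gives the highest log-likelihood."""
    x = np.asarray(x, dtype=float)
    log_lik = -0.5 * np.sum(
        (x - means) ** 2 / shared_var + np.log(2.0 * np.pi * shared_var),
        axis=1,
    )
    return speakers[int(np.argmax(log_lik))]
```

In use, the embeddings extracted from each combined signal would be passed to `fit_speaker_models` together with their speaker labels during training, and `predict_speaker` would then score each test utterance. A multi-component GMM per speaker would replace the single Gaussian in a fuller version.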

research
03/22/2018

Speaker Clustering With Neural Networks And Audio Processing

Speaker clustering is the task of differentiating speakers in a recordin...
research
10/17/2019

H-VECTORS: Utterance-level Speaker Embedding Using A Hierarchical Attention Model

In this paper, a hierarchical attention network to generate utterance-le...
research
04/01/2022

Multimodal Clustering with Role Induced Constraints for Speaker Diarization

Speaker clustering is an essential step in conventional speaker diarizat...
research
08/18/2018

Robust Speaker Clustering using Mixtures of von Mises-Fisher Distributions for Naturalistic Audio Streams

Speaker Diarization (i.e. determining who spoke and when?) for multi-spe...
research
10/25/2019

Adaptive blind audio source extraction supervised by dominant speaker identification using x-vectors

We propose a novel algorithm for adaptive blind audio source extraction....
research
06/05/2022

Geometrically-Motivated Primary-Ambient Decomposition With Center-Channel Extraction

A geometrically-motivated method for primary-ambient decomposition is pr...
research
12/29/2017

Spectral analysis for nonstationary audio

A new approach for the analysis of nonstationary signals is proposed, wi...