Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

09/28/2018
by Xiaofei Li, et al.

This paper addresses the problem of online multiple-speaker localization and tracking in reverberant environments. We propose to use the direct-path relative transfer function (DP-RTF), a feature that encodes inter-channel direct-path information and is robust against reverberation, hence well suited for reliable localization. A complex Gaussian mixture model (CGMM) is then used, such that each component weight represents the probability that an active speaker is present at the corresponding candidate source direction. Exponentiated gradient descent is used to update these weights online by minimizing a combination of negative log-likelihood and entropy. The latter imposes sparsity over the number of audio sources, since in practice only a few speakers are simultaneously active. The outputs of this online localization process are then used as observations within a Bayesian filtering process whose computation is made tractable via an instance of variational expectation-maximization. Birth and sleeping processes are used to account for the intermittent nature of speech. The method is thoroughly evaluated using several datasets.
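
The weight update described above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes one scalar complex feature per frequency (standing in for the DP-RTF), isotropic complex Gaussian components CN(predicted, var) with a shared fixed variance, and a loss equal to the mean negative log-likelihood plus `gamma` times the Shannon entropy of the weights. The names `eg_update`, `var`, `eta`, `gamma`, `floor`, and the `predicted` feature table are hypothetical; the paper's online formulation (multichannel features, per-frame accumulation, parameter values) may differ.

```python
import math
from typing import List, Sequence


def complex_gaussian_logpdf(x: complex, mean: complex, var: float) -> float:
    """Log-density of a scalar circular complex Gaussian CN(mean, var)."""
    return -math.log(math.pi * var) - abs(x - mean) ** 2 / var


def log_sum_exp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def eg_update(
    weights: Sequence[float],
    observations: Sequence[complex],
    predicted: Sequence[Sequence[complex]],
    var: float = 0.1,
    eta: float = 0.1,
    gamma: float = 0.05,
    floor: float = 1e-12,
) -> List[float]:
    """One exponentiated-gradient step on the CGMM component weights (sketch).

    weights[d]      : current weight of candidate direction d (sums to 1)
    observations[f] : observed feature at frequency f (hypothetical scalar DP-RTF)
    predicted[d][f] : feature predicted for direction d at frequency f
    Loss minimized  : mean_f[-log sum_d w_d p(x_f | d)] + gamma * H(w),
                      with H(w) = -sum_d w_d log w_d (low entropy => sparse w).
    """
    num_dirs, num_freqs = len(weights), len(observations)
    log_w = [math.log(max(w, floor)) for w in weights]
    # log p(x_f | d) for every (direction, frequency) pair
    log_p = [[complex_gaussian_logpdf(observations[f], predicted[d][f], var)
              for f in range(num_freqs)] for d in range(num_dirs)]

    grad = [0.0] * num_dirs
    for f in range(num_freqs):
        # log of the mixture density sum_d w_d p(x_f | d)
        log_mix = log_sum_exp([log_w[d] + log_p[d][f] for d in range(num_dirs)])
        for d in range(num_dirs):
            # derivative of -log(mixture) w.r.t. w_d is -p(x_f | d) / mixture
            grad[d] -= math.exp(log_p[d][f] - log_mix) / num_freqs
    for d in range(num_dirs):
        # derivative of gamma * H(w) w.r.t. w_d is -gamma * (log w_d + 1)
        grad[d] -= gamma * (log_w[d] + 1.0)

    # Exponentiated gradient: w_d <- w_d * exp(-eta * g_d), then renormalize.
    # Done in the log domain so large gradients cannot overflow math.exp.
    log_new = [log_w[d] - eta * grad[d] for d in range(num_dirs)]
    log_z = log_sum_exp(log_new)
    return [math.exp(v - log_z) for v in log_new]


if __name__ == "__main__":
    # Toy setup: two candidate directions, three frequencies; the
    # observations lie close to the features predicted for direction 0.
    predicted = [[1 + 0j] * 3, [-1 + 0j] * 3]
    observations = [0.9 + 0.1j, 1.1 - 0.05j, 1.0 + 0j]
    w = [0.5, 0.5]
    for _ in range(50):  # one update per (hypothetical) incoming frame
        w = eg_update(w, observations, predicted)
    print(w)
```

Because the update is multiplicative, the weights stay positive and on the probability simplex without an explicit projection step. With `gamma > 0`, the entropy gradient shrinks already-small weights faster than large ones, which is one way to realize the sparsity over active sources that the abstract describes.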
