Multichannel CRNN for Speaker Counting: an Analysis of Performance

01/06/2021
by   Pierre-Amaury Grumiaux, et al.
0

Speaker counting is the task of estimating the number of people that are simultaneously speaking in an audio recording. For several audio processing tasks such as speaker diarization, separation, localization and tracking, knowing the number of speakers at each timestep is a prerequisite, or at least it can be a strong advantage, in addition to enabling a low latency processing. In a previous work, we addressed the speaker counting problem with a multichannel convolutional recurrent neural network which produces an estimation at a short-term frame resolution. In this work, we show that, for a given frame, there is an optimal position in the input sequence for best prediction accuracy. We empirically demonstrate the link between that optimal position, the length of the input sequence and the size of the convolutional filters.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
03/17/2020

High-Resolution Speaker Counting In Reverberant Rooms Using CRNN With Ambisonics Features

Speaker counting is the task of estimating the number of people that are...
research
03/31/2022

EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers

In this paper, we present a novel framework that jointly performs speake...
research
06/10/2020

Speaker Diarization: Using Recurrent Neural Networks

Speaker Diarization is the problem of separating speakers in an audio. T...
research
02/21/2019

All-neural online source separation, counting, and diarization for meeting analysis

Automatic meeting analysis comprises the tasks of speaker counting, spea...
research
05/18/2020

A Thousand Words are Worth More Than One Recording: NLP Based Speaker Change Point Detection

Speaker Diarization (SD) consists of splitting or segmenting an input au...
research
04/25/2019

Divide and Conquer: A Deep CASA Approach to Talker-independent Monaural Speaker Separation

We address talker-independent monaural speaker separation from the persp...
research
12/12/2017

Classification vs. Regression in Supervised Learning for Single Channel Speaker Count Estimation

The task of estimating the maximum number of concurrent speakers from si...

Please sign up or login with your details

Forgot password? Click here to reset