Multi-Channel End-to-End Neural Diarization with Distributed Microphones

10/10/2021
by Shota Horiguchi, et al.

Recent progress on end-to-end neural diarization (EEND) has enabled overlap-aware speaker diarization with a single neural network. This paper proposes to enhance EEND by using multi-channel signals from distributed microphones. We replace the Transformer encoders in EEND with two types of encoders that process a multi-channel input: spatio-temporal and co-attention encoders. Both are independent of the number and geometry of microphones and are therefore suitable for distributed microphone settings. We also propose a model adaptation method that uses only single-channel recordings. On simulated and real-recorded datasets, we demonstrate that the proposed method outperforms conventional EEND when a multi-channel input is given, while maintaining comparable performance with a single-channel input. We also show that the proposed method performs well even when the multi-channel input carries no usable spatial information, such as in hybrid meetings in which the utterances of multiple remote participants are played back from the same loudspeaker.
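
The claim that an encoder can be independent of the number and geometry of microphones can be illustrated with a minimal NumPy sketch. This is not the paper's spatio-temporal or co-attention encoder; it only assumes a generic self-attention applied along the channel axis with weights shared by all channels and no microphone-position input. The function names (`cross_channel_attention`, `pool_channels`), the feature size, and the mean pooling are hypothetical choices made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_channel_attention(x, wq, wk, wv):
    """Self-attention across channels, applied independently per frame.

    x: array of shape (C, T, D) = (channels, frames, features).
    wq, wk, wv: (D, D) projections shared by every channel, so the
    parameter count does not depend on C (illustrative assumption).
    """
    xt = np.transpose(x, (1, 0, 2))                 # (T, C, D)
    q, k, v = xt @ wq, xt @ wk, xt @ wv             # each (T, C, D)
    scores = q @ np.transpose(k, (0, 2, 1))         # (T, C, C)
    scores = scores / np.sqrt(q.shape[-1])
    out = softmax(scores, axis=-1) @ v              # (T, C, D)
    return np.transpose(out, (1, 0, 2))             # (C, T, D)

def pool_channels(y):
    # Averaging over channels gives a (T, D) sequence for any C.
    return y.mean(axis=0)

rng = np.random.default_rng(0)
D, T = 8, 5
wq, wk, wv = (rng.standard_normal((D, D)) for _ in range(3))

# The same weights accept 1, 2, or 4 channels.
for C in (1, 2, 4):
    x = rng.standard_normal((C, T, D))
    y = pool_channels(cross_channel_attention(x, wq, wk, wv))
    print(C, y.shape)
```

Because no positional or geometry information is attached to the channel axis, reordering the channels only reorders the attention output, and the pooled result is unchanged. With a single channel the attention reduces to a per-frame linear projection, which is one way to see how such a layer can still accept single-channel input.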

research
10/07/2022

Mutual Learning of Single- and Multi-Channel End-to-End Neural Diarization

Due to the high performance of multi-channel speech processing, we can u...
research
03/15/2023

Beamformer-Guided Target Speaker Extraction

We propose a Beamformer-guided Target Speaker Extraction (BG-TSE) method...
research
04/28/2020

Neural Speech Separation Using Spatially Distributed Microphones

This paper proposes a neural network based speech separation method usin...
research
02/17/2022

Non-Autoregressive ASR with Self-Conditioned Folded Encoders

This paper proposes CTC-based non-autoregressive ASR with self-condition...
research
05/05/2021

End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings

We present an end-to-end deep network model that performs meeting diariz...
research
10/12/2021

VarArray: Array-Geometry-Agnostic Continuous Speech Separation

Continuous speech separation using a microphone array was shown to be pr...
research
11/21/2017

Quantifying Performance of Bipedal Standing with Multi-channel EMG

Spinal cord stimulation has enabled humans with motor complete spinal co...
