Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

by Christoph Boeddeker, et al.

Time-domain training criteria have proven to be very effective for the separation of single-channel non-reverberant speech mixtures. Likewise, mask-based beamforming has shown impressive performance in multi-channel reverberant speech enhancement and source separation. Here, we propose to combine neural-network-supported multi-channel source separation with a time-domain training objective function. For the objective, we propose to use a convolutive transfer function invariant Signal-to-Distortion Ratio (CI-SDR) based loss. While this is a well-known evaluation metric (BSS Eval), it has not been used as a training objective before. To show its effectiveness, we demonstrate the performance on LibriSpeech-based reverberant mixtures. On this task, the proposed system approaches the error rate obtained on single-source non-reverberant input, i.e., LibriSpeech test_clean, with a difference of only 1.2 percentage points. It thereby outperforms a conventional permutation invariant training based system and alternative objectives such as the Scale-Invariant Signal-to-Distortion Ratio by a large margin.
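To make the difference between the two objectives named in the abstract concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes the usual BSS Eval style definition, in which the target component is the least-squares projection of the estimate onto delayed copies of the reference (a short FIR filter), whereas the scale-invariant variant allows only a single gain. The function names, the `filter_length` value of 16, and the single-source, single-channel setup are illustrative assumptions; a training loss would use the negative of the returned value inside a differentiable framework.

```python
import numpy as np


def si_sdr(reference, estimate):
    """Scale-invariant SDR in dB: the reference may be rescaled by one gain."""
    alpha = np.dot(estimate, reference) / np.dot(reference, reference)
    target = alpha * reference
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))


def ci_sdr(reference, estimate, filter_length=16):
    """SDR in dB that is invariant to a short FIR filter on the reference.

    filter_length is an illustrative assumption, not a value from the paper.
    """
    n = len(reference)
    # Column k holds the reference delayed by k samples (zero-padded at the start).
    delayed = np.zeros((n, filter_length))
    for k in range(filter_length):
        delayed[k:, k] = reference[: n - k]
    # Least-squares FIR coefficients that best map the reference to the estimate.
    coeffs, *_ = np.linalg.lstsq(delayed, estimate, rcond=None)
    target = delayed @ coeffs
    distortion = estimate - target
    return 10 * np.log10(np.sum(target ** 2) / np.sum(distortion ** 2))


# Toy usage: an "estimate" that is the reference passed through a short
# filter (a stand-in for reverberation) plus a small amount of noise.
rng = np.random.default_rng(0)
reference = rng.standard_normal(4000)
impulse_response = np.array([1.0, 0.5, 0.25])  # hypothetical short filter
estimate = np.convolve(reference, impulse_response)[: len(reference)]
estimate = estimate + 1e-3 * rng.standard_normal(len(reference))

print("SI-SDR [dB]:", si_sdr(reference, estimate))
print("CI-SDR [dB]:", ci_sdr(reference, estimate))
```

In this toy case the filtered copies of the reference count as distortion under the scale-invariant measure, while the filter-invariant measure absorbs them into the target, so only the added noise is penalized. Because the zero-delay column is part of the projection basis, the filter-invariant value can never fall below the scale-invariant one.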




Monaural source separation: From anechoic to reverberant environments

Impressive progress in neural network-based single-channel speech source...

Surrogate Source Model Learning for Determined Source Separation

We propose to learn surrogate functions of universal speech priors for d...

SDR - half-baked or well done?

In speech enhancement and source separation, signal-to-noise ratio is a ...

Demystifying TasNet: A Dissecting Approach

In recent years time domain speech separation has excelled over frequenc...

SA-SDR: A novel loss function for separation of meeting style data

Many state-of-the-art neural network-based source separation systems use...

Unsupervised Source Separation via Self-Supervised Training

We introduce two novel unsupervised (blind) source separation methods, w...

Heterogeneous Target Speech Separation

We introduce a new paradigm for single-channel target source separation ...