SDR - half-baked or well done?

11/06/2018
by   Jonathan Le Roux, et al.
0

In speech enhancement and source separation, signal-to-noise ratio is a ubiquitous objective measure of denoising/separation quality. A decade ago, the BSS_eval toolkit was developed to give researchers worldwide a way to evaluate the quality of their algorithms in a simple, fair, and hopefully insightful way: it attempted to account for channel variations, and to not only evaluate the total distortion in the estimated signal but also split it in terms of various factors such as remaining interference, newly added artifacts, and channel errors. In recent years, hundreds of papers have been relying on this toolkit to evaluate their proposed methods and compare them to previous works, often arguing that differences on the order of 0.1 dB proved the effectiveness of a method over others. We argue here that the signal-to-distortion ratio (SDR) implemented in the BSS_eval toolkit has generally been improperly used and abused, especially in the case of single-channel separation, resulting in misleading results. We propose to use a slightly modified definition, resulting in a simpler, more robust measure, called scale-invariant SDR (SI-SDR). We present various examples of critical failure of the original SDR that SI-SDR overcomes.

READ FULL TEXT
research
11/30/2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Time-domain training criteria have proven to be very effective for the s...
research
11/07/2020

ESPnet-se: end-to-end speech enhancement and separation toolkit designed for asr integration

We present ESPnet-SE, which is designed for the quick development of spe...
research
06/24/2019

Single-Channel Speech Separation with Auxiliary Speaker Embeddings

We present a novel source separation model to decompose asingle-channel ...
research
06/02/2021

Should We Always Separate?: Switching Between Enhanced and Observed Signals for Overlapping Speech Recognition

Although recent advances in deep learning technology improved automatic ...
research
11/20/2019

Demystifying TasNet: A Dissecting Approach

In recent years time domain speech separation has excelled over frequenc...
research
05/27/2021

An Improved Measure of Musical Noise Based on Spectral Kurtosis

Audio processing methods operating on a time-frequency representation of...

Please sign up or login with your details

Forgot password? Click here to reset