A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

03/31/2022
by Zexu Pan, et al.
A speaker extraction algorithm extracts the target speech from a mixture that also contains interfering speech and background noise. The extraction process sometimes over-suppresses the target speech, which not only creates audible artifacts but also harms the performance of downstream automatic speech recognition algorithms. We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to address the over-suppression problem. On top of the waveform-level loss used for superior signal quality, i.e., SI-SDR, we introduce a multi-resolution delta spectrum loss in the frequency domain to ensure the continuity of the extracted speech signal, thus alleviating over-suppression. We evaluate the hybrid continuity loss function using a time-domain audio-visual speaker extraction algorithm on the YouTube LRS2-BBC dataset. Experimental results show that the proposed loss function reduces over-suppression and improves the word error rate of speech recognition on both clean and noisy two-speaker mixtures, without harming the reconstructed speech quality.
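The abstract names the two ingredients of the loss but not their exact form, so the following is only a minimal NumPy sketch under stated assumptions, not the authors' implementation. It assumes the "delta spectrum" is the first-order frame-to-frame difference of the STFT magnitude, that estimate and reference deltas are compared with an L1 distance, and that the hybrid loss is negative SI-SDR plus a weighted average over several STFT resolutions. The function names, the three resolutions, and the `weight` parameter are hypothetical choices made for illustration.

```python
import numpy as np

def si_sdr(est, ref, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    est = est - est.mean()
    ref = ref - ref.mean()
    # Project the estimate onto the reference to obtain the scaled target.
    alpha = np.dot(est, ref) / (np.dot(ref, ref) + eps)
    target = alpha * ref
    noise = est - target
    return 10.0 * np.log10((np.sum(target ** 2) + eps) / (np.sum(noise ** 2) + eps))

def stft_mag(x, n_fft, hop):
    """Magnitude spectrogram of shape (frames, n_fft // 2 + 1)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))

def delta_spectrum_loss(est, ref, n_fft, hop):
    """Assumed form: L1 distance between frame-to-frame magnitude differences."""
    d_est = np.diff(stft_mag(est, n_fft, hop), axis=0)
    d_ref = np.diff(stft_mag(ref, n_fft, hop), axis=0)
    return float(np.mean(np.abs(d_est - d_ref)))

def hybrid_continuity_loss(est, ref,
                           resolutions=((256, 64), (512, 128), (1024, 256)),
                           weight=1.0):
    """Assumed combination: negative SI-SDR plus weighted multi-resolution delta loss."""
    multi_res = np.mean([delta_spectrum_loss(est, ref, n, h) for n, h in resolutions])
    return float(-si_sdr(est, ref) + weight * multi_res)

# Toy check: zeroing a segment of the reference mimics over-suppression.
rng = np.random.default_rng(0)
ref = rng.standard_normal(8000)
suppressed = ref.copy()
suppressed[3000:4000] = 0.0
print(hybrid_continuity_loss(ref, ref), hybrid_continuity_loss(suppressed, ref))
```

In this toy setting the estimate with the zeroed segment receives a larger loss than the perfect estimate, both through the SI-SDR term and through the delta term, which reacts to the abrupt spectral changes at the edges of the suppressed region.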

research
06/26/2019

Auxiliary Interference Speaker Loss for Target-Speaker Speech Recognition

In this paper, we propose a novel auxiliary loss function for target-spe...
research
05/09/2022

Speaker Reinforcement Using Target Source Extraction for Robust Automatic Speech Recognition

Improving the accuracy of single-channel automatic speech recognition (A...
research
07/03/2022

Towards Error-Resilient Neural Speech Coding

Neural audio coding has shown very promising results recently in the lit...
research
11/26/2020

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

Target-speaker speech recognition aims to recognize target-speaker speec...
research
02/07/2021

Time-Domain Speech Extraction with Spatial Information and Multi Speaker Conditioning Mechanism

In this paper, we present a novel multi-channel speech extraction system...
research
03/09/2023

X-SepFormer: End-to-end Speaker Extraction Network with Explicit Optimization on Speaker Confusion

Target speech extraction (TSE) systems are designed to extract target sp...
research
10/25/2020

Speakerfilter-Pro: an improved target speaker extractor combines the time domain and frequency domain

This paper introduces an improved target speaker extractor, referred to ...
