Listen only to me! How well can target speech extraction handle false alarms?

04/11/2022
by Marc Delcroix, et al.

Target speech extraction (TSE) extracts the speech of a target speaker from a mixture, given auxiliary clues that characterize the speaker, such as an enrollment utterance. TSE thus addresses the challenging problem of performing separation and speaker identification simultaneously. Extraction performance has progressed considerably following the recent development of neural networks for speech enhancement and separation. Most studies have focused on processing mixtures in which the target speaker is actively speaking. In practice, however, the target speaker is sometimes silent, a condition referred to as the inactive speaker (IS) case. A typical TSE system tends to output a signal even in IS cases, causing false alarms. This is a severe problem for the practical deployment of TSE systems. This paper aims to better understand how well TSE systems can handle IS cases. We consider two approaches to dealing with IS: (1) training a system to directly output zero signals, or (2) detecting IS with an extra speaker verification module. We perform an extensive experimental comparison of these schemes in terms of extraction performance and IS detection using the LibriMix dataset, and reveal their pros and cons.
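
The two ways of handling IS named in the abstract can be illustrated with a minimal Python sketch. This is not the paper's implementation: the abstract does not specify how either scheme is realized, so everything below is an assumption made for illustration. In particular, the function names, the use of cosine similarity between speaker embeddings as a verification score, the mean-power check on the extracted signal, and both threshold values are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def is_inactive_by_energy(extracted, power_threshold=1e-4):
    """Assumed check for scheme (1): if the system was trained to output
    zeros in IS cases, a very low mean power suggests an inactive target.
    The threshold is a hypothetical value, not taken from the paper."""
    if not extracted:
        return True
    mean_power = sum(x * x for x in extracted) / len(extracted)
    return mean_power < power_threshold


def gate_by_verification(extracted, extracted_emb, enroll_emb, threshold=0.5):
    """Assumed realization of scheme (2): compare a speaker embedding of the
    extracted signal with the enrollment embedding, and replace the output
    with zeros when the score falls below a (hypothetical) threshold.
    Returns (signal, inactive_flag)."""
    score = cosine_similarity(extracted_emb, enroll_emb)
    if score < threshold:
        return [0.0] * len(extracted), True
    return list(extracted), False


# Usage with toy values: mismatched embeddings are treated as an IS case.
signal, inactive = gate_by_verification([0.1, -0.2, 0.3], [1.0, 0.0], [0.0, 1.0])
```

In such a setup the embeddings would come from a separate speaker verification model, and the thresholds would need to be tuned on held-out data to trade off false alarms against missed target speech.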


research
01/23/2020

Improving speaker discrimination of target speech extraction with time-domain SpeakerBeam

Target speech extraction, which extracts a single target source in a mix...
research
02/21/2022

L-SpEx: Localized Target Speaker Extraction

Speaker extraction aims to extract the target speaker's voice from a mul...
research
10/28/2022

Local-global speaker representation for target speaker extraction

Target speaker extraction is to extract the target speaker's voice from ...
research
02/01/2022

New Insights on Target Speaker Extraction

In recent years, researchers have become increasingly interested in spea...
research
06/16/2022

Strategies to Improve Robustness of Target Speech Extraction to Enrollment Variations

Target speech extraction is a technique to extract the target speaker's ...
research
10/18/2021

Similarity-and-Independence-Aware Beamformer with Iterative Casting and Boost Start for Target Source Extraction Using Reference

Target source extraction is significant for improving human speech intel...
research
06/28/2021

Sparsely Overlapped Speech Training in the Time Domain: Joint Learning of Target Speech Separation and Personal VAD Benefits

Target speech separation is the process of filtering a certain speaker's...
