Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks

03/30/2022
by Fan-Lin Wang, et al.

Because speech separation already performs very well on speech in which two speakers completely overlap, research attention has shifted to more realistic scenarios. However, domain mismatch between training and test conditions, caused by factors such as speaker, content, channel, and environment, remains a severe problem for speech separation. Speaker and environment mismatches have been studied in the existing literature, but there are few studies on speech content and channel mismatches, and in those studies the impacts of language and channel are mostly entangled. In this study, we create several datasets for experiments that separate the two factors. The results show that the impact of different languages is small enough to be ignored compared to the impact of different channels. In our experiments, training on data recorded by Android phones leads to the best generalizability. Moreover, we provide a new solution for channel mismatch by evaluating projection, where the channel similarity can be measured and used to effectively select additional training data to improve performance on in-the-wild test data.
