Heterogeneous Target Speech Separation

04/07/2022
by   Efthymios Tzinis, et al.

We introduce a new paradigm for single-channel target source separation where the sources of interest can be distinguished using non-mutually exclusive concepts (e.g., loudness, gender, language, spatial location). Our proposed heterogeneous separation framework can seamlessly leverage datasets with large distribution shifts and learn cross-domain representations under a variety of concepts used as conditioning. Our experiments show that training separation models with heterogeneous conditions facilitates generalization to new concepts with unseen out-of-domain data while also performing substantially better than single-domain specialist models. Notably, such training leads to more robust learning of new, harder discriminative concepts for source separation and can yield improvements over permutation invariant training with oracle source selection. We analyze the intrinsic behavior of source separation training with heterogeneous metadata and propose ways to alleviate the problems that emerge under challenging separation conditions. We release the collection of preparation recipes for all datasets used, to further promote research on this challenging task.

research
11/11/2022

Optimal Condition Training for Target Source Separation

Recent research has shown remarkable performance in leveraging multiple ...
research
03/28/2022

Improving Source Separation by Explicitly Modeling Dependencies Between Sources

We propose a new method for training a supervised source separation syst...
research
11/30/2020

Convolutive Transfer Function Invariant SDR training criteria for Multi-Channel Reverberant Speech Separation

Time-domain training criteria have proven to be very effective for the s...
research
11/06/2018

Building Corpora for Single-Channel Speech Separation Across Multiple Domains

To date, the bulk of research on single-channel speech separation has be...
research
10/21/2022

Adversarial Permutation Invariant Training for Universal Sound Separation

Universal sound separation consists of separating mixes with arbitrary s...
research
07/15/2022

PodcastMix: A dataset for separating music and speech in podcasts

We introduce PodcastMix, a dataset formalizing the task of separating ba...
research
07/27/2023

Complete and separate: Conditional separation with missing target source attribute completion

Recent approaches in source separation leverage semantic information abo...
