WildMix Dataset and Spectro-Temporal Transformer Model for Monoaural Audio Source Separation

11/21/2019
by   Amir Zadeh, et al.

Monoaural audio source separation is a challenging research area in machine learning. In this area, a mixture containing multiple audio sources is given, and a model is expected to disentangle the mixture into isolated atomic sources. In this paper, we first introduce a challenging new dataset for monoaural source separation called WildMix. WildMix is designed with the goal of extending the boundaries of source separation beyond what previous datasets in this area would allow. It contains diverse in-the-wild recordings from 25 different sound classes, combined with each other using arbitrary composition policies. Source separation often requires modeling long-range dependencies in both temporal and spectral domains. To this end, we introduce a novel transformer-based model called Spectro-Temporal Transformer (STT). STT utilizes a specialized encoder, called Spectro-Temporal Encoder (STE). STE highlights temporal and spectral components of sources within a mixture, using a self-attention mechanism. It subsequently disentangles them in a hierarchical manner. In our experiments, STT swiftly outperforms various previous baselines for monoaural source separation on the challenging WildMix dataset.
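
The abstract does not give implementation details, so the following is only a minimal sketch of the two ideas it names: composing a mixture from several sources, and applying self-attention along both the temporal and the spectral axis of a spectrogram. Every name here (`mix_sources`, `self_attention`, `spectro_temporal_attention`), the fixed gains, the identity projections, and the additive fusion are assumptions made for illustration; they are not the WildMix composition policies or the actual STE architecture.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    # Single-head scaled dot-product self-attention over the rows of x.
    # Assumption: identity query/key/value projections (no learned weights).
    d = x.shape[-1]
    scores = x @ x.T / np.sqrt(d)
    return softmax(scores, axis=-1) @ x

def spectro_temporal_attention(spec):
    # spec has shape (freq_bins, time_frames).
    temporal = self_attention(spec.T).T  # each time frame attends to all frames
    spectral = self_attention(spec)      # each frequency bin attends to all bins
    # Assumption: the two views are fused by simple addition.
    return temporal + spectral

def mix_sources(sources, gains):
    # Hypothetical composition policy: a weighted sum of equal-length sources.
    return sum(g * s for g, s in zip(gains, sources))

# Toy data: three random "sources" standing in for real recordings.
rng = np.random.default_rng(0)
sources = [rng.standard_normal(1024) for _ in range(3)]
mixture = mix_sources(sources, gains=[1.0, 0.5, 0.25])

# Crude magnitude spectrogram: 32 non-overlapping frames of 32 samples each.
frames = mixture.reshape(32, 32)
spec = np.abs(np.fft.rfft(frames, axis=1)).T  # (freq_bins, time_frames)

out = spectro_temporal_attention(spec)
```

In this sketch the output has the same shape as the input spectrogram, so a separation head could in principle be stacked on top of it. A real model would use learned projections, multiple heads, and a proper short-time Fourier transform with overlapping windows.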


research
09/24/2021

Visual Scene Graphs for Audio Source Separation

State-of-the-art approaches for visually-guided audio source separation ...
research
05/11/2023

Universal Source Separation with Weakly Labelled Data

Universal source separation (USS) is a fundamental research task for com...
research
10/08/2021

TRUNet: Transformer-Recurrent-U Network for Multi-channel Reverberant Sound Source Separation

In recent years, many deep learning techniques for single-channel sound ...
research
12/21/2019

Deep Audio Prior

Deep convolutional neural networks are known to specialize in distilling...
research
03/28/2023

A source separation approach to temporal graph modelling for computer networks

Detecting malicious activity within an enterprise computer network can b...
research
03/18/2022

RoSS: Utilizing Robotic Rotation for Audio Source Separation

This paper considers the problem of audio source separation where the go...
research
11/15/2022

Hybrid Transformers for Music Source Separation

A natural question arising in Music Source Separation (MSS) is whether l...
