
Spectrogram Feature Losses for Music Source Separation

by Abhimanyu Sahai, et al.
ETH Zurich and Disney Research

In this paper we study deep learning-based music source separation, and explore an alternative to the standard pixel-level L2 loss on spectrograms for model training. Our main contribution is demonstrating that adding a high-level feature loss term, computed from the spectrograms using a VGG network, improves separation quality compared with a pure pixel-level loss. We show this improvement in the context of MMDenseNet, a state-of-the-art deep learning model for this task, on the extraction of drums and vocals from songs in the musdb18 database, which covers a broad range of western music genres. We believe this finding generalizes to other machine learning-based systems in the audio domain.



