TF-Attention-Net: An End To End Neural Network For Singing Voice Separation

09/12/2019
by Tingle Li, et al.

Deep neural networks for source separation fall into two main types: those that model the spectrogram and those that model the waveform. Most of them use CNNs or LSTMs, but because of the high sampling rate of audio, neither LSTMs with their long-distance dependencies nor CNNs with their sliding windows can easily extract long-term input context. We therefore propose an end-to-end network, Time Frequency Attention Net (TF-Attention-Net), to study the ability of the attention mechanism in the source separation task. We also introduce Slice Attention, which extracts acoustic features at the time and frequency scales under different channels and has lower time complexity than Multi-head Attention. In addition, the attention mechanism can be efficiently parallelized, whereas LSTMs cannot because of their time dependence. The receptive field of the attention mechanism is also larger than that of CNNs, so shallower networks can extract deeper features. Experiments on singing voice separation indicate that our model outperforms the state-of-the-art (SotA) models, the spectrogram-based U-Net and the waveform-based Wave-U-Net, given the same data.
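To make the receptive-field and parallelization points concrete, here is a minimal NumPy sketch of plain scaled dot-product self-attention applied once along the time axis and once along the frequency axis of a spectrogram-shaped array. It is not the paper's Slice Attention, whose details the abstract does not give. The tensor sizes, the function names, and the omission of learned query/key/value projections are all assumptions made for illustration.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x):
    """Generic scaled dot-product self-attention with Q = K = V = x.

    x has shape (..., n_tokens, n_features). Learned projections are
    omitted on purpose (an illustrative simplification).
    """
    d = x.shape[-1]
    scores = x @ np.swapaxes(x, -1, -2) / np.sqrt(d)  # (..., n, n)
    weights = softmax(scores, axis=-1)                # each row sums to 1
    return weights @ x, weights

rng = np.random.default_rng(0)
# Hypothetical sizes: 2 channels, 64 frequency bins, 100 time frames.
spec = rng.standard_normal((2, 64, 100))

# Attention over time: every frame is a token described by 64 frequency bins.
time_out, time_w = self_attention(np.swapaxes(spec, -1, -2))

# Attention over frequency: every bin is a token described by 100 frames.
freq_out, freq_w = self_attention(spec)

print(time_out.shape, time_w.shape)
print(freq_out.shape, freq_w.shape)
```

Each row of `time_w` weights all 100 frames at once, so a single layer already sees the whole input, and all rows come from one matrix product rather than a step-by-step recurrence.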

research
06/08/2018

Wave-U-Net: A Multi-Scale Neural Network for End-to-End Audio Source Separation

Models for audio source separation usually operate on the magnitude spec...
research
10/05/2018

End-to-end Networks for Supervised Single-channel Speech Separation

The performance of single channel source separation algorithms has impro...
research
11/05/2018

End-to-End Sound Source Separation Conditioned On Instrument Labels

Can we perform an end-to-end sound source separation (SSS) with a variab...
research
03/07/2021

HTMD-Net: A Hybrid Masking-Denoising Approach to Time-Domain Monaural Singing Voice Separation

The advent of deep learning has led to the prevalence of deep neural net...
research
08/03/2022

Conv-NILM-Net, a causal and multi-appliance model for energy source separation

Non-Intrusive Load Monitoring (NILM) seeks to save energy by estimating ...
research
03/08/2021

Time and Frequency Network for Human Action Detection in Videos

Currently, spatiotemporal features are embraced by most deep learning ap...
research
08/30/2023

Dual-path Transformer Based Neural Beamformer for Target Speech Extraction

Neural beamformers, which integrate both pre-separation and beamforming ...
