An empirical study of Conv-TasNet

02/20/2020
by   Berkan Kadioglu, et al.
7

Conv-TasNet is a recently proposed waveform-based deep neural network that achieves state-of-the-art performance in speech source separation. Its architecture consists of a learnable encoder/decoder and a separator that operates on top of this learned space. Various improvements have been proposed to Conv-TasNet. However, they mostly focus on the separator, leaving its encoder/decoder as a (shallow) linear operator. In this paper, we conduct an empirical study of Conv-TasNet and propose an enhancement to the encoder/decoder that is based on a (deep) non-linear variant of it. In addition, we experiment with the larger and more diverse LibriTTS dataset and investigate the generalization capabilities of the studied models when trained on a much larger dataset. We propose cross-dataset evaluation that includes assessing separations from the WSJ0-2mix, LibriTTS and VCTK databases. Our results show that enhancements to the encoder/decoder can improve average SI-SNR performance by more than 1 dB. Furthermore, we offer insights into the generalization capabilities of Conv-TasNet and the potential value of improvements to the encoder/decoder.

READ FULL TEXT

page 3

page 4

research
08/17/2019

Hard but Robust, Easy but Sensitive: How Encoder and Decoder Perform in Neural Machine Translation

Neural machine translation (NMT) typically adopts the encoder-decoder fr...
research
06/02/2016

Single-Model Encoder-Decoder with Explicit Morphological Representation for Reinflection

Morphological reinflection is the task of generating a target form given...
research
02/21/2022

Deep Residual Inception Encoder-Decoder Network for Amyloid PET Harmonization

Introduction Multiple positron emission tomography (PET) tracers are av...
research
02/15/2022

Predicting on the Edge: Identifying Where a Larger Model Does Better

Much effort has been devoted to making large and more accurate models, b...
research
06/26/2017

Generative Encoder-Decoder Models for Task-Oriented Spoken Dialog Systems with Chatting Capability

Generative encoder-decoder models offer great promise in developing doma...
research
07/27/2023

Exploiting the Potential of Seq2Seq Models as Robust Few-Shot Learners

In-context learning, which offers substantial advantages over fine-tunin...
research
10/12/2018

Improving Generalization of Sequence Encoder-Decoder Networks for Inverse Imaging of Cardiac Transmembrane Potential

Deep learning models have shown state-of-the-art performance in many inv...

Please sign up or login with your details

Forgot password? Click here to reset