Real-Time Target Sound Extraction

11/04/2022
by   Bandhav Veluri, et al.
0

We present the first neural network model to achieve real-time and streaming target sound extraction. To accomplish this, we propose Waveformer, an encoder-decoder architecture with a stack of dilated causal convolution layers as the encoder, and a transformer decoder layer as the decoder. This hybrid architecture uses dilated causal convolutions for processing large receptive fields in a computationally efficient manner, while also benefiting from the performance transformer-based architectures provide. Our evaluations show as much as 2.2-3.3 dB improvement in SI-SNRi compared to the prior models for this task while having a 1.2-4x smaller model size and a 1.5-2x lower runtime. Open-source code and datasets: https://github.com/vb000/Waveformer

READ FULL TEXT

page 1

page 2

page 3

page 4

research
09/19/2023

Spatial-Assistant Encoder-Decoder Network for Real Time Semantic Segmentation

Semantic segmentation is an essential technology for self-driving cars t...
research
02/24/2023

A Convolutional Vision Transformer for Semantic Segmentation of Side-Scan Sonar Data

Distinguishing among different marine benthic habitat characteristics is...
research
03/31/2021

Learning Spatio-Temporal Transformer for Visual Tracking

In this paper, we present a new tracking architecture with an encoder-de...
research
03/13/2023

TransNetR: Transformer-based Residual Network for Polyp Segmentation with Multi-Center Out-of-Distribution Testing

Colonoscopy is considered the most effective screening test to detect co...
research
11/26/2021

A Volumetric Transformer for Accurate 3D Tumor Segmentation

This paper presents a Transformer architecture for volumetric medical im...
research
10/07/2022

EmbryosFormer: Deformable Transformer and Collaborative Encoding-Decoding for Embryos Stage Development Classification

The timing of cell divisions in early embryos during the In-Vitro Fertil...
research
05/03/2023

CodeGen2: Lessons for Training LLMs on Programming and Natural Languages

Large language models (LLMs) have demonstrated remarkable abilities in r...

Please sign up or login with your details

Forgot password? Click here to reset