UX-NET: Filter-and-Process-based Improved U-Net for Real-time Time-domain Audio Separation

10/28/2022
by Kashyap Patel, et al.

This study presents UX-Net, a time-domain audio separation network (TasNet) based on a modified U-Net architecture. The proposed UX-Net works in real time and handles either single- or multi-microphone input. Inspired by filter-and-process-based human auditory behavior, the proposed system introduces novel mixer and separation modules, which result in cost- and memory-efficient modeling of speech sources. The mixer module combines the encoded input in a latent feature space and produces the desired number of output streams. Then, in the separation module, a modified U-Net (UX) block is applied. The UX block first filters the encoded input at various resolutions, then aggregates the filtered information and applies recurrent processing to estimate masks for the separated sources. The letter 'X' in UX-Net is a placeholder for the type of recurrent layer employed in the UX block. Empirical findings on the WSJ0-2mix benchmark dataset show that one of the UX-Net configurations outperforms the state-of-the-art Conv-TasNet system by 0.85 dB SI-SNR while using only 16 […] low latency.
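
The headline comparison above is stated in SI-SNR (scale-invariant signal-to-noise ratio). As a hedged illustration of what that metric measures, the sketch below implements the commonly used SI-SNR definition in plain Python: the estimate is projected onto the reference to obtain a scaled target, and the score is 10·log10(‖target‖² / ‖estimate − target‖²). The function name `si_snr`, the mean-removal step, and the `eps` stabilizer are assumptions of this sketch, not details taken from the UX-Net paper or its evaluation code.

```python
import math
from typing import Sequence


def si_snr(estimate: Sequence[float], reference: Sequence[float], eps: float = 1e-8) -> float:
    """Scale-invariant SNR in dB (common definition; hypothetical helper, not the paper's code)."""
    if len(estimate) != len(reference) or not reference:
        raise ValueError("signals must be non-empty and of equal length")
    n = len(reference)

    # Remove the mean of each signal so a constant offset does not affect the score.
    est_mean = sum(estimate) / n
    ref_mean = sum(reference) / n
    est = [x - est_mean for x in estimate]
    ref = [x - ref_mean for x in reference]

    # Project the estimate onto the reference: this scaled copy is the "target" part.
    dot = sum(e * r for e, r in zip(est, ref))
    scale = dot / (sum(r * r for r in ref) + eps)
    target = [scale * r for r in ref]

    # Whatever is left over after removing the target counts as error.
    error = [e - t for e, t in zip(est, target)]

    target_energy = sum(t * t for t in target)
    error_energy = sum(v * v for v in error)
    return 10.0 * math.log10((target_energy + eps) / (error_energy + eps))


if __name__ == "__main__":
    ref = [math.sin(0.1 * i) for i in range(1000)]
    interference = [math.cos(0.37 * i) for i in range(1000)]
    mild = [r + 0.1 * v for r, v in zip(ref, interference)]
    print(f"SI-SNR of mildly corrupted estimate: {si_snr(mild, ref):.2f} dB")
```

Because the reference is rescaled by the projection, multiplying the estimate by any nonzero gain leaves the score essentially unchanged. This is why SI-SNR is preferred over plain SNR for mask-based separators whose output level is not constrained.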
