MANNER: Multi-view Attention Network for Noise Erasure

03/04/2022
by Hyun Joon Park, et al.

In speech enhancement, time-domain methods struggle to achieve both high performance and efficiency. Dual-path models have recently been adopted to represent long sequential features, but their representations remain limited and their memory efficiency is poor. In this study, we propose the Multi-view Attention Network for Noise ERasure (MANNER), which consists of a convolutional encoder-decoder with a multi-view attention block and operates on time-domain signals. MANNER efficiently extracts three different representations from noisy speech and estimates high-quality clean speech. We evaluated MANNER on the VoiceBank-DEMAND dataset using five objective speech quality metrics. Experimental results show that MANNER achieves state-of-the-art performance while processing noisy speech efficiently.
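
To make the idea of extracting several representations from one feature map concrete, the toy sketch below combines three views of a (channels x time) feature map. It is not the authors' implementation: the abstract does not name the three representations, so the channel, global, and local views used here, and every function name, are assumptions chosen for illustration. The learned convolutions and projections of a real network are replaced with fixed, parameter-free operations.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    total = sum(exps)
    return [e / total for e in exps]


def channel_view(x):
    """Assumed 'channel' view: gate each channel by a sigmoid of its time average."""
    out = []
    for row in x:
        gate = 1.0 / (1.0 + math.exp(-sum(row) / len(row)))
        out.append([gate * v for v in row])
    return out


def global_view(x):
    """Assumed 'global' view: dot-product self-attention across all time steps."""
    n_ch, n_t = len(x), len(x[0])
    # Re-arrange into one feature vector (length n_ch) per time step.
    cols = [[x[c][t] for c in range(n_ch)] for t in range(n_t)]
    attended = []
    for q in cols:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(n_ch) for k in cols]
        w = softmax(scores)
        attended.append([sum(w[j] * cols[j][c] for j in range(n_t)) for c in range(n_ch)])
    # Back to (channels x time) layout.
    return [[attended[t][c] for t in range(n_t)] for c in range(n_ch)]


def local_view(x, width=3):
    """Assumed 'local' view: moving average over a short window in each channel."""
    half = width // 2
    out = []
    for row in x:
        n_t = len(row)
        smoothed = []
        for t in range(n_t):
            window = row[max(0, t - half):min(n_t, t + half + 1)]
            smoothed.append(sum(window) / len(window))
        out.append(smoothed)
    return out


def multi_view_block(x):
    """Average the three views and add the result to the input (residual connection)."""
    views = [channel_view(x), global_view(x), local_view(x)]
    n_ch, n_t = len(x), len(x[0])
    return [
        [x[c][t] + sum(v[c][t] for v in views) / 3.0 for t in range(n_t)]
        for c in range(n_ch)
    ]


if __name__ == "__main__":
    features = [[0.1, 0.4, -0.2, 0.3], [0.0, -0.5, 0.2, 0.1]]
    enhanced = multi_view_block(features)
    print(len(enhanced), len(enhanced[0]))
```

The block keeps the input shape, so in a hypothetical encoder-decoder it could sit between any encoder and decoder stage; the averaging and residual connection are likewise illustrative choices rather than details taken from the paper.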

research
08/22/2022

Multi-View Attention Transfer for Efficient Speech Enhancement

Recent deep learning models have achieved high performance in speech enh...
research
11/11/2021

Uformer: A Unet based dilated complex real dual-path conformer network for simultaneous speech enhancement and dereverberation

Complex spectrum and magnitude are considered as two major features of s...
research
10/12/2021

MetricGAN-U: Unsupervised speech enhancement/ dereverberation based only on noisy/ reverberated speech

Most of the deep learning-based speech enhancement models are learned in...
research
02/03/2018

Memory Fusion Network for Multi-view Sequential Learning

Multi-view sequential learning is a fundamental problem in machine learn...
research
06/09/2023

Efficient Encoder-Decoder and Dual-Path Conformer for Comprehensive Feature Learning in Speech Enhancement

Current speech enhancement (SE) research has largely neglected channel a...
research
01/13/2021

F3SNet: A Four-Step Strategy for QIM Steganalysis of Compressed Speech Based on Hierarchical Attention Network

Traditional machine learning-based steganalysis methods on compressed sp...
research
06/28/2019

Lipper: Synthesizing Thy Speech using Multi-View Lipreading

Lipreading has a lot of potential applications such as in the domain of ...