Log In Sign Up

Exploring Tradeoffs in Models for Low-latency Speech Enhancement

by   Kevin Wilson, et al.

We explore a variety of neural networks configurations for one- and two-channel spectrogram-mask-based speech enhancement. Our best model improves on previous state-of-the-art performance on the CHiME2 speech enhancement task by 0.4 decibels in signal-to-distortion ratio (SDR). We examine trade-offs such as non-causal look-ahead, computation, and parameter count versus enhancement performance and find that zero-look-ahead models can achieve, on average, within 0.03 dB SDR of our best bidirectional model. Further, we find that 200 milliseconds of look-ahead is sufficient to achieve equivalent performance to our best bidirectional model.


page 1

page 2

page 3

page 4


Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

Streaming models are an essential component of real-time speech enhancem...

Full Attention Bidirectional Deep Learning Structure for Single Channel Speech Enhancement

As the cornerstone of other important technologies, such as speech recog...

On TasNet for Low-Latency Single-Speaker Speech Enhancement

In recent years, speech processing algorithms have seen tremendous progr...

Speech Enhancement using Separable Polling Attention and Global Layer Normalization followed with PReLU

Single channel speech enhancement is a challenging task in speech commun...

R3-DLA (Reduce, Reuse, Recycle): A More Efficient Approach to Decoupled Look-Ahead Architectures

Modern societies have developed insatiable demands for more computation ...