Iterative autoregression: a novel trick to improve your low-latency speech enhancement model

11/03/2022
by Pavel Andreev, et al.

Streaming models are an essential component of real-time speech enhancement tools. The streaming regime constrains speech enhancement models to use only a tiny context of future information. As a result, the low-latency streaming setup is generally considered challenging and has a significant negative effect on model quality. However, the sequential nature of streaming generation offers a natural opportunity for autoregression, i.e., using previous predictions when making current ones. In this paper, we present a simple yet effective trick for training autoregressive low-latency speech enhancement models. We demonstrate that the proposed technique leads to stable improvements across different architectures and training scenarios.
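
To make the notion of autoregression in a streaming setup concrete, the following is a minimal Python sketch of chunk-by-chunk inference in which each step also receives the previous predicted chunk. It illustrates only the general idea of "using previous predictions when making current ones"; it is not the training trick proposed in the paper, which the abstract does not detail. The names `toy_enhance_chunk` and `stream_enhance`, the chunking scheme, and the blending rule are all hypothetical stand-ins for a trained model.

```python
from typing import Callable, List, Sequence

Chunk = List[float]


def toy_enhance_chunk(noisy: Sequence[float], prev_out: Sequence[float]) -> Chunk:
    """Hypothetical stand-in for a trained enhancement model.

    Blends each noisy sample with the mean of the previously predicted chunk,
    purely to show that the current output depends on the previous output.
    """
    prev_mean = sum(prev_out) / len(prev_out)
    return [0.5 * x + 0.5 * prev_mean for x in noisy]


def stream_enhance(
    noisy_signal: Sequence[float],
    chunk_size: int,
    model: Callable[[Sequence[float], Sequence[float]], Chunk] = toy_enhance_chunk,
) -> Chunk:
    """Process the signal chunk by chunk, conditioning on the last prediction."""
    # Assumption: before the first chunk there is no prediction, so use zeros.
    prev_out: Chunk = [0.0] * chunk_size
    enhanced: Chunk = []
    for start in range(0, len(noisy_signal), chunk_size):
        noisy_chunk = noisy_signal[start:start + chunk_size]
        out = model(noisy_chunk, prev_out)  # autoregressive conditioning
        enhanced.extend(out)
        prev_out = out  # the next step sees this prediction, not ground truth
    return enhanced


if __name__ == "__main__":
    print(stream_enhance([1.0, 1.0, 1.0, 1.0], chunk_size=2))
```

In this sketch the model only ever sees its own earlier outputs at inference time, which is the setting that an autoregressive training procedure has to prepare the model for.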

research
08/17/2020

Efficient Low-Latency Speech Enhancement with Mobile Audio Streaming Networks

We propose Mobile Audio Streaming Networks (MASnet) for efficient low-la...
research
02/26/2023

DFSNet: A Steerable Neural Beamformer Invariant to Microphone Array Configuration for Real-Time, Low-Latency Speech Enhancement

Invariance to microphone array configuration is a rare attribute in neur...
research
11/16/2018

Exploring Tradeoffs in Models for Low-latency Speech Enhancement

We explore a variety of neural networks configurations for one- and two-...
research
11/08/2022

Cross-Attention is all you need: Real-Time Streaming Transformers for Personalised Speech Enhancement

Personalised speech enhancement (PSE), which extracts only the speech of...
research
04/05/2021

Real-time Streaming Wave-U-Net with Temporal Convolutions for Multichannel Speech Enhancement

In this paper, we describe the work that we have done to participate in ...
research
09/20/2023

Speak While You Think: Streaming Speech Synthesis During Text Generation

Large Language Models (LLMs) demonstrate impressive capabilities, yet in...
research
06/29/2023

Modified Parametric Multichannel Wiener Filter for Low-latency Enhancement of Speech Mixtures with Unknown Number of Speakers

This paper introduces a novel low-latency online beamforming (BF) algori...
