On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

01/25/2023
by Philippe Gonzalez, et al.

The performance of neural network-based speech enhancement systems is primarily influenced by the model architecture, whereas training times and computational resource utilization are primarily affected by training parameters such as the batch size. Since noisy and reverberant speech mixtures can have different durations, a batching strategy is required to handle variable size inputs during training, in particular for state-of-the-art end-to-end systems. Such strategies usually strike a compromise between zero-padding and data randomization, and can be combined with a dynamic batch size for a more consistent amount of data in each batch. However, the effect of these practices on resource utilization and, more importantly, network performance is not well documented. This paper is an empirical study of the effect of different batching strategies and batch sizes on the training statistics and speech enhancement performance of a Conv-TasNet, evaluated in both matched and mismatched conditions. We find that using a small batch size during training improves performance in both conditions for all batching strategies. Moreover, using sorted or bucket batching with a dynamic batch size allows for reduced training time and GPU memory usage while achieving similar performance compared to random batching with a fixed batch size.
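To make the bucket batching strategy with a dynamic batch size concrete, here is a minimal sketch (not the paper's implementation). It assumes a list of utterance lengths in samples, sorts them, splits the sorted order into buckets of similar length, and fills each batch until the padded size (batch size times longest utterance in the batch) would exceed a cap. The function name, the default bucket count, and the sample cap are illustrative assumptions.

```python
import random

def bucket_batches(lengths, num_buckets=10, max_batch_samples=16000 * 60, seed=0):
    """Group variable-length utterances into buckets of similar length,
    then form batches whose total zero-padded size stays under a cap
    (i.e. a dynamic batch size). Returns lists of utterance indices."""
    rng = random.Random(seed)
    # Sort indices by length so each bucket holds similar-length items,
    # which minimizes zero-padding within a batch.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    bucket_size = -(-len(order) // num_buckets)  # ceiling division
    buckets = [order[i:i + bucket_size]
               for i in range(0, len(order), bucket_size)]
    batches = []
    for bucket in buckets:
        rng.shuffle(bucket)  # keep some randomization within each bucket
        batch = []
        for idx in bucket:
            candidate = batch + [idx]
            # Padded size = batch size * length of the longest member.
            padded = len(candidate) * max(lengths[i] for i in candidate)
            if batch and padded > max_batch_samples:
                batches.append(batch)
                batch = [idx]
            else:
                batch = candidate
        if batch:
            batches.append(batch)
    rng.shuffle(batches)  # randomize batch order across an epoch
    return batches
```

Because the cap is on the padded total rather than the item count, batches of short utterances contain many items while batches of long utterances contain few, keeping per-batch memory roughly constant.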

