Fast Random Approximation of Multi-channel Room Impulse Response

04/17/2023
by   Yi Luo, et al.

Modern neural-network-based speech processing systems are typically required to be robust against reverberation, so training them requires a large amount of reverberant data. During training, an on-the-fly simulation pipeline is now preferred because it lets the model train on an effectively unlimited number of data samples without pre-generating them and saving them to hard disk. A room impulse response (RIR) simulation method must therefore not only generate realistic artificial RIR filters but also generate them quickly enough to accelerate the training process. Existing RIR simulation tools have proven effective across a wide range of speech processing tasks and neural network architectures, but their use in on-the-fly simulation pipelines remains questionable because of their computational complexity or the quality of the RIR filters they generate. In this paper, we propose FRAM-RIR, a fast random approximation method of the widely used image-source method (ISM), to efficiently generate realistic multi-channel RIR filters. FRAM-RIR bypasses the explicit calculation of sound propagation paths in ISM-based algorithms by randomly sampling the location and number of reflections of each virtual sound source based on several heuristic assumptions, while still maintaining accurate direction-of-arrival (DOA) information for all sound sources. Visualization of oracle beampatterns and directional features shows that FRAM-RIR can generate more realistic RIR filters than existing widely used ISM-based tools. Experimental results on multi-channel noisy speech separation and dereverberation tasks, with a wide range of neural network architectures, show that models trained with FRAM-RIR achieve on-par or better performance on real RIRs compared with models trained with other RIR simulation tools, with a significantly accelerated training procedure. A Python implementation of FRAM-RIR is released.
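To make the idea of random sampling concrete, the following is a minimal toy sketch, not the released FRAM-RIR implementation. It assumes, purely for illustration, that virtual sources are drawn in uniformly random directions around the array center, that their distances are uniform multiples of the direct-path distance, and that the reflection count grows with that ratio. The function name `toy_random_rir` and all of its parameters are hypothetical. Because each virtual source has a single position shared by every microphone, the inter-channel delays stay geometrically consistent, which is the property the abstract refers to as preserving DOA information.

```python
import numpy as np

def toy_random_rir(mics, src, n_images=500, max_ratio=10.0,
                   reflect_coef=0.8, sr=16000, c=343.0, seed=0):
    """Toy multi-channel RIR from randomly sampled virtual sources.

    All sampling distributions below are illustrative assumptions,
    not the heuristics used by FRAM-RIR.
    """
    rng = np.random.default_rng(seed)
    mics = np.asarray(mics, dtype=float)   # (M, 3) microphone positions
    src = np.asarray(src, dtype=float)     # (3,) true source position
    center = mics.mean(axis=0)
    direct = np.linalg.norm(src - center)  # direct-path distance to array center

    # Assumption: virtual-source distance is a uniform multiple of the direct path.
    ratio = rng.uniform(1.0, max_ratio, size=n_images)
    # Assumption: virtual sources lie in uniformly random directions.
    direction = rng.normal(size=(n_images, 3))
    direction /= np.linalg.norm(direction, axis=1, keepdims=True)
    images = center + (direct * ratio)[:, None] * direction

    # Assumption: reflection count grows with distance, with random jitter.
    n_ref = np.clip(np.round(ratio + rng.uniform(-1.0, 1.0, n_images)), 1, None)

    # Prepend the real source (zero reflections) to the virtual sources.
    positions = np.vstack([src, images])          # (N + 1, 3)
    n_ref = np.concatenate(([0.0], n_ref))

    # One shared position per source gives consistent per-microphone delays.
    dist = np.linalg.norm(positions[None] - mics[:, None], axis=-1)  # (M, N + 1)
    delay = np.round(dist / c * sr).astype(int)
    gain = reflect_coef ** n_ref / dist           # wall loss and 1/r attenuation

    rir = np.zeros((len(mics), delay.max() + 1))
    for m in range(len(mics)):
        np.add.at(rir[m], delay[m], gain[m])      # accumulate coincident taps
    return rir
```

In an on-the-fly pipeline, a function of this kind would be called with a fresh seed for every training example and each channel of the result convolved with the dry signal, for instance with `np.convolve(dry, rir[m])`.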


