Batch-normalized joint training for DNN-based distant speech recognition

03/24/2017
by Mirco Ravanelli, et al.

Improving distant speech recognition is a crucial step towards flexible human-machine interfaces. Current technology, however, still lacks robustness, especially under adverse acoustic conditions. Despite the significant progress made in recent years in both speech enhancement and speech recognition, one potential limitation of state-of-the-art technology lies in composing modules that are not well matched because they are not trained jointly. To address this concern, a promising approach is to concatenate a speech enhancement and a speech recognition deep neural network and to jointly update their parameters as if they formed a single, larger network. Unfortunately, joint training can be difficult because the output distribution of the speech enhancement system may change substantially during the optimization procedure. The speech recognition module would then have to deal with an input distribution that is non-stationary and unnormalized. To mitigate this issue, we propose a joint training approach based on a fully batch-normalized architecture. Experiments conducted on different datasets, tasks and acoustic conditions show that the proposed framework significantly outperforms other competitive solutions, especially in challenging environments.
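
To illustrate why batch normalization can help with the non-stationary, unnormalized input described in the abstract, here is a minimal sketch in plain Python. It is not the authors' implementation: the function `batch_norm`, the toy feature values, and the scale and shift applied to them are all assumptions made for illustration. It normalizes each feature dimension over a mini-batch and shows that a hypothetical enhancement output whose scale and offset drift during training yields nearly the same normalized input for the recognition network.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize each feature dimension over the mini-batch.

    batch: list of feature vectors (one list of floats per frame).
    gamma, beta: scale and shift (learned in a real network; fixed here).
    """
    n = len(batch)
    dim = len(batch[0])
    # Per-feature mean and (biased) variance computed over the mini-batch.
    means = [sum(row[j] for row in batch) / n for j in range(dim)]
    variances = [
        sum((row[j] - means[j]) ** 2 for row in batch) / n for j in range(dim)
    ]
    return [
        [
            gamma * (row[j] - means[j]) / math.sqrt(variances[j] + eps) + beta
            for j in range(dim)
        ]
        for row in batch
    ]

# Hypothetical enhancement-network outputs at two points in training.
# The "late" outputs are a scaled and shifted version of the "early" ones,
# mimicking a drift in the output distribution during joint optimization.
early = [[0.1, 2.0], [0.3, 4.0], [0.5, 6.0]]
late = [[10.0 * x + 5.0 for x in row] for row in early]

norm_early = batch_norm(early)
norm_late = batch_norm(late)

# Largest difference between the two normalized batches.
max_diff = max(
    abs(a - b)
    for row_a, row_b in zip(norm_early, norm_late)
    for a, b in zip(row_a, row_b)
)
print("max difference after normalization:", max_diff)
```

In this toy case the recognition network would see almost identical inputs before and after the drift, which is the stabilizing effect that motivates normalizing the interface between the two networks. A real system would use learned `gamma` and `beta` parameters and running statistics at test time.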


research
03/23/2017

A network of deep neural networks for distant speech recognition

Despite the remarkable progress recently made in distant speech recognit...
research
10/10/2017

Contaminated speech training methods for robust DNN-HMM distant speech recognition

Despite the significant progress made in the last years, state-of-the-ar...
research
07/15/2022

Direction-Aware Joint Adaptation of Neural Speech Enhancement and Recognition in Real Multiparty Conversational Environments

This paper describes noisy speech recognition for an augmented reality h...
research
11/15/2019

Sample Drop Detection for Distant-speech Recognition with Asynchronous Devices Distributed in Space

In many applications of multi-microphone multi-device processing, the sy...
research
06/13/2018

A Study of Enhancement, Augmentation, and Autoencoder Methods for Domain Adaptation in Distant Speech Recognition

Speech recognizers trained on close-talking speech do not generalize to ...
research
01/25/2023

On Batching Variable Size Inputs for Training End-to-End Speech Enhancement Systems

The performance of neural network-based speech enhancement systems is pr...
research
03/23/2018

Exploring the robustness of features and enhancement on speech recognition systems in highly-reverberant real environments

This paper evaluates the robustness of a DNN-HMM-based speech recognitio...
