pMCT: Patched Multi-Condition Training for Robust Speech Recognition

07/11/2022
by   Pablo Peso Parada, et al.

We propose a novel Patched Multi-Condition Training (pMCT) method for robust Automatic Speech Recognition (ASR). pMCT employs Multi-condition Audio Modification and Patching (MAMP), which mixes patches of the same utterance extracted from clean and distorted speech. Training on patch-modified signals improves the robustness of models in noisy, reverberant scenarios. We evaluate pMCT on the LibriSpeech dataset, where it improves over vanilla Multi-Condition Training (MCT). For analyses of robust ASR, we apply pMCT to the VOiCES dataset, a noisy reverberant dataset created from LibriSpeech utterances. In these analyses, pMCT achieves a 23.1% relative WER reduction compared to MCT.
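The patch-mixing idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `mamp_patch_mix`, the fixed patch length, the per-patch selection probability, and the assumption that the clean and distorted signals are time-aligned and equally long are all hypothetical choices made for illustration only.

```python
import numpy as np


def mamp_patch_mix(clean, distorted, patch_len=4000, p_distorted=0.5, rng=None):
    """Mix patches of a clean and a distorted version of the same utterance.

    Hypothetical sketch: the signal is cut into consecutive fixed-length
    patches, and each patch is taken from the distorted signal with
    probability `p_distorted`, otherwise from the clean signal.
    """
    clean = np.asarray(clean)
    distorted = np.asarray(distorted)
    if clean.shape != distorted.shape:
        raise ValueError("clean and distorted signals must have the same shape")
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")

    rng = np.random.default_rng() if rng is None else rng
    mixed = clean.copy()
    # Walk over consecutive patches; the last patch may be shorter.
    for start in range(0, clean.shape[0], patch_len):
        if rng.random() < p_distorted:
            stop = start + patch_len
            mixed[start:stop] = distorted[start:stop]
    return mixed


# Toy usage: zeros stand in for clean speech, ones for its distorted copy.
clean = np.zeros(16000, dtype=np.float32)
distorted = np.ones(16000, dtype=np.float32)
mixed = mamp_patch_mix(clean, distorted, rng=np.random.default_rng(0))
```

In this sketch every output sample comes from exactly one of the two inputs, so the training target (the transcript) is unchanged while the acoustic condition varies within the utterance.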

research
07/25/2020

Robust Front-End for Multi-Channel ASR using Flow-Based Density Estimation

For multi-channel speech recognition, speech enhancement techniques such...
research
05/21/2023

On the Efficacy and Noise-Robustness of Jointly Learned Speech Emotion and Automatic Speech Recognition

New-age conversational agent systems perform both speech emotion recogni...
research
07/15/2013

Modified SPLICE and its Extension to Non-Stereo Data for Noise Robust Speech Recognition

In this paper, a modification to the training process of the popular SPL...
research
02/15/2022

Multi-style Training for South African Call Centre Audio

Mismatched data is a challenging problem for automatic speech recognitio...
research
02/28/2023

Practice of the conformer enhanced AUDIO-VISUAL HUBERT on Mandarin and English

Considering the bimodal nature of human speech perception, lips, and tee...
research
05/29/2017

DNN-based uncertainty estimation for weighted DNN-HMM ASR

In this paper, the uncertainty is defined as the mean square error betwe...
