End-to-End Integration of Speech Recognition, Dereverberation, Beamforming, and Self-Supervised Learning Representation

10/19/2022
by Yoshiki Masuyama, et al.

Self-supervised learning representation (SSLR) has proven highly effective for automatic speech recognition (ASR), mainly on clean speech. Recent work has shown the benefit of integrating SSLR with single-channel speech enhancement for ASR in noisy environments. This paper advances that integration to multi-channel input. We propose a novel end-to-end architecture that integrates dereverberation, beamforming, SSLR, and ASR within a single neural network. Our system achieves the best performance reported in the literature on the CHiME-4 6-channel track, with a word error rate (WER) of 1.77%. While SSLR demonstrates promising results by itself, the end-to-end integration with the weighted power minimization distortionless response beamformer, which simultaneously performs dereverberation and denoising, improves WER significantly. Its effectiveness is also validated on the REVERB dataset.
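
Since the abstract reports results as word error rate, a minimal sketch of how WER is commonly computed may help: the word-level edit distance (substitutions, deletions, insertions) between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate` and the example sentences are illustrative assumptions, not taken from the paper, and this is not the scoring tool the authors used.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference length.

    Hypothetical helper for illustration; tokenization is plain whitespace
    splitting, with no text normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] is the edit distance between the reference prefix seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word missing from hypothesis
                curr[j - 1] + 1,     # insertion: extra word in hypothesis
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate("the cat sat on the mat", "the cat sat on mat")
    print(f"WER: {100 * wer:.2f}%")
```

Benchmark figures are normally computed over a whole test set by summing the errors across all utterances and dividing by the total number of reference words, rather than by averaging per-utterance values.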

research
07/23/2023

Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation

Neural speech separation has made remarkable progress and its integratio...
research
04/01/2022

End-to-End Integration of Speech Recognition, Speech Enhancement, and Self-Supervised Learning Representation

This work presents our end-to-end (E2E) automatic speech recognition (AS...
research
03/31/2022

Exploiting Single-Channel Speech for Multi-Channel End-to-End Speech Recognition: A Comparative Study

Recently, the end-to-end training approach for multi-channel ASR has sho...
research
04/19/2019

Dry, Focus, and Transcribe: End-to-End Integration of Dereverberation, Beamforming, and ASR

Sequence-to-sequence (S2S) modeling is becoming a popular paradigm for a...
research
04/19/2019

An Investigation of End-to-End Multichannel Speech Recognition for Reverberant and Mismatch Conditions

Sequence-to-sequence (S2S) modeling is becoming a popular paradigm for a...
research
12/20/2018

A unified convolutional beamformer for simultaneous denoising and dereverberation

This paper proposes a method for estimating a convolutional beamformer t...
research
12/07/2022

Progressive Multi-Scale Self-Supervised Learning for Speech Recognition

Self-supervised learning (SSL) models have achieved considerable improve...
