Feature Replacement and Combination for Hybrid ASR Systems

04/09/2021
by   Peter Vieting, et al.
0

Acoustic modeling of raw waveform and learning feature extractors as part of the neural network classifier has been the goal of many studies in the area of automatic speech recognition (ASR). Recently, one line of research has focused on frameworks that can be pre-trained on audio-only data in an unsupervised fashion and aim at improving downstream ASR tasks. In this work, we investigate the usefulness of one of these front-end frameworks, namely wav2vec, for hybrid ASR systems. In addition to deploying a pre-trained feature extractor, we explore how to make use of an existing acoustic model (AM) trained on the same task with different features as well. Another neural front-end which is only trained together with the supervised ASR loss as well as traditional Gammatone features are applied for comparison. Moreover, it is shown that the AM can be retrofitted with i-vectors for speaker adaptation. Finally, the described features are combined in order to further advance the performance. With the final best system, we obtain a relative improvement of 4 previous best model on the LibriSpeech test-clean and test-other sets.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/06/2022

A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

Acoustic Echo Cancellation (AEC) is essential for accurate recognition o...
research
08/08/2023

Comparative Analysis of the wav2vec 2.0 Feature Extractor

Automatic speech recognition (ASR) systems typically use handcrafted fea...
research
04/20/2021

On the Impact of Word Error Rate on Acoustic-Linguistic Speech Emotion Recognition: An Update for the Deep Learning Era

Text encodings from automatic speech recognition (ASR) transcripts and a...
research
06/11/2022

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Self-supervised learning (SSL) based models have been shown to generate ...
research
06/25/2019

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

In recent years, deep learning based machine lipreading has gained promi...
research
12/02/2021

A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

Conventional automatic speech recognition (ASR) system uses second-order...
research
06/26/2022

Improving the Training Recipe for a Robust Conformer-based Hybrid Model

Speaker adaptation is important to build robust automatic speech recogni...

Please sign up or login with your details

Forgot password? Click here to reset