
Feature Replacement and Combination for Hybrid ASR Systems

by   Peter Vieting, et al.

Acoustic modeling of the raw waveform and learning feature extractors as part of the neural network classifier have been the goals of many studies in the area of automatic speech recognition (ASR). Recently, one line of research has focused on frameworks that can be pre-trained on audio-only data in an unsupervised fashion and aim at improving downstream ASR tasks. In this work, we investigate the usefulness of one of these front-end frameworks, namely wav2vec, for hybrid ASR systems. In addition to deploying a pre-trained feature extractor, we explore how to make use of an existing acoustic model (AM) trained on the same task with different features. For comparison, we also apply another neural front-end, which is trained only together with the supervised ASR loss, as well as traditional Gammatone features. Moreover, it is shown that the AM can be retrofitted with i-vectors for speaker adaptation. Finally, the described features are combined in order to further advance the performance. With the final best system, we obtain a relative improvement of 4 previous best model on the LibriSpeech test-clean and test-other sets.



A Conformer-based Waveform-domain Neural Acoustic Echo Canceller Optimized for ASR Accuracy

Acoustic Echo Cancellation (AEC) is essential for accurate recognition o...

Articulatory Features for ASR of Pathological Speech

In this work, we investigate the joint use of articulatory and acoustic ...

Investigation of Ensemble features of Self-Supervised Pretrained Models for Automatic Speech Recognition

Self-supervised learning (SSL) based models have been shown to generate ...

LipReading with 3D-2D-CNN BLSTM-HMM and word-CTC models

In recent years, deep learning based machine lipreading has gained promi...

A higher order Minkowski loss for improved prediction ability of acoustic model in ASR

Conventional automatic speech recognition (ASR) system uses second-order...

Introducing ECAPA-TDNN and Wav2Vec2.0 Embeddings to Stuttering Detection

The adoption of advanced deep learning (DL) architecture in stuttering d...