Comparative Analysis of the wav2vec 2.0 Feature Extractor

08/08/2023
by Peter Vieting, et al.

Automatic speech recognition (ASR) systems typically use handcrafted feature extraction pipelines. To avoid their inherent information loss and to achieve more consistent modeling from speech to transcribed text, neural raw waveform feature extractors (FEs) are an appealing approach. The wav2vec 2.0 model, which has recently gained wide popularity, also uses a convolutional FE that operates directly on the speech waveform. However, this FE has not yet been studied extensively in the literature. In this work, we study its capability to replace the standard feature extraction methods in a connectionist temporal classification (CTC) ASR model and compare it to an alternative neural FE. We show that both are competitive with traditional FEs on the LibriSpeech benchmark and analyze the effect of their individual components. Furthermore, we analyze the learned filters and show that the most important information for the ASR system is obtained by a set of bandpass filters.
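To make concrete what "operates directly on the speech waveform" means for frame rate, the following minimal sketch computes the receptive field, total stride, and output frame count of a stack of unpadded 1-D convolutions. The kernel sizes and strides are an assumption taken from the published wav2vec 2.0 configuration, not from this abstract, and the function names are purely illustrative.

```python
# (kernel_size, stride) per convolutional layer. Assumed from the published
# wav2vec 2.0 configuration; the abstract above does not state these values.
CONV_LAYERS = [(10, 5), (3, 2), (3, 2), (3, 2), (3, 2), (2, 2), (2, 2)]


def receptive_field_and_stride(layers=CONV_LAYERS):
    """Return (receptive field, total stride) in input samples."""
    receptive_field, jump = 1, 1
    for kernel, stride in layers:
        # Each layer widens the receptive field by (kernel - 1) input hops.
        receptive_field += (kernel - 1) * jump
        jump *= stride
    return receptive_field, jump


def num_frames(num_samples, layers=CONV_LAYERS):
    """Number of output frames for an unpadded conv stack."""
    n = num_samples
    for kernel, stride in layers:
        if n < kernel:
            return 0  # input too short to produce any frame
        n = (n - kernel) // stride + 1
    return n


if __name__ == "__main__":
    rf, hop = receptive_field_and_stride()
    sample_rate = 16000  # assumed sampling rate in Hz
    print(f"receptive field: {rf} samples ({1000 * rf / sample_rate:.0f} ms)")
    print(f"total stride:    {hop} samples ({1000 * hop / sample_rate:.0f} ms)")
    print(f"frames for 1 s:  {num_frames(sample_rate)}")
```

Under these assumptions each output frame covers 400 samples with a hop of 320 samples, that is, a 25 ms window and a 20 ms shift at 16 kHz, which is comparable to the window and shift of typical handcrafted feature pipelines.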
