Exploration of Efficient End-to-End ASR using Discretized Input from Self-Supervised Learning

05/29/2023
by Xuankai Chang, et al.

Self-supervised learning (SSL) of speech has shown impressive results in speech-related tasks, particularly in automatic speech recognition (ASR). While most methods employ the output of intermediate layers of the SSL model as real-valued features for downstream tasks, there is potential in alternative approaches that use discretized token sequences instead. Discretized tokens offer benefits such as lower storage requirements and the ability to apply techniques from natural language processing. In this paper, we propose a new protocol that utilizes discretized token sequences in ASR tasks, applying de-duplication and sub-word modeling to the input sequence; shortening the sequence in this way reduces computational cost. Our experiments on the LibriSpeech dataset demonstrate that the proposed protocol performs competitively with conventional ASR systems using continuous input features, while reducing computational and storage costs.
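To make the sequence-shortening idea concrete, below is a minimal Python sketch, not the authors' implementation, of de-duplication applied to a discretized token sequence. It assumes frame-level token IDs obtained by clustering SSL features (e.g., k-means); the symbol mapping used to prepare the sequence for a text-based sub-word model (e.g., BPE) is purely illustrative.

```python
# Minimal sketch, assuming frame-level token IDs from clustering SSL features.
# Not the authors' implementation; token IDs and symbol mapping are hypothetical.
from itertools import groupby
from typing import List


def deduplicate(tokens: List[int]) -> List[int]:
    """Collapse runs of identical adjacent token IDs into a single token,
    shortening the sequence fed to the ASR model."""
    return [tok for tok, _ in groupby(tokens)]


def tokens_to_symbols(tokens: List[int]) -> str:
    """Map token IDs to printable symbols so a text-based sub-word model
    (e.g., BPE) can be trained on the de-duplicated sequence.
    The Unicode offset is an arbitrary, illustrative choice."""
    return "".join(chr(0x4E00 + tok) for tok in tokens)


if __name__ == "__main__":
    # Hypothetical cluster IDs, one per SSL feature frame.
    frame_tokens = [12, 12, 12, 7, 7, 43, 43, 43, 43, 7, 7, 12]
    dedup = deduplicate(frame_tokens)
    print(f"{len(frame_tokens)} frames -> {len(dedup)} tokens: {dedup}")
    print(tokens_to_symbols(dedup))
```

Sub-word modeling would then merge frequent symbol patterns into larger units, shortening the input further before it reaches the ASR encoder.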

Related research

06/30/2022 · FeaRLESS: Feature Refinement Loss for Ensembling Self-Supervised Learning Features in Robust End-to-end Speech Recognition
Self-supervised learning representations (SSLR) have resulted in robust ...

06/09/2022 · Joint Encoder-Decoder Self-Supervised Pre-training for ASR
Self-supervised learning (SSL) has shown tremendous success in various s...

04/23/2021 · LeBenchmark: A Reproducible Framework for Assessing Self-Supervised Representation Learning from Speech
Self-Supervised Learning (SSL) using huge unlabeled data has been succes...

11/01/2022 · Avoid Overthinking in Self-Supervised Models for Speech Recognition
Self-supervised learning (SSL) models reshaped our approach to speech, l...

09/14/2023 · Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Self-supervised learning (SSL) proficiency in speech-related tasks has d...

03/25/2021 · Residual Energy-Based Models for End-to-End Speech Recognition
End-to-end models with auto-regressive decoders have shown impressive re...

10/08/2021 · SCaLa: Supervised Contrastive Learning for End-to-End Automatic Speech Recognition
End-to-end Automatic Speech Recognition (ASR) models are usually trained...
