HuBERT-TR: Reviving Turkish Automatic Speech Recognition with Self-supervised Speech Representation Learning

10/13/2022
by Ali Safaya, et al.

While the Turkish language is listed among low-resource languages, literature on Turkish automatic speech recognition (ASR) is relatively old. In this paper, we present HuBERT-TR, a speech representation model for Turkish, based on HuBERT. HuBERT-TR achieves state-of-the-art results on several Turkish ASR datasets. We investigate pre-training HuBERT for Turkish with large-scale data curated from online resources. We pre-train HuBERT-TR using over 6,500 hours of speech data curated from YouTube, covering a wide range of recording quality and genre. We show that language-specific models are superior to other pre-trained models: in low-resource settings, our Turkish model HuBERT-TR/base performs better than the state-of-the-art multilingual XLS-R-1b model, which is ten times larger. Moreover, we study the effect of scaling on ASR performance by scaling our models up to 1B parameters. Our best model yields a state-of-the-art word error rate of 4.97 dataset. Models are available at https://huggingface.co/asafaya
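
For readers unfamiliar with the metric the abstract reports, the following is a minimal sketch of how word error rate (WER) is conventionally computed: the word-level edit distance between a reference transcript and a hypothesis, divided by the number of reference words. The function name `word_error_rate`, the whitespace tokenization, and the absence of any text normalization are assumptions made for illustration; the abstract does not describe the scoring setup used for the reported results.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly,
    with no casing or punctuation normalization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

With this definition, a hypothesis that drops one word from a four-word reference scores 0.25, and the value can exceed 1.0 when the hypothesis contains many inserted words.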

research
07/04/2023

Boosting Norwegian Automatic Speech Recognition

In this paper, we present several baselines for automatic speech recogni...
research
03/16/2021

Fast Development of ASR in African Languages using Self Supervised Speech Representation Learning

This paper describes the results of an informal collaboration launched d...
research
12/07/2022

Improved Speech Pre-Training with Supervision-Enhanced Acoustic Unit

Speech pre-training has shown great success in learning useful and gener...
research
07/14/2023

Representation Learning With Hidden Unit Clustering For Low Resource Speech Applications

The representation learning of speech, without textual resources, is an ...
research
10/05/2021

Disambiguation-BERT for N-best Rescoring in Low-Resource Conversational ASR

We study the inclusion of past conversational context through BERT langu...
research
09/14/2021

Performance-Efficiency Trade-offs in Unsupervised Pre-training for Speech Recognition

This paper is a study of performance-efficiency trade-offs in pre-traine...
research
02/24/2023

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

Multilingual Automatic Speech Recognition (ASR) models have extended the...
