N-Shot Benchmarking of Whisper on Diverse Arabic Speech Recognition

06/05/2023
by Bashar Talafha, et al.

Whisper, a recently developed multilingual, weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions, even on languages it was evaluated on, such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) finetuning. We also investigate the robustness of Whisper under completely novel conditions, such as dialect-accented standard Arabic and unseen dialects for which we develop evaluation data. Our experiments show that although zero-shot Whisper outperforms fully finetuned XLS-R models on all datasets, its performance deteriorates significantly in the zero-shot setting for five unseen dialects (i.e., Algeria, Jordan, Palestine, UAE, and Yemen).

research
10/22/2020

How Phonotactics Affect Multilingual and Zero-shot ASR Performance

The idea of combining multiple languages' recordings to train a single a...
research
06/29/2023

LyricWhiz: Robust Multilingual Zero-shot Lyrics Transcription by Whispering to ChatGPT

We introduce LyricWhiz, a robust, multilingual, and zero-shot automatic ...
research
11/04/2022

A Weakly-Supervised Streaming Multilingual Speech Model with Truly Zero-Shot Capability

In this paper, we introduce our work of building a Streaming Multilingua...
research
05/18/2023

Prompting the Hidden Talent of Web-Scale Speech Models for Zero-Shot Task Generalization

We investigate the emergent abilities of the recently proposed web-scale...
research
12/06/2022

Robust Speech Recognition via Large-Scale Weak Supervision

We study the capabilities of speech processing systems trained simply to...
research
12/21/2022

JASMINE: Arabic GPT Models for Few-Shot Learning

Task agnostic generative pretraining (GPT) has recently proved promising...
research
06/01/2023

On the Robustness of Arabic Speech Dialect Identification

Arabic dialect identification (ADI) tools are an important part of the l...
