A Parameter-Efficient Learning Approach to Arabic Dialect Identification with Pre-Trained General-Purpose Speech Model

05/18/2023
by Srijith Radhakrishnan, et al.

In this work, we explore parameter-efficient learning (PEL) techniques to repurpose a general-purpose speech model (GSM) for Arabic dialect identification (ADI). Specifically, we investigate different setups for incorporating trainable features into a multi-layer encoder-decoder GSM while keeping the pre-trained weights frozen. Our architecture includes residual adapters and model reprogramming (input prompting). We design a token-level label mapping to condition the GSM for ADI. The task is challenging because vocabulary and pronunciation vary widely among the numerous regional dialects. We achieve new state-of-the-art accuracy on the ADI-17 dataset with vanilla fine-tuning. We further reduce the training budget with the PEL method, which performs within 1.86% accuracy of vanilla fine-tuning while using only 2.5% of (extra) trainable network parameters. Our study demonstrates how to identify Arabic dialects using a small dataset and limited computation, with open-source code and pre-trained models.
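As a rough illustration of the residual-adapter idea mentioned above, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a common bottleneck design (down-projection, ReLU, up-projection, added back to the frozen layer's output) with the up-projection initialised to zero, so the adapter starts out as an identity mapping. The class name `ResidualAdapter`, the helper `trainable_fraction`, and all sizes are hypothetical and chosen only for illustration.

```python
import random


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(v):
    return [max(0.0, a) for a in v]


class ResidualAdapter:
    """Bottleneck adapter computing h + W_up * relu(W_down * h).

    Biases are omitted for brevity; this is an assumed design, not
    the exact module used in the paper.
    """

    def __init__(self, d_model, bottleneck, seed=0):
        rng = random.Random(seed)
        # Small random down-projection: d_model -> bottleneck.
        self.w_down = [[rng.gauss(0.0, 0.02) for _ in range(d_model)]
                       for _ in range(bottleneck)]
        # Zero-initialised up-projection: the adapter is an identity
        # mapping before any training step.
        self.w_up = [[0.0] * bottleneck for _ in range(d_model)]

    def __call__(self, h):
        delta = matvec(self.w_up, relu(matvec(self.w_down, h)))
        # Residual connection around the bottleneck.
        return [a + b for a, b in zip(h, delta)]

    def num_params(self):
        rows_down, cols_down = len(self.w_down), len(self.w_down[0])
        rows_up, cols_up = len(self.w_up), len(self.w_up[0])
        return rows_down * cols_down + rows_up * cols_up


def trainable_fraction(frozen_params, adapter_params):
    """Share of all parameters that are trainable when the backbone is frozen."""
    return adapter_params / (frozen_params + adapter_params)


# Hypothetical sizes, for illustration only.
adapter = ResidualAdapter(d_model=8, bottleneck=2)
hidden = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 1.0]
out = adapter(hidden)
```

Because only the adapter weights would receive gradient updates, the trainable share of the model is governed by the bottleneck width rather than by the size of the frozen backbone, which is the general reason adapter-style methods keep the training budget small.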


