MIT-QCRI Arabic Dialect Identification System for the 2017 Multi-Genre Broadcast Challenge

by Suwon Shon et al.

To annotate the Arabic speech content found in open-domain media broadcasts successfully, a system must be able to process a diverse set of Arabic dialects. The 2017 Multi-Genre Broadcast challenge (MGB-3) offered two tasks: Arabic speech recognition and Arabic Dialect Identification (ADI). In this paper, we describe our efforts to create an ADI system for the MGB-3 challenge, with the goal of distinguishing among four major Arabic dialects as well as Modern Standard Arabic. Our research focused on dialect variability and on domain mismatches between the training and test domains. To achieve a robust ADI system, we explored two approaches: Siamese neural network models, to learn similarities and dissimilarities among Arabic dialects, and i-vector post-processing, to compensate for domain mismatches. Both acoustic and linguistic features were used for the final MGB-3 submissions, with the best primary system achieving 75% accuracy on the test set.
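The abstract does not spell out which i-vector post-processing steps were used. The following is a minimal sketch of one common recipe, not the authors' system. It assumes NumPy. Statistics (a mean and a symmetric whitening matrix) are estimated from unlabeled in-domain vectors, every vector is centered, whitened and length-normalized, and test vectors are then scored by cosine similarity against per-class mean vectors. The function names, label strings, and toy Gaussian data are all hypothetical.

```python
import numpy as np

def fit_whitener(in_domain):
    """Estimate a mean vector and a symmetric (ZCA) whitening matrix from
    unlabeled in-domain i-vectors, one vector per row (hypothetical helper)."""
    mu = in_domain.mean(axis=0)
    cov = np.cov(in_domain - mu, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.maximum(eigvals, 1e-10)  # guard against near-zero eigenvalues
    W = eigvecs @ np.diag(1.0 / np.sqrt(eigvals)) @ eigvecs.T
    return mu, W

def postprocess(x, mu, W):
    """Center with the in-domain mean, whiten, then length-normalize each row."""
    y = (x - mu) @ W  # W is symmetric, so this applies W to every row
    return y / np.linalg.norm(y, axis=1, keepdims=True)

def cosine_classify(test, train, train_labels):
    """Assign each test row to the class whose mean post-processed training
    vector has the highest cosine similarity with it."""
    train_labels = np.asarray(train_labels)
    classes = sorted(set(train_labels.tolist()))
    centroids = np.stack([train[train_labels == c].mean(axis=0) for c in classes])
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)
    scores = test @ centroids.T  # test rows are already unit-length
    return [classes[i] for i in scores.argmax(axis=1)], scores

# Toy data (hypothetical): 5 classes in a 10-dimensional "i-vector" space.
rng = np.random.default_rng(0)
classes = ["dialect_0", "dialect_1", "dialect_2", "dialect_3", "MSA"]
means = 5.0 * np.eye(10)[:5]

def sample(n_per_class):
    x = np.concatenate([m + rng.normal(size=(n_per_class, 10)) for m in means])
    y = [c for c in classes for _ in range(n_per_class)]
    return x, y

train_x, train_y = sample(100)
dev_x, _ = sample(100)  # unlabeled stand-in for in-domain data
test_x, test_y = sample(50)

mu, W = fit_whitener(dev_x)
pred, _ = cosine_classify(postprocess(test_x, mu, W),
                          postprocess(train_x, mu, W), train_y)
accuracy = np.mean([p == t for p, t in zip(pred, test_y)])
print(f"toy accuracy: {accuracy:.3f}")
```

The whitening statistics come from the in-domain set rather than the training set. The idea is that projecting every vector through transforms fitted on target-domain data reduces the effect of a train/test mismatch. The toy data here contains no such mismatch and only checks that the pipeline runs end to end.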
