Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

03/12/2018
by Suwon Shon, et al.

Dialect identification (DID) is a special case of general language identification (LID), but a more challenging one because of the linguistic similarity between dialects. In this paper, we propose an end-to-end DID system and a Siamese neural network that extracts language embeddings. We use both acoustic and linguistic features for the DID task on the Arabic dialectal speech dataset Multi-Genre Broadcast 3 (MGB-3). The end-to-end DID system was trained using three kinds of acoustic features: Mel-Frequency Cepstral Coefficients (MFCCs), log Mel-scale Filter Bank energies (FBANK), and spectrogram energies. We also investigated a dataset augmentation approach to achieve robust performance with limited data resources. Our linguistic feature research focused on using the Siamese network to learn similarities and dissimilarities between dialects, so that we can both reduce feature dimensionality and improve DID performance. The best system using a single feature set achieves 73% accuracy on the MGB-3 dialect test set, which consists of five dialects. The experimental results indicate that FBANK features perform slightly better than MFCCs. Dataset augmentation via speed perturbation appears to add significant robustness to the system. Although the Siamese network with language embeddings did not perform as well as the end-to-end DID system, the two approaches complemented each other well when combined in a fused system.
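
The abstract does not specify the Siamese network's architecture or training loss. As a minimal sketch, assuming two weight-sharing branches made of a single tanh layer and the standard margin-based contrastive loss, the idea of learning similarities and dissimilarities between dialects can be illustrated as follows. The names `embed`, `contrastive_loss`, and `cosine_score`, the layer shape, and the margin value are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np


def embed(x, W):
    """Hypothetical embedding branch: one linear layer followed by tanh.
    Both branches of the Siamese network share the same weights W."""
    return np.tanh(W @ x)


def contrastive_loss(e1, e2, same_dialect, margin=1.0):
    """Margin-based contrastive loss on a pair of embeddings (assumed loss).
    Same-dialect pairs are pulled together (loss = squared distance);
    different-dialect pairs are pushed at least `margin` apart."""
    d = np.linalg.norm(e1 - e2)
    if same_dialect:
        return float(d ** 2)
    return float(max(0.0, margin - d) ** 2)


def cosine_score(e_test, e_model):
    """Cosine similarity between a test embedding and a dialect model embedding."""
    return float(e_test @ e_model / (np.linalg.norm(e_test) * np.linalg.norm(e_model)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = 0.1 * rng.standard_normal((8, 40))   # hypothetical 40-dim input -> 8-dim embedding
    x_a, x_b = rng.standard_normal(40), rng.standard_normal(40)
    e_a, e_b = embed(x_a, W), embed(x_b, W)  # shared weights for both branches
    print("loss if same dialect:     ", contrastive_loss(e_a, e_b, True))
    print("loss if different dialect:", contrastive_loss(e_a, e_b, False))
    print("cosine score:             ", cosine_score(e_a, e_b))
```

Same-dialect pairs incur a loss proportional to their squared distance, while different-dialect pairs are penalized only when they fall inside the margin. Under these assumptions, a trained embedding could then be compared against per-dialect model embeddings with cosine similarity, and the resulting scores fused with those of the end-to-end system.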
