LSTM-TDNN with convolutional front-end for Dialect Identification in the 2019 Multi-Genre Broadcast Challenge

12/19/2019
by Xiaoxiao Miao, et al.

This paper presents a novel Dialect Identification (DID) system developed for the Fifth Edition of the Multi-Genre Broadcast challenge, the task of Fine-grained Arabic Dialect Identification (MGB-5 ADI Challenge). The system improves upon traditional DNN x-vector performance by employing a Convolutional and Long Short Term Memory-Recurrent (CLSTM) architecture, which combines the benefits of a convolutional neural network front-end for feature extraction with a recurrent neural network back-end that captures longer temporal dependencies. Furthermore, we investigate intensive augmentation of one low-resource dialect in the highly unbalanced training set using time-scale modification (TSM). This converts an utterance into several time-stretched or time-compressed versions, which are subsequently used to train the CLSTM system without using any other corpus. We also investigate conventional speech augmentation using the MUSAN and RIR datasets to increase the quantity and diversity of the existing training data. Results show, firstly, that the CLSTM architecture outperforms a traditional DNN x-vector implementation; secondly, that adopting TSM-based speed perturbation yields a small performance improvement for the unbalanced data; and finally, that traditional data augmentation techniques yield further benefit, in line with evidence from related speaker and language recognition tasks. Our system achieved 2nd place out of 15 entries in the MGB-5 ADI challenge, presented at ASRU 2019.
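
To make the TSM augmentation step concrete, the following is a minimal sketch of how one utterance could be turned into several time-stretched or time-compressed copies. The abstract does not say which TSM algorithm or stretch factors the authors used, so everything here is an assumption for illustration: the function name `ola_time_stretch`, the frame and hop sizes, and the factors in `STRETCH_FACTORS` are hypothetical, and plain overlap-add (OLA) stands in for whatever method the paper actually applies. Plain OLA is the simplest TSM variant and can introduce audible phase artifacts, which refinements such as WSOLA are designed to reduce.

```python
import numpy as np

def ola_time_stretch(x, alpha, frame_len=512, synth_hop=128):
    """Plain overlap-add TSM (illustrative, not the paper's method).

    alpha > 1 lengthens the signal, alpha < 1 shortens it.
    Frames are read every synth_hop / alpha samples and written
    every synth_hop samples, so local content is not resampled.
    """
    x = np.asarray(x, dtype=np.float64)
    ana_hop = synth_hop / alpha              # analysis hop (may be fractional)
    window = np.hanning(frame_len)
    n_frames = max(1, int((len(x) - frame_len) / ana_hop) + 1)
    out_len = (n_frames - 1) * synth_hop + frame_len
    out = np.zeros(out_len)
    norm = np.zeros(out_len)                 # running sum of window weights
    for i in range(n_frames):
        a = int(round(i * ana_hop))          # read position in the input
        frame = x[a:a + frame_len]
        if len(frame) < frame_len:           # zero-pad a short final frame
            frame = np.pad(frame, (0, frame_len - len(frame)))
        s = i * synth_hop                    # write position in the output
        out[s:s + frame_len] += frame * window
        norm[s:s + frame_len] += window
    norm[norm < 1e-8] = 1.0                  # avoid division by zero at the edges
    return out / norm

# Hypothetical stretch factors; the paper's actual values are not given here.
STRETCH_FACTORS = (0.8, 0.9, 1.1, 1.2)

def augment_utterance(x):
    """Return one time-scaled copy of x per stretch factor."""
    return [ola_time_stretch(x, alpha) for alpha in STRETCH_FACTORS]

# Example: one second of a 440 Hz tone at an assumed 16 kHz sample rate.
sr = 16000
tone = np.sin(2 * np.pi * 440 * np.arange(sr) / sr)
copies = augment_utterance(tone)
```

Each copy has roughly `alpha` times the original duration, so a handful of factors multiplies the amount of training audio for the low-resource dialect without drawing on any other corpus.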


research
11/12/2016

Multi-Language Identification Using Convolutional Recurrent Neural Network

Language Identification, being an important aspect of Automatic Speaker ...
research
09/26/2019

DARTS: Dialectal Arabic Transcription System

We present the speech to text transcription system, called DARTS, for lo...
research
09/18/2018

Language Identification with Deep Bottleneck Features

In this paper we proposed an end-to-end short utterances speech language...
research
03/12/2018

Convolutional Neural Networks and Language Embeddings for End-to-End Dialect Recognition

Dialect identification (DID) is a special case of general language ident...
research
08/12/2020

Mask Detection and Breath Monitoring from Speech: on Data Augmentation, Feature Representation and Modeling

This paper introduces our approaches for the Mask and Breathing Sub-Chal...
research
12/26/2021

Novel Dual-Channel Long Short-Term Memory Compressed Capsule Networks for Emotion Recognition

Recent analysis on speech emotion recognition has made considerable adva...
research
08/06/2019

Two-stage Training for Chinese Dialect Recognition

In this paper, we present a two-stage language identification (LID) syst...
