Low-Resource Spoken Language Identification Using Self-Attentive Pooling and Deep 1D Time-Channel Separable Convolutions

05/31/2021
by Roman Bedyakin, et al.

This memo describes the winning NTR/TSU submission to the language identification track of the Low Resource ASR challenge at the Dialog2021 conference. Spoken Language Identification (LID) is an important step in a multilingual Automated Speech Recognition (ASR) system pipeline. Traditionally, the ASR task requires large volumes of labeled data that are unattainable for most of the world's languages, including most of the languages of Russia. In this memo, we show that a convolutional neural network with a Self-Attentive Pooling layer shows promising results on the language identification task in a low-resource setting and sets the state of the art (SOTA) for the Low Resource ASR challenge dataset. Additionally, we compare the structure of the confusion matrices for this dataset and for the significantly more diverse VoxForge dataset. We state and substantiate the hypothesis that whenever a dataset is diverse enough that other classification factors, such as gender and age, are well averaged, the confusion matrix of an LID system reflects a measure of language similarity.
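The abstract names a Self-Attentive Pooling layer but does not give its formulation. As a rough illustration only, the sketch below uses one commonly used form of self-attentive pooling: each frame vector gets a scalar score from a small tanh projection, the scores are normalized with a softmax over time, and the frames are averaged with those weights. The function name, the parameter names (`W`, `b`, `v`), and the plain-list representation are assumptions made for this example, not the authors' implementation.

```python
import math

def self_attentive_pooling(frames, W, b, v):
    """Pool T frame vectors (each of length D) into a single length-D vector.

    frames: list of T lists, each of length D
    W: list of A rows, each of length D (projection; hypothetical parameter)
    b: list of length A (bias; hypothetical parameter)
    v: list of length A (context vector; hypothetical parameter)
    In a real model these parameters would be learned.
    """
    scores = []
    for h in frames:
        # Hidden representation of this frame: tanh(W h + b)
        hidden = [
            math.tanh(sum(w * x for w, x in zip(row, h)) + bias)
            for row, bias in zip(W, b)
        ]
        # Scalar attention score: v . hidden
        scores.append(sum(vi * ui for vi, ui in zip(v, hidden)))

    # Numerically stable softmax over the time axis
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average of the frames
    dim = len(frames[0])
    pooled = [sum(a * h[d] for a, h in zip(weights, frames)) for d in range(dim)]
    return pooled, weights
```

Because the weights are computed per utterance, the pooled vector has a fixed size regardless of the number of frames, which is what lets a convolutional encoder feed a fixed-size classifier for utterances of varying length. With a zero context vector every frame gets the same score, so the layer reduces to plain mean pooling.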

