Cross-Corpora Language Recognition: A Preliminary Investigation with Indian Languages

05/10/2021
by Spandan Dey, et al.

In this paper, we conduct one of the very first studies of cross-corpora performance evaluation for the spoken language identification (LID) problem. Cross-corpora evaluation has not been explored much in LID research, especially for Indian languages. We selected three Indian spoken language corpora: IIITH-ILSC, LDC South Asian, and IITKGP-MLILSC. For each corpus, LID systems are trained using the state-of-the-art time-delay neural network (TDNN) architecture with MFCC features. We observe that LID performance degrades drastically in cross-corpora evaluation. For example, the system trained on the IIITH-ILSC corpus achieves an average EER of 11.80% when evaluated on the same corpus but performs much worse when evaluated on the LDC South Asian corpus. Our preliminary analysis shows significant differences among these corpora in terms of mismatch in the long-term average spectrum (LTAS) and signal-to-noise ratio (SNR). Subsequently, we apply different feature-level compensation methods to reduce the cross-corpora acoustic mismatch. Our results indicate that these feature normalization schemes can help achieve promising LID performance in cross-corpora experiments.
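The abstract does not name the specific feature-level compensation methods. As a hedged illustration only, the Python sketch below assumes per-utterance cepstral mean and variance normalization (CMVN), a common normalization for MFCCs, and also shows one way to compute the equal error rate (EER) used to report results. The function names (`cmvn`, `equal_error_rate`, `average_eer`) are hypothetical; the EER is a discrete approximation that picks the threshold where the false-rejection and false-acceptance rates are closest; and "average EER" is assumed to mean the mean of one-vs-rest EERs over the target languages.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cmvn(features: List[List[float]], normalize_variance: bool = True,
         eps: float = 1e-10) -> List[List[float]]:
    """Per-utterance cepstral mean (and optionally variance) normalization.

    `features` is a (num_frames x num_coeffs) matrix, e.g. the MFCCs of one
    utterance. Each coefficient is shifted to zero mean over the frames and,
    if `normalize_variance` is set, scaled to unit variance.
    """
    num_frames, num_coeffs = len(features), len(features[0])
    means = [sum(f[d] for f in features) / num_frames for d in range(num_coeffs)]
    scales = [1.0] * num_coeffs
    if normalize_variance:
        for d in range(num_coeffs):
            var = sum((f[d] - means[d]) ** 2 for f in features) / num_frames
            # eps guards against division by zero for a constant coefficient.
            scales[d] = 1.0 / (math.sqrt(var) + eps)
    return [[(f[d] - means[d]) * scales[d] for d in range(num_coeffs)]
            for f in features]


def equal_error_rate(target_scores: Sequence[float],
                     nontarget_scores: Sequence[float]) -> float:
    """Approximate EER (in %) for one target language (one-vs-rest trials).

    Sweeps a threshold over every observed score and returns the mean of the
    false-rejection and false-acceptance rates at the threshold where the two
    rates are closest.
    """
    best_gap, eer = float("inf"), 1.0
    for t in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: target-language trials scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: other-language trials scored at or above it.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return 100.0 * eer


def average_eer(scores: Dict[str, Tuple[Sequence[float], Sequence[float]]]) -> float:
    """Mean of per-language EERs; `scores` maps language -> (target, non-target)."""
    eers = [equal_error_rate(tgt, non) for tgt, non in scores.values()]
    return sum(eers) / len(eers)


if __name__ == "__main__":
    mfcc = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]]
    # Simulate a constant per-coefficient offset, as a different channel might add.
    shifted = [[f[0] + 5.0, f[1] - 3.0] for f in mfcc]
    print(cmvn(mfcc))
    print(cmvn(shifted))
    print(equal_error_rate([0.4, 0.6, 0.8, 0.9], [0.1, 0.3, 0.5, 0.7]))
```

Mean subtraction removes any constant per-coefficient offset. Because a stationary recording channel appears approximately as such an additive offset in the cepstral domain, this is the usual motivation for applying CMVN when corpora differ in their long-term spectral characteristics.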

Related research

07/14/2023
Towards dialect-inclusive recognition in a low-resource language: are balanced corpora the answer?
ASR systems are generally built for the spoken 'standard', and their per...

02/10/2023
Cross-Corpora Spoken Language Identification with Domain Diversification and Generalization
This work addresses the cross-corpora generalization issue for the low-r...

11/24/2020
Cross-Document Event Coreference Resolution Beyond Corpus-Tailored Systems
Cross-document event coreference resolution (CDCR) is an NLP task in whi...

02/03/2017
KU-ISPL Speaker Recognition Systems under Language mismatch condition for NIST 2016 Speaker Recognition Evaluation
Korea University Intelligent Signal Processing Lab. (KU-ISPL) developed ...

02/10/2020
On Cross-Corpus Generalization of Deep Learning Based Speech Enhancement
In recent years, supervised approaches using deep neural networks (DNNs)...

08/02/2020
Cross-Domain Adaptation of Spoken Language Identification for Related Languages: The Curious Case of Slavic Languages
State-of-the-art spoken language identification (LID) systems, which are...

04/05/2019
Cross-Corpora Evaluation and Analysis of Grammatical Error Correction Models --- Is Single-Corpus Evaluation Enough?
This study explores the necessity of performing cross-corpora evaluation...
