Is Attention always needed? A Case Study on Language Identification from Speech

10/05/2021
by   Atanu Mandal, et al.

Language Identification (LID), a recommended first step before Automatic Speech Recognition (ASR), detects the spoken language from audio specimens. In state-of-the-art systems capable of multilingual speech processing, however, users have to explicitly set one or more languages before using them. LID therefore plays a very important role in situations where ASR-based systems cannot parse the uttered language in multilingual contexts, causing speech recognition to fail. We propose an attention-based convolutional recurrent neural network (CRNN with Attention) that works on Mel-frequency Cepstral Coefficient (MFCC) features of audio specimens. Additionally, we reproduce some state-of-the-art approaches, namely Convolutional Neural Network (CNN) and Convolutional Recurrent Neural Network (CRNN), and compare them to our proposed method. We performed extensive evaluation on thirteen different Indian languages, and our model achieves classification accuracy over 98%. The model is robust to noise and provides 91.2% accuracy in noisy conditions. The proposed model is easily extensible to new languages.
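The abstract does not spell out how the attention layer is built, so the following is only a minimal sketch of one common form, attention pooling over frame-level features, written in plain Python. The function names, the dot-product scoring, and the `query` vector (standing in for a learned parameter) are assumptions made for illustration and are not taken from the paper.

```python
import math
from typing import List, Sequence


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(frames: Sequence[Sequence[float]],
                   query: Sequence[float]) -> List[float]:
    """Collapse a sequence of frame vectors into one utterance vector.

    Each frame is scored by its dot product with `query` (a hypothetical
    stand-in for a learned parameter), the scores are normalised with
    softmax, and the frames are averaged using those weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames))
            for d in range(dim)]


# Toy input: three frames of 2-dimensional features
# (stand-ins for the per-frame outputs of a recurrent layer).
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# A zero query scores every frame equally, which reduces to mean pooling.
uniform = attention_pool(frames, [0.0, 0.0])

# A query aligned with the first feature favours frames where it is active.
focused = attention_pool(frames, [10.0, 0.0])
```

With a zero query the result is the plain average of the frames; a non-zero query shifts weight towards the frames it scores highly, which is the property that lets a classifier focus on the most language-discriminative parts of an utterance.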

research
08/04/2021

Dyn-ASR: Compact, Multilingual Speech Recognition via Spoken Language and Accent Identification

Running automatic speech recognition (ASR) on edge devices is non-trivia...
research
05/04/2023

Employing Hybrid Deep Neural Networks on Dari Speech

This paper is an extension of our previous conference paper. In recent y...
research
05/30/2022

Adversarial synthesis based data-augmentation for code-switched spoken language identification

Spoken Language Identification (LID) is an important sub-task of Automat...
research
05/19/2022

Automatic Spoken Language Identification using a Time-Delay Neural Network

Closed-set spoken language identification is the task of recognizing the...
research
05/31/2021

Singing Language Identification using a Deep Phonotactic Approach

Extensive works have tackled Language Identification (LID) in the speech...
research
11/15/2018

Robust universal neural vocoding

This paper introduces a robust universal neural vocoder trained with 74 ...
research
02/25/2016

Adaptive Frequency Cepstral Coefficients for Word Mispronunciation Detection

Systems based on automatic speech recognition (ASR) technology can provi...
