Spoken Language Identification System for English-Mandarin Code-Switching Child-Directed Speech

06/01/2023
by Shashi Kant Gupta, et al.

This work improves the Spoken Language Identification (LangId) system for a challenge on developing robust language identification systems that remain reliable for non-standard, accented (Singaporean accent), spontaneous code-switched, and child-directed speech collected via Zoom. We propose a two-stage Encoder-Decoder-based E2E model. The encoder module consists of 1D depth-wise separable convolutions with Squeeze-and-Excitation (SE) layers with a global context. The decoder module uses an attentive temporal pooling mechanism to obtain a fixed-length, time-independent feature representation. The model has around 22.1 M parameters in total, which is relatively light compared to large-scale pre-trained speech models. We achieved an EER of 15.6% in the closed track and 11.1% in the open track. We also curated additional LangId data from YouTube videos (featuring Singaporean speakers), which will be released for public use.
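As an illustration of the attentive temporal pooling idea mentioned in the abstract, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the simplest form of attention: each frame gets a scalar score from a dot product with a learned vector, the scores are normalized with a softmax over time, and the frames are averaged with those weights. The function name `attentive_temporal_pooling` and the parameter `attn_vector` are hypothetical, and the actual model may score frames differently (for example with a small network).

```python
import math
from typing import List


def attentive_temporal_pooling(
    frames: List[List[float]], attn_vector: List[float]
) -> List[float]:
    """Collapse a (T x D) frame sequence into one D-dim vector.

    `attn_vector` stands in for a learned scoring vector (an assumption
    made for this sketch; it is not taken from the paper).
    """
    if not frames:
        raise ValueError("need at least one frame")

    # One scalar relevance score per frame.
    scores = [sum(x * w for x, w in zip(frame, attn_vector)) for frame in frames]

    # Numerically stable softmax over the time axis.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Weighted average over time: the output size depends only on D, not T.
    dim = len(frames[0])
    return [sum(w * frame[d] for w, frame in zip(weights, frames)) for d in range(dim)]


# Utterances of different lengths map to vectors of the same size.
short_utt = [[1.0, 0.0], [0.0, 1.0]]
long_utt = [[1.0, 0.0]] * 5 + [[0.0, 1.0]] * 5
uniform = [0.0, 0.0]  # zero scores give plain mean pooling
pooled_short = attentive_temporal_pooling(short_utt, uniform)
pooled_long = attentive_temporal_pooling(long_utt, uniform)

# A scoring vector that favors the first feature pulls the result toward
# frames where that feature is large.
focused = attentive_temporal_pooling(short_utt, [10.0, 0.0])
```

Because the weights sum to one along the time axis, the pooled vector has the same dimensionality regardless of utterance length, which is what makes the representation fixed-length and time-independent.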


