Code-Switched Urdu ASR for Noisy Telephonic Environment using Data Centric Approach with Hybrid HMM and CNN-TDNN

07/24/2023
by   Muhammad Danyal Khan, et al.
0

Call Centers have huge amount of audio data which can be used for achieving valuable business insights and transcription of phone calls is manually tedious task. An effective Automated Speech Recognition system can accurately transcribe these calls for easy search through call history for specific context and content allowing automatic call monitoring, improving QoS through keyword search and sentiment analysis. ASR for Call Center requires more robustness as telephonic environment are generally noisy. Moreover, there are many low-resourced languages that are on verge of extinction which can be preserved with help of Automatic Speech Recognition Technology. Urdu is the 10^th most widely spoken language in the world, with 231,295,440 worldwide still remains a resource constrained language in ASR. Regional call-center conversations operate in local language, with a mix of English numbers and technical terms generally causing a "code-switching" problem. Hence, this paper describes an implementation framework of a resource efficient Automatic Speech Recognition/ Speech to Text System in a noisy call-center environment using Chain Hybrid HMM and CNN-TDNN for Code-Switched Urdu Language. Using Hybrid HMM-DNN approach allowed us to utilize the advantages of Neural Network with less labelled data. Adding CNN with TDNN has shown to work better in noisy environment due to CNN's additional frequency dimension which captures extra information from noisy speech, thus improving accuracy. We collected data from various open sources and labelled some of the unlabelled data after analysing its general context and content from Urdu language as well as from commonly used words from other languages, primarily English and were able to achieve WER of 5.2 well as in continuous spontaneous speech.

READ FULL TEXT

page 6

page 9

page 14

page 23

research
05/18/2023

Making More of Little Data: Improving Low-Resource Automatic Speech Recognition Using Data Augmentation

The performance of automatic speech recognition (ASR) systems has advanc...
research
10/11/2022

Automatic Speech Recognition of Low-Resource Languages Based on Chukchi

The following paper presents a project focused on the research and creat...
research
08/29/2021

Investigations on Speech Recognition Systems for Low-Resource Dialectal Arabic-English Code-Switching Speech

Code-switching (CS), defined as the mixing of languages in conversations...
research
03/31/2022

Effectiveness of text to speech pseudo labels for forced alignment and cross lingual pretrained models for low resource speech recognition

In the recent years end to end (E2E) automatic speech recognition (ASR) ...
research
03/20/2023

Code-Switching Text Generation and Injection in Mandarin-English ASR

Code-switching speech refers to a means of expression by mixing two or m...
research
05/30/2022

Adversarial synthesis based data-augmentation for code-switched spoken language identification

Spoken Language Identification (LID) is an important sub-task of Automat...
research
07/15/2018

Syllabification by Phone Categorization

Syllables play an important role in speech synthesis, speech recognition...

Please sign up or login with your details

Forgot password? Click here to reset