STRATA: Word Boundaries Phoneme Recognition From Continuous Urdu Speech using Transfer Learning, Attention, Data Augmentation

04/16/2022
by Saad Naeem, et al.

Phoneme recognition is a largely unsolved problem in NLP, especially for low-resource languages like Urdu. Systems that extract phonemes from speech audio require hand-labeled phonetic transcriptions, so expert linguists must annotate the speech data with its phonetic representation, which is both expensive and tedious. In this paper, we propose STRATA, a framework for supervised phoneme recognition that overcomes the data scarcity issue for low-resource languages using a seq2seq neural architecture integrated with transfer learning, an attention mechanism, and data augmentation. STRATA employs transfer learning to cut the network loss in half. It uses an attention mechanism for word boundary and frame alignment detection, which further reduces the network loss by 4% ... the word boundaries with 92.2% ... techniques to further reduce the loss by 1.5% ... signals both in terms of generalization and accuracy. STRATA achieves a Phoneme Error Rate of 16.5% for the TIMIT dataset (English) and 11.5% ...
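
The Phoneme Error Rate (PER) quoted in the abstract is conventionally the edit distance between the reference and predicted phoneme sequences, divided by the reference length. The abstract does not say how STRATA scores its output, so the following is only a minimal sketch of that conventional definition; the function names and the example phoneme strings are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance counting substitutions, deletions and insertions."""
    # prev[j] holds the distance between the ref prefix seen so far and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis costs i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def phoneme_error_rate(ref: Sequence[str], hyp: Sequence[str]) -> float:
    """PER in percent, assuming the conventional edit-distance definition."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


# Hypothetical phoneme sequences: one substitution and one deletion
# against a five-phoneme reference.
reference = ["s", "ih", "t", "ih", "ng"]
hypothesis = ["s", "ih", "d", "ih"]
print(phoneme_error_rate(reference, hypothesis))
```

Because insertions are counted against the reference length, a PER computed this way can exceed 100% when the hypothesis is much longer than the reference.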

research
11/19/2021

Semi-supervised transfer learning for language expansion of end-to-end speech recognition models to low-resource languages

In this paper, we propose a three-stage training methodology to improve ...
research
07/14/2022

Data Augmentation for Low-Resource Quechua ASR Improvement

Automatic Speech Recognition (ASR) is a key element in new services that...
research
04/08/2020

Transfer learning and subword sampling for asymmetric-resource one-to-many neural translation

There are several approaches for improving neural machine translation fo...
research
10/25/2022

This joke is [MASK]: Recognizing Humor and Offense with Prompting

Humor is a magnetic component in everyday human interactions and communi...
research
05/24/2021

Unsupervised Speech Recognition

Despite rapid progress in the recent past, current speech recognition sy...
research
02/12/2022

Wav2Vec2.0 on the Edge: Performance Evaluation

Wav2Vec2.0 is a state-of-the-art model which learns speech representatio...
research
02/25/2020

Towards Learning a Universal Non-Semantic Representation of Speech

The ultimate goal of transfer learning is to reduce labeled data require...
