Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

07/06/2019
by   Astik Biswas, et al.
0

We present improvements in automatic speech recognition (ASR) for Somali, a currently extremely under-resourced language. This forms part of a continuing United Nations (UN) effort to employ ASR-based keyword spotting systems to support humanitarian relief programmes in rural Africa. Using just 1.57 hours of annotated speech data as a seed corpus, we increase the pool of training data by applying semi-supervised training to 17.55 hours of untranscribed speech. We make use of factorised time-delay neural networks (TDNN-F) for acoustic modelling, since these have recently been shown to be effective in resource-scarce situations. Three semi-supervised training passes were performed, where the decoded output from each pass was used for acoustic model training in the subsequent pass. The automatic transcriptions from the best performing pass were used for language model augmentation. To ensure the quality of automatic transcriptions, decoder confidence is used as a threshold. The acoustic and language models obtained from the semi-supervised approach show significant improvement in terms of WER and perplexity compared to the baseline. Incorporating the automatically generated transcriptions yields a 6.55% improvement in language model perplexity. The use of 17.55 hour of Somali acoustic data in semi-supervised training shows an improvement of 7.74% relative over the baseline.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
04/05/2020

Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

We present an analysis of semi-supervised acoustic and language model tr...
research
02/26/2019

Directional Embedding Based Semi-supervised Framework For Bird Vocalization Segmentation

This paper proposes a data-efficient, semi-supervised, two-pass framewor...
research
10/02/2018

Semi-supervised and Active-learning Scenarios: Efficient Acoustic Model Refinement for a Low Resource Indian Language

We address the problem of efficient acoustic-model refinement (continuou...
research
03/02/2020

Semi-supervised learning of glottal pulse positions in a neural analysis-synthesis framework

This article investigates into recently emerging approaches that use dee...
research
03/07/2019

Active and Semi-Supervised Learning in ASR: Benefits on the Acoustic and Language Models

The goal of this paper is to simulate the benefits of jointly applying a...
research
02/24/2020

Semi-Supervised Speech Recognition via Local Prior Matching

For sequence transduction tasks like speech recognition, a strong struct...
research
10/03/2022

Efficient acoustic feature transformation in mismatched environments using a Guided-GAN

We propose a new framework to improve automatic speech recognition (ASR)...

Please sign up or login with your details

Forgot password? Click here to reset