Semi-supervised acoustic and language model training for English-isiZulu code-switched speech recognition

04/05/2020
by   A. Biswas, et al.
0

We present an analysis of semi-supervised acoustic and language model training for English-isiZulu code-switched ASR using soap opera speech. Approximately 11 hours of untranscribed multilingual speech was transcribed automatically using four bilingual code-switching transcription systems operating in English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho. These transcriptions were incorporated into the acoustic and language model training sets. Results showed that the TDNN-F acoustic models benefit from the additional semi-supervised data and that even better performance could be achieved by including additional CNN layers. Using these CNN-TDNN-F acoustic models, a first iteration of semi-supervised training achieved an absolute mixed-language WER reduction of 3.4 after a second iteration. Although the languages in the untranscribed data were unknown, the best results were obtained when all automatically transcribed data was used for training and not just the utterances classified as English-isiZulu. Despite reducing perplexity, the semi-supervised language model was not able to improve the ASR performance.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
06/20/2019

Semi-supervised acoustic model training for five-lingual code-switched ASR

This paper presents recent progress in the acoustic modelling of under-r...
research
07/06/2019

Improved low-resource Somali speech recognition by semi-supervised acoustic and language model training

We present improvements in automatic speech recognition (ASR) for Somali...
research
06/16/2018

Study of Semi-supervised Approaches to Improving English-Mandarin Code-Switching Speech Recognition

In this paper, we present our overall efforts to improve the performance...
research
03/06/2020

Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages

This paper reports on the semi-supervised development of acoustic and la...
research
02/26/2019

Directional Embedding Based Semi-supervised Framework For Bird Vocalization Segmentation

This paper proposes a data-efficient, semi-supervised, two-pass framewor...
research
05/30/2019

Lattice-based lightly-supervised acoustic model training

In the broadcast domain there is an abundance of related text data and p...
research
09/29/2020

Improving Device Directedness Classification of Utterances with Semantic Lexical Features

User interactions with personal assistants like Alexa, Google Home and S...

Please sign up or login with your details

Forgot password? Click here to reset