Building a Unified Code-Switching ASR System for South African Languages

07/28/2018
by Emre Yilmaz, et al.

We present our first efforts towards building a single multilingual automatic speech recognition (ASR) system that can process code-switching (CS) speech in five languages spoken within the same population. This contrasts with related prior work, which focuses on recognizing CS speech in bilingual scenarios. We recently compiled a small five-language corpus of South African soap opera speech containing examples of CS among five languages in various contexts, such as using English as the matrix language and switching to indigenous languages. The ASR system presented in this work is trained on four corpora containing English-isiZulu, English-isiXhosa, English-Setswana and English-Sesotho CS speech. Interpolating multiple language models trained on these language pairs enables the ASR system to hypothesize mixed word sequences from these five languages. We evaluate various state-of-the-art acoustic models trained on this five-lingual training data and report ASR accuracy and language recognition performance on the development and test sets of the South African multilingual soap opera corpus.
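
The language-model interpolation mentioned in the abstract can be illustrated with a minimal sketch. It assumes plain linear interpolation of add-alpha smoothed unigram models with uniform weights. The abstract does not state the interpolation scheme, the model order, or the weights, so those choices are assumptions, as are the function names `train_unigram` and `interpolate`. The token lists are invented placeholders, not data from the corpus.

```python
from collections import Counter


def train_unigram(tokens, vocab, alpha=1.0):
    """Add-alpha smoothed unigram model defined over a shared vocabulary."""
    counts = Counter(tokens)
    total = len(tokens) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def interpolate(models, weights):
    """Linear interpolation: P(w) = sum_i lambda_i * P_i(w)."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    vocab = models[0].keys()
    return {w: sum(lam * m[w] for lam, m in zip(weights, models)) for w in vocab}


# Placeholder tokens standing in for words of each language pair (hypothetical).
pair_corpora = {
    "eng-zul": ["en:a", "en:b", "zu:a", "zu:b"],
    "eng-xho": ["en:a", "xh:a", "xh:b"],
    "eng-tsn": ["en:b", "tn:a"],
    "eng-sot": ["en:a", "st:a", "st:b"],
}

# A shared vocabulary lets every pair model score every token.
vocab = sorted({tok for toks in pair_corpora.values() for tok in toks})
models = [train_unigram(toks, vocab) for toks in pair_corpora.values()]

# Uniform weights are an assumption; in practice they would be tuned on held-out data.
weights = [1.0 / len(models)] * len(models)
mixed = interpolate(models, weights)
```

Because each pair model is smoothed over the shared vocabulary, the interpolated model assigns nonzero probability to tokens from all five languages, which is what allows mixed word sequences to be hypothesized.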

research
10/31/2020

Multilingual Bottleneck Features for Improving ASR Performance of Code-Switched Speech in Under-Resourced Languages

In this work, we explore the benefits of using multilingual bottleneck f...
research
09/24/2018

Hindi-English Code-Switching Speech Corpus

Code-switching refers to the usage of two languages within a sentence or...
research
10/07/2021

Mandarin-English Code-switching Speech Recognition with Self-supervised Speech Representation Models

Code-switching (CS) is common in daily conversations where more than one...
research
05/01/2020

Style Variation as a Vantage Point for Code-Switching

Code-Switching (CS) is a common phenomenon observed in several bilingual...
research
12/10/2022

Punctuation Restoration for Singaporean Spoken Languages: English, Malay, and Mandarin

This paper presents the work of restoring punctuation for ASR transcript...
research
10/26/2022

Reducing Language confusion for Code-switching Speech Recognition with Token-level Language Diarization

Code-switching (CS) refers to the phenomenon that languages switch withi...
research
07/12/2020

The ASRU 2019 Mandarin-English Code-Switching Speech Recognition Challenge: Open Datasets, Tracks, Methods and Results

Code-switching (CS) is a common phenomenon and recognizing CS speech is ...
