Semi-supervised acoustic model training for speech with code-switching

10/23/2018
by Emre Yilmaz, et al.

In the FAME! project, we aim to develop an automatic speech recognition (ASR) system for Frisian-Dutch code-switching (CS) speech extracted from the archives of a local broadcaster, with the ultimate goal of building a spoken document retrieval system. Unlike Dutch, Frisian is a low-resourced language with a very limited amount of manually annotated speech data. In this paper, we describe several automatic annotation approaches that enable the use of a large amount of raw bilingual broadcast data for acoustic model training in a semi-supervised setting. Previously, it has been shown that the best-performing ASR system is obtained by two-stage multilingual deep neural network (DNN) training using 11 hours of manually annotated CS speech (reference) data together with speech data from other high-resourced languages. We compare the quality of the transcriptions provided by this bilingual ASR system with several other approaches that use a language recognition system at the front-end to assign language labels to raw speech segments and monolingual ASR resources for transcription. We further investigate automatic annotation of the speakers appearing in the raw broadcast data by first labeling segments with (pseudo) speaker tags using a speaker diarization system and then linking these tags to the known speakers appearing in the reference data using a speaker recognition system. These speaker labels are essential for speaker-adaptive training in the proposed setting. We train acoustic models on the manually and automatically annotated data and run recognition experiments on the development and test data of the FAME! speech corpus to quantify the quality of the automatic annotations. The ASR and CS detection results demonstrate the potential of automatic language and speaker tagging in semi-supervised bilingual acoustic model training.
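The speaker-annotation step described above, linking diarization clusters to known reference speakers, can be sketched as a nearest-neighbor match over speaker embeddings. This is a minimal illustrative sketch, not the paper's actual system: the function names, the toy embeddings, and the similarity threshold are all assumptions; a real setup would use i-vector/x-vector embeddings and a calibrated speaker recognition backend.

```python
# Hypothetical sketch of linking diarization pseudo speaker tags to known
# speakers from the reference data. Each cluster keeps its pseudo tag unless
# its embedding is close enough (cosine similarity above a threshold) to a
# known speaker's embedding. All values here are illustrative assumptions.
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def link_speakers(cluster_embs, known_embs, threshold=0.7):
    """Map each diarization cluster to the best-matching known speaker,
    or keep the pseudo tag when no match exceeds the threshold."""
    links = {}
    for tag, emb in cluster_embs.items():
        best_id, best_sim = None, threshold
        for spk, ref in known_embs.items():
            sim = cosine(emb, ref)
            if sim > best_sim:
                best_id, best_sim = spk, sim
        links[tag] = best_id if best_id is not None else tag
    return links


# Toy 2-D embeddings standing in for real speaker embeddings.
known = {"spk_A": [1.0, 0.0], "spk_B": [0.0, 1.0]}
clusters = {"pseudo_1": [0.9, 0.1], "pseudo_2": [0.4, 0.6]}
print(link_speakers(clusters, known))  # {'pseudo_1': 'spk_A', 'pseudo_2': 'spk_B'}
```

Clusters that match no known speaker retain their pseudo tag, which still allows speaker-adaptive training to group segments by (unnamed) speaker.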

Related research

- 04/08/2020: Semi-supervised acoustic modelling for five-lingual code-switched ASR using automatically-segmented soap opera speech. "This paper considers the impact of automatic segmentation on the fully-a..."
- 07/28/2018: Acoustic and Textual Data Augmentation for Improved ASR of Code-Switching Speech. "In this paper, we describe several techniques for improving the acoustic..."
- 05/29/2020: Improving Unsupervised Sparsespeech Acoustic Models with Categorical Reparameterization. "The Sparsespeech model is an unsupervised acoustic model that can genera..."
- 01/23/2017: Characterisation of speech diversity using self-organising maps. "We report investigations into speaker classification of larger quantitie..."
- 03/06/2020: Semi-supervised Development of ASR Systems for Multilingual Code-switched Speech in Under-resourced Languages. "This paper reports on the semi-supervised development of acoustic and la..."
- 07/11/2022: Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data. "In this paper, we investigate the semi-supervised joint training of text..."
- 06/19/2019: Large-Scale Speaker Diarization of Radio Broadcast Archives. "This paper describes our initial efforts to build a large-scale speaker ..."
