Leveraging neural representations for facilitating access to untranscribed speech from endangered languages

03/26/2021
by Nay San, et al.

For languages with insufficient resources to train speech recognition systems, query-by-example spoken term detection (QbE-STD) offers a way of accessing an untranscribed speech corpus by helping identify regions where spoken query terms occur. Yet retrieval performance can be poor when the query and corpus are spoken by different speakers and produced in different recording conditions. Using data selected from a variety of speakers and recording conditions from 7 Australian Aboriginal languages and a regional variety of Dutch, all of which are endangered or vulnerable, we evaluated whether QbE-STD performance on these languages could be improved by leveraging representations extracted from the pre-trained English wav2vec 2.0 model. Compared to the use of Mel-frequency cepstral coefficients and bottleneck features, we find that representations from the middle layers of the wav2vec 2.0 Transformer offer large gains in task performance (between 56 …). … using the pre-trained English model yielded improved detection on all the evaluation languages, better detection performance was associated with the evaluation language's phonological similarity to English.
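QbE-STD systems that operate on frame-level features (MFCCs, bottleneck features, or outputs of a wav2vec 2.0 layer) commonly locate a spoken query inside a longer utterance with subsequence dynamic time warping (S-DTW). The abstract does not state which matching algorithm was used, so the following is only a minimal sketch of S-DTW under these assumptions: cosine distance between frames, free start and end points in the corpus utterance, and a final cost normalized by query length. Feature extraction is out of scope here. Each frame is a plain list of floats standing in for whichever representation is chosen, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def cosine_distance(a: Frame, b: Frame) -> float:
    """1 - cosine similarity; zero-norm frames are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def subsequence_dtw(query: List[Frame], corpus: List[Frame]) -> Tuple[float, int, int]:
    """Find the best-matching region of `query` inside `corpus`.

    Returns (cost normalized by query length, start frame, end frame),
    where start and end are inclusive indices into `corpus`.
    """
    n, m = len(query), len(corpus)
    if n == 0 or m == 0:
        raise ValueError("query and corpus must both contain at least one frame")

    inf = float("inf")
    # acc[i][j]: accumulated cost of aligning query[0..i] with a corpus region ending at j.
    acc = [[inf] * m for _ in range(n)]
    # start[i][j]: corpus frame where that best alignment begins.
    start = [[0] * m for _ in range(n)]

    # Free start: the first query frame may align with any corpus frame.
    for j in range(m):
        acc[0][j] = cosine_distance(query[0], corpus[j])
        start[0][j] = j

    for i in range(1, n):
        for j in range(m):
            d = cosine_distance(query[i], corpus[j])
            # Vertical step (i-1, j) is always available.
            best, best_start = acc[i - 1][j], start[i - 1][j]
            if j > 0:
                # Diagonal step (i-1, j-1).
                if acc[i - 1][j - 1] < best:
                    best, best_start = acc[i - 1][j - 1], start[i - 1][j - 1]
                # Horizontal step (i, j-1); already filled because j increases.
                if acc[i][j - 1] < best:
                    best, best_start = acc[i][j - 1], start[i][j - 1]
            acc[i][j] = d + best
            start[i][j] = best_start

    # Free end: the last query frame may align with any corpus frame.
    end = min(range(m), key=lambda j: acc[n - 1][j])
    return acc[n - 1][end] / n, start[n - 1][end], end


if __name__ == "__main__":
    query = [[1.0, 0.0], [0.0, 1.0]]
    corpus = [[1.0, 1.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(subsequence_dtw(query, corpus))
```

A lower cost suggests a likelier occurrence of the query. In a retrieval setting one would compute such a score for each corpus utterance and rank the utterances by it. Swapping MFCC frames for middle-layer wav2vec 2.0 frames changes only the input vectors, not the matching step.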

