Intermediate Fine-Tuning Using Imperfect Synthetic Speech for Improving Electrolaryngeal Speech Recognition

11/02/2022
by   Lester Phillip Violeta, et al.
0

Research on automatic speech recognition (ASR) systems for electrolaryngeal speakers has been relatively unexplored due to small datasets. When training data is lacking in ASR, a large-scale pretraining and fine tuning framework is often sufficient to achieve high recognition rates; however, in electrolaryngeal speech, the domain shift between the pretraining and fine-tuning data is too large to overcome, limiting the maximum improvement of recognition rates. To resolve this, we propose an intermediate fine-tuning step that uses imperfect synthetic speech to close the domain shift gap between the pretraining and target data. Despite the imperfect synthetic data, we show the effectiveness of this on electrolaryngeal speech datasets, with improvements of 6.1 show how the intermediate fine-tuning stage focuses on learning the high-level inherent features of the imperfect synthetic data rather than the low-level features such as intelligibility.

READ FULL TEXT
research
06/15/2022

Exploring Capabilities of Monolingual Audio Transformers using Large Datasets in Automatic Speech Recognition of Czech

In this paper, we present our progress in pretraining Czech monolingual ...
research
01/30/2021

Speech Recognition by Simply Fine-tuning BERT

We propose a simple method for automatic speech recognition (ASR) by fin...
research
09/14/2021

Residual Adapters for Parameter-Efficient ASR Adaptation to Atypical and Accented Speech

Automatic Speech Recognition (ASR) systems are often optimized to work b...
research
07/29/2022

Domain Specific Wav2vec 2.0 Fine-tuning For The SE R 2022 Challenge

This paper presents our efforts to build a robust ASR model for the shar...
research
10/21/2021

Synt++: Utilizing Imperfect Synthetic Data to Improve Speech Recognition

With recent advances in speech synthesis, synthetic data is becoming a v...
research
09/18/2023

Electrolaryngeal Speech Intelligibility Enhancement Through Robust Linguistic Encoders

We propose a novel framework for electrolaryngeal speech intelligibility...
research
09/11/2018

Unbiasing Semantic Segmentation For Robot Perception using Synthetic Data Feature Transfer

Robot perception systems need to perform reliable image segmentation in ...

Please sign up or login with your details

Forgot password? Click here to reset