An ASR Guided Speech Intelligibility Measure for TTS Model Selection

06/02/2020
by   Arun Baby, et al.
0

The perceptual quality of neural text-to-speech (TTS) is highly dependent on the choice of the model during training. Selecting the model using a training-objective metric such as the least mean squared error does not always correlate with human perception. In this paper, we propose an objective metric based on the phone error rate (PER) to select the TTS model with the best speech intelligibility. The PER is computed between the input text to the TTS model, and the text decoded from the synthesized speech using an automatic speech recognition (ASR) model, which is trained on the same data as the TTS model. With the help of subjective studies, we show that the TTS model chosen with the least PER on validation split has significantly higher speech intelligibility compared to the model with the least training-objective metric loss. Finally, using the proposed PER and subjective evaluation, we show that the choice of best TTS model depends on the genre of the target domain text. All our experiments are conducted on a Hindi language dataset. However, the proposed model selection method is language independent.

READ FULL TEXT

page 1

page 2

page 3

page 4

research
05/30/2023

Towards Selection of Text-to-speech Data to Augment ASR Training

This paper presents a method for selecting appropriate synthetic speech ...
research
12/16/2022

BLASER: A Text-Free Speech-to-Speech Translation Evaluation Metric

End-to-End speech-to-speech translation (S2ST) is generally evaluated wi...
research
02/26/2022

Integrating Text Inputs For Training and Adapting RNN Transducer ASR Models

Compared to hybrid automatic speech recognition (ASR) systems that use a...
research
07/11/2022

Speaker consistency loss and step-wise optimization for semi-supervised joint training of TTS and ASR using unpaired text data

In this paper, we investigate the semi-supervised joint training of text...
research
10/27/2022

TRScore: A Novel GPT-based Readability Scorer for ASR Segmentation and Punctuation model evaluation and selection

Punctuation and Segmentation are key to readability in Automatic Speech ...
research
10/31/2021

Towards Language Modelling in the Speech Domain Using Sub-word Linguistic Units

Language models (LMs) for text data have been studied extensively for th...
research
12/06/2017

Evaluating the Usability of Automatically Generated Captions for People who are Deaf or Hard of Hearing

The accuracy of Automated Speech Recognition (ASR) technology has improv...

Please sign up or login with your details

Forgot password? Click here to reset