Enhancing Speech Intelligibility in Text-To-Speech Synthesis using Speaking Style Conversion

08/13/2020
by Dipjyoti Paul, et al.

The increased adoption of digital assistants makes text-to-speech (TTS) synthesis systems an indispensable feature of modern mobile devices. It is hence desirable to build a system capable of generating highly intelligible speech in the presence of noise. Past studies have investigated style conversion in TTS synthesis, yet degraded synthesis quality often leads to worse intelligibility. To overcome such limitations, we propose a novel transfer learning approach using Tacotron- and WaveRNN-based TTS synthesis. The proposed speech system exploits two modification strategies: (a) Lombard speaking style data and (b) Spectral Shaping and Dynamic Range Compression (SSDRC), which has been shown to provide high intelligibility gains by redistributing the signal energy in the time-frequency domain. We refer to this extension as the Lombard-SSDRC TTS system. As quantified by the Speech Intelligibility in Bits (SIIB-Gauss) measure, the proposed Lombard-SSDRC TTS system shows significant relative improvement between 110 competing-speaker noise (CSN) against the state-of-the-art TTS approach. Additional subjective evaluation shows that Lombard-SSDRC TTS successfully increases speech intelligibility, with relative improvement of 455 and 104 method.
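The abstract reports its gains as relative improvements over a baseline TTS system. As a minimal sketch of how such a figure is usually computed, assuming the common definition (proposed minus baseline, divided by baseline, expressed in percent), the helper below may be useful. The function name `relative_improvement` and the scores are hypothetical and are not taken from the paper.

```python
def relative_improvement(baseline: float, proposed: float) -> float:
    """Return the percent change of `proposed` relative to `baseline`.

    Assumes the usual definition: (proposed - baseline) / |baseline| * 100.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (proposed - baseline) / abs(baseline) * 100.0


# Hypothetical intelligibility scores (illustrative only, not from the paper).
baseline_score = 20.0
proposed_score = 45.0

gain = relative_improvement(baseline_score, proposed_score)
print(f"{gain:.1f}% relative improvement")  # prints "125.0% relative improvement"
```

Under this definition, a relative improvement above 100% means the proposed system's score is more than double the baseline's.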


