Techniques and Challenges in Speech Synthesis

09/22/2017
by David Ferris, et al.

The aim of this project was to develop and implement an English-language text-to-speech (TTS) synthesis system. This involved a study of the mechanisms of human speech production, a review of techniques in speech synthesis, and an analysis of the tests used to evaluate the effectiveness of synthesized speech. A diphone synthesis system was determined to be the most effective choice for the scope of this project. A method of automatically identifying and extracting diphones from prompted speech was designed, allowing a speaker to create a diphone database in less than 40 minutes. CMUdict was used to determine the pronunciation of known words. A system for smoothing the transitions between diphone recordings was designed and implemented. CMUdict was then used to train a maximum-likelihood prediction system to determine the pronunciation of unknown alphabetic English words. Next, a part-of-speech (POS) tagger was designed to find the lexical class of words within a sentence. A method of altering the pitch, duration, and volume of the produced voice over time was designed, combining the TD-PSOLA algorithm with a novel approach referred to as Unvoiced Speech Duration Shifting. This minimises distortion of the voice when shifting pitch or duration, while maximising computational efficiency by operating in the time domain. This approach was used to apply correct lexical stress to vowels within words. A text tokenisation system was developed to handle arbitrary text input, allowing numerical tokens to be pronounced and appropriate pauses to be inserted for punctuation. Methods for further improving sentence-level naturalness were discussed. Finally, the system was tested with listeners for intelligibility and naturalness.
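The abstract names TD-PSOLA without describing it. As a rough illustration only, the following is a minimal, generic TD-PSOLA sketch in Python, not the authors' implementation. It assumes pitch marks (glottal closure instants) are already known and strictly increasing, uses a Hann window two local periods wide, and does not attempt Unvoiced Speech Duration Shifting, whose details are not given here. The functions `hann` and `td_psola` and their parameters are hypothetical names chosen for this example.

```python
import math
from typing import List, Sequence


def hann(n: int) -> List[float]:
    """Symmetric Hann window of length n; endpoints are 0, the centre is 1."""
    if n < 2:
        return [1.0] * n
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * (i / (n - 1))) for i in range(n)]


def td_psola(signal: Sequence[float], marks: Sequence[int],
             pitch_factor: float = 1.0, duration_factor: float = 1.0) -> List[float]:
    """Generic TD-PSOLA sketch (hypothetical helper, not the paper's code).

    signal:          mono samples
    marks:           strictly increasing pitch-mark indices, ASSUMED to be
                     supplied by a separate pitch-mark detector
    pitch_factor:    > 1 raises pitch, < 1 lowers it
    duration_factor: > 1 lengthens, < 1 shortens
    """
    if len(marks) < 2:
        raise ValueError("need at least two pitch marks")
    if pitch_factor <= 0 or duration_factor <= 0:
        raise ValueError("factors must be positive")

    # Local pitch period at each analysis mark (distance to the next mark).
    periods = [marks[i + 1] - marks[i] for i in range(len(marks) - 1)]
    periods.append(periods[-1])

    out_len = int(len(signal) * duration_factor)
    out = [0.0] * out_len

    t_s = marks[0] * duration_factor  # first synthesis mark
    while t_s < out_len:
        # Map synthesis time back onto the original time axis and pick the
        # nearest analysis mark; reusing or skipping marks changes duration.
        t_a = t_s / duration_factor
        k = min(range(len(marks)), key=lambda m: abs(marks[m] - t_a))
        p = periods[k]
        win = hann(2 * p + 1)
        centre, dest = marks[k], int(round(t_s))
        # Overlap-add a two-period, Hann-windowed grain centred on the mark.
        for j in range(-p, p + 1):
            src, dst = centre + j, dest + j
            if 0 <= src < len(signal) and 0 <= dst < out_len:
                out[dst] += signal[src] * win[j + p]
        # Spacing synthesis marks by p / pitch_factor sets the new pitch.
        t_s += p / pitch_factor
    return out
```

With both factors at 1.0, the grains overlap at exactly one period and the Hann windows sum to one, so the interior of the signal is reconstructed unchanged. Raising `pitch_factor` packs grains closer together without changing the output length, while `duration_factor` repeats or skips grains to stretch or compress time at the original pitch.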
